DLA

  • NVDLA-based
  • High-performance convolution core with 2048 MACs
  • Supports various image input formats
  • Dedicated Depth-wise Convolution engine
  • Acceleration engine for Activation functions
  • Acceleration engine for Pooling
  • Acceleration engine for advanced Normalization functions
  • Memory-to-memory transformation acceleration for tensor reshape and copy operations
  • 2 MB local on-chip SRAM, shared through an AXI slave port so that other bus masters can access it
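
As a rough illustration of what the MAC count and SRAM size above imply, the following sketch estimates a theoretical peak throughput and checks whether a dense tensor fits in the local SRAM. The clock frequency is not stated in this section, so the value used in the usage note is purely hypothetical. Counting one MAC as two operations, reading 2 MB as 2 * 1024 * 1024 bytes, and the one-byte element size are likewise assumptions; the function names are illustrative only.

```python
MAC_COUNT = 2048                 # from the feature list
SRAM_BYTES = 2 * 1024 * 1024     # 2 MB local SRAM (assumed to mean 2 MiB)


def peak_ops_per_second(clock_hz: float, macs: int = MAC_COUNT) -> float:
    """Theoretical peak, counting each MAC as two operations (multiply + add)."""
    return 2 * macs * clock_hz


def fits_in_sram(shape: tuple, bytes_per_element: int = 1) -> bool:
    """Return True if a dense tensor of the given shape fits in the local SRAM."""
    elements = 1
    for dim in shape:
        elements *= dim
    return elements * bytes_per_element <= SRAM_BYTES
```

With a hypothetical 1 GHz clock, peak_ops_per_second(1e9) gives 4.096e12 operations per second. Real throughput depends on utilization, data layout, and memory bandwidth, none of which this estimate models.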