DLA
- Based on NVDLA
- High-performance convolution core with 2048 MACs
- Support for various image input formats
- Dedicated Depth-wise Convolution engine
- Acceleration engine for Activation functions
- Acceleration engine for Pooling
- Acceleration engine for advanced Normalization functions
- Memory-to-memory transformation acceleration for tensor reshape and copy operations
- 2 MB local on-chip SRAM, also accessible to other bus masters through an AXI slave port
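
As a rough illustration of what the 2048-MAC figure implies, the sketch below computes a theoretical peak throughput. The MAC count comes from the list above; the clock frequency is not stated in this document, so the 1 GHz value is purely a hypothetical placeholder, and counting each MAC as two operations (one multiply, one add) is a common convention rather than something this document specifies.

```python
# Theoretical peak throughput estimate for the convolution core.
# MACS is taken from the feature list; everything else is an assumption.

MACS = 2048          # from the feature list
OPS_PER_MAC = 2      # assumed convention: one multiply plus one add


def peak_tops(clock_hz: float, macs: int = MACS) -> float:
    """Return the theoretical peak in tera-operations per second."""
    return macs * OPS_PER_MAC * clock_hz / 1e12


if __name__ == "__main__":
    hypothetical_clock_hz = 1.0e9  # placeholder; the real clock is not given here
    print(f"{peak_tops(hypothetical_clock_hz):.3f} TOPS at 1 GHz (assumed)")
```

Real workloads usually land below this figure, since it assumes every MAC is busy on every cycle.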