Origin E2
Optimal performance in a power-optimized package
The Origin™ E2 is designed for power-constrained applications such as mobile phones and edge nodes. By eliminating the need for external DRAM access, the E2 deep learning accelerator (DLA) saves system power while increasing performance, reducing latency, and cutting system BOM cost. Its highly efficient engine draws less than 1 W while delivering an effective 18 TOPS/W. Origin E2 is tunable for specific workloads, providing an optimal performance profile for unique application requirements.
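As a rough sanity check on the headline numbers, a MAC array's peak throughput follows directly from its size and clock rate. The sketch below is illustrative only: the clock frequency is an assumption (not published here), and the function name is ours.

```python
# Illustrative sketch: how MAC count relates to peak TOPS.
# The 1 GHz clock is an assumed value for illustration, not a published spec.
def effective_tops(num_macs: int, clock_hz: float) -> float:
    # Each MAC performs one multiply and one accumulate per cycle = 2 ops.
    return num_macs * 2 * clock_hz / 1e12

# Largest E2 configuration: 9K = 9216 INT8 MACs, at an assumed 1 GHz clock.
print(effective_tops(9216, 1e9))  # prints 18.432
```

At the assumed clock, the 9K-MAC configuration lands at roughly 18 TOPS, consistent with the sub-1 W, 18 TOPS/W positioning above.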
Features
- Power efficient: up to 18 TOPS/W
- Scalable performance from 2.25K to 9K INT8 MACs
- Capable of processing real-time HD video and images on-chip
- Advanced activation memory management
- Low latency
- Tunable for specific workloads
- Hardware-based neural network scheduler
- Support for standard NN functions, including Convolution, Deconvolution, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, and Bilinear
- Processes models as trained; no software optimizations needed
- Works with familiar open-source platforms such as TFLite
- Delivered as soft IP, portable to any process
Specifications
Compute Capacity | 2.25K, 4.5K, or 9K INT8 MACs |
Power Efficiency | 18 effective TOPS/W (INT8) |
Number of Jobs | Single |
NN Support | CNNs and other NN architectures |
Layer Support | Standard NN functions, including Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, and Bilinear |
Data Types | INT8/INT16 activations, INT8 weights |
Quantization | Channel-wise quantization (TFLite specification); optional custom quantization based on workload needs |
Latency | Optimized for lowest latency, with deterministic guarantees |
Memory | Smart on-chip dynamic memory allocation algorithms |
Frameworks | TensorFlow, TFLite, ONNX |
Workloads | Capable of processing 4K video and 8K images on-chip |
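The channel-wise quantization scheme named in the table refers to the TFLite convention of symmetric per-output-channel INT8 weights (one scale per channel, zero-point fixed at 0, values clamped to [-127, 127]). A minimal sketch of that scheme, assuming output channels on axis 0; function names and shapes are illustrative, not Expedera's implementation:

```python
import numpy as np

def quantize_per_channel(weights: np.ndarray):
    """Symmetric per-channel INT8 quantization (TFLite-style).

    weights: float32 array with output channels on axis 0.
    Returns (int8 weights, per-channel float scales); zero_point is always 0.
    """
    flat = weights.reshape(weights.shape[0], -1)
    scales = np.abs(flat).max(axis=1) / 127.0        # one scale per channel
    scales = np.where(scales == 0, 1.0, scales)      # avoid divide-by-zero
    q = np.clip(np.round(flat / scales[:, None]), -127, 127).astype(np.int8)
    return q.reshape(weights.shape), scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    # Broadcast each channel's scale back over its weights.
    return q.astype(np.float32) * scales.reshape(-1, *([1] * (q.ndim - 1)))

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3)).astype(np.float32)  # 4 output channels
q, s = quantize_per_channel(w)
print(f"max reconstruction error: {np.abs(dequantize(q, s) - w).max():.4f}")
```

Per-channel scales matter because convolution filters in one layer can have very different dynamic ranges; a single per-tensor scale would waste precision on the smaller channels.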
Advantages
- Industry-leading performance and power efficiency (up to 18 TOPS/W)
- Architected to serve a wide range of compute requirements
- Drastically reduces memory requirements; no off-chip DRAM required
- Runs trained models unchanged, without hardware-dependent optimizations
- Deterministic, real-time performance
- Improved performance for your workloads while still running a breadth of models
- Simple software stack
- Achieves the same accuracy as your trained model
- Simplifies deployment to end customers
Benefits
- Efficiency: industry-leading 18 TOPS/W enables greater processing efficiency with lower power consumption
- Simplicity: eliminates complicated compilers, easing design complexity, reducing cost, and speeding time-to-market
- Configurability: independently configurable building blocks allow design optimization for right-sized deployments
- Predictability: deterministic performance and quality of service (QoS)
- Scalability: from 1 to 20 TOPS, a single scalable architecture addresses a wide range of application performance requirements
- Deployability: best-in-market TOPS/mm² ensures an ideal balance of processing power and chip size