Origin E6
A high-performance workhorse engine for everyday inference applications
The Origin™ E6 is designed for applications where performance and power consumption are primary design goals, including smartphones, tablets, and edge servers. Expedera's advanced memory management ensures sustained DRAM bandwidth and optimal total system performance. Delivering 16 to 32 TOPS with up to 80% real-world utilization (measured on-chip running common workloads such as ResNet), the Origin E6 deep learning accelerator (DLA) excels at image-related tasks like computer vision, image classification, and object detection. It also handles natural language processing (NLP) tasks such as machine translation, sentence classification, and text generation.
- 16–32 TOPS performance with efficiency up to 18 TOPS/Watt
- Scalable performance up to 18K MACs
- Capable of processing HD images on chip
- Advanced activation memory management
- Low latency
- Compatible with various DNN models
- Hardware-based neural network scheduler
- Runs models as trained; no software optimizations needed
- Works with familiar open-source platforms like TFLite
- Delivered as soft IP: portable to any process
| Feature | Specification |
| --- | --- |
| Compute Capacity | 4.5K, 9K, or 18K INT8 MACs |
| Multi-tasking | Run up to 2 simultaneous jobs |
| Power Efficiency | 18 effective TOPS/W (INT8) |
| NN Support | CNN, RNN, LSTM, and other NN architectures |
| Layer Support | Standard NN functions, including Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, etc.; programmable general FP functions, including Sigmoid, Tanh, Sine, Cosine, Exp, etc. |
| Data Types | INT8/INT16 activations/weights; optional FP16/BFloat16 activations/weights |
| Quantization | Channel-wise quantization (TFLite specification); optional custom quantization based on workload needs |
| Latency | Deterministic performance guarantees |
| Memory | Smart system memory allocation and scheduling |
| Frameworks | TensorFlow, TFLite, ONNX |
| Workloads | Capable of running large DNN networks |
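To illustrate the channel-wise quantization scheme named in the table, the sketch below follows the TFLite quantization specification for INT8 weights: each output channel gets its own scale, and weight zero-points are fixed at 0 (symmetric quantization). This is a generic, illustrative example in plain Python, not Expedera's implementation; the function names are placeholders.

```python
# Sketch of channel-wise (per-axis) INT8 weight quantization per the
# TFLite spec: per-output-channel scales, weight zero_point fixed at 0.
# Illustrative only -- not Expedera's actual implementation.

def quantize_per_channel(weights):
    """Quantize a list of per-channel float weight rows to INT8.

    Returns (int8_rows, scales); real_value ~= int8_value * scale.
    """
    int8_rows, scales = [], []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to nearest and clamp to the symmetric INT8 range.
        q = [max(-127, min(127, round(w / scale))) for w in channel]
        int8_rows.append(q)
        scales.append(scale)
    return int8_rows, scales

def dequantize_per_channel(int8_rows, scales):
    """Recover approximate float weights from INT8 values and scales."""
    return [[q * s for q in row] for row, s in zip(int8_rows, scales)]

# Two output channels with very different ranges: per-channel scales
# keep quantization error small for both, unlike a single shared scale.
w = [[0.02, -0.01, 0.03], [5.0, -2.5, 1.25]]
q, s = quantize_per_channel(w)
w_hat = dequantize_per_channel(q, s)
```

The point of the per-channel scales is that a small-magnitude channel is not forced to share a scale with a large-magnitude one, which is what preserves accuracy when running models "as trained."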
Advantages
- Industry-leading performance and power efficiency
- Architected to serve a wide range of compute requirements
- On-chip, L3, and DRAM memories work together to improve bandwidth
- Drastically reduces memory requirements
- Deterministic, real-time performance
- Flexible for changing applications
- Simple software stack
- Achieves the same accuracy as your trained model
- Simplifies deployment to end customers
Benefits
- Efficiency: industry-leading 18 TOPS/W enables greater processing efficiencies with lower power consumption
- Simplicity: eliminates complicated compilers, easing design complexity, reducing cost, and speeding time-to-market
- Configurability: independently configurable building blocks allow for design optimization and right-sized deployments
- Predictability: deterministic performance with quality-of-service (QoS) guarantees
- Scalability: from 16 to 32 TOPS a single scalable architecture addresses a wide range of application performance requirements
- Deployability: best-in-market TOPS/mm² enables the ideal balance of processing power and chip size