
Origin E8
A high-performance inference engine
for the most demanding environments
The Expedera Origin™ E8 family of Neural Processing Unit (NPU) AI inference engines is designed for performance-intensive applications such as automotive/ADAS and the data center. With performance ranging from 32 to 128 TOPS, the Origin E8 excels at image-related tasks such as computer vision, point cloud detection, image classification, and object detection.
Multi-job Support
In high-performance applications, OEMs increasingly require NPUs that can run multiple Neural Networks (NN) concurrently and efficiently. The Origin E8 family is designed to enable multi-job support for better utilization of hardware resources and reductions in system costs.
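To make the multi-job concept concrete, here is a minimal host-side sketch of running two networks concurrently with the open-source TFLite Python interpreter. It illustrates the concept only; it is not the Origin SDK, and the model file names are hypothetical placeholders.

```python
# Host-side illustration of concurrent multi-model inference (not the Origin
# SDK). Model paths are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import tensorflow as tf

def run_job(model_path, input_tensor):
    """Load a TFLite model and run a single inference job."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], input_tensor.astype(inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])

# Two concurrent jobs, e.g. object detection alongside lane segmentation.
with ThreadPoolExecutor(max_workers=2) as pool:
    det = pool.submit(run_job, "detector.tflite", np.zeros((1, 416, 416, 3), np.float32))
    seg = pool.submit(run_job, "segmenter.tflite", np.zeros((1, 512, 512, 3), np.float32))
    detections, lanes = det.result(), seg.result()
```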
Support for a Wide Variety of Neural Networks
Artificial intelligence is a fast-developing field, and new neural networks are released almost daily. Expedera's Origin E8 NPU supports RNN, CNN, LSTM, and other neural network architectures. Native support is provided for Inception, MobileNet, YOLO v3, FSRCNN, EfficientNet, U-Net, and many other neural networks with input resolutions up to 8K, including transformers. In addition, Origin supports custom and proprietary networks and offers provisions for future-proofing your design.
Native Execution: a New NPU Paradigm
Typical AI accelerators, often repurposed CPUs (Central Processing Units) or GPUs (Graphics Processing Units), rely on a complex software stack that converts a neural network into a long sequence of basic instructions. Execution of these instructions tends to be inefficient, with processor utilization as low as 20 to 40%. Taking a new approach, Expedera designed Origin specifically as an NPU that executes the neural network directly using metadata, achieving sustained utilization averaging 80%. The metadata indicates the function of each layer (such as convolution or pooling) and other important details, such as the size and shape of the convolution. No changes to your trained neural networks are required, and there is no perceptible reduction in model accuracy. This approach greatly simplifies the software, and Expedera provides a robust stack based on Apache TVM. Expedera's native execution eases the adoption of new models and reduces time to market.
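As a rough illustration of what such metadata might capture, the hypothetical Python structure below records each layer's function and shape parameters. The field names and values are assumptions for this sketch, not Expedera's actual metadata format.

```python
# Hypothetical layer-metadata sketch: each entry names the layer's function
# and its shape parameters, which the NPU can execute directly.
from dataclasses import dataclass

@dataclass
class LayerMeta:
    op: str                # layer function, e.g. "conv2d" or "pooling"
    input_shape: tuple     # (height, width, channels)
    kernel: tuple          # (kernel_h, kernel_w)
    stride: tuple          # (stride_h, stride_w)
    output_channels: int

network = [
    LayerMeta("conv2d", (224, 224, 3), (7, 7), (2, 2), 64),
    LayerMeta("pooling", (112, 112, 64), (3, 3), (2, 2), 64),
    # ... remaining layers of the trained model, unchanged
]
```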
Optimized for Your Specific Needs
While there are many general-purpose AI processors, a one-size-fits-all solution is rarely the most efficient. General-purpose AI processors are often larger than needed for a specific application and consume more power than necessary. The Origin E8 family of IP cores is optimized for a customer's application-specific area and power requirements. Whether performance-optimized for a single network or a series of networks, sized to meet silicon area constraints, or configured to meet system power requirements, the E8 provides optimal PPA (power, performance, area). Expedera can typically achieve superior performance in about half the silicon area required by other NPUs. During the design process, Expedera works with clients to understand their specific application needs and constraints, and provides cycle-accurate PPA estimations before delivery of the IP.
Market-leading Power Efficiency
Understanding the comparative power efficiencies of NPUs can be complicated. Ours isn't: Expedera's Origin family averages a market-leading 18 TOPS/W, assuming a TSMC 7nm process and ResNet50 run entirely at INT8 precision with a 1 GHz system clock. No sparsity, compression, or pruning is applied, though all are supported and may further increase power efficiency. Origin has repeatedly been cited as the most power-efficient NPU available.
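As a back-of-the-envelope check under the conditions stated above, the quoted efficiency implies a power draw of roughly 7 W for the largest E8 configuration:

```python
# Rough power estimate implied by the quoted efficiency figure (values taken
# from the measurement conditions above; 128 TOPS is the top of the E8 range).
tops = 128          # peak throughput of the largest E8 configuration
tops_per_watt = 18  # quoted Origin power efficiency (INT8, ResNet50, 1 GHz)

power_w = tops / tops_per_watt
print(f"~{power_w:.1f} W at {tops} TOPS")  # ~7.1 W
```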
Silicon-Proven and Deployed in Millions of Consumer Products
Choosing the right AI processor can ‘make or break’ a design. The Origin architecture is silicon-proven in leading-edge process nodes and successfully shipped in millions of consumer devices worldwide.
- 32 to 128 TOPS, with power efficiency up to 18 TOPS/W
- 16K to 64K INT8 MACs
- Support for 8 concurrent jobs
- Advanced activation memory management
- Low latency
- Predictable deterministic performance
- Compatible with various DNN models
- Hardware scheduler for NN
- Processes model as trained, no need for software optimizations
- Use familiar open-source platforms like TFLite
- Delivered as soft IP: portable to any process
| Feature | Specification |
| --- | --- |
| Compute Capacity | 16K to 64K INT8 MACs |
| Multi-tasking | Support for 8 concurrent jobs |
| Power Efficiency | 18 effective TOPS/W (INT8) |
| NN Support | CNN, RNN, LSTM, and other NN architectures |
| Layer Support | Standard NN functions, including Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, etc.; programmable general FP functions, including Sigmoid, Tanh, Sine, Cosine, Exp, etc. |
| Data Types | INT4/INT8/INT10/INT12/INT16 activations/weights; FP16/BFloat16 activations/weights |
| Quantization | Channel-wise quantization (TFLite specification); optional custom quantization based on workload needs |
| Latency | Deterministic performance guarantees |
| Memory | Advanced system memory allocation and scheduling with virtualization |
| Frameworks | TensorFlow, TFLite, ONNX |
| Workloads | Capable of running large DNN networks |
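For readers unfamiliar with the channel-wise quantization referenced above, the sketch below shows per-channel symmetric INT8 weight quantization in the style of the TFLite specification (one scale per output channel, zero point fixed at 0). It is illustrative only, not Expedera's toolchain.

```python
# Minimal NumPy sketch of TFLite-style per-channel symmetric INT8 weight
# quantization: one scale per output channel, zero point fixed at 0.
import numpy as np

def quantize_per_channel(weights, axis=0):
    """Quantize a float weight tensor to INT8 with one scale per channel."""
    w = np.moveaxis(weights, axis, 0)                      # channel axis first
    max_abs = np.abs(w).reshape(w.shape[0], -1).max(axis=1)
    scales = np.where(max_abs > 0, max_abs / 127.0, 1.0)   # avoid divide-by-zero
    bshape = (-1,) + (1,) * (w.ndim - 1)                   # broadcast over weights
    q = np.clip(np.round(w / scales.reshape(bshape)), -127, 127).astype(np.int8)
    return np.moveaxis(q, 0, axis), scales

# Example: quantize a hypothetical 64-filter 3x3 convolution weight tensor.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, s = quantize_per_channel(w, axis=0)
```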
Advantages
- Industry-leading performance and power efficiency
- Architected to support demanding workloads with maximum efficiency
- Reduces hardware requirements for running multiple models
- Drastically reduces memory requirements
- Deterministic, real-time performance
- Flexible, future-proof support
- Simple software stack
- Achieves the same accuracy as your trained model
- Simplifies deployment to end customers
Benefits
- Efficiency: industry-leading 18 TOPS/W enables greater processing efficiencies with lower power consumption
- Simplicity: eliminates complicated compilers, easing design complexity, reducing cost, and speeding time-to-market
- Configurability: independently configurable building blocks allow for design optimization and right-sized deployments
- Predictability: deterministic performance with quality-of-service (QoS) guarantees
- Scalability: from 32 to 128 TOPS, a single scalable architecture addresses a wide range of application performance requirements
- Deployability: best-in-market TOPS/mm² assures ideal processing/chip-size designs
