Origin Evolution for Edge
Edge-friendly LLM and CNN AI inference processing
Edge devices are increasingly equipped with advanced AI processing capabilities that enhance functionality and improve the user experience. While many of these devices previously depended on cloud-based inference, manufacturers are now shifting towards on-device inference. This transition helps to lower latency, reduce overall power consumption, and minimize the need for cloud processing, thereby cutting costs.

Perfect-Fit Solutions
Origin Evolution™ for Edge offers out-of-the-box compatibility with today's most popular LLM and CNN networks. Attention-based processing optimization and advanced memory management ensure optimal AI performance across a variety of networks and representations. Featuring a hardware and software co-designed architecture, Origin Evolution for Edge scales to 32 TFLOPS in a single core to address the most advanced edge inference needs.
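As a back-of-the-envelope check, the 32 TFLOPS figure follows directly from the compute capacity and clock listed in the specification table below, assuming one multiply plus one add per MAC per cycle:

```python
# Peak-throughput arithmetic for the largest single-core configuration,
# assuming the 16K FP16 MACs and 1 GHz system clock quoted in the spec table.
macs = 16 * 1024          # FP16 multiply-accumulate units
flops_per_mac = 2         # one multiply + one add per cycle
clock_hz = 1e9            # 1 GHz system clock

peak_tflops = macs * flops_per_mac * clock_hz / 1e12
print(f"{peak_tflops:.1f} TFLOPS")  # ~32.8, matching the quoted 32 TFLOPS
```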
Bringing AI to the Edge
Always-sensing cameras continuously sample and analyze visual data to identify triggers relevant to the user experience, enabling seamless, more natural interactions. However, always-sensing requires specialized AI processing because of the quantity and complexity of the data generated. OEMs are therefore turning to dedicated AI engines such as Expedera’s LittleNPU, which is optimized for the low-power, high-quality neural networks leading OEMs use in always-sensing applications. It runs at very low power, often as low as 10-20 mW, and keeps all camera data securely within the LittleNPU subsystem to preserve user privacy.
Innovative Architecture
Origin Evolution uses Expedera’s unique packet-based architecture to achieve unprecedented NPU efficiency. Packets, contiguous fragments of a neural network, overcome the hurdles of large memory movements and widely varying layer sizes, both of which are exacerbated by LLMs. Packets are routed through discrete processing blocks, including Feed Forward, Attention, and Vector, which accommodate the varying operations, data types, and precisions required when running different LLM and CNN networks. Origin Evolution also includes a high-speed external memory streaming interface compatible with the latest memory standards.
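Expedera has not published the packet format itself, so the following is only a conceptual sketch, with hypothetical names (`Packet`, `Block`, `route`), of how a network might be decomposed into contiguous packets and dispatched to the three block types described above:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Block(Enum):
    FEED_FORWARD = auto()   # dense / MLP layers
    ATTENTION = auto()      # attention score and context computation
    VECTOR = auto()         # elementwise ops, activations, normalization

@dataclass
class Packet:
    """A contiguous fragment of the network graph (hypothetical representation)."""
    layer_ids: list[int]    # consecutive layers fused into one packet
    block: Block            # processing block the packet is routed to
    precision: str          # e.g. "INT4", "INT8", "FP16" (per-packet precision)

def route(packets: list[Packet]) -> dict[Block, list[Packet]]:
    """Group packets by target block; hardware would stream each queue per block."""
    queues: dict[Block, list[Packet]] = {b: [] for b in Block}
    for p in packets:
        queues[p.block].append(p)
    return queues

# Example: one transformer layer split into three packets
layer = [
    Packet(layer_ids=[0], block=Block.ATTENTION, precision="INT8"),
    Packet(layer_ids=[1], block=Block.VECTOR, precision="FP16"),  # softmax, residual add
    Packet(layer_ids=[2, 3], block=Block.FEED_FORWARD, precision="INT4"),
]
print(route(layer))
```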
Choose the Features You Need
- Reducing Memory Bandwidth
- Efficient Resource Utilization
- Full Software Stack
Origin Evolution offers out-of-the-box support for 100+ popular neural networks, including Llama2, Llama3, ChatGLM, DeepSeek, Mistral, Qwen, MiniCPM, Yolo, MobileNet, and many others.
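The compilation flow itself is not documented here, but networks like these typically enter an NPU toolchain through a standard interchange format. As an illustration only, this sketch exports one of the listed networks (MobileNet) to ONNX using stock PyTorch/torchvision; the Expedera compiler's own entry point is not shown in this document:

```python
# Illustrative only: export a supported network (MobileNet) to ONNX,
# the kind of standard representation an NPU software stack ingests.
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").eval()
dummy = torch.randn(1, 3, 224, 224)   # standard ImageNet input shape
torch.onnx.export(model, dummy, "mobilenet_v2.onnx", opset_version=17)
```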
Unique Packet Architecture
Ultra-Efficient Neural Network Processing
Accepting standard, custom, and black-box networks in a variety of AI representations, Origin Evolution offers a wealth of user features, including mixed-precision quantization. Expedera’s packet-based processing decomposes even very large networks into smaller, contiguous fragments, overcoming the hurdle of large memory movements and achieving much higher processor utilization. As described above, packets are routed through the Feed Forward, Attention, and Vector blocks according to the operations, data types, and precisions each requires. Internal memory handles intermediate results, while the memory streaming interface connects to off-chip storage.
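To make the mixed-precision idea concrete, here is a minimal NumPy sketch of symmetric uniform quantization in which a precision-sensitive layer keeps INT8 while a more tolerant layer drops to INT4; the layer names and bit assignments are illustrative, not Expedera's:

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int):
    """Symmetric uniform quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for INT4, 127 for INT8
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Mixed precision: a sensitive layer keeps INT8, a tolerant layer drops to INT4.
rng = np.random.default_rng(0)
layers = {
    "attention.qkv": (rng.standard_normal((64, 64)), 8),
    "ffn.up":        (rng.standard_normal((64, 256)), 4),
}
for name, (w, bits) in layers.items():
    q, s = quantize_symmetric(w, bits)
    err = np.abs(dequantize(q, s) - w).mean()
    print(f"{name}: INT{bits}, mean abs error {err:.4f}")
```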
| Feature | Description |
|---|---|
| Compute Capacity | Up to 16K FP16 MACs |
| Multi-tasking | Runs simultaneous jobs |
| Example Networks Supported | Llama2, Llama3, ChatGLM, DeepSeek, Mistral, Qwen, MiniCPM, Yolo, MobileNet, and many others, including proprietary/black-box networks |
| Example Performance | 80 tokens per second on Llama 3.1 1B (INT4 weights, INT16 activations) with a 1 TOPS engine, 2 MB internal memory, and 64 GB/s peak external bandwidth; specified in TSMC 7nm at a 1 GHz system clock, with no sparsity/compression/pruning applied (though supported) |
| Layer Support | Standard NN functions, including Transformers, Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, and others; custom operators supported |
| Data Types | FP16/FP32/INT4/INT8/INT10/INT12/INT16 activations/weights |
| Quantization | Software toolchain supports Expedera, customer-supplied, or third-party quantization; mixed precision supported |
| Latency | Deterministic performance guarantees, no back pressure |
| Frameworks | Hugging Face, Llama.cpp, PyTorch, TVM, ONNX, TensorFlow, and others |
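The example-performance row can be sanity checked against the quoted bandwidth. For a memory-bound decoder, every generated token must stream the full weight set from external memory, so bandwidth sets a hard ceiling on token rate; a rough bound, assuming a 1B-parameter model with 4-bit weights and the 64 GB/s figure above:

```python
# Rough memory-bound ceiling for token generation (illustrative arithmetic).
params = 1.0e9           # ~1B parameters
bits_per_weight = 4      # INT4 weights
bandwidth = 64e9         # 64 GB/s peak external bandwidth (from the table)

bytes_per_token = params * bits_per_weight / 8   # weights streamed per token
ceiling = bandwidth / bytes_per_token
print(f"~{ceiling:.0f} tokens/s ceiling")        # ~128 tokens/s
```

The quoted 80 tokens per second is roughly 60% of this ceiling, which is plausible for a bandwidth-limited workload once activations, KV-cache traffic, and scheduling overheads are accounted for.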