Origin Evolution for Mobile
Mobile-Centric LLM and CNN AI Inference Processing
Consumers are excited about the latest AI features in smartphones. As OEMs increasingly move inference processing to local devices, they seek solutions that can enhance computational capacity while effectively managing memory, power consumption, and privacy.

Perfect-Fit Solutions
Origin Evolution™ for Mobile offers out-of-the-box compatibility with popular LLM and CNN networks. Attention-based processing optimization and advanced memory management ensure optimal AI performance across a variety of today’s standard and emerging neural networks. Featuring a hardware and software co-designed architecture, Origin Evolution for Mobile scales to 64 TFLOPS in a single core.
Bringing LLMs to Smartphones
Smartphone makers are adding more AI to their products, including advanced LLM and VLM capabilities that enable a new class of applications: personal assistants, language translation and learning, content generation, and advanced productivity tools. The desire for a better user experience, the lowest latency, and increased privacy is driving a reduction in reliance on the cloud for inference. However, because LLMs may be 20 to 50X larger than more traditional AI networks, significant memory and processing hurdles must be overcome before these networks can be deployed fully on smartphones in a power-friendly manner.
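For a rough sense of that size gap, the back-of-envelope sketch below estimates weight-storage requirements for a hypothetical 7B-parameter LLM versus a MobileNet-class CNN at several precisions. The model sizes and bit-widths are illustrative assumptions, not Expedera figures.

```python
# Back-of-envelope weight-storage estimate (illustrative assumptions only).
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Gigabytes needed just to store the model weights."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A ~7B-parameter LLM versus a ~25M-parameter MobileNet-class CNN.
for name, params_b in [("7B LLM", 7.0), ("25M CNN", 0.025)]:
    for bits in (16, 8, 4):  # FP16, INT8, INT4
        print(f"{name} @ {bits}-bit weights: {weights_gb(params_b, bits):.2f} GB")
```

Even at INT4, the hypothetical LLM's weights alone run to several gigabytes, while the CNN fits comfortably in on-chip memory at any precision.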
Innovative Architecture
Origin Evolution uses Expedera’s unique packet-based architecture to achieve unprecedented NPU efficiency. Packets, contiguous fragments of a neural network, are an ideal way to overcome the hurdles of large memory movements and widely differing network layer sizes, both of which LLMs exacerbate. Packets are routed through discrete processing blocks, including Feed Forward, Attention, and Vector, which accommodate the varying operations, data types, and precisions required when running different LLM and CNN networks. Origin Evolution includes a high-speed external memory streaming interface that is compatible with the latest memory standards.
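Expedera has not published its packet scheduler internals, so the toy sketch below only illustrates the general idea described above: slicing a network into contiguous fragments and routing each fragment to the block type that matches its operation. The class names and the three-block split are assumptions for illustration, not the actual implementation.

```python
from dataclasses import dataclass

# Toy illustration of packet-style dispatch (not Expedera's implementation).
@dataclass
class Packet:
    layer: str   # originating network layer
    op: str      # operation class: "attention", "feedforward", or "vector"
    data: list   # contiguous fragment of weights/activations

BLOCKS = {
    "attention":   lambda p: f"Attention block ran {p.layer}",
    "feedforward": lambda p: f"Feed Forward block ran {p.layer}",
    "vector":      lambda p: f"Vector block ran {p.layer}",
}

def dispatch(packets):
    # Route each contiguous fragment to its matching processing block,
    # so differently sized layers never force one monolithic memory move.
    return [BLOCKS[p.op](p) for p in packets]

packets = [
    Packet("self_attn.q_proj", "attention",   [0.1, 0.2]),
    Packet("mlp.up_proj",      "feedforward", [0.3]),
    Packet("layernorm",        "vector",      [0.4, 0.5]),
]
print("\n".join(dispatch(packets)))
```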
Choose the Features You Need
- Reducing Memory Bandwidth
- Efficient Resource Utilization
- Full Software Stack
Origin Evolution offers out-of-the-box support for 100+ popular neural networks, including Llama2, Llama3, ChatGLM, DeepSeek, Mistral, Qwen, MiniCPM, Yolo, MobileNet, and many others.
Unique Packet Architecture
Ultra-Efficient Neural Network Processing
Accepting standard, custom, and black box networks in a variety of AI representations, Origin Evolution offers a wealth of user features, such as mixed precision quantization. Expedera’s unique packet-based processing breaks much larger networks into smaller, contiguous fragments, overcoming the hurdle of large memory movements and delivering much higher processor utilization. As described above, packets are routed through the Feed Forward, Attention, and Vector processing blocks to accommodate the varying operations, data types, and precisions each network requires. Internal memory handles intermediate results, while the memory streaming interface connects to off-chip storage.
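As a minimal sketch of one such feature, the example below shows what per-layer mixed precision quantization looks like in principle: each layer is assigned its own bit-width and quantized independently. The bit assignments and the simple symmetric scheme are assumptions for illustration, not Expedera's toolchain.

```python
# Minimal sketch of per-layer mixed-precision quantization (symmetric,
# uniform). The per-layer bit choices below are illustrative only.
import numpy as np

def quantize(tensor: np.ndarray, bits: int):
    """Symmetric uniform quantization to `bits`-bit signed integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(tensor).max() / qmax
    q = np.clip(np.round(tensor / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale  # dequantize with q * scale

rng = np.random.default_rng(0)
layers = {"attention.qkv": 8, "mlp.up": 4, "head": 16}  # assumed bit-widths
for name, bits in layers.items():
    w = rng.standard_normal(1024).astype(np.float32)
    q, scale = quantize(w, bits)
    err = np.abs(w - q * scale).mean()
    print(f"{name}: {bits}-bit, mean abs error {err:.4f}")
```

The payoff of mixing precisions is that error-sensitive layers keep more bits while tolerant layers shrink, cutting weight traffic without retraining.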
| Feature | Specification |
|---|---|
| Compute Capacity | Up to 32K FP16 MACs |
| Multi-tasking | Run simultaneous jobs |
| Example Networks Supported | Llama2, Llama3, ChatGLM, DeepSeek, Mistral, Qwen, MiniCPM, Yolo, MobileNet, and many others, including proprietary/black box networks |
| Power Efficiency | 289 tokens per second, DeepSeek v3 prompt processing, on a 32 TFLOPS engine with 6 MB internal memory, 128 GB/s external peak bandwidth, batch size of 1, and 5.67 W total power consumption. Specified in TSMC 7nm at a 1 GHz system clock, no sparsity/compression/pruning applied (though supported) |
| Layer Support | Standard NN functions, including Transformers, Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, and others. Support for custom operators |
| Data Types | FP16/FP32/INT4/INT8/INT10/INT12/INT16 activations/weights |
| Quantization | Software toolchain supports Expedera, customer-supplied, or third-party quantization. Mixed precision supported |
| Latency | Deterministic performance guarantees, no back pressure |
| Frameworks | Hugging Face, Llama.cpp, PyTorch, TVM, ONNX, TensorFlow, and others supported |
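As a sanity check, the arithmetic below relates the figures quoted above: 32K FP16 MACs at the 1 GHz system clock works out to roughly 64 TFLOPS (counting two ops per MAC), and 289 tokens per second at 5.67 W implies about 20 mJ per token. Reading the quoted 32 TFLOPS benchmark engine as a 16K-MAC configuration is our inference, not a published spec.

```python
# Simple arithmetic on the datasheet's quoted figures (no new measurements).
def tflops(macs: float, clock_ghz: float) -> float:
    # One MAC is conventionally counted as two ops (multiply + accumulate).
    return macs * 2 * clock_ghz * 1e9 / 1e12

print(f"32K MACs @ 1 GHz ~ {tflops(32_000, 1.0):.0f} TFLOPS (max single-core config)")
# Assumption: the quoted 32 TFLOPS benchmark engine is a 16K-MAC configuration.
print(f"16K MACs @ 1 GHz ~ {tflops(16_000, 1.0):.0f} TFLOPS (benchmark engine)")

# Energy per token for the quoted DeepSeek v3 prompt-processing point.
watts, tokens_per_s = 5.67, 289
print(f"Energy per token ~ {watts / tokens_per_s * 1e3:.1f} mJ")
```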