Eagle-N NN Accelerator
This page provides an overview of the Eagle-N NPU architecture and its key hardware capabilities, as illustrated in the accompanying diagram. It highlights how the system integrates high-performance compute, memory, and interconnect components to deliver efficient AI processing across a wide range of workloads. From detailed hardware specifications to the TensixNeo compute cluster and Network-on-Chip (NoC) design, this section explains how Eagle-N achieves high throughput, scalability, and energy efficiency while supporting modern AI models and data formats.
Eagle-N features a neural network accelerator delivering 250 TOPS performance. It supports high-speed connectivity via PCIe Gen5 (32 GB/s) and UCIe (64 GB/s) for chiplet systems. The SoC is powered by a quad-core ARM Cortex-A53 CPU and LPDDR5/5X memory with 204 GB/s bandwidth. It includes safety features, debugging capabilities, and integrated I/O subsystems. Eagle-N supports INT8, BF16, and MX formats for CNNs, Transformers, and LLMs. Advanced quantization tools reduce model size with minimal (less than 1%) accuracy loss. Efficient use of limited on-chip memory enables streaming-based LLM execution from DRAM. Optimized hardware/software support ensures fast, scalable, and energy-efficient AI inference.
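The quantization step mentioned above can be illustrated with a minimal sketch of symmetric per-tensor INT8 quantization, the kind of transformation a quantization toolchain performs on weights; the helper names below are illustrative and do not reflect Eagle-N's actual tooling API.

```python
# Illustrative symmetric per-tensor INT8 quantization.
# Not Eagle-N's actual toolchain API; a minimal sketch of the idea only.

def quantize_int8(values):
    """Map float values to INT8 using a single symmetric scale factor."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.05, 0.63, -0.91]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 6))
```

In practice, keeping the round-trip error within half a quantization step per tensor is what lets well-conditioned models stay within the small accuracy-loss budget the overview cites.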
Eagle-N Technical Specification
Eagle-N A0 is a preliminary version of Eagle-N and not a production-qualified product. Its performance and functionality differ significantly from the production version. The main features upgraded in the production product (B0) compared to A0 are as follows.
Table: Comparison of Eagle-N A0 and Eagle-N B0
| Feature | Eagle-N A0 | Eagle-N B0 | Remark |
|---|---|---|---|
| NPU Performance | 64 TOPS | 250 TOPS | Effective fivefold performance improvement |
| DRAM Bandwidth | 120 GB/s | 204 GB/s | |
| DRAM Capacity | 64 GB | 96 GB | |
| PCIe | 2x Gen5 4-lane 32 GT/s | 1x Gen5 8-lane 32 GT/s (4+4 bifurcation) | |
| UCIe | None | Standard Package 2-module 16 GT/s (32 pins total) | 64 GB/s per direction |
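The headline link bandwidths in the table can be sanity-checked with simple arithmetic: per-direction bandwidth is transfer rate times lane (or pin) count divided by 8 bits per byte. Encoding and protocol overhead (e.g. PCIe Gen5's 128b/130b line code) are ignored here, so these are raw upper bounds matching the headline figures.

```python
# Raw per-direction serial-link bandwidth: rate (GT/s) x lanes / 8 bits per byte.
# Line-code and protocol overhead are deliberately ignored, so these are
# upper bounds that line up with the headline figures in the table above.

def link_bandwidth_gbs(gt_per_s, lanes):
    """Per-direction bandwidth in GB/s for a serial link."""
    return gt_per_s * lanes / 8.0

pcie_gen5_x8 = link_bandwidth_gbs(32, 8)   # 1x Gen5 8-lane at 32 GT/s
ucie = link_bandwidth_gbs(16, 32)          # UCIe: 16 GT/s across 32 pins total

print(pcie_gen5_x8)  # 32.0 GB/s, the PCIe figure from the overview
print(ucie)          # 64.0 GB/s per direction, matching the UCIe remark
```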
Application Scenarios
Eagle-N is designed as a high-performance AI accelerator for automotive and edge systems, enabling efficient deployment of advanced neural network workloads alongside a host application processor (AP).
System Architecture Overview
At a system level, Eagle-N operates as a dedicated AI compute device that works in conjunction with a host processor:
- Functions as an independent auxiliary SoC with its own PMIC, NOR flash, and DRAM
- Connects to the host AP via PCIe (Gen5), acting as an Endpoint (EP)
- The host AP serves as the Root Complex (RC), managing system control and data flow
- Offloads compute-intensive AI workloads (e.g., vision, language processing) from the host
This architecture allows the host system to focus on orchestration and I/O, while Eagle-N handles high-throughput neural network inference.
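The host/device split described above can be pictured as a simple offload loop. The sketch below simulates it entirely on the CPU: `EagleNDevice` and `HostAP` are hypothetical stand-ins for a real driver handle and host runtime, not actual Eagle-N APIs.

```python
# Simulated host-AP / accelerator split. The host (root complex) orchestrates
# I/O and delegates AI work; the device object stands in for an Eagle-N
# PCIe endpoint. EagleNDevice and HostAP are hypothetical names, not real APIs.

class EagleNDevice:
    """Pretend PCIe endpoint that runs offloaded inference jobs."""
    def run_inference(self, model, inputs):
        # A real device would DMA inputs over PCIe and run them on the NPU;
        # here we just return a labeled, transformed result.
        return {"model": model, "outputs": [x * 2 for x in inputs]}

class HostAP:
    """Pretend root complex: handles orchestration, delegates compute."""
    def __init__(self, device):
        self.device = device

    def handle_request(self, model, inputs):
        # The host stays free for control flow and I/O while the
        # compute-heavy step runs on the attached device.
        return self.device.run_inference(model, inputs)

host = HostAP(EagleNDevice())
result = host.handle_request("vision-cnn", [1, 2, 3])
print(result)  # {'model': 'vision-cnn', 'outputs': [2, 4, 6]}
```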
Achieving Scalable NPU Performance
- Supports a wide performance range thanks to its multi-device capability
- Capable of running modern models including VLMs and LLMs (e.g., Qwen, Llama)
Several Eagle-N devices can be used in parallel to increase aggregate TOPS and thus run very large models.
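One common way such multi-device scaling works is to shard a large operation across devices and combine the partial results. The sketch below splits a matrix-vector product row-wise across N simulated devices; the sharding logic is illustrative and the "devices" are plain Python, not a real Eagle-N runtime.

```python
# Illustrative row-wise sharding of a matrix-vector product across N devices.
# The per-shard compute is simulated in plain Python; in a real deployment
# each shard would run on a separate Eagle-N accelerator.

def matvec(rows, x):
    """Reference matrix-vector product."""
    return [sum(r * v for r, v in zip(row, x)) for row in rows]

def parallel_matvec(matrix, x, num_devices):
    """Shard matrix rows across devices, compute locally, concatenate."""
    shard = (len(matrix) + num_devices - 1) // num_devices  # rows per device
    result = []
    for d in range(num_devices):
        rows = matrix[d * shard:(d + 1) * shard]  # this device's shard
        result.extend(matvec(rows, x))            # would run on device d
    return result

matrix = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
assert parallel_matvec(matrix, x, 2) == matvec(matrix, x)
print(parallel_matvec(matrix, x, 2))  # [3, 7, 11, 15]
```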
Deployment Scenarios
Eagle-N supports flexible deployment across multiple use cases:
ADAS / IVI AI Extension
Enhances existing systems by offloading AI workloads from the main processor.
AI Box
- Acts as an external AI compute module
- Enables advanced features such as:
- Voice recognition
- Vision processing
- Intelligent services
- Agentic AI
3x4 TensixNeo Cluster Architecture
- **TensixNeo Cluster** — a compute block consisting of multiple TensixNeo cores:
  - Acts as a building unit of the NPU
  - Handles a portion of the AI workload
  - Connects to other clusters via the NoC and routers
- **Main Interconnect** — the top-level communication fabric that connects multiple chips, clusters, or systems together:
  - Provides high-bandwidth, low-latency data exchange
  - Typically implemented via Ethernet or high-speed links in Tenstorrent systems
  - Enables scaling across multiple devices (multi-chip / multi-node)
- **NoC (Network-on-Chip)** — a communication network inside the chip that connects internal components (cores, memory, engines):
  - Moves tensors, instructions, and activations
  - Designed for parallel, high-throughput data movement
- **NoC2AXI** — a bridge between the on-chip network (NoC) and AXI-based interfaces (AXI is a standard high-performance bus protocol):
  - Converts NoC packets into AXI transactions
  - Used to interface with memory controllers and external I/O
- **Fast Dispatch Engine (FDE)** — a hardware unit responsible for:
  - Scheduling and dispatching workloads to Tensix cores
  - Managing command streams and execution flow
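The Fast Dispatch Engine's role above can be pictured as a small scheduler that drains an ordered command stream and hands each command to a compute cluster. The sketch below simulates that behavior with a simple round-robin policy; the names and the policy are illustrative, not Eagle-N hardware interfaces.

```python
from collections import deque

# Toy model of a fast-dispatch scheduler: an ordered command stream is drained
# and each command is assigned to a compute cluster round-robin. Illustrative
# only; this is not an Eagle-N hardware interface or its actual policy.

def dispatch(commands, num_clusters):
    """Return a mapping of cluster_id -> list of commands assigned to it."""
    stream = deque(commands)
    assignments = {c: [] for c in range(num_clusters)}
    turn = 0
    while stream:
        cmd = stream.popleft()                   # next command in program order
        assignments[turn % num_clusters].append(cmd)
        turn += 1
    return assignments

plan = dispatch(["matmul0", "matmul1", "softmax", "matmul2"], num_clusters=3)
print(plan)  # {0: ['matmul0', 'matmul2'], 1: ['matmul1'], 2: ['softmax']}
```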