Eagle-N NN Accelerator
This page provides an overview of the Eagle-N NPU architecture and its key hardware capabilities, as illustrated in the accompanying diagram. It highlights how the system integrates high-performance compute, memory, and interconnect components to deliver efficient AI processing across a wide range of workloads. From detailed hardware specifications to the TensixNeo compute cluster and Network-on-Chip (NoC) design, this section explains how Eagle-N achieves high throughput, scalability, and energy efficiency while supporting modern AI models and data formats.
Eagle-N features a neural network accelerator delivering 250 TOPS performance. It supports high-speed connectivity via PCIe Gen5 (32 GB/s) and UCIe (64 GB/s) for chiplet systems. The SoC is powered by a quad-core ARM Cortex-A53 CPU and LPDDR5/5X memory with 204 GB/s bandwidth. It includes safety features, debugging capabilities, and integrated I/O subsystems. Eagle-N supports INT8, BF16, and MX formats for CNNs, Transformers, and LLMs. Advanced quantization tools reduce model size with minimal (less than 1%) accuracy loss. Efficient use of limited on-chip memory enables streaming-based LLM execution from DRAM. Optimized hardware/software support ensures fast, scalable, and energy-efficient AI inference.
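The quantization step mentioned above can be illustrated with a minimal sketch of symmetric per-tensor INT8 quantization, the kind of transformation a quantization toolchain performs on weights; the helper names below are illustrative and do not reflect Eagle-N's actual tooling API.

```python
# Illustrative symmetric per-tensor INT8 quantization.
# Not Eagle-N's actual toolchain API; a minimal sketch of the idea only.

def quantize_int8(values):
    """Map float values to INT8 using a single symmetric scale factor."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.05, 0.63, -0.91]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 6))
```

In practice, keeping the round-trip error within half a quantization step per tensor is what lets well-conditioned models stay within the small accuracy-loss budget the overview cites.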
Eagle-N Technical Specification
Eagle-N A0 is a preliminary version of Eagle-N and not a production-qualified product. Its performance and functionality differ significantly from the production version. The main features upgraded in the production product (B0) compared to A0 are as follows.
Table: Comparison of Eagle-N A0 and Eagle-N B0
| Feature | Eagle-N A0 | Eagle-N B0 | Remark |
|---|---|---|---|
| NPU Performance | 64 TOPS | 250 TOPS | Effective fivefold performance improvement |
| DRAM Bandwidth | 120 GB/s | 204 GB/s | |
| DRAM Capacity | 64 GB | 96 GB | |
| PCIe | 2x Gen5 4-lane 32 GT/s | 1x Gen5 8-lane 32 GT/s (4+4 bifurcation) | |
| UCIe | None | Standard Package 2-module 16 GT/s (32 pins total) | 64 GB/s per direction |
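The headline link bandwidths in the table can be sanity-checked with simple arithmetic: per-direction bandwidth is transfer rate times lane (or pin) count divided by 8 bits per byte. Encoding and protocol overhead (e.g. PCIe Gen5's 128b/130b line code) are ignored here, so these are raw upper bounds matching the headline figures.

```python
# Raw per-direction serial-link bandwidth: rate (GT/s) x lanes / 8 bits per byte.
# Line-code and protocol overhead are deliberately ignored, so these are
# upper bounds that line up with the headline figures in the table above.

def link_bandwidth_gbs(gt_per_s, lanes):
    """Per-direction bandwidth in GB/s for a serial link."""
    return gt_per_s * lanes / 8.0

pcie_gen5_x8 = link_bandwidth_gbs(32, 8)   # 1x Gen5 8-lane at 32 GT/s
ucie = link_bandwidth_gbs(16, 32)          # UCIe: 16 GT/s across 32 pins total

print(pcie_gen5_x8)  # 32.0 GB/s, the PCIe figure from the overview
print(ucie)          # 64.0 GB/s per direction, matching the UCIe remark
```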
Application Scenarios
Eagle-N is designed as a high-performance AI accelerator for automotive and edge systems, enabling efficient deployment of advanced neural network workloads alongside a host application processor (AP).
System Architecture Overview
At a system level, Eagle-N operates as a dedicated AI compute device that works in conjunction with a host processor:
- Functions as an independent auxiliary SoC with its own PMIC, NOR flash, and DRAM
- Connects to the host AP via PCIe (Gen5), acting as an Endpoint (EP)
- The host AP serves as the Root Complex (RC), managing system control and data flow
- Offloads compute-intensive AI workloads (e.g., vision, language processing) from the host
This architecture allows the host system to focus on orchestration and I/O, while Eagle-N handles high-throughput neural network inference.
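The host/device split described above can be pictured as a simple offload loop. The sketch below simulates it entirely on the CPU: `EagleNDevice` and `HostAP` are hypothetical stand-ins for a real driver handle and host runtime, not actual Eagle-N APIs.

```python
# Simulated host-AP / accelerator split. The host (root complex) orchestrates
# I/O and delegates AI work; the device object stands in for an Eagle-N
# PCIe endpoint. EagleNDevice and HostAP are hypothetical names, not real APIs.

class EagleNDevice:
    """Pretend PCIe endpoint that runs offloaded inference jobs."""
    def run_inference(self, model, inputs):
        # A real device would DMA inputs over PCIe and run them on the NPU;
        # here we just return a labeled, transformed result.
        return {"model": model, "outputs": [x * 2 for x in inputs]}

class HostAP:
    """Pretend root complex: handles orchestration, delegates compute."""
    def __init__(self, device):
        self.device = device

    def handle_request(self, model, inputs):
        # The host stays free for control flow and I/O while the
        # compute-heavy step runs on the attached device.
        return self.device.run_inference(model, inputs)

host = HostAP(EagleNDevice())
result = host.handle_request("vision-cnn", [1, 2, 3])
print(result)  # {'model': 'vision-cnn', 'outputs': [2, 4, 6]}
```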
Achieving Scalable NPU Performance
- Supports a wide performance range thanks to its multi-device capability
- Capable of running modern models including VLMs and LLMs (e.g., Qwen, Llama)
Several Eagle-N devices can be used in parallel to increase aggregate TOPS and thus run very large models.
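One common way such multi-device scaling works is to shard a large operation across devices and combine the partial results. The sketch below splits a matrix-vector product row-wise across N simulated devices; the sharding logic is illustrative and the "devices" are plain Python, not a real Eagle-N runtime.

```python
# Illustrative row-wise sharding of a matrix-vector product across N devices.
# The per-shard compute is simulated in plain Python; in a real deployment
# each shard would run on a separate Eagle-N accelerator.

def matvec(rows, x):
    """Reference matrix-vector product."""
    return [sum(r * v for r, v in zip(row, x)) for row in rows]

def parallel_matvec(matrix, x, num_devices):
    """Shard matrix rows across devices, compute locally, concatenate."""
    shard = (len(matrix) + num_devices - 1) // num_devices  # rows per device
    result = []
    for d in range(num_devices):
        rows = matrix[d * shard:(d + 1) * shard]  # this device's shard
        result.extend(matvec(rows, x))            # would run on device d
    return result

matrix = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
assert parallel_matvec(matrix, x, 2) == matvec(matrix, x)
print(parallel_matvec(matrix, x, 2))  # [3, 7, 11, 15]
```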
Deployment Scenarios
Eagle-N supports flexible deployment across multiple use cases:
ADAS / IVI AI Extension
Enhances existing systems by offloading AI workloads from the main processor.
AI Box
- Acts as an external AI compute module
- Enables advanced features such as:
- Voice recognition
- Vision processing
- Intelligent services
- Agentic AI
3x4 TensixNeo Cluster Architecture
- **TensixNeo Cluster** — a compute block consisting of multiple TensixNeo cores:
  - Acts as a building unit of the NPU
  - Handles a portion of the AI workload
  - Connects to other clusters via the NoC and routers
- **Main Interconnect** — the top-level communication fabric that connects multiple chips, clusters, or systems together:
  - Provides high-bandwidth, low-latency data exchange
  - Typically implemented via Ethernet or high-speed links in Tenstorrent systems
  - Enables scaling across multiple devices (multi-chip / multi-node)
- **NoC (Network-on-Chip)** — a communication network inside the chip that connects internal components (cores, memory, engines):
  - Moves tensors, instructions, and activations
  - Designed for parallel, high-throughput data movement
- **NoC2AXI** — a bridge between the on-chip network (NoC) and AXI-based interfaces (AXI is a standard high-performance bus protocol):
  - Converts NoC packets into AXI transactions
  - Used to interface with memory controllers and external I/O
- **Fast Dispatch Engine (FDE)** — a hardware unit responsible for:
  - Scheduling and dispatching workloads to Tensix cores
  - Managing command streams and execution flow
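The Fast Dispatch Engine's role above can be pictured as a small scheduler that drains an ordered command stream and hands each command to a compute cluster. The sketch below simulates that behavior with a simple round-robin policy; the names and the policy are illustrative, not Eagle-N hardware interfaces.

```python
from collections import deque

# Toy model of a fast-dispatch scheduler: an ordered command stream is drained
# and each command is assigned to a compute cluster round-robin. Illustrative
# only; this is not an Eagle-N hardware interface or its actual policy.

def dispatch(commands, num_clusters):
    """Return a mapping of cluster_id -> list of commands assigned to it."""
    stream = deque(commands)
    assignments = {c: [] for c in range(num_clusters)}
    turn = 0
    while stream:
        cmd = stream.popleft()                   # next command in program order
        assignments[turn % num_clusters].append(cmd)
        turn += 1
    return assignments

plan = dispatch(["matmul0", "matmul1", "softmax", "matmul2"], num_clusters=3)
print(plan)  # {0: ['matmul0', 'matmul2'], 1: ['matmul1'], 2: ['softmax']}
```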