
Eagle-N NN Accelerator

This page provides an overview of the Eagle-N NPU architecture and its key hardware capabilities, as illustrated in the accompanying diagram. It highlights how the system integrates high-performance compute, memory, and interconnect components to deliver efficient AI processing across a wide range of workloads. From detailed hardware specifications to the TensixNeo compute cluster and Network-on-Chip (NoC) design, this section explains how Eagle-N achieves high throughput, scalability, and energy efficiency while supporting modern AI models and data formats.

Eagle-N features a neural network accelerator delivering 250 TOPS performance. It supports high-speed connectivity via PCIe Gen5 (32 GB/s) and UCIe (64 GB/s) for chiplet systems. The SoC is powered by a quad-core ARM Cortex-A53 CPU and LPDDR5/5X memory with 204 GB/s bandwidth. It includes safety features, debugging capabilities, and integrated I/O subsystems. Eagle-N supports INT8, BF16, and MX formats for CNNs, Transformers, and LLMs. Advanced quantization tools reduce model size with minimal (less than 1%) accuracy loss. Efficient use of limited on-chip memory enables streaming-based LLM execution from DRAM. Optimized hardware/software support ensures fast, scalable, and energy-efficient AI inference.
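To make the quantization claim concrete, here is a minimal sketch of symmetric per-tensor INT8 post-training quantization, the simplest form of the technique the toolchain applies. This is an illustrative example only; it does not show Eagle-N's actual quantization tools, and all names in it are generic.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w is approximated by scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in weights
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())
# Per-weight rounding error is bounded by half a quantization step (s / 2),
# which is why well-calibrated INT8 models lose so little accuracy.
```

Storing `q` instead of `w` shrinks the tensor 4x (INT8 vs FP32) at the cost of this bounded rounding error.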

Eagle-N Technical Specification

Eagle-N A0 is a preliminary version of Eagle-N, not a production-qualified product. There are significant differences in performance and functionality compared to the production version. The main features upgraded in the production product (B0) compared to A0 are as follows.

Table: Comparison of Eagle-N A0 and Eagle-N B0

| Feature | Eagle-N A0 | Eagle-N B0 | Remark |
| --- | --- | --- | --- |
| NPU Performance | 64 TOPS | 250 TOPS | Effective fivefold performance improvement |
| DRAM Bandwidth | 120 GB/s | 204 GB/s | |
| DRAM Capacity | 64 GB | 96 GB | |
| PCIe | 2x Gen5 4-lane 32 GT/s | 1x Gen5 8-lane 32 GT/s (4+4 bifurcation) | |
| UCIe | None | Standard Package 2-module 16 GT/s (32 pins total) | 64 GB/s per direction |
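The UCIe bandwidth figure can be checked with a back-of-the-envelope calculation. The assumption here (not stated in the table) is that the 32 pins are data lanes per direction, two standard-package modules of 16 lanes each, with every lane moving one bit per transfer at 16 GT/s.

```python
# Sanity check of the UCIe "64 GB/s per direction" figure.
# Assumption: 32 pins = 2 modules x 16 data lanes per direction.
lanes_per_direction = 2 * 16       # two standard-package modules
transfer_rate_gt_s = 16            # gigatransfers per second per lane
gbit_per_s = lanes_per_direction * transfer_rate_gt_s   # aggregate Gb/s
gbyte_per_s = gbit_per_s / 8                            # convert to GB/s
print(gbyte_per_s)  # → 64.0
```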

Application Scenarios

Eagle-N is designed as a high-performance AI accelerator for automotive and edge systems, enabling efficient deployment of advanced neural network workloads alongside a host application processor (AP).

System Architecture Overview

At a system level, Eagle-N operates as a dedicated AI compute device that works in conjunction with a host processor:

  • Functions as an independent auxiliary SoC with its own PMIC, NOR flash, and DRAM
  • Connects to the host AP via PCIe Gen5, acting as an Endpoint (EP)
  • Relies on the host AP as the Root Complex (RC), which manages system control and data flow
  • Offloads compute-intensive AI workloads (e.g., vision, language processing) from the host

This architecture allows the host system to focus on orchestration and I/O, while Eagle-N handles high-throughput neural network inference.
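The division of labor above can be sketched as a toy host-side flow. All names here (`EagleDevice`, `submit`, `run_inference`) are hypothetical and only model the Endpoint/Root-Complex relationship; they are not part of any real Eagle-N SDK.

```python
# Toy model of the host/device split: the host AP (RC) handles I/O and
# orchestration, while the accelerator (EP) consumes submitted work.
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class EagleDevice:
    """Stand-in for the PCIe Endpoint as seen from the host."""
    inbox: Queue = field(default_factory=Queue)

    def submit(self, tensor):
        # Host writes input buffers to the device over PCIe.
        self.inbox.put(tensor)

    def run_inference(self):
        # Device consumes the buffer and returns a result.
        x = self.inbox.get()
        return [v * 2 for v in x]  # stand-in for a real neural network

dev = EagleDevice()
dev.submit([1, 2, 3])       # host AP: orchestration and I/O ...
out = dev.run_inference()   # ... Eagle-N: the heavy compute
print(out)  # → [2, 4, 6]
```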


Achieving Scalable NPU Performance

  • Supports a wide performance range through multi-device scaling
  • Capable of running modern models including VLMs and LLMs (e.g., Qwen, Llama)

Multi-device deployment architecture


Multiple Eagle-N devices can be used in parallel to increase aggregate TOPS and thus run very large models.
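One common way to split a model across devices is tensor parallelism: shard a weight matrix column-wise, run one partial matmul per device, and concatenate the results. The sketch below shows the arithmetic of that scheme with NumPy; it is an illustration of the general technique, not Eagle-N's actual multi-device runtime.

```python
import numpy as np

def parallel_matmul(x, w, n_devices):
    """Column-shard w across n_devices, compute partials, reassemble."""
    shards = np.array_split(w, n_devices, axis=1)   # one shard per device
    partials = [x @ shard for shard in shards]      # one matmul per device
    return np.concatenate(partials, axis=1)         # gather the outputs

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8))    # activations
w = rng.standard_normal((8, 16))   # weight matrix
y = parallel_matmul(x, w, 4)       # result matches the single-device matmul
```

Because each device holds only `1/n_devices` of the weights, models too large for one device's DRAM can still be served.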

Deployment Scenarios

Eagle-N supports flexible deployment across multiple use cases:

ADAS / IVI AI Extension

Enhances existing systems by offloading AI workloads from the main processor.


AI Box

  • Acts as an external AI compute module
  • Enables advanced features such as:
    • Voice recognition
    • Vision processing
    • Intelligent services
    • Agentic AI

AI Box deployment architecture

3x4 TensixNeo Cluster Architecture

NPU Tensix Architecture Overview

  • TensixNeo Cluster
    A compute block consisting of multiple TensixNeo cores:
    • Acts as a building unit of the NPU
    • Handles a portion of the AI workload
    • Connected to other clusters via the NoC and routers

  • Main Interconnect
    The top-level communication fabric that connects multiple chips, clusters, or systems together:
    • Provides high-bandwidth, low-latency data exchange
    • Typically implemented via Ethernet or high-speed links in Tenstorrent systems
    • Enables scaling across multiple devices (multi-chip / multi-node)

  • NoC
    A communication network inside a chip that connects internal components (cores, memory, engines):
    • Moves tensors, instructions, and activations
    • Designed for parallel, high-throughput data movement

  • NoC2AXI
    A bridge between the on-chip network (NoC) and AXI-based interfaces (AXI is a standard high-performance bus protocol):
    • Converts NoC packets into AXI transactions
    • Used to interface with memory controllers and external I/O

  • Fast Dispatch Engine (FDE)
    A hardware unit responsible for:
    • Scheduling and dispatching workloads to Tensix cores
    • Managing command streams and execution flow

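The FDE's role of feeding command streams to cores can be modeled with a simple round-robin scheduler over per-core queues. The round-robin policy is an assumption for illustration only; the real dispatch engine's scheduling policy is not documented here.

```python
# Toy model of a dispatch engine distributing commands to compute cores.
from collections import deque
from itertools import cycle

def dispatch(commands, n_cores):
    """Round-robin a command stream across per-core work queues."""
    queues = [deque() for _ in range(n_cores)]
    for cmd, q in zip(commands, cycle(queues)):
        q.append(cmd)
    return queues

qs = dispatch(list(range(6)), 3)
print([list(q) for q in qs])  # → [[0, 3], [1, 4], [2, 5]]
```

Each core then drains its own queue independently, which is what lets the clusters make progress in parallel on their share of the workload.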