Runtime
Discover the Tenstorrent-based runtime operations and APIs compatible with the BOS Eagle-N neural network accelerator.
TT-NN
TT-NN is an open-source C++ and Python library of neural network operations, built on top of the TT-Metalium programming model. It provides a PyTorch-like interface for running machine learning workloads on Tenstorrent AI accelerators, and serves as the primary high-level API for developing and optimizing ML models on Tenstorrent hardware.
TT-NN APIs
Reference guide for all TT-NN neural network operations and APIs. Comprehensive documentation covering available operations, their parameters, execution semantics, and performance characteristics for building efficient AI models on Eagle-N hardware.
https://docs.tenstorrent.com/tt-metal/latest/ttnn/ttnn/api.html
Purpose
Enable ML frameworks targeting Tenstorrent hardware
Developers have access to a vast library of existing models implemented in PyTorch, JAX, TensorFlow, and other frameworks. TT-NN offers a collection of operations and reusable building blocks to support the development of ML compilers and framework backends targeting Tenstorrent hardware.
→ See framework integrations: Forge Compiler | PyTorch 2.0 TT-NN Backend
Manual bringup and optimization of ML models
When performance is critical, developers need fine-grained control. Existing ML frameworks and compilers don't fully expose the capabilities of our hardware. TT-NN lets developers work with familiar high-level operations while tapping into hardware-specific optimizations when necessary. This includes options to specify the data format for mixed precision, tensor layout and distribution, op fusion, and other operation-specific settings (a brief example follows the links below).
→ See production-ready model examples: Model Zoo
→ See some model bringup guides: General | LLMs | CNNs
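Where performance matters, these options are passed directly on the op calls. A minimal sketch, assuming the public ttnn.from_torch/ttnn.matmul APIs and the ttnn.bfloat8_b and ttnn.L1_MEMORY_CONFIG options (check your release for the exact names):
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Move host tensors to device with an explicit reduced-precision data
# format, tile layout, and on-chip L1 placement -- the hardware-specific
# options TT-NN exposes on top of its PyTorch-style API.
a = ttnn.from_torch(torch.randn(32, 32), dtype=ttnn.bfloat8_b,
                    layout=ttnn.TILE_LAYOUT, device=device,
                    memory_config=ttnn.L1_MEMORY_CONFIG)
b = ttnn.from_torch(torch.randn(32, 32), dtype=ttnn.bfloat8_b,
                    layout=ttnn.TILE_LAYOUT, device=device,
                    memory_config=ttnn.L1_MEMORY_CONFIG)

# Keep the matmul result in L1 as well
c = ttnn.matmul(a, b, memory_config=ttnn.L1_MEMORY_CONFIG)

print(ttnn.to_torch(c))
ttnn.close_device(device)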
Key Features
- High-Level Neural Network Operations: Optimized implementations of key neural net components: matrix multiplication, convolution, attention mechanisms, data movement, collective communications (CCLs), element-wise ops, reductions, losses, pooling, and more. APIs are PyTorch-style but expose hardware-specific options.
- Tensor Library: A flexible tensor abstraction for managing multidimensional arrays across host and device. Developers can precisely control data layout across a cluster of Tenstorrent accelerators via the Tensor APIs.
- Native Multi-Device Support: TT-NN virtualizes multiple Tenstorrent devices into a single logical unit, enabling seamless scaling across device clusters (see the sketch below).
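As a rough illustration, the following sketch opens two devices as a single 1x2 mesh and replicates a tensor across it. It assumes the mesh APIs available in recent TT-NN releases (ttnn.open_mesh_device, ttnn.MeshShape, ttnn.ReplicateTensorToMesh); names may differ in older versions.
import torch
import ttnn

# Open two devices as one logical 1x2 mesh (adjust to your system).
mesh_device = ttnn.open_mesh_device(ttnn.MeshShape(1, 2))

# Replicate a host tensor onto every device in the mesh; subsequent
# operations then run across the whole mesh as one logical device.
x = ttnn.from_torch(
    torch.randn(32, 32),
    dtype=ttnn.bfloat16,
    layout=ttnn.TILE_LAYOUT,
    device=mesh_device,
    mesh_mapper=ttnn.ReplicateTensorToMesh(mesh_device),
)

y = ttnn.relu(x)  # executes on all devices in the mesh

ttnn.close_mesh_device(mesh_device)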
Getting Started
Assuming you have completed the hardware and driver installation, you can install TT-NN from PyPI:
pip install ttnn
To check that the installation was successful, run a simple script:
import ttnn

# Open the first available device
device = ttnn.open_device(device_id=0)

# Create two device tensors; the single-element `b` broadcasts over `a`
a = ttnn.full([5, 5, 5], fill_value=1.0, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.full([1], fill_value=2.0, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)

# Element-wise multiply runs on the device
c = a * b
print(c)

ttnn.close_device(device)
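If you have PyTorch installed, you can also round-trip data between host and device. A minimal sketch using the ttnn.from_torch and ttnn.to_torch helpers:
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Host torch.Tensor -> device tensor in tile layout
x = ttnn.from_torch(torch.randn(32, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)
y = ttnn.relu(x)

# Device tensor -> host torch.Tensor
print(ttnn.to_torch(y))

ttnn.close_device(device)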
A collection of Jupyter Notebook tutorials is available to help you get up to speed with TT-NN. These notebooks can be found under ttnn/tutorials.
From within the ttnn/tutorials directory, launch the notebooks with:
jupyter lab --no-browser --port=8888
Tools and Instruments
TT-Triage
The TT-Triage tool diagnoses failures by performing comprehensive health checks on ARC processors, NOC connectivity, L1 memory, and RISC-V cores. It identifies running kernels and provides callstack data for troubleshooting.
TT-NN Visualizer
A comprehensive tool for visualizing and analyzing model execution, offering interactive graphs, memory plots, tensor details, buffer overviews, operation flow graphs, and multi-instance support with file or SSH-based report loading.
TT-Exalens
The TT-Exalens repository provides a low-level debugging tool for Tenstorrent hardware, allowing developers to access and communicate with Wormhole and Blackhole devices.
TT-SMI
The TT-SMI repository describes the Tenstorrent System Management Interface, a command-line utility for interacting with Tenstorrent devices on the host. TT-SMI provides an easy-to-use interface that displays device, telemetry, and firmware information.
Model Explorer
The Model Explorer is an intuitive, hierarchical visualization tool for model graphs. It organizes model operations into nested layers and provides features for model exploration and debugging.
Tracy Profiler
The Tracy Profiler is a real-time, nanosecond-resolution, remote-telemetry, hybrid frame and sampling profiler. Tracy supports profiling CPU, GPU, memory allocations, locks, context switches, and more.
Kernel Print Debug
DPRINT can print variables, addresses, and circular buffer data from kernels to the host terminal or log file. This feature is useful for debugging issues with kernels.
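DPRINT is typically enabled before launching your program via an environment variable that selects which cores to print from; for example (the variable name TT_METAL_DPRINT_CORES matches current TT-Metalium releases, so check your version's documentation):
export TT_METAL_DPRINT_CORES='(0,0)'   # print from the core at coordinates (0,0)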
Watcher
Watcher monitors firmware and kernels for common programming errors, and overall device status. If an error or hang occurs, Watcher displays log data of that occurrence.
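Watcher is likewise enabled with an environment variable giving its polling interval in seconds (again, TT_METAL_WATCHER reflects current TT-Metalium releases):
export TT_METAL_WATCHER=10   # check and dump device status every 10 seconds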
Inspector
Inspector provides insights into host runtime. It logs necessary data for investigation and allows queries to host runtime data.
TT-Metalium
TT-Metalium is Tenstorrent's core low-level programming model, with C++ and Python APIs, on which TT-NN and the rest of the software stack are built. This section covers installation, featured model implementations, performance benchmarks, and links to the official repository and API documentation.
Runtime Host APIs
Please refer to the official TT-Metalium Host APIs reference.
Key topics covered there:
- API entry points for host-side initialization and shutdown
- Device discovery, selection, and configuration
- Memory allocation, buffer management, and data movement
- Command submission, synchronization, and runtime control
For full details and examples, use the official reference link above.
Runtime Kernel APIs
Detailed reference documentation for TT-Metalium kernel APIs that enable low-level programming on Tensix cores. Includes API specifications, kernel development patterns, and execution models for creating custom operations on Eagle-N hardware.
Getting started
Get started with simple kernels.
TT-Metalium Tech Reports
- Matrix Engine (updated Sept 6th, 2024)
- Data Formats (updated Sept 7th, 2024)
- Reconfiguring Data Formats (updated Oct 17th, 2024)
- Handling special floating-point numbers (updated Oct 5th, 2024)
- Allocator (updated Dec 19th, 2024)
- Tensor Layouts (updated Sept 6th, 2024)
- Saturating DRAM Bandwidth (updated Sept 6th, 2024)
- Flash Attention on Wormhole (updated Sept 6th, 2024)
- CNNs on TT Architectures (updated Sept 6th, 2024)
- Ethernet and Multichip Basics (updated Sept 20th, 2024)
- Blackhole Bring-Up Programming Guide (updated Dec 18th, 2024)
- Sub-Devices (updated Jan 7th, 2025)
Scaleout Tech Reports
- Programming Mesh of Devices (Scale-Up) (updated Jan 6th, 2026)
- Programming Multiple Meshes (Scale-Out) (updated Jan 19th, 2026)
- TT-Fabric Architecture (updated Dec 1st, 2025)
- TT-Distributed Architecture (updated Oct 20th, 2025)
TT-Metalium Programming Examples
Hello World
Add Integers
Simple Tensor Manipulation
DRAM Data Movement
Eltwise
Matmul
- Matmul OP on a Single Core
- Matmul OP on Multiple Cores (Basic)
- Matmul Multi-Core Reuse (Optimized)
- Matmul Multi-Core Multi-Cast (Optimized)
Latest Releases
| Release | Release Date | FW Version | KMD Version | SMI Version |
|---|---|---|---|---|
| 0.66.0 | ETA Jan 30, 2026 | 19.2.0 | 2.5.0 | 3.0.38 |
| 0.65.0 | Dec 15, 2025 | 19.2.0 | 2.5.0 | 3.0.38 |
| 0.64.5 | Dec 1, 2025 | 18.12.0 | 2.4.1 | 3.0.32 |
| 0.64.4 | Nov 24, 2025 | 18.12.0 | 2.4.1 | 3.0.32 |
| 0.64.3 | Nov 14, 2025 | 18.12.0 | 2.4.1 | 3.0.32 |
| 0.64.0 | Oct 29, 2025 | 18.12.0 | 2.4.1 | 3.0.32 |
| 0.63.0 | Sep 22, 2025 | 18.8.0 | 2.3.0 | 3.0.28 |
| 0.62.2 | Aug 20, 2025 | 18.6.0 | 2.0.0 | 3.0.20 |
| 0.61.0 | Skipped | - | - | - |
| 0.60.1 | Jul 22, 2025 | 18.6.0 | 2.0.0 | 3.0.20 |
| 0.59.0 | Jun 18, 2025 | - | - | - |
| 0.58.0 | May 13, 2025 | - | - | - |
| 0.57.0 | Apr 15, 2025 | - | - | - |
| 0.56.0 | Mar 7, 2025 | - | - | - |
Visit the releases folder for details on releases, release notes, and estimated release dates.
TT-LLK
TT-LLK is Tenstorrent's low-level kernel library for AI chips such as Wormhole and Blackhole. It provides header-only compute primitives that serve as the foundation for higher-level ML software stacks. The repository also includes a test environment for validating LLK APIs and kernel behavior. This page summarizes installation, dependencies, supporting documentation, and contribution guidance.
Overview
This repository contains header-only low-level kernels (LLK) for Tenstorrent AI chips, including Wormhole and Blackhole.
These kernels serve as foundational compute primitives, acting as building blocks for higher-level software stacks that implement machine learning (ML) operations.
Additionally, the repository includes a test environment designed to validate LLK APIs.
Install
- Clone the repository: clone this repository to your local computer (see the example below).
- Set up the test environment: follow the instructions in the testing README.
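For example, assuming the repository's public GitHub location:
git clone https://github.com/tenstorrent/tt-llk.git
cd tt-llk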
Software dependencies
The test environment requires the SFPI compiler for building, which is automatically pulled in from the sfpi repository.
Documentation
The following documentation is available to help you understand and use low-level kernels:
- Intro: A concise introduction to LLKs, designed for both technical and non-technical audiences. This document outlines the scope of the LLK software stack and its relationship to other Tenstorrent software components.
- Top-level Overview: Provides a high-level look at the Tensix Core and Tensix Engine architecture, including data organization for efficient LLK usage and operations supported by LLKs. This document is not tied to any specific chip generation (such as Wormhole) and is aimed at engineers and technical readers who want to understand the general architecture and capabilities.
- LLK Programming Model: Dives into architectural details to explain the usage of the LLK API. It is intended for op writers and advanced users, and connects LLK concepts with our runtime stack, tt-metal, providing practical guidance on how to leverage LLKs for efficient kernel development.