FAQ
This page answers frequently asked questions about the BOS AI Models SDK, covering supported frameworks, graph construction, memory management, debugging, and the Eagle-N NPU hardware architecture, as they relate to model development, compilation, and deployment.
Model Definition
Which AI frameworks do you support (e.g., PyTorch, ONNX, TensorFlow)?
The SDK supports PyTorch, ONNX, and TensorFlow.
Graph Construction
Do you support a hardware-agnostic Intermediate Representation (IR)?
Yes. The TT-forge compiler includes a hardware-agnostic IR layer.
Can the IR be exported or imported from other formats (e.g., MLIR, TVM)?
Yes. MLIR import and export are supported through TT-mlir.
Memory Management
How can users configure or control memory layout (e.g., SRAM pinning, DMA)?
The TT-metal software stack supports memory layout configuration, including tiled vs. row-major layout, sharded vs. interleaved distribution, and DRAM vs. SRAM placement.
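To illustrate what the tiled vs. row-major choice means for addressing, here is a minimal pure-Python sketch. It assumes the 32 × 32 tile size used by TT-metal; the helper function names are illustrative, not SDK API.

```python
# Sketch: row-major vs. tiled addressing for a 2-D buffer.
# Assumes 32x32 tiles; helper names are illustrative, not SDK API.

TILE = 32

def row_major_index(row, col, width):
    """Linear offset of element (row, col) in a row-major buffer."""
    return row * width + col

def tiled_index(row, col, width):
    """Linear offset of element (row, col) when the buffer is stored
    as 32x32 tiles, with tiles themselves in row-major order."""
    tiles_per_row = width // TILE
    tile_r, tile_c = row // TILE, col // TILE
    in_r, in_c = row % TILE, col % TILE
    tile_id = tile_r * tiles_per_row + tile_c
    return tile_id * TILE * TILE + in_r * TILE + in_c

# In a 64-wide buffer, element (0, 32) sits 32 elements in under
# row-major order, but begins a fresh tile under tiled order.
print(row_major_index(0, 32, 64))  # 32
print(tiled_index(0, 32, 64))      # 1024 (first element of tile 1)
```

Tiled layouts keep each 32 × 32 block contiguous, which is what lets matrix hardware stream whole tiles with unit-stride reads.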
Low-Level Programming Interface
How can developers debug or trace execution at the kernel or instruction level?
- Tracy is a profiler that shows processing-time details through a GUI.
- The L1 visualizer shows L1 (SRAM) memory occupancy during operation.
- Watcher assists with tracking data at specific memory addresses.
Memory Hierarchy (Transformer Execution)
Please describe the memory hierarchy (e.g., scratchpad, SRAM, L0–L2, DRAM): size, bandwidth, latency.
- Registers
- 36 MB L1 SRAM
- 96 GB device DRAM
- Host DRAM, accessed over PCIe
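As a rough illustration of how the capacities above bound tensor placement, here is a toy placement policy in pure Python. The tier-picking logic is illustrative only, not the SDK's actual allocator, and the BF16 element size is an assumed default.

```python
# Toy placement policy using the capacities listed above:
# 36 MB L1 SRAM, 96 GB device DRAM, then host DRAM over PCIe.
# Illustrative only -- not the SDK's real allocator.

L1_BYTES = 36 * 1024 * 1024
DEVICE_DRAM_BYTES = 96 * 1024 ** 3

def pick_tier(num_elements, bytes_per_element=2):  # BF16/FP16 assumed
    size = num_elements * bytes_per_element
    if size <= L1_BYTES:
        return "L1"
    if size <= DEVICE_DRAM_BYTES:
        return "device DRAM"
    return "host DRAM (over PCIe)"

# A 1024x1024 BF16 tensor (2 MiB) fits in L1; an 8192x8192 one
# (128 MiB) spills to device DRAM.
print(pick_tier(1024 * 1024))   # L1
print(pick_tier(8192 * 8192))   # device DRAM
```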
KV-Cache
How is the KV-cache implemented and managed? Where is it stored and how is it accessed per token?
The KV-cache is stored in device DRAM.
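The access pattern itself can be sketched in a few lines: each decode step appends one token's keys and values per layer, and attention re-reads all cached entries. This is a pure-Python illustration of that pattern (the real cache lives in device DRAM, as stated above); the class and method names are hypothetical.

```python
# Sketch of per-token KV-cache management: K/V for each past token
# are written once, then re-read on every subsequent decode step.
# Hypothetical names; the real cache resides in device DRAM.

class KVCache:
    def __init__(self, num_layers):
        # one (keys, values) list pair per transformer layer
        self.keys = [[] for _ in range(num_layers)]
        self.values = [[] for _ in range(num_layers)]

    def append(self, layer, k, v):
        """Store this token's K/V for one layer (one write per token)."""
        self.keys[layer].append(k)
        self.values[layer].append(v)

    def read(self, layer):
        """Fetch all cached K/V for attention (read on every token)."""
        return self.keys[layer], self.values[layer]

cache = KVCache(num_layers=2)
for step in range(3):                      # three decode steps
    cache.append(0, f"k{step}", f"v{step}")
ks, vs = cache.read(0)
print(len(ks))  # 3 cached tokens
```

The write-once/read-many shape is why KV-cache bandwidth, not capacity alone, typically dominates per-token decode cost.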
Compute Units
For each compute unit type, what are the core specifications — supported data types, native bit widths, operating frequency, and number of instances?
1) Matrix / FPU
- Throughput: 2k MACs per clock (INT8, low-fidelity mode)
- Supported data types: INT8, FP4, BF16, FP16, MXFP6, MXFP8, MXINT8, MXINT4, MXINT2
2) Vector / SFPU
- Throughput: 32 FP32 MACs per clock (per core)
- Supported data types: FP16, FP32
Number of instances: 48 (4 × 3 = 12 clusters, 4 cores per cluster)
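The figures above support a back-of-envelope throughput estimate. The sketch below assumes a 1 GHz clock (a placeholder, not a published spec) and treats the matrix-unit MAC rate as per core, which is also an assumption.

```python
# Back-of-envelope throughput from the figures above.
# ASSUMPTIONS: 1 GHz clock; matrix MAC rate counted per core.

CLUSTERS = 4 * 3                     # 4 x 3 = 12 clusters
CORES_PER_CLUSTER = 4
CORES = CLUSTERS * CORES_PER_CLUSTER # 48 instances
MATRIX_MACS_PER_CLK = 2048           # "2k MACs/clk" (INT8)
VECTOR_MACS_PER_CLK = 32             # FP32, per core

clock_hz = 1e9  # assumed placeholder

# 1 MAC = 2 ops (multiply + add)
matrix_tops = CORES * MATRIX_MACS_PER_CLK * 2 * clock_hz / 1e12
vector_gflops = CORES * VECTOR_MACS_PER_CLK * 2 * clock_hz / 1e9

print(CORES)           # 48
print(matrix_tops)     # INT8 TOPS under the stated assumptions
print(vector_gflops)   # FP32 GFLOPS under the stated assumptions
```

Scaling `clock_hz` to the actual device frequency gives the corresponding peak rates; the core count (48) is the only figure here taken directly from the spec above.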