FAQ
This page answers frequently asked questions about the BOS AI Models SDK, covering supported frameworks, graph construction, memory management, debugging, and the Eagle-N NPU hardware architecture, as they relate to model development, compilation, and deployment.
Model Definition
Which AI frameworks do you support (e.g., PyTorch, ONNX, TensorFlow)?
The SDK supports PyTorch, ONNX, and TensorFlow.
Graph Construction
Do you support a hardware-agnostic Intermediate Representation (IR)?
Yes. The TT-forge compiler includes a hardware-agnostic IR layer.
Can the IR be exported or imported from other formats (e.g., MLIR, TVM)?
Yes. MLIR import and export are supported through TT-mlir.
Memory Management
How can users configure or control memory layout (e.g., SRAM pinning, DMA)?
The TT-metal software stack supports memory layout configuration, including tiled vs. row-major layout, sharded vs. interleaved distribution, and DRAM vs. SRAM placement.
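To illustrate what the tiled vs. row-major choice means for addressing, here is a minimal pure-Python sketch. It assumes the 32 × 32 tile size used by TT-metal; the helper function names are illustrative, not SDK API.

```python
# Sketch: row-major vs. tiled addressing for a 2-D buffer.
# Assumes 32x32 tiles; helper names are illustrative, not SDK API.

TILE = 32

def row_major_index(row, col, width):
    """Linear offset of element (row, col) in a row-major buffer."""
    return row * width + col

def tiled_index(row, col, width):
    """Linear offset of element (row, col) when the buffer is stored
    as 32x32 tiles, with tiles themselves in row-major order."""
    tiles_per_row = width // TILE
    tile_r, tile_c = row // TILE, col // TILE
    in_r, in_c = row % TILE, col % TILE
    tile_id = tile_r * tiles_per_row + tile_c
    return tile_id * TILE * TILE + in_r * TILE + in_c

# In a 64-wide buffer, element (0, 32) sits 32 elements in under
# row-major order, but begins a fresh tile under tiled order.
print(row_major_index(0, 32, 64))  # 32
print(tiled_index(0, 32, 64))      # 1024 (first element of tile 1)
```

Tiled layouts keep each 32 × 32 block contiguous, which is what lets matrix hardware stream whole tiles with unit-stride reads.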
Low-Level Programming Interface
How can developers debug or trace execution at the kernel or instruction level?
- Tracy is a profiler that shows processing-time details through a GUI.
- The L1 visualizer shows L1 (SRAM) memory occupancy during operation.
- Watcher assists with tracking data at specific memory addresses.
Memory Hierarchy (Transformer Execution)
Please describe the memory hierarchy (e.g., scratchpad, SRAM, L0–L2, DRAM): size, bandwidth, latency.
- Registers
- 36 MB L1 SRAM
- 96 GB device DRAM
- Host DRAM, accessed over PCIe
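As a rough illustration of how the capacities above bound tensor placement, here is a toy placement policy in pure Python. The tier-picking logic is illustrative only, not the SDK's actual allocator, and the BF16 element size is an assumed default.

```python
# Toy placement policy using the capacities listed above:
# 36 MB L1 SRAM, 96 GB device DRAM, then host DRAM over PCIe.
# Illustrative only -- not the SDK's real allocator.

L1_BYTES = 36 * 1024 * 1024
DEVICE_DRAM_BYTES = 96 * 1024 ** 3

def pick_tier(num_elements, bytes_per_element=2):  # BF16/FP16 assumed
    size = num_elements * bytes_per_element
    if size <= L1_BYTES:
        return "L1"
    if size <= DEVICE_DRAM_BYTES:
        return "device DRAM"
    return "host DRAM (over PCIe)"

# A 1024x1024 BF16 tensor (2 MiB) fits in L1; an 8192x8192 one
# (128 MiB) spills to device DRAM.
print(pick_tier(1024 * 1024))   # L1
print(pick_tier(8192 * 8192))   # device DRAM
```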
KV-Cache
How is the KV-cache implemented and managed? Where is it stored and how is it accessed per token?
The KV-cache is stored in device DRAM.
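The access pattern itself can be sketched in a few lines: each decode step appends one token's keys and values per layer, and attention re-reads all cached entries. This is a pure-Python illustration of that pattern (the real cache lives in device DRAM, as stated above); the class and method names are hypothetical.

```python
# Sketch of per-token KV-cache management: K/V for each past token
# are written once, then re-read on every subsequent decode step.
# Hypothetical names; the real cache resides in device DRAM.

class KVCache:
    def __init__(self, num_layers):
        # one (keys, values) list pair per transformer layer
        self.keys = [[] for _ in range(num_layers)]
        self.values = [[] for _ in range(num_layers)]

    def append(self, layer, k, v):
        """Store this token's K/V for one layer (one write per token)."""
        self.keys[layer].append(k)
        self.values[layer].append(v)

    def read(self, layer):
        """Fetch all cached K/V for attention (read on every token)."""
        return self.keys[layer], self.values[layer]

cache = KVCache(num_layers=2)
for step in range(3):                      # three decode steps
    cache.append(0, f"k{step}", f"v{step}")
ks, vs = cache.read(0)
print(len(ks))  # 3 cached tokens
```

The write-once/read-many shape is why KV-cache bandwidth, not capacity alone, typically dominates per-token decode cost.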
Compute Units
For each compute unit type, what are the core specifications — supported data types, native bit widths, operating frequency, and number of instances?
1) Matrix / FPU
- Throughput: 2k MACs per clock (INT8, low-fidelity mode)
- Supported data types: INT8, FP4, BF16, FP16, MXFP6, MXFP8, MXINT8, MXINT4, MXINT2
2) Vector / SFPU
- Throughput: 32 FP32 MACs per clock (per core)
- Supported data types: FP16, FP32
Number of instances: 48 (4 × 3 = 12 clusters, 4 cores per cluster)
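The figures above support a back-of-envelope throughput estimate. The sketch below assumes a 1 GHz clock (a placeholder, not a published spec) and treats the matrix-unit MAC rate as per core, which is also an assumption.

```python
# Back-of-envelope throughput from the figures above.
# ASSUMPTIONS: 1 GHz clock; matrix MAC rate counted per core.

CLUSTERS = 4 * 3                     # 4 x 3 = 12 clusters
CORES_PER_CLUSTER = 4
CORES = CLUSTERS * CORES_PER_CLUSTER # 48 instances
MATRIX_MACS_PER_CLK = 2048           # "2k MACs/clk" (INT8)
VECTOR_MACS_PER_CLK = 32             # FP32, per core

clock_hz = 1e9  # assumed placeholder

# 1 MAC = 2 ops (multiply + add)
matrix_tops = CORES * MATRIX_MACS_PER_CLK * 2 * clock_hz / 1e12
vector_gflops = CORES * VECTOR_MACS_PER_CLK * 2 * clock_hz / 1e9

print(CORES)           # 48
print(matrix_tops)     # INT8 TOPS under the stated assumptions
print(vector_gflops)   # FP32 GFLOPS under the stated assumptions
```

Scaling `clock_hz` to the actual device frequency gives the corresponding peak rates; the core count (48) is the only figure here taken directly from the spec above.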