TT-XLA Installation Manual
Tool: TT-XLA (Tenstorrent XLA)
Target hardware: BOS A0 Eagle / Blackhole card
Scope of this revision: verified BOS A0 source-build flow based on the installation logs from
inputs/logs and the local snapshot of the Confluence page.
This version is intentionally more explicit than the original page. The logged failures showed that the previous instructions left several critical assumptions unstated:
- /workspace/xla-dev/... is a container path, not a host path
- BOS repos must be cloned over SSH with a working GitHub key
- the TT-MLIR bootstrap still expects python3.11 to exist
- the TT-XLA build uses the repo-provided source venv/activate, not source venv/bin/activate
- mounted repositories need git config --global --add safe.directory ... inside the container
- port 8888 may already be in use on the host
- reusing an existing local tt-metal checkout is preferable to downloading it again when that checkout is already known-good
- when reusing host tt-metal, mounting it as /workspace/tt-metal inside the container is the cleanest setup
For compilation artifacts, use tt-xla-practical-example.md.
For runtime execution, use resnet50-run-example.md.
For non-install troubleshooting, use troubleshooting.md.
1. Preflight
1.1 Hardware and Driver Checks
Before building from source, confirm that the card and runtime are already available on the host:
lspci -nnk -s 01:00.0
lspci -nn | grep -E 'tenstorrent|16c3:abcd'
ls /dev/bos/
tt-smi
grep HugePages_Total /proc/meminfo
Expected result:
- the card is visible in lspci
- for the BOS A0 board, the device at 01:00.0 shows kernel driver bos
- /dev/bos/0 exists
- tt-smi lists the device
- hugepages are configured
If these checks fail, stop here and fix the driver or firmware installation first.
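The hugepages check can be scripted so that a misconfigured host fails loudly instead of being noticed later. The following is a minimal sketch, assuming the standard HugePages_Total line in /proc/meminfo; the function name check_hugepages is hypothetical and is not part of any Tenstorrent or BOS tooling.

```shell
# Hypothetical helper: succeed only when HugePages_Total is greater than zero.
# The optional argument (default /proc/meminfo) lets the check run against a
# saved copy of the file.
check_hugepages() {
    local meminfo="${1:-/proc/meminfo}"
    local total
    total="$(awk '/^HugePages_Total:/ {print $2}' "$meminfo" 2>/dev/null)"
    if [ -n "$total" ] && [ "$total" -gt 0 ]; then
        echo "hugepages configured: $total"
        return 0
    fi
    echo "hugepages not configured" >&2
    return 1
}
```

Calling check_hugepages with no argument on the host reads /proc/meminfo directly.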
1.2 Required Access
You need GitHub SSH access to the private BOS repositories:
- git@github.com:bos-semi/tt-mlir.git
- git@github.com:bos-semi/tt-xla.git
For tt-metal, this guide prefers reusing the tt-metal folder that already exists on the
machine and mounting it as /workspace/tt-metal inside the container instead of cloning it again.
Only fetch tt-metal-e2 if you do not already have a usable local checkout.
Verify that your SSH key works before you start Docker:
ssh -T git@github.com
Expected result:
Hi <username>! You've successfully authenticated, but GitHub does not provide shell access.
If that does not work, start an SSH agent and load your key:
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
ssh -T git@github.com
Do not use GitHub HTTPS password authentication for these clones. The logs show that it fails.
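When scripting the SSH check, note that ssh -T git@github.com typically exits with a non-zero status even when authentication succeeds, because GitHub does not provide shell access. A script therefore has to inspect the greeting text rather than the exit code. The sketch below assumes the greeting shown in the expected result above; the function name github_auth_ok is hypothetical.

```shell
# Hypothetical helper: read the output of `ssh -T git@github.com` on stdin and
# succeed only if it contains GitHub's success greeting.
github_auth_ok() {
    grep -q "successfully authenticated"
}

# Usage sketch (not executed here): capture both output streams and ignore
# the exit status of ssh itself.
#   ssh -T git@github.com 2>&1 | github_auth_ok && echo "SSH access OK"
```

The same helper can be reused inside the container once the agent socket is forwarded.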
1.3 Host vs Container Paths
Confusing host paths with container paths contributed to almost every failure in the logs:
- on the host, use a normal home-directory workspace such as $HOME/tt-xla3
- inside the container, use /workspace/xla-dev
Do not run /workspace/xla-dev/... commands on the host. That path only exists inside the
container.
2. Start the Build Container
Set the shared environment variables and create the host-side directories first:
export IMAGE_NAME=ghcr.io/tenstorrent/tt-xla/tt-xla-ci-ubuntu-22-04:latest
export WORK_ROOT=$HOME/tt-xla3
export HOST_WORKSPACE=$WORK_ROOT/xla-dev
export CONTAINER_WORKSPACE=/workspace/xla-dev
export DATA_DIR=$WORK_ROOT/data
export DOCKER_NAME=tt-xla-a0-dev
export DEVICE_PATH=/dev/bos/0
export SHM_SIZE=32g
export HOST_PORT=8899
export TT_METAL_HOST_DIR=$HOME/tt-metal
mkdir -p "$HOST_WORKSPACE" "$DATA_DIR"
HOST_PORT=8899 is used here on purpose. The logs show that the first attempt with host port
8888 failed because that port was already allocated.
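The port conflict can be detected before starting the container. The sketch below is one possible way to do it, assuming the ss tool from iproute2 is available on the host; the function name port_is_listed is hypothetical. It reads ss output on stdin so that the matching logic can be checked without touching the network.

```shell
# Hypothetical helper: read `ss -ltnH` output on stdin and succeed if any
# listening socket uses the given port (column 4 is "Local Address:Port").
port_is_listed() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage sketch (not executed here):
#   if ss -ltnH | port_is_listed 8888; then export HOST_PORT=8899; fi
```

Anchoring the pattern at the end of the column keeps port 888 from matching 8888.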
TT_METAL_HOST_DIR should point at the tt-metal folder that already exists on your machine. The
trials page preferred this over downloading tt-metal again.
Start the container:
docker rm -f "$DOCKER_NAME" 2>/dev/null || true
docker run -itd \
--name "$DOCKER_NAME" \
-p ${HOST_PORT}:8888 \
-v "$HOST_WORKSPACE":"$CONTAINER_WORKSPACE" \
-v "$DATA_DIR":/data \
-v "$TT_METAL_HOST_DIR":/workspace/tt-metal \
-v /dev/hugepages:/dev/hugepages \
-v /dev/hugepages-1G:/dev/hugepages-1G \
--device "$DEVICE_PATH":"$DEVICE_PATH" \
--shm-size "$SHM_SIZE" \
--cap-add ALL \
--ipc=host \
-v "$HOME/.ssh:/root/.ssh:ro" \
-v "$SSH_AUTH_SOCK:/ssh-agent" \
-e SSH_AUTH_SOCK=/ssh-agent \
--restart unless-stopped \
"$IMAGE_NAME" bash
Open a shell in the container:
docker exec -it "$DOCKER_NAME" bash
Inside the container:
mkdir -p /workspace/xla-dev
cd /workspace/xla-dev
ssh -T git@github.com
ls -la /workspace/xla-dev/tt-xla 2>/dev/null || true
ls -la /workspace/xla-dev/tt-mlir 2>/dev/null || true
ls -la /workspace/tt-metal
If ssh -T fails inside the container, stop and fix SSH agent forwarding before cloning.
If /workspace/tt-metal is missing, stop and fix the host mount first.
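The agent forwarding in the docker run command only works when SSH_AUTH_SOCK points at a live agent socket on the host at the time the container is created. A small guard run on the host before docker run can catch this early. This is a minimal sketch; the function name ssh_agent_socket_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if SSH_AUTH_SOCK names an existing socket.
ssh_agent_socket_ok() {
    if [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]; then
        return 0
    fi
    echo "SSH_AUTH_SOCK is unset or not a socket; start ssh-agent first" >&2
    return 1
}
```

If the guard fails, start the agent and load the key as described in Section 1.2, then recreate the container.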
3. Clone the BOS Repositories
Run these commands inside the container:
cd /workspace/xla-dev
git clone --branch develop git@github.com:bos-semi/tt-mlir.git
git clone --branch release/a0 git@github.com:bos-semi/tt-xla.git
Preferred tt-metal handling:
# Preferred: reuse the existing tt-metal checkout already present on the machine
ls -ld /workspace/tt-metal
If /workspace/tt-metal does not exist yet, clone it explicitly:
cd /workspace
git clone --branch develop git@github.com:bos-semi/tt-metal-e2.git tt-metal
Immediately mark the mounted repositories as safe for Git:
git config --global --add safe.directory /workspace/xla-dev/tt-mlir
git config --global --add safe.directory /workspace/xla-dev/tt-xla
git config --global --add safe.directory /workspace/tt-metal
This is required because the repos live on a mounted host volume and the container is running as
root. Without these entries, later steps fail with fatal: detected dubious ownership.
If you are reusing an existing host-side tt-metal checkout, keep that path stable and reuse it
through the container mount instead of creating a second copy.
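Because the safe.directory commands are repeated later in this manual, it can help to add each path only once. The sketch below is one way to make the step idempotent using standard git config options; the function name mark_safe_directories is hypothetical.

```shell
# Hypothetical helper: add each path to the global safe.directory list once,
# so repeating the step does not pile up duplicate entries.
mark_safe_directories() {
    local dir
    for dir in "$@"; do
        if git config --global --get-all safe.directory | grep -Fxq -- "$dir"; then
            continue
        fi
        git config --global --add safe.directory "$dir"
    done
}

# Usage sketch (not executed here):
#   mark_safe_directories /workspace/xla-dev/tt-mlir \
#       /workspace/xla-dev/tt-xla /workspace/tt-metal
```

Running the plain git config --global --add commands more than once is also harmless; this helper only keeps the global config tidy.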
4. Build TT-MLIR
4.1 Install the Missing Bootstrap Dependencies
The Confluence page did not call this out clearly enough: the TT-MLIR environment bootstrap still
invokes python3.11. In the logs, cmake --build env/build failed until python3.11 was
installed explicitly.
Run inside the container:
apt-get update
apt-get install -y \
python3.11 python3.11-venv python3-pip \
clang-17 lld-17 ccache ninja-build cmake
Do not create a fake python3.11 -> python3.12 symlink. Install Python 3.11 properly.
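To confirm that python3.11 is a real 3.11 interpreter and not a symlink to another version, the interpreter can be asked for its own version. This is a minimal sketch; the function name is_python_311 is hypothetical.

```shell
# Hypothetical helper: succeed only if the given interpreter (default
# python3.11) reports version 3.11, which a symlink to 3.12 would not.
is_python_311() {
    local interp="${1:-python3.11}"
    "$interp" -c 'import sys; sys.exit(0 if sys.version_info[:2] == (3, 11) else 1)' 2>/dev/null
}

# Usage sketch (not executed here):
#   is_python_311 || echo "python3.11 is missing or is not Python 3.11"
```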
4.2 Configure and Build the TT-MLIR Toolchain
cd /workspace/xla-dev/tt-mlir
export TTMLIR_TOOLCHAIN_DIR=/opt/ttmlir-toolchain
mkdir -p "$TTMLIR_TOOLCHAIN_DIR"
export TTMLIR_PYTHON_VERSION=python3.11
cmake -B env/build env \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++
cmake --build env/build
source env/activate
python --version
Expected result:
- cmake --build env/build completes successfully
- after source env/activate, the active Python is 3.12.x
If you see a dubious ownership error here, re-run the safe-directory commands from Section 3 and
run the build again.
5. Prepare TT-XLA
5.1 Initialize the TT-XLA Submodules
Run inside the container:
cd /workspace/xla-dev/tt-xla
git submodule update --init third_party/tt_forge_models
git config --global --add safe.directory /workspace/xla-dev/tt-xla/third_party/tt_forge_models
git submodule update --init --recursive
If you skip the safe.directory entry for third_party/tt_forge_models, the recursive submodule
step can fail even when the main repository is already marked safe.
5.2 Create the TT-Metal Symlink Expected by the Build
The logs showed a missing Tracy.hpp include until this path was created. Run:
mkdir -p /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src
ln -sfn /workspace/xla-dev/tt-mlir/third_party/tt-metal/src/tt-metal \
/workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal
ls -l /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal/tt_metal/third_party/tracy/public/tracy/Tracy.hpp
Do not continue until that final ls command succeeds.
6. Build TT-XLA
6.1 Activate the TT-XLA Environment
Run inside the container:
cd /workspace/xla-dev/tt-xla
git config --global --add safe.directory /workspace/xla-dev/tt-xla
git config --global --add safe.directory /workspace/xla-dev/tt-mlir
git config --global --add safe.directory /workspace/tt-metal
source venv/activate
Use source venv/activate, not source venv/bin/activate. The repo ships its own activation
script and the logged working build used that script.
6.2 Configure and Compile
cmake -G Ninja -S . -B build \
-DTT_MLIR_VERSION=develop \
-DUSE_BOS_SEMI_TTMLIR=ON \
-DUSE_CUSTOM_TT_MLIR_VERSION=ON \
-DUSE_BOS_REPO=ON
cmake --build build --verbose -j"$(nproc)"
This is the BOS A0 source-build configuration that matched the successful log.
The tt-xla trials page also validated the following runtime exports before testing:
export TT_XLA_RUNTIME_ROOT=/workspace/xla-dev/tt-xla
export PROJECT_SOURCE_DIR=/workspace/xla-dev/tt-xla
export TT_MLIR_RUNTIME_ROOT=$TT_XLA_RUNTIME_ROOT/third_party/tt-mlir/src/tt-mlir
export TT_METAL_RUNTIME_ROOT=/workspace/tt-metal
export TTMLIR_TOOLCHAIN_DIR=/opt/ttmlir-toolchain
export PJRT_DEVICE=TT
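A mistyped runtime export usually surfaces only when a test starts. The sketch below checks that each named variable is exported and points at an existing directory; the function name check_runtime_dirs is hypothetical, and PJRT_DEVICE is left out because it is not a path.

```shell
# Hypothetical helper: verify that each named variable is exported and points
# at an existing directory. Returns non-zero if any check fails.
check_runtime_dirs() {
    local name value status=0
    for name in "$@"; do
        value="$(printenv "$name")" || value=""
        if [ -z "$value" ] || [ ! -d "$value" ]; then
            echo "missing or invalid: $name=${value:-<unset>}" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage sketch (not executed here):
#   check_runtime_dirs TT_XLA_RUNTIME_ROOT PROJECT_SOURCE_DIR \
#       TT_MLIR_RUNTIME_ROOT TT_METAL_RUNTIME_ROOT TTMLIR_TOOLCHAIN_DIR
```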
6.3 Verify the Python Package Install Step
The TT-XLA build ends by installing the editable Python package from python_package. If the
safe.directory entries above are missing, the build fails during pip install -e with metadata
generation errors. When the safe-directory entries are present, the package install completes.
Quick verification:
python -c "import pjrt_plugin_tt; print('pjrt_plugin_tt import OK')"
Expected result:
pjrt_plugin_tt import OK
7. Validation
7.1 Minimum Validation
python -c "import pjrt_plugin_tt; print('pjrt_plugin_tt import OK')"
7.2 Functional Validation
cd /workspace/xla-dev/tt-xla
pytest tests/benchmark/test_vision.py::test_resnet50 -sv
7.3 What "Good" Looks Like
Use this checklist:
- tt-smi sees the BOS device on the host
- ssh -T git@github.com works on both the host and inside the container
- cmake --build env/build succeeds in tt-mlir
- the Tracy.hpp path exists under the TT-XLA nested third_party/tt-mlir tree
- cmake --build build --verbose -j"$(nproc)" succeeds in tt-xla
- python -c "import pjrt_plugin_tt" succeeds
- /workspace/tt-metal is the reused host checkout you intended to use
7.4 Trials Folded In
The Confluence page tt-xla trials has been folded into this manual in four places:
- it prefers reusing the existing host tt-metal checkout and mounting it at /workspace/tt-metal
- it verifies repo mounts immediately after entering the container
- it confirms python3.11 is required before rebuilding tt-mlir env/build
- it captures the missing pjrt_plugin_tt.so recovery flow after an otherwise successful build
8. Troubleshooting
docker run fails with port is already allocated
Symptom from the logs:
Bind for 0.0.0.0:8888 failed: port is already allocated
Fix:
export HOST_PORT=8899
docker rm -f tt-xla-a0-dev 2>/dev/null || true
Then re-run the docker run command with -p ${HOST_PORT}:8888.
Permission denied (publickey) or Authentication failed when cloning
Cause:
- SSH key not loaded into the agent
- agent socket not forwarded into the container
- using HTTPS/password auth instead of SSH
Fix:
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
ssh -T git@github.com
Then restart the container with the SSH socket mount:
-v "$SSH_AUTH_SOCK:/ssh-agent" -e SSH_AUTH_SOCK=/ssh-agent
/workspace/xla-dev/... does not exist
Cause:
You are still on the host shell. That path only exists inside the container.
Fix:
docker exec -it tt-xla-a0-dev bash
cd /workspace/xla-dev
python3.11: command not found during cmake --build env/build
Fix:
apt-get update
apt-get install -y python3.11 python3.11-venv python3-pip
export TTMLIR_PYTHON_VERSION=python3.11
Then rebuild:
cd /workspace/xla-dev/tt-mlir
cmake --build env/build
fatal: detected dubious ownership in repository
Fix:
git config --global --add safe.directory /workspace/xla-dev/tt-xla
git config --global --add safe.directory /workspace/xla-dev/tt-mlir
git config --global --add safe.directory /workspace/tt-metal
git config --global --add safe.directory /workspace/xla-dev/tt-xla/third_party/tt_forge_models
This error can appear during:
- git submodule update --init --recursive
- the TT-MLIR bootstrap
- the TT-XLA pip install -e step
Tracy.hpp not found
Symptom from the logs:
.../third_party/tracy/public/tracy/Tracy.hpp: No such file or directory
Fix:
mkdir -p /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src
ln -sfn /workspace/xla-dev/tt-mlir/third_party/tt-metal/src/tt-metal \
/workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal
Then verify:
ls -l /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal/tt_metal/third_party/tracy/public/tracy/Tracy.hpp
pip install -e fails while building TT-XLA
If the failure includes metadata-generation-failed and a nested git rev-parse error, the usual
cause is still Git safety on the mounted repositories. Re-run the safe.directory commands, then
re-run:
cd /workspace/xla-dev/tt-xla
cmake --build build --verbose -j"$(nproc)"
pjrt_plugin_tt.so is missing from python_package
The trials page captured this failure after the build and test startup:
FileNotFoundError: ... /workspace/xla-dev/tt-xla/python_package/pjrt_plugin_tt/pjrt_plugin_tt.so does not exist
Check the build output and the package path:
cd /workspace/xla-dev/tt-xla
source venv/activate
ls -l build/pjrt_implementation/src/pjrt_plugin_tt.so
ls -l python_package/pjrt_plugin_tt/pjrt_plugin_tt.so
If the build output exists and the package path does not, repair it with:
ln -sf \
/workspace/xla-dev/tt-xla/build/pjrt_implementation/src/pjrt_plugin_tt.so \
/workspace/xla-dev/tt-xla/python_package/pjrt_plugin_tt/pjrt_plugin_tt.so
pip install -e python_package --no-deps --no-build-isolation
python -c "from pjrt_plugin_tt import get_library_path; print(get_library_path())"
Then rerun the test:
export TT_METAL_RUNTIME_ROOT=/workspace/tt-metal
export PJRT_DEVICE=TT
pytest /workspace/xla-dev/tt-xla/tests/benchmark/test_vision.py::test_resnet50 -sv
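The repair condition described above (build output present, package path absent) can be wrapped in a guard so the link is never created against a missing build artifact. This is a minimal sketch; the function name repair_plugin_link is hypothetical, and it covers only the symlink step, not the pip install -e that follows.

```shell
# Hypothetical helper: link the built plugin into the package directory only
# when the build output exists and the package copy is missing.
repair_plugin_link() {
    local built="$1" packaged="$2"
    if [ ! -e "$built" ]; then
        echo "build output missing: $built (rebuild first)" >&2
        return 1
    fi
    if [ -e "$packaged" ]; then
        echo "package path already present: $packaged"
        return 0
    fi
    ln -sf "$built" "$packaged"
    echo "linked $packaged -> $built"
}

# Usage sketch (not executed here):
#   repair_plugin_link \
#       /workspace/xla-dev/tt-xla/build/pjrt_implementation/src/pjrt_plugin_tt.so \
#       /workspace/xla-dev/tt-xla/python_package/pjrt_plugin_tt/pjrt_plugin_tt.so
```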
TT-XLA expects a mirrored local tt-mlir path
The trials page also captured a failure where install_ttmlir_requirements.sh expected:
/workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/env/CMakeLists.txt
If you are reusing a local tt-mlir checkout, create the expected mirror path:
cd /workspace/xla-dev/tt-xla
mkdir -p third_party/tt-mlir/src
ln -sfn /workspace/xla-dev/tt-mlir third_party/tt-mlir/src/tt-mlir
ls -l /workspace/xla-dev/tt-mlir/env/CMakeLists.txt
Then clean the stale external-project artifacts and rebuild:
rm -rf build third_party/tt-mlir/src/tt-mlir-stamp
cmake -G Ninja -B build \
-DTT_MLIR_VERSION=develop \
-DUSE_BOS_SEMI_TTMLIR=ON \
-DUSE_CUSTOM_TT_MLIR_VERSION=ON \
-DUSE_BOS_REPO=ON
cmake --build build --verbose
If /workspace/xla-dev/tt-mlir/env/CMakeLists.txt is missing in the real checkout, stop there:
that usually means the tt-mlir branch is incompatible with the tt-xla revision you are using.
Verify mounts before debugging the build
This quick check from the trials page is worth repeating whenever the container is recreated:
ls -la /workspace/xla-dev/tt-xla
ls -la /workspace/xla-dev/tt-mlir
ls -la /workspace/tt-metal
If any of these are missing, fix the mount configuration first.
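A mount can also exist but be empty: with the -v option, Docker typically creates a host directory that does not exist yet, so a wrong TT_METAL_HOST_DIR may show up as an empty /workspace/tt-metal rather than a missing one. The sketch below distinguishes the two cases; the function name check_mounts is hypothetical.

```shell
# Hypothetical helper: report each path as ok, empty, or missing.
# Returns non-zero unless every path is a non-empty directory.
check_mounts() {
    local dir status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"; status=1
        elif [ -z "$(ls -A "$dir")" ]; then
            echo "empty: $dir"; status=1
        else
            echo "ok: $dir"
        fi
    done
    return "$status"
}

# Usage sketch (not executed here):
#   check_mounts /workspace/xla-dev/tt-xla /workspace/xla-dev/tt-mlir /workspace/tt-metal
```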
core_assignment.cpp unused-variable build failure
This is a known BOS issue from the local Confluence snapshot. If it appears in your environment,
patch the affected variables with [[maybe_unused]] and rebuild. Keep this as a targeted fix;
do not apply it pre-emptively unless your build actually hits that compiler error.