TT-XLA Installation Manual

Tool: TT-XLA (Tenstorrent XLA)
Target hardware: BOS A0 Eagle / Blackhole card
Scope of this revision: verified BOS A0 source-build flow based on the installation logs from inputs/logs and the local snapshot of the Confluence page.

This version is intentionally more explicit than the original page. The logged failures showed that the previous instructions left several critical assumptions unstated:

  • /workspace/xla-dev/... is a container path, not a host path
  • BOS repos must be cloned over SSH with a working GitHub key
  • the TT-MLIR bootstrap still expects python3.11 to exist
  • the TT-XLA build uses the repo-provided source venv/activate, not source venv/bin/activate
  • mounted repositories need git config --global --add safe.directory ... inside the container
  • port 8888 may already be in use on the host
  • reusing an existing local tt-metal checkout is preferable to downloading it again when that checkout is already known-good
  • when reusing host tt-metal, mounting it as /workspace/tt-metal inside the container is the cleanest setup

For compilation artifacts, use tt-xla-practical-example.md.
For runtime execution, use resnet50-run-example.md.
For non-install troubleshooting, use troubleshooting.md.


1. Preflight

1.1 Hardware and Driver Checks

Before building from source, confirm that the card and runtime are already available on the host:

lspci -nnk -s 01:00.0
lspci -nn | grep -Ei 'tenstorrent|16c3:abcd'
ls /dev/bos/
tt-smi
grep HugePages_Total /proc/meminfo

Expected result:

  • the card is visible in lspci
  • for the BOS A0 board, the device at 01:00.0 shows kernel driver bos
  • /dev/bos/0 exists
  • tt-smi lists the device
  • hugepages are configured

If these checks fail, stop here and fix the driver or firmware installation first.
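
The device-node, tt-smi, and hugepage checks above can be bundled into one function so they run the same way each time. This is a minimal sketch, not part of the BOS tooling: the function names hugepages_total and preflight are hypothetical, and the lspci checks are left out because their output needs a human eye.

```shell
# Hypothetical preflight helpers; names are illustrative only.

# Print the HugePages_Total value from a meminfo-style file (default: /proc/meminfo).
hugepages_total() {
    awk '/^HugePages_Total:/ {print $2}' "${1:-/proc/meminfo}"
}

# Return 0 only if the device node, tt-smi, and hugepages checks all pass.
preflight() {
    local failed=0 pages
    [ -e /dev/bos/0 ] || { echo "FAIL: /dev/bos/0 is missing"; failed=1; }
    command -v tt-smi >/dev/null 2>&1 || { echo "FAIL: tt-smi is not on PATH"; failed=1; }
    pages=$(hugepages_total)
    [ "${pages:-0}" -gt 0 ] || { echo "FAIL: no hugepages configured"; failed=1; }
    [ "$failed" -eq 0 ] && echo "preflight OK"
    return "$failed"
}
```

Run preflight on the host. A non-zero exit status means at least one check failed and the build should not be started yet.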

1.2 Required Access

You need GitHub SSH access to the private BOS repositories:

  • git@github.com:bos-semi/tt-mlir.git
  • git@github.com:bos-semi/tt-xla.git

For tt-metal, this guide prefers reusing the tt-metal folder that already exists on the machine and mounting it as /workspace/tt-metal inside the container instead of cloning it again. Only fetch tt-metal-e2 if you do not already have a usable local checkout.

Verify that your SSH key works before you start Docker:

ssh -T git@github.com

Expected result:

Hi <username>! You've successfully authenticated, but GitHub does not provide shell access.

If that does not work, start an SSH agent and load your key:

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
ssh -T git@github.com

Do not clone these repositories over HTTPS with password authentication. GitHub no longer accepts account passwords for Git operations, and the logs show those clone attempts failing.
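
The docker run command in Section 2 mounts $SSH_AUTH_SOCK into the container, so an unset or stale socket path on the host also breaks SSH inside the container. A minimal host-side check, assuming the hypothetical helper name agent_ready:

```shell
# Hypothetical helper: succeed only if an SSH agent is reachable and holds at least one key.
agent_ready() {
    # The variable must be set and must point at an existing UNIX socket.
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ] || return 1
    # ssh-add -l exits non-zero when the agent has no identities or cannot be reached.
    ssh-add -l >/dev/null 2>&1
}
```

If agent_ready fails, repeat the ssh-agent and ssh-add steps above in the same shell that will later run docker run.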

1.3 Host vs Container Paths

Confusing host paths with container paths contributed to almost every failure in the logs:

  • on the host, use a normal home-directory workspace such as $HOME/tt-xla3
  • inside the container, use /workspace/xla-dev

Do not run /workspace/xla-dev/... commands on the host. That path only exists inside the container.
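
One way to catch this mistake early is a guard at the top of any script that uses /workspace/xla-dev paths. The sketch below assumes the container is a Docker container, which normally has a /.dockerenv marker file. The function names in_container and require_container are hypothetical.

```shell
# Hypothetical guard: Docker normally creates /.dockerenv inside containers.
# The optional argument overrides the marker path, which is useful for testing.
in_container() {
    [ -f "${1:-/.dockerenv}" ]
}

# Refuse to continue when invoked from a host shell.
require_container() {
    in_container "$@" && return 0
    echo "Not inside the container. Run: docker exec -it tt-xla-a0-dev bash" >&2
    return 1
}
```

Calling require_container || exit 1 as the first line of a build script stops it before any container-only path is touched on the host.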


2. Start the Build Container

On the host, set the build variables and create the host-side directories first:

export IMAGE_NAME=ghcr.io/tenstorrent/tt-xla/tt-xla-ci-ubuntu-22-04:latest
export WORK_ROOT=$HOME/tt-xla3
export HOST_WORKSPACE=$WORK_ROOT/xla-dev
export CONTAINER_WORKSPACE=/workspace/xla-dev
export DATA_DIR=$WORK_ROOT/data
export DOCKER_NAME=tt-xla-a0-dev
export DEVICE_PATH=/dev/bos/0
export SHM_SIZE=32g
export HOST_PORT=8899
export TT_METAL_HOST_DIR=$HOME/tt-metal

mkdir -p "$HOST_WORKSPACE" "$DATA_DIR"

HOST_PORT=8899 is used here on purpose. The logs show that the first attempt with host port 8888 failed because that port was already allocated.
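
If 8899 might also be taken on your machine, the port can be chosen by inspecting the listening sockets before starting the container. This is a sketch under the assumption that ss (from iproute2) is installed on the host. The function names port_listed and pick_host_port are hypothetical.

```shell
# Hypothetical helpers for choosing a free host port.

# Read `ss -ltn` style output on stdin; succeed if the given port appears
# in the Local Address:Port column (field 4).
port_listed() {
    awk -v port="$1" '$4 ~ (":" port "$") {found=1} END {exit !found}'
}

# Print the first candidate port that is not currently listening.
pick_host_port() {
    local table port
    table=$(ss -ltn)
    for port in "$@"; do
        if ! printf '%s\n' "$table" | port_listed "$port"; then
            echo "$port"
            return 0
        fi
    done
    return 1
}

# Example (assumed candidate list):
#   export HOST_PORT=$(pick_host_port 8888 8899 8900)
```

The parser is kept separate from the ss call so it can be exercised against saved output without touching the network stack.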

TT_METAL_HOST_DIR should point at the tt-metal folder that already exists on your machine. The trials page preferred this over downloading tt-metal again.

Start the container:

docker rm -f "$DOCKER_NAME" 2>/dev/null || true

docker run -itd \
--name "$DOCKER_NAME" \
-p ${HOST_PORT}:8888 \
-v "$HOST_WORKSPACE":"$CONTAINER_WORKSPACE" \
-v "$DATA_DIR":/data \
-v "$TT_METAL_HOST_DIR":/workspace/tt-metal \
-v /dev/hugepages:/dev/hugepages \
-v /dev/hugepages-1G:/dev/hugepages-1G \
--device "$DEVICE_PATH":"$DEVICE_PATH" \
--shm-size "$SHM_SIZE" \
--cap-add ALL \
--ipc=host \
-v "$HOME/.ssh:/root/.ssh:ro" \
-v "$SSH_AUTH_SOCK:/ssh-agent" \
-e SSH_AUTH_SOCK=/ssh-agent \
--restart unless-stopped \
"$IMAGE_NAME" bash

Open a shell in the container:

docker exec -it "$DOCKER_NAME" bash

Inside the container:

mkdir -p /workspace/xla-dev
cd /workspace/xla-dev
ssh -T git@github.com
ls -la /workspace/xla-dev/tt-xla 2>/dev/null || true
ls -la /workspace/xla-dev/tt-mlir 2>/dev/null || true
ls -la /workspace/tt-metal

If ssh -T fails inside the container, stop and fix SSH agent forwarding before cloning. If /workspace/tt-metal is missing, stop and fix the host mount first.


3. Clone the BOS Repositories

Run these commands inside the container:

cd /workspace/xla-dev

git clone --branch develop git@github.com:bos-semi/tt-mlir.git
git clone --branch release/a0 git@github.com:bos-semi/tt-xla.git

Preferred tt-metal handling:

# Preferred: reuse the existing tt-metal checkout already present on the machine
ls -ld /workspace/tt-metal

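If you have no usable host checkout, /workspace/tt-metal is missing or empty (Docker creates an empty directory for a bind mount whose host path does not exist). Only in that case, clone it explicitly: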

cd /workspace
git clone --branch develop git@github.com:bos-semi/tt-metal-e2.git tt-metal

Immediately mark the mounted repositories as safe for Git:

git config --global --add safe.directory /workspace/xla-dev/tt-mlir
git config --global --add safe.directory /workspace/xla-dev/tt-xla
git config --global --add safe.directory /workspace/tt-metal

This is required because the repos live on a mounted host volume and the container is running as root. Without these entries, later steps fail with fatal: detected dubious ownership.
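
Because this manual repeats the safe.directory commands in several sections, the global Git config can accumulate duplicate entries. The sketch below adds each path only once. The function name mark_safe is hypothetical, and the only Git options used are the ones already shown above plus --get-all.

```shell
# Hypothetical helper: add each path to safe.directory unless it is already listed.
mark_safe() {
    local repo
    for repo in "$@"; do
        # --get-all prints one configured value per line; grep -Fx matches a whole line literally.
        if ! git config --global --get-all safe.directory | grep -Fxq -- "$repo"; then
            git config --global --add safe.directory "$repo"
        fi
    done
}

# Example, inside the container:
#   mark_safe /workspace/xla-dev/tt-mlir /workspace/xla-dev/tt-xla /workspace/tt-metal
```

Running it again after a container rebuild is harmless, since existing entries are skipped.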

If you are reusing an existing host-side tt-metal checkout, keep that path stable and reuse it through the container mount instead of creating a second copy.


4. Build TT-MLIR

4.1 Install the Missing Bootstrap Dependencies

The Confluence page did not call this out clearly enough: the TT-MLIR environment bootstrap still invokes python3.11. In the logs, cmake --build env/build failed until python3.11 was installed explicitly.

Run inside the container:

apt-get update
apt-get install -y \
python3.11 python3.11-venv python3-pip \
clang-17 lld-17 ccache ninja-build cmake

Do not create a fake python3.11 -> python3.12 symlink. Install Python 3.11 properly.
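
To confirm that python3.11 is a real 3.11 interpreter rather than a symlink to another version, ask the interpreter itself instead of trusting its file name. A minimal sketch, with python_is_version as a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the interpreter reports the wanted major.minor version.
python_is_version() {
    local interp=$1 want=$2 got
    # Ask the interpreter for its own version; fail if it cannot be executed.
    got=$("$interp" -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null) || return 1
    [ "$got" = "$want" ]
}

# Example, inside the container:
#   python_is_version python3.11 3.11 || echo "python3.11 is missing or is not Python 3.11"
```

A symlinked python3.11 that actually runs 3.12 reports 3.12 here, so the check fails as intended.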

4.2 Configure and Build the TT-MLIR Toolchain

cd /workspace/xla-dev/tt-mlir

export TTMLIR_TOOLCHAIN_DIR=/opt/ttmlir-toolchain
mkdir -p "$TTMLIR_TOOLCHAIN_DIR"
export TTMLIR_PYTHON_VERSION=python3.11

cmake -B env/build env \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++

cmake --build env/build
source env/activate
python --version

Expected result:

  • cmake --build env/build completes successfully
  • after source env/activate, the active Python is 3.12.x

If you see a dubious ownership error here, re-run the safe-directory commands from Section 3 and run the build again.


5. Prepare TT-XLA

5.1 Initialize the TT-XLA Submodules

Run inside the container:

cd /workspace/xla-dev/tt-xla

git submodule update --init third_party/tt_forge_models
git config --global --add safe.directory /workspace/xla-dev/tt-xla/third_party/tt_forge_models
git submodule update --init --recursive

If you skip the safe.directory entry for third_party/tt_forge_models, the recursive submodule step can fail even when the main repository is already marked safe.

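5.2 Link the Nested tt-metal Path for Tracy.hpp

In the logs, the build failed on a missing Tracy.hpp include until the nested tt-metal path below existed. Create it as a symlink to the tt-metal tree inside the tt-mlir checkout. Run inside the container: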

mkdir -p /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src

ln -sfn /workspace/xla-dev/tt-mlir/third_party/tt-metal/src/tt-metal \
/workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal

ls -l /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal/tt_metal/third_party/tracy/public/tracy/Tracy.hpp

Do not continue until that final ls command succeeds.
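
The create-then-verify pattern above recurs later in this manual (for example in the pjrt_plugin_tt.so repair), so it can be wrapped in one function. This is a sketch only. The name link_and_verify and its three-argument form are hypothetical.

```shell
# Hypothetical helper: create (or replace) a symlink and confirm that a probe
# path is reachable through it.
#   $1 = link target (existing directory)
#   $2 = link location
#   $3 = path relative to the link that must exist afterwards
link_and_verify() {
    local src=$1 dst=$2 probe=$3
    mkdir -p "$(dirname "$dst")"
    # -n replaces an existing symlink instead of creating a link inside its target.
    ln -sfn "$src" "$dst"
    [ -e "$dst/$probe" ]
}

# Example with the paths from this section:
#   link_and_verify \
#     /workspace/xla-dev/tt-mlir/third_party/tt-metal/src/tt-metal \
#     /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal \
#     tt_metal/third_party/tracy/public/tracy/Tracy.hpp
```

A non-zero exit status means the link was created but the header is still not reachable, which points at the source checkout rather than at the link.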


6. Build TT-XLA

6.1 Activate the TT-XLA Environment

Run inside the container:

cd /workspace/xla-dev/tt-xla

git config --global --add safe.directory /workspace/xla-dev/tt-xla
git config --global --add safe.directory /workspace/xla-dev/tt-mlir
git config --global --add safe.directory /workspace/tt-metal

source venv/activate

Use source venv/activate, not source venv/bin/activate. The repo ships its own activation script and the logged working build used that script.

6.2 Configure and Compile

cmake -G Ninja -S . -B build \
-DTT_MLIR_VERSION=develop \
-DUSE_BOS_SEMI_TTMLIR=ON \
-DUSE_CUSTOM_TT_MLIR_VERSION=ON \
-DUSE_BOS_REPO=ON

cmake --build build --verbose -j"$(nproc)"

This is the BOS A0 source-build configuration that matched the successful log.

The tt-xla trials page also validated the following runtime exports before testing:

export TT_XLA_RUNTIME_ROOT=/workspace/xla-dev/tt-xla
export PROJECT_SOURCE_DIR=/workspace/xla-dev/tt-xla
export TT_MLIR_RUNTIME_ROOT=$TT_XLA_RUNTIME_ROOT/third_party/tt-mlir/src/tt-mlir
export TT_METAL_RUNTIME_ROOT=/workspace/tt-metal
export TTMLIR_TOOLCHAIN_DIR=/opt/ttmlir-toolchain
export PJRT_DEVICE=TT
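
A typo in any of these exports tends to surface much later as a runtime error, so it can help to confirm that every directory-valued variable points at a real directory before testing. A minimal sketch, assuming the variables have been exported as shown. The function name check_env_dirs is hypothetical.

```shell
# Hypothetical helper: each argument is the NAME of an exported variable
# whose value must be an existing directory.
check_env_dirs() {
    local name value rc=0
    for name in "$@"; do
        # printenv fails for variables that are unset or not exported.
        value=$(printenv "$name") || value=""
        if [ -z "$value" ] || [ ! -d "$value" ]; then
            echo "unset or not a directory: $name='$value'"
            rc=1
        fi
    done
    return "$rc"
}

# Example (PJRT_DEVICE is omitted because it is not a path):
#   check_env_dirs TT_XLA_RUNTIME_ROOT PROJECT_SOURCE_DIR TT_MLIR_RUNTIME_ROOT \
#                  TT_METAL_RUNTIME_ROOT TTMLIR_TOOLCHAIN_DIR
```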

6.3 Verify the Python Package Install Step

The TT-XLA build ends by installing the editable Python package from python_package. If the safe.directory entries above are missing, the build fails during pip install -e with metadata generation errors. When the safe-directory entries are present, the package install completes.

Quick verification:

python -c "import pjrt_plugin_tt; print('pjrt_plugin_tt import OK')"

Expected result:

pjrt_plugin_tt import OK

7. Validation

7.1 Minimum Validation

python -c "import pjrt_plugin_tt; print('pjrt_plugin_tt import OK')"

7.2 Functional Validation

cd /workspace/xla-dev/tt-xla
pytest tests/benchmark/test_vision.py::test_resnet50 -sv

7.3 What "Good" Looks Like

Use this checklist:

  • tt-smi sees the BOS device on the host
  • ssh -T git@github.com works on both the host and inside the container
  • cmake --build env/build succeeds in tt-mlir
  • the Tracy.hpp path exists under the TT-XLA nested third_party/tt-mlir tree
  • cmake --build build --verbose -j"$(nproc)" succeeds in tt-xla
  • python -c "import pjrt_plugin_tt" succeeds
  • /workspace/tt-metal is the reused host checkout you intended to use

7.4 Trials Folded In

The Confluence page "tt-xla trials" has been folded into this manual in four places:

  • it prefers reusing the existing host tt-metal checkout and mounting it at /workspace/tt-metal
  • it verifies repo mounts immediately after entering the container
  • it confirms python3.11 is required before rebuilding tt-mlir env/build
  • it captures the missing pjrt_plugin_tt.so recovery flow after an otherwise successful build

8. Troubleshooting

docker run fails with port is already allocated

Symptom from the logs:

Bind for 0.0.0.0:8888 failed: port is already allocated

Fix:

export HOST_PORT=8899
docker rm -f tt-xla-a0-dev 2>/dev/null || true

Then re-run the docker run command with -p ${HOST_PORT}:8888.

Permission denied (publickey) or Authentication failed when cloning

Possible causes:

  • SSH key not loaded into the agent
  • agent socket not forwarded into the container
  • using HTTPS/password auth instead of SSH

Fix:

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
ssh -T git@github.com

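Then recreate the container so that it includes the SSH socket mount and environment variable. Mounts cannot be added to an existing container, so remove it with docker rm -f and re-run the docker run command from Section 2 in the same shell where the agent is running. The relevant options are: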

-v "$SSH_AUTH_SOCK:/ssh-agent" -e SSH_AUTH_SOCK=/ssh-agent

/workspace/xla-dev/... does not exist

Cause:

You are still on the host shell. That path only exists inside the container.

Fix:

docker exec -it tt-xla-a0-dev bash
cd /workspace/xla-dev

python3.11: command not found during cmake --build env/build

Fix:

apt-get update
apt-get install -y python3.11 python3.11-venv python3-pip
export TTMLIR_PYTHON_VERSION=python3.11

Then rebuild:

cd /workspace/xla-dev/tt-mlir
cmake --build env/build

fatal: detected dubious ownership in repository

Fix:

git config --global --add safe.directory /workspace/xla-dev/tt-xla
git config --global --add safe.directory /workspace/xla-dev/tt-mlir
git config --global --add safe.directory /workspace/tt-metal
git config --global --add safe.directory /workspace/xla-dev/tt-xla/third_party/tt_forge_models

This error can appear during:

  • git submodule update --init --recursive
  • the TT-MLIR bootstrap
  • the TT-XLA pip install -e step

Tracy.hpp not found

Symptom from the logs:

.../third_party/tracy/public/tracy/Tracy.hpp: No such file or directory

Fix:

mkdir -p /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src
ln -sfn /workspace/xla-dev/tt-mlir/third_party/tt-metal/src/tt-metal \
/workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal

Then verify:

ls -l /workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/third_party/tt-metal/src/tt-metal/tt_metal/third_party/tracy/public/tracy/Tracy.hpp

pip install -e fails while building TT-XLA

If the failure includes metadata-generation-failed and a nested git rev-parse error, the usual cause is still Git safety on the mounted repositories. Re-run the safe.directory commands, then re-run:

cd /workspace/xla-dev/tt-xla
cmake --build build --verbose -j"$(nproc)"

pjrt_plugin_tt.so is missing from python_package

The trials page captured this failure after the build and test startup:

FileNotFoundError: ... /workspace/xla-dev/tt-xla/python_package/pjrt_plugin_tt/pjrt_plugin_tt.so does not exist

Check the build output and the package path:

cd /workspace/xla-dev/tt-xla
source venv/activate
ls -l build/pjrt_implementation/src/pjrt_plugin_tt.so
ls -l python_package/pjrt_plugin_tt/pjrt_plugin_tt.so

If the build output exists and the package path does not, repair it with:

ln -sf \
/workspace/xla-dev/tt-xla/build/pjrt_implementation/src/pjrt_plugin_tt.so \
/workspace/xla-dev/tt-xla/python_package/pjrt_plugin_tt/pjrt_plugin_tt.so

pip install -e python_package --no-deps --no-build-isolation
python -c "from pjrt_plugin_tt import get_library_path; print(get_library_path())"

Then rerun the test:

export TT_METAL_RUNTIME_ROOT=/workspace/tt-metal
export PJRT_DEVICE=TT
pytest /workspace/xla-dev/tt-xla/tests/benchmark/test_vision.py::test_resnet50 -sv

TT-XLA expects a mirrored local tt-mlir path

The trials page also captured a failure where install_ttmlir_requirements.sh expected:

/workspace/xla-dev/tt-xla/third_party/tt-mlir/src/tt-mlir/env/CMakeLists.txt

If you are reusing a local tt-mlir checkout, create the expected mirror path:

cd /workspace/xla-dev/tt-xla
mkdir -p third_party/tt-mlir/src
ln -sfn /workspace/xla-dev/tt-mlir third_party/tt-mlir/src/tt-mlir
ls -l /workspace/xla-dev/tt-mlir/env/CMakeLists.txt

Then clean the stale external-project artifacts and rebuild:

rm -rf build third_party/tt-mlir/src/tt-mlir-stamp

cmake -G Ninja -B build \
-DTT_MLIR_VERSION=develop \
-DUSE_BOS_SEMI_TTMLIR=ON \
-DUSE_CUSTOM_TT_MLIR_VERSION=ON \
-DUSE_BOS_REPO=ON

cmake --build build --verbose

If /workspace/xla-dev/tt-mlir/env/CMakeLists.txt is missing in the real checkout, stop there: that usually means the tt-mlir branch is incompatible with the tt-xla revision you are using.

Verify mounts before debugging the build

This quick check from the trials page is worth repeating whenever the container is recreated:

ls -la /workspace/xla-dev/tt-xla
ls -la /workspace/xla-dev/tt-mlir
ls -la /workspace/tt-metal

If any of these are missing, fix the mount configuration first.
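
A plain ls succeeds on an empty directory, and Docker creates an empty directory when the host side of a bind mount does not exist. The sketch below therefore treats an empty directory as a failed mount. The function name verify_mounts is hypothetical.

```shell
# Hypothetical helper: fail if any given path is missing or is an empty directory.
verify_mounts() {
    local dir missing=0
    for dir in "$@"; do
        # ls -A lists all entries except . and .. so empty output means an empty directory.
        if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "MISSING or empty: $dir"
            missing=1
        fi
    done
    return "$missing"
}

# Example, inside the container:
#   verify_mounts /workspace/xla-dev/tt-xla /workspace/xla-dev/tt-mlir /workspace/tt-metal
```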

core_assignment.cpp unused-variable build failure

This is a known BOS issue from the local Confluence snapshot. If it appears in your environment, patch the affected variables with [[maybe_unused]] and rebuild. Keep this as a targeted fix; do not apply it pre-emptively unless your build actually hits that compiler error.