FastOFT
OFT (Orthographic Feature Transform) Model
OFT is a 3D object detection model that uses orthographic feature transforms to detect objects in 3D space. The model combines a ResNet-based frontend with specialized orthographic feature transformation layers and a topdown refinement network.
Model Architecture
The OFT model consists of several key components:
- Frontend: ResNet-18/34 backbone for feature extraction at multiple scales (8x, 16x, 32x downsampling)
- Lateral Layers: Convert ResNet outputs to a common 256-channel feature representation
- OFT Layers: Orthographic Feature Transform modules that project features into bird's-eye view
- Topdown Network: 8-layer refinement network using BasicBlock modules
- Detection Head: Final convolutional layer that outputs object scores, positions, dimensions, and angles
- Decoder: Additional module that decodes the encoded outputs into objects
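One ingredient of projecting image features into a bird's-eye view is relating a 3D location to a pixel through the camera calibration. The sketch below illustrates only that step with a pinhole model. It is not taken from this repository: the function name `project_to_image`, the 3x4 matrix layout, and the intrinsic values are assumptions chosen for illustration.

```python
def project_to_image(point_xyz, calib):
    """Project a 3D point in the camera frame to pixel coordinates.

    `calib` is assumed to be a 3x4 projection matrix given as nested lists.
    """
    x, y, z = point_xyz
    homogeneous = (x, y, z, 1.0)
    # Multiply the 3x4 matrix by the homogeneous point.
    u, v, w = (sum(row[i] * homogeneous[i] for i in range(4)) for row in calib)
    if w <= 0:
        raise ValueError("point is behind the camera or on the image plane")
    # Perspective divide gives pixel coordinates.
    return u / w, v / w


# Hypothetical intrinsics: focal length 700 px, principal point (600, 180).
example_calib = [
    [700.0, 0.0, 600.0, 0.0],
    [0.0, 700.0, 180.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

center_pixel = project_to_image((0.0, 0.0, 10.0), example_calib)
offset_pixel = project_to_image((1.0, 0.0, 10.0), example_calib)
```

A point on the optical axis lands on the principal point, and moving it 1 m sideways at 10 m depth shifts it by 70 px under these assumed intrinsics.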
The model outputs:
- Scores: Object detection confidence scores
- Position Offsets: 3D position predictions (x, y, z)
- Dimension Offsets: Object size predictions (width, height, length)
- Angle Offsets: Object orientation predictions (sin, cos components)
- Objects: Outputs decoded into a list of detected objects
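Because the orientation is predicted as sin and cos components, a single angle can be recovered with a two-argument arctangent. The following is a minimal sketch of that idea only; `decode_angle` is a hypothetical helper, and the repository's decoder may handle this step differently.

```python
import math


def decode_angle(sin_value, cos_value):
    """Recover an angle in radians, in (-pi, pi], from sin and cos components.

    atan2 does not require the pair to be normalized to unit length,
    so raw network outputs can be passed directly.
    """
    return math.atan2(sin_value, cos_value)


quarter_turn = decode_angle(1.0, 0.0)
half_turn = decode_angle(0.0, -1.0)
```

Predicting the two components instead of the raw angle avoids the discontinuity at the wrap-around point between -pi and pi.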
Project Structure
models/bos_model/oft/
├── demo/ # Demo scripts and visualization
├── reference/ # PyTorch reference implementation
├── resources/ # Test images and calibration files
├── tests/ # All tests together
│   └── pcc/ # Unit tests for individual components
└── tt/ # TenstorrentNN (TTNN) optimized implementation
Input Requirements: Both demos require:
- The environment variable CHECKPOINTS_PATH pointing to a pre-trained checkpoint file (e.g., export CHECKPOINTS_PATH="*your-path*/checkpoint-0600.pth")
- Input images in JPG format (located in resources/)
- Corresponding calibration files in TXT format (camera intrinsic parameters)
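Before launching either demo, it can help to confirm that the checkpoint variable is set and points at a real file. The sketch below is one way to do that check. The helper name `resolve_checkpoint_path` is hypothetical and not part of the demos; only the variable name CHECKPOINTS_PATH comes from the requirements above.

```python
import os
from pathlib import Path


def resolve_checkpoint_path(env_var="CHECKPOINTS_PATH"):
    """Return the checkpoint path from the environment, or raise a clear error."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; export it before running the demos")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")
    return path
```

Calling `resolve_checkpoint_path()` at the start of a script fails early with a readable message instead of a later error during weight loading.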
demo.py
Full end-to-end inference demo that runs both the PyTorch reference and the TTNN implementations, compares their outputs, and generates visualizations.
Features:
- Loads pre-trained model weights from checkpoint
- Processes input images with calibration data
- Runs full OFT inference pipeline on both CPU (PyTorch) and device (TTNN)
- Executes complete pipeline on TTNN: OFTNet model inference + object decoder/encoder
- Compares intermediate outputs and final predictions
- Generates detection visualizations and heatmaps
- Supports various precision modes (float32, bfloat16)
- Configurable fallback modes for debugging
Usage:
TT_METAL_CORE_GRID_OVERRIDE_TODEPRECATE="4,3" pytest models/bos_model/oft/demo/demo.py
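Comparing reference and device outputs is often done with a correlation score. The sketch below assumes that the pcc/ test directory name refers to the Pearson correlation coefficient, which this README does not state explicitly. The function name and the flattened-list inputs are illustrative and do not reflect the demo's actual comparison code.

```python
import math


def pearson_correlation(reference, candidate):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(reference) != len(candidate) or len(reference) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_ref = sum(reference) / len(reference)
    mean_cand = sum(candidate) / len(candidate)
    covariance = sum(
        (r - mean_ref) * (c - mean_cand) for r, c in zip(reference, candidate)
    )
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_cand = sum((c - mean_cand) ** 2 for c in candidate)
    if var_ref == 0 or var_cand == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return covariance / math.sqrt(var_ref * var_cand)
```

A value close to 1.0 indicates that two flattened outputs agree up to scale and offset, so a check of this kind is usually paired with an absolute-error check.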
host_demo.py
Host-only demo that compares float32 and bfloat16 precision using only the PyTorch reference implementation.
Features:
- Precision comparison between float32 and bfloat16
- Object detection visualization
- Performance and accuracy analysis
- No device execution required - pure CPU inference
- Useful for baseline validation
Usage:
pytest models/bos_model/oft/demo/host_demo.py
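To get a feel for how much precision bfloat16 gives up relative to float32, the conversion can be emulated with the standard library alone. The sketch below keeps the top 16 bits of the float32 encoding with round-to-nearest-even. It illustrates the number format only and is not code from host_demo.py; `to_bfloat16` is a hypothetical helper.

```python
import math
import struct


def to_bfloat16(value):
    """Round a Python float to the nearest bfloat16 value, returned as a float.

    Values outside the float32 range raise OverflowError from struct.pack.
    """
    if math.isnan(value) or math.isinf(value):
        return value
    # Reinterpret the float32 encoding as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Round to nearest, ties to even, then clear the low 16 bits.
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


approx_pi = to_bfloat16(3.14159)  # 3.140625
```

bfloat16 keeps the 8 exponent bits of float32 but only 7 explicit mantissa bits, so the range is preserved while values carry roughly two to three significant decimal digits.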