TeHOR jointly reconstructs a 3D human and object from a single image, using text descriptions to capture their holistic and semantic interactions. It further aligns the reconstructions with the appearance of the human and object so that non-contact interactions (e.g., gazing or pointing) remain semantically plausible, beyond what contact-only reasoning allows.
Clone the repository with submodules:
git clone --recursive https://github.com/hygenie1228/TeHOR_RELEASE.git
cd TeHOR_RELEASE
If you already cloned the repository, initialize the submodules manually:
git submodule update --init --recursive
We recommend an Anaconda environment with Python 3.10, PyTorch 2.3.x, and CUDA 12.1. From the repository root:
conda create -n tehor python=3.10 -y
conda activate tehor
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install thirdparties/diff-gaussian-rasterization
pip install thirdparties/simple-knn
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
pip install thirdparties/multiperson/sdf
pip install -e engine/PoseAPI/third-party/ViTPose
For compatibility with the bundled third-party code, run:
bash scripts/_install/lhm_setup.sh
bash scripts/_install/install_sam.sh
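After installation, a quick sanity check of the interpreter and PyTorch build can catch version mismatches early. The snippet below is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper, and it only reports what it finds against the recommended Python 3.10 rather than enforcing anything.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 10)):
    """Report the Python and PyTorch setup (hypothetical helper, not shipped with TeHOR)."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_ok": sys.version_info[:2] == tuple(expected_python),
        "torch": None,           # filled in only if PyTorch is importable
        "cuda_build": None,      # CUDA version PyTorch was built against
        "cuda_available": None,  # whether a CUDA device is usable right now
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report
```

Calling `print(check_environment())` inside the `tehor` environment should show a 2.3.x PyTorch version and a 12.1 CUDA build if the steps above succeeded.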
Prepare the required files with the following layout:
data
|-- clip-vit-large-patch14
|-- examples
|-- pretrained_models
|-- segmentation
|-- smart-eraser
|-- stable-diffusion-2-1
|-- openai.env
exp
|-- ...
You can download the required model files easily by running:
bash scripts/_install/download.sh
- Download the pretrained SmartEraser model from SmartEraser, then place it under data/smart-eraser.
- Download the example files (data, exp) from Google Drive.
- Put your OpenAI API key in data/openai.env. If openai.env is missing, you can manually input the text prompts.
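Once the files are in place, the layout can be checked before running anything heavy. The following is a minimal sketch, not a script from the repository: `missing_entries` is a hypothetical helper, and the entry names are copied from the layout listed above. It tests only for existence, since whether each entry is a file or a directory is not stated here.

```python
from pathlib import Path

# Entries from the layout above. openai.env is treated as optional because
# the text prompts can be entered manually when it is missing.
REQUIRED = [
    "clip-vit-large-patch14",
    "examples",
    "pretrained_models",
    "segmentation",
    "smart-eraser",
    "stable-diffusion-2-1",
]
OPTIONAL = ["openai.env"]


def missing_entries(root="data", names=REQUIRED):
    """Return the names from `names` that do not exist under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).exists()]
```

Running `missing_entries()` from the repository root returns an empty list when every required entry exists; `missing_entries(names=OPTIONAL)` reports whether the API key file is absent.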
Preprocessing builds png/, processed/, human/, object/, and prompts.json under your experiment directory:
python scripts/preprocess.py --img_path {PATH/TO/IMAGE.jpg} --exp_dir {PATH/TO/EXP_DIR} --gpu 0
For example,
python scripts/preprocess.py --img_path data/examples/demo-1.png --exp_dir exp/demo-1 --gpu 0
For the official paper setup, align the object mesh to the estimated depth map from ZoeDepth using ICP (Iterative Closest Point).
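To illustrate the ICP step, here is a minimal point-to-point ICP sketch in NumPy. It is a generic textbook version, not the repository's implementation. It assumes both inputs are already N×3 point arrays (for instance, object mesh vertices as `src` and points back-projected from the depth map as `dst`), and all function names are hypothetical. It estimates a rigid transform only (no scale) and uses brute-force nearest neighbours, so it suits small point sets.

```python
import numpy as np


def nearest_neighbors(src, dst):
    """For each src point, return the index of and squared distance to its closest dst point."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, d2[np.arange(len(src)), idx]


def best_rigid_transform(src, dst):
    """Kabsch solution: rotation R and translation t minimizing ||R @ src_i + t - dst_i||^2."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, iters=30, tol=1e-10):
    """Iteratively match nearest neighbours and re-fit a rigid transform."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iters):
        idx, d2 = nearest_neighbors(cur, dst)
        err = d2.mean()
        if abs(prev_err - err) < tol:
            break  # mean squared error stopped improving
        prev_err = err
        R, t = best_rigid_transform(cur, dst[idx])
        cur = cur @ R.T + t
        # Compose with the transform accumulated so far.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, cur
```

The returned `R_total` and `t_total` map the original `src` points onto `dst` via `src @ R_total.T + t_total`. For large point clouds, a KD-tree search would be a more practical replacement for the brute-force matching.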
python scripts/run_tehor.py --exp-dir {PATH/TO/EXP_DIR} --gpu 0
For example,
python scripts/run_tehor.py --exp-dir exp/open3dhoi/teddy_bear-HICO_train2015_00005436 --gpu 0
We thank the authors of:
- TeCH for text‑guided human reconstruction ideas and related tooling in this line of work.
- LHM (and upstream LRM families) for strong single‑image human priors.
- InstantMesh for image‑to‑3D object assets.
- TRELLIS for image‑to‑3D object assets.
- 3D Gaussian Splatting ecosystem (diff-gaussian-rasterization, simple-knn).
- Segment Anything, Grounding DINO, and related segmenters bundled under engine/SegmentAPI/.
@inproceedings{nam2026tehor,
author = {Nam, Hyeongjin and Jung, Daniel Sungho and Lee, Kyoung Mu},
title = {{TeHOR}: Text-Guided 3D Human and Object Reconstruction with Textures},
booktitle = {CVPR},
year = {2026},
}

