Jian Lin1, Chengze Li1*, Haoyun Qin2,3,4, Kwun Wang Chan1, Yanghua Jin3, Hanyuan Liu1, Stephen Chun Wang Choy1, Xueting Liu1
1Saint Francis University 2University of Pennsylvania 3Spellbrush 4Shitagaki Lab
*Corresponding author
Conditionally accepted to appear in ACM SIGGRAPH 2026 Conference Proceedings.
We introduce a framework that automates the transformation of static anime illustrations into manipulatable 2.5D models. Our approach decomposes a single image into fully inpainted, semantically distinct layers with inferred drawing orders — up to 23 layers including hair, face, eyes, clothing, accessories, and more.
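Conceptually, such a decomposition is an ordered stack of inpainted RGBA layers. A minimal sketch of that idea (all names hypothetical; this is not the repository's actual data model):

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """One semantic layer of the decomposition (hypothetical sketch)."""
    name: str        # e.g. "hair_front", "eyes", "jacket"
    rgba: bytes      # inpainted RGBA pixels, fully filled even where occluded
    draw_order: int  # inferred stacking position: higher draws on top

def composite_order(layers):
    """Return layer names back-to-front, following the inferred drawing order."""
    return [l.name for l in sorted(layers, key=lambda l: l.draw_order)]

stack = [
    Layer("face", b"", 1),
    Layer("hair_front", b"", 3),
    Layer("eyes", b"", 2),
]
print(composite_order(stack))  # back-to-front: ['face', 'eyes', 'hair_front']
```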
trailer.mp4 (project trailer video)
```bash
# 1. Create environment
conda create -n see_through python=3.12 -y
conda activate see_through

# 2. Install PyTorch (CUDA 12.8)
pip install torch==2.8.0+cu128 torchvision==0.23.0+cu128 torchaudio==2.8.0+cu128 \
    --index-url https://download.pytorch.org/whl/cu128

# 3. Install dependencies (includes common utilities and annotators)
pip install -r requirements.txt

# 4. Create assets symlink (you can also copy assets to the root if you prefer)
ln -sf common/assets assets
```

Optional annotator tiers (install as needed):
| Tier | Command | What it adds |
|---|---|---|
| Body parsing | `pip install --no-build-isolation -r requirements-inference-annotators.txt` | detectron2 for body attribute tagging |
| SAM2 | `pip install --no-build-isolation -r requirements-inference-sam2.txt` | SAM2 for language-guided segmentation |
| Instance seg | `pip install -r requirements-inference-mmdet.txt` | mmcv/mmdet for anime instance segmentation |
Note: Always run scripts from the repository root as the working directory.
| Script | Purpose |
|---|---|
| `inference/scripts/inference_psd.py` | Main pipeline — end-to-end layer decomposition → PSD output |
| `inference/scripts/syn_data.py` | Synthetic training data generation utilities |
For the other inference and data-parsing scripts, refer to the codebase and check the docstrings for details.
| Notebook | Description |
|---|---|
| `inference/demo/bodypartseg_sam.ipynb` | Interactive body part segmentation demo with visualization (19 parts) |
For the definition of the complete body tags, refer to `scrap_model.py`.
We have prepared a Hugging Face Space with ZeroGPU; if you register with Hugging Face, you should be able to run 1-2 PSD extractions per day (approximately 2-3 minutes each, at 1280 resolution).
(Copyright Tohoku Zunko Project).
`inference_psd.py` runs the full See-through pipeline: it applies the LayerDiff 3D model for transparent layer generation and the fine-tuned Marigold model for pseudo-depth inference, then stratifies the character into up to 23 semantic layers and exports a layered PSD file. Note that the head and body are separated in two consecutive stages, which may take longer than the original model described in the paper.
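One plausible way to turn per-layer pseudo-depth into a stacking order is to sort layers far-to-near, so nearer layers draw later (on top). This is a simplified sketch under an assumed depth convention, not the paper's actual stratification algorithm:

```python
def drawing_order_from_depth(layer_depths):
    """Sort layer names far-to-near so nearer layers are drawn on top.

    layer_depths: dict mapping layer name -> mean pseudo-depth, where a
    larger value means farther from the camera (an assumed convention).
    """
    return sorted(layer_depths, key=layer_depths.get, reverse=True)

depths = {"hair_front": 0.2, "face": 0.5, "hair_back": 0.9}
print(drawing_order_from_depth(depths))  # ['hair_back', 'face', 'hair_front']
```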
```bash
# Decompose a single image into a layered PSD
python inference/scripts/inference_psd.py \
    --srcp assets/test_image.png \
    --save_to_psd

# Process a directory of images
python inference/scripts/inference_psd.py \
    --srcp path/to/image_folder/ \
    --save_to_psd
```

Output is saved to `workspace/layerdiff_output/` by default. Each result includes:
- A layered `.psd` file with semantically separated layers
- Intermediate depth maps and segmentation masks
Note: This uses our most recent model with 23-layer body part separation (V3).
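If you want a quick, dependency-free sanity check that the export is a valid PSD, the 26-byte file header can be read with the standard library (a sketch; the exact output filename depends on your input):

```python
import struct

def check_psd_header(path):
    """Read the 26-byte PSD file header and return basic info.

    PSD files start with the '8BPS' signature, a 2-byte version (1 for PSD),
    6 reserved bytes, then channel count, height, width, depth, and color mode.
    """
    with open(path, "rb") as f:
        header = f.read(26)
    sig, version, _reserved, channels, height, width, depth, _mode = struct.unpack(
        ">4sH6sHIIHH", header
    )
    if sig != b"8BPS":
        raise ValueError("not a PSD file")
    return {"version": version, "channels": channels,
            "width": width, "height": height, "depth": depth}

# Example (assumes a previous run produced this file; adjust the name):
# info = check_psd_header("workspace/layerdiff_output/test_image.psd")
```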
Once you have finished the layer splitting, you can further process the PSD with `inference/scripts/heuristic_partseg.py` for depth-based or left-right stratification.
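As an illustration of the left-right idea, a layer's mask can be partitioned at its horizontal centroid. This is a simplified stdlib sketch of the concept, not the script's actual heuristic:

```python
def split_left_right(mask):
    """Split a binary mask (list of rows) at its centroid column.

    Returns (left_mask, right_mask), each with the same shape as the input.
    Mirrors the seg_wlr idea in spirit only (hypothetical simplification).
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return mask, mask
    cx = sum(c for _, c in coords) / len(coords)  # centroid column
    left = [[v if c <= cx else 0 for c, v in enumerate(row)] for row in mask]
    right = [[v if c > cx else 0 for c, v in enumerate(row)] for row in mask]
    return left, right

mask = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
left, right = split_left_right(mask)
print(left)   # [[1, 1, 0, 0], [1, 1, 0, 0]]
print(right)  # [[0, 0, 1, 1], [0, 0, 1, 1]]
```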
```bash
# Split based on depth
python inference/scripts/heuristic_partseg.py seg_wdepth \
    --srcp workspace/test_samples_output/PV_0047_A0020.psd \
    --target_tags handwear

# Left-right split
python inference/scripts/heuristic_partseg.py seg_wlr \
    --srcp workspace/test_samples_output/PV_0047_A0020_wdepth.psd \
    --target_tags handwear-1
```

The default pipeline runs at bf16 precision and requires approximately 12-16 GB of VRAM at 1280 resolution.
12 GB GPUs: Enable group offload to reduce peak VRAM to ~10 GB at 1280 resolution:

```bash
python inference/scripts/inference_psd.py \
    --srcp assets/test_image.png \
    --save_to_psd \
    --group_offload
```

8 GB GPUs: Use the NF4 quantized pipeline, which uses 4-bit quantized model weights. This achieves ~8 GB peak VRAM at 1280 resolution, and VRAM can be reduced further by lowering the resolution together with group offload:
```bash
# Install bitsandbytes (one-time)
pip install -r requirements-inference-bnb.txt

# Run with NF4 quantization (default: group_offload on, depth resolution 720)
python inference/scripts/inference_psd_quantized.py \
    --srcp assets/test_image.png \
    --save_to_psd

# For even lower VRAM, reduce layerdiff resolution to 1024
python inference/scripts/inference_psd_quantized.py \
    --srcp assets/test_image.png \
    --save_to_psd \
    --resolution 1024
```

The quantized models are hosted on HuggingFace and downloaded automatically on first run. Quality is close to the full-precision model (PSNR ~30 dB, SSIM ~0.96 vs the bf16 baseline).
Note: Group offload trades speed for VRAM savings (roughly 1.5x slower). NF4 quantization has minimal speed overhead but reduces model weight memory.
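The weight-memory saving from NF4 is easy to estimate, since NF4 stores roughly 4 bits per parameter versus bf16's 16 (illustrative arithmetic with a hypothetical parameter count; real peak VRAM also includes activations, caches, and quantization scales):

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate weight memory in GiB for a model with n_params parameters."""
    return n_params * bits_per_param / 8 / 1024**3

n = 2_000_000_000  # hypothetical parameter count, not the actual model size
bf16 = weight_memory_gib(n, 16)
nf4 = weight_memory_gib(n, 4)  # NF4: 4 bits per weight, a 4x reduction
print(f"bf16: {bf16:.2f} GiB, nf4: {nf4:.2f} GiB")  # bf16: 3.73 GiB, nf4: 0.93 GiB
```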
We provide a separate repository for preparing the dataset used to train the Live2D parsing model. Please refer to CubismPartExtr for how to download the sample model files and set up your workspace folder.
After that, refer to `README_datapipeline.md` for instructions on running the data-parsing scripts to prepare the dataset for inspection and training.
Once you have prepared your data, you can proceed to the user interfaces. Refer to the UI Readme for instructions on launching the UI.
We currently require the `workspace/datasets/` folder to be located at the repository root to launch the UI, as it contains the sample data for demonstration. We will work on making this more flexible in the future.
We recommend installing the `mmdet` tier dependencies to ensure the UI can launch successfully.
We have our training scripts ready, but we are still working on the documentation. We will release them no later than 2026/04/12. Please stay tuned!
We welcome community contributions and third-party integrations!
If you build tools, extensions, or workflows on top of this project, please let us know by opening an issue or pull request — we would be happy to feature your work here.
- ComfyUI-See-through by @jtydhr88 — Integration for ComfyUI, with node-based workflow and in-browser PSD export. Thank you for the amazing work!
We are also looking for i18n (translation) help for this project; your contributions would be highly appreciated.
We don't think so — at least, not yet.
While we produce 2.5D layer decompositions from a single image, the full Image-to-Live2D pipeline requires significantly more:
- Finer artistic decomposition. Live2D models demand layers designed with specific deformation behaviors in mind. Our automatic decomposition prioritizes semantic correctness, but a Live2D artist would make different artistic choices about how to split layers for natural-looking motion.
- Rigging. After decomposition, a Live2D model needs a deformation mesh, physics parameters, and motion curves — this rigging process is arguably the most critical (and labor-intensive) step, and it is not covered in this project.
- Artistic intent. Professional Live2D works are crafted holistically: the layer structure, inpainting style, and rigging are designed together. Automating one step in isolation cannot replicate this.
That said, we believe our decomposition can serve as a useful starting point for Live2D artists by eliminating some of the most tedious parts of the workflow, such as manual segmentation and occluded-region inpainting.
2026-04-02
- Multiple memory optimizations; added suggestions for low-VRAM users (group offload, NF4 quantization).
This work is funded and substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. UGC/FDS11/E02/23).
We would like to thank the following people for their help and support:
- Dingkun Yan and Xinrui Wang for their inspiration and support on the project.
- USTC Student ACG Club "LEO" for kindly providing the sample Live2D model files used for the demonstrations in the paper.
This is an open-source research project. We thank the authors of the following projects that made this work possible:
- LayerDiffuse — Transparent image layer diffusion (Lvmin Zhang is always a legend)
- Marigold — Diffusion-based monocular depth estimation
- Segment Anything (SAM) — Foundation model for segmentation
- Grounding DINO — Open-set object detection
- LaMa — Large mask inpainting
- AnimeInstanceSegmentation — Anime-specific instance segmentation
If you find this work useful, please cite:
```bibtex
@article{lin2026seethrough,
  title={See-through: Single-image Layer Decomposition for Anime Characters},
  author={Lin, Jian and Li, Chengze and Qin, Haoyun and Chan, Kwun Wang and Jin, Yanghua and Liu, Hanyuan and Choy, Stephen Chun Wang and Liu, Xueting},
  journal={arXiv preprint arXiv:2602.03749},
  year={2026}
}
```