GitHub - Thingi10K/Thingi10K: Thingi10K Dataset

Thingi10K Dataset

Thingi10K is a large-scale 3D dataset created to study the variety, complexity and quality of real-world 3D printing models. We analyzed every mesh of all things featured on Thingiverse.com between Sept. 16, 2009 and Nov. 15, 2015. In this repository, we share our findings with you.

In a nutshell, Thingi10K contains...

10,000 models
4,892 tags
2,011 things
1,083 designers
72 categories
10 open source licenses
7+ years span
99.6% .stl files
50% non-solid
45% with self-intersections
31% with coplanar self-intersections
26% with multiple components
22% non-manifold
16% with degenerate faces
14% non-PWN
11% topologically open
10% non-oriented

Thingi10K is curated by Qingnan Zhou and Alec Jacobson.

Raw dataset

You can download the raw dataset from one of the following mirrors: NYU Box, Hugging Face, Google Drive.

One can also obtain the dataset via the thingi10k Python package. It contains both geometric and contextual data extracted from the raw dataset, and provides a convenient API to access and filter the dataset.

Usage

In addition to the raw dataset, we provide a Python package thingi10k to facilitate easy access to the dataset. The package provides functions to download, filter, and load the dataset.

Installation

pip install thingi10k

Simple usage

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "thingi10k",
# ]
# ///

import thingi10k

thingi10k.init() # Download the dataset and update cache

# Loop through all entries in the dataset
for entry in thingi10k.dataset():
    file_id = entry['file_id']
    author = entry['author']
    license = entry['license']
    vertices, facets = thingi10k.load_file(entry['file_path'])
    # Do something with the vertices and facets

help(thingi10k) # for more information

Filtering the dataset

The thingi10k.dataset() function provides a convenient way to filter the dataset based on various geometric and contextual criteria. The function returns an iterator over the filtered entries. For numeric filters, you can specify ranges using tuples where (min, max) sets both bounds, (None, max) sets only an upper bound, and (min, None) sets only a lower bound. The following are some examples of filtering the dataset:

The example below demonstrates how to iterate over models in the Thingi10K dataset that are closed and have at most 100 vertices.

for entry in thingi10k.dataset(num_vertices=(None, 100), closed=True):
    vertices, facets = thingi10k.load_file(entry['file_path'])

The following example shows how to filter and iterate over models that are licensed under Creative Commons.

for entry in thingi10k.dataset(license='creative commons'):
    vertices, facets = thingi10k.load_file(entry['file_path'])

This example illustrates how to iterate over models that are solid, consist of a single component, and have no self-intersections.

for entry in thingi10k.dataset(num_components=1, self_intersecting=False, solid=True):
    vertices, facets = thingi10k.load_file(entry['file_path'])

Please see help(thingi10k.dataset) for all available filtering options.

Semantic search using CLIP

Thingi10K supports semantic search using open-clip models, allowing one to find 3D models using natural language queries. Please note this is a beta feature, and the results may not be perfect.

Installation

To use semantic search, you need to install the optional CLIP dependencies:

pip install 'thingi10k[clip]'

Quotes are used to avoid shell expansion of the square brackets. This installs the required open_clip and its dependencies.

Basic usage

The semantic search functionality is integrated with the main dataset() function through the query parameter:

import thingi10k

thingi10k.init()

# Find models that look like cute monsters
for entry in thingi10k.dataset(query="A cute monster"):
    vertices, facets = thingi10k.load_file(entry['file_path'])

Note that semantic query can be combined with any other filtering options.

Dataset variants

Thingi10K provides three variants of the dataset: npz, raw, and tetwild.

npz variant contains the geometry (vertex and facet arrays) in NumPy arrays. It is faster to download and no mesh parsing is necessary.
raw variant contains the raw mesh files (STL, OBJ, etc.) in their original format. It is slower to download and requires parsing to extract geometric data.
tetwild variant contains the surface meshes remeshed using TetWild. These are high-quality triangle meshes and stored in NumPy arrays.

By default, thingi10k.init() will download the npz variant. To download the raw variant:

thingi10k.init(variant='raw')

To download the tetwild variant:

thingi10k.init(variant='tetwild')

Caching the dataset

By default, thingi10k.init() will cache the dataset in a local directory. Any subsequent calls to thingi10k.init() will use the cached dataset and incur no additional download cost. The cache directory can be explicitly specified by the user:

thingi10k.init(cache_dir="path/to/.thingi10k")

To force a re-download of the dataset:

thingi10k.init(force_redownload=True)

License

The source code for organizing and filtering the Thingi10K dataset is licensed under the Apache License, Version 2.0. Each "thing" in the dataset has its own license. Please refer to the license field associated with each entry in the dataset.

Errata

The following models are known to be "corrupt." However, we decided to still include them in our dataset in order to faithfully reflect mesh qualities on Thingiverse.

Model 49911 is truncated (ASCII STL).
Model 74463 is empty.
Model 286163 is empty.
Model 81313 contains NURBS curves and surfaces instead of polygonal faces, which may not be supported by many OBJ parsers.
Model 77942 is corrupt (binary STL).

Acknowledgements

This project is funded in part by NSF grants CMMI-11-29917, IIS-14-09286, and IIS-17257.

We thank Marcel Campen, Chelsea Tymms, and Julian Panetta for early feedback and proofreading. We also thank Neil Dickson for pointing out corrupt models, and Nick Sharp for pointing out bugs in the download script. Lastly, we thank Silvia Sellán and Yun-Chun Chen for discussions and suggestions on hosting the dataset.

Cite us

@article{Thingi10K,
  title={Thingi10K: A Dataset of 10,000 3D-Printing Models},
  author={Zhou, Qingnan and Jacobson, Alec},
  journal={arXiv preprint arXiv:1605.04797},
  year={2016}
}

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.github/workflows		.github/workflows
scripts		scripts
src/thingi10k		src/thingi10k
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Thingi10K Dataset

Raw dataset

Usage

Installation

Simple usage

Filtering the dataset

Semantic search using CLIP

Installation

Basic usage

Dataset variants

Caching the dataset

License

Errata

Acknowledgements

Cite us

About

Uh oh!

Releases

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Thingi10K Dataset

Raw dataset

Usage

Installation

Simple usage

Filtering the dataset

Semantic search using CLIP

Installation

Basic usage

Dataset variants

Caching the dataset

License

Errata

Acknowledgements

Cite us

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Uh oh!

Contributors

Uh oh!

Languages