dank-data is AgentDank's public dataset repository. We take public cannabis datasets, openly clean them, and publish them for public benefit.
This is one of the data sources for AgentDank. You can connect to it yourself using the AgentDank MCP server, dank-mcp.
The data is sourced and cleaned using dank-extract, our CLI tool for fetching, cleaning, and exporting cannabis data.
Currently the following datasets are snapshotted:
- US CT Medical Marijuana and Adult Use Cannabis Brand Registry
- US CT Cannabis Credentials
- US CT Cannabis Applications
- US CT Cannabis Weekly Sales
- US CT Cannabis Tax Revenue
Data snapshots are stored in snapshots/. CSV and JSON files are stored uncompressed for efficient Git delta compression. Only DuckDB files are compressed with ZStandard (.zst).
Snapshots are updated weekly at 4:20 PM Pacific via GitHub Actions. Git history provides access to previous snapshots.
### Current Data

```bash
# Download the current DuckDB snapshot and decompress it
curl -LO "https://github.com/AgentDank/dank-data/raw/main/snapshots/us/ct/dank-data.duckdb.zst"
zstd -d dank-data.duckdb.zst

# Or get CSV/JSON directly (no decompression needed)
curl -LO "https://github.com/AgentDank/dank-data/raw/main/snapshots/us/ct/us_ct_brands.csv"
curl -LO "https://github.com/AgentDank/dank-data/raw/main/snapshots/us/ct/us_ct_brands.json"
```
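Once downloaded, the CSV snapshots can be read with standard tooling. A minimal Python sketch using only the standard library; note the column names below are hypothetical illustrations, not the actual schema — check the real file's header row first:

```python
import csv
import io

# Stand-in for open("us_ct_brands.csv"); the columns here are
# ASSUMED for illustration and may not match the real snapshot.
sample = """brand_name,product_type,thc_pct
Example Brand,flower,22.5
Another Brand,edible,5.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["brand_name"], row["thc_pct"])
```

For the DuckDB snapshot, any DuckDB client (CLI, Python, etc.) can open the decompressed `dank-data.duckdb` file directly.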
### Historical Data
Git history is your time machine:
```bash
git clone https://github.com/AgentDank/dank-data.git
cd dank-data
# View snapshot history
git log --oneline -- snapshots/
# Get data from a specific date
git checkout $(git rev-list -n1 --before="2026-01-01" main) -- snapshots/
```
Because the upstream datasets are not perfect, we have to "clean" them. Such cleaning is opinionated, but since this is Open Source, you can inspect exactly how we do it by examining dank-extract.

Generally, we simply remove weird characters. We treat detected "trace" amounts as 0 -- and even then, there's a judgment call about which field entries actually mean "trace".

We also remove rows with implausible data -- as much as I'd love a 90,385% THC product, I don't think it really exists. In that case, the label picture showed it was a misplaced decimal point, but not every error can be validated against a picture. It's also a pain -- but maybe a multi-modal vision LLM can help with that?
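The cleaning rules above can be sketched roughly as follows. This is a minimal illustration, not dank-extract's actual code: the "trace" markers and the 100% cutoff are assumptions for the sketch.

```python
import re
from typing import Optional

# ASSUMED trace markers -- dank-extract's actual list may differ.
TRACE_VALUES = {"trace", "Trace", "ND"}

def clean_thc(raw: str) -> Optional[float]:
    """Normalize a raw THC-percentage field; None means drop the row."""
    value = raw.strip()
    if value in TRACE_VALUES:
        return 0.0  # treat detected "trace" amounts as 0
    # Strip weird characters (stray '%', commas, other symbols).
    value = re.sub(r"[^0-9.]", "", value)
    if not value:
        return None
    pct = float(value)
    if pct > 100.0:  # a 90,385% THC product doesn't really exist
        return None
    return pct
```

Each rule in the sketch mirrors a decision described above: trace handling, character stripping, and dropping implausible rows.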
As AgentDank curates these datasets, pull requests are generally not welcome here. If you want to contribute, please open an issue instead.
Either way, obey our Code of Conduct.
Copyright (c) 2026 Neomantra Corp. Authored by Evan Wies for AgentDank.
Released under the MIT License, see LICENSE.txt.
Made with 🌿 and 🔥 by the team behind AgentDank.