Easily create high quality dataset descriptions – with a little help from ✨ AI.
# Clone the repository
git clone https://github.com/statistikZH/ogd_ai-metafairy.git
cd ogd_ai-metafairy
# Install uv and dependencies
pip3 install uv
uv venv
source .venv/bin/activate
uv sync
- Create an .env file and add your OpenAI API key like so:
OPENAI_API_KEY=sk-...
- Change into the app directory:
cd _streamlit_app/
- Start the app:
streamlit run metafairy.py
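If the app fails to start because the key is missing, a quick pre-flight check can help. The following is a minimal sketch using only the Python standard library; it is not part of the repository, and the helper names `parse_env` and `has_api_key` are hypothetical. It assumes the .env file uses plain KEY=VALUE lines as shown above.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_api_key(env_path: str = ".env", name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the file exists and defines a non-empty value for `name`."""
    path = Path(env_path)
    if not path.is_file():
        return False
    return bool(parse_env(path.read_text(encoding="utf-8")).get(name))


if __name__ == "__main__":
    print("API key found" if has_api_key() else "OPENAI_API_KEY missing from .env")
```

Run it from the directory that contains your .env file before starting the app.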
This app simplifies the creation of meaningful, complete, and well-written dataset descriptions.
- Analyze: Copy an existing description into the input window and click «Beschreibung analysieren».
- Create: Enter keywords and basic information about your dataset and click «Beschreibung generieren».
We offer this tool to our data publishers and stewards, and believe it can be helpful for others as well.
The app structures the analysis and the drafts along these four key points:
- Data Content (Dateninhalt) - What is the data about? What can be found in this data?
- Context of Creation (Entstehungszusammenhang) - How were the data measured and for what purpose? What is the source?
- Data Quality (Datenqualität) - Are the data complete? Are there any changes in the collection? What conclusions can and cannot be drawn from the data?
- Spatial Reference (Räumlicher Bezug) - How are the data spatially collected and aggregated? In which area are the data points located?
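The four key points above also work as a simple checklist outside the app. The sketch below is an illustration only, not the app's actual implementation: the `DescriptionDraft` class and `missing_points` function are hypothetical names, and the example texts are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class DescriptionDraft:
    """Hypothetical container for the four key points of a dataset description."""

    data_content: str = ""         # Dateninhalt
    context_of_creation: str = ""  # Entstehungszusammenhang
    data_quality: str = ""         # Datenqualität
    spatial_reference: str = ""    # Räumlicher Bezug


def missing_points(draft: DescriptionDraft) -> list[str]:
    """Return the names of the key points that are still empty."""
    return [f.name for f in fields(draft) if not getattr(draft, f.name).strip()]


# Invented example: two of the four points are filled in.
draft = DescriptionDraft(
    data_content="Annual population counts per municipality.",
    data_quality="Complete from 2010 onwards.",
)
print(missing_points(draft))  # ['context_of_creation', 'spatial_reference']
```

A check like this only tells you which sections are empty; whether a section is meaningful still needs a human reader.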
Important
Using the LLM-based analysis sends your data to third-party model providers through OpenRouter, which brokers requests to multiple LLM services. Do not submit sensitive or confidential data.
Important
LLMs make errors. This app provides suggestions only and yields a draft analysis that you should always double-check.
- Metafairy provides a valuable scaffold for writing good data descriptions. Data stewards use the structure more than the generated text itself.
- Generating descriptions is fun. 🤓
- Using AI to improve existing descriptions is more useful than generating new ones. We implemented this feature upon request.
Laure Stadler, Chantal Amrhein, Patrick Arnecke – Statistisches Amt Zürich: Team Data
Many thanks also go to Corinna Grobe and our former colleague Adrian Rupp.
We would love to hear from you. Please share your feedback and let us know how you use the code. You can write an email or share your ideas by opening an issue or pull request.
Please note that we use Ruff for linting and code formatting with default settings.
This software (the Software) incorporates models (Models) from Google and others and has been developed according to and with the intent to be used under Swiss law. Please be aware that the EU Artificial Intelligence Act (EU AI Act) may, under certain circumstances, be applicable to your use of the Software. You are solely responsible for ensuring that your use of the Software as well as of the underlying Models complies with all applicable local, national and international laws and regulations. By using this Software, you acknowledge and agree (a) that it is your responsibility to assess which laws and regulations, in particular regarding the use of AI technologies, are applicable to your intended use and to comply therewith, and (b) that you will hold us harmless from any action, claims, liability or loss in respect of your use of the Software.
