Thank you for considering contributing to the dataprep project! This document explains how to contribute. We welcome all forms of contributions, including code contributions, documentation improvements, bug reports, and feature suggestions.
- Gleam 1.15 or later
- Erlang/OTP 27 or later
- just (task runner)
- mise (recommended for managing Gleam and Erlang versions)
git clone https://github.com/nao1215/dataprep.git
cd dataprep
mise install # install Gleam and Erlang
gleam deps download
just ci
- The main branch is the latest stable version
- Create new branches from main for new features or bug fixes
- Branch naming examples:
  - feature/add-float-rules - New feature
  - fix/issue-123 - Bug fix
  - docs/update-readme - Documentation update
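The branch names above are given as examples rather than as a strict rule, so the following is only a sketch of how a contributor might check a name locally before pushing. The helper `is_conventional_branch` is hypothetical and not part of the repository, and it assumes that `feature/`, `fix/`, and `docs/` are the only accepted prefixes.

```shell
# Hypothetical helper, not part of the repository: succeeds when a branch
# name starts with one of the prefixes used in the examples above and has
# at least one character after the slash.
# Assumption: only feature/, fix/, and docs/ are accepted.
is_conventional_branch() {
  case "$1" in
    feature/?* | fix/?* | docs/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example check against one of the names listed above.
if is_conventional_branch "feature/add-float-rules"; then
  echo "ok"
fi
```

Once the name looks right, the branch itself can be created from an up-to-date main with `git switch -c feature/add-float-rules`.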
This project follows these standards:
- Follow the Gleam language guide
- Keep the public API surface small -- use pub opaque type where appropriate
- Pure functions first -- no actors or OTP in this library
- Keep functions as small as possible
- Add doc comments to all public functions and types
- Respect the two-phase design -- Prep transforms and Validator checks; the two do not mix
Tests are organized by module, mirroring the source structure.
- Test pure functions first, then combinators
- Short-circuit behavior must be verified with panic sentinels
- Boundary conditions matter -- test empty strings, whitespace, zero, negative values
just test # run all tests
just check # format check, typecheck, build, test
The following must not be added to this library:
- Domain-specific rules (email, URL, UUID, phone number)
- Parsing / decoding (String -> Int, JSON decoding)
- Schema abstraction or string-based DSLs
- Prep-Validator fusion (this would break the Validator invariant)
See doc/reference/DESIGN.md section 10 for the full rationale.
We actively encourage the use of AI coding assistants to improve productivity and code quality. Tools like Claude Code, GitHub Copilot, and Cursor are welcome for:
- Writing boilerplate code
- Generating comprehensive test cases
- Improving documentation
- Refactoring existing code
- Review all generated code: Always review and understand AI-generated code before committing
- Maintain consistency: Ensure AI-generated code follows our coding standards in CLAUDE.md
- Test thoroughly: AI-generated code must pass just ci
- Check or Create Issues
  - Check if there are existing issues
  - For major changes, discuss the approach in an issue first
- Write Tests
  - Always add tests for new features
  - For bug fixes, create tests that reproduce the bug
- Quality Check
  - Run just ci
- Create a Pull Request from your forked repository to the main repository
- PR title should briefly describe the changes
- Include the following in PR description:
  - Purpose and content of changes
  - Related issue number (if any)
  - Test method
GitHub Actions automatically checks the following items:
- Format check: gleam format --check
- Lint: gleam build --warnings-as-errors
- Build: gleam build
- Test: gleam test
Merging is not possible unless all checks pass.
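To catch failures before pushing, the same four commands can be run locally in the order listed. The wrapper below is a minimal sketch: the function name `run_ci_checks` is hypothetical, and the repository's own `just ci` recipe may run different or additional steps.

```shell
# Hypothetical wrapper, not part of the repository: runs the four commands
# listed above in order and stops at the first one that fails.
run_ci_checks() {
  gleam format --check &&
    gleam build --warnings-as-errors &&
    gleam build &&
    gleam test
}
```

Calling `run_ci_checks` from the project root returns a non-zero status as soon as one step fails, which mirrors the rule that merging requires every check to pass.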
When you find a bug, please create an issue with the following information:
- Environment Information
  - OS and version
  - Gleam version
  - Erlang/OTP version
  - dataprep version
- Reproduction Steps
  - Minimal code example to reproduce the bug
- Expected and Actual Behavior
- Error Messages or Stack Traces (if any)
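Most of the environment information requested above can be gathered with a short script. This is a sketch only: the helper names `tool_version` and `print_env_report` are hypothetical, and the dataprep version is left as a manual step because it depends on how your own project declares the dependency.

```shell
# Hypothetical helper: print the first line of a tool's version output,
# or a placeholder when the tool is not installed.
tool_version() {
  name="$1"
  shift
  if command -v "$name" >/dev/null 2>&1; then
    "$name" "$@" 2>&1 | head -n 1
  else
    echo "not found"
  fi
}

# Hypothetical helper: print a report that can be pasted into an issue.
print_env_report() {
  os_info=$(uname -sr)
  gleam_info=$(tool_version gleam --version)
  # Ask the Erlang runtime for its OTP release number.
  otp_info=$(tool_version erl -noshell -eval \
    'io:format("~s~n", [erlang:system_info(otp_release)]), halt().')
  echo "OS: $os_info"
  echo "Gleam: $gleam_info"
  echo "Erlang/OTP: $otp_info"
  echo "dataprep: (fill in from your gleam.toml or manifest.toml)"
}

print_env_report
```

The report prints "not found" for any tool that is missing from the PATH, which is itself useful information in a bug report.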
The following activities are also greatly welcomed:
- Give a GitHub Star: Show your interest in the project
- Promote the Project: Introduce it in blogs, social media, study groups, etc.
- Become a GitHub Sponsor: Support available at https://github.com/sponsors/nao1215
- Documentation Improvements: Fix typos, improve clarity of explanations
- Feature Suggestions: Share new combinator ideas in issues
Contributions to this project are considered to be released under the project's license (MIT License).
Thank you again for considering contributing! We sincerely look forward to your participation.