Auto-generate JSON Schema for Pyrefly config using schemars (#2448 Task 2) #2473

Open
Prathamesh-tech-eng wants to merge 10 commits into facebook:main from Prathamesh-tech-eng:autogenerate-schema-schemars

@Prathamesh-tech-eng
Contributor

Summary

Implements Task 2 from #2448: auto-generate the Pyrefly JSON Schema from Rust config types using the schemars crate, replacing the need to manually maintain schemas/pyrefly.json.

Changes

  • Add schemars = "0.8" as an optional dependency behind a jsonschema feature flag to 4 crates: pyrefly_util, pyrefly_python, pyrefly_build, pyrefly_config
  • Add JsonSchema implementations for all config-related types:
    • Derive-based: Tool, Severity, ErrorKind, UntypedDefBehavior, RecursionOverflowHandler, BxlArgs, BuildSystemArgs, BuildSystem
    • Manual impls (for types with custom serde or external types without JsonSchema): Glob, Globs, PythonVersion, PythonPlatform, CustomQueryArgs, ErrorDisplayConfig, ModuleWildcard, ConfigOrigin, ExtraConfigs, ConfigBase, PythonEnvironment, Interpreters, SubConfig, ConfigFile
  • Add generate_schema binary in pyrefly_config that outputs Draft-07 JSON Schema to stdout
  • All schema code is gated with #[cfg(feature = "jsonschema")], so it has zero impact on normal builds
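
The derive-based and manual approaches listed above, together with the feature gating, can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the `Severity` variants and the `Glob` newtype are hypothetical stand-ins, and `schema_support_enabled` is a helper invented for the example. The `schemars` paths follow the 0.8 API as I understand it. Because every `schemars` reference sits behind the `jsonschema` feature, the snippet compiles without the crate when the feature is off, which is the "zero impact" property the description claims.

```rust
// Hypothetical stand-ins for two Pyrefly config types; the real
// definitions in the PR carry serde attributes omitted here.

/// Derive-based: the JsonSchema derive is attached only when the
/// `jsonschema` feature is enabled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[cfg_attr(feature = "jsonschema", derive(schemars::JsonSchema))]
pub enum Severity {
    Ignore,
    Info,
    Warn,
    Error,
}

/// A newtype assumed to serialize as a plain string, so it gets a
/// manual impl instead of a derive.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Glob(pub String);

#[cfg(feature = "jsonschema")]
impl schemars::JsonSchema for Glob {
    fn schema_name() -> String {
        "Glob".to_owned()
    }

    // `r#gen` is the schemars 0.8 `gen` module, written as a raw
    // identifier because `gen` is a reserved word in the 2024 edition.
    fn json_schema(
        generator: &mut schemars::r#gen::SchemaGenerator,
    ) -> schemars::schema::Schema {
        // Describe the type by its serialized form: a JSON string.
        <String as schemars::JsonSchema>::json_schema(generator)
    }
}

/// Reports whether schema support was compiled into this build.
pub fn schema_support_enabled() -> bool {
    cfg!(feature = "jsonschema")
}

fn main() {
    let glob = Glob("**/*.py".to_owned());
    println!("{:?} {:?}", Severity::Warn, glob);
    println!("schema support compiled in: {}", schema_support_enabled());
}
```

Using `cfg_attr` for derives and `cfg` for whole impl blocks keeps the type definitions themselves identical in both build configurations; only the schema-related items appear or disappear.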

Usage

cargo run -p pyrefly_config --features jsonschema --bin generate_schema > schemas/pyrefly.json

- Extract configBase into reusable  subschema
- Flatten configBase properties into top-level schema
- Reuse configBase in sub-config via
- Fix build-system: move repo_root to all types, add not conditions
- Simplify pyproject-tool-pyrefly.json to reference pyrefly.json
- All tests passing
…ow additionalProperties, make python-platform non-exhaustive
- errors: accept both boolean and severity strings (ignore/info/warn/error)
- Add missing configBase fields: tensor-shapes, recursion-depth-limit, recursion-overflow-handler
- Add 'ty' to enabled-ignores enum
- Fix project-includes default to include **/*.ipynb
- Add baseline field (top-level)
- Remove non-existent repo_root from build-system
- Add ignore-if-build-system-missing and search-path-prefix to build-system
- Fix pyproject-tool-pyrefly.json  to use relative path
- Update test files to exercise new fields and severity strings
…#2448 Task 2)

Add schemars 0.8 as an optional dependency behind a `jsonschema` feature
flag to pyrefly_util, pyrefly_python, pyrefly_build, and pyrefly_config.

Add JsonSchema implementations for all config-related types:
- Derive-based: Tool, Severity, ErrorKind, UntypedDefBehavior,
  RecursionOverflowHandler, BxlArgs, BuildSystemArgs, BuildSystem
- Manual impls (custom serde / external types): Glob, Globs,
  PythonVersion, PythonPlatform, CustomQueryArgs, ErrorDisplayConfig,
  ModuleWildcard, ConfigOrigin<T>, ExtraConfigs, ConfigBase,
  PythonEnvironment, Interpreters, SubConfig, ConfigFile

Add generate_schema binary that outputs Draft-07 JSON Schema to stdout:
  cargo run -p pyrefly_config --features jsonschema --bin generate_schema

All schema code is gated with #[cfg(feature = "jsonschema")] for zero
impact on normal builds.
@meta-cla meta-cla bot added the cla signed label Feb 20, 2026
@Prathamesh-tech-eng
Contributor Author

@connernilsen This implements Task 2 from #2448. I have a few questions:

  1. Should the generated schema replace the existing schemas/pyrefly.json directly, or should they coexist during a transition period?
  2. Would you like me to add a CI step that runs the generator and checks the output matches the checked-in schema (to catch drift)?
  3. Cargo.toml files: I noticed the # @generated by autocargo comment. Will Meta's internal tooling need to be updated separately for these dependency additions, or is my change sufficient?

@connernilsen
Contributor

connernilsen commented Mar 9, 2026

@Prathamesh-tech-eng sorry for the delay.

  1. Should the generated schema replace the existing schemas/pyrefly.json directly, or should they coexist during a transition period?

I think it would be alright to replace them directly. Is there a specific reason you'd want them to coexist?

  2. Would you like me to add a CI step that runs the generator and checks the output matches the checked-in schema (to catch drift)?

Yes! This would be amazing!
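
One possible shape for the comparison at the heart of such a drift check is sketched below. It is only an illustration: `normalize` and `check_schema_drift` are hypothetical names, and the inputs here are in-memory strings. A real CI step would instead capture the stdout of the `generate_schema` command from the Usage section and read `schemas/pyrefly.json` from disk before comparing them.

```rust
/// Normalizes line endings and trailing whitespace so that purely
/// cosmetic differences do not count as drift.
fn normalize(text: &str) -> String {
    text.replace("\r\n", "\n").trim_end().to_owned()
}

/// Returns Ok(()) when the generated schema matches the checked-in one,
/// or an error naming the first differing line (1-based).
fn check_schema_drift(generated: &str, checked_in: &str) -> Result<(), String> {
    let generated = normalize(generated);
    let checked_in = normalize(checked_in);
    if generated == checked_in {
        return Ok(());
    }
    // Find the first line pair that differs. If one text is a prefix of
    // the other, the difference starts right after the shorter one.
    let shorter = generated.lines().count().min(checked_in.lines().count());
    let line = generated
        .lines()
        .zip(checked_in.lines())
        .position(|(a, b)| a != b)
        .map_or(shorter + 1, |idx| idx + 1);
    Err(format!(
        "schema drift: first difference at line {}; regenerate schemas/pyrefly.json",
        line
    ))
}

fn main() {
    // Stand-in contents; a CI job would load these from the generator's
    // stdout and from the checked-in file.
    let checked_in = "{\n  \"title\": \"ConfigFile\"\n}\n";
    let generated = "{\n  \"title\": \"ConfigFile\"\n}";
    match check_schema_drift(generated, checked_in) {
        Ok(()) => println!("schema is up to date"),
        Err(msg) => {
            eprintln!("{}", msg);
            std::process::exit(1);
        }
    }
}
```

Exiting with a non-zero status on mismatch is what would make the CI job fail; the error message points contributors at the file to regenerate.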

  3. Cargo.toml files: I noticed the # @generated by autocargo comment. Will Meta's internal tooling need to be updated separately for these dependency additions, or is my change sufficient?

The @generated should be fine to leave in there. Our internal tooling handles that comment nicely right now :)

Thanks for working on this, and sorry again for the delay.
