Auto-generate JSON Schema for Pyrefly config using schemars (#2448 Task 2)#2473
Auto-generate JSON Schema for Pyrefly config using schemars (#2448 Task 2)#2473Prathamesh-tech-eng wants to merge 8 commits into
Conversation
|
@connernilsen This implements Task 2 from #2448. I had few doubts:
|
|
@Prathamesh-tech-eng sorry for the delay.
I think it would be alright to replace them directly. Is there any specific reason you're thinking of to have them coexist?
Yes! This would be amazing!
The Thanks for working on this, and sorry again for the delay |
connernilsen
left a comment
There was a problem hiding this comment.
Hey @Prathamesh-tech-eng, sorry for the delay in review. I have a few questions on how this works, since I haven't used this before, but I like the look of things so far.
The main question below is about whether we can take advantage of macro settings on fields, variants, and the overall structs instead of manually implementing the schemars::JsonSchema trait, since I'm worried we might forget to update it in the future.
Let me know if you have any questions or want to discuss stuff :)
| } | ||
|
|
||
| #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, Eq, Default, Hash)] | ||
| #[cfg_attr(feature = "jsonschema", derive(schemars::JsonSchema))] |
There was a problem hiding this comment.
Can we enable that as a feature everywhere in the pyrefly_build, pyrefly_config, and pyrefly_python crates?
Here's an example of how we did it for another feature in one of our crates
There was a problem hiding this comment.
Done! The jsonschema feature already propagates through all crates via Cargo.toml:
pyrefly_config → pyrefly_build/jsonschema, pyrefly_python/jsonschema, pyrefly_util/jsonschema
pyrefly_build → pyrefly_python/jsonschema, pyrefly_util/jsonschema
pyrefly_python → pyrefly_util/jsonschema
We use the same #[cfg(feature = "jsonschema")] pattern as in pyrefly_util/src/globs.rs (lines 331-353). Running cargo run -p pyrefly_config --features jsonschema enables the feature across all dependent crates.
There was a problem hiding this comment.
Oh I think I see what you're saying, but I also think I misread what is going on here. Can we remove the cfg_attr(feature = "jsonschema" part? If it's set as an enabled feature in the cargo file, do we also need to gate this here?
| pub repo_root: Option<PathBuf>, | ||
| } | ||
|
|
||
| #[cfg(feature = "jsonschema")] |
There was a problem hiding this comment.
What are all of the extra non-derived schemas adding?
For cases where we need custom behavior for fields, can we instead use #[schemars(schema_with = "some::function")] (override specific fields), #[schemars(transform = some::transform)] (mutate the schema after auto generating it), or other macros instead of implementing schemars::JsonSchema completely for some of the structs below?
There was a problem hiding this comment.
Refactored CustomQueryArgs to use schemars derive macros as suggested. The 60+ line manual implementation is now replaced with:
#[derive(Debug, Clone, ...)]
#[cfg_attr(feature = "jsonschema", derive(schemars::JsonSchema))]
pub struct CustomQueryArgs {
#[cfg_attr(feature = "jsonschema", schemars(schema_with = "vec1_string_schema"))]
pub command: Vec1,
// ...
}
A small helper function vec1_string_schema generates the array schema with minItems: 1. This is much more maintainable and the descriptions come from doc comments automatically.
There was a problem hiding this comment.
Thanks for fixing that schema generator! It looks much cleaner and will be a lot easier to maintain in the future.
Would you also be able to update the other custom schema generators to attempt to move as much logic as possible into those configuration macros?
b88cabf to
c1c5406
Compare
|
Thanks for the review! I've pushed the changes addressing all feedback. For the next step (Task 3 - SchemaStore submission), I have a few questions:
|
c1c5406 to
5b4d343
Compare
|
Awesome, thanks for the updates.
Yeah, I think waiting until we nail down the generation logic is the best approach. It'll help us reduce the likelihood of errors in any schema we push.
That might be a good idea. Probably something like
I think a lot of the recommendations here would be good to follow. Maybe some validation checks here. Other than that, I think once this is committed we should be good to start working on a submission to schemastore.
We should only match |
connernilsen
left a comment
There was a problem hiding this comment.
I love the direction this is going in. Thanks for working on it, and as always, let me know if you have questions
| } | ||
|
|
||
| #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, Eq, Default, Hash)] | ||
| #[cfg_attr(feature = "jsonschema", derive(schemars::JsonSchema))] |
There was a problem hiding this comment.
Oh I think I see what you're saying, but I also think I misread what is going on here. Can we remove the cfg_attr(feature = "jsonschema" part? If it's set as an enabled feature in the cargo file, do we also need to gate this here?
| pub repo_root: Option<PathBuf>, | ||
| } | ||
|
|
||
| #[cfg(feature = "jsonschema")] |
There was a problem hiding this comment.
Thanks for fixing that schema generator! It looks much cleaner and will be a lot easier to maintain in the future.
Would you also be able to update the other custom schema generators to attempt to move as much logic as possible into those configuration macros?
| pub(crate) struct ExtraConfigs(pub(crate) Table); | ||
|
|
||
| #[cfg(feature = "jsonschema")] | ||
| impl schemars::JsonSchema for ExtraConfigs { |
There was a problem hiding this comment.
Is it possible to get schemars to generate something that says this field shouldn't exist? It might just be that we have to mark it as something schemars should skip
There was a problem hiding this comment.
I considered using #[schemars(skip)] but kept the current manual implementation because it more accurately represents the actual behavior.
ExtraConfigs is a #[serde(flatten)] catch-all that captures any unknown fields via toml::Table. The current manual impl generates an object schema with additionalProperties: true, which correctly tells schema validators that extra properties are allowed.
Using #[schemars(skip)] would hide this field from the schema entirely, suggesting only explicitly defined properties are valid. This would cause false validation errors for users with custom/future config fields.
We could use #[schemars(skip)] and rely on the parent schema's additionalProperties: true setting. Let me know if you'd prefer that approach.
There was a problem hiding this comment.
Yeah, I think if that correctly shows the current behavior, that would be great. The less manual implementation we have here the better.
| @@ -1,266 +1,117 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
It looks like the changes to this file are getting rid of tests that validate schemars should fail when given invalid values. Can we add those back?
There was a problem hiding this comment.
Done! I've restored the full test suite with all 23 negative test cases from upstream/main.
Note: 3 tests are commented out with an explanation - they validate mutual exclusivity between build-system types (buck shouldn't have command, custom shouldn't have extras/isolation-dir). The auto-generated schema doesn't include additionalProperties: false in the oneOf variants, so these specific constraints can't be enforced. The comments document this known limitation.
There was a problem hiding this comment.
Oh good point. In that case, we can delete the extra tests instead of just commenting them out.
- Extract configBase into reusable subschema - Flatten configBase properties into top-level schema - Reuse configBase in sub-config via - Fix build-system: move repo_root to all types, add not conditions - Simplify pyproject-tool-pyrefly.json to reference pyrefly.json - All tests passing
…ow additionalProperties, make python-platform non-exhaustive
- errors: accept both boolean and severity strings (ignore/info/warn/error) - Add missing configBase fields: tensor-shapes, recursion-depth-limit, recursion-overflow-handler - Add 'ty' to enabled-ignores enum - Fix project-includes default to include **/*.ipynb - Add baseline field (top-level) - Remove non-existent repo_root from build-system - Add ignore-if-build-system-missing and search-path-prefix to build-system - Fix pyproject-tool-pyrefly.json to use relative path - Update test files to exercise new fields and severity strings
…#2448 Task 2) Add schemars 0.8 as an optional dependency behind a `jsonschema` feature flag to pyrefly_util, pyrefly_python, pyrefly_build, and pyrefly_config. Add JsonSchema implementations for all config-related types: - Derive-based: Tool, Severity, ErrorKind, UntypedDefBehavior, RecursionOverflowHandler, BxlArgs, BuildSystemArgs, BuildSystem - Manual impls (custom serde / external types): Glob, Globs, PythonVersion, PythonPlatform, CustomQueryArgs, ErrorDisplayConfig, ModuleWildcard, ConfigOrigin<T>, ExtraConfigs, ConfigBase, PythonEnvironment, Interpreters, SubConfig, ConfigFile Add generate_schema binary that outputs Draft-07 JSON Schema to stdout: cargo run -p pyrefly_config --features jsonschema --bin generate_schema All schema code is gated with #[cfg(feature = \"jsonschema\")] for zero impact on normal builds."
- Simplify .gitignore: Remove complex schemas/* pattern with exceptions. This makes it easier to add new schemas in the future without needing to modify .gitignore. - Add documentation for schema settings: Document option_nullable and option_add_null_type in generate_schema.rs explaining their purpose and why we set them to false. - Refactor CustomQueryArgs: Replace 60+ line manual JsonSchema impl with schemars derive macro using #[schemars(schema_with = "...")] for the Vec1<String> field. This is more maintainable and less error-prone. - Regenerate schema: Update pyrefly.json to reflect upstream changes from rebase.
5b4d343 to
3366229
Compare
… fix schema - Move vec1_string_schema to pyrefly_util::schema_helpers for reusability - Restore full validate_schemas.py test suite with 23 negative test cases (3 tests skipped due to schemars not supporting additionalProperties:false) - Add missing strict-callable-subtyping field to ConfigBase schema - Use referencing library for proper $ref resolution in tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… fix schema Move vec1_string_schema to pyrefly_util::schema_helpers for reusability Restore full validate_schemas.py test suite with 23 negative test cases (3 tests skipped due to schemars not supporting additionalProperties:false) Add missing strict-callable-subtyping field to ConfigBase schema Use referencing library for proper $ref resolution in tests
3366229 to
d6115d3
Compare
|
Thanks for the review! I've addressed all the feedback: I kept the manual implementation to accurately represent Also fixed a bug I discovered during the review: the |
|
This pull request has been automatically marked as stale because it has not had recent activity for more than 2 weeks. If you are still working on this this pull request, please add a comment or push new commits to keep it active. Otherwise, please unassign yourself and allow someone else to take over. Thank you for your contributions! |
connernilsen
left a comment
There was a problem hiding this comment.
Hey, thanks for continuing to work on this, and sorry for the delay. We're busy getting ready for our V1 release, so things are hectic right now.
Things are progressing well! For your question about ExtraConfigs,
I kept the manual implementation to accurately represent
additionalProperties: true. Would you prefer me to use#[schemars(skip)]instead? Both approaches work, but have different schema output.
I think if additionalProperties: true and #[schemars(skip)] get Pyrefly to allow additional properties for now but hide the extra-configs value, that would be perfect. If not, feel free to leave the custom implementation for ExtraConfigs.
Also, generally, we want to keep the manual schema generation to the absolute minimum we can. I expect config stuff to change over time, so I'd like to avoid a case where we're doing something wrong because someone forgot to update the schema generator when changing the struct. For anything (including possibly ExtraConfigs) that you leave custom schema generation for, can you put a comment detailing why this is so that future editors/reviewers understand why things are the way they are now? And for anything that doesn't have a great reason, can you derive #[cfg_attr(feature = "jsonschema", derive(schemars::JsonSchema))]? If it's just a few fields that are making it difficult, prefer to use schemars(schema_with = <function>) like we did for Vec1 types instead of defining schema generation for the whole type.
| } | ||
|
|
||
| #[cfg(feature = "jsonschema")] | ||
| impl schemars::JsonSchema for PythonEnvironment { |
There was a problem hiding this comment.
Is there a reason we can't generate most of this? It seems like as long as generators for the subschemas exist, we can just derive #[cfg_attr(feature = "jsonschema", derive(schemars::JsonSchema))] for `PythonEnvironment.
| } | ||
|
|
||
| #[cfg(feature = "jsonschema")] | ||
| impl schemars::JsonSchema for Interpreters { |
There was a problem hiding this comment.
Same here, can we derive #[cfg_attr(feature = "jsonschema", derive(schemars::JsonSchema))] on Interpreters?
| use schemars::schema::*; | ||
|
|
||
| let mut properties = schemars::Map::new(); | ||
| let required: std::collections::BTreeSet<String> = std::collections::BTreeSet::new(); |
There was a problem hiding this comment.
I don't think this is being used. Could you double check on what this was supposed to do?
| } | ||
|
|
||
| #[cfg(feature = "jsonschema")] | ||
| impl schemars::JsonSchema for ConfigBase { |
There was a problem hiding this comment.
Is it possible to derive this with #[cfg_attr(feature = "jsonschema", derive(schemars::JsonSchema))]?
| } | ||
|
|
||
| #[cfg(feature = "jsonschema")] | ||
| impl schemars::JsonSchema for SubConfig { |
There was a problem hiding this comment.
Could we try flattening here with the #[serde(flatten)] value macro? If so, I'm hoping we can get rid of the manual generator here.
| fn json_schema(generator: &mut schemars::r#gen::SchemaGenerator) -> schemars::schema::Schema { | ||
| use schemars::schema::*; | ||
|
|
||
| // Each error code maps to either a bool or a severity string |
There was a problem hiding this comment.
Does the default derivation for severity enum not automatically allow for one_of a severity string or bool?
| } | ||
|
|
||
| #[cfg(feature = "jsonschema")] | ||
| impl schemars::JsonSchema for ModuleWildcard { |
There was a problem hiding this comment.
Can we derive this too?
|
|
||
| /// ConfigOrigin is transparent in schema — it serializes as the inner type T. | ||
| #[cfg(feature = "jsonschema")] | ||
| impl<T: schemars::JsonSchema> schemars::JsonSchema for ConfigOrigin<T> { |
There was a problem hiding this comment.
This one is somewhat interesting, makes sense why we need a manual implementation here :)
| ..Default::default() | ||
| })), | ||
| string: Some(Box::new(schemars::schema::StringValidation { | ||
| pattern: Some(r"^\d+(\.\d+)?(\.\d+)?$".to_owned()), |
There was a problem hiding this comment.
Can we derive and use [schemars(regex...)] for this? https://docs.rs/schemars/latest/schemars/derive.JsonSchema.html#regex
| .to_owned(), | ||
| ), | ||
| default: Some(serde_json::json!("linux")), | ||
| examples: vec![ |
There was a problem hiding this comment.
Is there a way to derive all of this, specify that we prefer linux, darwin, or win32, but take anything here?
|
This pull request has been automatically marked as stale because it has not had recent activity for more than 2 weeks. If you are still working on this this pull request, please add a comment or push new commits to keep it active. Otherwise, please unassign yourself and allow someone else to take over. Thank you for your contributions! |
Summary
Implements Task 2 from #2448: auto-generate the Pyrefly JSON Schema from Rust config types using the
schemarscrate, replacing the need to manually maintain schemas/pyrefly.json.Changes
Usage
cargo run -p pyrefly_config --features jsonschema --bin generate_schema > schemas/pyrefly.json