@jqnatividad

resolves #3099
resolves #3100

…me naming convention as JSONschema

- also simplified the nested if chain that checks args.flag_strict_formats, and used std::string::String::as_str directly to make clippy happy

Copilot AI left a comment

Pull request overview

This pull request standardizes the Polars schema file naming convention and adds an --output option to the schema command. Previously, Polars schema files were named from the file prefix/stem only (e.g., data.csv → data.pschema.json). Now .pschema.json is appended to the full filename, including its extension (e.g., data.csv → data.csv.pschema.json, data.tsv.gz → data.tsv.gz.pschema.json). This makes the association between a data file and its schema file explicit and handles complex filenames, such as compressed files, consistently.

Key changes:

  • Added --output flag to the schema command allowing users to specify custom output filenames for generated schema files
  • Changed Polars schema file naming from using file prefix to appending .pschema.json to the full filename
  • Updated all commands that read or write Polars schema files (sqlp, joinp, pivotp, and internal utilities) to use the new naming convention
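
The naming change described above can be illustrated with a minimal sketch. The helper names old_pschema_path and new_pschema_path are hypothetical and are not taken from the qsv codebase; the old-style helper assumes the stem was obtained with Path::file_stem, which may differ from what qsv actually did. Only the PathBuf::from(format!("{}.pschema.json", ...)) pattern comes from the review summary.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch of the OLD convention: build the schema name
/// from the file stem only (assumes `Path::file_stem`, which strips
/// just the last extension).
fn old_pschema_path(input: &Path) -> PathBuf {
    let stem = input
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();
    // Replace the file name, keeping any parent directory.
    input.with_file_name(format!("{stem}.pschema.json"))
}

/// Hypothetical sketch of the NEW convention: append ".pschema.json"
/// to the full filename, extension included.
fn new_pschema_path(input: &Path) -> PathBuf {
    // `display()` is lossy for non-UTF-8 paths; acceptable for a sketch.
    PathBuf::from(format!("{}.pschema.json", input.display()))
}
```

With these helpers, new_pschema_path(Path::new("data.tsv.gz")) yields data.tsv.gz.pschema.json, so the schema file name always reveals exactly which data file it belongs to.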

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/cmd/schema.rs | Added --output flag for custom schema output filenames; updated Polars schema creation to use new naming convention; refactored nested if-statements to use let-chains for cleaner code |
| src/util.rs | Updated load_schema_from_file() to append .pschema.json to full filename instead of using file prefix; added flag_output field to SchemaArgs; optimized unwrap_or to unwrap_or_else for lazy string creation |
| src/cmd/sqlp.rs | Updated all Polars schema file path constructions to use new naming convention with PathBuf::from(format!("{}.pschema.json", ...)) |
| src/cmd/joinp.rs | Updated schema file path handling and SchemaArgs initialization to include flag_output: None |
| src/cmd/pivotp.rs | Updated schema file path construction to use new naming convention; added flag_output field to SchemaArgs initializations |
| src/cmd/tojsonl.rs | Added flag_output: None to SchemaArgs initialization |
| src/cmd/frequency.rs | Added flag_output: None to SchemaArgs initialization |
| src/cmd/diff.rs | Added flag_output: None to SchemaArgs initializations |
| src/cmd/sample.rs | Added flag_output: None to SchemaArgs initialization |
| tests/test_sqlp.rs | Updated test assertions to expect schema files with new naming convention (boston311-100.csv.pschema.json) |
| tests/test_slice.rs | Updated all test schema file references to use new naming; added logic to copy schema files for compressed file variants |
| tests/test_joinp.rs | Updated test assertions to check for schema files using new naming convention |
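
The src/util.rs entry above mentions switching unwrap_or to unwrap_or_else for lazy string creation. A minimal sketch of the difference follows; the function names and the call counter are hypothetical and exist only to make the laziness observable. They are not part of qsv.

```rust
use std::cell::Cell;

/// Hypothetical fallback that allocates a String and counts how
/// often it is invoked.
fn expensive_default(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    String::from("default.pschema.json")
}

/// Eager: the fallback argument is evaluated before `unwrap_or` runs,
/// even when `value` is `Some`.
fn eager(value: Option<String>, calls: &Cell<u32>) -> String {
    value.unwrap_or(expensive_default(calls))
}

/// Lazy: the closure only runs when `value` is `None`.
fn lazy(value: Option<String>, calls: &Cell<u32>) -> String {
    value.unwrap_or_else(|| expensive_default(calls))
}
```

Calling eager with a Some value still increments the counter, while lazy leaves it untouched, which is why the closure form avoids a needless allocation on the common path.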

@jqnatividad jqnatividad merged commit 9ea2b8b into master Nov 26, 2025
23 checks passed
@jqnatividad jqnatividad deleted the 3099-n-3100-schema-output-option-and-standardize-pschema branch November 26, 2025 20:26

Development

Successfully merging this pull request may close these issues.

schema: add --output option
schema: polars default output filename

2 participants