@jqnatividad
Collaborator

resolves #2157

Adds a fallback for when an .sz file is not actually a snappy file, similar to Config::io_reader.

Also, make is_valid_snappy_file public so we can use it from config.rs
Contributor

Copilot AI left a comment

Pull request overview

This PR enhances snappy compression detection and handling by implementing robust validation and graceful fallback mechanisms. It resolves issue #2157 where plain CSV files incorrectly named with .sz extensions would cause "corrupt input" errors. The implementation validates snappy-compressed files before attempting decompression and falls back to treating them as plain files when validation fails.
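The validate-then-fall-back idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `looks_like_snappy`, `InputKind`, and `classify` are hypothetical names, and the check relies only on the fact that a stream in the snappy framing format opens with a fixed 10-byte stream identifier chunk. How `is_valid_snappy_file` actually validates its input is not shown in this thread.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Stream identifier chunk that opens a snappy framed stream:
/// chunk type 0xff, a 3-byte little-endian length of 6, then ASCII "sNaPpY".
const SNAPPY_STREAM_ID: [u8; 10] =
    [0xff, 0x06, 0x00, 0x00, b's', b'N', b'a', b'P', b'p', b'Y'];

/// Hypothetical helper: true if the bytes start with the stream identifier.
fn looks_like_snappy(header: &[u8]) -> bool {
    header.starts_with(&SNAPPY_STREAM_ID)
}

/// How the caller should treat the file, whatever its extension says.
#[derive(Debug, PartialEq)]
enum InputKind {
    Snappy,
    Plain,
}

/// Hypothetical classifier: read at most 10 bytes and decide whether to
/// decompress (Snappy) or fall back to reading the file as-is (Plain).
fn classify(path: &Path) -> io::Result<InputKind> {
    let mut header = Vec::with_capacity(SNAPPY_STREAM_ID.len());
    File::open(path)?
        .take(SNAPPY_STREAM_ID.len() as u64)
        .read_to_end(&mut header)?;
    Ok(if looks_like_snappy(&header) {
        InputKind::Snappy
    } else {
        InputKind::Plain
    })
}
```

A caller would decompress only when `classify` returns `InputKind::Snappy` and otherwise read the file unchanged, which is the behavior the overview describes for a plain CSV that happens to carry an .sz extension.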

Key Changes:

  • Added validation logic to verify snappy-compressed files before decompression, preventing corrupt input errors
  • Implemented case-insensitive .sz extension detection for cross-platform robustness
  • Refactored file extension detection to use native Path methods instead of string manipulation
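For illustration, case-insensitive extension detection built on `Path::extension()` might look like the sketch below. The function borrows the `is_snappy_extension` name from the review summary, but its signature and body here are assumptions; the actual helper in src/config.rs may differ.

```rust
use std::path::Path;

/// Hypothetical stand-in for the `is_snappy_extension` helper named in the
/// review. True only when the final extension is exactly "sz", in any case.
fn is_snappy_extension(path: &Path) -> bool {
    path.extension()
        // Non-UTF-8 extensions are treated as "not snappy".
        .and_then(|ext| ext.to_str())
        .is_some_and(|ext| ext.eq_ignore_ascii_case("sz"))
}
```

Because `Path::extension()` returns only the text after the last dot in the file name, `data.csv.sz` and `DATA.SZ` match, while `data.esz` does not.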

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_snappy.rs | Added tests for snappy fallback behavior, case-insensitive extension handling, and validation that prevents "corrupt input" errors |
| src/util.rs | Made is_valid_snappy_file public; documented decompress_snappy_file and added fallback logic that copies plain files instead of failing |
| src/config.rs | Refactored extension detection to use Path::extension(); added the is_snappy_extension helper function; integrated validation into io_reader; removed redundant uses of the file_extension variable; added unit tests for case-insensitive extension detection |

@jqnatividad jqnatividad merged commit e0d2cff into master Dec 6, 2025
22 of 23 checks passed
@jqnatividad jqnatividad deleted the 2157-fix-snappy-detection branch December 6, 2025 19:32
@tmtmtmtm
Contributor

tmtmtmtm commented Dec 6, 2025

@jqnatividad apologies if I'm simply overlooking something here, but AFAICS all the tests are around files with a ".sz" extension (and the PR overview from Copilot also suggests that was the issue). But the original issue here wasn't about a file with a .sz extension; it was a file with a ".KYpPcb8esz" extension. In practice your checks for something really being snappy no matter what the extension should make that distinction irrelevant, but for completeness it might be worth also adding a straightforward test that something called "data.esz" doesn't even trigger an assumption of snappy-ness that needs further investigation.
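A rough sketch of the kind of false-positive check suggested here is shown below. Both functions are hypothetical: `naive_is_snappy` is an assumed illustration of how a suffix match could misfire on a name like "data.KYpPcb8esz" (the thread does not confirm that this was the original cause), and `extension_is_sz` is an assumed exact-extension check, not the code under review.

```rust
use std::path::Path;

/// Assumed failure mode, for contrast only: matching on the trailing
/// characters of the name treats anything ending in "sz" as snappy.
fn naive_is_snappy(name: &str) -> bool {
    name.to_ascii_lowercase().ends_with("sz")
}

/// Exact-extension check: the text after the last dot must be "sz".
fn extension_is_sz(name: &str) -> bool {
    Path::new(name)
        .extension()
        .and_then(|ext| ext.to_str())
        .is_some_and(|ext| ext.eq_ignore_ascii_case("sz"))
}
```

With these definitions, "data.esz" and "data.KYpPcb8esz" pass the naive suffix check but are rejected by the exact-extension check, so they would never reach the content validation step at all.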

jqnatividad added a commit that referenced this pull request Dec 6, 2025
@jqnatividad
Collaborator Author

Thanks for following up @tmtmtmtm .

I just added tests to explicitly check for sz false positive matches per your observation above -

0fd558e

Development

Successfully merging this pull request may close these issues.

Intermittent "snappy: corrupt input" with plain CSVs

3 participants