Fix/issue 356 handling excel headers #359

Merged
Edwardvaneechoud merged 4 commits into main from
fix/issue-356-handling-excel-headers
Mar 19, 2026

@Edwardvaneechoud
Owner

This pull request introduces several improvements and fixes to the Excel file reading logic in both the core and worker modules, with a focus on handling headers, row selection, and column naming for various engines. It also adds new and updated tests to ensure correct behavior, especially for edge cases involving header rows and date-typed columns.

Excel reading logic improvements:

  • Refactored the way header row and has_header options are passed to the pl.read_excel function when using the xlsx2csv engine, ensuring that has_header is passed directly as a top-level argument rather than within read_options. This change is applied to both flowfile_core and flowfile_worker modules. [1] [2]
  • Improved handling of column names when reading Excel files with the calamine engine by ensuring columns are uniquely named and, in the core module, by using a custom header row extraction via raw_data_openpyxl. [1] [2]
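As a rough illustration of the first point, the sketch below builds the keyword arguments for `pl.read_excel` so that `has_header` sits at the top level while CSV-specific settings stay in `read_options` (which, for the `xlsx2csv` engine, Polars forwards to `pl.read_csv`). The helper name `build_read_excel_kwargs` and its `skip_rows` parameter are hypothetical and are not taken from the flowfile code base.

```python
from typing import Any


def build_read_excel_kwargs(has_header: bool, skip_rows: int = 0) -> dict[str, Any]:
    """Hypothetical helper: keyword arguments for pl.read_excel with engine="xlsx2csv"."""
    kwargs: dict[str, Any] = {
        "engine": "xlsx2csv",
        # Passed as a top-level argument, not nested inside read_options.
        "has_header": has_header,
    }
    if skip_rows > 0:
        # With the xlsx2csv engine, read_options are handed on to pl.read_csv.
        kwargs["read_options"] = {"skip_rows": skip_rows}
    return kwargs
```

Under these assumptions, a call would look like `pl.read_excel(path, **build_read_excel_kwargs(has_header=False))`; keeping the dictionary construction in one place is one way to share it between the core and worker modules.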

Testing enhancements:

  • Updated all Excel-related tests to use a consistent SUPPORT_FILES path variable, and standardized string formatting and parameter usage for better readability and maintainability. [1] [2]
  • Added new tests to cover edge cases: reading Excel files where the second row contains date types, and verifying column naming when headers are absent.
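The column-naming checks rely on every column ending up with a distinct name. A minimal sketch of one way to enforce that is shown below; the function `make_unique` and its `_1`, `_2` suffix scheme are assumptions made for illustration and may differ from what the PR actually implements.

```python
def make_unique(names: list[str]) -> list[str]:
    """Hypothetical helper: append numeric suffixes so that no name repeats."""
    used: set[str] = set()
    counts: dict[str, int] = {}
    result: list[str] = []
    for name in names:
        candidate = name
        n = counts.get(name, 0)
        # Keep bumping the suffix until the candidate has not been used yet.
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        counts[name] = n
        used.add(candidate)
        result.append(candidate)
    return result


# Test-style check: duplicated headers come back distinct, order preserved.
columns = make_unique(["id", "value", "value", "id"])
assert columns == ["id", "value", "value_1", "id_1"]
assert len(set(columns)) == len(columns)
```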

Dependency management:

  • Added xlsx2csv as a new dependency in pyproject.toml to support the CSV-based Excel reading engine.

@netlify

netlify bot commented Mar 18, 2026

Deploy Preview for flowfile-wasm canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 5270c61 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/flowfile-wasm/deploys/69bae243055e6f0009a52276 |

@Edwardvaneechoud Edwardvaneechoud merged commit e2b1201 into main Mar 19, 2026
22 checks passed
@Edwardvaneechoud Edwardvaneechoud deleted the fix/issue-356-handling-excel-headers branch March 19, 2026 06:59