-
Notifications
You must be signed in to change notification settings - Fork 3
Add cell specification #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Changed so that the starting cell can be specified. The specified format is "fileName::sheetName.Cell". ex: "book.xlsx::Sheet1.B4" If cell is specified, a blank column indicates the end of the column.
WalkthroughThe updates focus on enhancing the Changes
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (2)
- cmd/query.go (1 hunks)
- reader.go (3 hunks)
Files skipped from review due to trivial changes (1)
- cmd/query.go
Additional comments: 10
reader.go (10)
6-12: Imports are correctly organized and there are no apparent unused imports, which is good for maintainability.
31-53: The error handling is consistent and follows Go's idiomatic approach of returning early when an error is encountered.
34-49: The
parseExtendfunction is used to extract the sheet and cell information from theopts.InJQueryoption. It's important to ensure that theparseExtendfunction is robust and handles all expected formats correctly.43-48: The decrement of
cellXafter callingexcelize.CellNameToCoordinatessuggests that the coordinates are being adjusted to a zero-based index. This is a common source of off-by-one errors, so it's important to ensure that this adjustment is consistent throughout the code.70-179: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [65-77]
The loop to determine the maximum number of columns (
columnNum) and to handle theopts.InPreReadoption is clear. However, it's important to ensure thatopts.InPreReadis being set and used as intended, especially since it influences when the loop breaks.
90-99: The
nameTypefunction is used to extract column names and types. It's good to see that unnamed columns are given a default name based on their index. This improves the readability of the resulting data.116-137: The logic to trim the columns based on the
dataFlagarray is a bit complex. It's important to ensure that this logic correctly identifies the end of the column range when a blank column is encountered. This is a key part of the new feature and should be thoroughly tested.144-153: The
parseExtendfunction seems to handle different cases of the extended cell information format. It's important to ensure that this function has been tested with various input formats to verify its correctness.155-174: The
nameTypefunction appears to handle the assignment of column names and types correctly. It's important to ensure that theheaderboolean flag is being used as intended and that the function behaves correctly when headers are present or absent.177-179: The
getSheetfunction is responsible for retrieving the correct sheet from the XLSX file. It's important to ensure that the errorErrSheetNotFoundis handled appropriately in the calling code.Overall, the changes introduced in this pull request seem to align with the stated goal of enhancing the XLSX file reader to allow users to specify a starting cell. It's crucial to ensure that all new functions and logic are thoroughly tested, especially since they deal with user input and file parsing, which are common areas for bugs to arise.
TableName was deleted because it was not actually defined. Clarified variable names.
A feature of xlsxsql is that it can JOIN multiple tables in one sheet. Let this be the image of xlsxsql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 4
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- docs/xlsx-multi.png
Files selected for processing (4)
- README.md (3 hunks)
- cmd/query.go (1 hunks)
- cmd/root.go (2 hunks)
- reader.go (4 hunks)
Files skipped from review due to trivial changes (1)
- cmd/query.go
Additional comments: 15
cmd/root.go (1)
- 46-54: The introduction of the
Debugglobal variable is a good practice for managing debug mode state. However, ensure that this variable is used consistently across the codebase and that debug mode does not expose sensitive information or alter the program's behavior in a way that could lead to security vulnerabilities.reader.go (10)
6-12: The import block is correctly formatted and organized. The packages
trdsqlandexcelize/v2are imported for use in the file, which is consistent with the file's purpose.17-25: The
XLSXReaderstruct is well-defined with slices for column names, types, and the body of the data. This struct will hold the data read from the XLSX file.31-49: The error handling after attempting to open the XLSX file is correct. The
parseExtendfunction is used to extract extended cell information, and thegetSheetfunction is used to retrieve the correct sheet. The logic to convert a cell name to coordinates and decrementcellXby one to align with zero-based indexing is correct.51-63: The error handling for
f.GetRows(sheet)is correct. The logic to determine the number of rows to skip based on thecellYcoordinate or theInSkipoption is correct. However, there is a potential issue if bothcellYandInSkipare set, asInSkipwill overridecellY. This may be intentional, but it's worth verifying that this is the desired behavior.70-87: The loop to determine the maximum number of columns (
columnNum) and to handle the header row is correct. The check foropts.InPreReadto break out of the loop early is a good performance optimization. The logic to adjust the header row index based on the presence of a header in the options is also correct.90-124: The logic to extract column names and types is well-implemented. The
filterColumnsfunction is correctly used to filter out columns based on thevalidColumnsflags. The handling of theisFilterflag to decide whether to filter columns is correct. The slices forr.namesandr.typesare correctly resized to match the length of the first row inr.body.127-148: The
filterColumnsfunction is well-implemented, with a clear start flag to determine when to start counting valid columns and a break condition when an invalid column is encountered after a valid one. This logic assumes that once a blank column is encountered, all subsequent columns are invalid, which matches the pull request summary.150-158: The
parseExtendfunction correctly handles the case where the extended query might contain more than one dot, joining the parts after the first dot as the cell reference. This allows for sheet names with dots in them.161-180: The
nameTypefunction correctly handles the creation of column names and types, with a fallback to a default name format if a name is not provided or is a duplicate. The use of a map to track unique names is a good approach to avoid duplicates. The default type is set to "text", which is a safe default for unknown column types.183-185: The
getSheetfunction correctly handles the case where no sheets are found and when no specific sheet name is provided. The loop to find the correct sheet by name is also correct.README.md (4)
7-11: The example provided is clear and demonstrates the new feature effectively. It shows how to specify a starting cell for the SQL query and how to perform a join operation between two ranges within the same Excel file.
20-21: This line provides useful context about the underlying libraries used by
xlsxsql, which is helpful for users who might be interested in the technical details or who may want to contribute to the project.end hunk 0
<<<<<<< start hunk 1
123-137: The explanation of how to specify a cell is clear and concise. It's good that the documentation includes a note about the behavior when a cell is specified, as this could be a source of confusion for users. The note clarifies that specifying a cell will cause the tool to interpret a blank column as the end of the column range, which is a key part of the new feature.
146-150: The shorthand designation section is a good addition, providing users with a quicker way to display sheet contents. It's important to ensure that the documentation is updated if the syntax or functionality of this command changes in the future.
end hunk 1
<<<<<<< start hunk 2
|
|
||
| func init() { | ||
| rootCmd.PersistentFlags().BoolVarP(&Ver, "version", "v", false, "display version information") | ||
| rootCmd.PersistentFlags().BoolVarP(&Debug, "debug", "", false, "debug mode") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag for enabling debug mode is missing a shorthand. Typically, command-line tools provide a single-letter shorthand for each flag. Consider adding a shorthand for the --debug flag for consistency and ease of use.
- rootCmd.PersistentFlags().BoolVarP(&Debug, "debug", "", false, "debug mode")
+ rootCmd.PersistentFlags().BoolVarP(&Debug, "debug", "d", false, "debug mode")Commitable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
| rootCmd.PersistentFlags().BoolVarP(&Debug, "debug", "", false, "debug mode") | |
| rootCmd.PersistentFlags().BoolVarP(&Debug, "debug", "d", false, "debug mode") |
| ```console | ||
| xlsxsql query -H -o md \ | ||
| "SELECT a.id,a.name,b.price FROM testdata/test3.xlsx::.C1 AS a | ||
| LEFT JOIN testdata/test3.xlsx::.F4 AS b ON a.id=b.id | ||
| WHERE a.id != ''" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is repeated from the earlier hunk. It might be beneficial to avoid duplication in documentation to keep it concise. Consider consolidating examples that demonstrate the same functionality.
- "SELECT a.id,a.name,b.price FROM testdata/test3.xlsx::.C1 AS a
- LEFT JOIN testdata/test3.xlsx::.F4 AS b ON a.id=b.id
- WHERE a.id != ''"
-
This example is similar to the one provided earlier and could be omitted or referenced to avoid duplication.
```diff
+ # Example or explanation for the --header option
+ The `--header` option can be used to treat the first row of the sheet (after any skipped rows) as the header row for the data. This is useful when your data includes column names at the top of the sheet.
+ ```console
+ xlsxsql query --header --out JSONL "SELECT * FROM test.xlsx::Sheet2"
+ ```
| You can use the `--skip` option to ignore a certain number of rows at the beginning of the sheet. | ||
| For example, to skip the first two rows, you would use: | ||
|
|
||
| ```console |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation of the --skip option is clear. It's good practice to provide examples of how to use new flags, as this can help users understand how to apply them to their own use cases.
The --header option explanation is missing in the provided hunk. It would be beneficial to include an example or explanation of this option as well, especially since it's mentioned in the context of the --skip option.
end hunk 2
Changed so that the starting cell can be specified.
The specified format is "fileName::sheetName.Cell".
ex: "book.xlsx::Sheet1.B4"
If cell is specified, a blank column indicates the end of the column.
Summary by CodeRabbit
New Features
xlsxsqltool to allow querying data from specified cells within Excel sheets.Enhancements
Documentation
xlsxsqlfeatures, including cell specification and table joins.Refactor
Chores