-
Notifications
You must be signed in to change notification settings - Fork 20
refactor: Major refactoring of functions to improve readability, efficiency and follow standard practices. #139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Use more jinja templating instead of hardcoding strings - Simplified bounding box function - Changed parameter name for `_get_hocr_bounding_box` to `page_dimension` for more clarity.
…age.py - Added in #110 Lost in Merge
- Required for template to work in installed library
- improve readability, follow python conventions, and improve efficiency - Also, fixed a previously unknown bug where `Document.search_pages()` returned inaccurate results because it only searched paragraph.text, not page.text
…low python conventions
…ect` to match the page.py file
…ize functions, reduce complexity, and increase readability
- Also reduce wait_time in tests
- `converter.py` only had one external facing function that called an internal function with the same parameters. - Not sure if there was a specific reason for this setup, can be undone if needed.
- Also reduce wait_time in tests
- `converter.py` only had one external facing function that called an internal function with the same parameters. - Not sure if there was a specific reason for this setup, can be undone if needed.
- Changed how `Block` is initialized. - Changed `load_blocks_from_schema` into a `@classmethod` to simplify imports.
google/cloud/documentai_toolbox/converters/config/bbox_conversion.py
Outdated
Show resolved
Hide resolved
| def _convert_bbox_units( | ||
| coordinate, input_bbox_units, width=None, height=None, multiplier=1 | ||
| coordinate: float, | ||
| input_bbox_units: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can input_bbox_units be an enum instead of str?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that would make more sense. However, since this value is provided by the user in a JSON file, I wonder if there is a straightforward way to make that conversion? (Besides changing the config value to be an integer) Or maybe if it would be better to use constants?
|
|
||
| blocks: List[Block] = [] | ||
| ens = _get_target_object(objects, entities) | ||
| for i in ens: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of this for loop and its nested if-else clauses is difficult to understand. I get the impression that many of them could be simplified if a block b's attributes have reasonable defaults. For instance, can't
if bounding_width:
b.bounding_width = _get_target_object(b.bounding_box, bounding_width)
be simplified to just
b.bounding_width = _get_target_object(b.bounding_box, bounding_width)
?
(Please ignore and file an internal clean up issue if this file was simply renamed.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a refactoring that I did as well. So I'm good to make some changes as needed.
| FILES_TO_IGNORE = [".DS_Store"] | ||
|
|
||
|
|
||
| def _get_base_ocr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these simply moved from converter_helpers? Let me know if these additions need a more detailed review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were refactored as well, and everything was moved into this file.
You can see just the refactoring work in this commit fb517fb
- Removed FILES_TO_IGNORE - Simplification of logic in `_get_multiplier` `convert_bbox_to_docproto_bbox` - Addressed other lint errors - Adjusted function names to indicate not protected members.
BEGIN_COMMIT_OVERRIDE
fix: Internal refactoring to improve efficiency and readability. No external-facing changes.
END_COMMIT_OVERRIDE