[M1580] handle empty strings during validation for numeric values#606
[M1580] handle empty strings during validation for numeric values#606saanobhaai merged 1 commit intomasterfrom
Conversation
WalkthroughA new method, Changes
Sequence Diagram(s)sequenceDiagram
participant Validator as Validator (e.g., DepthValidator)
participant BaseValidator as BaseValidator
participant Record as Record (input data)
Validator->>BaseValidator: get_numeric_value(record, key)
BaseValidator->>Record: Retrieve value using dot-notated key
alt Key missing or value is None/empty string
BaseValidator-->>Validator: Return 0
else Value found
BaseValidator-->>Validator: Return value as is
end
Validator->>Validator: Continue validation logic with numeric value
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 3
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/api/submission/validations/validators/base.py(1 hunks)src/api/submission/validations/validators/benthic_transect.py(1 hunks)src/api/submission/validations/validators/depth.py(1 hunks)src/api/submission/validations/validators/interval_size.py(1 hunks)src/api/submission/validations/validators/interval_start.py(1 hunks)src/api/submission/validations/validators/obs_benthic_lit.py(1 hunks)src/api/submission/validations/validators/obs_benthic_photo_quadrat.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
src/api/submission/validations/validators/depth.py (1)
src/api/submission/validations/validators/base.py (1)
get_numeric_value(127-135)
src/api/submission/validations/validators/obs_benthic_lit.py (1)
src/api/submission/validations/validators/base.py (1)
get_numeric_value(127-135)
src/api/submission/validations/validators/interval_start.py (2)
src/api/submission/validations/validators/base.py (1)
get_numeric_value(127-135)src/api/models/classification.py (1)
collect_record(126-127)
src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (2)
src/api/submission/validations/validators/base.py (1)
get_numeric_value(127-135)src/api/models/classification.py (1)
collect_record(126-127)
🔇 Additional comments (5)
src/api/submission/validations/validators/base.py (1)
127-135: Well-implemented numeric value retrieval method.The
get_numeric_valuemethod effectively addresses the PR objective by consistently handling empty strings andNonevalues by returning0. The implementation properly handlesKeyErrorexceptions and maintains consistency with the existingget_valuemethod by usingdottyfor nested dictionary access.src/api/submission/validations/validators/depth.py (1)
19-19: Appropriate use of get_numeric_value for depth validation.The change from
get_valuetoget_numeric_valueis well-suited for depth validation since it ensures empty strings are handled consistently by returning0, which will be properly validated by the subsequentDecimalconversion and range checks.src/api/submission/validations/validators/obs_benthic_lit.py (1)
24-24: Consistent numeric value handling for surveyed length.The change to
get_numeric_valueis appropriate here sincelen_surveyedis expected to be numeric (multiplied by 100 on line 25). The existing validation logic at line 29 properly handles the case wherelen_surveyed_cm <= 0, ensuring robust error handling.src/api/submission/validations/validators/interval_start.py (1)
15-15: Proper numeric value retrieval for interval start validation.The change to
get_numeric_valueis well-implemented here. The method handles empty strings by returning0, which is then properly validated through the explicitfloatconversion and range checks. This maintains consistency with the other validators in this PR.src/api/submission/validations/validators/interval_size.py (1)
15-15: Effective numeric value handling for interval size validation.The change to
get_numeric_valueis appropriate for interval size validation. The method ensures empty strings are handled consistently by returning0, which will be properly caught by the validation logic at line 25 that checks forinterval_size <= 0. The explicitfloatconversion with error handling maintains robust validation.
| len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path) | ||
| interval_size = self.get_numeric_value(collect_record, self.interval_size_path) |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify error handling for non-numeric values.
The get_numeric_value method returns 0 for missing/empty values but doesn't validate that the value is actually numeric. If len_surveyed or interval_size contain non-numeric strings, the mathematical operation on line 133 (math.ceil(len_surveyed / interval_size)) will fail with a TypeError.
Consider adding proper numeric validation or error handling to ensure these values are numbers before performing mathematical operations.
Let me check if there are any existing validations that ensure these values are numeric before reaching this validator:
🏁 Script executed:
#!/bin/bash
# Check for numeric validation of len_surveyed and interval_size fields
ast-grep --pattern 'def $_($$$) {
$$$
len_surveyed$$$
$$$
}'
# Also check for field definitions and validators
rg -A 5 -B 5 "len_surveyed|interval_size" --type pyLength of output: 224515
🏁 Script executed:
#!/bin/bash
# Find the definition of get_numeric_value in the validators
grep -R "def get_numeric_value" -n src/api/submission/validations/validatorsLength of output: 178
🏁 Script executed:
#!/bin/bash
# Display get_numeric_value implementation
sed -n '110,150p' src/api/submission/validations/validators/base.pyLength of output: 721
🏁 Script executed:
#!/bin/bash
# Show definition of validator_result decorator
grep -R "def validator_result" -n src/api/submission/validations/validators/base.pyLength of output: 116
🏁 Script executed:
#!/bin/bash
# Display the validator_result decorator implementation and error handling
sed -n '1,80p' src/api/submission/validations/validators/base.pyLength of output: 2186
Add explicit numeric casting and error handling in BenthicIntervalObservationCountValidator
The current implementation in
src/api/submission/validations/validators/benthic_transect.py (lines 125–126)
uses get_numeric_value, which may return non-numeric types (e.g. a string) and lead to a TypeError when you do comparisons or math.ceil. Although field-level validators ensure these values are numeric, it’s safer for this record-level validator to guard against bad input directly.
Suggested change:
- len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path)
- interval_size = self.get_numeric_value(collect_record, self.interval_size_path)
+ # ensure we have actual numbers before doing math
+ try:
+ len_surveyed = float(self.get_numeric_value(collect_record, self.len_surveyed_path))
+ except (TypeError, ValueError):
+ return ERROR, self.NON_POSITIVE.format("len_surveyed")
+ try:
+ interval_size = float(self.get_numeric_value(collect_record, self.interval_size_path))
+ except (TypeError, ValueError):
+ return ERROR, self.NON_POSITIVE.format("interval_size")This will catch any unexpected non-numeric values and return a clear validation error instead of allowing a TypeError to bubble up.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path) | |
| interval_size = self.get_numeric_value(collect_record, self.interval_size_path) | |
| # ensure we have actual numbers before doing math | |
| try: | |
| len_surveyed = float(self.get_numeric_value(collect_record, self.len_surveyed_path)) | |
| except (TypeError, ValueError): | |
| return ERROR, self.NON_POSITIVE.format("len_surveyed") | |
| try: | |
| interval_size = float(self.get_numeric_value(collect_record, self.interval_size_path)) | |
| except (TypeError, ValueError): | |
| return ERROR, self.NON_POSITIVE.format("interval_size") |
🤖 Prompt for AI Agents
In src/api/submission/validations/validators/benthic_transect.py around lines
125 to 126, the variables len_surveyed and interval_size are assigned using
get_numeric_value but may not be explicitly cast to numeric types, risking
TypeErrors during operations. Modify the code to explicitly cast these values to
float or int within a try-except block to catch ValueError or TypeError. If
casting fails, return a clear validation error message instead of allowing an
exception to propagate.
| num_quadrats_max = validator.limit_value | ||
|
|
||
| num_quadrats = self.get_value(collect_record, self.num_quadrats_path) or 0 | ||
| num_quadrats = self.get_numeric_value(collect_record, self.num_quadrats_path) |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Verify error handling for non-numeric num_quadrats.
The num_quadrats value is used in mathematical operations on line 131 (range(quadrat_number_start, quadrat_number_start + num_quadrats)) without exception handling. If get_numeric_value returns a non-numeric string, this will cause a TypeError.
Consider adding validation to ensure num_quadrats is numeric before using it in mathematical operations, or add appropriate error handling.
Let me check if there are upstream validations for the num_quadrats field:
🏁 Script executed:
#!/bin/bash
# Check for numeric validation of num_quadrats field
ast-grep --pattern 'num_quadrats$$$'
# Check field definitions and validators
rg -A 5 -B 5 "num_quadrats" --type py | head -50Length of output: 4722
🏁 Script executed:
#!/bin/bash
# Locate PositiveIntegerValidator definition
rg "class PositiveIntegerValidator" -A 20 --type py
# Locate QuadratCountValidator definition
rg "class QuadratCountValidator" -A 20 --type pyLength of output: 4140
🏁 Script executed:
#!/bin/bash
# Locate BaseValidator to find get_numeric_value implementation
rg "class BaseValidator" -A 50 --type py
# Locate get_numeric_value definition
rg "def get_numeric_value" -A 20 --type pyLength of output: 3722
Ensure num_quadrats is an integer before using it in range()
Although PositiveIntegerValidator enforces that data.quadrat_transect.num_quadrats is a non-negative integer at the field level, the record-level QuadratCountValidator still retrieves its value via get_numeric_value (which merely returns the raw data). If that raw value isn’t actually an int, calling
range(quadrat_number_start, quadrat_number_start + num_quadrats)will raise a TypeError.
Please harden the validator in src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (around line 131):
- Add an explicit check (
isinstance(num_quadrats, int)) or cast (num_quadrats = int(num_quadrats)) with atry/exceptto catch conversion failures. - If conversion fails, return a validation error (e.g., use
self.DIFFERENT_NUMBER_OF_QUADRATSor a new error code) instead of letting an exception bubble up. - Alternatively, wrap the
range(...)call intry/except TypeErrorand return a clear validation result.
This will ensure the record-level check never crashes on unexpected input.
🤖 Prompt for AI Agents
In src/api/submission/validations/validators/obs_benthic_photo_quadrat.py around
line 115, the variable num_quadrats obtained from get_numeric_value may not be
an integer, causing a TypeError when used in range(). To fix this, add an
explicit check or attempt to cast num_quadrats to int inside a try/except block.
If casting fails, return a validation error using
self.DIFFERENT_NUMBER_OF_QUADRATS or a similar error code instead of letting the
exception propagate. This ensures the validator handles unexpected input
gracefully without crashing.
| num_points = self.get_numeric_value(obs, self.observation_num_points_path) | ||
| except (TypeError, ValueError): | ||
| continue |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Review exception handling behavior with get_numeric_value.
The get_numeric_value method returns 0 for missing/empty values and returns the value as-is otherwise. If the value is a non-numeric string, it won't raise TypeError or ValueError as the original code might have expected.
This could change the behavior of the try-except block. Previously, if the value couldn't be used in the addition operation quadrat_number_groups[quadrat_number] += num_points, it would catch the exception and continue. Now, non-numeric strings would be returned by get_numeric_value and likely cause the same TypeError in the addition operation.
Verify that this behavior change is intentional and that the exception handling still works as expected.
Let me check what the original get_value method returned and how it was used:
🏁 Script executed:
#!/bin/bash
# Check the original get_value method behavior
ast-grep --pattern 'def get_value($$$) {
$$$
}'
# Check for any numeric conversion around num_points usage
rg -A 10 -B 5 "num_points.*=" --type pyLength of output: 43054
🏁 Script executed:
#!/bin/bash
set -e
# Locate definitions of get_numeric_value and get_value
rg -n "def get_numeric_value" --type py
rg -n "def get_value" --type pyLength of output: 355
🏁 Script executed:
#!/bin/bash
set -e
# Show get_numeric_value implementation
rg -n -C5 "def get_numeric_value" src/api/submission/validations/validators/base.py
# Show validator’s get_value implementation
rg -n -C5 "def get_value" src/api/submission/validations/validators/base.py
# Show utility get_value implementation
rg -n -C5 "def get_value" src/api/utils/__init__.pyLength of output: 1268
🏁 Script executed:
#!/bin/bash
set -e
# Show full get_numeric_value implementation
rg -n -C20 "def get_numeric_value" src/api/submission/validations/validators/base.pyLength of output: 993
Evaluate and adjust exception handling around get_numeric_value.
The new get_numeric_value implementation in src/api/submission/validations/validators/base.py (lines 127–135) always returns an integer (0 for missing/empty) or the raw value and never raises TypeError or ValueError. Consequently:
- The
except (TypeError, ValueError)around the call toget_numeric_valuein
src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (lines 36–38) is now redundant. - Any TypeError from adding a non-integer (if it ever occurs) would surface at
quadrat_number_groups[quadrat_number] += num_pointsand not be caught.
Since all ingestion serializers guarantee integer num_points, this is safe in practice. However, to keep the code defensively correct, you should either:
- Remove the unused
try/exceptblock aroundget_numeric_value, or - Broaden the
tryto wrap both the call and the addition, or - Add an explicit type check on
num_pointsbefore the addition.
Locations to update:
- src/api/submission/validations/validators/obs_benthic_photo_quadrat.py: lines 36–38
🤖 Prompt for AI Agents
In src/api/submission/validations/validators/obs_benthic_photo_quadrat.py around
lines 36 to 38, the try/except block catching TypeError and ValueError around
get_numeric_value is now redundant because get_numeric_value never raises these
exceptions. To fix this, remove the try/except block and either rely on the
guaranteed integer return or add a type check on num_points before using it in
the addition to quadrat_number_groups. This will keep the code clean and
defensively correct.
Summary by CodeRabbit