Skip to content

[M1580] handle empty strings during validation for numeric values#606

Merged
saanobhaai merged 1 commit intomasterfrom
M1580
Jul 15, 2025
Merged

[M1580] handle empty strings during validation for numeric values#606
saanobhaai merged 1 commit intomasterfrom
M1580

Conversation

@saanobhaai
Copy link
Member

@saanobhaai saanobhaai commented Jul 15, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of numeric fields in various validation checks, ensuring missing or invalid values default to zero and are processed consistently across the application. This enhances reliability when validating numeric data in submissions.

@saanobhaai saanobhaai requested a review from gridcell July 15, 2025 16:34
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 15, 2025

Walkthrough

A new method, get_numeric_value, was added to the BaseValidator class to standardize numeric value extraction from records. Multiple validator modules were updated to use this new method instead of get_value, removing fallback defaults and unifying the handling of missing or invalid numeric fields during validation.

Changes

File(s) Change Summary
src/api/submission/validations/validators/base.py Added get_numeric_value method to BaseValidator for extracting numeric values with specific fallback behavior.
src/api/submission/validations/validators/benthic_transect.py
src/api/submission/validations/validators/depth.py
src/api/submission/validations/validators/interval_size.py
src/api/submission/validations/validators/interval_start.py
src/api/submission/validations/validators/obs_benthic_lit.py
src/api/submission/validations/validators/obs_benthic_photo_quadrat.py
Replaced usage of get_value (with fallback defaults) with get_numeric_value for numeric field extraction.

Sequence Diagram(s)

sequenceDiagram
    participant Validator as Validator (e.g., DepthValidator)
    participant BaseValidator as BaseValidator
    participant Record as Record (input data)

    Validator->>BaseValidator: get_numeric_value(record, key)
    BaseValidator->>Record: Retrieve value using dot-notated key
    alt Key missing or value is None/empty string
        BaseValidator-->>Validator: Return 0
    else Value found
        BaseValidator-->>Validator: Return value as is
    end
    Validator->>Validator: Continue validation logic with numeric value
Loading
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@saanobhaai
Copy link
Member Author

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18d5687 and 3ce2d14.

📒 Files selected for processing (7)
  • src/api/submission/validations/validators/base.py (1 hunks)
  • src/api/submission/validations/validators/benthic_transect.py (1 hunks)
  • src/api/submission/validations/validators/depth.py (1 hunks)
  • src/api/submission/validations/validators/interval_size.py (1 hunks)
  • src/api/submission/validations/validators/interval_start.py (1 hunks)
  • src/api/submission/validations/validators/obs_benthic_lit.py (1 hunks)
  • src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
src/api/submission/validations/validators/depth.py (1)
src/api/submission/validations/validators/base.py (1)
  • get_numeric_value (127-135)
src/api/submission/validations/validators/obs_benthic_lit.py (1)
src/api/submission/validations/validators/base.py (1)
  • get_numeric_value (127-135)
src/api/submission/validations/validators/interval_start.py (2)
src/api/submission/validations/validators/base.py (1)
  • get_numeric_value (127-135)
src/api/models/classification.py (1)
  • collect_record (126-127)
src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (2)
src/api/submission/validations/validators/base.py (1)
  • get_numeric_value (127-135)
src/api/models/classification.py (1)
  • collect_record (126-127)
🔇 Additional comments (5)
src/api/submission/validations/validators/base.py (1)

127-135: Well-implemented numeric value retrieval method.

The get_numeric_value method effectively addresses the PR objective by consistently handling empty strings and None values by returning 0. The implementation properly handles KeyError exceptions and maintains consistency with the existing get_value method by using dotty for nested dictionary access.

src/api/submission/validations/validators/depth.py (1)

19-19: Appropriate use of get_numeric_value for depth validation.

The change from get_value to get_numeric_value is well-suited for depth validation since it ensures empty strings are handled consistently by returning 0, which will be properly validated by the subsequent Decimal conversion and range checks.

src/api/submission/validations/validators/obs_benthic_lit.py (1)

24-24: Consistent numeric value handling for surveyed length.

The change to get_numeric_value is appropriate here since len_surveyed is expected to be numeric (multiplied by 100 on line 25). The existing validation logic at line 29 properly handles the case where len_surveyed_cm <= 0, ensuring robust error handling.

src/api/submission/validations/validators/interval_start.py (1)

15-15: Proper numeric value retrieval for interval start validation.

The change to get_numeric_value is well-implemented here. The method handles empty strings by returning 0, which is then properly validated through the explicit float conversion and range checks. This maintains consistency with the other validators in this PR.

src/api/submission/validations/validators/interval_size.py (1)

15-15: Effective numeric value handling for interval size validation.

The change to get_numeric_value is appropriate for interval size validation. The method ensures empty strings are handled consistently by returning 0, which will be properly caught by the validation logic at line 25 that checks for interval_size <= 0. The explicit float conversion with error handling maintains robust validation.

Comment on lines +125 to +126
len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path)
interval_size = self.get_numeric_value(collect_record, self.interval_size_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify error handling for non-numeric values.

The get_numeric_value method returns 0 for missing/empty values but doesn't validate that the value is actually numeric. If len_surveyed or interval_size contain non-numeric strings, the mathematical operation on line 133 (math.ceil(len_surveyed / interval_size)) will fail with a TypeError.

Consider adding proper numeric validation or error handling to ensure these values are numbers before performing mathematical operations.

Let me check if there are any existing validations that ensure these values are numeric before reaching this validator:


🏁 Script executed:

#!/bin/bash
# Check for numeric validation of len_surveyed and interval_size fields
ast-grep --pattern 'def $_($$$) {
  $$$
  len_surveyed$$$
  $$$
}'

# Also check for field definitions and validators
rg -A 5 -B 5 "len_surveyed|interval_size" --type py

Length of output: 224515


🏁 Script executed:

#!/bin/bash
# Find the definition of get_numeric_value in the validators
grep -R "def get_numeric_value" -n src/api/submission/validations/validators

Length of output: 178


🏁 Script executed:

#!/bin/bash
# Display get_numeric_value implementation
sed -n '110,150p' src/api/submission/validations/validators/base.py

Length of output: 721


🏁 Script executed:

#!/bin/bash
# Show definition of validator_result decorator
grep -R "def validator_result" -n src/api/submission/validations/validators/base.py

Length of output: 116


🏁 Script executed:

#!/bin/bash
# Display the validator_result decorator implementation and error handling
sed -n '1,80p' src/api/submission/validations/validators/base.py

Length of output: 2186


Add explicit numeric casting and error handling in BenthicIntervalObservationCountValidator

The current implementation in
src/api/submission/validations/validators/benthic_transect.py (lines 125–126)
uses get_numeric_value, which may return non-numeric types (e.g. a string) and lead to a TypeError when you do comparisons or math.ceil. Although field-level validators ensure these values are numeric, it’s safer for this record-level validator to guard against bad input directly.

Suggested change:

-    len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path)
-    interval_size = self.get_numeric_value(collect_record, self.interval_size_path)
+    # ensure we have actual numbers before doing math
+    try:
+        len_surveyed = float(self.get_numeric_value(collect_record, self.len_surveyed_path))
+    except (TypeError, ValueError):
+        return ERROR, self.NON_POSITIVE.format("len_surveyed")
+    try:
+        interval_size = float(self.get_numeric_value(collect_record, self.interval_size_path))
+    except (TypeError, ValueError):
+        return ERROR, self.NON_POSITIVE.format("interval_size")

This will catch any unexpected non-numeric values and return a clear validation error instead of allowing a TypeError to bubble up.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path)
interval_size = self.get_numeric_value(collect_record, self.interval_size_path)
# ensure we have actual numbers before doing math
try:
len_surveyed = float(self.get_numeric_value(collect_record, self.len_surveyed_path))
except (TypeError, ValueError):
return ERROR, self.NON_POSITIVE.format("len_surveyed")
try:
interval_size = float(self.get_numeric_value(collect_record, self.interval_size_path))
except (TypeError, ValueError):
return ERROR, self.NON_POSITIVE.format("interval_size")
🤖 Prompt for AI Agents
In src/api/submission/validations/validators/benthic_transect.py around lines
125 to 126, the variables len_surveyed and interval_size are assigned using
get_numeric_value but may not be explicitly cast to numeric types, risking
TypeErrors during operations. Modify the code to explicitly cast these values to
float or int within a try-except block to catch ValueError or TypeError. If
casting fails, return a clear validation error message instead of allowing an
exception to propagate.

num_quadrats_max = validator.limit_value

num_quadrats = self.get_value(collect_record, self.num_quadrats_path) or 0
num_quadrats = self.get_numeric_value(collect_record, self.num_quadrats_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify error handling for non-numeric num_quadrats.

The num_quadrats value is used in mathematical operations on line 131 (range(quadrat_number_start, quadrat_number_start + num_quadrats)) without exception handling. If get_numeric_value returns a non-numeric string, this will cause a TypeError.

Consider adding validation to ensure num_quadrats is numeric before using it in mathematical operations, or add appropriate error handling.

Let me check if there are upstream validations for the num_quadrats field:


🏁 Script executed:

#!/bin/bash
# Check for numeric validation of num_quadrats field
ast-grep --pattern 'num_quadrats$$$'

# Check field definitions and validators
rg -A 5 -B 5 "num_quadrats" --type py | head -50

Length of output: 4722


🏁 Script executed:

#!/bin/bash
# Locate PositiveIntegerValidator definition
rg "class PositiveIntegerValidator" -A 20 --type py

# Locate QuadratCountValidator definition
rg "class QuadratCountValidator" -A 20 --type py

Length of output: 4140


🏁 Script executed:

#!/bin/bash
# Locate BaseValidator to find get_numeric_value implementation
rg "class BaseValidator" -A 50 --type py

# Locate get_numeric_value definition
rg "def get_numeric_value" -A 20 --type py

Length of output: 3722


Ensure num_quadrats is an integer before using it in range()

Although PositiveIntegerValidator enforces that data.quadrat_transect.num_quadrats is a non-negative integer at the field level, the record-level QuadratCountValidator still retrieves its value via get_numeric_value (which merely returns the raw data). If that raw value isn’t actually an int, calling

range(quadrat_number_start, quadrat_number_start + num_quadrats)

will raise a TypeError.

Please harden the validator in src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (around line 131):

  • Add an explicit check (isinstance(num_quadrats, int)) or cast (num_quadrats = int(num_quadrats)) with a try/except to catch conversion failures.
  • If conversion fails, return a validation error (e.g., use self.DIFFERENT_NUMBER_OF_QUADRATS or a new error code) instead of letting an exception bubble up.
  • Alternatively, wrap the range(...) call in try/except TypeError and return a clear validation result.

This will ensure the record-level check never crashes on unexpected input.

🤖 Prompt for AI Agents
In src/api/submission/validations/validators/obs_benthic_photo_quadrat.py around
line 115, the variable num_quadrats obtained from get_numeric_value may not be
an integer, causing a TypeError when used in range(). To fix this, add an
explicit check or attempt to cast num_quadrats to int inside a try/except block.
If casting fails, return a validation error using
self.DIFFERENT_NUMBER_OF_QUADRATS or a similar error code instead of letting the
exception propagate. This ensures the validator handles unexpected input
gracefully without crashing.

Comment on lines +36 to 38
num_points = self.get_numeric_value(obs, self.observation_num_points_path)
except (TypeError, ValueError):
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Review exception handling behavior with get_numeric_value.

The get_numeric_value method returns 0 for missing/empty values and returns the value as-is otherwise. If the value is a non-numeric string, it won't raise TypeError or ValueError as the original code might have expected.

This could change the behavior of the try-except block. Previously, if the value couldn't be used in the addition operation quadrat_number_groups[quadrat_number] += num_points, it would catch the exception and continue. Now, non-numeric strings would be returned by get_numeric_value and likely cause the same TypeError in the addition operation.

Verify that this behavior change is intentional and that the exception handling still works as expected.

Let me check what the original get_value method returned and how it was used:


🏁 Script executed:

#!/bin/bash
# Check the original get_value method behavior
ast-grep --pattern 'def get_value($$$) {
  $$$
}'

# Check for any numeric conversion around num_points usage
rg -A 10 -B 5 "num_points.*=" --type py

Length of output: 43054


🏁 Script executed:

#!/bin/bash
set -e
# Locate definitions of get_numeric_value and get_value
rg -n "def get_numeric_value" --type py
rg -n "def get_value" --type py

Length of output: 355


🏁 Script executed:

#!/bin/bash
set -e
# Show get_numeric_value implementation
rg -n -C5 "def get_numeric_value" src/api/submission/validations/validators/base.py

# Show validator’s get_value implementation
rg -n -C5 "def get_value" src/api/submission/validations/validators/base.py

# Show utility get_value implementation
rg -n -C5 "def get_value" src/api/utils/__init__.py

Length of output: 1268


🏁 Script executed:

#!/bin/bash
set -e
# Show full get_numeric_value implementation
rg -n -C20 "def get_numeric_value" src/api/submission/validations/validators/base.py

Length of output: 993


Evaluate and adjust exception handling around get_numeric_value.

The new get_numeric_value implementation in src/api/submission/validations/validators/base.py (lines 127–135) always returns an integer (0 for missing/empty) or the raw value and never raises TypeError or ValueError. Consequently:

  • The except (TypeError, ValueError) around the call to get_numeric_value in
    src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (lines 36–38) is now redundant.
  • Any TypeError from adding a non-integer (if it ever occurs) would surface at
    quadrat_number_groups[quadrat_number] += num_points and not be caught.

Since all ingestion serializers guarantee integer num_points, this is safe in practice. However, to keep the code defensively correct, you should either:

  • Remove the unused try/except block around get_numeric_value, or
  • Broaden the try to wrap both the call and the addition, or
  • Add an explicit type check on num_points before the addition.

Locations to update:

  • src/api/submission/validations/validators/obs_benthic_photo_quadrat.py: lines 36–38
🤖 Prompt for AI Agents
In src/api/submission/validations/validators/obs_benthic_photo_quadrat.py around
lines 36 to 38, the try/except block catching TypeError and ValueError around
get_numeric_value is now redundant because get_numeric_value never raises these
exceptions. To fix this, remove the try/except block and either rely on the
guaranteed integer return or add a type check on num_points before using it in
the addition to quadrat_number_groups. This will keep the code clean and
defensively correct.

@saanobhaai saanobhaai merged commit bf8ea7d into master Jul 15, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants