[M1580] handle empty strings during validation for numeric values by saanobhaai · Pull Request #606 · data-mermaid/mermaid-api

saanobhaai · 2025-07-15T16:34:19Z

Summary by CodeRabbit

Bug Fixes
- Improved handling of numeric fields in various validation checks, ensuring missing or invalid values default to zero and are processed consistently across the application. This enhances reliability when validating numeric data in submissions.

coderabbitai · 2025-07-15T16:34:26Z

Walkthrough

A new method, get_numeric_value, was added to the BaseValidator class to standardize numeric value extraction from records. Multiple validator modules were updated to use this new method instead of get_value, removing fallback defaults and unifying the handling of missing or invalid numeric fields during validation.

Changes

File(s)	Change Summary
src/api/submission/validations/validators/base.py	Added `get_numeric_value` method to `BaseValidator` for extracting numeric values with specific fallback behavior.
src/api/submission/validations/validators/benthic_transect.py src/api/submission/validations/validators/depth.py src/api/submission/validations/validators/interval_size.py src/api/submission/validations/validators/interval_start.py src/api/submission/validations/validators/obs_benthic_lit.py src/api/submission/validations/validators/obs_benthic_photo_quadrat.py	Replaced usage of `get_value` (with fallback defaults) with `get_numeric_value` for numeric field extraction.

Sequence Diagram(s)

sequenceDiagram
    participant Validator as Validator (e.g., DepthValidator)
    participant BaseValidator as BaseValidator
    participant Record as Record (input data)

    Validator->>BaseValidator: get_numeric_value(record, key)
    BaseValidator->>Record: Retrieve value using dot-notated key
    alt Key missing or value is None/empty string
        BaseValidator-->>Validator: Return 0
    else Value found
        BaseValidator-->>Validator: Return value as is
    end
    Validator->>Validator: Continue validation logic with numeric value

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

saanobhaai · 2025-07-15T16:34:38Z

validation empty string type error

coderabbitai

Actionable comments posted: 3

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18d5687 and 3ce2d14.

📒 Files selected for processing (7)

src/api/submission/validations/validators/base.py (1 hunks)
src/api/submission/validations/validators/benthic_transect.py (1 hunks)
src/api/submission/validations/validators/depth.py (1 hunks)
src/api/submission/validations/validators/interval_size.py (1 hunks)
src/api/submission/validations/validators/interval_start.py (1 hunks)
src/api/submission/validations/validators/obs_benthic_lit.py (1 hunks)
src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (2 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (4)

src/api/submission/validations/validators/depth.py (1)

src/api/submission/validations/validators/base.py (1)

get_numeric_value (127-135)

src/api/submission/validations/validators/obs_benthic_lit.py (1)

src/api/submission/validations/validators/base.py (1)

get_numeric_value (127-135)

src/api/submission/validations/validators/interval_start.py (2)

src/api/submission/validations/validators/base.py (1)

get_numeric_value (127-135)

src/api/models/classification.py (1)

collect_record (126-127)

src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (2)

src/api/submission/validations/validators/base.py (1)

get_numeric_value (127-135)

src/api/models/classification.py (1)

collect_record (126-127)

🔇 Additional comments (5)

src/api/submission/validations/validators/base.py (1)

127-135: Well-implemented numeric value retrieval method.

The get_numeric_value method effectively addresses the PR objective by consistently handling empty strings and None values by returning 0. The implementation properly handles KeyError exceptions and maintains consistency with the existing get_value method by using dotty for nested dictionary access.

src/api/submission/validations/validators/depth.py (1)

19-19: Appropriate use of get_numeric_value for depth validation.

The change from get_value to get_numeric_value is well-suited for depth validation since it ensures empty strings are handled consistently by returning 0, which will be properly validated by the subsequent Decimal conversion and range checks.

src/api/submission/validations/validators/obs_benthic_lit.py (1)

24-24: Consistent numeric value handling for surveyed length.

The change to get_numeric_value is appropriate here since len_surveyed is expected to be numeric (multiplied by 100 on line 25). The existing validation logic at line 29 properly handles the case where len_surveyed_cm <= 0, ensuring robust error handling.

src/api/submission/validations/validators/interval_start.py (1)

15-15: Proper numeric value retrieval for interval start validation.

The change to get_numeric_value is well-implemented here. The method handles empty strings by returning 0, which is then properly validated through the explicit float conversion and range checks. This maintains consistency with the other validators in this PR.

src/api/submission/validations/validators/interval_size.py (1)

15-15: Effective numeric value handling for interval size validation.

The change to get_numeric_value is appropriate for interval size validation. The method ensures empty strings are handled consistently by returning 0, which will be properly caught by the validation logic at line 25 that checks for interval_size <= 0. The explicit float conversion with error handling maintains robust validation.

coderabbitai · 2025-07-15T16:43:18Z

src/api/submission/validations/validators/benthic_transect.py

+        len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path)
+        interval_size = self.get_numeric_value(collect_record, self.interval_size_path)


💡 Verification agent

🧩 Analysis chain

Verify error handling for non-numeric values.

The get_numeric_value method returns 0 for missing/empty values but doesn't validate that the value is actually numeric. If len_surveyed or interval_size contain non-numeric strings, the mathematical operation on line 133 (math.ceil(len_surveyed / interval_size)) will fail with a TypeError.

Consider adding proper numeric validation or error handling to ensure these values are numbers before performing mathematical operations.

Let me check if there are any existing validations that ensure these values are numeric before reaching this validator:

🏁 Script executed:

#!/bin/bash # Check for numeric validation of len_surveyed and interval_size fields ast-grep --pattern 'def $_($$$) { $$$ len_surveyed$$$ $$$ }' # Also check for field definitions and validators rg -A 5 -B 5 "len_surveyed|interval_size" --type py

Length of output: 224515

🏁 Script executed:

#!/bin/bash # Find the definition of get_numeric_value in the validators grep -R "def get_numeric_value" -n src/api/submission/validations/validators

Length of output: 178

🏁 Script executed:

#!/bin/bash # Display get_numeric_value implementation sed -n '110,150p' src/api/submission/validations/validators/base.py

Length of output: 721

🏁 Script executed:

#!/bin/bash # Show definition of validator_result decorator grep -R "def validator_result" -n src/api/submission/validations/validators/base.py

Length of output: 116

🏁 Script executed:

#!/bin/bash # Display the validator_result decorator implementation and error handling sed -n '1,80p' src/api/submission/validations/validators/base.py

Length of output: 2186

Add explicit numeric casting and error handling in BenthicIntervalObservationCountValidator

The current implementation in
src/api/submission/validations/validators/benthic_transect.py (lines 125–126)
uses get_numeric_value, which may return non-numeric types (e.g. a string) and lead to a TypeError when you do comparisons or math.ceil. Although field-level validators ensure these values are numeric, it’s safer for this record-level validator to guard against bad input directly.

Suggested change:

- len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path) - interval_size = self.get_numeric_value(collect_record, self.interval_size_path) + # ensure we have actual numbers before doing math + try: + len_surveyed = float(self.get_numeric_value(collect_record, self.len_surveyed_path)) + except (TypeError, ValueError): + return ERROR, self.NON_POSITIVE.format("len_surveyed") + try: + interval_size = float(self.get_numeric_value(collect_record, self.interval_size_path)) + except (TypeError, ValueError): + return ERROR, self.NON_POSITIVE.format("interval_size")

This will catch any unexpected non-numeric values and return a clear validation error instead of allowing a TypeError to bubble up.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path)

interval_size = self.get_numeric_value(collect_record, self.interval_size_path)

# ensure we have actual numbers before doing math

try:

len_surveyed = float(self.get_numeric_value(collect_record, self.len_surveyed_path))

except (TypeError, ValueError):

return ERROR, self.NON_POSITIVE.format("len_surveyed")

try:

interval_size = float(self.get_numeric_value(collect_record, self.interval_size_path))

except (TypeError, ValueError):

return ERROR, self.NON_POSITIVE.format("interval_size")

🤖 Prompt for AI Agents

In src/api/submission/validations/validators/benthic_transect.py around lines 125 to 126, the variables len_surveyed and interval_size are assigned using get_numeric_value but may not be explicitly cast to numeric types, risking TypeErrors during operations. Modify the code to explicitly cast these values to float or int within a try-except block to catch ValueError or TypeError. If casting fails, return a clear validation error message instead of allowing an exception to propagate.

coderabbitai · 2025-07-15T16:43:18Z

src/api/submission/validations/validators/obs_benthic_photo_quadrat.py

                num_quadrats_max = validator.limit_value

-        num_quadrats = self.get_value(collect_record, self.num_quadrats_path) or 0
+        num_quadrats = self.get_numeric_value(collect_record, self.num_quadrats_path)


💡 Verification agent

🧩 Analysis chain

Verify error handling for non-numeric num_quadrats.

The num_quadrats value is used in mathematical operations on line 131 (range(quadrat_number_start, quadrat_number_start + num_quadrats)) without exception handling. If get_numeric_value returns a non-numeric string, this will cause a TypeError.

Consider adding validation to ensure num_quadrats is numeric before using it in mathematical operations, or add appropriate error handling.

Let me check if there are upstream validations for the num_quadrats field:

🏁 Script executed:

#!/bin/bash # Check for numeric validation of num_quadrats field ast-grep --pattern 'num_quadrats$$$' # Check field definitions and validators rg -A 5 -B 5 "num_quadrats" --type py | head -50

Length of output: 4722

🏁 Script executed:

#!/bin/bash # Locate PositiveIntegerValidator definition rg "class PositiveIntegerValidator" -A 20 --type py # Locate QuadratCountValidator definition rg "class QuadratCountValidator" -A 20 --type py

Length of output: 4140

🏁 Script executed:

#!/bin/bash # Locate BaseValidator to find get_numeric_value implementation rg "class BaseValidator" -A 50 --type py # Locate get_numeric_value definition rg "def get_numeric_value" -A 20 --type py

Length of output: 3722

Ensure num_quadrats is an integer before using it in range()

Although PositiveIntegerValidator enforces that data.quadrat_transect.num_quadrats is a non-negative integer at the field level, the record-level QuadratCountValidator still retrieves its value via get_numeric_value (which merely returns the raw data). If that raw value isn’t actually an int, calling

range(quadrat_number_start, quadrat_number_start + num_quadrats)

will raise a TypeError.

Please harden the validator in src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (around line 131):

Add an explicit check (isinstance(num_quadrats, int)) or cast (num_quadrats = int(num_quadrats)) with a try/except to catch conversion failures.

If conversion fails, return a validation error (e.g., use self.DIFFERENT_NUMBER_OF_QUADRATS or a new error code) instead of letting an exception bubble up.

Alternatively, wrap the range(...) call in try/except TypeError and return a clear validation result.

This will ensure the record-level check never crashes on unexpected input.

🤖 Prompt for AI Agents

In src/api/submission/validations/validators/obs_benthic_photo_quadrat.py around line 115, the variable num_quadrats obtained from get_numeric_value may not be an integer, causing a TypeError when used in range(). To fix this, add an explicit check or attempt to cast num_quadrats to int inside a try/except block. If casting fails, return a validation error using self.DIFFERENT_NUMBER_OF_QUADRATS or a similar error code instead of letting the exception propagate. This ensures the validator handles unexpected input gracefully without crashing.

coderabbitai · 2025-07-15T16:43:18Z

src/api/submission/validations/validators/obs_benthic_photo_quadrat.py

+                num_points = self.get_numeric_value(obs, self.observation_num_points_path)
            except (TypeError, ValueError):
                continue


💡 Verification agent

🧩 Analysis chain

Review exception handling behavior with get_numeric_value.

The get_numeric_value method returns 0 for missing/empty values and returns the value as-is otherwise. If the value is a non-numeric string, it won't raise TypeError or ValueError as the original code might have expected.

This could change the behavior of the try-except block. Previously, if the value couldn't be used in the addition operation quadrat_number_groups[quadrat_number] += num_points, it would catch the exception and continue. Now, non-numeric strings would be returned by get_numeric_value and likely cause the same TypeError in the addition operation.

Verify that this behavior change is intentional and that the exception handling still works as expected.

Let me check what the original get_value method returned and how it was used:

🏁 Script executed:

#!/bin/bash # Check the original get_value method behavior ast-grep --pattern 'def get_value($$$) { $$$ }' # Check for any numeric conversion around num_points usage rg -A 10 -B 5 "num_points.*=" --type py

Length of output: 43054

🏁 Script executed:

#!/bin/bash set -e # Locate definitions of get_numeric_value and get_value rg -n "def get_numeric_value" --type py rg -n "def get_value" --type py

Length of output: 355

🏁 Script executed:

#!/bin/bash set -e # Show get_numeric_value implementation rg -n -C5 "def get_numeric_value" src/api/submission/validations/validators/base.py # Show validator’s get_value implementation rg -n -C5 "def get_value" src/api/submission/validations/validators/base.py # Show utility get_value implementation rg -n -C5 "def get_value" src/api/utils/__init__.py

Length of output: 1268

🏁 Script executed:

#!/bin/bash set -e # Show full get_numeric_value implementation rg -n -C20 "def get_numeric_value" src/api/submission/validations/validators/base.py

Length of output: 993

Evaluate and adjust exception handling around get_numeric_value.

The new get_numeric_value implementation in src/api/submission/validations/validators/base.py (lines 127–135) always returns an integer (0 for missing/empty) or the raw value and never raises TypeError or ValueError. Consequently:

The except (TypeError, ValueError) around the call to get_numeric_value in
src/api/submission/validations/validators/obs_benthic_photo_quadrat.py (lines 36–38) is now redundant.

Any TypeError from adding a non-integer (if it ever occurs) would surface at
quadrat_number_groups[quadrat_number] += num_points and not be caught.

Since all ingestion serializers guarantee integer num_points, this is safe in practice. However, to keep the code defensively correct, you should either:

Remove the unused try/except block around get_numeric_value, or

Broaden the try to wrap both the call and the addition, or

Add an explicit type check on num_points before the addition.

Locations to update:

src/api/submission/validations/validators/obs_benthic_photo_quadrat.py: lines 36–38

🤖 Prompt for AI Agents

In src/api/submission/validations/validators/obs_benthic_photo_quadrat.py around lines 36 to 38, the try/except block catching TypeError and ValueError around get_numeric_value is now redundant because get_numeric_value never raises these exceptions. To fix this, remove the try/except block and either rely on the guaranteed integer return or add a type check on num_points before using it in the addition to quadrat_number_groups. This will keep the code clean and defensively correct.

handle empty strings during validation for numeric values

3ce2d14

saanobhaai requested a review from gridcell July 15, 2025 16:34

coderabbitai bot reviewed Jul 15, 2025

View reviewed changes

gridcell approved these changes Jul 15, 2025

View reviewed changes

saanobhaai merged commit bf8ea7d into master Jul 15, 2025
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[M1580] handle empty strings during validation for numeric values#606

[M1580] handle empty strings during validation for numeric values#606
saanobhaai merged 1 commit intomasterfrom
M1580

saanobhaai commented Jul 15, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jul 15, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

saanobhaai commented Jul 15, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jul 15, 2025

Uh oh!

coderabbitai bot Jul 15, 2025

Uh oh!

coderabbitai bot Jul 15, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path)
		interval_size = self.get_numeric_value(collect_record, self.interval_size_path)

-        len_surveyed = self.get_numeric_value(collect_record, self.len_surveyed_path)
-        interval_size = self.get_numeric_value(collect_record, self.interval_size_path)
+        # ensure we have actual numbers before doing math
+        try:
+            len_surveyed = float(self.get_numeric_value(collect_record, self.len_surveyed_path))
+        except (TypeError, ValueError):
+            return ERROR, self.NON_POSITIVE.format("len_surveyed")
+        try:
+            interval_size = float(self.get_numeric_value(collect_record, self.interval_size_path))
+        except (TypeError, ValueError):
+            return ERROR, self.NON_POSITIVE.format("interval_size")

Conversation

saanobhaai commented Jul 15, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

saanobhaai commented Jul 15, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 15, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

saanobhaai commented Jul 15, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 15, 2025 •

edited

Loading