fix: add input validation to prevent unbounded storage writes#94
Conversation
WalkthroughAdds a new internal validation module and applies pre-insert checks for string length and serialized JSON size across group, message, and welcome storage; error handling updated to include a Validation variant and tests added/adjusted to cover new limits. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used📓 Path-based instructions (1)**/*.rs📄 CodeRabbit inference engine (.cursor/rules/rust.mdc)
Files:
🧬 Code graph analysis (1)crates/mdk-sqlite-storage/src/error.rs (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/mdk-sqlite-storage/src/groups.rs (1)
82-106: Consider serializing admin_pubkeys before acquiring the lock.The group name and description validations correctly occur before acquiring the lock (lines 82-91), which is efficient. However, the admin_pubkeys serialization happens after acquiring the lock at line 93 (serialization at lines 95-98, validation at 101-106).
For consistency with
messages.rsandwelcomes.rs, consider moving the serialization and validation of admin_pubkeys before the lock acquisition. This would avoid holding the lock during serialization and validation operations.🔎 Suggested refactor
- // Validate group name and description lengths validate_string_length(&group.name, MAX_GROUP_NAME_LENGTH, "Group name") .map_err(GroupError::InvalidParameters)?; validate_string_length( &group.description, MAX_GROUP_DESCRIPTION_LENGTH, "Group description", ) .map_err(GroupError::InvalidParameters)?; - let conn_guard = self.db_connection.lock().map_err(into_group_err)?; - + // Serialize and validate admin pubkeys before acquiring lock let admin_pubkeys_json: String = serde_json::to_string(&group.admin_pubkeys).map_err(|e| { GroupError::DatabaseError(format!("Failed to serialize admin pubkeys: {}", e)) })?; - // Validate admin pubkeys JSON size validate_size( admin_pubkeys_json.as_bytes(), MAX_ADMIN_PUBKEYS_JSON_SIZE, "Admin pubkeys JSON", ) .map_err(GroupError::InvalidParameters)?; + let conn_guard = self.db_connection.lock().map_err(into_group_err)?; + let last_message_id: Option<&[u8; 32]> = group.last_message_id.as_ref().map(|id| id.as_bytes());
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
crates/mdk-sqlite-storage/src/groups.rscrates/mdk-sqlite-storage/src/lib.rscrates/mdk-sqlite-storage/src/messages.rscrates/mdk-sqlite-storage/src/validation.rscrates/mdk-sqlite-storage/src/welcomes.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.cursor/rules/rust.mdc)
**/*.rs: All trait bounds should be written inwhereclauses rather than inline with type parameters in Rust
UseSelfinstead of the explicit type name when referring to the type in impl blocks
DeriveDebug,Clone,Copy,PartialEq,Eq, andHashfor public types when possible, in that order
DeriveDefaultfor public types when there is a reasonable default value
Always use full paths for logging withtracing::<op>!(...)instead of importing and using<op>!(...)
Prefer.to_string()or.to_owned()for&strtoStringconversion instead of.into()orString::from()
Order imports in the following sequence: (1) core/alloc/std, (2) external crates, (3) current sub-modules with mod declarations, (4) internal crate modules
Import thefmtmodule rather than using full paths when implementing traits fromcore::fmtorstd::fmt
Use relativeself::imports for sub-modules rather than bare module names
Place allusestatements at the top of the file following the specified import order, never inside functions, methods, or other blocks
Avoidif let ... { } else { }constructs; usematchinstead when both branches are non-empty
Useif let ... { }without an else clause when a match arm would be intentionally empty
Define sub-modules using separate files (e.g.,x.rs) withmod x;declaration rather than inlinemod x { .. }blocks, except fortestsandbenchesmodules
Use inlinemod tests { #[cfg(test)] ... }for test modules rather than creating separatetests.rsfiles
Use inlinemod benches { #[cfg(bench)] ... }for benchmark modules rather than creating separatebenches.rsfiles
Files:
crates/mdk-sqlite-storage/src/welcomes.rscrates/mdk-sqlite-storage/src/messages.rscrates/mdk-sqlite-storage/src/lib.rscrates/mdk-sqlite-storage/src/groups.rscrates/mdk-sqlite-storage/src/validation.rs
🧠 Learnings (1)
📚 Learning: 2025-12-03T10:18:10.432Z
Learnt from: CR
Repo: marmot-protocol/mdk PR: 0
File: .cursor/rules/rust.mdc:0-0
Timestamp: 2025-12-03T10:18:10.432Z
Learning: Applies to **/*.rs : Use inline `mod tests { #[cfg(test)] ... }` for test modules rather than creating separate `tests.rs` files
Applied to files:
crates/mdk-sqlite-storage/src/welcomes.rs
🧬 Code graph analysis (1)
crates/mdk-sqlite-storage/src/messages.rs (1)
crates/mdk-sqlite-storage/src/validation.rs (2)
validate_size(28-38)validate_string_length(42-52)
🔇 Additional comments (8)
crates/mdk-sqlite-storage/src/lib.rs (1)
27-27: LGTM: Internal validation module added.The module declaration follows the coding guideline for separate file module declarations and is appropriately placed with other internal modules.
crates/mdk-sqlite-storage/src/groups.rs (2)
14-17: LGTM: Validation imports added.The imports follow the coding guidelines with correct import order and path usage.
368-434: LGTM: Comprehensive validation tests.The tests effectively verify that oversized group names and descriptions are rejected with appropriate error messages. The test coverage adequately validates the new input constraints.
crates/mdk-sqlite-storage/src/messages.rs (2)
25-48: LGTM: Validation logic correctly prevents unbounded writes.The validation pattern correctly enforces size limits before database operations:
- Message content is validated before lock acquisition (efficient)
- Tags and event JSON are serialized and validated after lock acquisition but before DB write (acceptable since serialization is fast)
- Error messages are informative and include actual vs. maximum sizes
258-316: LGTM: Test validates content size enforcement.The test effectively verifies that message content exceeding the 1MB limit is rejected with an appropriate error message.
crates/mdk-sqlite-storage/src/welcomes.rs (2)
26-71: LGTM: Validation logic prevents unbounded storage writes.The validation pattern correctly enforces size limits:
- Group name and description are validated before lock acquisition (efficient)
- JSON serialization and validation occur after lock acquisition but before DB write (acceptable)
- All validations use appropriate error types and messages
267-318: LGTM: Test validates group name length enforcement.The test effectively verifies that welcome messages with oversized group names are rejected with appropriate error messages.
crates/mdk-sqlite-storage/src/validation.rs (1)
26-52: LGTM: Validation functions are well-implemented.The validation helper functions are:
- Appropriately marked
#[inline]for performance in hot paths- Simple and focused on single responsibilities
- Provide clear, informative error messages with actual vs. expected values
- Follow a consistent error message format
✅ Coverage: 86.12% → 86.28% (+0.16%) |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/mdk-storage-traits/tests/shared/group_tests.rs (1)
3-11: Reorder imports to follow coding guidelines.The imports should follow this sequence: (1) core/alloc/std, (2) external crates, (3) current sub-modules with mod declarations, (4) internal crate modules. Currently, the nostr import (external) appears after mdk_storage_traits imports (internal).
As per coding guidelines, the imports should be reordered.
🔎 Proposed reordering
use std::collections::BTreeSet; +use nostr::RelayUrl; + +use super::create_test_group; use mdk_storage_traits::GroupId; use mdk_storage_traits::groups::GroupStorage; use mdk_storage_traits::groups::error::GroupError; use mdk_storage_traits::groups::types::GroupExporterSecret; -use nostr::RelayUrl; - -use super::create_test_group;
🧹 Nitpick comments (1)
crates/mdk-storage-traits/tests/shared/group_tests.rs (1)
283-291: Test updated appropriately for new validation limits.The change from 1000 to 250 characters correctly aligns with the new MAX_GROUP_NAME_LENGTH = 255 validation. The test now verifies that names within the limit work as expected.
Optional: Consider testing closer to the boundary.
The test uses 250 bytes, which leaves some margin. Consider testing at exactly 255 bytes to ensure the boundary condition works correctly:
🔎 Optional boundary test suggestion
- // Test saving group with long name (within limits) - let long_name = "a".repeat(250); // Within 255 byte limit + // Test saving group with name at maximum length + let long_name = "a".repeat(255); // Exactly at 255 byte limitAdditionally, consider adding a test case with multi-byte UTF-8 characters to verify that the validation correctly checks byte length rather than character count. For example, a string with 128 two-byte characters (256 bytes total) should be rejected even though it's only 128 characters.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/mdk-sqlite-storage/src/validation.rscrates/mdk-storage-traits/tests/shared/group_tests.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/mdk-sqlite-storage/src/validation.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.cursor/rules/rust.mdc)
**/*.rs: All trait bounds should be written inwhereclauses rather than inline with type parameters in Rust
UseSelfinstead of the explicit type name when referring to the type in impl blocks
DeriveDebug,Clone,Copy,PartialEq,Eq, andHashfor public types when possible, in that order
DeriveDefaultfor public types when there is a reasonable default value
Always use full paths for logging withtracing::<op>!(...)instead of importing and using<op>!(...)
Prefer.to_string()or.to_owned()for&strtoStringconversion instead of.into()orString::from()
Order imports in the following sequence: (1) core/alloc/std, (2) external crates, (3) current sub-modules with mod declarations, (4) internal crate modules
Import thefmtmodule rather than using full paths when implementing traits fromcore::fmtorstd::fmt
Use relativeself::imports for sub-modules rather than bare module names
Place allusestatements at the top of the file following the specified import order, never inside functions, methods, or other blocks
Avoidif let ... { } else { }constructs; usematchinstead when both branches are non-empty
Useif let ... { }without an else clause when a match arm would be intentionally empty
Define sub-modules using separate files (e.g.,x.rs) withmod x;declaration rather than inlinemod x { .. }blocks, except fortestsandbenchesmodules
Use inlinemod tests { #[cfg(test)] ... }for test modules rather than creating separatetests.rsfiles
Use inlinemod benches { #[cfg(bench)] ... }for benchmark modules rather than creating separatebenches.rsfiles
Files:
crates/mdk-storage-traits/tests/shared/group_tests.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test Coverage
85cdfc3 to
785e442
Compare
✅ Coverage: 86.12% → 86.28% (+0.16%) |
✅ Coverage: 86.12% → 86.25% (+0.13%) |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
crates/mdk-sqlite-storage/src/validation.rs (1)
26-39: Consider deriving additional traits for ValidationError.The current derives are minimal (Debug via thiserror::Error). According to coding guidelines, consider adding
Clone,PartialEq, andEqfor improved usability in testing and error handling contexts.🔎 Suggested trait additions
-#[derive(Debug, thiserror::Error)] +#[derive(Debug, Clone, PartialEq, Eq, thiserror::Error)] pub enum ValidationError {As per coding guidelines, derive order should be: Debug, Clone, Copy, PartialEq, Eq, Hash (omitting Copy and Hash here since String is not Copy and Hash isn't needed).
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
crates/mdk-sqlite-storage/Cargo.tomlcrates/mdk-sqlite-storage/src/groups.rscrates/mdk-sqlite-storage/src/messages.rscrates/mdk-sqlite-storage/src/validation.rscrates/mdk-sqlite-storage/src/welcomes.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/mdk-sqlite-storage/src/welcomes.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.cursor/rules/rust.mdc)
**/*.rs: All trait bounds should be written inwhereclauses rather than inline with type parameters in Rust
UseSelfinstead of the explicit type name when referring to the type in impl blocks
DeriveDebug,Clone,Copy,PartialEq,Eq, andHashfor public types when possible, in that order
DeriveDefaultfor public types when there is a reasonable default value
Always use full paths for logging withtracing::<op>!(...)instead of importing and using<op>!(...)
Prefer.to_string()or.to_owned()for&strtoStringconversion instead of.into()orString::from()
Order imports in the following sequence: (1) core/alloc/std, (2) external crates, (3) current sub-modules with mod declarations, (4) internal crate modules
Import thefmtmodule rather than using full paths when implementing traits fromcore::fmtorstd::fmt
Use relativeself::imports for sub-modules rather than bare module names
Place allusestatements at the top of the file following the specified import order, never inside functions, methods, or other blocks
Avoidif let ... { } else { }constructs; usematchinstead when both branches are non-empty
Useif let ... { }without an else clause when a match arm would be intentionally empty
Define sub-modules using separate files (e.g.,x.rs) withmod x;declaration rather than inlinemod x { .. }blocks, except fortestsandbenchesmodules
Use inlinemod tests { #[cfg(test)] ... }for test modules rather than creating separatetests.rsfiles
Use inlinemod benches { #[cfg(bench)] ... }for benchmark modules rather than creating separatebenches.rsfiles
Files:
crates/mdk-sqlite-storage/src/messages.rscrates/mdk-sqlite-storage/src/validation.rscrates/mdk-sqlite-storage/src/groups.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test Coverage
🔇 Additional comments (8)
crates/mdk-sqlite-storage/Cargo.toml (1)
23-23: LGTM!The addition of
thiserroras a workspace dependency is appropriate for supporting the newValidationErrortype introduced in the validation module.crates/mdk-sqlite-storage/src/groups.rs (3)
14-17: LGTM!Import order follows the coding guidelines correctly: internal crate modules are imported after external crates.
82-106: Validation logic is correctly implemented.The validation sequence is well-structured:
- Validate string lengths first (name, description)
- Serialize admin_pubkeys to JSON
- Validate serialized JSON size
This ordering ensures early failure on invalid input before expensive serialization, and prevents unbounded JSON from being written to the database.
368-434: Test coverage effectively validates boundary conditions.The tests correctly verify that inputs exceeding limits (256 bytes for 255-byte limit, 2001 bytes for 2000-byte limit) trigger validation errors with appropriate error messages.
crates/mdk-sqlite-storage/src/messages.rs (2)
25-48: Validation implementation is correct and well-sequenced.The validation flow properly checks message content length before proceeding, then validates serialized JSON sizes (tags and event) before database insertion. This prevents unbounded data from reaching the database while maintaining clear error messages for each validation failure.
258-316: Test comprehensively validates content size limits.The test properly sets up all prerequisites (group creation) and verifies that content exceeding 1 MB triggers a validation error with the expected error message.
crates/mdk-sqlite-storage/src/validation.rs (2)
5-24: Size limits are well-chosen for preventing resource exhaustion.The constants provide reasonable upper bounds:
- 1 MB for message content
- 100 KB for JSON structures (tags, events)
- 255/2000 bytes for text fields (name/description)
- 50 KB for admin/relay JSON
These limits balance security (preventing disk/CPU exhaustion) with usability (generous enough for normal use cases).
41-76: Validation functions are correct; note unresolved design discussion.The inline validation helpers correctly check size limits and produce clear error messages. The functions properly handle UTF-8 byte counting (as documented).
However, there's an unresolved discussion from past review comments (lines 41-76) where @dannym-arx and @erskingardner suggested wrapping
ValidationErroras a variant within the domain error enums (e.g.,GroupError,MessageError) rather than converting to string via.map_err(|e| Error::InvalidParameters(e.to_string())).Current approach:
validate_string_length(...).map_err(|e| GroupError::InvalidParameters(e.to_string()))?Suggested alternative:
// Add to GroupError enum: // ValidationFailed(ValidationError) validate_string_length(...).map_err(GroupError::ValidationFailed)?The current string-based approach works but loses type information. Adding a
ValidationErrorvariant to each domain error enum would preserve type safety and enable programmatic error handling.Based on past review comments, should
ValidationErrorbe integrated as an error variant inGroupError,MessageError, etc., rather than converted to string?
✅ Coverage: 86.12% → 86.3% (+0.18%) |
Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.