-
Notifications
You must be signed in to change notification settings - Fork 182
fix: remove invalid chars in prometheus metrics #6065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRemoved the type name utility and its tests, dropped the convert_case dependency, and changed tipset state caches to require explicit per-cache identifier strings — updating call sites and removing previous default/derived cache naming and public re-exports. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant StateManager
participant TipsetCacheFactory as TipsetStateCache
participant LRU as LruCacheImpl
StateManager->>TipsetCacheFactory: new("state_output")
Note right of TipsetCacheFactory: construct name "tipset_state_state_output"
TipsetCacheFactory->>LRU: initialize with name & size
LRU-->>TipsetCacheFactory: cache instance
TipsetCacheFactory-->>StateManager: TipsetStateCache<StateOutputValue>
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks (1 passed, 2 warnings)❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing Touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/utils/misc/type.rs (3)
7-10: Regex cleanup: avoid zero-length global matches; fix typo; keep intent.
((... )|\s)*can match empty strings;replace_allwill still terminate, but this does extra work.- Rename
multi_understore_pattern→multi_underscore_pattern.Lean regexes with the same behavior:
- let prefix_pattern = lazy_regex::regex!(r#"(([^,:<>]+::)|\s)*"#); + // Remove every `segment::` path prefix anywhere (keeps the final type identifiers) + let prefix_pattern = lazy_regex::regex!(r#"(?:[A-Za-z_][A-Za-z0-9_]*)::"#); @@ - let multi_understore_pattern = lazy_regex::regex!(r#"_{2,}"#); + let multi_underscore_pattern = lazy_regex::regex!(r#"_{2,}"#); @@ - let n = multi_understore_pattern.replace_all(n.as_ref(), "_"); + let n = multi_underscore_pattern.replace_all(n.as_ref(), "_");Also applies to: 14-14
12-18: Harden invariants for Prometheus: non-empty and non-digit-leading names.If this feeds metric/label names (not just values), enforce Prometheus constraints: only [A-Za-z0-9_], not empty, and not starting with a digit. Current code can yield an empty string for types like
()(becomes_→ trimmed to"").Apply tail-end normalization:
- n.as_ref() - .strip_suffix("_") - .unwrap_or(n.as_ref()) - .to_string() + { + // Trim both ends (underscores can accumulate at boundaries) + let mut s = n.as_ref().trim_matches('_').to_string(); + if s.is_empty() { + s.push_str("Unknown"); + } + if s.starts_with(|c: char| c.is_ascii_digit()) { + s.insert(0, '_'); + } + s + }Confirm intended target (metric name vs label value). If it’s a metric or label name, consider an assertion test ensuring:
- starts with [A-Za-z_] (metric name can also start with ‘_’),
- only [A-Za-z0-9_],
- non-empty.
30-31: Tests updated appropriately; consider adding invariants coverage.LGTM on changed expectations.
Add a couple of edge-case tests to lock in Prometheus-safety invariants:
#[test] fn test_short_type_name_edge_cases() { // Unit type should not produce empty let u = short_type_name::<()>(); assert!(!u.is_empty()); // If used as (label/metric) name, ensure allowed charset and leading char let name = short_type_name::<Option<&'static str>>(); assert!(name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')); // Metric/label name rule (skip if this is only a label value): assert!(name.starts_with(|c: char| c.is_ascii_alphabetic() || c == '_')); }Also applies to: 34-34, 36-36, 39-39
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/utils/misc/type.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: All lint checks
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
🔇 Additional comments (1)
src/utils/misc/type.rs (1)
4-6: Breaking API change (Cow<'static, str> → String): confirm external impact or keep compatibility.Repo scan: only internal callers found — tests at src/utils/misc/type.rs:29–40 and cache usage at src/state_manager/cache.rs:49; none use Cow-specific methods. Because short_type_name is public, this still breaks external consumers. Either keep compatibility (return Cow<'static, str>) or explicitly document the breaking change.
Suggested minimal compatibility diff:
@@ -use std::any::type_name; +use std::any::type_name; +use std::borrow::Cow; -pub fn short_type_name<T>() -> String { +pub fn short_type_name<T>() -> Cow<'static, str> { @@ - n.as_ref() - .strip_suffix("_") - .unwrap_or(n.as_ref()) - .to_string() + Cow::Owned( + n.as_ref() + .strip_suffix("_") + .unwrap_or(n.as_ref()) + .to_string() + ) }
4c432c8 to
7327a34
Compare
7327a34 to
2daa377
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/state_manager/cache.rs (1)
68-125: Pending-entry leak on compute error; remove frompendingwhen computation fails.If
compute().awaitreturns Err, the(key, mtx)remains inpendingforever unless another successful compute happens later. Clean it up on error to avoid unbounded growth.Apply this diff:
- CacheLookupStatus::Empty(mtx) => { - let _guard = mtx.lock().await; - match self.get(key) { + CacheLookupStatus::Empty(mtx) => { + let _guard = mtx.lock().await; + let key_owned = key.clone(); + match self.get(key) { Some(v) => { // While locking someone else computed the pending task crate::metrics::LRU_CACHE_HIT .get_or_create(&crate::metrics::values::STATE_MANAGER_TIPSET) .inc(); Ok(v) } None => { // Entry does not have state computed yet, compute value and fill the cache crate::metrics::LRU_CACHE_MISS .get_or_create(&crate::metrics::values::STATE_MANAGER_TIPSET) .inc(); - let value = compute().await?; - - // Write back to cache, release lock and return value - self.insert(key.clone(), value.clone()); - Ok(value) + let res = compute().await; + match res { + Ok(value) => { + // Write back to cache, release lock and return value + self.insert(key_owned.clone(), value.clone()); + Ok(value) + } + Err(e) => { + // Remove stale pending entry on failure + self.with_inner(|inner| { + inner.pending.retain(|(k, _)| k != &key_owned); + }); + Err(e) + } + } } } }
🧹 Nitpick comments (2)
src/state_manager/cache.rs (2)
31-33: Sanitize identifiers before constructing metric names.Call sites currently pass safe literals, but this API allows arbitrary &str. Guard against regressions by normalizing to [a-z0-9_], lowering case, and replacing any other char with '' before composing "tipset_state{...}".
Apply this diff to enforce sanitization at the point of name construction:
- fn cache_name(cache_identifier: &str) -> String { - format!("tipset_state_{cache_identifier}") - } + fn cache_name(cache_identifier: &str) -> String { + let mut out = String::with_capacity("tipset_state_".len() + cache_identifier.len()); + out.push_str("tipset_state_"); + for ch in cache_identifier.chars() { + match ch { + 'a'..='z' | '0'..='9' | '_' => out.push(ch), + 'A'..='Z' => out.push(ch.to_ascii_lowercase()), + _ => out.push('_'), + } + } + out + }
47-58: API change looks good; consider documenting allowed identifier charset.Add a brief rustdoc noting the identifier should be [a-z0-9_]+ to remain Prometheus-safe.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
Cargo.toml(0 hunks)src/state_manager/cache.rs(6 hunks)src/state_manager/mod.rs(1 hunks)src/utils/misc/mod.rs(0 hunks)src/utils/misc/type.rs(0 hunks)
💤 Files with no reviewable changes (3)
- Cargo.toml
- src/utils/misc/mod.rs
- src/utils/misc/type.rs
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (2)
src/state_manager/mod.rs (2)
src/chain/store/index.rs (1)
new(42-48)src/chain/store/chain_store.rs (2)
new(119-145)new(566-573)
src/state_manager/cache.rs (2)
src/utils/cache/lru.rs (2)
new_with_default_metrics_registry(96-101)cache(120-122)src/chain/store/index.rs (1)
new(42-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: All lint checks
🔇 Additional comments (5)
src/state_manager/cache.rs (4)
21-29: Per-cache identifiers: good step toward Prometheus-safe names.Explicit identifiers remove type-derived punctuation from metric names/labels and improve observability hygiene.
175-186: Good: stable, lowercase identifiers for per-cache metrics."events" and "receipts" are Prometheus-safe and consistent.
325-341: Tests updated to use explicit identifiers: LGTM.Covers basic, same-key concurrent, and different-keys concurrent paths with the new constructor.
Also applies to: 345-388, 392-430
21-29: Verify TipsetStateCache::new/with_size usage and metric namesrg returned "No files were searched" — verification incomplete. Run locally (or grant repo access) and paste outputs:
rg -nP '\bTipsetStateCache::new\s*\(\s*\)' --hidden -S || true rg -nP '\bTipsetStateCache::with_size\s*\(' -n -C2 --hidden -S || true rg -nP 'new_with_default_metrics_registry\([^)]*\)' -n -C2 --hidden -S || true rg -nP 'new_with_default_metrics_registry\(\s*"[^"]*[-./ <>%{}()#$@!~`+=,;][^"]*"' -n --hidden -S || trueCheck: src/state_manager/cache.rs lines 21-29, 31-33, 47-58, 175-186.
src/state_manager/mod.rs (1)
203-206: Explicit state cache identifier: good."state_output" keeps the metric label/name clean and consistent.
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit