fix: remove invalid chars in prometheus metrics #6065

hanabi1224 · 2025-09-10T23:56:48Z

Summary of changes

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Change checklist

I have performed a self-review of my own code,
I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
I have added tests that prove my fix is effective or that my feature works (if possible),
I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

Refactor
- Removed an internal type-name utility and its public re-export; cache interfaces now require explicit per-cache identifiers and cache names include those identifiers.
Chore
- Removed an unused dependency from the build configuration.
Tests
- Updated tests to construct caches with explicit identifiers.

coderabbitai · 2025-09-10T23:56:55Z

Walkthrough

Removed the type name utility and its tests, dropped the convert_case dependency, and changed tipset state caches to require explicit per-cache identifier strings — updating call sites and removing previous default/derived cache naming and public re-exports.

Changes

Cohort / File(s)	Summary of Changes
Removed type utility `src/utils/misc/type.rs`	Deleted file containing `pub fn short_type_name<T>() -> Cow<'static, str>` and its tests; removed the short-type-name logic and related tests.
Misc public exports `src/utils/misc/mod.rs`	Removed `mod r#type;` and `pub use r#type::*;`, no longer re-exporting the deleted module.
Build config `Cargo.toml`	Removed dependency `convert_case = "0.8"` from [dependencies].
Tipset state cache (implementation) `src/state_manager/cache.rs`	Changed cache construction to require a `cache_identifier: &str`: `TipsetStateCache::new(cache_identifier)` and `with_size(cache_identifier, cache_size)`; removed `Default` impls; cache naming now uses `tipset_state_{identifier}`; updated internal helpers and tests to pass explicit identifiers.
State manager usage `src/state_manager/mod.rs`	Updated call site: `TipsetStateCache::new()` → `TipsetStateCache::new("state_output")` when initializing the StateManager cache.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant StateManager
  participant TipsetCacheFactory as TipsetStateCache
  participant LRU as LruCacheImpl

  StateManager->>TipsetCacheFactory: new("state_output")
  Note right of TipsetCacheFactory: construct name "tipset_state_state_output"
  TipsetCacheFactory->>LRU: initialize with name & size
  LRU-->>TipsetCacheFactory: cache instance
  TipsetCacheFactory-->>StateManager: TipsetStateCache<StateOutputValue>

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: track TipsetStateCache size with prometheus metrics #5912 — Removes short_type_name and convert_case dep and switches cache naming to explicit identifiers (strong overlap with these changes).

Suggested reviewers

LesnyRumcajs
sudo-shashank

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title Check	⚠️ Warning	The PR title claims a fix for "invalid chars in prometheus metrics", but the provided diff and summaries show removing a type utility (short_type_name), dropping the convert_case dependency, and refactoring TipsetStateCache to require cache identifiers; there are no changes touching Prometheus or metric sanitization. Because the title describes an issue not present in the changeset, it misrepresents the PR and will mislead reviewers. Therefore the title does not accurately reflect the primary changes.	Rename the PR to accurately summarise the actual changes (for example: "refactor(state_manager): add per-cache identifiers and remove short_type_name utility; drop convert_case dependency") and update the PR description to list the modified files, API changes, and any breaking changes. If the intended Prometheus fix is missing, either add those changes to this branch or create a separate PR for the metrics fix, and ensure tests and CHANGELOG entries are updated accordingly.
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch hm/no-invalid-chars-in-metrics

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

src/utils/misc/type.rs (3)
7-10: Regex cleanup: avoid zero-length global matches; fix typo; keep intent.

((... )|\s)* can match empty strings; replace_all will still terminate, but this does extra work.

Rename multi_understore_pattern → multi_underscore_pattern.

Lean regexes with the same behavior:
-    let prefix_pattern = lazy_regex::regex!(r#"(([^,:<>]+::)|\s)*"#);
+    // Remove every `segment::` path prefix anywhere (keeps the final type identifiers)
+    let prefix_pattern = lazy_regex::regex!(r#"(?:[A-Za-z_][A-Za-z0-9_]*)::"#);
@@
-    let multi_understore_pattern = lazy_regex::regex!(r#"_{2,}"#);
+    let multi_underscore_pattern = lazy_regex::regex!(r#"_{2,}"#);
@@
-    let n = multi_understore_pattern.replace_all(n.as_ref(), "_");
+    let n = multi_underscore_pattern.replace_all(n.as_ref(), "_");
Also applies to: 14-14

12-18: Harden invariants for Prometheus: non-empty and non-digit-leading names.

If this feeds metric/label names (not just values), enforce Prometheus constraints: only [A-Za-z0-9_], not empty, and not starting with a digit. Current code can yield an empty string for types like () (becomes _ → trimmed to "").

Apply tail-end normalization:
-    n.as_ref()
-        .strip_suffix("_")
-        .unwrap_or(n.as_ref())
-        .to_string()
+    {
+        // Trim both ends (underscores can accumulate at boundaries)
+        let mut s = n.as_ref().trim_matches('_').to_string();
+        if s.is_empty() {
+            s.push_str("Unknown");
+        }
+        if s.starts_with(|c: char| c.is_ascii_digit()) {
+            s.insert(0, '_');
+        }
+        s
+    }
Confirm intended target (metric name vs label value). If it’s a metric or label name, consider an assertion test ensuring:

starts with [A-Za-z_] (metric name can also start with ‘_’),

only [A-Za-z0-9_],

non-empty.

30-31: Tests updated appropriately; consider adding invariants coverage.

LGTM on changed expectations.

Add a couple of edge-case tests to lock in Prometheus-safety invariants:
#[test]
fn test_short_type_name_edge_cases() {
    // Unit type should not produce empty
    let u = short_type_name::<()>();
    assert!(!u.is_empty());

    // If used as (label/metric) name, ensure allowed charset and leading char
    let name = short_type_name::<Option<&'static str>>();
    assert!(name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_'));
    // Metric/label name rule (skip if this is only a label value):
    assert!(name.starts_with(|c: char| c.is_ascii_alphabetic() || c == '_'));
}
Also applies to: 34-34, 36-36, 39-39

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b4411a and 4c432c8.

📒 Files selected for processing (1)

src/utils/misc/type.rs (2 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: tests
GitHub Check: tests-release
GitHub Check: All lint checks
GitHub Check: cargo-publish-dry-run
GitHub Check: Build Ubuntu
GitHub Check: Build MacOS

🔇 Additional comments (1)

src/utils/misc/type.rs (1)
4-6: Breaking API change (Cow<'static, str> → String): confirm external impact or keep compatibility.

Repo scan: only internal callers found — tests at src/utils/misc/type.rs:29–40 and cache usage at src/state_manager/cache.rs:49; none use Cow-specific methods. Because short_type_name is public, this still breaks external consumers. Either keep compatibility (return Cow<'static, str>) or explicitly document the breaking change.

Suggested minimal compatibility diff:
@@
-use std::any::type_name;
+use std::any::type_name;
+use std::borrow::Cow;

-pub fn short_type_name<T>() -> String {
+pub fn short_type_name<T>() -> Cow<'static, str> {
@@
-    n.as_ref()
-        .strip_suffix("_")
-        .unwrap_or(n.as_ref())
-        .to_string()
+    Cow::Owned(
+        n.as_ref()
+            .strip_suffix("_")
+            .unwrap_or(n.as_ref())
+            .to_string()
+    )
}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/state_manager/cache.rs (1)

68-125: Pending-entry leak on compute error; remove from pending when computation fails.

If compute().await returns Err, the (key, mtx) remains in pending forever unless another successful compute happens later. Clean it up on error to avoid unbounded growth.

Apply this diff:

-            CacheLookupStatus::Empty(mtx) => {
-                let _guard = mtx.lock().await;
-                match self.get(key) {
+            CacheLookupStatus::Empty(mtx) => {
+                let _guard = mtx.lock().await;
+                let key_owned = key.clone();
+                match self.get(key) {
                     Some(v) => {
                         // While locking someone else computed the pending task
                         crate::metrics::LRU_CACHE_HIT
                             .get_or_create(&crate::metrics::values::STATE_MANAGER_TIPSET)
                             .inc();
 
                         Ok(v)
                     }
                     None => {
                         // Entry does not have state computed yet, compute value and fill the cache
                         crate::metrics::LRU_CACHE_MISS
                             .get_or_create(&crate::metrics::values::STATE_MANAGER_TIPSET)
                             .inc();
 
-                        let value = compute().await?;
-
-                        // Write back to cache, release lock and return value
-                        self.insert(key.clone(), value.clone());
-                        Ok(value)
+                        let res = compute().await;
+                        match res {
+                            Ok(value) => {
+                                // Write back to cache, release lock and return value
+                                self.insert(key_owned.clone(), value.clone());
+                                Ok(value)
+                            }
+                            Err(e) => {
+                                // Remove stale pending entry on failure
+                                self.with_inner(|inner| {
+                                    inner.pending.retain(|(k, _)| k != &key_owned);
+                                });
+                                Err(e)
+                            }
+                        }
                     }
                 }
             }

🧹 Nitpick comments (2)

src/state_manager/cache.rs (2)
31-33: Sanitize identifiers before constructing metric names.

Call sites currently pass safe literals, but this API allows arbitrary &str. Guard against regressions by normalizing to [a-z0-9_], lowering case, and replacing any other char with '' before composing "tipset_state{...}".

Apply this diff to enforce sanitization at the point of name construction:
-    fn cache_name(cache_identifier: &str) -> String {
-        format!("tipset_state_{cache_identifier}")
-    }
+    fn cache_name(cache_identifier: &str) -> String {
+        let mut out = String::with_capacity("tipset_state_".len() + cache_identifier.len());
+        out.push_str("tipset_state_");
+        for ch in cache_identifier.chars() {
+            match ch {
+                'a'..='z' | '0'..='9' | '_' => out.push(ch),
+                'A'..='Z' => out.push(ch.to_ascii_lowercase()),
+                _ => out.push('_'),
+            }
+        }
+        out
+    }
47-58: API change looks good; consider documenting allowed identifier charset.

Add a brief rustdoc noting the identifier should be [a-z0-9_]+ to remain Prometheus-safe.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c432c8 and 2daa377.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

Cargo.toml (0 hunks)
src/state_manager/cache.rs (6 hunks)
src/state_manager/mod.rs (1 hunks)
src/utils/misc/mod.rs (0 hunks)
src/utils/misc/type.rs (0 hunks)

💤 Files with no reviewable changes (3)

Cargo.toml
src/utils/misc/mod.rs
src/utils/misc/type.rs

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

🧬 Code graph analysis (2)

src/state_manager/mod.rs (2)

src/chain/store/index.rs (1)

new (42-48)

src/chain/store/chain_store.rs (2)

new (119-145)

new (566-573)

src/state_manager/cache.rs (2)

src/utils/cache/lru.rs (2)

new_with_default_metrics_registry (96-101)

cache (120-122)

src/chain/store/index.rs (1)

new (42-48)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: tests-release
GitHub Check: tests
GitHub Check: Build forest binaries on Linux AMD64
GitHub Check: cargo-publish-dry-run
GitHub Check: Build Ubuntu
GitHub Check: Build MacOS
GitHub Check: All lint checks

🔇 Additional comments (5)

src/state_manager/cache.rs (4)
21-29: Per-cache identifiers: good step toward Prometheus-safe names.

Explicit identifiers remove type-derived punctuation from metric names/labels and improve observability hygiene.

175-186: Good: stable, lowercase identifiers for per-cache metrics.

"events" and "receipts" are Prometheus-safe and consistent.

325-341: Tests updated to use explicit identifiers: LGTM.

Covers basic, same-key concurrent, and different-keys concurrent paths with the new constructor.

Also applies to: 345-388, 392-430

21-29: Verify TipsetStateCache::new/with_size usage and metric names

rg returned "No files were searched" — verification incomplete. Run locally (or grant repo access) and paste outputs:
rg -nP '\bTipsetStateCache::new\s*$\s*$' --hidden -S || true
rg -nP '\bTipsetStateCache::with_size\s*$' -n -C2 --hidden -S || true
rg -nP 'new_with_default_metrics_registry\([^)]*$' -n -C2 --hidden -S || true
rg -nP 'new_with_default_metrics_registry\(\s*"[^"]*[-./ <>%{}()#$@!~`+=,;][^"]*"' -n --hidden -S || true
Check: src/state_manager/cache.rs lines 21-29, 31-33, 47-58, 175-186.
src/state_manager/mod.rs (1)

203-206: Explicit state cache identifier: good.

"state_output" keeps the metric label/name clean and consistent.

hanabi1224 marked this pull request as ready for review September 10, 2025 23:57

hanabi1224 requested a review from a team as a code owner September 10, 2025 23:57

hanabi1224 requested review from LesnyRumcajs and sudo-shashank and removed request for a team September 10, 2025 23:57

hanabi1224 marked this pull request as draft September 11, 2025 00:03

coderabbitai bot reviewed Sep 11, 2025

View reviewed changes

hanabi1224 force-pushed the hm/no-invalid-chars-in-metrics branch from 4c432c8 to 7327a34 Compare September 11, 2025 00:48

fix: remove invalid chars in prometheus metrics

2daa377

hanabi1224 force-pushed the hm/no-invalid-chars-in-metrics branch from 7327a34 to 2daa377 Compare September 11, 2025 01:08

hanabi1224 marked this pull request as ready for review September 11, 2025 01:08

coderabbitai bot reviewed Sep 11, 2025

View reviewed changes

LesnyRumcajs approved these changes Sep 11, 2025

View reviewed changes

hanabi1224 enabled auto-merge September 11, 2025 11:47

Merge branch 'main' into hm/no-invalid-chars-in-metrics

70ce452

sudo-shashank approved these changes Sep 11, 2025

View reviewed changes

hanabi1224 added this pull request to the merge queue Sep 11, 2025

Merged via the queue into main with commit 3a209c8 Sep 11, 2025
39 checks passed

hanabi1224 deleted the hm/no-invalid-chars-in-metrics branch September 11, 2025 13:15

Signor1 pushed a commit to Signor1/forest that referenced this pull request Sep 12, 2025

fix: remove invalid chars in prometheus metrics (ChainSafe#6065)

b72521f

LesnyRumcajs mentioned this pull request Sep 15, 2025

Integration tests with Grafana scraper #6064

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: remove invalid chars in prometheus metrics #6065

fix: remove invalid chars in prometheus metrics #6065

Uh oh!

hanabi1224 commented Sep 10, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 10, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fix: remove invalid chars in prometheus metrics #6065

fix: remove invalid chars in prometheus metrics #6065

Uh oh!

Conversation

hanabi1224 commented Sep 10, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary of changes

Reference issue to close (if applicable)

Other information and links

Change checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks (1 passed, 2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hanabi1224 commented Sep 10, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 10, 2025 •

edited

Loading