Live-check reporting improvements by lmolkova · Pull Request #943 · open-telemetry/weaver

lmolkova · 2025-09-17T03:20:25Z

This change is an update of #923 and simplifies reporting with weaver.

The goal is to print short, violation-only reports like the one below (which I can now do using a custom weaver.yaml and templates):

Violations:
- [required_attribute_not_present] Required attribute `server.port` is not present. (3 occurrence(s) on metric `http.client.request.duration`)
- [missing_attribute] Attribute `asgi.event.type` does not exist in the registry. (2 occurrence(s))
- [deprecated] Attribute `db.name` is deprecated; reason = renamed, note = Replaced by `db.namespace`. (1 occurrence(s))
- [deprecated] Attribute `db.statement` is deprecated; reason = renamed, note = Replaced by `db.query.text`. (1 occurrence(s))
- [deprecated] Attribute `db.system` is deprecated; reason = renamed, note = Replaced by `db.system.name`. (1 occurrence(s))
- [deprecated] Attribute `db.user` is deprecated; reason = obsoleted, note = Removed, no replacement at this time. (1 occurrence(s))
- [deprecated] Attribute `net.peer.name` is deprecated; reason = uncategorized, note = Replaced by `server.address` on client spans and `client.address` on server spans. (1 occurrence(s))
- [deprecated] Attribute `net.peer.port` is deprecated; reason = uncategorized, note = Replaced by `server.port` on client spans and `client.port` on server spans. (1 occurrence(s))
- [deprecated] Attribute `net.transport` is deprecated; reason = renamed, note = Replaced by `network.transport`. (1 occurrence(s))

Seen: 12 metric(s), 5 span(s), 0 log(s), 4 resource(s)
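
A report in this shape can be produced by grouping advice on type and message and counting each group. The Python sketch below is only an illustration, not how the PR does it (the PR uses a custom weaver.yaml and templates); the record fields `advice_level`, `advice_type`, and `message`, the sort order, and the sample data are assumptions.

```python
from collections import Counter

# Hypothetical flattened advice records; the field names are assumptions.
advices = [
    {"advice_level": "violation", "advice_type": "missing_attribute",
     "message": "Attribute `asgi.event.type` does not exist in the registry."},
    {"advice_level": "violation", "advice_type": "missing_attribute",
     "message": "Attribute `asgi.event.type` does not exist in the registry."},
    {"advice_level": "violation", "advice_type": "deprecated",
     "message": "Attribute `db.name` is deprecated; reason = renamed, "
                "note = Replaced by `db.namespace`."},
    {"advice_level": "improvement", "advice_type": "not_stable",
     "message": "Attribute `aws.s3.bucket` is not stable; stability = development."},
]


def violation_report(advices):
    """Group violation-level advice by (type, message) and count occurrences."""
    counts = Counter(
        (a["advice_type"], a["message"])
        for a in advices
        if a["advice_level"] == "violation"
    )
    # Most frequent first, then alphabetical, so the output order is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = ["Violations:"]
    for (advice_type, message), n in ordered:
        lines.append(f"- [{advice_type}] {message} ({n} occurrence(s))")
    return "\n".join(lines)


report = violation_report(advices)
print(report)
```

Because each message already carries its own context, the grouping key needs nothing beyond the advice itself.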

Changes:

- Makes advice self-contained: each advice now includes the signal name and type, and the attribute name (when known and applicable). This simplifies jq A LOT if I want to include context in the report. E.g. I can do `.. | objects | select(has("live_check_result"))` and it returns all advice, regardless of nesting, with full details.
- The advice message is plain English that describes the violation, so consumers don't need to custom-format or interpret the advice.
- Advice values are structured. E.g. attribute_name = test.me is better than a bare test.me whose meaning depends on the context.
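
For readers less familiar with jq, the recursive selection above can be sketched in Python. This is an illustration only: the report fragment and the `all_advice` field are assumptions about the JSON shape, and `objects_with_key` is a hypothetical helper, not part of weaver. Note that the jq expression selects the objects that carry a `live_check_result` key; the advice entries are then read from inside them.

```python
import json


def objects_with_key(node, key):
    """Yield every dict, at any depth, that has `key`.

    Mirrors jq's `.. | objects | select(has(key))`: visit the node itself,
    then recurse into its children.
    """
    if isinstance(node, dict):
        if key in node:
            yield node
        for child in node.values():
            yield from objects_with_key(child, key)
    elif isinstance(node, list):
        for child in node:
            yield from objects_with_key(child, key)


# Hypothetical report fragment; the shape and field names are assumptions.
report = json.loads("""
{
  "span": {
    "name": "test",
    "attributes": [
      {"name": "task.id",
       "live_check_result": {"all_advice": [
         {"advice_type": "missing_attribute",
          "message": "Attribute `task.id` does not exist in the registry."}]}},
      {"name": "http.request.method"}
    ],
    "live_check_result": {"all_advice": []}
  }
}
""")

found = list(objects_with_key(report, "live_check_result"))
advices = [a for obj in found for a in obj["live_check_result"]["all_advice"]]
print(len(found), "checked object(s),", len(advices), "advice entry(ies)")
```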

Default report update:

New: (screenshot not preserved)

Old: (screenshot not preserved)

codecov · 2025-09-17T03:38:01Z

Codecov Report

❌ Patch coverage is 85.53459% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.9%. Comparing base (431204b) to head (276e4cf).

Files with missing lines	Patch %	Lines
crates/weaver_live_check/src/advice.rs	79.7%	16 Missing ⚠️
crates/weaver_live_check/src/lib.rs	76.6%	7 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff          @@
##            main    #943   +/-   ##
=====================================
  Coverage   77.8%   77.9%           
=====================================
  Files         76      76           
  Lines       5941    6016   +75     
=====================================
+ Hits        4626    4689   +63     
- Misses      1315    1327   +12

jerbly · 2025-09-17T13:00:32Z

Just had a quick look... I was thinking for this that, instead of changing the structure we would enhance the statistics collection. What you're doing here is a count of each violation type (with its message and other context).

We already have advice_type_counts that does this in the stats but is only collecting the advice_type and not more detail. https://github.com/open-telemetry/weaver/blob/main/crates/weaver_live_check/src/lib.rs#L315-L326

  "advice_type_counts": {
    "stability": 2
  },

We can then have a jinja template on the output for just the summary.

lmolkova · 2025-09-17T13:50:31Z

Having self-contained advice makes it possible to use jq to build whatever statistics you want. It would also be necessary if we end up writing advice as its own telemetry. I don't mind incorporating more stats and more detailed stats, but I don't see why having more info in the advice is controversial.

crates/weaver_live_check/src/sample_attribute.rs

jerbly · 2025-09-17T15:42:09Z

> I don't see why having more info in the advice is controversial

Not controversial. I'm suggesting that, yes, add more info to the advice (provided we don't break the default ANSI template). But maybe, for this specific case, you could also enhance the stats object to include all the detail needed for this summary. Then you'll hopefully need close to zero jq. Since we're already doing this aggregation for the stats, we could have a struct there instead of just a count.
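
One way to read this suggestion: each `advice_type_counts` entry becomes a small record instead of a bare integer. The Python sketch below is hypothetical; `AdviceTypeStats`, its fields, and the second sample message are assumptions made for illustration, not weaver's actual stats structure.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AdviceTypeStats:
    """Hypothetical richer stats entry: a count plus per-message detail."""
    count: int = 0
    messages: Counter = field(default_factory=Counter)


def collect_stats(advices):
    """Aggregate advice by type, keeping the messages seen for each type."""
    stats = {}
    for advice in advices:
        entry = stats.setdefault(advice["advice_type"], AdviceTypeStats())
        entry.count += 1
        entry.messages[advice["message"]] += 1
    return stats


# Sample data; the second message is invented for illustration.
advices = [
    {"advice_type": "stability",
     "message": "Attribute `aws.s3.bucket` is not stable; stability = development."},
    {"advice_type": "stability",
     "message": "Attribute `example.attr` is not stable; stability = development."},
]

stats = collect_stats(advices)
print(stats["stability"].count)
```

A summary template could then render directly from `stats` without any post-processing of the advice list.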

lmolkova · 2025-09-18T01:44:29Z

> But, maybe also for this specific case you could enhance the stats object to include all the detail needed for this summary. Then you'll need close to zero JQ hopefully. Since we're doing this aggregation already for the stats we could have a struct there instead of just a count.

I'm less interested in a specific case and more in the ability to generate whatever reports and aggregate in whatever way. I hope we can eventually find the golden path, but for now I don't mind a bit of uncomplicated jq that AI can write on the first try. I tried writing jq over the current advice; both the AI and I failed miserably.

PS: I updated the default ANSI template and attached some screenshots in the description. Curious what you think @jerbly

jerbly · 2025-09-18T02:20:20Z

I think the messages are more readable with the removal of the advice_type. But we're now repeating information in the message, which adds clutter. E.g. the stability advice for the metric repeats the metric's name.

If you have advice for an attribute will that also repeat the name in the message?

jerbly · 2025-09-18T02:50:57Z

Another thought. What if we put the value and message back as they were, but add a new verbose_message where you can embed the context in the message text?

lmolkova · 2025-09-22T16:45:14Z

I've removed metric / signal name from messages - they are available in the structured part anyway, so there is no duplication anymore.

The message (today, without this change) is already a duplicate of the advice type, just written in plain English, and the advice value is too contextual: you don't know what to expect there unless you read the weaver code and write a large switch statement depending on the advice type. I'd like to improve this.

jerbly · 2025-09-22T18:33:00Z

That looks better. If you run `cargo run -- registry live-check --input-source crates/weaver_live_check/data/span.json`, what does that look like?

lmolkova · 2025-09-22T23:22:52Z

@jerbly

here you go

Resource
    service.name = my_service

Span test `client`
    http.response.status_code = foo
        - [violation] Attribute `http.response.status_code` has type `string`. Type should be `int`.
    aws.s3.bucket = value
        - [improvement] Attribute `aws.s3.bucket` is not stable; stability = development.
    aws.s3.bucket.name = value
        - [violation] Attribute `aws.s3.bucket.name` does not exist in the registry.
        - [information] Extends existing namespace
        - [violation] Namespace matches existing attribute
    task.id = value
        - [violation] Attribute `task.id` does not exist in the registry.
    TaskId = value
        - [violation] Attribute `TaskId` does not exist in the registry.
        - [improvement] Does not have a namespace
        - [violation] Does not match name formatting rules
    aws.s3.extension.name = foo
        - [violation] Attribute `aws.s3.extension.name` does not exist in the registry.
        - [information] Extends existing namespace
    http.request.method = GET
    Span event test_event
        hello = world
            - [violation] Attribute `hello` does not exist in the registry.
            - [improvement] Does not have a namespace
    Span link
        hello = world
            - [violation] Attribute `hello` does not exist in the registry.
            - [improvement] Does not have a namespace

Samples
  - total: 14
  - by type:
    - attribute: 10
    - resource: 1
    - span: 1
    - span_event: 1
    - span_link: 1
  - by highest advice level:
    - no advice: 6
    - improvement: 1
    - violation: 7

Advisories given
  - total: 15
  - advice level:
    - improvement: 4
    - information: 2
    - violation: 9
  - advice type:
    - extends_namespace: 2
    - illegal_namespace: 1
    - invalid_format: 1
    - missing_attribute: 6
    - missing_namespace: 3
    - not_stable: 1
    - type_mismatch: 1

Registry coverage
  - entities seen: 0.36%

✔ Performed live check for registry `https://github.com/open-telemetry/semantic-conventions.git[model]`

Total execution time: 1.924507458s
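
The "by highest advice level" tally in the output above counts each sample once, under the most severe level among its advice. A minimal Python sketch of that rule follows; the severity order and the helper name `highest_level` are assumptions, and only a subset of the attributes from the output is used.

```python
from collections import Counter

# Assumed severity order, lowest to highest.
LEVELS = ["information", "improvement", "violation"]


def highest_level(levels):
    """Return the most severe level in `levels`, or None when there is no advice."""
    return max(levels, key=LEVELS.index, default=None)


# Advice levels per sample, copied from part of the span output.
samples = {
    "http.response.status_code": ["violation"],
    "aws.s3.bucket": ["improvement"],
    "aws.s3.bucket.name": ["violation", "information", "violation"],
    "aws.s3.extension.name": ["violation", "information"],
    "http.request.method": [],
}

by_highest = Counter(highest_level(v) or "no advice" for v in samples.values())
print(dict(by_highest))
```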

lmolkova · 2025-09-22T23:25:01Z

I do see some duplication, but I don't see a problem with it (I'd also be happy to go and update "Does not have a namespace" and similar messages to contain full details).

jerbly · 2025-09-23T01:40:17Z

If you were to update those other messages I guess it would then look like this:

    aws.s3.bucket.name = value
        - [violation] Attribute `aws.s3.bucket.name` does not exist in the registry.
        - [information] Attribute `aws.s3.bucket.name` extends existing namespace
        - [violation] Attribute `aws.s3.bucket.name` namespace matches existing attribute

In this view I don't see the need to repeat the attribute name in the message since the messages are in the context of the attribute. Therefore this is preferable isn't it?

    aws.s3.bucket.name = value
        - [violation] Does not exist in the registry.
        - [information] Extends existing namespace
        - [violation] Namespace matches existing attribute

I'm happy with the new messages but let's remove the "Attribute blah" at the beginning, it's not needed.

lmolkova · 2025-09-23T03:07:37Z

It's only unnecessary when the advice is reported under the attribute and you're looking at a specially formatted report.

What we have today:

- advice_type is the low-cardinality thing you'd like to see there; it does not contain any context. The only problem is that it's not human-readable.
- advice_message today is a human-readable representation of the advice type; it does not bring any new info, it just writes it nicely.

It does not make sense to me to have the two of them together. Could we merge them? Could advice_type become human-readable, and then the message would be a formatted, fully contextualized message that's useful without formatting things in a certain way?

lmolkova · 2025-09-23T03:44:54Z

Tried in cb2208e

Here's the stdout

Resource
    service.name = my_service

Span test `client`
    http.response.status_code = foo
        - [violation] attribute type does not match definition
    aws.s3.bucket = value
        - [improvement] attribute is not stable
    aws.s3.bucket.name = value
        - [violation] attribute does not exist in the registry
        - [information] attribute namespace collides with existing attribute key
        - [violation] attribute namespace collides with existing attribute key
    task.id = value
        - [violation] attribute does not exist in the registry
    TaskId = value
        - [violation] attribute does not exist in the registry
        - [improvement] attribute key does not have a namespace
        - [violation] attribute key format is invalid
    aws.s3.extension.name = foo
        - [violation] attribute does not exist in the registry
        - [information] attribute namespace collides with existing attribute key
    http.request.method = GET
    Span event test_event
        hello = world
            - [violation] attribute does not exist in the registry
            - [improvement] attribute key does not have a namespace
    Span link
        hello = world
            - [violation] attribute does not exist in the registry
            - [improvement] attribute key does not have a namespace

Samples
  - total: 14
  - by type:
    - attribute: 10
    - resource: 1
    - span: 1
    - span_event: 1
    - span_link: 1
  - by highest advice level:
    - no advice: 6
    - improvement: 1
    - violation: 7

Advisories given
  - total: 15
  - advice level:
    - improvement: 4
    - information: 2
    - violation: 9
  - advice type:
    - attribute does not exist in the registry: 6
    - attribute is not stable: 1
    - attribute key does not have a namespace: 3
    - attribute key format is invalid: 1
    - attribute namespace collides with existing attribute key: 3
    - attribute type does not match definition: 1

Registry coverage
  - entities seen: 0.36%

✔ Performed live check for registry `https://github.com/open-telemetry/semantic-conventions.git[model]`

lmolkova · 2025-09-23T03:49:32Z

BTW, my preference would still be:

    aws.s3.bucket.name = value
        - [violation] Attribute `aws.s3.bucket.name` does not exist in the registry.
        - [information] Attribute 'aws.s3.bucket.name' collides with existing namespace 'aws.s3'
        - [violation] Namespace 'aws.s3.bucket' collides with existing attribute key 'aws.s3.bucket.name'

This is the baseline (main): all the duplication is already there, but important details are missing.

    aws.s3.bucket.name = value
        - missing_attribute: aws.s3.bucket.name - Does not exist in the registry
        - extends_namespace: aws.s3 - Extends existing namespace                // what is what? there is no indication of what `aws.s3` represents
        - illegal_namespace: aws.s3.bucket - Namespace matches existing attribute // what's `aws.s3.bucket` ?

Human-readable text is intended to have redundancy. When I'm reading CI errors, I don't want to decode error messages; I want the error to tell me exactly what went wrong, with as much context as possible.

This is also how we format messages in semconv policies (https://github.com/open-telemetry/semantic-conventions/blob/e7eb01668175e11a9cf3b7adb57b38603505a448/policies/attribute_name_collisions.rego#L31): give as much context as is reasonable to identify the issue.

jerbly · 2025-09-23T19:56:21Z

OK, I see your point - I think my head has been in this for so long that it's translating the lack of information for me!

So should we:

- boost the messages with embedded context (like you've done)
- remove advice.value entirely (I think I had visions of doing something like "...collides with existing namespace '{value}'" - but it seems like a lot of work for not much value ;) )
- keep advice.advice_type as identifiers only for the JSON representation and grouping stats etc.
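
The split proposed here, a stable identifier for grouping plus a message that already embeds its context, might look like the following Python sketch. The `Advice` class and both constructor functions are hypothetical; only the message wording is taken from the outputs earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advice:
    """Hypothetical advice record; not weaver's actual type."""
    advice_type: str   # stable identifier for JSON output and stats grouping
    advice_level: str
    message: str       # full sentence with the context embedded


def missing_attribute(name):
    return Advice(
        "missing_attribute",
        "violation",
        f"Attribute `{name}` does not exist in the registry.",
    )


def type_mismatch(name, actual, expected):
    return Advice(
        "type_mismatch",
        "violation",
        f"Attribute `{name}` has type `{actual}`. Type should be `{expected}`.",
    )


for advice in (
    missing_attribute("task.id"),
    type_mismatch("http.response.status_code", "string", "int"),
):
    print(f"[{advice.advice_level}] {advice.message}")
```

With this shape a consumer can group on `advice_type` and print `message` as-is, without a per-type switch to interpret a separate value field.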

crates/weaver_checker/src/lib.rs

crates/weaver_checker/src/violation.rs

crates/weaver_live_check/src/advice.rs

crates/weaver_live_check/src/live_checker.rs

src/util.rs

tests/registry_emit.rs

jerbly · 2025-09-26T00:25:42Z

Looks good. I think we just need to update the crate README: https://github.com/open-telemetry/weaver/tree/main/crates/weaver_live_check#readme

And, we need a breaking change notice in the changelog.

lmolkova · 2025-09-26T02:47:27Z

Thanks @jerbly, I updated the README and added a changelog entry.

crates/weaver_live_check/README.md

jerbly · 2025-09-26T10:39:22Z

crates/weaver_live_check/README.md

This rego policy needs updating for advice_context

Co-authored-by: Jeremy Blythe <jeremyblythe@gmail.com>

jerbly

Looks good. Excited to give it a try! Thanks!

lmolkova requested a review from a team as a code owner September 17, 2025 03:20

lmolkova force-pushed the weaver-report branch from 9137d44 to 7acbfb1 Compare September 17, 2025 03:30

Live check: self-contained advice, full message, and other advice imp…

ca772f2

…rovements

lmolkova force-pushed the weaver-report branch from 7acbfb1 to ca772f2 Compare September 17, 2025 03:31

jsuereth added this to OTel Weaver Project Sep 17, 2025

jsuereth moved this to To consider for the next release in OTel Weaver Project Sep 17, 2025

lmolkova commented Sep 17, 2025

update default live check template

696afae

remove metric name from message

16afae9

Merge branch 'main' into weaver-report

b6d3d08

lmolkova force-pushed the weaver-report branch from d23f249 to cb2208e Compare September 23, 2025 03:42

lmolkova added 2 commits September 23, 2025 16:11

make full-sentence messages

f00e5df

pass parent signal around for better context on the advise and clenaup

586a318

lmolkova force-pushed the weaver-report branch from cb2208e to 586a318 Compare September 24, 2025 01:44

add more context to value

04b02a3