Skip to content

Live-check reporting improvements#943

Merged
jerbly merged 15 commits intoopen-telemetry:mainfrom
lmolkova:weaver-report
Sep 26, 2025
Merged

Live-check reporting improvements#943
jerbly merged 15 commits intoopen-telemetry:mainfrom
lmolkova:weaver-report

Conversation

@lmolkova
Copy link
Member

@lmolkova lmolkova commented Sep 17, 2025

This change is an update of #923 and simplifies reporting with weaver.

The goal is to print short violation-only reports like (which I can now do using custom weaver.yaml and templates)

Violations:
- [required_attribute_not_present] Required attribute `server.port` is not present. (3 occurrence(s)
 on metric `http.client.request.duration`)
- [missing_attribute] Attribute `asgi.event.type` does not exist in the registry. (2 occurrence(s))
- [deprecated] Attribute `db.name` is deprecated; reason = renamed, note = Replaced by `db.namespace`. (1 occurrence(s))
- [deprecated] Attribute `db.statement` is deprecated; reason = renamed, note = Replaced by `db.query.text`. (1 occurrence(s))
- [deprecated] Attribute `db.system` is deprecated; reason = renamed, note = Replaced by `db.system.name`. (1 occurrence(s))
- [deprecated] Attribute `db.user` is deprecated; reason = obsoleted, note = Removed, no replacement at this time. (1 occurrence(s))
- [deprecated] Attribute `net.peer.name` is deprecated; reason = uncategorized, note = Replaced by `server.address` on client spans and `client.address` on server spans. (1 occurrence(s))
- [deprecated] Attribute `net.peer.port` is deprecated; reason = uncategorized, note = Replaced by `server.port` on client spans and `client.port` on server spans. (1 occurrence(s))
- [deprecated] Attribute `net.transport` is deprecated; reason = renamed, note = Replaced by `network.transport`. (1 occurrence(s))

Seen: 12 metric(s), 5 span(s), 0 log(s), 4 resource(s)

Changes:

  1. Makes advices self-contained: advices now include signal name and type, attribute name (when known / applicable). It simplifies JQ A LOT if I want to include context into the report. E.g. I can do .. | objects | select(has("live_check_result")) and it'd return all advices regardless of their nesting with full details
  2. Advice message is plain english that describes violation. Then consumers don't need to custom-format or interpret the advice
  3. Advice values are structured. E.g. attribute_name = test.me is better than test.me with meaning that depends on the context.

Default report update:

New:
image

Old:
image

@lmolkova lmolkova requested a review from a team as a code owner September 17, 2025 03:20
@codecov
Copy link

codecov bot commented Sep 17, 2025

Codecov Report

❌ Patch coverage is 85.53459% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.9%. Comparing base (431204b) to head (276e4cf).

Files with missing lines Patch % Lines
crates/weaver_live_check/src/advice.rs 79.7% 16 Missing ⚠️
crates/weaver_live_check/src/lib.rs 76.6% 7 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main    #943   +/-   ##
=====================================
  Coverage   77.8%   77.9%           
=====================================
  Files         76      76           
  Lines       5941    6016   +75     
=====================================
+ Hits        4626    4689   +63     
- Misses      1315    1327   +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jerbly
Copy link
Contributor

jerbly commented Sep 17, 2025

Just had a quick look... I was thinking for this that, instead of changing the structure we would enhance the statistics collection. What you're doing here is a count of each violation type (with its message and other context).

We already have advice_type_counts that does this in the stats but is only collecting the advice_type and not more detail. https://github.com/open-telemetry/weaver/blob/main/crates/weaver_live_check/src/lib.rs#L315-L326

  "advice_type_counts": {
    "stability": 2
  },

We can then have a jinja template on the output for just the summary.

@lmolkova
Copy link
Member Author

lmolkova commented Sep 17, 2025

Having self-contained advices allows to use JQ to build whatever statistics you want. It also would be necessary if we end up writing advices as its own telemetry. I don't mind incorporating more stats and more detailed stats, but I don't see why having more info in the advice is controversial.

@jsuereth jsuereth moved this to To consider for the next release in OTel Weaver Project Sep 17, 2025
@jerbly
Copy link
Contributor

jerbly commented Sep 17, 2025

I don't see why having more info in the advice is controversial

Not controversial. I'm suggesting that, yes, add more info to the advice (provided we don't break the default ANSI template). But, maybe also for this specific case you could enhance the stats object to include all the detail needed for this summary. Then you'll need close to zero JQ hopefully. Since we're doing this aggregation already for the stats we could have an struct there instead of just a count.

@lmolkova
Copy link
Member Author

lmolkova commented Sep 18, 2025

But, maybe also for this specific case you could enhance the stats object to include all the detail needed for this summary. > Then you'll need close to zero JQ hopefully. Since we're doing this aggregation already for the stats we could have an struct there instead of just a count.

I'm less interested in a specific case, but more in the ability to generate whatever reports and aggregate in whatever way. I hope eventually we can find the golden path, but for now I don't mind a bit of not complicated jq AI can write on the first try. I tried writing JQ over current advices, me and AI both failed miserably.

PS: I updated default ansi template and attached some screenshots in the description. curious what you think @jerbly

@jerbly
Copy link
Contributor

jerbly commented Sep 18, 2025

I think the messages are more readable with the removal of the advice_type. But we're now repeating information in the message which is adding clutter. e.g. the stability advice for the metric repeats the Metric's name.
image
If you have advice for an attribute will that also repeat the name in the message?

@jerbly
Copy link
Contributor

jerbly commented Sep 18, 2025

Another thought. What if we leave the value and message back as it was but add a new verbose_message where you can embed the context in the message text?

@lmolkova
Copy link
Member Author

I've removed metric / signal name from messages - they are available in the structured part anyway, so there is no duplication anymore.

The message (today, without this change) is already a duplicate of advice type, just written in plain english and advice value is too contextual - you don't know what to expect there unless you read weaver code and write large switch statement depending on advice type. I'd like to improve this.

image

@jerbly
Copy link
Contributor

jerbly commented Sep 22, 2025

That looks better. If you run this, cargo run -- registry live-check --input-source crates/weaver_live_check/data/span.json what does that look like?

@lmolkova
Copy link
Member Author

@jerbly

here you go

Resource
    service.name = my_service

Span test `client`
    http.response.status_code = foo
        - [violation] Attribute `http.response.status_code` has type `string`. Type should be `int`.
    aws.s3.bucket = value
        - [improvement] Attribute `aws.s3.bucket` is not stable; stability = development.
    aws.s3.bucket.name = value
        - [violation] Attribute `aws.s3.bucket.name` does not exist in the registry.
        - [information] Extends existing namespace
        - [violation] Namespace matches existing attribute
    task.id = value
        - [violation] Attribute `task.id` does not exist in the registry.
    TaskId = value
        - [violation] Attribute `TaskId` does not exist in the registry.
        - [improvement] Does not have a namespace
        - [violation] Does not match name formatting rules
    aws.s3.extension.name = foo
        - [violation] Attribute `aws.s3.extension.name` does not exist in the registry.
        - [information] Extends existing namespace
    http.request.method = GET
    Span event test_event
        hello = world
            - [violation] Attribute `hello` does not exist in the registry.
            - [improvement] Does not have a namespace
    Span link
        hello = world
            - [violation] Attribute `hello` does not exist in the registry.
            - [improvement] Does not have a namespace

Samples
  - total: 14  - by type:
    - attribute: 10
    - resource: 1
    - span: 1
    - span_event: 1
    - span_link: 1
  - by highest advice level:
    - no advice: 6
    - improvement: 1
    - violation: 7

Advisories given
  - total: 15  - advice level:
    - improvement: 4
    - information: 2
    - violation: 9
  - advice type:
    - extends_namespace: 2
    - illegal_namespace: 1
    - invalid_format: 1
    - missing_attribute: 6
    - missing_namespace: 3
    - not_stable: 1
    - type_mismatch: 1

Registry coverage
  - entities seen: 0.36%

✔ Performed live check for registry `https://github.com/open-telemetry/semantic-conventions.git[model]`

Total execution time: 1.924507458s

@lmolkova
Copy link
Member Author

I do see some duplication, but I don't see a problem with it though (I'd also be happy to go and update Does not have a namespace and such to contain full details).

@jerbly
Copy link
Contributor

jerbly commented Sep 23, 2025

If you were to update those other messages I guess it would then look like this:

    aws.s3.bucket.name = value
        - [violation] Attribute `aws.s3.bucket.name` does not exist in the registry.
        - [information] Attribute `aws.s3.bucket.name` extends existing namespace
        - [violation] Attribute `aws.s3.bucket.name` namespace matches existing attribute

In this view I don't see the need to repeat the attribute name in the message since the messages are in the context of the attribute. Therefore this is preferable isn't it?

    aws.s3.bucket.name = value
        - [violation] Does not exist in the registry.
        - [information] Extends existing namespace
        - [violation] Namespace matches existing attribute

I'm happy with the new messages but let's remove the "Attribute blah" at the beginning, it's not needed.

@lmolkova
Copy link
Member Author

It's not needed only when reported under attribute and only when looking at specially formatted report.

What we have today:

  • advice_type is low-cardinality thing you'd like to see there - it does not contain any context. The only problem is that it's not human-readable
  • advice_message today is human-readable representation of the advice type, it does not bring any new info, just writes it nicely

It does not make sense to me to have two of them together. Could we merge them? Could advice_type become human readable and then message will be formatted, fully contextualized message that's useful without formatting things in a certain way?

@lmolkova
Copy link
Member Author

Tried in cb2208e

Here's the stdout

Resource
    service.name = my_service

Span test `client`
    http.response.status_code = foo
        - [violation] attribute type does not match definition
    aws.s3.bucket = value
        - [improvement] attribute is not stable
    aws.s3.bucket.name = value
        - [violation] attribute does not exist in the registry
        - [information] attribute namespace collides with existing attribute key
        - [violation] attribute namespace collides with existing attribute key
    task.id = value
        - [violation] attribute does not exist in the registry
    TaskId = value
        - [violation] attribute does not exist in the registry
        - [improvement] attribute key does not have a namespace
        - [violation] attribute key format is invalid
    aws.s3.extension.name = foo
        - [violation] attribute does not exist in the registry
        - [information] attribute namespace collides with existing attribute key
    http.request.method = GET
    Span event test_event
        hello = world
            - [violation] attribute does not exist in the registry
            - [improvement] attribute key does not have a namespace
    Span link
        hello = world
            - [violation] attribute does not exist in the registry
            - [improvement] attribute key does not have a namespace

Samples
  - total: 14  - by type:
    - attribute: 10
    - resource: 1
    - span: 1
    - span_event: 1
    - span_link: 1
  - by highest advice level:
    - no advice: 6
    - improvement: 1
    - violation: 7

Advisories given
  - total: 15  - advice level:
    - improvement: 4
    - information: 2
    - violation: 9
  - advice type:
    - attribute does not exist in the registry: 6
    - attribute is not stable: 1
    - attribute key does not have a namespace: 3
    - attribute key format is invalid: 1
    - attribute namespace collides with existing attribute key: 3
    - attribute type does not match definition: 1

Registry coverage
  - entities seen: 0.36%

✔ Performed live check for registry `https://github.com/open-telemetry/semantic-conventions.git[model]`

@lmolkova
Copy link
Member Author

lmolkova commented Sep 23, 2025

BTW my preference would still be on

    aws.s3.bucket.name = value
        - [violation] Attribute `aws.s3.bucket.name` does not exist in the registry.
        - [information] Attribute 'aws.s3.bucket.name' collides with existing namespace 'aws.s3'
        - [violation] Namespace 'aws.s3.bucket' collides with existing attribute key 'aws.s3.bucket.name'

this is baseline (main) - all the duplication is already there, but important details are missing

    aws.s3.bucket.name = value
        - missing_attribute: aws.s3.bucket.name - Does not exist in the registry
        - extends_namespace: aws.s3 - Extends existing namespace                // what is what? there is no indication of what `aws.s3` represents
        - illegal_namespace: aws.s3.bucket - Namespace matches existing attribute // what's `aws.s3.bucket` ? 

Human readable text is intended to have redundancy, when I'm reading CI errors, I don't want to decode error messages, I want error to tell me exactly what went wrong with as much context as possible.

this is also how we format messages in semconv policies - https://github.com/open-telemetry/semantic-conventions/blob/e7eb01668175e11a9cf3b7adb57b38603505a448/policies/attribute_name_collisions.rego#L31 - give as much context as reasonable to identify the issue.

@jerbly
Copy link
Contributor

jerbly commented Sep 23, 2025

OK, I see your point - I think my head has been in this for so long that it's translating the lack of information for me!

So should we:

  • boost the messages with embedded context (like you've done)
  • remove advice.value entirely (I think I had visions of doing something like "...collides with existing namespace '{value}'" - but it seems like a lot of work for not much value ;) )
  • keep advice.advice_type as identifiers only for the JSON representation and grouping stats etc.

@jerbly
Copy link
Contributor

jerbly commented Sep 26, 2025

Looks good. I think we just need to update the crate README: https://github.com/open-telemetry/weaver/tree/main/crates/weaver_live_check#readme

And, we need a breaking change notice in the changelog.

@lmolkova
Copy link
Member Author

thanks @jerbly, I updated readme and added changelog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rego policy needs updating for advice_context

lmolkova and others added 2 commits September 26, 2025 08:34
Co-authored-by: Jeremy Blythe <jeremyblythe@gmail.com>
Copy link
Contributor

@jerbly jerbly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Excited to give it a try! Thanks!

@jerbly jerbly merged commit 47783c3 into open-telemetry:main Sep 26, 2025
23 checks passed
@github-project-automation github-project-automation bot moved this from To consider for the next release to Done in OTel Weaver Project Sep 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants