Implement weaver registry infer command #1138
clippy found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
|
Opening as a draft first. I tested it manually and it seemed to work :) Some questions that I have:
|
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1138 +/- ##
=======================================
- Coverage 80.3% 80.3% -0.1%
=======================================
Files 109 109
Lines 8855 8855
=======================================
- Hits 7117 7114 -3
- Misses 1738 1741 +3
|
Overall I'm wondering what the intent of this command is. What you have made takes samples and aggregates them into entirely new definitions, I guess to use as a starting point model? What I had in mind would have been more embedded in live-check, maybe a Also, live-check would be highlighting any items which would be troublesome to make an inference for: e.g. an attribute named |
Exactly. While giving talks about Weaver last year, a very common question was: "I have thousands of metrics already, I don't want to manually rewrite what I have into a schema. Is there anything to make this easier?". That's the problem I'm trying to solve here. As long as there's an appropriate receiver in the collector, you can send data in any format, translate to OTLP, send it to Weaver Infer and you'll have your OTel Schema available. It's up to you to do further modifications to the schema as needed. With a schema available, code generation could build dashboards, could generate instrumentation code that helps migrate from one SDK to another, etc etc. To be honest, I'm even envisioning a combined functionality of weaver serve+infer, where inferred schemas could be modified through the UI before the user "commits" them to the registry.
Interesting! This hasn't crossed my mind at all before. Could you elaborate a bit on the use case for this? What are the problems you wanted to solve? |
If you run live-check today with an empty registry it will produce an output with every sample and, where possible, it will tell you every attribute and signal is missing in the live_check_result for that sample. You could imagine taking the json report from this live-check and producing an inferred registry like you've done with your code. Now extend this concept. Rather than starting with an empty registry, start with the OTel semconv registry. The output report can now be interpreted to infer either modifications to the registry, or extensions to it in a child registry. At my company we have a company-registry which is dependent on the OTel registry. We often find attributes and signals we want to express that fit in the OTel namespaces for example As another example, you produced a registry in your PR: prometheus/prometheus#17868 - moving forward, you could run the live-check inference again with this registry loaded and infer modifications to it alongside live-check telling you what's missing or invalid. |
So with your idea, if we add a I can work with that :)
Hmmm, I think I understand some parts but others I'm still feeling a bit lost.
|
No, I'm doing a bad job trying to explain this I think.
I'm thinking the command could be:
This use case could be a later phase. I think we would need options to determine if weaver should add or overwrite when it finds differences. And, if you want weaver to remove definitions if they were not received in the samples.
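To make the add/overwrite/remove options above concrete, here is a rough sketch of what such a merge policy could look like. Everything in it is hypothetical: `OnConflict`, `MergePolicy`, `merge`, and the name-to-type `Registry` map are illustrative stand-ins, not existing weaver types or flags.

```rust
use std::collections::BTreeMap;

/// Hypothetical policy knobs; none of these names exist in weaver.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OnConflict {
    /// Keep the definition already in the registry.
    KeepExisting,
    /// Replace it with what was inferred from the samples.
    Overwrite,
}

#[derive(Debug, Clone, Copy)]
pub struct MergePolicy {
    pub on_conflict: OnConflict,
    /// Drop definitions that were never received in the samples.
    pub prune_unseen: bool,
}

/// Attribute name -> type name, a stand-in for full definitions.
pub type Registry = BTreeMap<String, String>;

pub fn merge(existing: &Registry, inferred: &Registry, policy: MergePolicy) -> Registry {
    let mut merged = Registry::new();
    for (name, ty) in existing {
        if policy.prune_unseen && !inferred.contains_key(name) {
            continue;
        }
        merged.insert(name.clone(), ty.clone());
    }
    for (name, ty) in inferred {
        let missing = !merged.contains_key(name);
        let differs = merged.get(name).map_or(false, |current| current != ty);
        if missing || (differs && policy.on_conflict == OnConflict::Overwrite) {
            merged.insert(name.clone(), ty.clone());
        }
    }
    merged
}
```

New definitions are always added in this sketch; only differing and unseen ones are governed by the policy.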
|
|
Ok, I think I've addressed all comments that are addressable, given what we discussed in the SIG meeting today. I'm intentionally leaving some things undone to keep the scope of the PR small and easier to review:
But please let me know if any of the above should be worked on in this PR, and if there's anything else you'd like to see here |
jerbly left a comment
I have made a few comments:
- some are to tidy the code which you can treat as nits
- handling type mismatch and missing essential data I think needs to be addressed
- optimizing with a single pass to accumulate and translate could be fun, not essential
I think, if we're not supporting v2 in this PR that's ok (it's marked experimental) but we should quickly move on to that in a follow-up. I'd also suggest, in the next PR, we move the main conversion code out to either one of the existing crates or a new one.
FYI. I've been asked for this infer tool a few times now so it's great to see it coming together. Thanks!
I think I made it work for attributes at least, but I'm struggling a bit to make it work for metrics, spans and events. The hashmap is useful for quick lookups, and I'm not sure how to do the deduplication without the hashmaps 😬
Happy to tackle both! |
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
9823e80 to
94161ee
Compare
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
jerbly left a comment
Overall: Looks good for a first pass.
Perhaps when adding the v2 support there can be some refactoring to make this more idiomatic Rust. The conversion logic between Sample* types and Accumulated*/AttributeSpec types could use Rust's conversion traits:
- From<&SampleAttribute> for AttributeSpec - Replace attribute_spec_from_sample() with a From impl
- From<&AccumulatedSpan> for GroupSpec (and similar for Metric/Event) - Replace the inline conversion in to_semconv_spec()
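As a minimal sketch of the first conversion, assuming heavily simplified stand-ins for the real types: the structs below only borrow the names SampleAttribute and AttributeSpec from the PR, and their fields (`name`, `type_name`, `id`) as well as the fallback to "string" are assumptions made for illustration.

```rust
/// Simplified stand-in; the real weaver type carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct SampleAttribute {
    pub name: String,
    /// Hypothetical field: the inferred type name, if one was detected.
    pub type_name: Option<String>,
}

/// Simplified stand-in for the semconv attribute definition.
#[derive(Debug, Clone, PartialEq)]
pub struct AttributeSpec {
    pub id: String,
    pub type_name: String,
}

impl From<&SampleAttribute> for AttributeSpec {
    fn from(sample: &SampleAttribute) -> Self {
        AttributeSpec {
            id: sample.name.clone(),
            // Falling back to "string" is an assumption of this sketch.
            type_name: sample
                .type_name
                .clone()
                .unwrap_or_else(|| "string".to_owned()),
        }
    }
}
```

Call sites can then write `AttributeSpec::from(&sample)` or `(&sample).into()` instead of calling a free function.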
Maybe add an Accumulate trait - Something like:
trait Accumulate {
    fn accumulate(&self, acc: &mut AccumulatedSamples);
}
Implement for SampleResource, SampleSpan, SampleMetric, etc. This would let add_sample become simply sample.accumulate(self).
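A slightly fuller sketch of that trait idea, again with simplified stand-in types. The real AccumulatedSamples and Sample* types are richer; here the accumulator only records the names it has seen, which is an assumption made to keep the example short.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified accumulator: only tracks names seen so far.
#[derive(Debug, Default)]
pub struct AccumulatedSamples {
    pub attributes: BTreeSet<String>,
    pub spans: BTreeSet<String>,
}

pub trait Accumulate {
    fn accumulate(&self, acc: &mut AccumulatedSamples);
}

/// Stand-ins for the sample types; fields are illustrative only.
pub struct SampleAttribute {
    pub name: String,
}

pub struct SampleSpan {
    pub name: String,
    pub attributes: Vec<SampleAttribute>,
}

impl Accumulate for SampleAttribute {
    fn accumulate(&self, acc: &mut AccumulatedSamples) {
        acc.attributes.insert(self.name.clone());
    }
}

impl Accumulate for SampleSpan {
    fn accumulate(&self, acc: &mut AccumulatedSamples) {
        acc.spans.insert(self.name.clone());
        // Each sample type delegates to the parts it contains.
        for attr in &self.attributes {
            attr.accumulate(acc);
        }
    }
}

impl AccumulatedSamples {
    /// With the trait in place, add_sample only delegates.
    pub fn add_sample(&mut self, sample: &dyn Accumulate) {
        sample.accumulate(self);
    }
}
```

Each sample type owns its accumulation logic, so add_sample no longer needs a match over every variant.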
But, this is a great addition to weaver, let's get the first iteration in.
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Aha, trait is the next chapter in the rust book 😛, but gotcha! I'll focus on that in the next PRs :) |
|
Hey folks, just checking in if we're good to merge this PR or if somehow I missed some feedback that you want to see addressed in this PR. I'm kinda waiting on this one to start the follow up PRs :) |
|
I'm fine merging, as this is a good first step in an evolution, but I'm deferring to @jerbly here. He did the detailed review and has the vision for how it merges together with live-check over time. I wasn't sure if any of his comments were blocking, but wanted to check.
|
It just needs main merged in. It's ready otherwise. In the next PR we should move it to use the new OutputProcessor. |
|
Ah damn, I thought a simple rebase would be enough lol. I'm supposed to be on PTO this week; I'll try to take a look at the new failures another day, early next week at the latest.
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
|
If I'm understanding #1206 correctly, including the parent resource in the inferred span doesn't affect the final result. We COULD include the parent resource while inferring metrics, events, and spans, but that would require switching the deserialization logic... not sure if there's a benefit there. As always, happy to be proved wrong :)
You're correct. What you have now is fine. |
|
Can you please add an entry to CHANGELOG.md? |
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
TLDR
Implements weaver registry infer command that generates a semantic convention registry YAML file by inferring the schema from incoming OTLP telemetry data.
Description
This PR adds a new weaver registry infer subcommand that starts a gRPC server to receive OTLP messages (traces, metrics, logs) and automatically infers a semantic convention schema from the observed telemetry. The command processes incoming data, deduplicates attributes across signals, and collects up to 5 unique example values per attribute to help document the inferred schema.
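The deduplication and example-capping behavior described above can be sketched roughly as follows. This is not the code from the PR: `AttributeAccumulator`, `AccumulatedAttribute`, and `observe` are hypothetical names, and values are simplified to strings.

```rust
use std::collections::BTreeMap;

/// Cap on unique example values per attribute, as stated in the description.
const MAX_EXAMPLES: usize = 5;

/// Hypothetical per-attribute state; values are simplified to strings.
#[derive(Debug, Default)]
pub struct AccumulatedAttribute {
    pub examples: Vec<String>,
}

#[derive(Debug, Default)]
pub struct AttributeAccumulator {
    /// Keyed by attribute name, so the same attribute seen on different
    /// signals collapses into a single entry.
    pub attributes: BTreeMap<String, AccumulatedAttribute>,
}

impl AttributeAccumulator {
    pub fn observe(&mut self, name: &str, value: &str) {
        let entry = self.attributes.entry(name.to_owned()).or_default();
        let is_new = !entry.examples.iter().any(|e| e == value);
        // Keep only unique values, and stop once the cap is reached.
        if is_new && entry.examples.len() < MAX_EXAMPLES {
            entry.examples.push(value.to_owned());
        }
    }
}
```

A linear scan is enough for the uniqueness check here because the list never grows beyond five entries.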
The inferred schema is written to a single registry.yaml file in the specified output directory (default: ./inferred-registry/). The output follows the standard semantic convention format with separate groups for resources, spans, metrics, and events. Resource attributes are currently accumulated into a single resource group; entity-based grouping (via OTLP EntityRef) is not yet supported but is documented for future implementation.
Testing
Tested by using weaver registry emit to send OTLP telemetry to the infer command's gRPC endpoint. The generated registry.yaml file was verified to contain the expected groups (resources, spans, metrics, events) with properly inferred attribute types and example values.