
Implement weaver registry infer command#1138

Merged
jerbly merged 21 commits into open-telemetry:main from ArthurSens:weaver-registry-infer
Feb 21, 2026

Conversation

@ArthurSens
Member

TLDR

Implements weaver registry infer command that generates a semantic convention registry YAML file by inferring the schema from incoming OTLP telemetry data.

Description

This PR adds a new weaver registry infer subcommand that starts a gRPC server to receive OTLP messages (traces, metrics, logs) and automatically infers a semantic convention schema from the observed telemetry. The command processes incoming data, deduplicates attributes across signals, and collects up to 5 unique example values per attribute to help document the inferred schema.
The inferred schema is written to a single registry.yaml file in the specified output directory (default: ./inferred-registry/). The output follows the standard semantic convention format with separate groups for resources, spans, metrics, and events. Resource attributes are currently accumulated into a single resource group; entity-based grouping (via OTLP EntityRef) is not yet supported but documented for future implementation.
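To make the output concrete, an inferred registry.yaml might look roughly like the following. This is a hypothetical sketch in the semconv v1 format: the group id, metric name, briefs, and example values are invented for illustration, and the exact fields the command emits may differ.

```yaml
# Hypothetical sketch of an inferred registry.yaml (semconv v1 format).
groups:
  - id: metric.example.request.duration
    type: metric
    metric_name: example.request.duration
    instrument: histogram
    unit: "s"
    brief: "Inferred from observed OTLP telemetry"
    attributes:
      - id: server.address
        type: string
        brief: "Inferred attribute"
        examples: ["example.com", "localhost"]
      - id: server.port
        type: int
        brief: "Inferred attribute"
        examples: [443, 8080]
```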

Testing

Tested by using weaver registry emit to send OTLP telemetry to the infer command's gRPC endpoint. The generated registry.yaml file was verified to contain the expected groups (resources, spans, metrics, events) with properly inferred attribute types and example values.


@github-advanced-security github-advanced-security bot left a comment


clippy found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@ArthurSens
Member Author

ArthurSens commented Jan 14, 2026

Opening as a draft first; manually tested and it seemed to work :)

Some questions I have:

  1. Should we build v2 schemas instead of v1?
  2. I've created an object called YamlGroup to serialize the YAML file because I couldn't find another object that already does this. Don't we have something like that already? Could we re-use the objects that deserialize YAML to also do the serialization somehow?
  3. Is the code organized correctly? I'm still struggling to understand when code should go to a separate crate and when it should be in the CLI module.
  4. Do we want to implement entity inference already? I'm not sure how stable Entities are.

@codecov

codecov bot commented Jan 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.3%. Comparing base (6034ddb) to head (0e2b557).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1138     +/-   ##
=======================================
- Coverage   80.3%   80.3%   -0.1%     
=======================================
  Files        109     109             
  Lines       8855    8855             
=======================================
- Hits        7117    7114      -3     
- Misses      1738    1741      +3     


@jerbly
Contributor

jerbly commented Jan 17, 2026

Opening as a draft first; manually tested and it seemed to work :)

Some questions I have:

  1. Should we build v2 schemas instead of v1?
  2. I've created an object called YamlGroup to serialize the YAML file because I couldn't find another object that already does this. Don't we have something like that already? Could we re-use the objects that deserialize YAML to also do the serialization somehow?
  3. Is the code organized correctly? I'm still struggling to understand when code should go to a separate crate and when it should be in the CLI module.
  4. Do we want to implement entity inference already? I'm not sure how stable Entities are.
  1. IMO, we should make v2.
  2. We should use the weaver_semconv crate to make the structure and then Serialize that to YAML.
  3. As it stands it's OK. See below for further thoughts that might change this...
  4. I'm not sure either; this has also not been done in live-check. @jsuereth to comment.

Overall I'm wondering what the intent of this command is. What you have made takes samples and aggregates them into entirely new definitions, I guess to use as a starting point model?

What I had in mind would have been more embedded in live-check, maybe a --infer option to live-check. You would then be comparing samples with an existing model, the otel semconv model by default. The inference would then be to create a new model that depends on and extends the model you're comparing with. This would make a definition with imports, refs and extends.

Also, live-check would be highlighting any items which would be troublesome to make an inference for: e.g. an attribute named MyAttr would fail policy checks around naming conventions (should be some_namespace.my_attr for example).
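The kind of naming-convention check described above can be sketched as follows. This is an illustrative stand-in, not Weaver's actual policy engine; `is_valid_attribute_name` is a hypothetical helper that only enforces dot-separated, lowercase snake_case segments.

```rust
// Hypothetical sketch of a naming-convention check like the one live-check's
// policies apply: "MyAttr" should fail, "some_namespace.my_attr" should pass.
fn is_valid_attribute_name(name: &str) -> bool {
    // Require at least one dot and lowercase snake_case segments,
    // e.g. "server.address" or "some_namespace.my_attr".
    let segment_ok = |s: &str| {
        !s.is_empty()
            && s.chars().next().is_some_and(|c| c.is_ascii_lowercase())
            && s.chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
    };
    name.contains('.') && name.split('.').all(segment_ok)
}

fn main() {
    assert!(!is_valid_attribute_name("MyAttr")); // no namespace, wrong case
    assert!(is_valid_attribute_name("some_namespace.my_attr"));
    assert!(is_valid_attribute_name("server.address"));
    println!("naming checks passed");
}
```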

@ArthurSens
Member Author

Overall I'm wondering what the intent of this command is. What you have made takes samples and aggregates them into entirely new definitions, I guess to use as a starting point model?

Exactly. While giving talks about Weaver last year, a very common question was: "I have thousands of metrics already, I don't want to manually rewrite what I have into a schema. Is there anything to make this easier?". That's the problem I'm trying to solve here. As long as there's an appropriate receiver in the collector, you can send data in any format, translate to OTLP, send it to Weaver Infer and you'll have your OTel Schema available. It's up to you to do further modifications to the schema as needed. With a schema available, code generation could build dashboards, could generate instrumentation code that helps migrate from one SDK to another, etc etc.

To be honest, I'm even envisioning a combined functionality of weaver serve+infer, where inferred schemas could be modified through the UI before the user "commits" them to the registry.

The inference would then be to create a new model that depends on and extends the model you're comparing with. This would make a definition with imports, refs and extends.

Interesting! This hasn't crossed my mind at all before. Could you elaborate a bit on the use case for this? What are the problems you wanted to solve?

@jerbly
Contributor

jerbly commented Jan 20, 2026

Interesting! This hasn't crossed my mind at all before. Could you elaborate a bit on the use case for this? What are the problems you wanted to solve?

If you run live-check today with an empty registry it will produce an output with every sample and, where possible, it will tell you every attribute and signal is missing in the live_check_result for that sample. You could imagine taking the json report from this live-check and producing an inferred registry like you've done with your code.

Now extend this concept. Rather than starting with an empty registry, start with the OTel semconv registry. The output report can now be interpreted to infer either modifications to the registry, or extensions to it in a child registry.

At my company we have a company-registry which is dependent on the OTel registry. We often find attributes and signals we want to express that fit in the OTel namespaces, for example aws. Let's say my application emits aws.s3.bucket and aws.new.attr. I don't want to define aws.s3.bucket again since it's already in the OTel registry, I just want to modify my company registry to add aws.new.attr.

As another example, you produced a registry in your PR: prometheus/prometheus#17868 - moving forward, you could run the live-check inference again with this registry loaded and infer modifications to it alongside live-check telling you what's missing or invalid.

@ArthurSens
Member Author

If you run live-check today with an empty registry it will produce an output with every sample and, where possible, it will tell you every attribute and signal is missing in the live_check_result for that sample. You could imagine taking the json report from this live-check and producing an inferred registry like you've done with your code.

So with your idea, if we add a --infer flag to live-check, instead of a json output we would get the YAML file as done in this PR so far?

I can work with that :)

Now extend this concept. Rather than starting with an empty registry, start with the OTel semconv registry. The output report can now be interpreted to infer either modifications to the registry, or extensions to it in a child registry.

Hmmm, I think I understand some parts, but for others I'm still feeling a bit lost.

  • The output report could be interpreted as extensions in a child registry: We can infer this information if the OTLP message includes Samples that were not present before, is that correct?
  • The output report could be interpreted as modifications to the registry: This is the part where I'm not understanding how we could tell. If our registry has a sample called metric.X, and the OTLP message doesn't include this Sample but includes metric.Y... how do I know the difference between a Sample that was renamed, and a Sample that was removed completely while a new unrelated one was added?

@jerbly
Contributor

jerbly commented Jan 21, 2026

So with your idea, if we add a --infer flag to live-check, instead of a json output we would get the YAML file as done in this PR so far?

I can work with that :)

No, I'm doing a bad job trying to explain this I think.

  • The output report could be interpreted as extensions in a child registry: We can infer this information if the OTLP message includes Samples that were not present before, is that correct?

I'm thinking the command could be: weaver registry live-check -r https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.38.0.zip[model] --infer new - this would collect samples and compare them with the otel registry. Let's say one of the samples is for metric.X with attributes: server.address and server.port. metric.X is not found in the otel registry but server.address and server.port are. The inferred output would be a new registry defining metric.X with references to server.address and server.port. Since we ran --infer new, weaver would also create a registry_manifest.yaml declaring the dependency on https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.38.0.zip[model].
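A rough sketch of what that inferred output could look like, assuming the semconv ref syntax for attributes that already exist upstream. The group id, instrument, and unit are invented, and the registry_manifest.yaml field names below are approximate; the actual manifest schema may differ.

```yaml
# Hypothetical registry.yaml produced by --infer new.
# metric.X is not in the OTel registry, so it is defined here; the
# server.* attributes already exist upstream and are only referenced.
groups:
  - id: metric.metric.x
    type: metric
    metric_name: metric.X
    instrument: counter
    unit: "1"
    brief: "Inferred; not found in the OTel registry"
    attributes:
      - ref: server.address
      - ref: server.port
```

```yaml
# Hypothetical registry_manifest.yaml declaring the dependency
# (field names are approximate).
name: inferred
semconv_version: 0.1.0
dependencies:
  - name: otel
    registry_path: https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.38.0.zip[model]
```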

  • The output report could be interpreted as modifications to the registry: This is the part where I'm not understaning how we could tell. If our registry has a sample called metric.X, the OTLP message doesn't include this Sample but includes metric.Y... How do I know the difference between a Sample that was renamed or a Sample that was removed completely and a new unrelated one was added?

This use case could be a later phase.
In this case, the command could be: weaver registry live-check -r my_model_dir --infer modify - in this case, new registry files are created but suffixed with _inferred. Those registry files are a copy of the original with modifications made to them with any changes inferred from the live-check result. For example, let's say we used the registry generated in the example above. We receive a sample of metric.X with the server attributes but now also the attribute error.type. The registry is modified to add this attribute to the metric. This would retain any non-inferrable fields in the original registry e.g. brief, note, annotations.

I think we would need options to determine if weaver should add or overwrite when it finds differences. And, if you want weaver to remove definitions if they were not received in the samples.

--infer modify is quite a bit more complicated and I'm not sure it's worth it. But --infer new, where we're making a dependent child registry, I think is important and in line with our multi-registry philosophy.

@ArthurSens ArthurSens marked this pull request as ready for review January 21, 2026 21:08
@ArthurSens ArthurSens requested a review from a team as a code owner January 21, 2026 21:08
@ArthurSens
Member Author

ArthurSens commented Jan 21, 2026

Ok, I think I've addressed all comments that are addressable, given what we discussed in the SIG meeting today.

I'm intentionally leaving some things undone to keep the scope of the PR small and easier to review:

  • I'm generating v1 schemas instead of v2 -- Not sure if the plan is to allow both, as generate does, or if I should replace v1 with v2 entirely in the future.
  • Functionality to compare the incoming OTLP messages with already existing registries, so inferred schemas use extends and/or imports directives instead of duplicating an entire semantic convention.

But please let me know if any of the above should be worked on in this PR, and if there's anything else you'd like to see here.

Contributor

@jerbly jerbly left a comment


I have made a few comments:

  • some are to tidy the code which you can treat as nits
  • handling of type mismatches and missing essential data needs to be addressed, I think
  • optimizing with a single pass to accumulate and translate could be fun, not essential

I think, if we're not supporting v2 in this PR that's ok (it's marked experimental) but we should quickly move on to that in a follow-up. I'd also suggest, in the next PR, we move the main conversion code out to either one of the existing crates or a new one.

FYI. I've been asked for this infer tool a few times now so it's great to see it coming together. Thanks!

@ArthurSens
Member Author

  • optimizing with a single pass to accumulate and translate could be fun, not essential

I think I made it work for attributes at least, but I'm struggling a bit to make it work for metrics, spans and events. The hashmap is useful for quick lookups, and I'm not sure how to do the deduplication without the hashmaps 😬
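The HashMap-based accumulation being discussed can be sketched roughly like this; `AccumulatedAttribute`, `accumulate`, and `MAX_EXAMPLES` are illustrative names for this sketch, not the PR's actual types. The map key gives the quick dedup lookup, and the example list is capped at 5 unique values as the command description states.

```rust
use std::collections::HashMap;

/// Illustrative accumulator: one entry per attribute name,
/// keeping up to 5 unique observed example values.
#[derive(Debug, Default)]
struct AccumulatedAttribute {
    examples: Vec<String>,
}

const MAX_EXAMPLES: usize = 5;

fn accumulate(acc: &mut HashMap<String, AccumulatedAttribute>, name: &str, value: &str) {
    // The entry API deduplicates by attribute name in a single pass.
    let entry = acc.entry(name.to_string()).or_default();
    // Only record a value if it is new and the cap is not yet reached.
    if entry.examples.len() < MAX_EXAMPLES
        && !entry.examples.iter().any(|e| e.as_str() == value)
    {
        entry.examples.push(value.to_string());
    }
}

fn main() {
    let mut acc = HashMap::new();
    // "a" repeats and "f" arrives after the cap, so 5 unique values remain.
    for v in ["a", "b", "a", "c", "d", "e", "f"] {
        accumulate(&mut acc, "http.method", v);
    }
    let attr = &acc["http.method"];
    assert_eq!(attr.examples.len(), 5);
    println!("{:?}", attr.examples);
}
```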

I think, if we're not supporting v2 in this PR that's ok (it's marked experimental) but we should quickly move on to that in a follow-up. I'd also suggest, in the next PR, we move the main conversion code out to either one of the existing crates or a new one.

Happy to tackle both!

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
@ArthurSens ArthurSens force-pushed the weaver-registry-infer branch from 9823e80 to 94161ee Compare February 4, 2026 19:13
Contributor

@jerbly jerbly left a comment


Overall: Looks good for a first pass.

Perhaps when adding the v2 support there can be some refactoring to make this more idiomatic Rust. The conversion logic between Sample* types and Accumulated*/AttributeSpec types could use Rust's conversion traits:

  • From<&SampleAttribute> for AttributeSpec - Replace attribute_spec_from_sample() with a From impl
  • From<&AccumulatedSpan> for GroupSpec (and similar for Metric/Event) - Replace the inline conversion in to_semconv_spec()

Maybe add an Accumulate trait - Something like:

trait Accumulate {
    fn accumulate(&self, acc: &mut AccumulatedSamples);
}

Implement for SampleResource, SampleSpan, SampleMetric, etc. This would let add_sample become simply sample.accumulate(self).

But, this is a great addition to weaver, let's get the first iteration in.
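A minimal, self-contained sketch of that suggested trait in action; the struct bodies and counters here are placeholders for illustration, not the real Sample* types from the PR.

```rust
/// Illustrative accumulator state; the real one would hold dedup maps
/// for attributes, metrics, spans, and events.
#[derive(Default, Debug)]
struct AccumulatedSamples {
    span_count: usize,
    metric_count: usize,
}

// Placeholder sample types standing in for the PR's SampleSpan/SampleMetric.
struct SampleSpan;
struct SampleMetric;

trait Accumulate {
    fn accumulate(&self, acc: &mut AccumulatedSamples);
}

impl Accumulate for SampleSpan {
    fn accumulate(&self, acc: &mut AccumulatedSamples) {
        // Real code would merge span names and attributes here.
        acc.span_count += 1;
    }
}

impl Accumulate for SampleMetric {
    fn accumulate(&self, acc: &mut AccumulatedSamples) {
        acc.metric_count += 1;
    }
}

fn main() {
    let mut acc = AccumulatedSamples::default();
    SampleSpan.accumulate(&mut acc);
    SampleMetric.accumulate(&mut acc);
    SampleSpan.accumulate(&mut acc);
    assert_eq!(acc.span_count, 2);
    assert_eq!(acc.metric_count, 1);
    println!("spans={}, metrics={}", acc.span_count, acc.metric_count);
}
```

With this shape, a dispatching `add_sample` collapses to `sample.accumulate(&mut self.accumulated)`, which is the simplification the review suggests.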

@ArthurSens
Member Author

Overall: Looks good for a first pass.

Perhaps when adding the v2 support there can be some refactoring to make this more idiomatic Rust. The conversion logic between Sample* types and Accumulated*/AttributeSpec types could use Rust's conversion traits:

  • From<&SampleAttribute> for AttributeSpec - Replace attribute_spec_from_sample() with a From impl
  • From<&AccumulatedSpan> for GroupSpec (and similar for Metric/Event) - Replace the inline conversion in to_semconv_spec()

Maybe add an Accumulate trait - Something like:

trait Accumulate {
    fn accumulate(&self, acc: &mut AccumulatedSamples);
}

Implement for SampleResource, SampleSpan, SampleMetric, etc. This would let add_sample become simply sample.accumulate(self).

But, this is a great addition to weaver, let's get the first iteration in.

Aha, traits are the next chapter in the Rust book 😛, but gotcha! I'll focus on that in the next PRs :)

@ArthurSens
Member Author

Hey folks, just checking in to see if we're good to merge this PR, or if I somehow missed some feedback that you want addressed in this PR.

I'm kinda waiting on this one to start the follow-up PRs :)

@jsuereth
Contributor

jsuereth commented Feb 17, 2026

I'm fine merging, as this is a good first step in an evolution, but I'm deferring to @jerbly here. He did the detailed review and has the vision for how it merges together with live-check over time. I wasn't sure if any of his comments were blocking, but wanted to check.

@jerbly
Contributor

jerbly commented Feb 17, 2026

It just needs main merged in. It's ready otherwise. In the next PR we should move it to use the new OutputProcessor.

@ArthurSens
Member Author

Ah damn, I thought a simple rebase would be enough lol

I'm supposed to be on PTO this week; I'll try to take a look at the new failures another day, but early next week at the latest.

@ArthurSens
Member Author

If I'm understanding #1206 correctly, including the parent resource in the inferred span doesn't affect the final result.

We COULD include the parent resource while inferring metrics, events, and spans, but that would require switching the deserialization logic... not sure if there's a benefit there. As always, happy to be proved wrong :)

@jerbly
Contributor

jerbly commented Feb 20, 2026

If I'm understanding #1206 correctly, including the parent resource in the inferred span doesn't affect the final result.

You're correct. What you have now is fine.

@jerbly jerbly enabled auto-merge (squash) February 20, 2026 23:38
@jerbly jerbly disabled auto-merge February 20, 2026 23:39
@jerbly
Contributor

jerbly commented Feb 20, 2026

Can you please add an entry to CHANGELOG.md?

@jerbly jerbly enabled auto-merge (squash) February 21, 2026 16:17
@jerbly jerbly merged commit 10253ee into open-telemetry:main Feb 21, 2026
22 checks passed
@ArthurSens ArthurSens deleted the weaver-registry-infer branch February 21, 2026 20:54