The GenAI SIG has been discussing how to capture prompts and completion for a while and there are several issues that are blocked on this discussion (#1913, #1883, #1556)
What we have today on OTel semconv is a set of structured events exported separately from spans. This was implemented in #980 and discussed in #834 and #829. The motivation to use events was to
- overcome size limits on attribute values by using event body
- use a signal that supports structured body and attributes
- have a clear 1:1 relationship between event name and structure (as opposed to polymorphic types or arrays of heterogeneous objects)
- make it possible and easy to consume individual events and prompts/completions without spans
- have verbosity controls
Turns out that:
- after ~9 months events are still not adopted by GenAI-focused tracing tools and their external instrumentation libs including Arize, Traceloop, Langtrace - all these providers use span attributes to capture prompts and completions.
- These backends consume prompts and completions along with spans and don't envision separating them - they store and visualize this data altogether
So, the GenAI SIG is re-litigating this decision taking into account backends' feedback and other relevant issues: #1621, #1912, open-telemetry/opentelemetry-specification#4414
The fundamental question is whether this data belongs on the span or is a separate thing useful without a span.
How it can be useful without a span:
To be useful without a span, events should probably duplicate some of the span attributes - endpoint, model used, input parameters, etc - it's not the case today
Are prompts/completions point-in-time telemetry?
Arguably, from what we've seen so far, GenAI prompts and completion are used along with the spans and there is no great use-case for standalone events
Another fundamental question is how and if to capture unbounded (text, video, audio, etc) data on telemetry
It's problematic because of:
- privacy - prompts can contain health concerns, ssns, addresses, names, etc. Apps that remain compliant with different regulators would have a problem of sharing this data with a broad audience of DevOps humans. The data should be accessible for evaluations, audit, but access should be restricted
- size - non-GenAI specific backends are not optimized for this and it's expensive to store such data in hot storage.
Imagine, we had a solution that allowed us to store chat history somewhere and added a deep-link to that specific conversation to the telemetry - would we consider reporting this link as an event? We might, but we'd most likely have added this link as attribute on the span.
Arguably, the long term solution to this problem is having this data stored separately from telemetry, but recorded by reference (e.g. URL on span that points to the chat history)
TL;DR:
- current approach doesn't work, we're blocked and need to find path forward.
- GenAI-focused backends, innerloop scenarios, non-production apps would benefit from having prompts/completions stamped on the spans directly
- General-purpose observability backends and high-scale applications would have a problem with sensitive/large/binary data coming from end-users on telemetry anyway
The GenAI SIG has been discussing how to capture prompts and completion for a while and there are several issues that are blocked on this discussion (#1913, #1883, #1556)
What we have today on OTel semconv is a set of structured events exported separately from spans. This was implemented in #980 and discussed in #834 and #829. The motivation to use events was to
Turns out that:
So, the GenAI SIG is re-litigating this decision taking into account backends' feedback and other relevant issues: #1621, #1912, open-telemetry/opentelemetry-specification#4414
The fundamental question is whether this data belongs on the span or is a separate thing useful without a span.
How it can be useful without a span:
To be useful without a span, events should probably duplicate some of the span attributes - endpoint, model used, input parameters, etc - it's not the case today
Are prompts/completions point-in-time telemetry?
Arguably, from what we've seen so far, GenAI prompts and completion are used along with the spans and there is no great use-case for standalone events
Another fundamental question is how and if to capture unbounded (text, video, audio, etc) data on telemetry
It's problematic because of:
Imagine, we had a solution that allowed us to store chat history somewhere and added a deep-link to that specific conversation to the telemetry - would we consider reporting this link as an event? We might, but we'd most likely have added this link as attribute on the span.
Arguably, the long term solution to this problem is having this data stored separately from telemetry, but recorded by reference (e.g. URL on span that points to the chat history)
TL;DR: