Skip to content

feat(openai): add OpenAI integration#5488

Merged
Kyle-Verhoog merged 151 commits into
1.xfrom
evan.li/open-ai
Apr 26, 2023
Merged

feat(openai): add OpenAI integration#5488
Kyle-Verhoog merged 151 commits into
1.xfrom
evan.li/open-ai

Conversation

@lievan

@lievan lievan commented Apr 6, 2023

Copy link
Copy Markdown
Contributor

Add an integration for the OpenAI library. This integration provides tracing for the completion, embeddings and chat completion endpoints along with cost estimation metrics and prompt/completion sampling logs.

Each log, metric and trace are tagged with service, env, version, OpenAI model, OpenAI endpoint and OpenAI organization.

Docs preview

Design

Logs

A new log writer implementation is added to submit logs. Logs are submitted direct to intake following a similar approach that kyle-verhoog/datadog-python and the .NET tracer have taken already.

Metrics

A statsd client is used specifically for the OpenAI integration.

Testing

Testing is done using VCR to record requests made to OpenAI to ensure ease, consistency and reliability in test cases.

Logs and metrics are tested using mocking of the clients.

Several integration tests using snapshots and subprocess testing ensure that the integration works in a real world OpenAI application.

A manual test app was also used: https://gist.github.com/Kyle-Verhoog/1f263ed0aade076b313167d1ba3bfa16

Risk

  • Currently, logs, traces and metrics are all collected, buffered and sent individually through their respective pipelines. Due to this, there is risk that disparity occurs between the tagging and submission of the data. There is also a performance risk as this data is not aggregated or batched when submitted.
  • Prompts and completions are captured on spans by default with a default limit on the length of the data. This limit only applies to each prompt/completion individually but requests can contain several prompts and completions. If there are many prompts and completions of great length then there is a risk of performance overhead of encoding and transmitting the data.
  • Logs and metrics clients are specified specifically for this integration. If another integration were to introduce logs then there would be a need for another log writer. Having several log writers could induce thread contention and high memory usage.

Checklist

  • Change(s) are motivated and described in the PR description.
  • Testing strategy is described if automated tests are not included in the PR.
  • Risk is outlined (performance impact, potential for breakage, maintainability, etc).
  • Change is maintainable (easy to change, telemetry, documentation).
  • Library release note guidelines are followed.
  • Documentation is included (in-code, generated user docs, public corp docs).
  • PR description includes explicit acknowledgement/acceptance of the performance implications of this PR as reported in the benchmarks PR comment.

Reviewer Checklist

  • Title is accurate.
  • No unnecessary changes are introduced.
  • Description motivates each change.
  • Avoids breaking API changes unless absolutely necessary.
  • Testing strategy adequately addresses listed risk(s).
  • Change is maintainable (easy to change, telemetry, documentation).
  • Release note makes sense to a user of the library.
  • Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.

@lievan lievan requested review from a team as code owners April 6, 2023 14:33
@lievan lievan marked this pull request as draft April 6, 2023 14:34
@Yun-Kim Yun-Kim changed the title ddtrace/contrib: support openai integration feat(open-ai): add open-ai integration Apr 6, 2023
@Kyle-Verhoog Kyle-Verhoog changed the title feat(open-ai): add open-ai integration feat(openai): add OpenAI integration Apr 7, 2023
@pr-commenter

pr-commenter Bot commented Apr 7, 2023

Copy link
Copy Markdown

Benchmarks

Comparing candidate commit c130908 in PR branch evan.li/open-ai with baseline commit a9f4c02 in branch 1.x.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 94 cases.

@Kyle-Verhoog Kyle-Verhoog enabled auto-merge (squash) April 26, 2023 21:38
@Kyle-Verhoog Kyle-Verhoog disabled auto-merge April 26, 2023 23:32
@Kyle-Verhoog Kyle-Verhoog merged commit abd4542 into 1.x Apr 26, 2023
@Kyle-Verhoog Kyle-Verhoog deleted the evan.li/open-ai branch April 26, 2023 23:34
@github-actions github-actions Bot added this to the v1.14.0 milestone Apr 26, 2023
Kyle-Verhoog added a commit that referenced this pull request Apr 26, 2023
Add an integration for the [OpenAI library](https://github.com/openai/openai-python). 
This integration provides tracing for the completion, embeddings and chat completion
endpoints along with cost estimation metrics and prompt/completion
sampling logs.

Each log, metric and trace are tagged with service, env, version, OpenAI
model, OpenAI endpoint and OpenAI organization.

[Docs
preview](https://output.circle-artifacts.com/output/job/fe3599b8-952e-4ceb-ac4f-0f15503e9c0d/artifacts/0/tmp/docs/integrations.html#openai)

## Design

### Logs

A new log writer implementation is added to submit logs. Logs are
submitted direct to intake following a similar approach that
[kyle-verhoog/datadog-python](https://github.com/Kyle-Verhoog/datadog-python/blob/main/datadog/_logging.py)
and the [.NET tracer](DataDog/dd-trace-dotnet#2240) have taken
already.

### Metrics

A statsd client is used specifically for the OpenAI integration.


## Testing

Testing is done using VCR to record requests made to OpenAI to ensure
ease, consistency and reliability in test cases.

Logs and metrics are tested using mocking of the clients.

Several integration tests using snapshots and subprocess testing ensure
that the integration works in a real world OpenAI application.

A manual test app was also used:
https://gist.github.com/Kyle-Verhoog/1f263ed0aade076b313167d1ba3bfa16


## Risk

- Currently, logs, traces and metrics are all collected, buffered and
sent individually through their respective pipelines. Due to this, there
is risk that disparity occurs between the tagging and submission of the
data. There is also a performance risk as this data is not aggregated or
batched when submitted.
- Prompts and completions are captured on spans by default with a
default limit on the length of the data. This limit only applies to each
prompt/completion individually but requests can contain several prompts
and completions. If there are many prompts and completions of great
length then there is a risk of performance overhead of encoding and
transmitting the data.
- Logs and metrics clients are specified specifically for this
integration. If another integration were to introduce logs then there
would be a need for another log writer. Having several log writers could
induce thread contention and high memory usage.


Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>
Co-authored-by: Kari Halsted <12926135+kayayarai@users.noreply.github.com>
Kyle-Verhoog added a commit that referenced this pull request Apr 27, 2023
Add an integration for the [OpenAI
library](https://github.com/openai/openai-python). This integration
provides tracing for the completion, embeddings and chat completion
endpoints along with cost estimation metrics and prompt/completion
sampling logs.

Each log, metric and trace are tagged with service, env, version, OpenAI
model, OpenAI endpoint and OpenAI organization.

[Docs

preview](https://output.circle-artifacts.com/output/job/fe3599b8-952e-4ceb-ac4f-0f15503e9c0d/artifacts/0/tmp/docs/integrations.html#openai)

## Design

### Logs

A new log writer implementation is added to submit logs. Logs are
submitted direct to intake following a similar approach that
[kyle-verhoog/datadog-python](https://github.com/Kyle-Verhoog/datadog-python/blob/main/datadog/_logging.py)
and the [.NET
tracer](DataDog/dd-trace-dotnet#2240) have taken
already.

### Metrics

A statsd client is used specifically for the OpenAI integration.


## Testing

Testing is done using VCR to record requests made to OpenAI to ensure
ease, consistency and reliability in test cases.

Logs and metrics are tested using mocking of the clients.

Several integration tests using snapshots and subprocess testing ensure
that the integration works in a real world OpenAI application.

A manual test app was also used:
https://gist.github.com/Kyle-Verhoog/1f263ed0aade076b313167d1ba3bfa16


## Risk

- Currently, logs, traces and metrics are all collected, buffered and
sent individually through their respective pipelines. Due to this, there
is risk that disparity occurs between the tagging and submission of the
data. There is also a performance risk as this data is not aggregated or
batched when submitted.
- Prompts and completions are captured on spans by default with a
default limit on the length of the data. This limit only applies to each
prompt/completion individually but requests can contain several prompts
and completions. If there are many prompts and completions of great
length then there is a risk of performance overhead of encoding and
transmitting the data.
- Logs and metrics clients are specified specifically for this
integration. If another integration were to introduce logs then there
would be a need for another log writer. Having several log writers could
induce thread contention and high memory usage.


Co-authored-by: lievan <42917263+lievan@users.noreply.github.com>
Co-authored-by: Kari Halsted <12926135+kayayarai@users.noreply.github.com>
Co-authored-by: Federico Mon <federico.mon@datadoghq.com>
Yun-Kim added a commit that referenced this pull request Jun 15, 2023
## Tips for reviewers

**note: 99% of the line changes in this PR are non-code, in the form of
snapshot files, riot lockfiles, test data (audio/image files), and test
cassette files used to mock request/responses with the OpenAI API, which
store binary image/audio file data in `.yaml` files that Github counts
as lines of code.**

The major files that reviewers should focus on are:
- `ddtrace/contrib/openai/patch.py` - outlines which endpoint hooks are
added, minor changes to overall patching to minimize coupling with
endpoint hook information.
- `ddtrace/contrib/openai/_endpoint_hooks.py` - implementation of each
endpoint hook, as well as minor refactor to minimize code duplication
between endpoint hooks
- `tests/contrib/openai/test_openai.py`. - shows how each endpoint is
expected to be used.

## Summary
This PR adds tracing support for the remaining endpoints not implemented
by #5488. This includes:
- Model (list, retrieve)
- Edits (create)
- Images (create, edit, create variation)
- Audio (transcribe, translate)
- Files (list, create, delete, retrieve, retrieve contents)
- Fine-tunes (list, create, retrieve, cancel, list events, delete fine
tuned model)
- Moderations (create)

Changes in this PR include:
- Modifying the design of the endpoint hooks to minimize duplicate code,
and adding new endpoint hooks for the above endpoints.
- Ensuring rate-limit related metrics are tagged as numeric metrics
rather than string tags.
- Ensure user's OpenAI API key is set as a tag in multiple use scenarios
(set in env var, set as attribute in `openai`, directly added as a
request param)
- Adding and correcting names of attributes extracted from `openai`
(e.g. `organization`, `api_type`)
- Simplifying and separating patch logic from endpoint-specific tracing
logic (only operationID is propagated from endpoint hook level to patch
level)
- Moving some global integration-level tags to request-level (e.g.
`openai.request.model`, `openai.request.endpoint`, added
`openai.request.method`)


## OpenAI Integration Design

### Overall Patch Implementation

**note**: the overall patch implementation has not changed; this is
simply a refresher on how the overall integration works.

The openai integration features patching via generators - each of the
openAI API's endpoint methods (ex: `openai.Completions.create()`) is
wrapped by `_patched_endpoint(endpoint_hook, ...)`, which itself takes a
corresponding endpoint hook as an argument. A rough description of what
happens when `_patched_endpoint()` is called is as follows:
- `_patched_endpoint(endpoint_hook, ...)` starts a trace and starts a
generator `_traced_endpoint(endpoint_hook, ...)`
- `_traced_endpoint(endpoint_hook, ...)` starts
`endpoint_hook().handle_request(...)`, which performs endpoint-specific
tracing logic, and yields back to `_patched_endpoint()` once the request
has been processed by the endpoint hook.
- `_patched_endpoint()` runs the underlying openai API method, then
yields back the response/error back to `_traced_endpoint()`, which in
turn yields the response/error back to the `endpoint_hook`.
- `endpoint_hook()` performs the endpoint-specific tracing logic for the
response/error, then finishes the trace.

### Endpoint Hook Design

The internal implementation design of the endpoint hooks have been
modified (building off of the refactor from #5865) to minimize code
duplication. This means that each endpoint hook now stores the
following:
- `_request_arg_params` and `_request_kwarg_params`, which are tuples
containing the arg/kwarg signature of the underlying openAI API endpoint
method. These tuples are used in `_EndpointHook._record_request(...)` to
add request arg/kwarg parameters as span tags/metrics.
- `ENDPOINT_NAME`, `REQUEST_TYPE`, `OPERATION_ID` constants, which
reflect the base endpoint name (e.g. `/completions`), http request type
(e.g. `POST`), and operation ID as specified by the openAI API
specifcations (e.g. `createCompletion`). The operationID is used as the
span resource name, and the endpoint/request_type values are added as
span tags.
- Each endpoint hook also features (optional) a `_record_request(...)`
and `_record_response(...)` to add any endpoint-specific span tagging
logic.

## Testing

The testing for this PR involves snapshot testing for each endpoint, as
well as `vcrpy` cassettes used to mock request/responses to OpenAI.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines)
are followed.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] OPTIONAL: PR description includes explicit acknowledgement of the
performance implications of the change as reported in the benchmarks PR
comment.

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.

---------

Co-authored-by: Munir Abdinur <munir.abdinur@datadoghq.com>
Co-authored-by: Federico Mon <federico.mon@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants