feat(openai): add OpenAI integration by lievan · Pull Request #5488 · DataDog/dd-trace-py

lievan · 2023-04-06T14:33:33Z

Add an integration for the OpenAI library (https://github.com/openai/openai-python). This integration provides tracing for the completion, embeddings and chat completion endpoints, along with cost estimation metrics and sampled prompt/completion logs.

Each log, metric and trace is tagged with service, env, version, OpenAI model, OpenAI endpoint and OpenAI organization.

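Docs preview: https://output.circle-artifacts.com/output/job/fe3599b8-952e-4ceb-ac4f-0f15503e9c0d/artifacts/0/tmp/docs/integrations.html#openai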

Design

Logs

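A new log writer implementation is added to submit logs. Logs are submitted directly to the intake, following an approach similar to the one already taken by kyle-verhoog/datadog-python (https://github.com/Kyle-Verhoog/datadog-python/blob/main/datadog/_logging.py) and the .NET tracer (DataDog/dd-trace-dotnet#2240).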
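As a rough illustration of the idea of a writer that buffers log records and submits them straight to an intake, here is a minimal sketch. It is not the writer added in this PR: the class name `BufferedLogWriter`, the `send` callable (a stand-in for an HTTP POST), the `tags` field and the drop-when-full policy are all assumptions made for the example.

```python
import json
import threading
from typing import Callable, Dict, List


class BufferedLogWriter:
    """Hypothetical sketch: buffer log records and submit them in one batch."""

    def __init__(self, send: Callable[[bytes], None], max_buffer: int = 1000) -> None:
        self._send = send  # stand-in for an HTTP POST to a log intake (assumption)
        self._max_buffer = max_buffer
        self._buffer: List[Dict[str, str]] = []
        self._lock = threading.Lock()

    def enqueue(self, message: str, **tags: str) -> None:
        """Add a record to the buffer; drop it if the buffer is full."""
        record = {
            "message": message,
            "tags": ",".join(f"{k}:{v}" for k, v in sorted(tags.items())),
        }
        with self._lock:
            if len(self._buffer) >= self._max_buffer:
                return  # assumed policy: drop new records when full
            self._buffer.append(record)

    def flush(self) -> int:
        """Send everything buffered so far and return the number of records sent."""
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._send(json.dumps(batch).encode("utf-8"))
        return len(batch)


# Usage: collect payloads in a list instead of performing network I/O.
sent: List[bytes] = []
writer = BufferedLogWriter(send=sent.append)
writer.enqueue("sampled prompt/completion", env="prod", service="my-app")
writer.flush()
```

Swapping the buffer out under the lock and sending outside it keeps callers of `enqueue` from blocking on the network, which is one way to limit the contention mentioned under Risk.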

Metrics

A statsd client is used specifically for the OpenAI integration.
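For readers unfamiliar with what a statsd client puts on the wire, the sketch below formats a tagged metric in the DogStatsD datagram shape (`name:value|type|#tag:value,...`). The helper `format_metric` and the metric and tag names are hypothetical; the PR does not show the names the integration actually emits.

```python
from typing import Dict, Union


def format_metric(
    name: str,
    value: Union[int, float],
    metric_type: str,
    tags: Dict[str, str],
) -> str:
    """Build a DogStatsD-style datagram; tags are sorted for stable output."""
    line = f"{name}:{value}|{metric_type}"
    tag_str = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return f"{line}|#{tag_str}" if tag_str else line


# "d" is the distribution type; the names below are illustrative only.
line = format_metric(
    "openai.request.duration",
    1250,
    "d",
    {"env": "prod", "openai.model": "ada"},
)
print(line)  # openai.request.duration:1250|d|#env:prod,openai.model:ada
```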

Testing

Tests use VCR to record requests made to OpenAI, which keeps test cases easy to write, consistent and reliable.

Logs and metrics are tested using mocking of the clients.
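A minimal sketch of this style of test, using `unittest.mock` from the standard library. The function `report_usage`, the metric names and the `distribution(name, value, tags=...)` call shape are assumptions for illustration, not code from this PR.

```python
from typing import List
from unittest import mock


def report_usage(stats_client, prompt_tokens: int, completion_tokens: int, tags: List[str]) -> None:
    """Hypothetical code under test: emit one distribution per token count."""
    stats_client.distribution("tokens.prompt", prompt_tokens, tags=tags)
    stats_client.distribution("tokens.completion", completion_tokens, tags=tags)


# Replace the real client with a mock and assert on the calls it received.
stats_client = mock.Mock()
report_usage(stats_client, 12, 30, tags=["env:test"])

stats_client.distribution.assert_any_call("tokens.prompt", 12, tags=["env:test"])
stats_client.distribution.assert_any_call("tokens.completion", 30, tags=["env:test"])
```

Because the mock records every call, the test can check names, values and tags without a running statsd agent.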

Several integration tests using snapshots and subprocess testing ensure that the integration works in a real-world OpenAI application.

A manual test app was also used: https://gist.github.com/Kyle-Verhoog/1f263ed0aade076b313167d1ba3bfa16

Risk

Currently, logs, traces and metrics are each collected, buffered and sent individually through their respective pipelines. As a result, there is a risk of disparity between the tagging and submission of the data. There is also a performance risk, as this data is not aggregated or batched when submitted.
Prompts and completions are captured on spans by default, with a default limit on the length of the data. This limit applies only to each prompt/completion individually, but a request can contain several prompts and completions. If there are many long prompts and completions, encoding and transmitting the data risks adding performance overhead.
The logs and metrics clients are defined specifically for this integration. If another integration were to introduce logs, it would need another log writer. Having several log writers could cause thread contention and high memory usage.
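The second risk can be made concrete with a little arithmetic. The sketch below assumes a per-item limit of 128 characters and a `...` suffix on truncated text; both are illustrative values, not confirmed by this description. The helper names are hypothetical.

```python
from typing import List


def truncate(text: str, limit: int) -> str:
    """Cut a single prompt/completion to `limit` characters (assumed behaviour)."""
    return text if len(text) <= limit else text[:limit] + "..."


def tagged_size(prompts: List[str], limit: int) -> int:
    """Total characters placed on the span after per-item truncation."""
    return sum(len(truncate(p, limit)) for p in prompts)


# 500 prompts of 10,000 characters each: every one is cut to 131 characters,
# yet the span still carries 500 * 131 = 65,500 characters in total.
prompts = ["x" * 10_000] * 500
total = tagged_size(prompts, limit=128)
print(total)  # 65500
```

The per-item limit bounds each tag, but the total grows linearly with the number of prompts and completions in the request.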

Checklist

Change(s) are motivated and described in the PR description.
Testing strategy is described if automated tests are not included in the PR.
Risk is outlined (performance impact, potential for breakage, maintainability, etc).
Change is maintainable (easy to change, telemetry, documentation).
Library release note guidelines are followed.
Documentation is included (in-code, generated user docs, public corp docs).
PR description includes explicit acknowledgement/acceptance of the performance implications of this PR as reported in the benchmarks PR comment.

Reviewer Checklist

Title is accurate.
No unnecessary changes are introduced.
Description motivates each change.
Avoids breaking API changes unless absolutely necessary.
Testing strategy adequately addresses listed risk(s).
Change is maintainable (easy to change, telemetry, documentation).
Release note makes sense to a user of the library.
Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.

pr-commenter · 2023-04-07T00:38:03Z

Benchmarks

Comparing candidate commit c130908 in PR branch evan.li/open-ai with baseline commit a9f4c02 in branch 1.x.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 94 cases.

Co-authored-by: Kyle Verhoog <kyle@verhoog.ca>
Co-authored-by: Kari Halsted <12926135+kayayarai@users.noreply.github.com>
Co-authored-by: lievan <42917263+lievan@users.noreply.github.com>
Co-authored-by: Federico Mon <federico.mon@datadoghq.com>

## Tips for reviewers

**Note: 99% of the line changes in this PR are non-code: snapshot files, riot lockfiles, test data (audio/image files), and test cassette files used to mock requests/responses with the OpenAI API. The cassettes store binary image/audio file data in `.yaml` files, which GitHub counts as lines of code.**

The major files that reviewers should focus on are:

- `ddtrace/contrib/openai/patch.py`: outlines which endpoint hooks are added, with minor changes to overall patching to minimize coupling with endpoint hook information.
- `ddtrace/contrib/openai/_endpoint_hooks.py`: implementation of each endpoint hook, as well as a minor refactor to minimize code duplication between endpoint hooks.
- `tests/contrib/openai/test_openai.py`: shows how each endpoint is expected to be used.

## Summary

This PR adds tracing support for the remaining endpoints not implemented by #5488. This includes:

- Model (list, retrieve)
- Edits (create)
- Images (create, edit, create variation)
- Audio (transcribe, translate)
- Files (list, create, delete, retrieve, retrieve contents)
- Fine-tunes (list, create, retrieve, cancel, list events, delete fine tuned model)
- Moderations (create)

Changes in this PR include:

- Modifying the design of the endpoint hooks to minimize duplicate code, and adding new endpoint hooks for the above endpoints.
- Ensuring rate-limit related metrics are tagged as numeric metrics rather than string tags.
- Ensuring the user's OpenAI API key is set as a tag in multiple use scenarios (set in an env var, set as an attribute in `openai`, directly added as a request param).
- Adding and correcting names of attributes extracted from `openai` (e.g. `organization`, `api_type`).
- Simplifying and separating patch logic from endpoint-specific tracing logic (only the operationID is propagated from the endpoint hook level to the patch level).
- Moving some global integration-level tags to request-level (e.g. `openai.request.model`, `openai.request.endpoint`, added `openai.request.method`).

## OpenAI Integration Design

### Overall Patch Implementation

**Note**: the overall patch implementation has not changed; this is simply a refresher on how the overall integration works.

The openai integration features patching via generators: each of the OpenAI API's endpoint methods (e.g. `openai.Completions.create()`) is wrapped by `_patched_endpoint(endpoint_hook, ...)`, which itself takes a corresponding endpoint hook as an argument. A rough description of what happens when `_patched_endpoint()` is called is as follows:

- `_patched_endpoint(endpoint_hook, ...)` starts a trace and starts a generator `_traced_endpoint(endpoint_hook, ...)`.
- `_traced_endpoint(endpoint_hook, ...)` starts `endpoint_hook().handle_request(...)`, which performs endpoint-specific tracing logic, and yields back to `_patched_endpoint()` once the request has been processed by the endpoint hook.
- `_patched_endpoint()` runs the underlying openai API method, then passes the response/error back to `_traced_endpoint()`, which in turn passes the response/error back to the `endpoint_hook`.
- `endpoint_hook()` performs the endpoint-specific tracing logic for the response/error, then finishes the trace.

### Endpoint Hook Design

The internal implementation design of the endpoint hooks has been modified (building off of the refactor from #5865) to minimize code duplication. This means that each endpoint hook now stores the following:

- `_request_arg_params` and `_request_kwarg_params`, which are tuples containing the arg/kwarg signature of the underlying OpenAI API endpoint method. These tuples are used in `_EndpointHook._record_request(...)` to add request arg/kwarg parameters as span tags/metrics.
- `ENDPOINT_NAME`, `REQUEST_TYPE`, `OPERATION_ID` constants, which reflect the base endpoint name (e.g. `/completions`), HTTP request type (e.g. `POST`), and operation ID as specified by the OpenAI API specifications (e.g. `createCompletion`). The operationID is used as the span resource name, and the endpoint/request_type values are added as span tags.
- Each endpoint hook can also (optionally) define a `_record_request(...)` and `_record_response(...)` to add any endpoint-specific span tagging logic.

## Testing

The testing for this PR involves snapshot testing for each endpoint, as well as `vcrpy` cassettes used to mock requests/responses to OpenAI.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included in the PR.
- [x] Risk is outlined (performance impact, potential for breakage, maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines) are followed.
- [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)).
- [x] OPTIONAL: PR description includes explicit acknowledgement of the performance implications of the change as reported in the benchmarks PR comment.

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment.

---------

Co-authored-by: Munir Abdinur <munir.abdinur@datadoghq.com>
Co-authored-by: Federico Mon <federico.mon@datadoghq.com>

lievan added 5 commits April 5, 2023 00:36

patch OpenAI engine api resource

af21051

set error tag if error is caught

bedd35c

finish span after catching exception

fa77e16

trace completion request and response

14b754b

fix lint

241b78d

lievan requested review from a team as code owners April 6, 2023 14:33

lievan requested review from P403n1x87, Yun-Kim, juanjux and majorgreys April 6, 2023 14:33

lievan marked this pull request as draft April 6, 2023 14:34

brettlangdon requested a review from Kyle-Verhoog April 6, 2023 14:47

Yun-Kim changed the title ~~ddtrace/contrib: support openai integration~~ feat(open-ai): add open-ai integration Apr 6, 2023

lievan added 2 commits April 6, 2023 14:32

support embeddings engine

4b2105a

fix hook errors

9b401fa

Kyle-Verhoog changed the title ~~feat(open-ai): add open-ai integration~~ feat(openai): add OpenAI integration Apr 7, 2023

Kyle-Verhoog force-pushed the evan.li/open-ai branch from 4191e21 to bd1399a Compare April 7, 2023 00:17

Add tests and docs

4268593

Kyle-Verhoog force-pushed the evan.li/open-ai branch from bd1399a to 4268593 Compare April 7, 2023 00:22

Kyle-Verhoog and others added 3 commits April 7, 2023 16:41

Set service name

4b66547

check for supported engines since it differs based on openai version

b4f8350

unpack batched prompt inputs

c54ae73

Kyle-Verhoog assigned Kyle-Verhoog and lievan Apr 7, 2023

Kyle-Verhoog and others added 4 commits April 10, 2023 14:53

Add stats client, report request duration dist

5b6d703

report usage and cost metrics, error metrics

f095bfc

Remove broken unpatching, add patch for chat completions

772f1cc

spec resp and req args

0e1d2d7

lievan and others added 15 commits April 25, 2023 12:31

add warning about unsupported endpoints

f377cd4

add documentation for openai.stream span

1ba5338

small doc change

ac4ac83

change deb to openai[embeddings]

7a324b8

parallelize CI job

297986f

tag estimated metrics, fix embeddings

bf77c2d

just remove how to disalbr requests/aiohttp integrations, oos for ope…

332e55a

…nai docs

clean up comments

a6341c5

all openai tags should be prefixed with openai.

237155a

fix openai prefix

e8c3337

another openai prefix fix

716aafd

doc updates

1fda8b6

Add snapshots for integration tests

9869f46

fmt

2169926

rename files due to module fix

4e28dab

Kyle-Verhoog force-pushed the evan.li/open-ai branch from 95fc7a7 to 4e28dab Compare April 26, 2023 20:40

Kyle-Verhoog added 2 commits April 26, 2023 17:00

fix integration async snapshot test

e75efff

Merge branch '1.x' into evan.li/open-ai

6561705

Kyle-Verhoog enabled auto-merge (squash) April 26, 2023 21:38

Add lock files

c130908

majorgreys approved these changes Apr 26, 2023

Kyle-Verhoog disabled auto-merge April 26, 2023 23:32

Kyle-Verhoog merged commit abd4542 into 1.x Apr 26, 2023

Kyle-Verhoog deleted the evan.li/open-ai branch April 26, 2023 23:34

github-actions Bot added this to the v1.14.0 milestone Apr 26, 2023

Yun-Kim mentioned this pull request May 11, 2023

feat(openai): add support for all endpoints #5857

Merged

15 tasks

Kyle-Verhoog merged 151 commits into 1.x from evan.li/open-ai

7 participants
