
[8.x](backport #41516) x-pack/metricbeat/module/openai: Add new module #42033

Merged
shmsr merged 4 commits into 8.x from mergify/bp/8.x/pr-41516
Dec 19, 2024

Conversation

@mergify
Contributor

@mergify mergify bot commented Dec 13, 2024

Proposed commit message

Implement a new module for OpenAI usage collection. This module operates on https://api.openai.com/v1/usage (by default; also configurable for proxy URLs, etc.) and collects the limited set of usage metrics emitted by this undocumented endpoint.

Example of how the usage endpoint emits metrics:

Given timestamps t0, t1, t2, ... tn in ascending order:

  • At t0 (first collection):
   usage_metrics_1: *
  • At t1 (after new API usage):
   usage_metrics_1: *
   usage_metrics_2: *
  • At t2 (continuous collection):
   usage_metrics_1: *
   usage_metrics_2: *
   usage_metrics_3: *

and so on.
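In other words, each poll returns the full, growing set of usage buckets, so a collector has to filter out buckets it has already seen. A minimal Go sketch of that filtering step (the `record` type and `newSince` helper are illustrative, not the module's actual code):

```go
package main

import "fmt"

// record is a minimal stand-in for one usage bucket, identified by its
// aggregation timestamp (illustrative, not the module's type).
type record struct {
	Timestamp int64
	Name      string
}

// newSince keeps only records strictly newer than the last timestamp
// already collected, mimicking how each poll of /v1/usage returns the
// full, growing set of buckets.
func newSince(records []record, last int64) []record {
	var out []record
	for _, r := range records {
		if r.Timestamp > last {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// The full set returned at t2 in the example above.
	t2 := []record{{1, "usage_metrics_1"}, {2, "usage_metrics_2"}, {3, "usage_metrics_3"}}
	// Suppose buckets up to timestamp 2 were already collected at t1.
	for _, r := range newSince(t2, 2) {
		fmt.Println(r.Name) // only usage_metrics_3 is new
	}
}
```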

Example response:

{
  "object": "list",
  "data": [
    {
      "organization_id": "org-xxx",
      "organization_name": "Personal",
      "aggregation_timestamp": 1725389580,
      "n_requests": 1,
      "operation": "completion",
      "snapshot_id": "gpt-4o-mini-2024-07-18",
      "n_context_tokens_total": 62,
      "n_generated_tokens_total": 21,
      "email": null,
      "api_key_id": null,
      "api_key_name": null,
      "api_key_redacted": null,
      "api_key_type": null,
      "project_id": null,
      "project_name": null,
      "request_type": ""
    },
    {
      "organization_id": "org-xxx",
      "organization_name": "Personal",
      "aggregation_timestamp": 1725389640,
      "n_requests": 1,
      "operation": "completion",
      "snapshot_id": "gpt-4o-mini-2024-07-18",
      "n_context_tokens_total": 97,
      "n_generated_tokens_total": 17,
      "email": null,
      "api_key_id": null,
      "api_key_name": null,
      "api_key_redacted": null,
      "api_key_type": null,
      "project_id": null,
      "project_name": null,
      "request_type": ""
    }
  ],
  "tpm_data": [
    {
      "organization_id": "org-xxx",
      "organization_name": "Personal",
      "day_timestamp": 1725321600,
      "snapshot_id": "gpt-4o-mini-2024-07-18",
      "operation": "completion",
      "p90_context_tpm": 97,
      "p90_generated_tpm": 21,
      "p90_provisioned_context_tpm": 0,
      "p90_provisioned_generated_tpm": 0,
      "max_context_tpm": 97,
      "max_generated_tpm": 21,
      "max_provisioned_context_tpm": 0,
      "max_provisioned_generated_tpm": 0
    }
  ],
  "ft_data": [],
  "dalle_api_data": [],
  "whisper_api_data": [],
  "tts_api_data": [],
  "assistant_code_interpreter_data": [],
  "retrieval_storage_data": []
}
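For illustration, here is a minimal Go sketch of structs that could unmarshal the `data` portion of the response above. The JSON field names come directly from the sample; the Go type and function names are assumptions, not the module's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed sample of the /v1/usage response shown above.
const sampleJSON = `{"object":"list","data":[{"organization_id":"org-xxx","aggregation_timestamp":1725389580,"n_requests":1,"operation":"completion","snapshot_id":"gpt-4o-mini-2024-07-18","n_context_tokens_total":62,"n_generated_tokens_total":21}]}`

// usageRecord mirrors one element of the "data" array; field names come
// from the sample response, but the struct itself is illustrative.
type usageRecord struct {
	OrganizationID        string `json:"organization_id"`
	AggregationTimestamp  int64  `json:"aggregation_timestamp"`
	NRequests             int    `json:"n_requests"`
	Operation             string `json:"operation"`
	SnapshotID            string `json:"snapshot_id"`
	NContextTokensTotal   int    `json:"n_context_tokens_total"`
	NGeneratedTokensTotal int    `json:"n_generated_tokens_total"`
}

// usageResponse covers only the fields used in this sketch.
type usageResponse struct {
	Object string        `json:"object"`
	Data   []usageRecord `json:"data"`
}

func parseUsage(b []byte) (usageResponse, error) {
	var r usageResponse
	err := json.Unmarshal(b, &r)
	return r, err
}

func main() {
	r, err := parseUsage([]byte(sampleJSON))
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Data[0].SnapshotID, r.Data[0].NContextTokensTotal)
}
```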

Usage data appears on the endpoint shortly after the API is used. If the module collected in real time, multiple times a day, it would ingest duplicates, which is bad both for storage and for analytics of the usage data.

It's better to collect from time.Now() (in UTC) minus 24h, so that we get the full usage for the past UTC day and avoid duplication. That's why I introduced a realtime config option and set it to false by default: collection is delayed by 24h, so we get daily data. With realtime: true, the module behaves like any other collector and fetches metrics at set intervals. Our recommendation is to keep realtime: false.

Since this is a Metricbeat module, there is no existing package that gives us support for storing a cursor. To avoid pulling data that has already been pulled, timestamps are stored per API key; the logic for how the state is stored is commented in the code. We use new custom code to persist this state so that collection can resume from the next available date.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Check the state store
  • Validate with usage dashboard of OpenAI

How to test this PR locally

@mergify mergify bot added the backport label Dec 13, 2024
@mergify mergify bot requested review from a team as code owners December 13, 2024 10:47
@mergify mergify bot requested review from faec and removed request for a team December 13, 2024 10:47
@mergify mergify bot assigned shmsr Dec 13, 2024
@mergify mergify bot requested a review from leehinman December 13, 2024 10:47
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 13, 2024
@botelastic

botelastic bot commented Dec 13, 2024

This pull request doesn't have a Team:<team> label.

@mergify
Contributor Author

mergify bot commented Dec 16, 2024

This pull request has not been merged yet. Could you please review and merge it @shmsr? 🙏

@shmsr shmsr enabled auto-merge (squash) December 19, 2024 18:51
@shmsr shmsr merged commit 3474183 into 8.x Dec 19, 2024
@shmsr shmsr deleted the mergify/bp/8.x/pr-41516 branch December 19, 2024 19:41

Labels

backport needs_team Indicates that the issue/PR needs a Team:* label
