Skip to content

Add TokenCount metric#74

Merged
yisz merged 3 commits intomainfrom
metric/token_count
Aug 4, 2024
Merged

Add TokenCount metric#74
yisz merged 3 commits intomainfrom
metric/token_count

Conversation

@pantonante
Copy link
Contributor

@pantonante pantonante commented Aug 3, 2024

🚀 This description was created by Ellipsis for commit dd610a6

Summary:

Added TokenCount metric to count tokens in retrieved context using tiktoken encoder or an approximation, with tests and documentation.

Key points:

  • Added TokenCount metric in continuous_eval/metrics/retrieval/tokens.py.
  • Updated continuous_eval/metrics/retrieval/__init__.py to include TokenCount.
  • Implemented TokenCount class to count tokens using tiktoken encoder or an approximation.
  • Added tests for TokenCount in tests/retrieval_metrics_test.py.
  • Added documentation for TokenCount in docs/src/content/docs/metrics/Retrieval/Deterministic/token_count.md.
  • Updated metrics overview in docs/src/content/docs/metrics/overview.md to include TokenCount.

Generated with ❤️ by ellipsis.dev

@pantonante pantonante requested a review from yisz August 3, 2024 03:29
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to abedd8b in 36 seconds

More details
  • Looked at 88 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_arh5EbQPeJG41abu


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.


from continuous_eval.metrics.base import Metric

_CHARACTERS_PER_TOKEN = 4.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a more dynamic or configurable approach for _CHARACTERS_PER_TOKEN instead of a fixed value to better accommodate different languages or text formats.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on d39208a in 16 seconds

More details
  • Looked at 72 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. docs/src/content/docs/metrics/overview.md:96
  • Draft comment:
    The documentation for the TokenCount metric is accurate and aligns with the detailed documentation in token_count.md. It correctly lists the metric under the Deterministic category for Retrieval metrics and provides a brief definition and the required input.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The documentation for the TokenCount metric in the overview.md file is accurate and aligns with the detailed documentation in token_count.md. It correctly lists the metric under the Deterministic category for Retrieval metrics and provides a brief definition and the required input. No issues or inconsistencies are found in this documentation snippet.

Workflow ID: wflow_OHoTzzysnHKRafsY


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on dd610a6 in 34 seconds

More details
  • Looked at 86 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. README.md:18
  • Draft comment:
    The PR description mentions the addition of the TokenCount metric, but the diff does not include changes to any Python files such as continuous_eval/metrics/retrieval/tokens.py. Please ensure that all relevant changes are included in the PR.
  • Reason this comment was not posted:
    Confidence of 0% on close inspection, compared to threshold of 50%.

Workflow ID: wflow_dUTdZ3I9ja94hUpX


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@yisz yisz merged commit a9823db into main Aug 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants