Content safety evals aggregate max from conversations #39083
Merged
MilesHolland merged 21 commits into Azure:main on Jan 22, 2025
nagkumar91
approved these changes
Jan 8, 2025
Collaborator
API change check: APIView has identified API-level changes in this PR and created the following API reviews.
Contributor
Copilot reviewed 5 out of 10 changed files in this pull request and generated no comments.
Files not reviewed (5)
- sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluate_test_data_conversation.jsonl: Language not supported
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_violence.py: Evaluated as low risk
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py: Evaluated as low risk
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_sexual.py: Evaluated as low risk
- sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py: Evaluated as low risk
Comments suppressed due to low confidence (3)
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py:77
- [nitpick] Update the docstring to reflect the correct class name if it is renamed to ConversationNumericAggregationType.
:type conversation_aggregation_type: ~azure.ai.evaluation._constants._ConversationNumericAggregationType
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:41
- Missing period at the end of the docstring.
Default is ~azure.ai.evaluation._constants.ConversationNumericAggregationType.MEAN.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:42
- The conversation_aggregation_type parameter should be explicitly mentioned in the constructor's docstring.
:type conversation_aggregation_type: ~azure.ai.evaluation._constants.ConversationNumericAggregationType
diondrapeck
previously requested changes
Jan 14, 2025
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py
Added change, but GH is still annoyed about it.
slister1001
approved these changes
Jan 15, 2025
nagkumar91
approved these changes
Jan 15, 2025
ninghu
reviewed
Jan 15, 2025
ninghu
reviewed
Jan 15, 2025
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py
ninghu
reviewed
Jan 15, 2025
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py
ninghu
reviewed
Jan 15, 2025
...tion/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py
ninghu
reviewed
Jan 15, 2025
…m:MilesHolland/azure-sdk-for-python into jan25/eval/improvement/cs-convo-takes-max
w-javed
pushed a commit
that referenced
this pull request
Jan 23, 2025
* add convo agg type, and have harm evals use max * analysis * correct enum name in docs * refactor checked enum into function field * cl and analysis * change enum name and update CL * change function names to private, allow agg type retrieval * PR comments * test serialization * CL * CI adjustment * try again * perf * skip perf * remove skip
w-javed
pushed a commit
to w-javed/azure-sdk-for-python
that referenced
this pull request
Jan 23, 2025
w-javed
added a commit
that referenced
this pull request
Jan 27, 2025
* Azure AI Evaluation Release 1.2.0 * Azure AI Evaluation Release 1.2.0 * fix the intersphinx references for a new reference methodology (#39332) * handle only deleted files in a <language> - pullrequest build (#39266) Co-authored-by: Scott Beddall <scbedd@microsoft.com> * fix tests weekly (#39338) * [Storage] update perf tests core baseline (#39336) * [Storage] update perf tests core baseline * update storage file baselien * [AutoRelease] t2-computeschedule-2025-01-10-50036(can only be merged by SDK owner) (#39105) * code and test * Update CHANGELOG.md for new model properties --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> * [AutoRelease] t2-quota-2025-01-16-93059(can only be merged by SDK owner) (#39215) * code and test * update testcases * Update CHANGELOG.md to remove method details * Update changelog for quota operations changes * Update release date in CHANGELOG.md --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> * [AutoRelease] t2-servicenetworking-2025-01-21-47646(can only be merged by SDK owner) (#39322) * code and test * Remove duplicate method overloads from changelog * Update CHANGELOG.md to remove instance variables * Fix typo in changelog entry * Update CHANGELOG for version 2.0.0 --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> * Fix urls (#39259) * Fix urls (#39251) * fix url (#39255) * Fix urls (#39248) * Fix urls (#39246) * Content safety evals aggregate max from conversations (#39083) * add convo agg type, and have harm evals use max * analysis * correct enum name in docs * refactor checked enum into function field * cl and analysis * change enum name and update CL * change function names to private, allow agg type 
retrieval * PR comments * test serialization * CL * CI adjustment * try again * perf * skip perf * remove skip * Fix urls (#39129) * Fix urls (#39262) * Sync eng/common directory with azure-sdk-tools for PR 9668 (#39347) * Support incrementing semver prereleases with 'zero' versions * Make tests more explicit --------- Co-authored-by: Patrick Hallisey <pahallis@microsoft.com> * [ServiceBus/EventHub] lock pending deliveries on send (#38067) * [ServiceBus/EventHub] lock pending deliveries on send * remove misc logging * changelog + test * fix tests, remove session lock * remove logging from test * sync with sb * add todo in sender.py tfor temporary fix * bumped versions after jan 22 patch release (#39355) * Sync eng/common directory with azure-sdk-tools for PR 9656 (#39356) * Added label handle sdk-gen pipeline template Added common script to delete label from a PR * Update eng/common/scripts/Invoke-GitHubAPI.ps1 Co-authored-by: Ben Broderick Phillips <bebroder@microsoft.com> --------- Co-authored-by: ray chen <raychen@microsoft.com> Co-authored-by: Ben Broderick Phillips <bebroder@microsoft.com> * Update package_utils.py (#39361) * [AutoRelease] t2-web-2024-11-15-26155(can only be merged by SDK owner) (#38561) * code and test * Update app_service_environments_create_or_update_multi_role_pool.py * udpate version * update-testcase * update testcases * update format * Update CHANGELOG.md * Update CHANGELOG.md * update version --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com> Co-authored-by: msyyc <70930885+msyyc@users.noreply.github.com> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> * fix: Loosen psutil version requirement (#39354) * Enable sample type checking for cosmos (#39334) This is already passing so enabling in CI so we can continue to validate samples with mypy * update change log * change date format * 
change date format * change date format --------- Co-authored-by: Scott Beddall <45376673+scbedd@users.noreply.github.com> Co-authored-by: Azure SDK Bot <53356347+azure-sdk@users.noreply.github.com> Co-authored-by: Scott Beddall <scbedd@microsoft.com> Co-authored-by: Krista Pratico <krpratic@microsoft.com> Co-authored-by: swathipil <76007337+swathipil@users.noreply.github.com> Co-authored-by: ChenxiJiang333 <119990644+ChenxiJiang333@users.noreply.github.com> Co-authored-by: ChenxiJiang333 <v-chenjiang@microsoft.com> Co-authored-by: Xiang Yan <xiangsjtu@gmail.com> Co-authored-by: MilesHolland <108901744+MilesHolland@users.noreply.github.com> Co-authored-by: Patrick Hallisey <pahallis@microsoft.com> Co-authored-by: Peter Wu <162184229+weirongw23-msft@users.noreply.github.com> Co-authored-by: ray chen <raychen@microsoft.com> Co-authored-by: Ben Broderick Phillips <bebroder@microsoft.com> Co-authored-by: Yuchao Yan <yuchaoyan@microsoft.com> Co-authored-by: msyyc <70930885+msyyc@users.noreply.github.com> Co-authored-by: kdestin <101366538+kdestin@users.noreply.github.com>
allenkim0129
pushed a commit
to allenkim0129/azure-sdk-for-python
that referenced
this pull request
Jan 27, 2025
allenkim0129
pushed a commit
to allenkim0129/azure-sdk-for-python
that referenced
this pull request
Jan 27, 2025
l0lawrence
pushed a commit
to l0lawrence/azure-sdk-for-python
that referenced
this pull request
Feb 19, 2025
l0lawrence
pushed a commit
to l0lawrence/azure-sdk-for-python
that referenced
this pull request
Feb 19, 2025
Adds a new enum, the aggregation type, along with a utility class that maps each enum value to its associated function.
The base eval class then uses this enum to control how the per-turn results of a multi-turn conversation are aggregated into a single value. Also adds private functions for injecting custom aggregation functions directly, plus tests covering all of this.
In the future, this will likely be used to control how evaluation results across multiple evals are aggregated in the evaluate() function.
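To make the description above concrete, here is a minimal sketch of the pattern: an enum of aggregation types, a lookup that turns each member into a function, and a helper that collapses per-turn scores into one value. All names here (AggregationType, get_aggregator, aggregate_turns, the custom parameter) are illustrative assumptions and are not the SDK's actual identifiers or signatures. Only MEAN (the documented default) and MAX (used by the harm evals) are modeled.

```python
from enum import Enum
from statistics import mean
from typing import Callable, Dict, List, Optional

# Type alias for any function that reduces per-turn scores to one number.
Aggregator = Callable[[List[float]], float]


class AggregationType(Enum):
    """Hypothetical stand-in for the aggregation-type enum described in the PR."""

    MEAN = "mean"
    MAX = "max"


# Hypothetical lookup from enum member to the function that implements it.
_AGGREGATORS: Dict[AggregationType, Aggregator] = {
    AggregationType.MEAN: mean,
    AggregationType.MAX: max,
}


def get_aggregator(kind: AggregationType) -> Aggregator:
    """Return the function associated with an aggregation type."""
    return _AGGREGATORS[kind]


def aggregate_turns(
    per_turn_scores: List[float],
    kind: AggregationType = AggregationType.MEAN,
    custom: Optional[Aggregator] = None,
) -> float:
    """Collapse per-turn scores into one value.

    A custom function, if given, takes precedence over the enum,
    mirroring the idea of injecting an aggregation function directly.
    """
    if not per_turn_scores:
        raise ValueError("per_turn_scores must not be empty")
    aggregator = custom if custom is not None else get_aggregator(kind)
    return float(aggregator(per_turn_scores))


# Example: one harmful turn in an otherwise benign conversation.
violence_scores = [0.0, 5.0, 1.0]
mean_score = aggregate_turns(violence_scores)
max_score = aggregate_turns(violence_scores, AggregationType.MAX)
```

With these example scores the mean dilutes the single harmful turn, while the max surfaces it, which is presumably why the content safety evaluators switch to max for conversations.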