MAINT: improve organization of dataset fetch functions (refactoring) by paulinek13 · Pull Request #785 · microsoft/PyRIT

paulinek13 · 2025-03-13T19:06:14Z

Description

Related issue: #775

This PR refactors the dataset fetching functions to improve their organization and maintainability as the codebase grows and new datasets are introduced.

🛠️ The main changes:

Extracting individual dataset fetching functions from fetch_example_datasets.py into separate files (similar to how converters are handled)
Moving dataset tests to a dedicated directory (tests/unit/datasets)
Improving docs by sorting dataset functions alphabetically in both implementation (__init__.py) and docs (api.rst)

✏️ Other modifications:

Added two missing functions related to datasets to the API reference: fetch_babelscape_alert_dataset and fetch_librAI_do_not_answer_dataset
Updated references and imports to maintain functionality after refactoring (and to fix broken tests)
Renamed fetch_example_datasets.py to fetch_examples.py
Updated .pre-commit-config.yaml
Updated /doc files: doc/code/datasets/0_dataset.md, doc/code/datasets/2_fetch_dataset.ipynb, doc/code/datasets/2_fetch_dataset.py

Close #775
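
The alphabetical ordering mentioned above is easy to check mechanically. Below is a minimal sketch, assuming a case-insensitive comparison; the helper `unsorted_entries` is hypothetical and not part of PyRIT, and the list contains only the function names mentioned in this PR, not the full contents of `__all__`.

```python
from typing import List, Tuple


def unsorted_entries(names: List[str]) -> List[Tuple[str, str]]:
    """Return adjacent (previous, current) pairs that break alphabetical order.

    Hypothetical helper for illustration; the comparison ignores case,
    which is an assumption about the intended ordering.
    """
    return [(a, b) for a, b in zip(names, names[1:]) if a.lower() > b.lower()]


# Illustrative subset only: names taken from this PR's description.
exported = [
    "fetch_babelscape_alert_dataset",
    "fetch_examples",
    "fetch_librAI_do_not_answer_dataset",
]

out_of_order = unsorted_entries(exported)
print(out_of_order)
```

A check like this could run in a unit test so that newly added dataset functions keep the list ordered.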

paulinek13 · 2025-03-15T07:53:02Z

This is almost ready to be reviewed. I just have a question:

Should I update the blog post about Datasets and Seed Prompts since, after the changes I've made in this PR, it will no longer be up-to-date? It's about the following paragraph specifically: 2025_02_11.md#loading-datasets-with-seed-prompts

I'll absolutely update the User guide for Datasets. Just wondering whether I should also modify the blog post 😄

romanlutz · 2025-03-15T08:15:41Z

Awesome! We usually don't update blog posts substantially, but this is easy enough of a fix that I'm inclined to make the change. CC @eugeniavkim

I would replace

They are in the fetch_example_datasets.py file.

with

They are in the pyrit.datasets module.

romanlutz

Thank you! This is perfect!

paulinek13 · 2025-03-15T16:59:30Z

I see the checks are failing. I ran pytest tests/unit && pre-commit run --all-files before marking this PR as ready for review and it was successful 🤔, but I used Python 3.11.
Now I tried it locally with Python 3.10 and it fails the same way as in the checks.

I'll try to fix the problem tomorrow 😃

romanlutz · 2025-03-15T21:08:32Z

There might be a naming collision since fetch_examples is both the file and function name. But that's just a guess.

paulinek13 · 2025-03-16T12:09:48Z

That's right, renaming did the trick! Thank you so much!
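
The kind of collision guessed at above can be reproduced in isolation. The sketch below builds a throwaway package (the name `demo_datasets` and its contents are hypothetical, not PyRIT code) in which a module and the function it defines are both called `fetch_examples`, and the package `__init__.py` re-exports the function. After import, the package attribute refers to the function rather than the submodule, so anything that reaches the submodule through attribute access finds the function instead. Whether this is exactly what broke the Python 3.10 checks is not confirmed in this thread.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a hypothetical package whose module and function share a name."""
    pkg = root / "demo_datasets"
    pkg.mkdir()
    (pkg / "fetch_examples.py").write_text(
        "def fetch_examples():\n    return ['example']\n"
    )
    # Re-exporting the function rebinds the package attribute `fetch_examples`.
    (pkg / "__init__.py").write_text(
        "from demo_datasets.fetch_examples import fetch_examples\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        package = importlib.import_module("demo_datasets")
        # Attribute access on the package yields the function...
        attribute = getattr(package, "fetch_examples")
        # ...while the real submodule is still registered in sys.modules.
        submodule = sys.modules["demo_datasets.fetch_examples"]
    finally:
        sys.path.remove(tmp)

shadowed = isinstance(attribute, types.FunctionType)
print("package attribute is a function:", shadowed)
print("sys.modules entry is a module:", isinstance(submodule, types.ModuleType))
```

Giving the file a name that differs from every function it exports, as the later rename to dataset_helper.py does, avoids the ambiguity.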

romanlutz · 2025-03-17T03:08:14Z

Fantastic @paulinek13 !!! Thanks once again for a great contribution.

paulinek13 added 10 commits March 13, 2025 13:41

api.rst: sort datasets fetch functions

3ac09aa

rename from fetch_example_datasets.py to fetch_examples.py

5f29af7

refactor: extract dataset fetching functions into separate files

62aee0c

Separates each dataset fetching function into its own file for better organization and maintainability.

fetch_examples.py: remove unused imports

4660e12

update datasets/__init__.py after the refactoring

936ab97

fix tests

7171f95

pre-commit run --all-files

31aa160

move dataset tests to tests/unit/datasets and improve consistency in naming

c3bfcf8

__init__.py: sort entries in __all__

ef1065c

api.rst: add missing fetch functions

ac9cd96

paulinek13 changed the title ~~[DRAFT] REFACTOR: improve organization and maintainability of dataset fetch functions~~ [DRAFT] MAINT: improve organization and maintainability of dataset fetch functions Mar 15, 2025

paulinek13 changed the title ~~[DRAFT] MAINT: improve organization and maintainability of dataset fetch functions~~ [DRAFT] MAINT: improve organization of dataset fetch functions (refactoring) Mar 15, 2025

paulinek13 added 3 commits March 15, 2025 10:30

update pre-commit config

a1f1426

update User guide for Datasets

1689b86

update 2_fetch_dataset.py and 2_fetch_dataset.ipynb

9450a4b

paulinek13 changed the title ~~[DRAFT] MAINT: improve organization of dataset fetch functions (refactoring)~~ MAINT: improve organization of dataset fetch functions (refactoring) Mar 15, 2025

paulinek13 marked this pull request as ready for review March 15, 2025 11:35

romanlutz approved these changes Mar 15, 2025

fix: rename fetch_examples.py -> dataset_helper.py

0525d97

Merge branch 'main' into refactor/775/improve_datasets_organization

dba4dd0

romanlutz merged commit 3779df9 into microsoft:main Mar 17, 2025
18 checks passed

paulinek13 deleted the refactor/775/improve_datasets_organization branch March 17, 2025 06:34

romanlutz merged 15 commits into microsoft:main from paulinek13:refactor/775/improve_datasets_organization
