feat(integration-tests): Introduce IntegrationTestDataset fixtures; move test data to central directory.#2181
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds two integration-test datasets (JSON and text) and session-scoped fixtures; refactors dataset model to load and validate dataset metadata from each dataset root and verifies listed log files exist. Changes
Sequence DiagramsequenceDiagram
participant Pytest as pytest
participant Conftest as conftest.py
participant Fixture as datasets.py (fixture)
participant Dataset as IntegrationTestDataset
participant Metadata as metadata.json
participant Logs as logs/
Pytest->>Conftest: load plugins (pytest_plugins)
Conftest->>Fixture: auto-load fixtures module
Pytest->>Fixture: request json_multifile fixture
Fixture->>Dataset: instantiate with path_to_dataset_root
Dataset->>Metadata: read metadata.json
Metadata-->>Dataset: parsed IntegrationTestDatasetMetadata
Dataset->>Dataset: validate begin_ts ≤ end_ts
Dataset->>Logs: check logs_subdir exists and file_names present
Logs-->>Dataset: file existence confirmed
Dataset-->>Fixture: return configured dataset
Fixture-->>Pytest: provide fixture to tests
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
IntegrationTestDataset fixtures and move test data to central directory.IntegrationTestDataset fixtures; move test data to central directory.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/data/json_multifile/logs/sts-135-2011-07-11.jsonl`:
- Around line 1-8: The JSONL records for mission "STS-135" (each object with
"line_index" 0–7) currently have CRLF endings; change the file so every record
uses LF-only line endings (remove trailing \r from each line) ensuring no CR
characters remain at end-of-line and re-save the JSONL with Unix/LF EOL to
prevent byte-level comparison flakes.
In `@integration-tests/tests/data/text_multifile/logs/apollo-17_day10.txt`:
- Line 4: Replace the misspelled phrase "ground based" with the hyphenated form
"ground-based" in the fixture text line that reads "navigation sightings compare
well with ground based tracking solutions" so it becomes "navigation sightings
compare well with ground-based tracking solutions"; update the test fixture file
content accordingly to preserve surrounding punctuation and spacing.
In `@integration-tests/tests/fixtures/datasets.py`:
- Around line 17-23: The fixtures duplicate dataset-loading logic and access
nested metadata without validation, causing brittle failures; create a shared
loader/validator (e.g., a helper function like load_and_validate_dataset or
load_json_to_dataset) that centralizes path resolution using path_to_dataset and
load_json_to_dict, validates required keys such as
metadata_dict["file_structure"]["logs_subdir"] (and any schema fields), raises
clear errors on missing/invalid fields, and returns an IntegrationTestDataset
populated via dataset_name, path_to_dataset_logs and metadata_dict; replace the
duplicated blocks (including the other occurrence at lines 31-37) to call this
helper so failures are fast and diagnostics are meaningful.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 59db8514-97df-4fbd-a009-57b2044ce639
📒 Files selected for processing (15)
integration-tests/tests/conftest.pyintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-08.jsonlintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-09.jsonlintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-11.jsonlintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-19.jsonlintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-21.jsonlintegration-tests/tests/data/json_multifile/metadata.jsonintegration-tests/tests/data/text_multifile/logs/apollo-17_day01.txtintegration-tests/tests/data/text_multifile/logs/apollo-17_day04.txtintegration-tests/tests/data/text_multifile/logs/apollo-17_day07.txtintegration-tests/tests/data/text_multifile/logs/apollo-17_day10.txtintegration-tests/tests/data/text_multifile/logs/apollo-17_day13.txtintegration-tests/tests/data/text_multifile/metadata.jsonintegration-tests/tests/fixtures/datasets.pyintegration-tests/tests/utils/utils.py
IntegrationTestDataset fixtures; move test data to central directory.IntegrationTestDataset fixtures; move test data to central directory.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/data/text_multifile/metadata.json`:
- Line 16: The metadata file sets "columns_file_name": null but the
IntegrationTestDatasetMetadata model currently types columns_file_name as str;
update the model (IntegrationTestDatasetMetadata) to accept None by changing the
type to Optional[str] or using the union syntax (str | None) for
columns_file_name so Pydantic validation succeeds when the field is null.
In `@integration-tests/tests/utils/classes.py`:
- Line 75: The type annotation for columns_file_name must allow nulls; change
its declaration in the class to be nullable (use "str | None" or
"Optional[str]") so Pydantic accepts null from text_multifile/metadata.json;
also add or update the typing import (Optional) if you choose that form. Ensure
this aligns with the existing columns_file_path property which already handles
None.
- Line 96: The code calls validate_dir_exists on a file path
(metadata_file_path); replace this with a file-specific check: add or reuse a
validate_file_exists utility that accepts the same input type as
validate_dir_exists, verifies Path(...).exists() and Path(...).is_file(), and
raises the same error semantics as the directory validator on failure, then
change the call in the class from validate_dir_exists(self.metadata_file_path)
to validate_file_exists(self.metadata_file_path) (or directly use
Path(self.metadata_file_path).is_file() with identical error handling if you
prefer not to add a helper).
- Around line 103-104: The code incorrectly calls validate_dir_exists on
columns_file_path (a file, not a directory); replace that call with the
file-level check (e.g., validate_file_exists(self.columns_file_path)) or
otherwise validate that self.columns_file_path is an existing file (or use
os.path.isdir/os.path.isfile) inside the same class/method where
columns_file_path is set so the validator matches the resource type; update the
reference to validate_dir_exists to the appropriate file validator for
columns_file_path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f644ae71-5762-4c9e-9cfb-2956cb47710d
📒 Files selected for processing (5)
integration-tests/tests/data/json_multifile/columns.jsonintegration-tests/tests/data/json_multifile/metadata.jsonintegration-tests/tests/data/text_multifile/metadata.jsonintegration-tests/tests/fixtures/datasets.pyintegration-tests/tests/utils/classes.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/utils/classes.py`:
- Around line 98-99: The code reads JSON metadata with
self.metadata_file_path.read_text() which uses the system default encoding;
change the read to explicitly use UTF-8 (e.g., call read_text(encoding="utf-8")
or open the path with encoding="utf-8") before passing the string to
IntegrationTestDatasetMetadata.model_validate_json so self.metadata is parsed
consistently across platforms; update the read that assigns raw_metadata and
keep the subsequent
IntegrationTestDatasetMetadata.model_validate_json(raw_metadata) call unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f46b0472-5f7b-465a-be0f-686da657ca34
📒 Files selected for processing (1)
integration-tests/tests/utils/classes.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/data/README.md`:
- Line 48: The markdown line uses straight (regular) quotes instead of inline
code formatting for boolean literals; update the text so the boolean values are
wrapped in backticks (e.g., replace "True" and "False" with `True` and `False`)
in the README entry that currently reads '`True` if logs are unstructured, else
`False`.' to ensure proper inline code formatting for the boolean values.
In `@integration-tests/tests/utils/classes.py`:
- Around line 103-114: The metadata validation currently accepts absolute paths
or '..' traversal from metadata (e.g., logs_subdir and file_names) which can
escape the dataset root; update the validation after
validate_dir_exists(self.logs_path) and before calling validate_file_exists to
resolve candidate paths (use Path.resolve or os.path.realpath) for both the
logs_subdir-derived path and each file_path_abs, then assert the resolved path
is contained within the dataset root (self.logs_path.resolve()) — if not, fail
the test (pytest.fail) with a clear message; reference functions/attributes:
validate_dir_exists, validate_file_exists, self.logs_path,
self.metadata.file_names, and any logs_subdir handling so the containment check
covers both directory and file entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: af9f042f-69b6-49a6-adc3-0997266b7d83
📒 Files selected for processing (10)
integration-tests/tests/data/README.mdintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-08.jsonlintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-09.jsonlintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-11.jsonlintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-19.jsonlintegration-tests/tests/data/json_multifile/logs/sts-135-2011-07-21.jsonlintegration-tests/tests/data/json_multifile/metadata.jsonintegration-tests/tests/data/text_multifile/metadata.jsonintegration-tests/tests/fixtures/datasets.pyintegration-tests/tests/utils/classes.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration-tests/tests/data/README.md`:
- Around line 73-83: The fixture example incorrectly passes dataset_name to the
IntegrationTestDataset constructor even though IntegrationTestDataset derives
its name from metadata; update the dataset_name fixture to only construct
IntegrationTestDataset with the path (e.g. return
IntegrationTestDataset(path_to_dataset_root=integration_test_path_config.test_data_path
/ dataset_name)) and keep the fixture name/IntegrationTestPathConfig usage the
same so the runtime won't fail due to an unexpected keyword argument.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 970428ed-e7b2-489d-aa97-488e961e7651
📒 Files selected for processing (1)
integration-tests/tests/data/README.md
There was a problem hiding this comment.
shall we start creating integration-test-specific docs in <project-root>/docs/ or do you think this README.md is fine to be kept here for now?
There was a problem hiding this comment.
fwiw I have already started the docs but #1686 is on hold until the bulk of the testing suite is finished.
There was a problem hiding this comment.
- We have integration specific docs inside
docs/src/dev-docs/testing/integration-tests.md. - I think we can move this README.md to the docs directory in a follow-up PR. Preferably after the current refactoring operation is complete.
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Updated the documentation format for dataset fields.
Bill-hbrhbr
left a comment
There was a problem hiding this comment.
- Polish README
- Use
*_dirinstead of*_pathfor directory properties.
| | `end_ts` | The latest timestamp present in the dataset (ms). | | ||
| | `logs_subdir` | The name of the subdirectory containing logs. | | ||
| | `file_names` | A list of the files within `logs_subdir`. | | ||
| | `single_match_wildcard_query` | A wildcard query which, when searched, will match a single log message in the dataset. | |
There was a problem hiding this comment.
| | `single_match_wildcard_query` | A wildcard query which, when searched, will match a single log message in the dataset. | | |
| | `single_match_wildcard_query` | A wildcard query that should match exactly one log message in the dataset. | |
There was a problem hiding this comment.
How about
| | `single_match_wildcard_query` | A wildcard query which, when searched, will match a single log message in the dataset. | | |
| | `single_match_wildcard_query` | A wildcard query that matches exactly one log message in the dataset. | |
| return self.dataset_root_dir / "metadata.json" | ||
|
|
||
| @property | ||
| def logs_path(self) -> Path: |
There was a problem hiding this comment.
| def logs_path(self) -> Path: | |
| def logs_dir(self) -> Path: |
Still prefer using _dir suffix but let's wait for @junhaoliao
There was a problem hiding this comment.
Let's use logs_path - i agree it's more clear in representing an absolute path. we should probably document this convention somewhere in our developer guidelines though
| return self.dataset_root_dir / "metadata.json" | ||
|
|
||
| @property | ||
| def logs_path(self) -> Path: |
There was a problem hiding this comment.
Let's use logs_path - i agree it's more clear in representing an absolute path. we should probably document this convention somewhere in our developer guidelines though
… move test data to central directory. (y-scope#2181) Co-authored-by: Junhao Liao <junhao.liao@yscope.com> Co-authored-by: Junhao Liao <junhao@junhao.ca>
Description
This PR stores test datasets in a centralized
/datadirectory, and makes them available to all tests through session-scoped fixtures indatasets.py. Each test dataset directory carries ametadata.jsonfile which specifies certain information about the dataset needed for testing. The information in this file is loaded into a pydantic object (IntegrationTestDatasetMetadata) which validates the format of the file.NOTE: the old test datasets located at
integration-tests/tests/package_tests/clp_json/data/json-multifileandintegration-tests/tests/package_tests/clp_text/data/text-multifileare not removed in this PR, and they are still referenced in code. They will be removed when appropriate as the rest of the package testing code is updated and developed.Checklist
breaking change.
Validation performed
Tested on development branch; all tests pass.
Summary by CodeRabbit
Tests
Documentation
Chores