Skip to content

fix(webui): Fix dataset handling in compression job creation to use default when value is empty (fixes #1854).#1855

Merged
junhaoliao merged 4 commits into
y-scope:mainfrom
junhaoliao:empty-dataset
Jan 13, 2026
Merged

fix(webui): Fix dataset handling in compression job creation to use default when value is empty (fixes #1854).#1855
junhaoliao merged 4 commits into
y-scope:mainfrom
junhaoliao:empty-dataset

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Jan 8, 2026

Copy link
Copy Markdown
Member

Description

as the title says, we missed to provide override when the input field is empty.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  1. launched the webui. go to the ingestion page
  2. selected a sample file with the select tree component
  3. entered "123" in the Dataset field, then emptied the field.
  4. clicked the "Submit" button and observed the job was submitted and completed successfully.

also, as a sanity test:
Repeat 1 & 2
3. entered "samples" in the Dataset field
4. clicked the "Submit" button and observed the job was submitted and completed successfully

Checked the "Search" tab. In the Dataset drop-down, both "default" and "samples" are listed as options. Performed a search with query string "1" for each dataset and observed results were returned.

Summary by CodeRabbit

  • Bug Fixes
    • Dataset handling during ingestion now treats empty dataset entries as missing and applies the default dataset; explicitly provided non-empty dataset values are preserved, ensuring payloads always include a valid dataset value.

✏️ Tip: You can customize this high-level summary in your review settings.

@junhaoliao junhaoliao requested a review from a team as a code owner January 8, 2026 20:20
@junhaoliao junhaoliao requested a review from davemarco January 8, 2026 20:20
@coderabbitai

coderabbitai Bot commented Jan 8, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

Submit-path payload construction for dataset now treats undefined or empty-string ("") values the same: when values.dataset is undefined or has length 0, payload.dataset is set to CLP_DEFAULT_DATASET_NAME; otherwise payload.dataset is set to the provided non-empty value.

Changes

Cohort / File(s) Summary
Payload dataset assignment logic
components/webui/client/src/pages/IngestPage/Compress/index.tsx
Changed dataset assignment to default to CLP_DEFAULT_DATASET_NAME when values.dataset is undefined or an empty string (length === 0); otherwise sets payload.dataset to the provided non-empty value.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing dataset handling to use a default value when the input field is empty, and directly references the associated issue (#1854).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @components/webui/client/src/pages/IngestPage/Compress/index.tsx:
- Around line 61-63: The code currently only sets payload.dataset to
CLP_DEFAULT_DATASET_NAME when values.dataset is undefined or empty, so non-empty
custom dataset values are never sent; update the logic around values.dataset
(the same block handling payload.dataset) to assign payload.dataset =
values.dataset when values.dataset is defined and non-empty (i.e., add an else
branch or use a ternary) while keeping the fallback to CLP_DEFAULT_DATASET_NAME
for undefined/empty inputs; reference payload.dataset, values.dataset, and
CLP_DEFAULT_DATASET_NAME to locate and modify the code.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bab9d9a and 21ffaf7.

📒 Files selected for processing (1)
  • components/webui/client/src/pages/IngestPage/Compress/index.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/webui/client/src/pages/IngestPage/Compress/index.tsx
🧠 Learnings (3)
📓 Common learnings
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Learnt from: haiqi96
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
📚 Learning: 2025-08-14T05:06:59.010Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1050
File: components/clp-package-utils/clp_package_utils/scripts/archive_manager.py:178-178
Timestamp: 2025-08-14T05:06:59.010Z
Learning: In the decompress.py script, the --dataset argument was only added to the json_extraction_parser but not to the ir_extraction_parser, causing AttributeError when handle_extract_stream_cmd tries to access parsed_args.dataset for IR extraction commands.

Applied to files:

  • components/webui/client/src/pages/IngestPage/Compress/index.tsx
📚 Learning: 2025-09-15T22:20:40.750Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.

Applied to files:

  • components/webui/client/src/pages/IngestPage/Compress/index.tsx
🧬 Code graph analysis (1)
components/webui/client/src/pages/IngestPage/Compress/index.tsx (1)
components/webui/common/src/config.ts (1)
  • CLP_DEFAULT_DATASET_NAME (51-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: check-generated
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)

Comment thread components/webui/client/src/pages/IngestPage/Compress/index.tsx
Comment thread components/webui/client/src/pages/IngestPage/Compress/index.tsx
@junhaoliao junhaoliao requested a review from davemarco January 13, 2026 18:41
@junhaoliao

Copy link
Copy Markdown
Member Author

the docs CI is failing which is unrelated to the changes in the current PR: https://github.com/y-scope/clp/actions/runs/20968400563/job/60264957848

@coderabbitai create an issue to track the CI failure

@coderabbitai

coderabbitai Bot commented Jan 13, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@junhaoliao I'll create an issue to track the docs CI failure. Let me first gather the details about the failure.


🏁 Script executed:

#!/bin/bash
# Get details about the workflow run
gh run view 20968400563 --repo y-scope/clp --json name,conclusion,displayTitle,headBranch,event,workflowName,url

# Get the job details and logs
gh run view 20968400563 --repo y-scope/clp --log --attempt 1 | grep -A 20 -B 5 "Error\|error\|fail\|FAIL" | head -100

Length of output: 12754


🏁 Script executed:

#!/bin/bash
# Create GitHub issue to track the docs CI failure
gh issue create \
  --repo y-scope/clp \
  --title "Docs CI workflow failing with operation canceled during dependency installation" \
  --body "## Description

The \`clp-docs\` CI workflow is failing with an operation canceled error during the dependency installation step on macOS runners.

## Workflow Run Details

- **Workflow Run**: https://github.com/y-scope/clp/actions/runs/20968400563
- **Job**: https://github.com/y-scope/clp/actions/runs/20968400563/job/60264957848
- **Workflow**: clp-docs
- **Runner**: macos-15
- **Error**: \`##[error]The operation was canceled.\`

## Context

This failure occurred during PR #1855 (https://github.com/y-scope/clp/pull/1855) but is unrelated to the changes in that PR, which only modify webui dataset handling.

The error occurred while installing coreutils via homebrew:
\`\`\`
🍺  /opt/homebrew/Cellar/coreutils/9.9: 453 files, 12.4MB
##[error]The operation was canceled.
\`\`\`

## Reported By

@junhaoliao in https://github.com/y-scope/clp/pull/1855#discussion_r1907469819" \
  --assignee junhaoliao

Length of output: 42


@junhaoliao I've created issue #1866 to track the docs CI failure: #1866

The issue documents the operation canceled error that occurred during the coreutils installation on the macos-15 runner, and includes links back to this PR and the failing workflow run.


🧠 Learnings used
Learnt from: haiqi96
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-08-04T18:38:33.130Z
Learning: User haiqi96 requested creating a GitHub issue to document a documentation discrepancy where Task version requirements in docs weren't updated after yscope-utils upgrade in PR #1158.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: .github/workflows/clp-core-build.yaml:294-332
Timestamp: 2025-08-25T06:29:59.610Z
Learning: In the CLP project, Bill-hbrhbr prefers a "fail fast" approach for CI workflows - allowing potential command availability issues (like getconf in musllinux) to surface through CI failures rather than preemptively adding fallback logic, as they will fix issues when they occur.

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1242
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:39-41
Timestamp: 2025-08-25T00:45:05.464Z
Learning: Task v3.44.1 has a regression that breaks shell command processing, particularly rsync commands with brace expansion (e.g., `file.{d.ts,js,wasm}`). This causes CI failures in clp-ffi-js project (issue #110), so CLP should avoid v3.44.1 and use v3.44.0 instead, which fixes the dynamic variable bug without the shell processing regression.

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

@davemarco davemarco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@junhaoliao junhaoliao merged commit b85e089 into y-scope:main Jan 13, 2026
21 checks passed
@junhaoliao junhaoliao deleted the empty-dataset branch January 14, 2026 22:24
davidlion pushed a commit to davidlion/clp that referenced this pull request Jan 17, 2026
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants