Compression scheduler crashes on job submission in CLP Text package due to missing datasets table #1214

## Bug Report

### Description
The compression scheduler crashes on every job submission in the CLP Text package because it attempts to query a non-existent `clp_datasets` table.

### Environment
- **CLP Version**: Current main branch
- **Storage Engine**: CLP Text (not CLP-S)
- **Database**: MariaDB

### Steps to Reproduce
1. `cd clp-package/sbin`
2. `./start-clp.sh`
3. `./compress.sh ~/samples/hive-24h`

### Expected Behavior
The job should complete successfully and display compression statistics:
```
2025-08-18T01:28:52.264 INFO [compress] Compression job 1 submitted.
2025-08-18T01:28:57.353 INFO [compress] Compressed 79.16MB into 1.74MB (45.42x). Speed: 60.15MB/s.
2025-08-18T01:28:58.858 INFO [compress] Compressed 1.08GB into 28.37MB (38.91x). Speed: 391.22MB/s.
2025-08-18T01:28:59.363 INFO [compress] Compressed 1.58GB into 41.66MB (38.82x). Speed: 486.25MB/s.
2025-08-18T01:29:00.371 INFO [compress] Compression finished.
2025-08-18T01:29:00.371 INFO [compress] Compressed 1.99GB into 45.22MB (45.03x). Speed: 512.79MB/s.
```

### Actual Behavior
The compression scheduler crashes with the following error:
```
2025-08-17 23:31:57,128 compression_scheduler [INFO] Starting compression scheduler
2025-08-17 23:31:57,130 compression_scheduler [ERROR] Error in scheduling.
Traceback (most recent call last):
  File "/opt/clp/lib/python3/site-packages/job_orchestration/scheduler/compress/compression_scheduler.py", line 430, in main
    search_and_schedule_new_tasks(
  File "/opt/clp/lib/python3/site-packages/job_orchestration/scheduler/compress/compression_scheduler.py", line 171, in search_and_schedule_new_tasks
    existing_datasets = fetch_existing_datasets(
  File "/opt/clp/lib/python3/site-packages/clp_py_utils/clp_metadata_db_utils.py", line 194, in fetch_existing_datasets
    db_cursor.execute(f"SELECT name FROM `{get_datasets_table_name(table_prefix)}`")
mariadb.ProgrammingError: Table 'clp-db.clp_datasets' doesn't exist
```

### Root Cause
The scheduler unconditionally calls `fetch_existing_datasets`, which queries the `clp_datasets` table. However, this table exists only for the CLP-S storage engine, not for the CLP Text package.

### Proposed Solution
Add a storage engine check before fetching existing datasets, as suggested by @haiqi96:
- Only call `fetch_existing_datasets` when the storage engine is CLP-S
- Initialize `existing_datasets` as an empty set for other storage engines
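
A minimal, self-contained sketch of the guard described above. It is not the actual patch: the `StorageEngine` enum, its values, `load_existing_datasets`, and `fake_fetch` are hypothetical stand-ins so the snippet runs on its own, and the real `fetch_existing_datasets` takes arguments that are omitted here.

```python
from enum import Enum
from typing import Callable, Set


class StorageEngine(str, Enum):
    # Hypothetical stand-in for the package's storage engine setting.
    CLP = "clp"
    CLP_S = "clp-s"


def load_existing_datasets(
    storage_engine: StorageEngine,
    fetch_existing_datasets: Callable[[], Set[str]],
) -> Set[str]:
    """Query the datasets table only when the storage engine is CLP-S."""
    if storage_engine == StorageEngine.CLP_S:
        return fetch_existing_datasets()
    # Other storage engines have no datasets table, so skip the query.
    return set()


def fake_fetch() -> Set[str]:
    # Stand-in for the real query; mimics the failure on a missing table.
    raise RuntimeError("Table 'clp-db.clp_datasets' doesn't exist")


# With the CLP Text engine the fetch is never invoked, so nothing is raised.
print(load_existing_datasets(StorageEngine.CLP, fake_fetch))  # set()
```

Passing the fetch function as a parameter is only a convenience for this sketch; in the scheduler the same `if` could wrap the existing call directly.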

### Additional Context
- Affected file: `components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py`
- Related PR: https://github.com/y-scope/clp/pull/1144
- Comment: https://github.com/y-scope/clp/pull/1144#discussion_r2281128455

### Reporter
@junhaoliao
