
Datasets are prematurely added to the metadata database #2091

@quinntaylormitchell

Description


Bug

If a compression job fails because the input paths given to the compression command are invalid, the dataset is still added to the metadata database. This causes inaccurate results in tools that rely on the metadata database, such as the dataset-manager utilities.

CLP version

v0.10.0

Environment

Ubuntu jammy

Reproduction steps

  1. Start clp-json and compress twice: once with a path that does exist, and once with a path that doesn’t exist.
  2. Run dataset-manager.sh list; you’ll see both of the datasets that you specified, even though the compression for the second one failed.
  3. Run ls in clp-package/var/data/archives; you’ll see that the only directory present is the one for the successful dataset.
  4. Run dataset-manager.sh del --all and you’ll see both of the datasets being deleted from the archives and the metadata database.

Sample output:

quinnmitchell@baker21:/home/quinnmitchell/clp/build/clp-package$ cd /home/quinnmitchell/clp/build/clp-package \
&& ./sbin/compress.sh \
        --timestamp-key 'timestamp' \
        --dataset 'invalid' \
        /home/quinnmitchell/invalid
Container clp-package-clp-runtime-run-5e60ef3d5841 Creating 
Container clp-package-clp-runtime-run-5e60ef3d5841 Created 
2026-03-10T16:55:16.946 INFO [compress] Compression job 8 submitted.
2026-03-10T16:55:17.449 ERROR [compress] Compression failed. At least one of your input paths could not be processed. See the error log at 'user/job_8_failed_paths.txt' inside your configured logs directory (`logs_directory`) for more details.
quinnmitchell@baker21:/home/quinnmitchell/clp/build/clp-package$ cd /home/quinnmitchell/clp/build/clp-package \
&& ./sbin/compress.sh \
        --timestamp-key 'timestamp' \
        --dataset 'real' \
        /home/quinnmitchell/clp/integration-tests/tests/data/json_multifile/logs
Container clp-package-clp-runtime-run-2f7351001714 Creating 
Container clp-package-clp-runtime-run-2f7351001714 Created 
2026-03-10T16:55:42.001 INFO [compress] Compression job 9 submitted.
2026-03-10T16:55:43.007 INFO [compress] Compressed 8.90KB into 3.02KB (2.95x). Speed: 16.77KB/s.
2026-03-10T16:55:43.509 INFO [compress] Compression finished.
2026-03-10T16:55:43.509 INFO [compress] Compressed 8.90KB into 3.02KB (2.95x). Speed: 15.32KB/s.
quinnmitchell@baker21:/home/quinnmitchell/clp/build/clp-package$ cd /home/quinnmitchell/clp/build/clp-package \
&& ./sbin/admin-tools/dataset-manager.sh list
Container clp-package-clp-runtime-run-a0f884875a5e Creating 
Container clp-package-clp-runtime-run-a0f884875a5e Created 
2026-03-10T16:55:52.274 INFO [dataset_manager] Found 2 datasets.
2026-03-10T16:55:52.275 INFO [dataset_manager] invalid
2026-03-10T16:55:52.275 INFO [dataset_manager] real
quinnmitchell@baker21:/home/quinnmitchell/clp/build/clp-package$ cd /home/quinnmitchell/clp/build/clp-package/var/data/archives     
quinnmitchell@baker21:/home/quinnmitchell/clp/build/clp-package/var/data/archives$ ls
real
quinnmitchell@baker21:/home/quinnmitchell/clp/build/clp-package/var/data/archives$ cd /home/quinnmitchell/clp/build/clp-package
quinnmitchell@baker21:/home/quinnmitchell/clp/build/clp-package$ ./sbin/admin-tools/dataset-manager.sh del --all
Container clp-package-clp-runtime-run-fdf79ab1b8d4 Creating 
Container clp-package-clp-runtime-run-fdf79ab1b8d4 Created 
2026-03-10T17:05:09.261 INFO [dataset_manager] Deleted archives of dataset `invalid`.
2026-03-10T17:05:09.512 INFO [dataset_manager] Deleted dataset `invalid` from the metadata database.
2026-03-10T17:05:09.513 INFO [dataset_manager] Deleted archives of dataset `real`.
2026-03-10T17:05:09.787 INFO [dataset_manager] Deleted dataset `real` from the metadata database.
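As a rough way to spot the symptom from steps 2 and 3, the output of dataset-manager.sh list can be compared against the archives directory. The function below is a hypothetical helper (find_orphaned_datasets is not part of CLP), a minimal sketch that assumes the list output format shown in the sample above and that clp-json stores each dataset's archives in a directory named after the dataset under var/data/archives. It prints every dataset the metadata database reports that has no archives directory.

```shell
# Hypothetical helper, not part of CLP.
# Reads `dataset-manager.sh list` output on stdin and prints each dataset that
# has no matching directory under the given archives directory.
# Assumption: the output has a "[dataset_manager] Found N datasets." line
# followed by one "[dataset_manager] <name>" line per dataset, as in the sample.
find_orphaned_datasets() {
    local archives_dir="$1"
    awk '
        # Start collecting names after the "Found N datasets." line.
        /\[dataset_manager\] Found [0-9]+ datasets?\./ { found = 1; next }
        found && /\[dataset_manager\] / {
            sub(/\r$/, "")                     # drop a trailing CR, if any
            sub(/.*\[dataset_manager\] /, "")  # keep only the dataset name
            print
        }
    ' |
    while IFS= read -r dataset; do
        # A dataset with no archives directory was likely never compressed.
        if [[ -n "$dataset" && ! -d "${archives_dir}/${dataset}" ]]; then
            printf '%s\n' "$dataset"
        fi
    done
}
```

For example, after defining the function in the current shell, run ./sbin/admin-tools/dataset-manager.sh list 2>&1 | find_orphaned_datasets var/data/archives from the clp-package directory (2>&1 is included in case the log lines go to stderr). With the datasets from this report, it should print only invalid.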

Labels

bug (Something isn't working)
