fix(job-orchestration): Handle garbage collector recovery file loss when k8s emptyDir volume is recycled #2260

## Problem

When the garbage-collector pod crashes and Kubernetes recycles its `emptyDir` volume (mounted at `/var/tmp`), the recovery file written by `archive_garbage_collector.py` is lost. Without this recovery file, the garbage collector loses track of archives that were already scheduled for deletion, potentially leaving orphaned archives in storage that are never cleaned up.

This issue was identified during the review of #1834, which moved the recovery file from `clp_config.logs_directory` to `clp_config.tmp_directory`.

## Steps to Reproduce

1. Deploy CLP on Kubernetes.
2. Start a garbage collection job.
3. Crash the garbage-collector pod mid-run.
4. Allow Kubernetes to recycle the `emptyDir` volume.
5. Observe that the recovery file is gone; previously scheduled archives may not be cleaned up.

## Expected Behaviour

The garbage collector should be resilient to pod restarts and volume recycling: archives that were scheduled for deletion before the crash should still be identified and cleaned up, even if the recovery file is gone.

## Possible Solutions

- Persist the recovery file to a durable volume (e.g., a PersistentVolumeClaim) rather than an `emptyDir` volume.
- Redesign the garbage collection logic to be idempotent and not rely on a recovery file that may be lost across restarts.
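
As a rough illustration of the second option, the sketch below derives the set of archives to delete from state that survives a restart, instead of from a local recovery file. It is not how `archive_garbage_collector.py` currently works. The function names, the `live_archive_ids` set (standing in for whatever the metadata database reports), and the one-directory-per-archive layout are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def find_orphaned_archives(archives_dir: Path, live_archive_ids: set[str]) -> list[Path]:
    """Return archive directories on disk that the metadata store no longer references.

    Hypothetical helper: assumes each archive is a directory named after its ID.
    """
    return sorted(
        p for p in archives_dir.iterdir() if p.is_dir() and p.name not in live_archive_ids
    )


def sweep_orphaned_archives(archives_dir: Path, live_archive_ids: set[str]) -> list[str]:
    """Delete orphaned archives and return their IDs.

    No local state is kept, so re-running after a crash simply picks up
    whatever is still orphaned.
    """
    deleted = []
    for path in find_orphaned_archives(archives_dir, live_archive_ids):
        # ignore_errors tolerates a directory that was partially removed before a crash.
        shutil.rmtree(path, ignore_errors=True)
        deleted.append(path.name)
    return deleted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for archive_id in ("a", "b", "c"):
            (root / archive_id).mkdir()
        # Only "a" is still referenced by the (assumed) metadata store.
        print(sweep_orphaned_archives(root, {"a"}))
        # A second run finds nothing left to delete.
        print(sweep_orphaned_archives(root, {"a"}))
```

A sweep like this would need a guard against archives that are still being written and are not yet registered in the metadata store, for example a minimum age threshold; that guard is omitted here.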

## References

- PR: https://github.com/y-scope/clp/pull/1834
- Comment: https://github.com/y-scope/clp/pull/1834#issuecomment-2575063900

Raised by @junhaoliao.
