Recompute all the cache entries with error code "UnexpectedError" #1117

Merged
severo merged 4 commits into main from retry-errors-as-env-var
May 2, 2023

@severo severo commented May 2, 2023

Collaborator

To be launched after #1116 is merged, to recompute the erroneous UnexpectedError entries.

github-actions Bot commented May 2, 2023

ArgoCD Diff for commit ad28895

Updated at 5/2/2023, 3:00:24 PM CEST

App: datasets-server-dev
YAML generation: Success 🟢
App sync status: Out of Sync ⚠️

Details
===== apps/Deployment datasets-server/dev-datasets-server-admin ======
--- /tmp/argocd-diff2587627133/dev-datasets-server-admin-live.yaml	2023-05-02 13:00:23.539369092 +0000
+++ /tmp/argocd-diff2587627133/dev-datasets-server-admin	2023-05-02 13:00:23.535369043 +0000
@@ -437,7 +437,7 @@
           value: "1"
         - name: ADMIN_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-admin:sha-2829133
+        image: huggingface/datasets-server-services-admin:sha-f2ed26e
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 3

===== apps/Deployment datasets-server/dev-datasets-server-api ======
--- /tmp/argocd-diff3604918176/dev-datasets-server-api-live.yaml	2023-05-02 13:00:23.555369290 +0000
+++ /tmp/argocd-diff3604918176/dev-datasets-server-api	2023-05-02 13:00:23.555369290 +0000
@@ -390,7 +390,7 @@
           value: "1"
         - name: API_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-api:sha-2829133
+        image: huggingface/datasets-server-services-api:sha-f2ed26e
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 3

===== apps/Deployment datasets-server/dev-datasets-server-worker-all ======
--- /tmp/argocd-diff1458673115/dev-datasets-server-worker-all-live.yaml	2023-05-02 13:00:23.599369835 +0000
+++ /tmp/argocd-diff1458673115/dev-datasets-server-worker-all	2023-05-02 13:00:23.595369785 +0000
@@ -506,7 +506,7 @@
           value: "1"
         - name: WORKER_JOB_TYPES_BLOCKED
         - name: WORKER_JOB_TYPES_ONLY
-        image: huggingface/datasets-server-services-worker:sha-2829133
+        image: huggingface/datasets-server-services-worker:sha-f2ed26e
         imagePullPolicy: IfNotPresent
         name: dev-datasets-server-worker
         resources:

===== apps/Deployment datasets-server/dev-datasets-server-worker-light ======
--- /tmp/argocd-diff3166863738/dev-datasets-server-worker-light-live.yaml	2023-05-02 13:00:23.627370181 +0000
+++ /tmp/argocd-diff3166863738/dev-datasets-server-worker-light	2023-05-02 13:00:23.623370132 +0000
@@ -508,7 +508,7 @@
         - name: WORKER_JOB_TYPES_BLOCKED
           value: /config-names,/split-names-from-streaming,config-parquet-and-info,split-first-rows-from-parquet,split-first-rows-from-streaming,split-opt-in-out-urls-scan
         - name: WORKER_JOB_TYPES_ONLY
-        image: huggingface/datasets-server-services-worker:sha-2829133
+        image: huggingface/datasets-server-services-worker:sha-f2ed26e
         imagePullPolicy: IfNotPresent
         name: dev-datasets-server-worker
         resources:

===== batch/CronJob datasets-server/dev-datasets-server-job-metrics-collector ======
--- /tmp/argocd-diff4092199544/dev-datasets-server-job-metrics-collector-live.yaml	2023-05-02 13:00:23.647370429 +0000
+++ /tmp/argocd-diff4092199544/dev-datasets-server-job-metrics-collector	2023-05-02 13:00:23.647370429 +0000
@@ -171,7 +171,7 @@
               value: mongodb://dev-datasets-server-mongodb
             - name: CACHE_MAINTENANCE_ACTION
               value: collect-metrics
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-2829133
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-f2ed26e
             imagePullPolicy: IfNotPresent
             name: dev-datasets-server-metrics-collector
             resources:

App: datasets-server-prod
YAML generation: Success 🟢
App sync status: Out of Sync ⚠️

Details
===== apps/Deployment datasets-server/prod-datasets-server-admin ======
--- /tmp/argocd-diff2597776784/prod-datasets-server-admin-live.yaml	2023-05-02 13:00:24.467380574 +0000
+++ /tmp/argocd-diff2597776784/prod-datasets-server-admin	2023-05-02 13:00:24.463380524 +0000
@@ -459,7 +459,7 @@
           value: "9"
         - name: ADMIN_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-admin:sha-d5ab5fa
+        image: huggingface/datasets-server-services-admin:sha-f2ed26e
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 3

===== apps/Deployment datasets-server/prod-datasets-server-api ======
--- /tmp/argocd-diff1536126752/prod-datasets-server-api-live.yaml	2023-05-02 13:00:24.483380772 +0000
+++ /tmp/argocd-diff1536126752/prod-datasets-server-api	2023-05-02 13:00:24.483380772 +0000
@@ -413,7 +413,7 @@
           value: "9"
         - name: API_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-api:sha-d5ab5fa
+        image: huggingface/datasets-server-services-api:sha-f2ed26e
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 3

===== apps/Deployment datasets-server/prod-datasets-server-worker-all ======
--- /tmp/argocd-diff4052745654/prod-datasets-server-worker-all-live.yaml	2023-05-02 13:00:24.535381415 +0000
+++ /tmp/argocd-diff4052745654/prod-datasets-server-worker-all	2023-05-02 13:00:24.531381365 +0000
@@ -523,7 +523,7 @@
           value: "5"
         - name: WORKER_JOB_TYPES_BLOCKED
         - name: WORKER_JOB_TYPES_ONLY
-        image: huggingface/datasets-server-services-worker:sha-d5ab5fa
+        image: huggingface/datasets-server-services-worker:sha-f2ed26e
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== apps/Deployment datasets-server/prod-datasets-server-worker-light ======
--- /tmp/argocd-diff2942554837/prod-datasets-server-worker-light-live.yaml	2023-05-02 13:00:24.567381811 +0000
+++ /tmp/argocd-diff2942554837/prod-datasets-server-worker-light	2023-05-02 13:00:24.563381761 +0000
@@ -525,7 +525,7 @@
         - name: WORKER_JOB_TYPES_BLOCKED
           value: /config-names,/split-names-from-streaming,config-parquet-and-info,split-first-rows-from-parquet,split-first-rows-from-streaming,split-opt-in-out-urls-scan
         - name: WORKER_JOB_TYPES_ONLY
-        image: huggingface/datasets-server-services-worker:sha-d5ab5fa
+        image: huggingface/datasets-server-services-worker:sha-f2ed26e
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-backfill ======
--- /tmp/argocd-diff3667376274/prod-datasets-server-job-backfill-live.yaml	2023-05-02 13:00:24.575381910 +0000
+++ /tmp/argocd-diff3667376274/prod-datasets-server-job-backfill	2023-05-02 13:00:24.575381910 +0000
@@ -189,9 +189,10 @@
                   optional: false
             - name: CACHE_MAINTENANCE_ACTION
               value: backfill
+            - name: CACHE_MAINTENANCE_BACKFILL_ERROR_CODES_TO_RETRY
             - name: LOG_LEVEL
               value: debug
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-d5ab5fa
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-f2ed26e
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-backfill
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-metrics-collector ======
--- /tmp/argocd-diff3941492670/prod-datasets-server-job-metrics-collector-live.yaml	2023-05-02 13:00:24.587382058 +0000
+++ /tmp/argocd-diff3941492670/prod-datasets-server-job-metrics-collector	2023-05-02 13:00:24.587382058 +0000
@@ -190,7 +190,7 @@
                   optional: false
             - name: CACHE_MAINTENANCE_ACTION
               value: collect-metrics
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-d5ab5fa
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-f2ed26e
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-metrics-collector
             resources:

Legend Status
The app is synced in ArgoCD, and diffs you see are solely from this PR.
⚠️ The app is out-of-sync in ArgoCD, and the diffs you see include those changes plus any from this PR.
🛑 There was an error generating the ArgoCD diffs due to changes in this PR.
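
The prod backfill CronJob diff above adds the `CACHE_MAINTENANCE_BACKFILL_ERROR_CODES_TO_RETRY` environment variable, rendered without a value. As a rough sketch of how such a variable might be turned into a list of error codes, assuming a comma-separated format (the format and the helper names `parse_error_codes` and `error_codes_from_env` are assumptions for illustration, not the project's actual code):

```python
import os
from typing import List, Mapping, Optional

# Variable name taken from the ArgoCD diff; everything else here is hypothetical.
ENV_VAR = "CACHE_MAINTENANCE_BACKFILL_ERROR_CODES_TO_RETRY"


def parse_error_codes(raw: Optional[str]) -> List[str]:
    """Split a comma-separated string into stripped, non-empty error codes."""
    if not raw:
        # Unset or empty variable: nothing to retry.
        return []
    return [code.strip() for code in raw.split(",") if code.strip()]


def error_codes_from_env(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Read the (assumed comma-separated) list of error codes from the environment."""
    return parse_error_codes(environ.get(ENV_VAR))


# Example: a deployment that sets the variable to a single code.
codes = error_codes_from_env({ENV_VAR: "UnexpectedError"})
```

With the variable unset or empty, as in the rendered manifest, this sketch yields an empty list, so no error code would be retried.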

HuggingFaceDocBuilder commented May 2, 2023

Collaborator

The documentation is not available anymore as the PR was closed or merged.

dataset=dataset,
processing_graph=processing_graph,
revision=dataset_info.sha,
error_codes_to_retry=error_codes_to_retry,

Member

I feel like this could be an argument of .backfill() instead?

@severo severo May 2, 2023

Collaborator Author

Currently, .backfill() only applies the plan computed when the DatasetState was created: the list of tasks is built at creation time, and building it requires the error_codes_to_retry parameter.

Moving it to .backfill() would change the logic.
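
To make the distinction concrete, here is a minimal sketch of the pattern described above: the plan is fixed when the state object is constructed, and `backfill()` only applies it. `DatasetStateSketch`, its `cache` field, and the step names are hypothetical stand-ins, not the real `DatasetState` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetStateSketch:
    """Hypothetical stand-in for DatasetState; not the real class."""

    dataset: str
    # Step name -> cached error code, or None when the step succeeded.
    cache: Dict[str, Optional[str]]
    error_codes_to_retry: List[str] = field(default_factory=list)
    plan: List[str] = field(init=False)

    def __post_init__(self) -> None:
        # The plan is computed once, at creation time, and it needs
        # error_codes_to_retry to decide which cached errors to recompute.
        self.plan = [
            step
            for step, error_code in self.cache.items()
            if error_code in self.error_codes_to_retry
        ]

    def backfill(self) -> List[str]:
        # backfill() only applies the precomputed plan; it takes no parameters.
        return [f"recompute {step} for {self.dataset}" for step in self.plan]


state = DatasetStateSketch(
    dataset="user/dataset",
    cache={"step-a": None, "step-b": "UnexpectedError", "step-c": "OtherError"},
    error_codes_to_retry=["UnexpectedError"],
)
tasks = state.backfill()
```

In this sketch, passing the error codes to `backfill()` instead would mean recomputing the plan at apply time, which is the change in logic mentioned above.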

Member

OK, I see! LGTM then.

severo commented May 2, 2023

Collaborator Author

Thanks. Waiting for #1116 before merging this one.

@severo severo merged commit 810a549 into main May 2, 2023
@severo severo deleted the retry-errors-as-env-var branch May 2, 2023 13:38