Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

fix/msp: make deadlineSeconds job-level configuration, apply in timeout#63017

Merged
bobheadxi merged 1 commit into
mainfrom
msp-job-deadlineseconds
May 31, 2024
Merged

fix/msp: make deadlineSeconds job-level configuration, apply in timeout#63017
bobheadxi merged 1 commit into
mainfrom
msp-job-deadlineseconds

Conversation

@bobheadxi

@bobheadxi bobheadxi commented May 31, 2024

Copy link
Copy Markdown
Member

In a rushed POC of MSP jobs, I did some pretty bad copy-pasting (evidenced by all the service-specific docstrings I have removed in this PR) and made a bad configuration decision here, resulting in a few issues:

  1. schedule.deadline is not actually applied to Cloud Run jobs, causing jobs to time out earlier than desired
  2. schedule.deadline is not the right place to configure a deadline, because all jobs need a configurable deadline, not just those with schedules. This change moves schedule.deadline to deadlineSeconds.

Closes CORE-145

Test plan

$ sg msp generate gatekeeper prod
$ git diff
diff --git a/services/gatekeeper/service.yaml b/services/gatekeeper/service.yaml
index fd6a3812..ce4b02e3 100644
--- a/services/gatekeeper/service.yaml
+++ b/services/gatekeeper/service.yaml
@@ -48,4 +48,4 @@ environments:
           - "primary"
     schedule:
       cron: 0 * * * *
-      deadline: 1800 # 30 minutes
+    deadlineSeconds: 1800 # 30 minutes
diff --git a/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json b/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json
index 3c2c295e..f83b32b9 100644
--- a/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json
+++ b/services/gatekeeper/terraform/prod/stacks/cloudrun/cdk.tf.json
@@ -281,7 +281,7 @@
                   },
                   {
                     "name": "JOB_EXECUTION_DEADLINE",
-                    "value": "600s"
+                    "value": "1800s"
                   }
                 ],
                 "image": "us.gcr.io/sourcegraph-dev/abuse-ban-bot:${var.resolved_image_tag}",
@@ -302,7 +302,7 @@
               }
             ],
             "service_account": "${data.terraform_remote_state.cross-stack-reference-input-iam.outputs.cross-stack-output-google_service_accountiam-workload-accountemail}",
-            "timeout": "300s",
+            "timeout": "1800s",
             "volumes": [
             ],
             "vpc_access": {
@@ -341,7 +341,7 @@
             "uniqueId": "job_scheduler"
           }
         },
-        "attempt_deadline": "600s",
+        "attempt_deadline": "1800s",
         "depends_on": [
           "google_cloud_run_v2_job_iam_member.cloudrun_scheduler_job_invoker"
         ],

Changelog

  • MSP jobs: schedule.deadline is deprecated, use the top-level deadlineSeconds instead. Configured deadlines are now correctly applied as the Cloud Run job execution timeout as well.

@bobheadxi bobheadxi requested review from a team and chrsmith May 31, 2024 21:06
@cla-bot cla-bot Bot added the cla-signed label May 31, 2024
@bobheadxi bobheadxi marked this pull request as ready for review May 31, 2024 21:09
@bobheadxi bobheadxi enabled auto-merge (squash) May 31, 2024 21:14
@bobheadxi bobheadxi merged commit 012db75 into main May 31, 2024
@bobheadxi bobheadxi deleted the msp-job-deadlineseconds branch May 31, 2024 21:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants