[cocoon] When the merge queue is torn down; unlock content hash #172694

@jtmcdole

Description

The tree outage yesterday was due to a build in the merge queue getting shuffled around. What happens:

  1. When an engine change enters the queue, a doc is created for its unique hash (if not a rollback).
  2. While it waits, the queue gets "stuck" for other reasons and an element ahead of the engine change is removed.
  3. The queue is torn down and all CI jobs are canceled. This leads to the doc in step 1 eventually being marked as a failure.
  4. When the job is rescheduled, it does not recreate the doc and instead marks the job as "waiting".
  5. Special case: since we're still building all git-hash artifacts, step 4 doesn't wait and instead builds without the "content_hash" flag that signals to LUCI recipes to upload to two locations.
  6. Once the commit exits the queue, all tests will fail because the hash wasn't uploaded.
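
To make the sequence above easier to follow, here is a minimal in-memory sketch of it. This is not Cocoon's code: `Scheduler`, `schedule`, `tear_down`, the status strings, and the `git_hash` property name are all hypothetical, and the rollback case from step 1 is left out. Only the `content_hash` flag comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    # content hash -> status of its doc ("building" or "failed"); hypothetical schema
    docs: dict = field(default_factory=dict)

    def schedule(self, commit: str, content_hash: str) -> dict:
        """Return the hypothetical build properties for an engine build."""
        if content_hash not in self.docs:
            # Step 1: entering the queue creates the doc for this hash.
            self.docs[content_hash] = "building"
            return {"git_hash": commit, "content_hash": content_hash}
        # Step 4: the doc is not recreated, so the job counts as "waiting".
        # Step 5: it builds anyway, but without the content_hash flag.
        return {"git_hash": commit}

    def tear_down(self) -> None:
        # Step 3: canceled jobs eventually mark the doc as failed; it is never removed.
        for content_hash in self.docs:
            self.docs[content_hash] = "failed"


scheduler = Scheduler()
first = scheduler.schedule("commit-a", "hash-1")   # carries content_hash
scheduler.tear_down()
second = scheduler.schedule("commit-b", "hash-1")  # content_hash is missing
```

In this model the stale doc is what keeps `second` from carrying the flag, which matches the missing upload in step 6.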

Step 3 needs to happen the moment GitHub tells us the queue is being destroyed.
Step 4 should have the first engine artifact take ownership and add itself to the.
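
A rough sketch of how the two proposed changes could fit together, assuming the teardown signal is something like the `destroyed` action of GitHub's `merge_group` webhook. The `ContentHashLocks` class and its method names are hypothetical and only illustrate the idea of unlocking on teardown and letting the next engine build take ownership; they are not Cocoon's API.

```python
class ContentHashLocks:
    """Hypothetical tracker of which queued commit owns each content hash."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}  # content hash -> owning commit

    def acquire(self, commit: str, content_hash: str) -> bool:
        """Return True if `commit` owns the hash and should build with content_hash."""
        # setdefault records the first caller as owner and leaves later callers waiting.
        return self._owners.setdefault(content_hash, commit) == commit

    def on_queue_destroyed(self, commits: set[str]) -> None:
        """Unlock every hash owned by a commit from the destroyed queue."""
        for content_hash, owner in list(self._owners.items()):
            if owner in commits:
                del self._owners[content_hash]


locks = ContentHashLocks()
locks.acquire("commit-a", "hash-1")       # commit-a owns hash-1
locks.on_queue_destroyed({"commit-a"})    # teardown unlocks it immediately
locks.acquire("commit-b", "hash-1")       # the rescheduled build takes ownership
```

Releasing the lock at teardown, instead of waiting for the canceled jobs to report a failure, means the rescheduled build never sees a stale doc.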

Metadata

Assignees

No one assigned

Labels

P1, team-infra, triaged-infra
