Skip to content

roachtest/mixedversion: dedicated versioned workload binary#155397

Closed
williamchoe3 wants to merge 3 commits intocockroachdb:masterfrom
williamchoe3:wchoe/154274-roachtest-mixedversion-sunc-workload-binary-followup-schemachange-test
Closed

roachtest/mixedversion: dedicated versioned workload binary#155397
williamchoe3 wants to merge 3 commits intocockroachdb:masterfrom
williamchoe3:wchoe/154274-roachtest-mixedversion-sunc-workload-binary-followup-schemachange-test

Conversation

@williamchoe3
Copy link
Copy Markdown
Contributor

@williamchoe3 williamchoe3 commented Oct 14, 2025

11/6/25: Currently, mixedversion relies on including non supported versioned during the upgrade plan. Although "skip-version" upgrades are allowed, that only includes 24.x+. Also the specific workload for this test schemachange doesn't support skip versions. Given all of the, the upgrade plan would be at most 3. The current upgrade plan is 2. That level of coverage increase in addition to the mixedversion and workload expectations that this would break make this change challenging.
https://cockroachlabs.slack.com/archives/D08V7URRX52/p1762458712973069?thread_ts=1762458595.101659&cid=D08V7URRX52

Acknowledging the lack of coverage here, but putting this aside as there is no great path forward.


Informs #154274

Enables versioned workload binaries for mixedversion tests

New Bucket vs Reuse Existing Bucket

Existing gs://cockroach-edge-artifacts-prod policy for prefix cockroach/

    { name = "cockroach-edge-artifacts-prod", cleanup = [{
      age    = 30
      prefix = "cockroach/"
    }], public = true, enable_versioning = false }

While we can create a new bucket, we can also continue to use the gs://cockroach-edge-artifacts-prod bucket, but change the prefix of latest key workload binary objects. This approach would let us be able to reuse the publish-artifacts edge argument without having to write new support for a new one or add a flag for a new bucket. (still needs a change, but just changing the prefix is relatively simple IMO)
Discussion: https://cockroachlabs.slack.com/archives/C03SG8QKYRJ/p1760976858868889
Also this way avoids adding new lifecycle policies to the bucket
If we want the new bucket, here's the terraform PR: https://github.com/cockroachlabs/crl-infrastructure/pull/4277

Dependencies

publish-artifacts: need unsupported workload binary versions

Change Workload binary keys from something like
gs://cockroach-edge-artifacts-prod/cockroach/workload.release-24.1
to a different key prefix
i.e. gs://cockroach-edge-artifacts-prod/workload/workload.release-24.1
publish-artifacts PR: #155789

  • Needs to be backported (to everything? 🥹)

#156820

#156822
#156855
#156863
#156883
#156885
#156887

New Upgrade Plan selector that only uses supported versions

Didn't realize we wouldn't be able to backport changes to non supported versions. Therefore, if a test wants to use a a dedicated versioned workload binary, then it's upgrade path can only contain supported versions.

The question is how would that list be maintained?

i.e. the naive approach would to just be to hard code a list in master. Then every time a release becomes unsupported, that change would need to made to master and backported to all currently supported versions. Also everytime a new release branch get's cut, that change would need to be made to master (wouldn't need to back port this one). That seems like a lot of overhead.

If this information was centralized, I think that would be more intuitive, but keeping centralized config seems to be an anti pattern with all the other config being version controlled

roachprod: StageApplication change to use new latest key prefix

roachprod PR: #155803

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

craig bot pushed a commit that referenced this pull request Oct 22, 2025
155772: sql/inspect: skip REFCURSOR stored columns in consistency check r=spilchen a=spilchen

REFCURSOR values cannot be compared for equality, which breaks the `INSPECT` command when an index stores a column of this type. During a consistency check, INSPECT attempts to compare stored column payloads between the primary and secondary indexes. This fails for REFCURSOR values because the column type cannot be used in an equality comparison.

This change skips REFCURSOR stored columns during the primary/secondary comparison.

Informs: #155676
Epic: CRDB-55075

Release note (bug fix): INSPECT can now be run on tables with indexes that store REFCURSOR-typed columns.

155789: publish-artifacts: change GCS latest key prefix for Workload binary r=rail a=williamchoe3

Informs #154274 

In gcs `gs://cockroach-edge-artifacts-prod`, change `latest` Workload binary key prefix from `cockroach` to `workload`
i.e.
```
gs://cockroach-edge-artifacts-prod/cockroach/workload.release-24.1
to
gs://cockroach-edge-artifacts-prod/workload/workload.release-24.1
```
This will allow `latest` workload binary to not be deleted after 30 days so tests can download and use workload binaries from unsupported versions (that no longer have nightlies running against them)

Notes:
* Will need to backfill currently unsupported versions 
* ".../workload/workload...." while descriptive also sounds redundant, open to suggestions if folks don't like that prefix
* The non latest copy of the obj i.e. `gs://cockroach-edge-artifacts-prod/cockroach/workload.ee80eab77ca6434f4be7c33c0f6d29dd0b9d07b1` is still prefixed with `cockroach` so these objects will get deleted after 30 days with the current lifecycle policy

Edited `PutNonRelease`'s docstring to remove "redirect" terminology, looks to be from a previous s3 implementation which is not a feature of GCS.
* Note: I don't feel that strongly about the docstring, it just confused me a bit when reading the code and just wanted it to match what the function is actually doing 

Also noticed there's no unit test coverage for non master branches, added 1 release branch test with simple coverage

Also noticed there's a slight difference between mockGCSProvider.PutObject & release.PutObject
```
# obj latest key names on gs://cockroach-edge-artifacts-prod
gs://cockroach-edge-artifacts-prod/cockroach/workload.release-23.2

# obj latest key names in unit tests
gs://edge-binaries-bucket/workload/workload.release-25.4/no-cache
```
Actual `release.PutObject` does not modify the key name based on the `PutObjectInput.CacheControl` field, but in `mockGCSProvider.PutObject` the `"/no-cache"` gets appended
I changed the behavior in the mock to match the actual function, and modified the unit tests
FWIW `PutObjectInput.CacheControl` is only set to `False` currently
* It looks like this might be a hold over from the S3 implementation? I didn't read that diff but let me know if I should revert this change

This PR is a dependency for these roachtest changes: #155397
Also discussed other options like using a new bucket or using the release bucket here: https://cockroachlabs.slack.com/archives/C03SG8QKYRJ/p1760977289072049?thread_ts=1760976858.868889&cid=C03SG8QKYRJ 
Decided on this approach for simplicity 

For verification, I'm relying on the unit tests. Just assumed this wouldn't work on my local, but if there's a simple way to verify this E2E let me know and i'll do it 

Will squash and write a proper commit msg after review, leaving commit separate incase i need to revert anything

155797: *: replace some usages of global rand r=yuzefovich a=yuzefovich

**sem/builtins: use RNG from eval context for 'st_generatepoints'**

***: replace some usages of global rand**

When we use `rand.*` methods like `math/rand.Intn`, under the hood we hit
the global rand source, which is protected by a mutex, so we could be subject
to contention on that lock. This commit updates some of such usages in
favor of object-specific rand source to avoid that. Note that allocating
a separate rand source is not free (the allocation itself is about 5KiB
in size), so we modify spots where the lifecycle matches that of the
server or when the affected code is heavy already / not on the hot path
(also when it's clear that the access is from within a single
goroutine).

Epic: None
Release note: None

155865: roachtest: remove metamorphicBufferedSender r=wenyihu6 a=wenyihu6

This setting is now enabled by default, so there is no need to enable this
metamorphically.

Epic: none 
Release note: none 

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: wenyihu6 <wenyi@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants