roachprod: implement GetVMSpecs for IBM#155368
Conversation
|
e.g. willchoe-1760458579-02-n1cpu4-0001.json |
93622ea to
01cc611
Compare
|
|
||
| // GetVMSpecs implements the vm.GetVMSpecs interface method which returns a | ||
| // map from VM.Name to a map of VM attributes | ||
| func (p *Provider) GetVMSpecs( |
There was a problem hiding this comment.
FWIW, I think relying on a global OperationTimeout is kind of a bad pattern anyway. Instead it should just take in a context which is passed in by FetchVMSpecs; it already creates one with a 5 minute timeout that isn't respected by this method.
There was a problem hiding this comment.
Agreed, though it's not necessarily incompatible as if you context.WithTimeout() on a context that already has a timeout, it ends up honoring the lowest one.
FYI, I've started added some context passing in there for the List() paths.
There was a problem hiding this comment.
Ah didn't notice the outer context in the roachtest caller. With OperationTimeout was just reusing the pattern I saw in Azure's provider impl https://github.com/cockroachdb/cockroach/blob/0345413bde1ca99f79a806be00871283169b8bf6/pkg/roachprod/vm/azure/azure.go
So the better approach is to reimplement all the provider methods with a *WithContext version like ListWithContext https://github.com/cockroachdb/cockroach/pull/154893/files#diff-ca2fc7d620174ca0b2964983db21a35a21515be7e850d5ce6f944f9421cfd1fbR808
We can do that as a part of the roachprod centralized changes? In the mean time i can adjust this OperationTimeout to just match what roachprod currently sets?
There was a problem hiding this comment.
Agreed that we can (and will have to) fully implement context later on.
Though, since you're in there, you should probably already implement GetVMSpecsWithContext(ctx context.Context, l *logger.Logger, vms vm.List) and satisfy the current interface with:
func (p *Provider) GetVMSpecs(l *logger.Logger, vms vm.List) (map[string]map[string]interface{}, error) {
return `GetVmSpecsWithContext(context.Background(), l, vms)
}
golgeek
left a comment
There was a problem hiding this comment.
Few nits, otherwise looks good!
|
|
||
| // Extract the spec of all VMs and create a map from VM name to spec. | ||
| vmSpecs := make(map[string]map[string]interface{}) | ||
| for _, vmInstance := range vms { |
There was a problem hiding this comment.
I wonder if it could be worth it to move this to a ctxgroup.WithContext (pkg/util/ctxgroup) to parallelize the instance calls, potentially with a hard limit of 4 or 5 on the error group to avoid spiking too many go routines.
Did you try it with a big-ish number of VMs? Was it fast?
There was a problem hiding this comment.
it seemed fast in the context of an entire roachtest run, but i can time how it is serially with like 10
There was a problem hiding this comment.
6 sec serial vs 1 sec for parallel (limit 5) for 10 nodes
will go with the ctxgroup
test-teardown: 2025/11/04 22:09:45 provider.go:898: Getting specs for 10 VMs serially
test-teardown: 2025/11/04 22:09:45 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0001
test-teardown: 2025/11/04 22:09:46 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0002
test-teardown: 2025/11/04 22:09:47 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0003
test-teardown: 2025/11/04 22:09:47 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0004
test-teardown: 2025/11/04 22:09:48 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0005
test-teardown: 2025/11/04 22:09:48 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0006
test-teardown: 2025/11/04 22:09:49 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0007
test-teardown: 2025/11/04 22:09:49 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0008
test-teardown: 2025/11/04 22:09:50 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0009
test-teardown: 2025/11/04 22:09:50 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0010
test-teardown: 2025/11/04 22:09:51 provider.go:930: Finished getting specs for 10 VMs
parallel
test-teardown: 2025/11/04 22:43:15 provider.go:899: Getting specs for 10 VMs serially
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0005
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0001
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0002
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0004
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0003
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0006
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0007
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0008
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0009
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0010
test-teardown: 2025/11/04 22:43:16 provider.go:974: Finished getting specs for 10 VMs
|
|
||
| // GetVMSpecs implements the vm.GetVMSpecs interface method which returns a | ||
| // map from VM.Name to a map of VM attributes | ||
| func (p *Provider) GetVMSpecs( |
There was a problem hiding this comment.
Agreed that we can (and will have to) fully implement context later on.
Though, since you're in there, you should probably already implement GetVMSpecsWithContext(ctx context.Context, l *logger.Logger, vms vm.List) and satisfy the current interface with:
func (p *Provider) GetVMSpecs(l *logger.Logger, vms vm.List) (map[string]map[string]interface{}, error) {
return `GetVmSpecsWithContext(context.Background(), l, vms)
}
|
Ran Local single node, multi node passes Will run a smoke with Also just ripped out the operation level timeout, although i did add a 1 min timeout per goroutine, left a TODO to pass in a context when the interface get's updated @golgeek @DarrylWong could you take another pass when you get a chance, mainly just the |
ba9e94d to
3063197
Compare
|
Need this one to go through before I can smoke test on CI #156958 |
1529406 to
77e63ce
Compare
pkg/roachprod/vm/ibm/provider.go
Outdated
| ) (map[string]map[string]interface{}, error) { | ||
| // TODO replace background context with passed in context after *WithContext | ||
| // interface methods are implemented for vm.Provider | ||
| vmSpecs, err := p.GetVMSpecsWithContext(context.Background(), l, vms) |
There was a problem hiding this comment.
nit: return p.GetVMSpecsWithContext(...?
Implementation for previously unimplemented interface method `GetVMSpecs()` for IBM. This will allow for cluster information to be fetched in roachtest via `FetchVMSpecs`. Informs cockroachdb#146202 Epic: None Release note: None
77e63ce to
3c0cfae
Compare
|
tftr! bors r=DarrylWong,golgeek |
154467: roachprod: azure GetVMSpecs() implementation r=DarrylWong a=williamchoe3 Informs #146202 Implementation for previously unimplemented interface method `GetVMSpecs()` for Azure. This will allow for cluster information to be fetched in roachtest via `FetchVMSpecs` Also changed `azureIDPattern` to match the optional type name pairs that may exist in a fully qualified azure resource id. See comments for more details. Previously, if these optional type name pairs existed, they were just a part of the `resourceName` field. Currently, I didn't see any of these child resource ids being parsed, but if we were to ever parse those in the future the previous implementation wouldn't make sense. * Note: Although the regex selects child resource type and names in the last matching group, they are not saved to AzureId struct because they are not being used For Azure, `client.Get(...)` allows you to pass in an optional `expander` parameter which seems to provide additional information for `"instanceView"` and runs an optionally passed in user script (passed in on vm creation) when `"userData"` is passed in. * `"userData"` doesn't seem relevant for us "retrieves the UserData property as part of the VM model view that was provided by the user during the VM Create/Update operation" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/get?view=rest-compute-2025-04-01&tabs=HTTP * "The `instanceView` of an Azure Virtual Machine (VM) provides a snapshot of the runtime properties and status of the VM" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/instance-view?view=rest-compute-2025-04-01&tabs=HTTP Seems nice, but there's no diff in the json returned by `"instanceView"` and the default when `expander` is not set (referred to `model view` in some places on thier docs) ```go InstanceViewTypesInstanceView InstanceViewTypes = "instanceView" InstanceViewTypesUserData InstanceViewTypes = "userData" ``` IBM PR: #155368 155368: roachprod: implement GetVMSpecs for IBM r=DarrylWong,golgeek a=williamchoe3 Informs #146202 Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs * API: https://cloud.ibm.com/apidocs/resource-controller/resource-controller?code=go#get-resource-instance Also added comments to help contextualize some parts that I thought would be helpful while doing some debug. See comment for e.g. vm specs 156504: sql: enable test tenants r=yuzefovich a=yuzefovich **sql: remove some redundant kv.TestingIsRangeLookupRequest calls in tests** These were redundant because we have GetRequests in all call sites. **sql: enable test tenants** Some tests have been adjusted to work with test tenants, others have specific issues to investigate further. Fixes: #143114 Epic: CRDB-48945 156547: sql: add ColumnGeneratedAsIdentity as a new schema change element r=shghasemi a=shghasemi Previously, GeneratedAsIdentity was part of the Column element. Creating a new schema change element will allow the declarative schema changer to add/remove/update such columns without dropping the Column element. Epic: [CRDB-31283](https://cockroachlabs.atlassian.net/browse/CRDB-31283) Part of: #142918 Release note: None 156940: util/taskset: add generic work distribution mechanism r=spilchen a=spilchen Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.). Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization. This will be used by the new distributed merge pipeline. Closes #156578 Epic: CRDB-48845 Release note: none Co-authored by: `@jeffswenson` 156987: sql: remove now-stale growstack for internal executor r=yuzefovich a=yuzefovich As of 4a5c503, we now grow stack unconditionally for all async tasks. Some time ago we added an ability to growstack in the async task of the internal executor when used in the LDR, and now that has become redundant, so we remove this special case. Epic: None Release note: None Co-authored-by: William Choe <williamchoe3@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Shadi Ghasemitaheri <shadi.ghasemitaheri@cockroachlabs.com> Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
|
Build failed (retrying...): |
155368: roachprod: implement GetVMSpecs for IBM r=DarrylWong,golgeek a=williamchoe3 Informs #146202 Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs * API: https://cloud.ibm.com/apidocs/resource-controller/resource-controller?code=go#get-resource-instance Also added comments to help contextualize some parts that I thought would be helpful while doing some debug. See comment for e.g. vm specs 156504: sql: enable test tenants r=yuzefovich a=yuzefovich **sql: remove some redundant kv.TestingIsRangeLookupRequest calls in tests** These were redundant because we have GetRequests in all call sites. **sql: enable test tenants** Some tests have been adjusted to work with test tenants, others have specific issues to investigate further. Fixes: #143114 Epic: CRDB-48945 156940: util/taskset: add generic work distribution mechanism r=spilchen a=spilchen Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.). Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization. This will be used by the new distributed merge pipeline. Closes #156578 Epic: CRDB-48845 Release note: none Co-authored by: `@jeffswenson` Co-authored-by: William Choe <williamchoe3@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
|
Build failed (retrying...): |
154467: roachprod: azure GetVMSpecs() implementation r=DarrylWong a=williamchoe3 Informs #146202 Implementation for previously unimplemented interface method `GetVMSpecs()` for Azure. This will allow for cluster information to be fetched in roachtest via `FetchVMSpecs` Also changed `azureIDPattern` to match the optional type name pairs that may exist in a fully qualified azure resource id. See comments for more details. Previously, if these optional type name pairs existed, they were just a part of the `resourceName` field. Currently, I didn't see any of these child resource ids being parsed, but if we were to ever parse those in the future the previous implementation wouldn't make sense. * Note: Although the regex selects child resource type and names in the last matching group, they are not saved to AzureId struct because they are not being used For Azure, `client.Get(...)` allows you to pass in an optional `expander` parameter which seems to provide additional information for `"instanceView"` and runs an optionally passed in user script (passed in on vm creation) when `"userData"` is passed in. * `"userData"` doesn't seem relevant for us "retrieves the UserData property as part of the VM model view that was provided by the user during the VM Create/Update operation" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/get?view=rest-compute-2025-04-01&tabs=HTTP * "The `instanceView` of an Azure Virtual Machine (VM) provides a snapshot of the runtime properties and status of the VM" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/instance-view?view=rest-compute-2025-04-01&tabs=HTTP Seems nice, but there's no diff in the json returned by `"instanceView"` and the default when `expander` is not set (referred to `model view` in some places on thier docs) ```go InstanceViewTypesInstanceView InstanceViewTypes = "instanceView" InstanceViewTypesUserData InstanceViewTypes = "userData" ``` IBM PR: #155368 156547: sql: add ColumnGeneratedAsIdentity as a new schema change element r=shghasemi a=shghasemi Previously, GeneratedAsIdentity was part of the Column element. Creating a new schema change element will allow the declarative schema changer to add/remove/update such columns without dropping the Column element. Epic: [CRDB-31283](https://cockroachlabs.atlassian.net/browse/CRDB-31283) Part of: #142918 Release note: None 156953: kvclient/kvcoord: remove stale TODO in Step r=yuzefovich a=yuzefovich We currently have a TODO in Step to add an assertion that stepping is only done on the RootTxn which references a now-closed issue. I did a little bit archeology, and in bd255e5 we consciously removed the assertion like this since stepping on a LeafTxn should be no-op, should just work, and is actually needed behavior for internal executors when they run on remote nodes as part of DistSQL plan. (In connExecutor we currently unconditionally Step the txn on the main query path.) Thus I don't think we want to implement the TODO and should just remove it. Epic: None Release note: None 156987: sql: remove now-stale growstack for internal executor r=yuzefovich a=yuzefovich As of 4a5c503, we now grow stack unconditionally for all async tasks. Some time ago we added an ability to growstack in the async task of the internal executor when used in the LDR, and now that has become redundant, so we remove this special case. Epic: None Release note: None Co-authored-by: William Choe <williamchoe3@gmail.com> Co-authored-by: Shadi Ghasemitaheri <shadi.ghasemitaheri@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
155368: roachprod: implement GetVMSpecs for IBM r=DarrylWong,golgeek a=williamchoe3 Informs #146202 Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs * API: https://cloud.ibm.com/apidocs/resource-controller/resource-controller?code=go#get-resource-instance Also added comments to help contextualize some parts that I thought would be helpful while doing some debug. See comment for e.g. vm specs 156504: sql: enable test tenants r=yuzefovich a=yuzefovich **sql: remove some redundant kv.TestingIsRangeLookupRequest calls in tests** These were redundant because we have GetRequests in all call sites. **sql: enable test tenants** Some tests have been adjusted to work with test tenants, others have specific issues to investigate further. Fixes: #143114 Epic: CRDB-48945 157018: sql: don't store plan gist as FastValue in context r=yuzefovich a=yuzefovich Given that in 67e45fd we introduced a different way to include plan gists into sentry reports, I don't think we need to include the plan gists into the context anymore, so we can free up the FastValue context slot for a better use. Epic: None Release note: None Co-authored-by: William Choe <williamchoe3@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
|
Build failed (retrying...): |
Informs #146202
Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs
Also added comments to help contextualize some parts that I thought would be helpful while doing some debug.
See comment for e.g. vm specs