Skip to content

roachprod: implement GetVMSpecs for IBM#155368

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
williamchoe3:wchoe/146202-roachprod-ibm-vm-specs
Nov 6, 2025
Merged

roachprod: implement GetVMSpecs for IBM#155368
craig[bot] merged 1 commit intocockroachdb:masterfrom
williamchoe3:wchoe/146202-roachprod-ibm-vm-specs

Conversation

@williamchoe3
Copy link
Copy Markdown
Contributor

@williamchoe3 williamchoe3 commented Oct 14, 2025

Informs #146202

Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs

Also added comments to help contextualize some parts that I thought would be helpful while doing some debug.

See comment for e.g. vm specs

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@williamchoe3
Copy link
Copy Markdown
Contributor Author

williamchoe3 commented Oct 14, 2025

e.g. willchoe-1760458579-02-n1cpu4-0001.json

{
  "availability_policy": {
    "host_failure": "restart"
  },
  "bandwidth": 8000,
  "boot_volume_attachment": {
    "device": {
      "id": "02v7-0074dc4f-7979-4a3c-92c0-e6725319774f-22zbf"
    },
    "href": "https://br-sao.iaas.cloud.ibm.com/v1/instances/02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d/volume_attachments/02v7-0074dc4f-7979-4a3c-92c0-e6725319774f",
    "id": "02v7-0074dc4f-7979-4a3c-92c0-e6725319774f",
    "name": "tricking-saffron-gerbil-parting",
    "volume": {
      "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::volume:r042-e68c8131-263f-47b9-b3b8-062c4b2449eb",
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/volumes/r042-e68c8131-263f-47b9-b3b8-062c4b2449eb",
      "id": "r042-e68c8131-263f-47b9-b3b8-062c4b2449eb",
      "name": "willchoe-1760458579-02-n1cpu4-0001-boot",
      "resource_type": "volume"
    }
  },
  "cluster_network_attachments": [],
  "confidential_compute_mode": "disabled",
  "created_at": "2025-10-14T16:19:50.000Z",
  "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::instance:02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d",
  "disks": [],
  "enable_secure_boot": false,
  "health_reasons": [],
  "health_state": "ok",
  "href": "https://br-sao.iaas.cloud.ibm.com/v1/instances/02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d",
  "id": "02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d",
  "image": {
    "crn": "crn:v1:bluemix:public:is:br-sao:a/811f8abfbd32425597dc7ba40da98fa6::image:r042-03b53508-46fb-40af-8506-95da8f5f9c5a",
    "href": "https://br-sao.iaas.cloud.ibm.com/v1/images/r042-03b53508-46fb-40af-8506-95da8f5f9c5a",
    "id": "r042-03b53508-46fb-40af-8506-95da8f5f9c5a",
    "name": "ibm-ubuntu-22-04-5-minimal-s390x-5",
    "resource_type": "image"
  },
  "lifecycle_reasons": [],
  "lifecycle_state": "stable",
  "memory": 16,
  "metadata_service": {
    "enabled": true,
    "protocol": "http",
    "response_hop_limit": 1
  },
  "name": "willchoe-1760458579-02-n1cpu4-0001",
  "network_attachments": [
    {
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/instances/02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d/network_attachments/02v7-9721af2e-3f14-4c38-b36b-ee732e6c772d",
      "id": "02v7-9721af2e-3f14-4c38-b36b-ee732e6c772d",
      "name": "eth0",
      "primary_ip": {
        "address": "10.250.133.16",
        "href": "https://br-sao.iaas.cloud.ibm.com/v1/subnets/02v7-07f44e65-6e2b-4962-9360-f02467524740/reserved_ips/02v7-fa4a4156-5b05-46df-b78f-6e951836b20f",
        "id": "02v7-fa4a4156-5b05-46df-b78f-6e951836b20f",
        "name": "dispersed-climatic-thaw-parsley",
        "resource_type": "subnet_reserved_ip"
      },
      "resource_type": "instance_network_attachment",
      "subnet": {
        "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::subnet:02v7-07f44e65-6e2b-4962-9360-f02467524740",
        "href": "https://br-sao.iaas.cloud.ibm.com/v1/subnets/02v7-07f44e65-6e2b-4962-9360-f02467524740",
        "id": "02v7-07f44e65-6e2b-4962-9360-f02467524740",
        "name": "subnet-roachprod-br-sao-3",
        "resource_type": "subnet"
      },
      "virtual_network_interface": {
        "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::virtual-network-interface:02v7-ea511e0b-da2d-4e37-92f0-a0f64cdda37f",
        "href": "https://br-sao.iaas.cloud.ibm.com/v1/virtual_network_interfaces/02v7-ea511e0b-da2d-4e37-92f0-a0f64cdda37f",
        "id": "02v7-ea511e0b-da2d-4e37-92f0-a0f64cdda37f",
        "name": "magnetism-broadcast-acting-reptilian",
        "resource_type": "virtual_network_interface"
      }
    }
  ],
  "network_interfaces": [
    {
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/instances/02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d/network_interfaces/02v7-9721af2e-3f14-4c38-b36b-ee732e6c772d",
      "id": "02v7-9721af2e-3f14-4c38-b36b-ee732e6c772d",
      "name": "eth0",
      "primary_ip": {
        "address": "10.250.133.16",
        "href": "https://br-sao.iaas.cloud.ibm.com/v1/subnets/02v7-07f44e65-6e2b-4962-9360-f02467524740/reserved_ips/02v7-fa4a4156-5b05-46df-b78f-6e951836b20f",
        "id": "02v7-fa4a4156-5b05-46df-b78f-6e951836b20f",
        "name": "dispersed-climatic-thaw-parsley",
        "resource_type": "subnet_reserved_ip"
      },
      "resource_type": "network_interface",
      "subnet": {
        "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::subnet:02v7-07f44e65-6e2b-4962-9360-f02467524740",
        "href": "https://br-sao.iaas.cloud.ibm.com/v1/subnets/02v7-07f44e65-6e2b-4962-9360-f02467524740",
        "id": "02v7-07f44e65-6e2b-4962-9360-f02467524740",
        "name": "subnet-roachprod-br-sao-3",
        "resource_type": "subnet"
      }
    }
  ],
  "primary_network_attachment": {
    "href": "https://br-sao.iaas.cloud.ibm.com/v1/instances/02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d/network_attachments/02v7-9721af2e-3f14-4c38-b36b-ee732e6c772d",
    "id": "02v7-9721af2e-3f14-4c38-b36b-ee732e6c772d",
    "name": "eth0",
    "primary_ip": {
      "address": "10.250.133.16",
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/subnets/02v7-07f44e65-6e2b-4962-9360-f02467524740/reserved_ips/02v7-fa4a4156-5b05-46df-b78f-6e951836b20f",
      "id": "02v7-fa4a4156-5b05-46df-b78f-6e951836b20f",
      "name": "dispersed-climatic-thaw-parsley",
      "resource_type": "subnet_reserved_ip"
    },
    "resource_type": "instance_network_attachment",
    "subnet": {
      "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::subnet:02v7-07f44e65-6e2b-4962-9360-f02467524740",
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/subnets/02v7-07f44e65-6e2b-4962-9360-f02467524740",
      "id": "02v7-07f44e65-6e2b-4962-9360-f02467524740",
      "name": "subnet-roachprod-br-sao-3",
      "resource_type": "subnet"
    },
    "virtual_network_interface": {
      "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::virtual-network-interface:02v7-ea511e0b-da2d-4e37-92f0-a0f64cdda37f",
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/virtual_network_interfaces/02v7-ea511e0b-da2d-4e37-92f0-a0f64cdda37f",
      "id": "02v7-ea511e0b-da2d-4e37-92f0-a0f64cdda37f",
      "name": "magnetism-broadcast-acting-reptilian",
      "resource_type": "virtual_network_interface"
    }
  },
  "primary_network_interface": {
    "href": "https://br-sao.iaas.cloud.ibm.com/v1/instances/02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d/network_interfaces/02v7-9721af2e-3f14-4c38-b36b-ee732e6c772d",
    "id": "02v7-9721af2e-3f14-4c38-b36b-ee732e6c772d",
    "name": "eth0",
    "primary_ip": {
      "address": "10.250.133.16",
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/subnets/02v7-07f44e65-6e2b-4962-9360-f02467524740/reserved_ips/02v7-fa4a4156-5b05-46df-b78f-6e951836b20f",
      "id": "02v7-fa4a4156-5b05-46df-b78f-6e951836b20f",
      "name": "dispersed-climatic-thaw-parsley",
      "resource_type": "subnet_reserved_ip"
    },
    "resource_type": "network_interface",
    "subnet": {
      "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::subnet:02v7-07f44e65-6e2b-4962-9360-f02467524740",
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/subnets/02v7-07f44e65-6e2b-4962-9360-f02467524740",
      "id": "02v7-07f44e65-6e2b-4962-9360-f02467524740",
      "name": "subnet-roachprod-br-sao-3",
      "resource_type": "subnet"
    }
  },
  "profile": {
    "href": "https://br-sao.iaas.cloud.ibm.com/v1/instance/profiles/bz2-4x16",
    "name": "bz2-4x16",
    "resource_type": "instance_profile"
  },
  "reservation_affinity": {
    "policy": "automatic",
    "pool": []
  },
  "resource_group": {
    "href": "https://resource-controller.cloud.ibm.com/v2/resource_groups/133a1bb0ffe3489db6fa2ecac5576b7a",
    "id": "133a1bb0ffe3489db6fa2ecac5576b7a",
    "name": "unknown"
  },
  "resource_type": "instance",
  "startable": true,
  "status": "running",
  "status_reasons": [],
  "total_network_bandwidth": 6000,
  "total_volume_bandwidth": 2000,
  "vcpu": {
    "architecture": "s390x",
    "count": 4,
    "manufacturer": "ibm"
  },
  "volume_attachments": [
    {
      "device": {
        "id": "02v7-0074dc4f-7979-4a3c-92c0-e6725319774f-22zbf"
      },
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/instances/02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d/volume_attachments/02v7-0074dc4f-7979-4a3c-92c0-e6725319774f",
      "id": "02v7-0074dc4f-7979-4a3c-92c0-e6725319774f",
      "name": "tricking-saffron-gerbil-parting",
      "volume": {
        "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::volume:r042-e68c8131-263f-47b9-b3b8-062c4b2449eb",
        "href": "https://br-sao.iaas.cloud.ibm.com/v1/volumes/r042-e68c8131-263f-47b9-b3b8-062c4b2449eb",
        "id": "r042-e68c8131-263f-47b9-b3b8-062c4b2449eb",
        "name": "willchoe-1760458579-02-n1cpu4-0001-boot",
        "resource_type": "volume"
      }
    },
    {
      "device": {
        "id": "02v7-28607ffd-4017-4a91-825b-ec2c422129c9-pzl7r"
      },
      "href": "https://br-sao.iaas.cloud.ibm.com/v1/instances/02v7_cb88ae46-4e8b-409b-9e7e-27e24987fc3d/volume_attachments/02v7-28607ffd-4017-4a91-825b-ec2c422129c9",
      "id": "02v7-28607ffd-4017-4a91-825b-ec2c422129c9",
      "name": "headphone-uncorrupt-acrobat-tidings",
      "volume": {
        "crn": "crn:v1:bluemix:public:is:br-sao-3:a/5bec889150cb494b9f952290a8144f36::volume:r042-a4112b6b-5667-4f28-9510-73f1d283109b",
        "href": "https://br-sao.iaas.cloud.ibm.com/v1/volumes/r042-a4112b6b-5667-4f28-9510-73f1d283109b",
        "id": "r042-a4112b6b-5667-4f28-9510-73f1d283109b",
        "name": "willchoe-1760458579-02-n1cpu4-0001-data-0000",
        "resource_type": "volume"
      }
    }
  ],
  "vpc": {
    "crn": "crn:v1:bluemix:public:is:br-sao:a/5bec889150cb494b9f952290a8144f36::vpc:r042-f912ad6e-eb86-4f7e-9a0d-0c7f44318139",
    "href": "https://br-sao.iaas.cloud.ibm.com/v1/vpcs/r042-f912ad6e-eb86-4f7e-9a0d-0c7f44318139",
    "id": "r042-f912ad6e-eb86-4f7e-9a0d-0c7f44318139",
    "name": "vpc-roachprod-br-sao",
    "resource_type": "vpc"
  },
  "zone": {
    "href": "https://br-sao.iaas.cloud.ibm.com/v1/regions/br-sao/zones/br-sao-3",
    "name": "br-sao-3"
  }
}

@williamchoe3 williamchoe3 changed the title roachprod: implement vmSpecs() for IBM roachprod: implement GetVMSpecs() for IBM Oct 14, 2025
@williamchoe3 williamchoe3 changed the title roachprod: implement GetVMSpecs() for IBM roachprod: implement GetVMSpecs for IBM Oct 14, 2025
@williamchoe3 williamchoe3 force-pushed the wchoe/146202-roachprod-ibm-vm-specs branch 2 times, most recently from 93622ea to 01cc611 Compare October 14, 2025 18:29
@williamchoe3 williamchoe3 marked this pull request as ready for review October 14, 2025 18:32
@williamchoe3 williamchoe3 requested a review from a team as a code owner October 14, 2025 18:32
@williamchoe3 williamchoe3 requested review from DarrylWong and golgeek and removed request for a team October 14, 2025 18:32

// GetVMSpecs implements the vm.GetVMSpecs interface method which returns a
// map from VM.Name to a map of VM attributes
func (p *Provider) GetVMSpecs(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think relying on a global OperationTimeout is kind of a bad pattern anyway. Instead it should just take in a context which is passed in by FetchVMSpecs; it already creates one with a 5 minute timeout that isn't respected by this method.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, though it's not necessarily incompatible as if you context.WithTimeout() on a context that already has a timeout, it ends up honoring the lowest one.

FYI, I've started added some context passing in there for the List() paths.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah didn't notice the outer context in the roachtest caller. With OperationTimeout was just reusing the pattern I saw in Azure's provider impl https://github.com/cockroachdb/cockroach/blob/0345413bde1ca99f79a806be00871283169b8bf6/pkg/roachprod/vm/azure/azure.go

So the better approach is to reimplement all the provider methods with a *WithContext version like ListWithContext https://github.com/cockroachdb/cockroach/pull/154893/files#diff-ca2fc7d620174ca0b2964983db21a35a21515be7e850d5ce6f944f9421cfd1fbR808

We can do that as a part of the roachprod centralized changes? In the mean time i can adjust this OperationTimeout to just match what roachprod currently sets?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we can (and will have to) fully implement context later on.

Though, since you're in there, you should probably already implement GetVMSpecsWithContext(ctx context.Context, l *logger.Logger, vms vm.List) and satisfy the current interface with:

func (p *Provider) GetVMSpecs(l *logger.Logger, vms vm.List) (map[string]map[string]interface{}, error) {
    return `GetVmSpecsWithContext(context.Background(), l, vms)
}

Copy link
Copy Markdown
Contributor

@golgeek golgeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits, otherwise looks good!


// Extract the spec of all VMs and create a map from VM name to spec.
vmSpecs := make(map[string]map[string]interface{})
for _, vmInstance := range vms {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it could be worth it to move this to a ctxgroup.WithContext (pkg/util/ctxgroup) to parallelize the instance calls, potentially with a hard limit of 4 or 5 on the error group to avoid spiking too many go routines.

Did you try it with a big-ish number of VMs? Was it fast?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seemed fast in the context of an entire roachtest run, but i can time how it is serially with like 10

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 sec serial vs 1 sec for parallel (limit 5) for 10 nodes
will go with the ctxgroup

test-teardown: 2025/11/04 22:09:45 provider.go:898: Getting specs for 10 VMs serially
test-teardown: 2025/11/04 22:09:45 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0001
test-teardown: 2025/11/04 22:09:46 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0002
test-teardown: 2025/11/04 22:09:47 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0003
test-teardown: 2025/11/04 22:09:47 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0004
test-teardown: 2025/11/04 22:09:48 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0005
test-teardown: 2025/11/04 22:09:48 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0006
test-teardown: 2025/11/04 22:09:49 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0007
test-teardown: 2025/11/04 22:09:49 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0008
test-teardown: 2025/11/04 22:09:50 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0009
test-teardown: 2025/11/04 22:09:50 provider.go:913: Getting specs for instance willchoe-1762293720-01-n10cpu4-0010
test-teardown: 2025/11/04 22:09:51 provider.go:930: Finished getting specs for 10 VMs

parallel

test-teardown: 2025/11/04 22:43:15 provider.go:899: Getting specs for 10 VMs serially
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0005
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0001
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0002
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0004
test-teardown: 2025/11/04 22:43:15 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0003
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0006
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0007
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0008
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0009
test-teardown: 2025/11/04 22:43:16 provider.go:919: Getting specs for instance willchoe-1762295754-01-n10cpu4-0010
test-teardown: 2025/11/04 22:43:16 provider.go:974: Finished getting specs for 10 VMs


// GetVMSpecs implements the vm.GetVMSpecs interface method which returns a
// map from VM.Name to a map of VM attributes
func (p *Provider) GetVMSpecs(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we can (and will have to) fully implement context later on.

Though, since you're in there, you should probably already implement GetVMSpecsWithContext(ctx context.Context, l *logger.Logger, vms vm.List) and satisfy the current interface with:

func (p *Provider) GetVMSpecs(l *logger.Logger, vms vm.List) (map[string]map[string]interface{}, error) {
    return `GetVmSpecsWithContext(context.Background(), l, vms)
}

@williamchoe3
Copy link
Copy Markdown
Contributor Author

Ran Local single node, multi node passes

Will run a smoke with ALWAYS_COLLECT_ARTIFACTS=true, I think i need to make a change here though https://github.com/cockroachdb/cockroach/blob/5bf1f0dffc17493fc1d2c9e89c6178bc62c09734/build/teamcity/cockroach/nightlies/roachtest_nightly_ibm.sh so i can pass the env var / flag through TC

Also just ripped out the operation level timeout, although i did add a 1 min timeout per goroutine, left a TODO to pass in a context when the interface get's updated

@golgeek @DarrylWong could you take another pass when you get a chance, mainly just the ctxgroup usage

@williamchoe3 williamchoe3 force-pushed the wchoe/146202-roachprod-ibm-vm-specs branch from ba9e94d to 3063197 Compare November 5, 2025 19:27
@williamchoe3
Copy link
Copy Markdown
Contributor Author

Need this one to go through before I can smoke test on CI #156958

@williamchoe3
Copy link
Copy Markdown
Contributor Author

@williamchoe3 williamchoe3 force-pushed the wchoe/146202-roachprod-ibm-vm-specs branch from 1529406 to 77e63ce Compare November 6, 2025 15:27
) (map[string]map[string]interface{}, error) {
// TODO replace background context with passed in context after *WithContext
// interface methods are implemented for vm.Provider
vmSpecs, err := p.GetVMSpecsWithContext(context.Background(), l, vms)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return p.GetVMSpecsWithContext(...?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

o ya oops thx

Implementation for previously unimplemented interface method `GetVMSpecs()` for
IBM. This will allow for cluster information to be fetched in roachtest via
`FetchVMSpecs`.

Informs cockroachdb#146202

Epic: None
Release note: None
@williamchoe3 williamchoe3 force-pushed the wchoe/146202-roachprod-ibm-vm-specs branch from 77e63ce to 3c0cfae Compare November 6, 2025 15:42
@williamchoe3
Copy link
Copy Markdown
Contributor Author

tftr!

bors r=DarrylWong,golgeek

craig bot pushed a commit that referenced this pull request Nov 6, 2025
154467: roachprod: azure GetVMSpecs() implementation r=DarrylWong a=williamchoe3

Informs #146202 

Implementation for previously unimplemented interface method `GetVMSpecs()` for Azure. This will allow for cluster information to be fetched in roachtest via `FetchVMSpecs`

Also changed `azureIDPattern` to match the optional type name pairs that may exist in a fully qualified azure resource id. See comments for more details.  Previously, if these optional type name pairs existed, they were just a part of the `resourceName` field. Currently, I didn't see any of these child resource ids being parsed, but if we were to ever parse those in the future the previous implementation wouldn't make sense.
* Note: Although the regex selects child resource type and names in the last matching group, they are not saved to AzureId struct because they are not being used

For Azure, `client.Get(...)` allows you to pass in an optional `expander` parameter which seems to provide additional information for `"instanceView"` and runs an optionally passed in user script (passed in on vm creation) when `"userData"` is passed in.
*  `"userData"` doesn't seem relevant for us "retrieves the UserData property as part of the VM model view that was provided by the user during the VM Create/Update operation" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/get?view=rest-compute-2025-04-01&tabs=HTTP
  *  "The `instanceView` of an Azure Virtual Machine (VM) provides a snapshot of the runtime properties and status of the VM" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/instance-view?view=rest-compute-2025-04-01&tabs=HTTP Seems nice, but there's no diff in the json returned by `"instanceView"` and the default when `expander` is not set (referred to `model view` in some places on thier docs) 
```go
InstanceViewTypesInstanceView InstanceViewTypes = "instanceView"  
InstanceViewTypesUserData InstanceViewTypes = "userData"
```


IBM PR: #155368


155368: roachprod: implement GetVMSpecs for IBM r=DarrylWong,golgeek a=williamchoe3

Informs #146202

Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs
* API: https://cloud.ibm.com/apidocs/resource-controller/resource-controller?code=go#get-resource-instance 

Also added comments to help contextualize some parts that I thought would be helpful while doing some debug.

See comment for e.g. vm specs


156504: sql: enable test tenants r=yuzefovich a=yuzefovich

**sql: remove some redundant kv.TestingIsRangeLookupRequest calls in tests**

These were redundant because we have GetRequests in all call sites.

**sql: enable test tenants**

Some tests have been adjusted to work with test tenants, others have specific issues to investigate further.

Fixes: #143114
Epic: CRDB-48945

156547: sql: add ColumnGeneratedAsIdentity as a new schema change element r=shghasemi a=shghasemi

Previously, GeneratedAsIdentity was part of the Column element.
Creating a new schema change element will allow the declarative schema changer
to add/remove/update such columns without dropping the Column element.

Epic: [CRDB-31283](https://cockroachlabs.atlassian.net/browse/CRDB-31283)

Part of: #142918

Release note: None

156940: util/taskset: add generic work distribution mechanism r=spilchen a=spilchen

Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.).

Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization.

This will be used by the new distributed merge pipeline.

Closes #156578

Epic: CRDB-48845
Release note: none

Co-authored by: `@jeffswenson`

156987: sql: remove now-stale growstack for internal executor r=yuzefovich a=yuzefovich

As of 4a5c503, we now grow stack unconditionally for all async tasks. Some time ago we added an ability to growstack in the async task of the internal executor when used in the LDR, and now that has become redundant, so we remove this special case.

Epic: None
Release note: None

Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Shadi Ghasemitaheri <shadi.ghasemitaheri@cockroachlabs.com>
Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 6, 2025

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Nov 6, 2025
155368: roachprod: implement GetVMSpecs for IBM r=DarrylWong,golgeek a=williamchoe3

Informs #146202

Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs
* API: https://cloud.ibm.com/apidocs/resource-controller/resource-controller?code=go#get-resource-instance 

Also added comments to help contextualize some parts that I thought would be helpful while doing some debug.

See comment for e.g. vm specs


156504: sql: enable test tenants r=yuzefovich a=yuzefovich

**sql: remove some redundant kv.TestingIsRangeLookupRequest calls in tests**

These were redundant because we have GetRequests in all call sites.

**sql: enable test tenants**

Some tests have been adjusted to work with test tenants, others have specific issues to investigate further.

Fixes: #143114
Epic: CRDB-48945

156940: util/taskset: add generic work distribution mechanism r=spilchen a=spilchen

Adds a new taskset package that provides task coordination for a number of workers. TaskSet hands out integer identifiers (TaskIDs) that workers claim and process, with the caller responsible for mapping TaskIDs to actual work (file indices, key ranges, batches, etc.).

Features round-robin initial distribution across workers and locality when getting a task ID. Not thread-safe; callers provide external synchronization.

This will be used by the new distributed merge pipeline.

Closes #156578

Epic: CRDB-48845
Release note: none

Co-authored by: `@jeffswenson`

Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 6, 2025

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Nov 6, 2025
154467: roachprod: azure GetVMSpecs() implementation r=DarrylWong a=williamchoe3

Informs #146202 

Implementation for previously unimplemented interface method `GetVMSpecs()` for Azure. This will allow for cluster information to be fetched in roachtest via `FetchVMSpecs`

Also changed `azureIDPattern` to match the optional type name pairs that may exist in a fully qualified azure resource id. See comments for more details.  Previously, if these optional type name pairs existed, they were just a part of the `resourceName` field. Currently, I didn't see any of these child resource ids being parsed, but if we were to ever parse those in the future the previous implementation wouldn't make sense.
* Note: Although the regex selects child resource type and names in the last matching group, they are not saved to AzureId struct because they are not being used

For Azure, `client.Get(...)` allows you to pass in an optional `expander` parameter which seems to provide additional information for `"instanceView"` and runs an optionally passed in user script (passed in on vm creation) when `"userData"` is passed in.
*  `"userData"` doesn't seem relevant for us "retrieves the UserData property as part of the VM model view that was provided by the user during the VM Create/Update operation" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/get?view=rest-compute-2025-04-01&tabs=HTTP
  *  "The `instanceView` of an Azure Virtual Machine (VM) provides a snapshot of the runtime properties and status of the VM" https://learn.microsoft.com/en-us/rest/api/compute/virtual-machines/instance-view?view=rest-compute-2025-04-01&tabs=HTTP Seems nice, but there's no diff in the json returned by `"instanceView"` and the default when `expander` is not set (referred to `model view` in some places on thier docs) 
```go
InstanceViewTypesInstanceView InstanceViewTypes = "instanceView"  
InstanceViewTypesUserData InstanceViewTypes = "userData"
```


IBM PR: #155368


156547: sql: add ColumnGeneratedAsIdentity as a new schema change element r=shghasemi a=shghasemi

Previously, GeneratedAsIdentity was part of the Column element.
Creating a new schema change element will allow the declarative schema changer
to add/remove/update such columns without dropping the Column element.

Epic: [CRDB-31283](https://cockroachlabs.atlassian.net/browse/CRDB-31283)

Part of: #142918

Release note: None

156953: kvclient/kvcoord: remove stale TODO in Step r=yuzefovich a=yuzefovich

We currently have a TODO in Step to add an assertion that stepping is only done on the RootTxn which references a now-closed issue. I did a little bit archeology, and in bd255e5 we consciously removed the assertion like this since stepping on a LeafTxn should be no-op, should just work, and is actually needed behavior for internal executors when they run on remote nodes as part of DistSQL plan. (In connExecutor we currently unconditionally Step the txn on the main query path.) Thus I don't think we want to implement the TODO and should just remove it.

Epic: None
Release note: None

156987: sql: remove now-stale growstack for internal executor r=yuzefovich a=yuzefovich

As of 4a5c503, we now grow stack unconditionally for all async tasks. Some time ago we added an ability to growstack in the async task of the internal executor when used in the LDR, and now that has become redundant, so we remove this special case.

Epic: None
Release note: None

Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Shadi Ghasemitaheri <shadi.ghasemitaheri@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
craig bot pushed a commit that referenced this pull request Nov 6, 2025
155368: roachprod: implement GetVMSpecs for IBM r=DarrylWong,golgeek a=williamchoe3

Informs #146202

Implementation for previously unimplemented interface method GetVMSpecs() for IBM. This will allow for cluster information to be fetched in roachtest via FetchVMSpecs
* API: https://cloud.ibm.com/apidocs/resource-controller/resource-controller?code=go#get-resource-instance 

Also added comments to help contextualize some parts that I thought would be helpful while doing some debug.

See comment for e.g. vm specs


156504: sql: enable test tenants r=yuzefovich a=yuzefovich

**sql: remove some redundant kv.TestingIsRangeLookupRequest calls in tests**

These were redundant because we have GetRequests in all call sites.

**sql: enable test tenants**

Some tests have been adjusted to work with test tenants, others have specific issues to investigate further.

Fixes: #143114
Epic: CRDB-48945

157018: sql: don't store plan gist as FastValue in context r=yuzefovich a=yuzefovich

Given that in 67e45fd we introduced a different way to include plan gists into sentry reports, I don't think we need to include the plan gists into the context anymore, so we can free up the FastValue context slot for a better use.

Epic: None
Release note: None

Co-authored-by: William Choe <williamchoe3@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 6, 2025

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 6, 2025

@craig craig bot merged commit 82eca8e into cockroachdb:master Nov 6, 2025
32 of 34 checks passed
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Nov 20, 2025
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Nov 21, 2025
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Nov 21, 2025
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Dec 1, 2025
rishabh7m added a commit to crltest-issue-syncing/cockroach-fork that referenced this pull request Dec 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants