threads_per_core=2 produces incorrect topology in cloud.conf #5668

*Describe the bug*
When `advanced_machine_features.threads_per_core` is set to 2 (SMT enabled), the auto-generated `cloud.conf` has an incorrect socket/core/thread topology. The total CPU count is correct, but the topology breakdown is wrong, which breaks CPU affinity and NUMA-aware scheduling.

Root cause: `template_machine_conf()` in `util.py` (line 2030) hardcodes `machine_conf.threads_per_core = 1`. The `getThreadsPerCore()` helper is called, but its result is only used for the CPU divisor and is never assigned to the machine config.

*Steps to reproduce*

  1. Create a nodeset with SMT enabled:
  ```yaml
  - id: compute_node
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    settings:
      machine_type: n2d-highmem-32
      advanced_machine_features:
        threads_per_core: 2
  ```
  2. Deploy the cluster.
  3. Compare the `NodeName` line in `cloud.conf` with the output of `slurmd -C` on a compute node.

*Expected behavior*

The topology in `cloud.conf` should match the output of `slurmd -C`:
  Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 CPUs=32

*Actual behavior*

The generated `cloud.conf` contains:
  Boards=1 SocketsPerBoard=1 CoresPerSocket=32 ThreadsPerCore=1 CPUs=32

Impact: `task/affinity` CPU binding is broken, NUMA-aware scheduling sees the wrong layout (1 socket × 32 cores instead of the real 2 sockets × 8 cores × 2 threads), and `CR_Core_Memory` can schedule two jobs on the same physical core.

*Version (gcluster --version)*

v1.90.0 (built from main branch, commit 8fb2919fd), and likely all earlier versions.

*Blueprint*

```yaml
  - id: compute_node
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    settings:
      machine_type: n2d-highmem-32
      advanced_machine_features:
        threads_per_core: 2
```

*Output and logs*

cloud.conf (generated):
NodeName=slurm0-computenode-[0-1] State=CLOUD RealMemory=254064 Boards=1 SocketsPerBoard=1 CoresPerSocket=32 ThreadsPerCore=1 CPUs=32

slurmd -C (actual):
NodeName=slurm0-computenode-0 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=257414
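
The comparison in step 3 can be automated. The following is a minimal sketch, not part of the toolkit: `parse_node_line` and `topology_diff` are hypothetical helper names, and the two input strings are the lines quoted above.

```python
# Hypothetical helpers for comparing a cloud.conf NodeName line with slurmd -C output.
TOPOLOGY_KEYS = ("Boards", "SocketsPerBoard", "CoresPerSocket", "ThreadsPerCore", "CPUs")


def parse_node_line(line):
    """Split a Slurm node definition into a dict of its key=value fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def topology_diff(generated, actual):
    """Return {key: (generated_value, actual_value)} for topology keys that differ."""
    gen = parse_node_line(generated)
    act = parse_node_line(actual)
    return {
        key: (gen.get(key), act.get(key))
        for key in TOPOLOGY_KEYS
        if gen.get(key) != act.get(key)
    }


generated = (
    "NodeName=slurm0-computenode-[0-1] State=CLOUD RealMemory=254064 "
    "Boards=1 SocketsPerBoard=1 CoresPerSocket=32 ThreadsPerCore=1 CPUs=32"
)
actual = (
    "NodeName=slurm0-computenode-0 CPUs=32 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=8 ThreadsPerCore=2 RealMemory=257414"
)

diff = topology_diff(generated, actual)
for key, (gen_value, act_value) in diff.items():
    print(f"{key}: cloud.conf={gen_value} slurmd={act_value}")
```

For the lines in this report, only `SocketsPerBoard`, `CoresPerSocket`, and `ThreadsPerCore` differ; `Boards` and `CPUs` agree, which is why the total CPU count looks correct.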

*Execution environment*

- OS: Rocky Linux 8 (HPC image)
- Machine type: n2d-highmem-32
  
*Additional context*

  Workaround: override the topology via `node_conf` in the blueprint:
  node_conf:
    SocketsPerBoard: 2
    CoresPerSocket: 8
    ThreadsPerCore: 2

The fix in `template_machine_conf()` in `util.py` would be to use `getThreadsPerCore(template)` instead of hardcoding 1, and to derive `cores_per_socket` accounting for the thread count.
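
To illustrate the arithmetic behind the proposed fix, here is a rough sketch. It is not the actual `util.py` code: `Topology` and `derive_topology` are hypothetical names, and the sketch assumes the socket count is already known and that `guest_cpus` counts the logical CPUs (hardware threads) visible to Slurm.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    boards: int
    sockets_per_board: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def cpus(self) -> int:
        # Total logical CPUs implied by the topology breakdown.
        return (
            self.boards
            * self.sockets_per_board
            * self.cores_per_socket
            * self.threads_per_core
        )


def derive_topology(guest_cpus: int, sockets: int, threads_per_core: int) -> Topology:
    """Hypothetical helper: split logical CPUs into sockets, cores, and threads."""
    per_core_group = sockets * threads_per_core
    if guest_cpus % per_core_group != 0:
        raise ValueError("guest_cpus is not divisible by sockets * threads_per_core")
    return Topology(
        boards=1,  # assumption: a single board
        sockets_per_board=sockets,
        cores_per_socket=guest_cpus // per_core_group,
        threads_per_core=threads_per_core,
    )


# Values reported by slurmd -C for n2d-highmem-32 in this issue: 2 sockets, 2 threads per core.
fixed = derive_topology(guest_cpus=32, sockets=2, threads_per_core=2)
print(fixed, fixed.cpus)
```

With the values from this report, the sketch yields `CoresPerSocket=8` while keeping the product at 32 CPUs, matching the `slurmd -C` line.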
