
Fix GPSampler crash when default torch device is CUDA #6397

Merged
kAIto47802 merged 1 commit into optuna:master from Quant-Quasar:fix/issue-6113-gp-sampler-gpu on Jan 7, 2026

Conversation

@Quant-Quasar
Contributor

Quant-Quasar commented Dec 21, 2025

Related to #6113

Motivation

When users set torch.set_default_device("cuda"), GPSampler creates its internal tensors directly on the GPU. The internal Gaussian process implementation then attempts to call .numpy() directly on these CUDA tensors (e.g., in _cache_matrix).

This raises:
TypeError: can't convert cuda:0 device type tensor to numpy

The issue was reported in #6113 six months ago, but no PR was submitted.
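The failure mode can be reproduced outside Optuna in a few lines. This is a minimal sketch, not the reporter's script, and the try_numpy helper is purely illustrative:

```python
# Minimal sketch of the failure mode (try_numpy is a hypothetical helper,
# not part of Optuna). Tensor.numpy() shares memory with NumPy, which can
# only address host RAM, so it raises TypeError for CUDA-resident tensors.
import torch


def try_numpy(t: torch.Tensor):
    """Convert a tensor to NumPy, returning the error message on failure."""
    try:
        return t.detach().numpy()  # the pattern gp.py used before this fix
    except TypeError as exc:
        return str(exc)


if torch.cuda.is_available():
    torch.set_default_device("cuda")  # the user setting from #6113

out = try_numpy(torch.ones(3))
# On a CUDA machine, out is the "can't convert cuda:0 device type tensor
# to numpy" message; on CPU-only machines the conversion simply succeeds.
```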

Description of the changes

I have modified optuna/_gp/gp.py to ensure robust handling of GPU-resident tensors.

  • Replaced .detach().numpy() with .detach().cpu().numpy() in 5 critical locations:
    • GPRegressor.length_scales
    • GPRegressor._cache_matrix (twice)
    • GPRegressor._fit_kernel_params
    • loss_func

The added .cpu() call moves tensors to host memory before conversion and is a no-op for tensors already on the CPU, so standard CPU use cases see no regression.
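The pattern can be sketched as follows (simplified names; the actual edits live in optuna/_gp/gp.py, and tensor_to_numpy is a hypothetical helper for illustration):

```python
# Sketch of the fix pattern (tensor_to_numpy is illustrative, not Optuna's
# API). .cpu() copies a CUDA tensor to host memory and is a no-op for CPU
# tensors, so the conversion is safe regardless of the default device.
import torch


def tensor_to_numpy(t: torch.Tensor):
    # Before: t.detach().numpy()        -> TypeError for CUDA tensors
    # After:  t.detach().cpu().numpy()  -> works on any device
    return t.detach().cpu().numpy()


arr = tensor_to_numpy(torch.tensor([1.0, 2.0, 3.0]))
```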

Verification:

  • Created a local reproduction script with torch.set_default_device("cuda") and confirmed the crash is resolved.
  • Ran the existing GP regression tests (pytest tests/gp_tests/test_gp.py and tests/samplers_tests/test_gp.py); all 51 tests pass.

@not522
Member

not522 commented Dec 24, 2025

Thanks for your PR! The changes look good, but I'm still checking to make sure all issues have been resolved.

@not522
Member

not522 commented Dec 24, 2025

@kAIto47802 Could you review this PR?

@not522
Member

not522 commented Dec 26, 2025

I think this PR can be merged, but please give us some time to confirm. Since we're approaching the year-end and New Year holidays, we'll review the details after New Year's Day.

@Quant-Quasar
Contributor Author

Thank you for the update! I completely understand the timing around the year-end holidays.
Please take your time, and I’m happy to address any feedback or follow up after New Year’s.

@github-actions
Contributor

github-actions bot commented Jan 4, 2026

This pull request has not seen any recent activity.

github-actions bot added the stale label Jan 4, 2026
c-bata removed the stale label Jan 5, 2026
Member

not522 left a comment


Thank you for waiting. LGTM!

not522 removed their assignment Jan 6, 2026
Collaborator

kAIto47802 left a comment


Thank you for the PR. LGTM.

kAIto47802 merged commit c6d2d19 into optuna:master Jan 7, 2026
14 checks passed
not522 added the bug label (Issue/PR about behavior that is broken in Optuna itself) Jan 14, 2026
not522 added this to the v4.7.0 milestone Jan 14, 2026