Oops! That page is private.