[DeepSeek-V3.2][JIT-kernel] Support nsa fuse store indexer k cache#19148
BBuf merged 2 commits into sgl-project:main from …
Conversation
Summary of Changes: Hello @yuan-luo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request optimizes the key-cache population mechanism within the DeepSeek V3.2 architecture: by consolidating the quantization of BF16 keys and their subsequent storage into a single JIT-compiled CUDA kernel, it aims to minimize kernel-launch overhead and improve overall inference performance. The implementation includes fallback logic, ensuring stability even when the optimized kernel cannot be used.
/tag-and-rerun-ci
Code Review
This pull request introduces a JIT-compiled CUDA kernel to fuse quantization and storage of the NSA indexer K cache, aiming to reduce kernel launch overhead and improve performance. The changes include a new CUDA kernel, a Python wrapper for JIT compilation, and modifications to nsa_indexer.py to use this new fused kernel.
My review has identified a critical race condition in the new CUDA kernel where multiple threads in a warp attempt to write to the same memory location. I've also found some opportunities for code cleanup by removing unused functions and a redundant buffer fetch. Addressing these points will improve the correctness and maintainability of the new implementation.
Force-pushed: 9c6ddb2 → 75bca33
/rerun-failed-ci
Force-pushed: 055d29f → 781acbd
/rerun-failed-ci

(6 similar comments)
Can we add PDL support for this kernel? I'm not sure if this will bring a performance improvement.
/// NOTE: 132 = 128 + 4
constexpr int64_t kPageBytes = 132 << kPageBits;
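For intuition, the constant works out as follows; a minimal Python sketch, assuming page_size = 64 as in the fast-path check quoted later in this thread (the names and the per-row packing are ours, not the kernel's):

```python
# Sketch of the page arithmetic behind kPageBytes. Only 132 = 128 + 4 and
# page_size = 64 come from the PR; everything else is an assumption for
# illustration.
K_PAGE_BITS = 6                        # page_size = 64 rows per page
ROW_BYTES = 128 + 4                    # 128 FP8 bytes + one 4-byte fp32 scale
PAGE_BYTES = ROW_BYTES << K_PAGE_BITS  # mirrors kPageBytes = 132 << kPageBits

def page_base(loc: int) -> int:
    """Byte offset of the page containing cache slot `loc`."""
    return (loc >> K_PAGE_BITS) * PAGE_BYTES
```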
// each warp handles 128 elements, 1 warp, each block handles multiple rows

Suggested change:
- // each warp handles 128 elements, 1 warp, each block handles multiple rows
+ // each warp handles 128 elements, each block handles multiple rows
# Fast path: JIT fused store (CUDA, page_size=64, non-fnuz)
if can_use_nsa_fused_store() and _is_cuda and (not _is_fp8_fnuz):
    if forward_batch.token_to_kv_pool.page_size == 64:

Can we merge these two if statements into one?
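One possible merged form, as a sketch keeping exactly the quoted conditions (not the merged code from the PR):

```python
# Hypothetical merged form of the two quoted if statements.
if (
    can_use_nsa_fused_store()
    and _is_cuda
    and not _is_fp8_fnuz
    and forward_batch.token_to_kv_pool.page_size == 64
):
    ...
```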
    layer_id=layer_id
)
fused_store_index_k_cache(key, buf, forward_batch.out_cache_loc)
else:

The can_use_nsa_fused_store function has a fallback now, so why do we need another fallback here?
can_use_nsa_fused_store itself doesn't have a fallback.
There are two branches in forward_cuda, and each needs its own fallback (see the sketch below):

- fast path (seqlen < 2048): skips top-k computation and calls _forward_cuda_k_only --> fused_store_index_k_cache
- normal path: performs top-k computation and calls _store_index_k_cache()

Addressed and refactored the code.
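A minimal sketch of that branch structure; only _forward_cuda_k_only, fused_store_index_k_cache, _store_index_k_cache, can_use_nsa_fused_store, out_cache_loc, and page_size come from this thread, the rest is hypothetical:

```python
# Hypothetical sketch of the two forward_cuda branches; each wraps the store
# with its own fallback to the unfused _store_index_k_cache helper.
def forward_cuda(self, key, forward_batch, layer_id):
    if self._is_fast_path(forward_batch):  # seqlen < 2048: top-k is skipped
        return self._forward_cuda_k_only(key, forward_batch, layer_id)
    # Normal path: compute top-k, then store via the unfused helper.
    topk_result = self._compute_topk(key, forward_batch)
    self._store_index_k_cache(key, forward_batch, layer_id)
    return topk_result

def _forward_cuda_k_only(self, key, forward_batch, layer_id):
    # Fast path: try the fused JIT store, with this branch's own fallback.
    if can_use_nsa_fused_store() and forward_batch.token_to_kv_pool.page_size == 64:
        buf = forward_batch.token_to_kv_pool.get_index_k_with_scale_buffer(
            layer_id=layer_id
        )
        fused_store_index_k_cache(key, buf, forward_batch.out_cache_loc)
    else:
        self._store_index_k_cache(key, forward_batch, layer_id)
```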
Force-pushed: f63bef3 → c4d59ad
@yuan-luo Can you please test the result of gpqa and aime25 as shown here: https://docs.sglang.io/basic_usage/deepseek_v32.html#accuracy-test-with-gpqa-diamond
Also, can you please test on some extreme workloads (e.g. 128k input), to make sure it doesn't crash due to any IMA-like (illegal memory access) errors (although with the int64 out_cache_loc this shouldn't happen).
Can we add a test for this JIT kernel?
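One shape such a test could take: compare the fused store against a two-step (quantize, then store) reference on random inputs. A sketch; only fused_store_index_k_cache is a name from the PR, and the reference function plus the flat 132-bytes-per-slot buffer layout are assumptions:

```python
# Hypothetical correctness test: the fused kernel should produce exactly the
# same buffer contents as an unfused quantize-then-store reference path.
import torch

def test_fused_store_index_k_cache():
    torch.manual_seed(0)
    num_tokens, head_dim, num_slots = 256, 128, 4096
    key = torch.randn(num_tokens, head_dim, dtype=torch.bfloat16, device="cuda")
    loc = torch.randperm(num_slots, device="cuda")[:num_tokens].to(torch.int64)

    # In the real test the buffers would come from the layer's KV pool.
    buf_fused = torch.zeros(num_slots * 132, dtype=torch.uint8, device="cuda")
    buf_ref = buf_fused.clone()

    fused_store_index_k_cache(key, buf_fused, loc)  # kernel under test
    reference_quant_then_store(key, buf_ref, loc)   # assumed reference path

    torch.testing.assert_close(buf_fused, buf_ref, rtol=0, atol=0)
```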
gpqa result: [image]

AIME25 result: WIP.
Force-pushed: c4d59ad → 350c76d
Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>
Co-authored-by: DarkSharpness <76582120+darksharpness@users.noreply.github.com>
Force-pushed: 350c76d → 8f6a1f3
Motivation
In DeepSeek V3.2, after the Indexer produces the key in bf16 (roughly shape (N, 128)), it needs to populate NSA's index_k_with_scale_buffer. The previous implementation did this in two steps: first quantize the bf16 key to FP8 (producing a per-row scale), then store the quantized key and its scale into the buffer.
This path requires at least two kernel launches (quant + store). Under CUDA Graph / multi-stream execution, launch and sync overhead becomes more noticeable.
This PR introduces a JIT-compiled CUDA kernel that fuses quantization and store into a single launch.

Inside the kernel, each warp handles one 128-element row: it loads the bf16 key, computes the row's FP8 scale, quantizes the 128 values, and writes the quantized bytes plus the 4-byte scale (132 bytes per row) into the paged index_k_with_scale_buffer at the slot given by out_cache_loc. A rough reference of the per-row math is sketched below.
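A plain PyTorch reference of the per-row computation; a sketch assuming amax-based scaling to float8_e4m3fn and a [128 FP8 bytes | 4-byte fp32 scale] row packing (the actual kernel's rounding, clamping, and intra-page layout may differ):

```python
# Reference for one row of the fused kernel's work, under the assumptions
# stated above. Only 132 = 128 + 4 is taken from the PR itself.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

def quant_store_row(key_row_bf16: torch.Tensor) -> torch.Tensor:
    """key_row_bf16: (128,) bf16 -> (132,) uint8 packed row."""
    x = key_row_bf16.float()
    scale = x.abs().max().clamp(min=1e-12) / FP8_MAX  # per-row scale
    q = (x / scale).to(torch.float8_e4m3fn)           # 128 FP8 values
    packed = torch.empty(132, dtype=torch.uint8)
    packed[:128] = q.view(torch.uint8)                # raw FP8 bytes
    packed[128:] = scale.reshape(1).view(torch.uint8) # fp32 scale bytes
    return packed
```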
Inspired by @DarkSharpness
Before PR:

After PR:

Performance improved slightly. Will do more testing.
No accuracy drop on gsm8k.
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci