[Bug] Qwen3.5 does not work with pipeline parallelism #19500

@froststeam

Description

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

I tried running the Qwen3.5 model with pipeline parallel size = 2, but it failed. The log is as follows:

root@mccxadmin:/data/059/sglang# CUDA_VISIBLE_DEVICES=6,7 python -m sglang.launch_server --model /data/Qwen3.5-35B-A3B/ --pp-size 2 
[2026-02-27 19:23:48] WARNING server_args.py:1777: Disabling overlap schedule since mamba no_buffer is not compatible with overlap schedule, try to use --disable-radix-cache if overlap schedule is necessary
[2026-02-27 19:23:48] INFO server_args.py:1864: Attention backend not specified. Use fa3 backend by default.
[2026-02-27 19:23:48] WARNING server_args.py:2312: Pipeline parallelism is incompatible with overlap schedule.
[2026-02-27 19:23:49] server_args=ServerArgs(model_path='/data/Qwen3.5-35B-A3B/', tokenizer_path='/data/Qwen3.5-35B-A3B/', tokenizer_mode='auto', tokenizer_worker_num=1, skip_tokenizer_init=False, load_format='auto', model_loader_extra_config='{}', trust_remote_code=False, context_length=None, is_embedding=False, enable_multimodal=None, revision=None, model_impl='auto', host='127.0.0.1', port=30000, fastapi_root_path='', grpc_mode=False, skip_server_warmup=False, warmups=None, nccl_port=None, checkpoint_engine_wait_weights_before_ready=False, dtype='auto', quantization=None, quantization_param_path=None, kv_cache_dtype='auto', enable_fp32_lm_head=False, modelopt_quant=None, modelopt_checkpoint_restore_path=None, modelopt_checkpoint_save_path=None, modelopt_export_path=None, quantize_and_serve=False, rl_quant_profile=None, mem_fraction_static=0.83783765625, max_running_requests=None, max_queued_requests=None, max_total_tokens=None, chunked_prefill_size=8192, enable_dynamic_chunking=False, max_prefill_tokens=16384, prefill_max_requests=None, schedule_policy='fcfs', enable_priority_scheduling=False, abort_on_priority_when_disabled=False, schedule_low_priority_values_first=False, priority_scheduling_preemption_threshold=10, schedule_conservativeness=1.0, page_size=1, swa_full_tokens_ratio=0.8, disable_hybrid_swa_memory=False, radix_eviction_policy='lru', enable_prefill_delayer=False, prefill_delayer_max_delay_passes=30, prefill_delayer_token_usage_low_watermark=None, prefill_delayer_forward_passes_buckets=None, prefill_delayer_wait_seconds_buckets=None, device='cuda', tp_size=1, pp_size=2, pp_max_micro_batch_size=None, pp_async_batch_depth=0, stream_interval=1, stream_output=False, random_seed=185116399, constrained_json_whitespace_pattern=None, constrained_json_disable_any_whitespace=False, watchdog_timeout=300, soft_watchdog_timeout=None, dist_timeout=None, download_dir=None, model_checksum=None, base_gpu_id=0, gpu_id_step=1, sleep_on_idle=False, 
custom_sigquit_handler=None, log_level='info', log_level_http=None, log_requests=False, log_requests_level=2, log_requests_format='text', log_requests_target=None, uvicorn_access_log_exclude_prefixes=[], crash_dump_folder=None, show_time_cost=False, enable_metrics=False, enable_metrics_for_all_schedulers=False, tokenizer_metrics_custom_labels_header='x-custom-labels', tokenizer_metrics_allowed_custom_labels=None, extra_metric_labels=None, bucket_time_to_first_token=None, bucket_inter_token_latency=None, bucket_e2e_request_latency=None, collect_tokens_histogram=False, prompt_tokens_buckets=None, generation_tokens_buckets=None, gc_warning_threshold_secs=0.0, decode_log_interval=40, enable_request_time_stats_logging=False, kv_events_config=None, enable_trace=False, otlp_traces_endpoint='localhost:4317', export_metrics_to_file=False, export_metrics_to_file_dir=None, api_key=None, admin_api_key=None, served_model_name='/data/Qwen3.5-35B-A3B/', weight_version='default', chat_template=None, hf_chat_template_name=None, completion_template=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, tool_call_parser=None, tool_server=None, sampling_defaults='model', dp_size=1, load_balance_method='round_robin', attn_cp_size=1, moe_dp_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', preferred_sampling_params=None, enable_lora=None, enable_lora_overlap_loading=None, max_lora_rank=None, lora_target_modules=None, lora_paths=None, max_loaded_loras=None, max_loras_per_batch=8, lora_eviction_policy='lru', lora_backend='csgmv', max_lora_chunk_size=16, attention_backend='fa3', decode_attention_backend=None, prefill_attention_backend=None, sampling_backend='flashinfer', grammar_backend='xgrammar', mm_attention_backend=None, fp8_gemm_runner_backend='auto', fp4_gemm_runner_backend='flashinfer_cutlass', nsa_prefill_backend=None, nsa_decode_backend=None, disable_flashinfer_autotune=False, mamba_backend='triton', 
speculative_algorithm=None, speculative_draft_model_path=None, speculative_draft_model_revision=None, speculative_draft_load_format=None, speculative_num_steps=None, speculative_eagle_topk=None, speculative_num_draft_tokens=None, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, speculative_attention_mode='prefill', speculative_draft_attention_backend=None, speculative_moe_runner_backend='auto', speculative_moe_a2a_backend=None, speculative_draft_model_quantization=None, speculative_ngram_min_match_window_size=1, speculative_ngram_max_match_window_size=12, speculative_ngram_min_bfs_breadth=1, speculative_ngram_max_bfs_breadth=10, speculative_ngram_match_type='BFS', speculative_ngram_branch_length=18, speculative_ngram_capacity=10000000, enable_multi_layer_eagle=False, ep_size=1, moe_a2a_backend='none', moe_runner_backend='auto', flashinfer_mxfp4_moe_precision='default', enable_flashinfer_allreduce_fusion=False, enable_aiter_allreduce_fusion=False, deepep_mode='auto', ep_num_redundant_experts=0, ep_dispatch_algorithm=None, init_expert_location='trivial', enable_eplb=False, eplb_algorithm='auto', eplb_rebalance_num_iterations=1000, eplb_rebalance_layers_per_chunk=None, eplb_min_rebalancing_utilization_threshold=1.0, expert_distribution_recorder_mode=None, expert_distribution_recorder_buffer_size=1000, enable_expert_distribution_metrics=False, deepep_config=None, moe_dense_tp_size=None, elastic_ep_backend=None, enable_elastic_expert_backup=False, mooncake_ib_device=None, max_mamba_cache_size=None, mamba_ssm_dtype=None, mamba_full_memory_ratio=0.9, mamba_scheduler_strategy='no_buffer', mamba_track_interval=256, linear_attn_backend='triton', linear_attn_decode_backend=None, linear_attn_prefill_backend=None, enable_hierarchical_cache=False, hicache_ratio=2.0, hicache_size=0, hicache_write_policy='write_through', hicache_io_backend='kernel', hicache_mem_layout='layer_first', disable_hicache_numa_detect=False, 
hicache_storage_backend=None, hicache_storage_prefetch_policy='best_effort', hicache_storage_backend_extra_config=None, hierarchical_sparse_attention_extra_config=None, enable_lmcache=False, kt_weight_path=None, kt_method='AMXINT4', kt_cpuinfer=None, kt_threadpool_count=2, kt_num_gpu_experts=None, kt_max_deferred_experts_per_token=None, dllm_algorithm=None, dllm_algorithm_config=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, cpu_offload_gb=0, offload_group_size=-1, offload_num_in_group=1, offload_prefetch_step=1, offload_mode='cpu', multi_item_scoring_delimiter=None, disable_radix_cache=False, cuda_graph_max_bs=256, cuda_graph_bs=[1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256], disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_profile_cuda_graph=False, enable_cudagraph_gc=False, enable_layerwise_nvtx_marker=False, enable_nccl_nvls=False, enable_symm_mem=False, disable_flashinfer_cutlass_moe_fp4_allgather=False, enable_tokenizer_batch_encode=False, disable_tokenizer_batch_decode=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, enable_mscclpp=False, enable_torch_symm_mem=False, disable_overlap_schedule=True, enable_mixed_chunk=False, enable_dp_attention=False, enable_dp_lm_head=False, enable_two_batch_overlap=False, enable_single_batch_overlap=False, tbo_token_distribution_threshold=0.48, enable_torch_compile=False, enable_piecewise_cuda_graph=False, enable_torch_compile_debug_mode=False, torch_compile_max_bs=32, piecewise_cuda_graph_max_tokens=8192, piecewise_cuda_graph_tokens=[4, 8, 12, 16, 20, 24, 28, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 832, 896, 960, 1024, 1280, 1536, 1792, 2048, 2304, 2560, 
2816, 3072, 3328, 3584, 3840, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680, 8192], piecewise_cuda_graph_compiler='eager', torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, triton_attention_split_tile_size=None, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, enable_weights_cpu_backup=False, enable_draft_weights_cpu_backup=False, allow_auto_truncate=False, enable_custom_logit_processor=False, flashinfer_mla_disable_ragged=False, disable_shared_experts_fusion=False, disable_chunked_prefix_cache=False, disable_fast_image_processor=False, keep_mm_feature_on_device=False, enable_return_hidden_states=False, enable_return_routed_experts=False, scheduler_recv_interval=1, numa_node=None, enable_deterministic_inference=False, rl_on_policy_target=None, enable_attn_tp_input_scattered=False, enable_nsa_prefill_context_parallel=False, nsa_prefill_cp_mode='round-robin-split', enable_fused_qk_norm_rope=False, enable_precise_embedding_interpolation=False, enable_dynamic_batch_tokenizer=False, dynamic_batch_tokenizer_batch_size=32, dynamic_batch_tokenizer_batch_timeout=0.002, debug_tensor_dump_output_folder=None, debug_tensor_dump_layers=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False, disaggregation_mode='null', disaggregation_transfer_backend='mooncake', disaggregation_bootstrap_port=8998, disaggregation_decode_tp=None, disaggregation_decode_dp=None, disaggregation_prefill_pp=1, disaggregation_ib_device=None, disaggregation_decode_enable_offload_kvcache=False, num_reserved_decode_tokens=512, disaggregation_decode_polling_interval=1, encoder_only=False, language_only=False, encoder_transfer_backend='zmq_to_scheduler', encoder_urls=[], custom_weight_loader=[], weight_loader_disable_mmap=False, remote_instance_weight_loader_seed_instance_ip=None, remote_instance_weight_loader_seed_instance_service_port=None, 
remote_instance_weight_loader_send_weights_group_ports=None, remote_instance_weight_loader_backend='nccl', remote_instance_weight_loader_start_seed_via_transfer_engine=False, enable_pdmux=False, pdmux_config_path=None, sm_group_num=8, mm_max_concurrent_calls=32, mm_per_request_timeout=10.0, enable_broadcast_mm_inputs_process=False, enable_prefix_mm_cache=False, mm_enable_dp_encoder=False, mm_process_config={}, limit_mm_data_per_request=None, enable_mm_global_cache=False, decrypted_config_file=None, decrypted_draft_config_file=None, forward_hooks=None)
[2026-02-27 19:23:49] Ignore import error when loading sglang.srt.multimodal.processors.glmasr: cannot import name 'GlmAsrConfig' from 'transformers' (/usr/local/lib/python3.12/dist-packages/transformers/__init__.py)
[2026-02-27 19:23:51] Using default HuggingFace chat template with detected content format: openai
[2026-02-27 19:23:57 PP0] Mamba selective_state_update backend initialized: triton
[2026-02-27 19:23:57 PP0] Init torch distributed begin.
[2026-02-27 19:23:57 PP1] Mamba selective_state_update backend initialized: triton
[2026-02-27 19:23:57 PP1] Init torch distributed begin.
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[2026-02-27 19:23:57 PP0] sglang is using nccl==2.27.5
[2026-02-27 19:23:58 PP0] Init torch distributed ends. elapsed=1.32 s, mem usage=0.97 GB
[2026-02-27 19:23:58 PP1] Init torch distributed ends. elapsed=1.18 s, mem usage=0.97 GB
[2026-02-27 19:23:58 PP0] Ignore import error when loading sglang.srt.models.glm_ocr: No module named 'transformers.models.glm_ocr'
[2026-02-27 19:23:58 PP0] Ignore import error when loading sglang.srt.models.glm_ocr_nextn: No module named 'transformers.models.glm_ocr'
[2026-02-27 19:23:58 PP0] Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers' (/usr/local/lib/python3.12/dist-packages/transformers/__init__.py)
[2026-02-27 19:23:58 PP1] Ignore import error when loading sglang.srt.models.glm_ocr: No module named 'transformers.models.glm_ocr'
[2026-02-27 19:23:58 PP1] Ignore import error when loading sglang.srt.models.glm_ocr_nextn: No module named 'transformers.models.glm_ocr'
[2026-02-27 19:23:58 PP1] Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers' (/usr/local/lib/python3.12/dist-packages/transformers/__init__.py)
[2026-02-27 19:23:59 PP0] Load weight begin. avail mem=138.29 GB
[2026-02-27 19:23:59 PP1] Load weight begin. avail mem=138.26 GB
[2026-02-27 19:23:59 PP0] Multimodal attention backend not set. Use fa3.
[2026-02-27 19:23:59 PP0] Using fa3 as multimodal attention backend.
[2026-02-27 19:23:59 PP1] Multimodal attention backend not set. Use fa3.
[2026-02-27 19:23:59 PP1] Using fa3 as multimodal attention backend.
`torch_dtype` is deprecated! Use `dtype` instead!
`torch_dtype` is deprecated! Use `dtype` instead!
[2026-02-27 19:23:59 PP0] using attn output gate!
[2026-02-27 19:23:59 PP1] using attn output gate!
Loading safetensors checkpoint shards:   0% Completed | 0/14 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   7% Completed | 1/14 [00:01<00:19,  1.51s/it]
Loading safetensors checkpoint shards:  14% Completed | 2/14 [00:03<00:19,  1.63s/it]
Loading safetensors checkpoint shards:  21% Completed | 3/14 [00:04<00:18,  1.69s/it]
Loading safetensors checkpoint shards:  29% Completed | 4/14 [00:06<00:16,  1.69s/it]
Loading safetensors checkpoint shards:  36% Completed | 5/14 [00:08<00:15,  1.72s/it]
Loading safetensors checkpoint shards:  43% Completed | 6/14 [00:10<00:14,  1.76s/it]
[2026-02-27 19:24:09 PP0] Parameter model.norm.weight not found in params_dict
[2026-02-27 19:24:10 PP1] Parameter visual.pos_embed.weight not found in params_dict
Loading safetensors checkpoint shards:  50% Completed | 7/14 [00:10<00:09,  1.35s/it]
Loading safetensors checkpoint shards:  57% Completed | 8/14 [00:12<00:08,  1.39s/it]
[2026-02-27 19:24:11 PP0] Parameter lm_head.weight not found in params_dict
[2026-02-27 19:24:12 PP1] Parameter model.embed_tokens.weight not found in params_dict
Loading safetensors checkpoint shards:  64% Completed | 9/14 [00:13<00:06,  1.38s/it]
Loading safetensors checkpoint shards:  71% Completed | 10/14 [00:15<00:05,  1.43s/it]
Loading safetensors checkpoint shards:  79% Completed | 11/14 [00:16<00:04,  1.53s/it]
Loading safetensors checkpoint shards:  86% Completed | 12/14 [00:18<00:03,  1.60s/it]
Loading safetensors checkpoint shards:  93% Completed | 13/14 [00:20<00:01,  1.65s/it]
Loading safetensors checkpoint shards: 100% Completed | 14/14 [00:21<00:00,  1.58s/it]
Loading safetensors checkpoint shards: 100% Completed | 14/14 [00:21<00:00,  1.56s/it]

[2026-02-27 19:24:21 PP1] Load weight end. elapsed=22.18 s, type=Qwen3_5MoeForConditionalGeneration, dtype=torch.bfloat16, avail mem=73.72 GB, mem usage=64.54 GB.
[2026-02-27 19:24:21 PP0] Load weight end. elapsed=22.21 s, type=Qwen3_5MoeForConditionalGeneration, dtype=torch.bfloat16, avail mem=73.75 GB, mem usage=64.54 GB.
[2026-02-27 19:24:21 PP1] Using KV cache dtype: torch.bfloat16
[2026-02-27 19:24:21 PP0] Using KV cache dtype: torch.bfloat16
[2026-02-27 19:24:21 PP1] Mamba Cache is allocated. max_mamba_cache_size: 404, conv_state size: 0.56GB, ssm_state size: 23.73GB 
[2026-02-27 19:24:21 PP0] Mamba Cache is allocated. max_mamba_cache_size: 404, conv_state size: 0.56GB, ssm_state size: 23.73GB 
[2026-02-27 19:24:21 PP1] KV Cache is allocated. #tokens: 1417058, K size: 13.51 GB, V size: 13.51 GB
[2026-02-27 19:24:21 PP0] KV Cache is allocated. #tokens: 1417058, K size: 13.51 GB, V size: 13.51 GB
[2026-02-27 19:24:21 PP0] Memory pool end. avail mem=22.25 GB
[2026-02-27 19:24:21 PP1] Memory pool end. avail mem=22.22 GB
[2026-02-27 19:24:21 PP0] Linear attention kernel backend: decode=triton, prefill=triton
[2026-02-27 19:24:21 PP0] Using hybrid linear attention backend for hybrid GDN models.
[2026-02-27 19:24:21 PP0] GDN kernel dispatcher: decode=TritonGDNKernel, extend=TritonGDNKernel, verify=TritonGDNKernel
[2026-02-27 19:24:21 PP0] Capture cuda graph begin. This can take up to several minutes. avail mem=22.15 GB
[2026-02-27 19:24:21 PP0] Capture cuda graph bs [1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 134]
[2026-02-27 19:24:21 PP1] Linear attention kernel backend: decode=triton, prefill=triton
[2026-02-27 19:24:21 PP1] Using hybrid linear attention backend for hybrid GDN models.
[2026-02-27 19:24:21 PP1] GDN kernel dispatcher: decode=TritonGDNKernel, extend=TritonGDNKernel, verify=TritonGDNKernel
[2026-02-27 19:24:21 PP1] Capture cuda graph begin. This can take up to several minutes. avail mem=22.12 GB
[2026-02-27 19:24:21 PP1] Capture cuda graph bs [1, 2, 4, 8, 12, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 134]
Capturing batches (bs=134 avail_mem=21.86 GB):   0%|                                                                                                                                    | 0/21 [00:00<?, ?it/s]
[2026-02-27 19:24:22 PP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/data/059/sglang/python/sglang/srt/managers/scheduler.py", line 3118, in run_scheduler_process
    scheduler = Scheduler(
                ^^^^^^^^^^
  File "/data/059/sglang/python/sglang/srt/managers/scheduler.py", line 363, in __init__
    self.init_model_worker()
  File "/data/059/sglang/python/sglang/srt/managers/scheduler.py", line 559, in init_model_worker
    self.init_tp_model_worker()
  File "/data/059/sglang/python/sglang/srt/managers/scheduler.py", line 517, in init_tp_model_worker
    self.tp_worker = TpModelWorker(
                     ^^^^^^^^^^^^^^
  File "/data/059/sglang/python/sglang/srt/managers/tp_worker.py", line 247, in __init__
    self._init_model_runner()
  File "/data/059/sglang/python/sglang/srt/managers/tp_worker.py", line 330, in _init_model_runner
    self._model_runner = ModelRunner(
                         ^^^^^^^^^^^^
  File "/data/059/sglang/python/sglang/srt/model_executor/model_runner.py", line 416, in __init__
    self.initialize(min_per_gpu_memory)
  File "/data/059/sglang/python/sglang/srt/model_executor/model_runner.py", line 622, in initialize
    self.init_device_graphs()
  File "/data/059/sglang/python/sglang/srt/model_executor/model_runner.py", line 2171, in init_device_graphs
    self.graph_runner = graph_runners[self.device](self)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/059/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 567, in __init__
    self.capture()
  File "/data/059/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 723, in capture
    _capture_one_stream()
  File "/data/059/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 710, in _capture_one_stream
    ) = self.capture_one_batch_size(bs, forward, stream_idx)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/059/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 929, in capture_one_batch_size
    run_once()
  File "/data/059/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 916, in run_once
    logits_output_or_pp_proxy_tensors = forward(
                                        ^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/059/sglang/python/sglang/srt/models/qwen3_vl.py", line 1270, in forward
    hidden_states = general_mm_embed_routine(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/059/sglang/python/sglang/srt/managers/mm_utils.py", line 1074, in general_mm_embed_routine
    embed_tokens = language_model.get_input_embeddings()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/059/sglang/python/sglang/srt/models/qwen3_5.py", line 711, in get_input_embeddings
    return self.embed_tokens
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
    raise AttributeError(
AttributeError: 'Qwen3_5MoeForCausalLM' object has no attribute 'embed_tokens'

[2026-02-27 19:24:22] Received sigquit from a child process. It usually means the child failed.
Killed
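The traceback shows that the multimodal embedding routine calls `get_input_embeddings()` on the language model of PP rank 1, which does not own the input embedding layer (the earlier "Parameter model.embed_tokens.weight not found in params_dict" warning on PP1 points the same way). The failure mode can be reproduced in isolation with a minimal, torch-free sketch; the class and flag names below are hypothetical stand-ins, not the actual sglang implementation:

```python
# Minimal sketch (hypothetical names) of the failure mode: under pipeline
# parallelism, a non-first stage never assigns embed_tokens, so the
# unguarded attribute access in get_input_embeddings() raises
# AttributeError, just like the traceback above.
class LanguageModelStage:
    def __init__(self, is_first_pp_rank: bool):
        if is_first_pp_rank:
            # Only the first pipeline stage owns the input embedding.
            self.embed_tokens = "embedding weights (stand-in)"
        # On later stages, embed_tokens is intentionally absent.

    def get_input_embeddings(self):
        # Unguarded access: fails on PP rank > 0.
        return self.embed_tokens


stage0 = LanguageModelStage(is_first_pp_rank=True)
assert stage0.get_input_embeddings() is not None  # first stage works

stage1 = LanguageModelStage(is_first_pp_rank=False)
try:
    stage1.get_input_embeddings()
    raised = False
except AttributeError:
    raised = True
print(raised)  # True: mirrors the AttributeError in the traceback
```

A likely fix direction (speculation, not confirmed against the code) is for the multimodal embed routine to only fetch input embeddings on the first pipeline stage, or for `get_input_embeddings()` to handle stages where the layer is absent.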

Reproduction

python -m sglang.launch_server --model /data/Qwen3.5-35B-A3B/ --pp-size 2

Environment

python3 -m sglang.check_env
Python: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H200
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 580.95.05
PyTorch: 2.9.1+cu129
sglang: 0.5.9
sgl_kernel: 0.3.21
flashinfer_python: 0.6.3
flashinfer_cubin: 0.6.3
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.3.5
aiohttp: 3.13.2
fastapi: 0.123.5
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.32.0
orjson: 3.11.4
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.3
pydantic: 2.12.5
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.38.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.75.0
litellm: Module Not Found
decord2: 2.0.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    NIC8    NIC9    NIC10   CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV18    NV18    NV18    NV18    NV18    NV18    NV18    PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     NODE    0-47,96-143     0               N/A
GPU1    NV18     X      NV18    NV18    NV18    NV18    NV18    NV18    NODE    PIX     NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     NODE    0-47,96-143     0               N/A
GPU2    NV18    NV18     X      NV18    NV18    NV18    NV18    NV18    NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     SYS     SYS     NODE    0-47,96-143     0               N/A
GPU3    NV18    NV18    NV18     X      NV18    NV18    NV18    NV18    NODE    NODE    NODE    PIX     SYS     SYS     SYS     SYS     SYS     SYS     NODE    0-47,96-143     0               N/A
GPU4    NV18    NV18    NV18    NV18     X      NV18    NV18    NV18    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    NODE    NODE    SYS     48-95,144-191   1               N/A
GPU5    NV18    NV18    NV18    NV18    NV18     X      NV18    NV18    SYS     SYS     SYS     SYS     NODE    PIX     NODE    NODE    NODE    NODE    SYS     48-95,144-191   1               N/A
GPU6    NV18    NV18    NV18    NV18    NV18    NV18     X      NV18    SYS     SYS     SYS     SYS     NODE    NODE    PIX     NODE    NODE    NODE    SYS     48-95,144-191   1               N/A
GPU7    NV18    NV18    NV18    NV18    NV18    NV18    NV18     X      SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    NODE    PIX     SYS     48-95,144-191   1               N/A
NIC0    PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS      X      NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     NODE                            
NIC1    NODE    PIX     NODE    NODE    SYS     SYS     SYS     SYS     NODE     X      NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     NODE                            
NIC2    NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     NODE    NODE     X      NODE    SYS     SYS     SYS     SYS     SYS     SYS     NODE                            
NIC3    NODE    NODE    NODE    PIX     SYS     SYS     SYS     SYS     NODE    NODE    NODE     X      SYS     SYS     SYS     SYS     SYS     SYS     NODE                            
NIC4    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    SYS     SYS     SYS     SYS      X      NODE    NODE    NODE    NODE    NODE    SYS                             
NIC5    SYS     SYS     SYS     SYS     NODE    PIX     NODE    NODE    SYS     SYS     SYS     SYS     NODE     X      NODE    NODE    NODE    NODE    SYS                             
NIC6    SYS     SYS     SYS     SYS     NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS     NODE    NODE     X      NODE    NODE    NODE    SYS                             
NIC7    SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     NODE    NODE    NODE     X      PIX     NODE    SYS                             
NIC8    SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     NODE    NODE    NODE    PIX      X      NODE    SYS                             
NIC9    SYS     SYS     SYS     SYS     NODE    NODE    NODE    PIX     SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    NODE     X      SYS                             
NIC10   NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS      X                              

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_5
  NIC4: mlx5_6
  NIC5: mlx5_7
  NIC6: mlx5_8
  NIC7: mlx5_9
  NIC8: mlx5_10
  NIC9: mlx5_11
  NIC10: mlx5_bond_0


ulimit soft: 1048576
