
TypeError in Llama-4-Maverick-17B-128E-Instruct-FP8 Resolved with Workaround #38283

@pchu2025


Issue Description

Loading meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 with transformers==4.53.0.dev0 or 4.51.0 initially failed with TypeError: argument of type 'NoneType' is not iterable in transformers.modeling_utils.post_init. This occurred with Llama4ForConditionalGeneration and AutoModelForCausalLM on a 4x NVIDIA RTX A6000 setup (~196GB VRAM, CUDA 12.4, Python 3.12.3, Ubuntu 24.04.2). A similar TypeError was seen previously with 2x GPUs. The issue was resolved using transformers==4.51.0 and config={"parallel_style": "none"}.
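The failure pattern behind this TypeError can be sketched minimally (this is an assumption about the cause, not traced code: a membership test like `key in plan` inside `post_init`, where a parallelization plan was never populated and is still None):

```python
# Minimal sketch of the suspected failure pattern (assumption: post_init
# runs a membership test against a plan that is still None, e.g. because
# config.json defines no parallel/tp_plan entries).
def lookup_style(plan, key):
    if key in plan:  # raises TypeError when plan is None
        return plan[key]
    return "none"

try:
    lookup_style(None, "parallel_style")
except TypeError as exc:
    print(exc)  # argument of type 'NoneType' is not iterable

# Defensive variant that tolerates a missing plan:
def lookup_style_safe(plan, key):
    return (plan or {}).get(key, "none")

print(lookup_style_safe(None, "parallel_style"))  # none
```

This would also be consistent with the workaround: supplying `{"parallel_style": "none"}` explicitly sidesteps the None lookup.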

Steps to Reproduce

  1. Install dependencies:

    pip install torch==2.4.1 torchvision==0.19.1 accelerate==1.7.0 compressed-tensors==0.9.4 transformers==4.51.0
    
  2. Confirm model files (~389GB, 84 .safetensors) at /mnt/data/ai_super_palace/models/llama4/.

  3. Run (failed):

     import torch
     from transformers import Llama4ForConditionalGeneration

     model = Llama4ForConditionalGeneration.from_pretrained(
         '/mnt/data/ai_super_palace/models/llama4',
         torch_dtype=torch.float16,
         device_map="auto",
         low_cpu_mem_usage=True,
     )
     print('Model OK')

  4. Error:
    TypeError: argument of type 'NoneType' is not iterable

  5. Workaround (succeeded):

     import os
     import torch

     os.environ["TORCHVISION_DISABLE_NMS"] = "1"
     from transformers import AutoModelForCausalLM

     model = AutoModelForCausalLM.from_pretrained(
         '/mnt/data/ai_super_palace/models/llama4',
         torch_dtype=torch.float16,
         device_map="auto",
         low_cpu_mem_usage=True,
         offload_folder="/mnt/data/ai_super_palace/models/llama4/offload",
         config={"parallel_style": "none"}
     )
     print('Model OK')
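The file check in step 2 can be scripted; a small sketch (the directory path is the one from this report, and 84 shards / ~389 GB is the expected result, not something this helper assumes):

```python
# Sanity-check the local checkpoint before loading: count *.safetensors
# shards and their total size in GiB.
from pathlib import Path

def shard_summary(model_dir):
    """Return (shard_count, total_size_gib) for *.safetensors in model_dir."""
    shards = sorted(Path(model_dir).glob("*.safetensors"))
    total = sum(p.stat().st_size for p in shards)
    return len(shards), total / 1024**3

# Example against the path from this report:
# count, gib = shard_summary("/mnt/data/ai_super_palace/models/llama4")
# Expected per the report: count == 84, gib ~= 389
```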

Environment
Transformers: 4.51.0 (or 4.53.0.dev0, commit b369a65)
Python: 3.12.3
PyTorch: 2.4.1
CUDA: 12.4
Accelerate: 1.7.0
Compressed-tensors: 0.9.4
OS: Ubuntu 24.04.2 LTS
Hardware: 4x NVIDIA RTX A6000 (~196GB VRAM)
Model: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
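Given roughly 48 GB per A6000 against a ~389 GB checkpoint, the offloading warnings are expected. One way to make the split explicit is a `max_memory` map passed to `from_pretrained` alongside `device_map="auto"` (the GiB values below are illustrative assumptions, not figures from this report):

```python
# Illustrative sketch only: explicit per-device memory caps for
# device_map="auto" loading. The GiB values are assumptions, not measured.
max_memory = {i: "44GiB" for i in range(4)}  # leave headroom on each A6000
max_memory["cpu"] = "200GiB"                 # spill the remainder to CPU RAM

# model = AutoModelForCausalLM.from_pretrained(
#     "/mnt/data/ai_super_palace/models/llama4",
#     device_map="auto",
#     max_memory=max_memory,
#     offload_folder="/mnt/data/ai_super_palace/models/llama4/offload",
# )
print(max_memory)
```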
Additional Details
Model card requires transformers>=4.51.0, supports FP8 via compressed-tensors.
Previous errors: NameError: init_empty_weights, ValueError: requires accelerate, ImportError: requires compressed-tensors.
Config check (cat config.json | grep -E "parallel|tp_plan") empty.
Warnings: Uninitialized weights (feed_forward.experts.*) and offloaded parameters (expected due to VRAM).
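The grep check above can be mirrored in Python to confirm which (if any) parallel-related keys the config carries; `parallel_keys` below is a hypothetical helper, not part of transformers:

```python
# Python equivalent of: cat config.json | grep -E "parallel|tp_plan"
import json

def parallel_keys(config_path):
    """Return top-level config.json keys mentioning 'parallel' or 'tp_plan'."""
    with open(config_path) as f:
        cfg = json.load(f)
    return [k for k in cfg if "parallel" in k or "tp_plan" in k]

# An empty result matches this report: no parallel/tp_plan entries present,
# which would leave any internal plan lookup with nothing to populate.
```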
Request
Is the TypeError a known llama4 bug? Should parallel_style be explicitly set in config.json?
Guidance on fine-tuning for sentiment analysis (~100–150GB/day, ~85–90% accuracy).
Sharing workaround for community benefit.
Logs
See traceback above. config.json (40KB) available if needed.

Thank you!
