- Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]
- Loading pipeline components...: 43%|████▎ | 3/7 [00:00<00:00, 24.21it/s]
- Loading weights: 0%| | 0/219 [00:00<?, ?it/s][A
- Loading weights: 0%| | 1/219 [00:00<00:00, 3387.97it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.k.weight][A
- Loading weights: 0%| | 1/219 [00:00<00:00, 1611.95it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.k.weight][A
- Loading weights: 1%| | 2/219 [00:00<00:00, 1119.08it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.o.weight][A
- Loading weights: 1%| | 2/219 [00:00<00:00, 943.07it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.o.weight] [A
- Loading weights: 1%|▏ | 3/219 [00:00<00:00, 439.55it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.q.weight][A
- Loading weights: 1%|▏ | 3/219 [00:00<00:00, 413.19it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.q.weight][A
- Loading weights: 2%|▏ | 4/219 [00:00<00:00, 503.81it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight][A
- Loading weights: 2%|▏ | 4/219 [00:00<00:00, 489.80it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight][A
- Loading weights: 2%|▏ | 5/219 [00:00<00:00, 515.03it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.v.weight] [A
- Loading weights: 2%|▏ | 5/219 [00:00<00:00, 498.34it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.v.weight][A
- Loading weights: 3%|▎ | 6/219 [00:00<00:00, 562.91it/s, Materializing param=encoder.block.0.layer.0.layer_norm.weight] [A
- Loading weights: 3%|▎ | 6/219 [00:00<00:00, 555.50it/s, Materializing param=encoder.block.0.layer.0.layer_norm.weight][A
- Loading weights: 3%|▎ | 7/219 [00:00<00:00, 637.39it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 3%|▎ | 7/219 [00:00<00:00, 633.12it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 4%|▎ | 8/219 [00:00<00:00, 704.04it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 4%|▎ | 8/219 [00:00<00:00, 699.40it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 4%|▍ | 9/219 [00:00<00:00, 776.26it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 4%|▍ | 9/219 [00:00<00:00, 771.77it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 5%|▍ | 10/219 [00:00<00:00, 444.82it/s, Materializing param=encoder.block.0.layer.1.layer_norm.weight] [A
- Loading weights: 5%|▍ | 10/219 [00:00<00:00, 441.45it/s, Materializing param=encoder.block.0.layer.1.layer_norm.weight][A
- Loading weights: 5%|▌ | 11/219 [00:00<00:00, 481.67it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.k.weight][A
- Loading weights: 5%|▌ | 11/219 [00:00<00:00, 480.10it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.k.weight][A
- Loading weights: 5%|▌ | 12/219 [00:00<00:00, 516.27it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.o.weight][A
- Loading weights: 5%|▌ | 12/219 [00:00<00:00, 514.66it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.o.weight][A
- Loading weights: 6%|▌ | 13/219 [00:00<00:00, 553.21it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.q.weight][A
- Loading weights: 6%|▌ | 13/219 [00:00<00:00, 551.56it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.q.weight][A
- Loading weights: 6%|▋ | 14/219 [00:00<00:00, 590.45it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.v.weight][A
- Loading weights: 6%|▋ | 14/219 [00:00<00:00, 588.87it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.v.weight][A
- Loading weights: 7%|▋ | 15/219 [00:00<00:00, 627.49it/s, Materializing param=encoder.block.1.layer.0.layer_norm.weight] [A
- Loading weights: 7%|▋ | 15/219 [00:00<00:00, 625.63it/s, Materializing param=encoder.block.1.layer.0.layer_norm.weight][A
- Loading weights: 7%|▋ | 16/219 [00:00<00:00, 663.70it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 7%|▋ | 16/219 [00:00<00:00, 661.53it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 8%|▊ | 17/219 [00:00<00:00, 699.15it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 8%|▊ | 17/219 [00:00<00:00, 696.99it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 8%|▊ | 18/219 [00:00<00:00, 734.26it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 8%|▊ | 18/219 [00:00<00:00, 732.27it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 9%|▊ | 19/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 9%|▊ | 19/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.1.layer.1.layer_norm.weight] [A
- Loading weights: 9%|▊ | 19/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.1.layer.1.layer_norm.weight][A
- Loading weights: 9%|▉ | 20/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.k.weight][A
- Loading weights: 9%|▉ | 20/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.k.weight][A
- Loading weights: 10%|▉ | 21/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.o.weight][A
- Loading weights: 10%|▉ | 21/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.o.weight][A
- Loading weights: 10%|█ | 22/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.q.weight][A
- Loading weights: 10%|█ | 22/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.q.weight][A
- Loading weights: 11%|█ | 23/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.v.weight][A
- Loading weights: 11%|█ | 23/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.v.weight][A
- Loading weights: 11%|█ | 24/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.layer_norm.weight] [A
- Loading weights: 11%|█ | 24/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.layer_norm.weight][A
- Loading weights: 11%|█▏ | 25/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 11%|█▏ | 25/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 12%|█▏ | 26/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 12%|█▏ | 26/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 12%|█▏ | 27/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 12%|█▏ | 27/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 13%|█▎ | 28/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.layer_norm.weight] [A
- Loading weights: 13%|█▎ | 28/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.layer_norm.weight][A
- Loading weights: 13%|█▎ | 29/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.k.weight][A
- Loading weights: 13%|█▎ | 29/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.k.weight][A
- Loading weights: 14%|█▎ | 30/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.o.weight][A
- Loading weights: 14%|█▎ | 30/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.o.weight][A
- Loading weights: 14%|█▍ | 31/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.q.weight][A
- Loading weights: 14%|█▍ | 31/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.q.weight][A
- Loading weights: 15%|█▍ | 32/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.v.weight][A
- Loading weights: 15%|█▍ | 32/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.v.weight][A
- Loading weights: 15%|█▌ | 33/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.layer_norm.weight] [A
- Loading weights: 15%|█▌ | 33/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.layer_norm.weight][A
- Loading weights: 16%|█▌ | 34/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 16%|█▌ | 34/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 16%|█▌ | 35/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 16%|█▌ | 35/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 16%|█▋ | 36/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 16%|█▋ | 36/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 17%|█▋ | 37/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.layer_norm.weight] [A
- Loading weights: 17%|█▋ | 37/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.layer_norm.weight][A
- Loading weights: 17%|█▋ | 38/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.k.weight][A
- Loading weights: 17%|█▋ | 38/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.k.weight][A
- Loading weights: 18%|█▊ | 39/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.o.weight][A
- Loading weights: 18%|█▊ | 39/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.o.weight][A
- Loading weights: 18%|█▊ | 40/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.q.weight][A
- Loading weights: 18%|█▊ | 40/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.q.weight][A
- Loading weights: 19%|█▊ | 41/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.v.weight][A
- Loading weights: 19%|█▊ | 41/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.v.weight][A
- Loading weights: 19%|█▉ | 42/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.layer_norm.weight] [A
- Loading weights: 19%|█▉ | 42/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.layer_norm.weight][A
- Loading weights: 20%|█▉ | 43/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 20%|█▉ | 43/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 20%|██ | 44/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 20%|██ | 44/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 21%|██ | 45/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 21%|██ | 45/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 21%|██ | 46/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.layer_norm.weight] [A
- Loading weights: 21%|██ | 46/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.layer_norm.weight][A
- Loading weights: 21%|██▏ | 47/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.k.weight][A
- Loading weights: 21%|██▏ | 47/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.k.weight][A
- Loading weights: 22%|██▏ | 48/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.o.weight][A
- Loading weights: 22%|██▏ | 48/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.o.weight][A
- Loading weights: 22%|██▏ | 49/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.q.weight][A
- Loading weights: 22%|██▏ | 49/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.q.weight][A
- Loading weights: 23%|██▎ | 50/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.v.weight][A
- Loading weights: 23%|██▎ | 50/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.v.weight][A
- Loading weights: 23%|██▎ | 51/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.layer_norm.weight] [A
- Loading weights: 23%|██▎ | 51/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.layer_norm.weight][A
- Loading weights: 24%|██▎ | 52/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 24%|██▎ | 52/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 24%|██▍ | 53/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 24%|██▍ | 53/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 25%|██▍ | 54/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 25%|██▍ | 54/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 25%|██▌ | 55/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.layer_norm.weight] [A
- Loading weights: 25%|██▌ | 55/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.layer_norm.weight][A
- Loading weights: 26%|██▌ | 56/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.k.weight][A
- Loading weights: 26%|██▌ | 56/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.k.weight][A
- Loading weights: 26%|██▌ | 57/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.o.weight][A
- Loading weights: 26%|██▌ | 57/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.o.weight][A
- Loading weights: 26%|██▋ | 58/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.q.weight][A
- Loading weights: 26%|██▋ | 58/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.q.weight][A
- Loading weights: 27%|██▋ | 59/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.v.weight][A
- Loading weights: 27%|██▋ | 59/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.v.weight][A
- Loading weights: 27%|██▋ | 60/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.layer_norm.weight] [A
- Loading weights: 27%|██▋ | 60/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.layer_norm.weight][A
- Loading weights: 28%|██▊ | 61/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 28%|██▊ | 61/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 28%|██▊ | 62/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 28%|██▊ | 62/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 29%|██▉ | 63/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 29%|██▉ | 63/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 29%|██▉ | 64/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.layer_norm.weight] [A
- Loading weights: 29%|██▉ | 64/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.layer_norm.weight][A
- Loading weights: 30%|██▉ | 65/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.k.weight][A
- Loading weights: 30%|██▉ | 65/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.k.weight][A
- Loading weights: 30%|███ | 66/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.o.weight][A
- Loading weights: 30%|███ | 66/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.o.weight][A
- Loading weights: 31%|███ | 67/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.q.weight][A
- Loading weights: 31%|███ | 67/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.q.weight][A
- Loading weights: 31%|███ | 68/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.v.weight][A
- Loading weights: 31%|███ | 68/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.v.weight][A
- Loading weights: 32%|███▏ | 69/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.layer_norm.weight] [A
- Loading weights: 32%|███▏ | 69/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.layer_norm.weight][A
- Loading weights: 32%|███▏ | 70/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 32%|███▏ | 70/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 32%|███▏ | 71/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 32%|███▏ | 71/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 33%|███▎ | 72/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 33%|███▎ | 72/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 33%|███▎ | 73/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.layer_norm.weight] [A
- Loading weights: 33%|███▎ | 73/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.layer_norm.weight][A
- Loading weights: 34%|███▍ | 74/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.k.weight][A
- Loading weights: 34%|███▍ | 74/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.k.weight][A
- Loading weights: 34%|███▍ | 75/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.o.weight][A
- Loading weights: 34%|███▍ | 75/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.o.weight][A
- Loading weights: 35%|███▍ | 76/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.q.weight][A
- Loading weights: 35%|███▍ | 76/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.q.weight][A
- Loading weights: 35%|███▌ | 77/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.v.weight][A
- Loading weights: 35%|███▌ | 77/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.v.weight][A
- Loading weights: 36%|███▌ | 78/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.layer_norm.weight] [A
- Loading weights: 36%|███▌ | 78/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.layer_norm.weight][A
- Loading weights: 36%|███▌ | 79/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 36%|███▌ | 79/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 37%|███▋ | 80/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 37%|███▋ | 80/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 37%|███▋ | 81/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 37%|███▋ | 81/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 37%|███▋ | 82/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.layer_norm.weight] [A
- Loading weights: 37%|███▋ | 82/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.layer_norm.weight][A
- Loading weights: 38%|███▊ | 83/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.k.weight][A
- Loading weights: 38%|███▊ | 83/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.k.weight][A
- Loading weights: 38%|███▊ | 84/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.o.weight][A
- Loading weights: 38%|███▊ | 84/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.o.weight][A
- Loading weights: 39%|███▉ | 85/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.q.weight][A
- Loading weights: 39%|███▉ | 85/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.q.weight][A
- Loading weights: 39%|███▉ | 86/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.v.weight][A
- Loading weights: 39%|███▉ | 86/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.v.weight][A
- Loading weights: 40%|███▉ | 87/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.layer_norm.weight] [A
- Loading weights: 40%|███▉ | 87/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.layer_norm.weight][A
- Loading weights: 40%|████ | 88/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 40%|████ | 88/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 41%|████ | 89/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 41%|████ | 89/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 41%|████ | 90/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 41%|████ | 90/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 42%|████▏ | 91/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.layer_norm.weight] [A
- Loading weights: 42%|████▏ | 91/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.layer_norm.weight][A
- Loading weights: 42%|████▏ | 92/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.k.weight][A
- Loading weights: 42%|████▏ | 92/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.k.weight][A
- Loading weights: 42%|████▏ | 93/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.o.weight][A
- Loading weights: 42%|████▏ | 93/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.o.weight][A
- Loading weights: 43%|████▎ | 94/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.q.weight][A
- Loading weights: 43%|████▎ | 94/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.q.weight][A
- Loading weights: 43%|████▎ | 95/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.v.weight][A
- Loading weights: 43%|████▎ | 95/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.v.weight][A
- Loading weights: 44%|████▍ | 96/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.layer_norm.weight] [A
- Loading weights: 44%|████▍ | 96/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.layer_norm.weight][A
- Loading weights: 44%|████▍ | 97/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 44%|████▍ | 97/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 45%|████▍ | 98/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 45%|████▍ | 98/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 45%|████▌ | 99/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 45%|████▌ | 99/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 46%|████▌ | 100/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.layer_norm.weight] [A
- Loading weights: 46%|████▌ | 100/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.layer_norm.weight][A
- Loading weights: 46%|████▌ | 101/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.k.weight][A
- Loading weights: 46%|████▌ | 101/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.k.weight][A
- Loading weights: 47%|████▋ | 102/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.o.weight][A
- Loading weights: 47%|████▋ | 102/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.o.weight][A
- Loading weights: 47%|████▋ | 103/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.q.weight][A
- Loading weights: 47%|████▋ | 103/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.q.weight][A
- Loading weights: 47%|████▋ | 104/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.v.weight][A
- Loading weights: 47%|████▋ | 104/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.v.weight][A
- Loading weights: 48%|████▊ | 105/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.layer_norm.weight] [A
- Loading weights: 48%|████▊ | 105/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.layer_norm.weight][A
- Loading weights: 48%|████▊ | 106/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 48%|████▊ | 106/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 49%|████▉ | 107/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 49%|████▉ | 107/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 49%|████▉ | 108/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 49%|████▉ | 108/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 50%|████▉ | 109/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 50%|████▉ | 109/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.11.layer.1.layer_norm.weight] [A
- Loading weights: 50%|████▉ | 109/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.11.layer.1.layer_norm.weight][A
- Loading weights: 50%|█████ | 110/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.k.weight][A
- Loading weights: 50%|█████ | 110/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.k.weight][A
- Loading weights: 51%|█████ | 111/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.o.weight][A
- Loading weights: 51%|█████ | 111/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.o.weight][A
- Loading weights: 51%|█████ | 112/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.q.weight][A
- Loading weights: 51%|█████ | 112/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.q.weight][A
- Loading weights: 52%|█████▏ | 113/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.v.weight][A
- Loading weights: 52%|█████▏ | 113/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.v.weight][A
- Loading weights: 52%|█████▏ | 114/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.layer_norm.weight] [A
- Loading weights: 52%|█████▏ | 114/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.layer_norm.weight][A
- Loading weights: 53%|█████▎ | 115/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 53%|█████▎ | 115/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 53%|█████▎ | 116/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 53%|█████▎ | 116/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 53%|█████▎ | 117/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 53%|█████▎ | 117/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wo.weight][A
- Loading weights: 54%|█████▍ | 118/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.layer_norm.weight] [A
- Loading weights: 54%|█████▍ | 118/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.layer_norm.weight][A
- Loading weights: 54%|█████▍ | 119/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.k.weight][A
- Loading weights: 54%|█████▍ | 119/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.k.weight][A
- Loading weights: 55%|█████▍ | 120/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.o.weight][A
- Loading weights: 55%|█████▌ | 121/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.q.weight][A
- Loading weights: 56%|█████▌ | 122/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.v.weight][A
- Loading weights: 56%|█████▌ | 123/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.layer_norm.weight] [A
- Loading weights: 57%|█████▋ | 124/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 57%|█████▋ | 125/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 58%|█████▊ | 126/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 58%|█████▊ | 127/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.layer_norm.weight] [A
- Loading weights: 58%|█████▊ | 128/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.k.weight][A
- Loading weights: 59%|█████▉ | 129/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.o.weight][A
- Loading weights: 59%|█████▉ | 130/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.q.weight][A
- Loading weights: 60%|█████▉ | 131/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.v.weight][A
- Loading weights: 60%|██████ | 132/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.layer_norm.weight] [A
- Loading weights: 61%|██████ | 133/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 61%|██████ | 134/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 62%|██████▏ | 135/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 62%|██████▏ | 136/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.layer_norm.weight] [A
- Loading weights: 63%|██████▎ | 137/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.k.weight][A
- Loading weights: 63%|██████▎ | 138/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.o.weight][A
- Loading weights: 63%|██████▎ | 139/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.q.weight][A
- Loading weights: 64%|██████▍ | 140/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.v.weight][A
- Loading weights: 64%|██████▍ | 141/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.layer_norm.weight] [A
- Loading weights: 65%|██████▍ | 142/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 65%|██████▌ | 143/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 66%|██████▌ | 144/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 66%|██████▌ | 145/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.layer_norm.weight] [A
- Loading weights: 67%|██████▋ | 146/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.k.weight][A
- Loading weights: 67%|██████▋ | 147/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.o.weight][A
- Loading weights: 68%|██████▊ | 148/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.q.weight][A
- Loading weights: 68%|██████▊ | 149/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.v.weight][A
- Loading weights: 68%|██████▊ | 150/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.layer_norm.weight] [A
- Loading weights: 69%|██████▉ | 151/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 69%|██████▉ | 152/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 70%|██████▉ | 153/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 70%|███████ | 154/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.layer_norm.weight] [A
- Loading weights: 71%|███████ | 155/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.k.weight][A
- Loading weights: 71%|███████ | 156/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.o.weight][A
- Loading weights: 72%|███████▏ | 157/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.q.weight][A
- Loading weights: 72%|███████▏ | 158/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.v.weight][A
- Loading weights: 73%|███████▎ | 159/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.layer_norm.weight] [A
- Loading weights: 73%|███████▎ | 160/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 74%|███████▎ | 161/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 74%|███████▍ | 162/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 74%|███████▍ | 163/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.layer_norm.weight] [A
- Loading weights: 75%|███████▍ | 164/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.k.weight][A
- Loading weights: 75%|███████▌ | 165/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.o.weight][A
- Loading weights: 76%|███████▌ | 166/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.q.weight][A
- Loading weights: 76%|███████▋ | 167/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.v.weight][A
- Loading weights: 77%|███████▋ | 168/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.layer_norm.weight] [A
- Loading weights: 77%|███████▋ | 169/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 78%|███████▊ | 170/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 78%|███████▊ | 171/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 79%|███████▊ | 172/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.layer_norm.weight] [A
- Loading weights: 79%|███████▉ | 173/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.k.weight][A
- Loading weights: 79%|███████▉ | 174/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.o.weight][A
- Loading weights: 80%|███████▉ | 175/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.q.weight][A
- Loading weights: 80%|████████ | 176/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.v.weight][A
- Loading weights: 81%|████████ | 177/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.layer_norm.weight] [A
- Loading weights: 81%|████████▏ | 178/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 82%|████████▏ | 179/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 82%|████████▏ | 180/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 83%|████████▎ | 181/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.layer_norm.weight] [A
- Loading weights: 83%|████████▎ | 182/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.k.weight][A
- Loading weights: 84%|████████▎ | 183/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.o.weight][A
- Loading weights: 84%|████████▍ | 184/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.q.weight][A
- Loading weights: 84%|████████▍ | 185/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.v.weight][A
- Loading weights: 85%|████████▍ | 186/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.layer_norm.weight] [A
- Loading weights: 85%|████████▌ | 187/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 86%|████████▌ | 188/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 86%|████████▋ | 189/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 87%|████████▋ | 190/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.layer_norm.weight] [A
- Loading weights: 87%|████████▋ | 191/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.k.weight][A
- Loading weights: 88%|████████▊ | 192/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.o.weight][A
- Loading weights: 88%|████████▊ | 193/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.q.weight][A
- Loading weights: 89%|████████▊ | 194/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.v.weight][A
- Loading weights: 89%|████████▉ | 195/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.layer_norm.weight] [A
- Loading weights: 89%|████████▉ | 196/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 90%|████████▉ | 197/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 90%|█████████ | 198/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 91%|█████████ | 199/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.layer_norm.weight] [A
- Loading weights: 91%|█████████▏| 200/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.k.weight][A
- Loading weights: 92%|█████████▏| 201/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.o.weight][A
- Loading weights: 92%|█████████▏| 202/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.q.weight][A
- Loading weights: 93%|█████████▎| 203/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.v.weight][A
- Loading weights: 93%|█████████▎| 204/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.layer_norm.weight] [A
- Loading weights: 94%|█████████▎| 205/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 94%|█████████▍| 206/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 95%|█████████▍| 207/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 95%|█████████▍| 208/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.layer_norm.weight] [A
- Loading weights: 95%|█████████▌| 209/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.k.weight][A
- Loading weights: 96%|█████████▌| 210/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.o.weight][A
- Loading weights: 96%|█████████▋| 211/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.q.weight][A
- Loading weights: 97%|█████████▋| 212/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.v.weight][A
- Loading weights: 97%|█████████▋| 213/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.layer_norm.weight] [A
- Loading weights: 98%|█████████▊| 214/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wi_0.weight][A
- Loading weights: 98%|█████████▊| 215/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wi_1.weight][A
- Loading weights: 99%|█████████▊| 216/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wo.weight] [A
- Loading weights: 99%|█████████▉| 217/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.layer_norm.weight] [A
- Loading weights: 100%|█████████▉| 218/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.final_layer_norm.weight] [A
- Loading weights: 100%|██████████| 219/219 [00:00<00:00, 586.19it/s, Materializing param=shared.weight] [A
- Loading weights: 100%|██████████| 219/219 [00:00<00:00, 866.14it/s, Materializing param=shared.weight]
- Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s][A
- Loading checkpoint shards: 33%|███▎ | 1/3 [00:00<00:00, 5.36it/s][A
- Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 13.00it/s]
- Loading pipeline components...: 86%|████████▌ | 6/7 [00:00<00:00, 5.87it/s]
- Loading weights: 0%| | 0/196 [00:00<?, ?it/s][A
- Loading weights: 1%| | 1/196 [00:00<00:00, 32513.98it/s, Materializing param=text_model.embeddings.position_embedding.weight][A
- Loading weights: 1%| | 1/196 [00:00<00:00, 1680.41it/s, Materializing param=text_model.embeddings.position_embedding.weight] [A
- Loading weights: 1%| | 2/196 [00:00<00:00, 272.93it/s, Materializing param=text_model.embeddings.token_embedding.weight] [A
- Loading weights: 1%| | 2/196 [00:00<00:00, 255.14it/s, Materializing param=text_model.embeddings.token_embedding.weight][A
- Loading weights: 2%|▏ | 3/196 [00:00<00:00, 345.67it/s, Materializing param=text_model.encoder.layers.0.layer_norm1.bias][A
- Loading weights: 2%|▏ | 3/196 [00:00<00:00, 335.02it/s, Materializing param=text_model.encoder.layers.0.layer_norm1.bias][A
- Loading weights: 2%|▏ | 4/196 [00:00<00:00, 407.75it/s, Materializing param=text_model.encoder.layers.0.layer_norm1.weight][A
- Loading weights: 2%|▏ | 4/196 [00:00<00:00, 401.82it/s, Materializing param=text_model.encoder.layers.0.layer_norm1.weight][A
- Loading weights: 3%|▎ | 5/196 [00:00<00:00, 453.08it/s, Materializing param=text_model.encoder.layers.0.layer_norm2.bias] [A
- Loading weights: 3%|▎ | 5/196 [00:00<00:00, 446.62it/s, Materializing param=text_model.encoder.layers.0.layer_norm2.bias][A
- Loading weights: 3%|▎ | 6/196 [00:00<00:00, 493.08it/s, Materializing param=text_model.encoder.layers.0.layer_norm2.weight][A
- Loading weights: 3%|▎ | 6/196 [00:00<00:00, 467.15it/s, Materializing param=text_model.encoder.layers.0.layer_norm2.weight][A
- Loading weights: 4%|▎ | 7/196 [00:00<00:00, 487.90it/s, Materializing param=text_model.encoder.layers.0.mlp.fc1.bias] [A
- Loading weights: 4%|▎ | 7/196 [00:00<00:00, 483.41it/s, Materializing param=text_model.encoder.layers.0.mlp.fc1.bias][A
- Loading weights: 4%|▍ | 8/196 [00:00<00:00, 527.08it/s, Materializing param=text_model.encoder.layers.0.mlp.fc1.weight][A
- Loading weights: 4%|▍ | 8/196 [00:00<00:00, 509.83it/s, Materializing param=text_model.encoder.layers.0.mlp.fc1.weight][A
- Loading weights: 5%|▍ | 9/196 [00:00<00:00, 557.08it/s, Materializing param=text_model.encoder.layers.0.mlp.fc2.bias] [A
- Loading weights: 5%|▍ | 9/196 [00:00<00:00, 548.11it/s, Materializing param=text_model.encoder.layers.0.mlp.fc2.bias][A
- Loading weights: 5%|▌ | 10/196 [00:00<00:00, 604.11it/s, Materializing param=text_model.encoder.layers.0.mlp.fc2.weight][A
- Loading weights: 5%|▌ | 10/196 [00:00<00:00, 601.36it/s, Materializing param=text_model.encoder.layers.0.mlp.fc2.weight][A
- Loading weights: 6%|▌ | 11/196 [00:00<00:00, 656.67it/s, Materializing param=text_model.encoder.layers.0.self_attn.k_proj.bias][A
- Loading weights: 6%|▌ | 11/196 [00:00<00:00, 642.75it/s, Materializing param=text_model.encoder.layers.0.self_attn.k_proj.bias][A
- Loading weights: 6%|▌ | 12/196 [00:00<00:00, 685.74it/s, Materializing param=text_model.encoder.layers.0.self_attn.k_proj.weight][A
- Loading weights: 6%|▌ | 12/196 [00:00<00:00, 643.82it/s, Materializing param=text_model.encoder.layers.0.self_attn.k_proj.weight][A
- Loading weights: 7%|▋ | 13/196 [00:00<00:00, 559.98it/s, Materializing param=text_model.encoder.layers.0.self_attn.out_proj.bias][A
- Loading weights: 7%|▋ | 13/196 [00:00<00:00, 546.64it/s, Materializing param=text_model.encoder.layers.0.self_attn.out_proj.bias][A
- Loading weights: 7%|▋ | 14/196 [00:00<00:00, 569.30it/s, Materializing param=text_model.encoder.layers.0.self_attn.out_proj.weight][A
- Loading weights: 7%|▋ | 14/196 [00:00<00:00, 563.84it/s, Materializing param=text_model.encoder.layers.0.self_attn.out_proj.weight][A
- Loading weights: 8%|▊ | 15/196 [00:00<00:00, 575.09it/s, Materializing param=text_model.encoder.layers.0.self_attn.q_proj.bias] [A
- Loading weights: 8%|▊ | 15/196 [00:00<00:00, 563.13it/s, Materializing param=text_model.encoder.layers.0.self_attn.q_proj.bias][A
- Loading weights: 8%|▊ | 16/196 [00:00<00:00, 537.23it/s, Materializing param=text_model.encoder.layers.0.self_attn.q_proj.weight][A
- Loading weights: 8%|▊ | 16/196 [00:00<00:00, 534.81it/s, Materializing param=text_model.encoder.layers.0.self_attn.q_proj.weight][A
- Loading weights: 9%|▊ | 17/196 [00:00<00:00, 543.48it/s, Materializing param=text_model.encoder.layers.0.self_attn.v_proj.bias] [A
- Loading weights: 9%|▊ | 17/196 [00:00<00:00, 530.42it/s, Materializing param=text_model.encoder.layers.0.self_attn.v_proj.bias][A
- Loading weights: 9%|▉ | 18/196 [00:00<00:00, 544.65it/s, Materializing param=text_model.encoder.layers.0.self_attn.v_proj.weight][A
- Loading weights: 9%|▉ | 18/196 [00:00<00:00, 537.06it/s, Materializing param=text_model.encoder.layers.0.self_attn.v_proj.weight][A
- Loading weights: 10%|▉ | 19/196 [00:00<00:00, 546.28it/s, Materializing param=text_model.encoder.layers.1.layer_norm1.bias] [A
- Loading weights: 10%|▉ | 19/196 [00:00<00:00, 543.87it/s, Materializing param=text_model.encoder.layers.1.layer_norm1.bias][A
- Loading weights: 10%|█ | 20/196 [00:00<00:00, 568.58it/s, Materializing param=text_model.encoder.layers.1.layer_norm1.weight][A
- Loading weights: 10%|█ | 20/196 [00:00<00:00, 561.26it/s, Materializing param=text_model.encoder.layers.1.layer_norm1.weight][A
- Loading weights: 11%|█ | 21/196 [00:00<00:00, 583.38it/s, Materializing param=text_model.encoder.layers.1.layer_norm2.bias] [A
- Loading weights: 11%|█ | 21/196 [00:00<00:00, 575.56it/s, Materializing param=text_model.encoder.layers.1.layer_norm2.bias][A
- Loading weights: 11%|█ | 22/196 [00:00<00:00, 538.59it/s, Materializing param=text_model.encoder.layers.1.layer_norm2.weight][A
- Loading weights: 11%|█ | 22/196 [00:00<00:00, 526.97it/s, Materializing param=text_model.encoder.layers.1.layer_norm2.weight][A
- Loading weights: 12%|█▏ | 23/196 [00:00<00:00, 526.67it/s, Materializing param=text_model.encoder.layers.1.mlp.fc1.bias] [A
- Loading weights: 12%|█▏ | 23/196 [00:00<00:00, 524.82it/s, Materializing param=text_model.encoder.layers.1.mlp.fc1.bias][A
- Loading weights: 12%|█▏ | 24/196 [00:00<00:00, 507.50it/s, Materializing param=text_model.encoder.layers.1.mlp.fc1.weight][A
- Loading weights: 12%|█▏ | 24/196 [00:00<00:00, 505.89it/s, Materializing param=text_model.encoder.layers.1.mlp.fc1.weight][A
- Loading weights: 13%|█▎ | 25/196 [00:00<00:00, 498.34it/s, Materializing param=text_model.encoder.layers.1.mlp.fc2.bias] [A
- Loading weights: 13%|█▎ | 25/196 [00:00<00:00, 493.00it/s, Materializing param=text_model.encoder.layers.1.mlp.fc2.bias][A
- Loading weights: 13%|█▎ | 26/196 [00:00<00:00, 500.69it/s, Materializing param=text_model.encoder.layers.1.mlp.fc2.weight][A
- Loading weights: 13%|█▎ | 26/196 [00:00<00:00, 495.33it/s, Materializing param=text_model.encoder.layers.1.mlp.fc2.weight][A
- Loading weights: 14%|█▍ | 27/196 [00:00<00:00, 503.28it/s, Materializing param=text_model.encoder.layers.1.self_attn.k_proj.bias][A
- Loading weights: 14%|█▍ | 27/196 [00:00<00:00, 498.59it/s, Materializing param=text_model.encoder.layers.1.self_attn.k_proj.bias][A
- Loading weights: 14%|█▍ | 28/196 [00:00<00:00, 492.22it/s, Materializing param=text_model.encoder.layers.1.self_attn.k_proj.weight][A
- Loading weights: 14%|█▍ | 28/196 [00:00<00:00, 476.53it/s, Materializing param=text_model.encoder.layers.1.self_attn.k_proj.weight][A
- Loading weights: 15%|█▍ | 29/196 [00:00<00:00, 489.92it/s, Materializing param=text_model.encoder.layers.1.self_attn.out_proj.bias][A
- Loading weights: 15%|█▍ | 29/196 [00:00<00:00, 489.22it/s, Materializing param=text_model.encoder.layers.1.self_attn.out_proj.bias][A
- Loading weights: 15%|█▌ | 30/196 [00:00<00:00, 504.99it/s, Materializing param=text_model.encoder.layers.1.self_attn.out_proj.weight][A
- Loading weights: 15%|█▌ | 30/196 [00:00<00:00, 503.90it/s, Materializing param=text_model.encoder.layers.1.self_attn.out_proj.weight][A
- Loading weights: 16%|█▌ | 31/196 [00:00<00:00, 518.33it/s, Materializing param=text_model.encoder.layers.1.self_attn.q_proj.bias] [A
- Loading weights: 16%|█▌ | 31/196 [00:00<00:00, 517.61it/s, Materializing param=text_model.encoder.layers.1.self_attn.q_proj.bias][A
- Loading weights: 16%|█▋ | 32/196 [00:00<00:00, 532.90it/s, Materializing param=text_model.encoder.layers.1.self_attn.q_proj.weight][A
- Loading weights: 16%|█▋ | 32/196 [00:00<00:00, 530.14it/s, Materializing param=text_model.encoder.layers.1.self_attn.q_proj.weight][A
- Loading weights: 17%|█▋ | 33/196 [00:00<00:00, 542.83it/s, Materializing param=text_model.encoder.layers.1.self_attn.v_proj.bias] [A
- Loading weights: 17%|█▋ | 33/196 [00:00<00:00, 542.08it/s, Materializing param=text_model.encoder.layers.1.self_attn.v_proj.bias][A
- Loading weights: 17%|█▋ | 34/196 [00:00<00:00, 556.30it/s, Materializing param=text_model.encoder.layers.1.self_attn.v_proj.weight][A
- Loading weights: 17%|█▋ | 34/196 [00:00<00:00, 555.72it/s, Materializing param=text_model.encoder.layers.1.self_attn.v_proj.weight][A
- Loading weights: 18%|█▊ | 35/196 [00:00<00:00, 565.88it/s, Materializing param=text_model.encoder.layers.2.layer_norm1.bias] [A
- Loading weights: 18%|█▊ | 35/196 [00:00<00:00, 565.21it/s, Materializing param=text_model.encoder.layers.2.layer_norm1.bias][A
- Loading weights: 18%|█▊ | 36/196 [00:00<00:00, 578.11it/s, Materializing param=text_model.encoder.layers.2.layer_norm1.weight][A
- Loading weights: 18%|█▊ | 36/196 [00:00<00:00, 577.63it/s, Materializing param=text_model.encoder.layers.2.layer_norm1.weight][A
- Loading weights: 19%|█▉ | 37/196 [00:00<00:00, 588.22it/s, Materializing param=text_model.encoder.layers.2.layer_norm2.bias] [A
- Loading weights: 19%|█▉ | 37/196 [00:00<00:00, 586.67it/s, Materializing param=text_model.encoder.layers.2.layer_norm2.bias][A
- Loading weights: 19%|█▉ | 38/196 [00:00<00:00, 583.97it/s, Materializing param=text_model.encoder.layers.2.layer_norm2.weight][A
- Loading weights: 19%|█▉ | 38/196 [00:00<00:00, 582.14it/s, Materializing param=text_model.encoder.layers.2.layer_norm2.weight][A
- Loading weights: 20%|█▉ | 39/196 [00:00<00:00, 591.78it/s, Materializing param=text_model.encoder.layers.2.mlp.fc1.bias] [A
- Loading weights: 20%|█▉ | 39/196 [00:00<00:00, 590.06it/s, Materializing param=text_model.encoder.layers.2.mlp.fc1.bias][A
- Loading weights: 20%|██ | 40/196 [00:00<00:00, 581.93it/s, Materializing param=text_model.encoder.layers.2.mlp.fc1.weight][A
- Loading weights: 20%|██ | 40/196 [00:00<00:00, 577.67it/s, Materializing param=text_model.encoder.layers.2.mlp.fc1.weight][A
- Loading weights: 21%|██ | 41/196 [00:00<00:00, 585.71it/s, Materializing param=text_model.encoder.layers.2.mlp.fc2.bias] [A
- Loading weights: 21%|██ | 41/196 [00:00<00:00, 578.43it/s, Materializing param=text_model.encoder.layers.2.mlp.fc2.bias][A
- Loading weights: 21%|██▏ | 42/196 [00:00<00:00, 588.02it/s, Materializing param=text_model.encoder.layers.2.mlp.fc2.weight][A
- Loading weights: 21%|██▏ | 42/196 [00:00<00:00, 586.57it/s, Materializing param=text_model.encoder.layers.2.mlp.fc2.weight][A
- Loading weights: 22%|██▏ | 43/196 [00:00<00:00, 589.86it/s, Materializing param=text_model.encoder.layers.2.self_attn.k_proj.bias][A
- Loading weights: 22%|██▏ | 43/196 [00:00<00:00, 543.53it/s, Materializing param=text_model.encoder.layers.2.self_attn.k_proj.bias][A
- Loading weights: 22%|██▏ | 44/196 [00:00<00:00, 551.08it/s, Materializing param=text_model.encoder.layers.2.self_attn.k_proj.weight][A
- Loading weights: 22%|██▏ | 44/196 [00:00<00:00, 547.79it/s, Materializing param=text_model.encoder.layers.2.self_attn.k_proj.weight][A
- Loading weights: 23%|██▎ | 45/196 [00:00<00:00, 524.04it/s, Materializing param=text_model.encoder.layers.2.self_attn.out_proj.bias][A
- Loading weights: 23%|██▎ | 45/196 [00:00<00:00, 521.34it/s, Materializing param=text_model.encoder.layers.2.self_attn.out_proj.bias][A
- Loading weights: 23%|██▎ | 46/196 [00:00<00:00, 515.26it/s, Materializing param=text_model.encoder.layers.2.self_attn.out_proj.weight][A
- Loading weights: 23%|██▎ | 46/196 [00:00<00:00, 512.44it/s, Materializing param=text_model.encoder.layers.2.self_attn.out_proj.weight][A
- Loading weights: 24%|██▍ | 47/196 [00:00<00:00, 522.28it/s, Materializing param=text_model.encoder.layers.2.self_attn.q_proj.bias] [A
- Loading weights: 24%|██▍ | 47/196 [00:00<00:00, 519.86it/s, Materializing param=text_model.encoder.layers.2.self_attn.q_proj.bias][A
- Loading weights: 24%|██▍ | 48/196 [00:00<00:00, 522.74it/s, Materializing param=text_model.encoder.layers.2.self_attn.q_proj.weight][A
- Loading weights: 24%|██▍ | 48/196 [00:00<00:00, 521.92it/s, Materializing param=text_model.encoder.layers.2.self_attn.q_proj.weight][A
- Loading weights: 25%|██▌ | 49/196 [00:00<00:00, 524.86it/s, Materializing param=text_model.encoder.layers.2.self_attn.v_proj.bias] [A
- Loading weights: 25%|██▌ | 49/196 [00:00<00:00, 523.04it/s, Materializing param=text_model.encoder.layers.2.self_attn.v_proj.bias][A
- Loading weights: 26%|██▌ | 50/196 [00:00<00:00, 520.00it/s, Materializing param=text_model.encoder.layers.2.self_attn.v_proj.weight][A
- Loading weights: 26%|██▌ | 50/196 [00:00<00:00, 516.50it/s, Materializing param=text_model.encoder.layers.2.self_attn.v_proj.weight][A
- Loading weights: 26%|██▌ | 51/196 [00:00<00:00, 522.98it/s, Materializing param=text_model.encoder.layers.3.layer_norm1.bias] [A
- Loading weights: 26%|██▌ | 51/196 [00:00<00:00, 521.65it/s, Materializing param=text_model.encoder.layers.3.layer_norm1.bias][A
- Loading weights: 27%|██▋ | 52/196 [00:00<00:00, 531.13it/s, Materializing param=text_model.encoder.layers.3.layer_norm1.weight][A
- Loading weights: 27%|██▋ | 52/196 [00:00<00:00, 530.70it/s, Materializing param=text_model.encoder.layers.3.layer_norm1.weight][A
- Loading weights: 27%|██▋ | 53/196 [00:00<00:00, 539.91it/s, Materializing param=text_model.encoder.layers.3.layer_norm2.bias] [A
- Loading weights: 27%|██▋ | 53/196 [00:00<00:00, 538.33it/s, Materializing param=text_model.encoder.layers.3.layer_norm2.bias][A
- Loading weights: 28%|██▊ | 54/196 [00:00<00:00, 547.87it/s, Materializing param=text_model.encoder.layers.3.layer_norm2.weight][A
- Loading weights: 28%|██▊ | 54/196 [00:00<00:00, 547.55it/s, Materializing param=text_model.encoder.layers.3.layer_norm2.weight][A
- Loading weights: 28%|██▊ | 55/196 [00:00<00:00, 556.84it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.bias] [A
- Loading weights: 28%|██▊ | 55/196 [00:00<00:00, 556.46it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.bias][A
- Loading weights: 29%|██▊ | 56/196 [00:00<00:00, 560.25it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.weight][A
- Loading weights: 29%|██▊ | 56/196 [00:00<00:00, 559.15it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.weight][A
- Loading weights: 29%|██▉ | 57/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.weight][A
- Loading weights: 29%|██▉ | 57/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc2.bias] [A
- Loading weights: 29%|██▉ | 57/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc2.bias][A
- Loading weights: 30%|██▉ | 58/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc2.weight][A
- Loading weights: 30%|██▉ | 58/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc2.weight][A
- Loading weights: 30%|███ | 59/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.k_proj.bias][A
- Loading weights: 30%|███ | 59/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.k_proj.bias][A
- Loading weights: 31%|███ | 60/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.k_proj.weight][A
- Loading weights: 31%|███ | 60/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.k_proj.weight][A
- Loading weights: 31%|███ | 61/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.out_proj.bias][A
- Loading weights: 31%|███ | 61/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.out_proj.bias][A
- Loading weights: 32%|███▏ | 62/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.out_proj.weight][A
- Loading weights: 32%|███▏ | 62/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.out_proj.weight][A
- Loading weights: 32%|███▏ | 63/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.q_proj.bias] [A
- Loading weights: 32%|███▏ | 63/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.q_proj.bias][A
- Loading weights: 33%|███▎ | 64/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.q_proj.weight][A
- Loading weights: 33%|███▎ | 64/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.q_proj.weight][A
- Loading weights: 33%|███▎ | 65/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.v_proj.bias] [A
- Loading weights: 33%|███▎ | 65/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.v_proj.bias][A
- Loading weights: 34%|███▎ | 66/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.v_proj.weight][A
- Loading weights: 34%|███▎ | 66/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.v_proj.weight][A
- Loading weights: 34%|███▍ | 67/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm1.bias] [A
- Loading weights: 34%|███▍ | 67/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm1.bias][A
- Loading weights: 35%|███▍ | 68/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm1.weight][A
- Loading weights: 35%|███▍ | 68/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm1.weight][A
- Loading weights: 35%|███▌ | 69/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm2.bias] [A
- Loading weights: 35%|███▌ | 69/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm2.bias][A
- Loading weights: 36%|███▌ | 70/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm2.weight][A
- Loading weights: 36%|███▌ | 70/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm2.weight][A
- Loading weights: 36%|███▌ | 71/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc1.bias] [A
- Loading weights: 36%|███▌ | 71/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc1.bias][A
- Loading weights: 37%|███▋ | 72/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc1.weight][A
- Loading weights: 37%|███▋ | 72/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc1.weight][A
- Loading weights: 37%|███▋ | 73/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc2.bias] [A
- Loading weights: 37%|███▋ | 73/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc2.bias][A
- Loading weights: 38%|███▊ | 74/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc2.weight][A
- Loading weights: 38%|███▊ | 74/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc2.weight][A
- Loading weights: 38%|███▊ | 75/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.k_proj.bias][A
- Loading weights: 38%|███▊ | 75/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.k_proj.bias][A
- Loading weights: 39%|███▉ | 76/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.k_proj.weight][A
- Loading weights: 39%|███▉ | 76/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.k_proj.weight][A
- Loading weights: 39%|███▉ | 77/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.out_proj.bias][A
- Loading weights: 39%|███▉ | 77/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.out_proj.bias][A
- Loading weights: 40%|███▉ | 78/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.out_proj.weight][A
- Loading weights: 40%|███▉ | 78/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.out_proj.weight][A
- Loading weights: 40%|████ | 79/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.q_proj.bias] [A
- Loading weights: 40%|████ | 79/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.q_proj.bias][A
- Loading weights: 41%|████ | 80/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.q_proj.weight][A
- Loading weights: 41%|████ | 80/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.q_proj.weight][A
- Loading weights: 41%|████▏ | 81/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.v_proj.bias] [A
- Loading weights: 41%|████▏ | 81/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.v_proj.bias][A
- Loading weights: 42%|████▏ | 82/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.v_proj.weight][A
- Loading weights: 42%|████▏ | 82/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.v_proj.weight][A
- Loading weights: 42%|████▏ | 83/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm1.bias] [A
- Loading weights: 42%|████▏ | 83/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm1.bias][A
- Loading weights: 43%|████▎ | 84/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm1.weight][A
- Loading weights: 43%|████▎ | 84/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm1.weight][A
- Loading weights: 43%|████▎ | 85/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm2.bias] [A
- Loading weights: 43%|████▎ | 85/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm2.bias][A
- Loading weights: 44%|████▍ | 86/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm2.weight][A
- Loading weights: 44%|████▍ | 86/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm2.weight][A
- Loading weights: 44%|████▍ | 87/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc1.bias] [A
- Loading weights: 44%|████▍ | 87/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc1.bias][A
- Loading weights: 45%|████▍ | 88/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc1.weight][A
- Loading weights: 45%|████▍ | 88/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc1.weight][A
- Loading weights: 45%|████▌ | 89/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc2.bias] [A
- Loading weights: 45%|████▌ | 89/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc2.bias][A
- Loading weights: 46%|████▌ | 90/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc2.weight][A
- Loading weights: 46%|████▌ | 90/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc2.weight][A
- Loading weights: 46%|████▋ | 91/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.k_proj.bias][A
- Loading weights: 46%|████▋ | 91/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.k_proj.bias][A
- Loading weights: 47%|████▋ | 92/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.k_proj.weight][A
- Loading weights: 47%|████▋ | 92/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.k_proj.weight][A
- Loading weights: 47%|████▋ | 93/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.out_proj.bias][A
- Loading weights: 47%|████▋ | 93/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.out_proj.bias][A
- Loading weights: 48%|████▊ | 94/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.out_proj.weight][A
- Loading weights: 48%|████▊ | 94/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.out_proj.weight][A
- Loading weights: 48%|████▊ | 95/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.q_proj.bias] [A
- Loading weights: 48%|████▊ | 95/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.q_proj.bias][A
- Loading weights: 49%|████▉ | 96/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.q_proj.weight][A
- Loading weights: 49%|████▉ | 96/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.q_proj.weight][A
- Loading weights: 49%|████▉ | 97/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.v_proj.bias] [A
- Loading weights: 49%|████▉ | 97/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.v_proj.bias][A
- Loading weights: 50%|█████ | 98/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.v_proj.weight][A
- Loading weights: 50%|█████ | 98/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.v_proj.weight][A
- Loading weights: 51%|█████ | 99/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm1.bias] [A
- Loading weights: 51%|█████ | 99/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm1.bias][A
- Loading weights: 51%|█████ | 100/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm1.weight][A
- Loading weights: 51%|█████ | 100/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm1.weight][A
- Loading weights: 52%|█████▏ | 101/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm2.bias] [A
- Loading weights: 52%|█████▏ | 101/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm2.bias][A
- Loading weights: 52%|█████▏ | 102/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm2.weight][A
- Loading weights: 52%|█████▏ | 102/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm2.weight][A
- Loading weights: 53%|█████▎ | 103/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc1.bias] [A
- Loading weights: 53%|█████▎ | 103/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc1.bias][A
- Loading weights: 53%|█████▎ | 104/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc1.weight][A
- Loading weights: 53%|█████▎ | 104/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc1.weight][A
- Loading weights: 54%|█████▎ | 105/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc2.bias] [A
- Loading weights: 54%|█████▎ | 105/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc2.bias][A
- Loading weights: 54%|█████▍ | 106/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc2.weight][A
- Loading weights: 54%|█████▍ | 106/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc2.weight][A
- Loading weights: 55%|█████▍ | 107/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.k_proj.bias][A
- Loading weights: 55%|█████▍ | 107/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.k_proj.bias][A
- Loading weights: 55%|█████▌ | 108/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.k_proj.weight][A
- Loading weights: 55%|█████▌ | 108/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.k_proj.weight][A
- Loading weights: 56%|█████▌ | 109/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.out_proj.bias][A
- Loading weights: 56%|█████▌ | 109/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.out_proj.bias][A
- Loading weights: 56%|█████▌ | 110/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.out_proj.weight][A
- Loading weights: 56%|█████▌ | 110/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.out_proj.weight][A
- Loading weights: 57%|█████▋ | 111/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.q_proj.bias] [A
- Loading weights: 57%|█████▋ | 111/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.q_proj.bias][A
- Loading weights: 57%|█████▋ | 112/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.q_proj.weight][A
- Loading weights: 57%|█████▋ | 112/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.q_proj.weight][A
- Loading weights: 58%|█████▊ | 113/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.v_proj.bias] [A
- Loading weights: 58%|█████▊ | 113/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.v_proj.bias][A
- Loading weights: 58%|█████▊ | 114/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.v_proj.weight][A
- Loading weights: 58%|█████▊ | 114/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.v_proj.weight][A
- Loading weights: 59%|█████▊ | 115/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm1.bias] [A
- Loading weights: 59%|█████▊ | 115/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm1.bias][A
- Loading weights: 59%|█████▉ | 116/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm1.weight][A
- Loading weights: 59%|█████▉ | 116/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm1.weight][A
- Loading weights: 60%|█████▉ | 117/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm2.bias] [A
- Loading weights: 60%|█████▉ | 117/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm2.bias][A
- Loading weights: 60%|██████ | 118/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm2.weight][A
- Loading weights: 60%|██████ | 118/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm2.weight][A
- Loading weights: 61%|██████ | 119/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc1.bias] [A
- Loading weights: 61%|██████ | 119/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc1.bias][A
- Loading weights: 61%|██████ | 120/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc1.weight][A
- Loading weights: 61%|██████ | 120/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc1.weight][A
- Loading weights: 62%|██████▏ | 121/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc2.bias] [A
- Loading weights: 62%|██████▏ | 121/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc2.bias][A
- Loading weights: 62%|██████▏ | 122/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc2.weight][A
- Loading weights: 62%|██████▏ | 122/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc2.weight][A
- Loading weights: 63%|██████▎ | 123/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.k_proj.bias][A
- Loading weights: 63%|██████▎ | 123/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.k_proj.bias][A
- Loading weights: 63%|██████▎ | 124/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.k_proj.weight][A
- Loading weights: 63%|██████▎ | 124/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.k_proj.weight][A
- Loading weights: 64%|██████▍ | 125/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.out_proj.bias][A
- Loading weights: 64%|██████▍ | 125/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.out_proj.bias][A
- Loading weights: 64%|██████▍ | 126/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.out_proj.weight][A
- Loading weights: 64%|██████▍ | 126/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.out_proj.weight][A
- Loading weights: 65%|██████▍ | 127/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.q_proj.bias] [A
- Loading weights: 65%|██████▍ | 127/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.q_proj.bias][A
- Loading weights: 65%|██████▌ | 128/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.q_proj.weight][A
- Loading weights: 65%|██████▌ | 128/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.q_proj.weight][A
- Loading weights: 66%|██████▌ | 129/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.v_proj.bias] [A
- Loading weights: 66%|██████▌ | 129/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.v_proj.bias][A
- Loading weights: 66%|██████▋ | 130/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.v_proj.weight][A
- Loading weights: 66%|██████▋ | 130/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.v_proj.weight][A
- Loading weights: 67%|██████▋ | 131/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm1.bias] [A
- Loading weights: 67%|██████▋ | 131/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm1.bias][A
- Loading weights: 67%|██████▋ | 132/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm1.weight][A
- Loading weights: 67%|██████▋ | 132/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm1.weight][A
- Loading weights: 68%|██████▊ | 133/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm2.bias] [A
- Loading weights: 68%|██████▊ | 133/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm2.bias][A
- Loading weights: 68%|██████▊ | 134/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm2.weight][A
- Loading weights: 68%|██████▊ | 134/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm2.weight][A
- Loading weights: 69%|██████▉ | 135/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc1.bias] [A
- Loading weights: 69%|██████▉ | 135/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc1.bias][A
- Loading weights: 69%|██████▉ | 136/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc1.weight][A
- Loading weights: 69%|██████▉ | 136/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc1.weight][A
- Loading weights: 70%|██████▉ | 137/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc2.bias] [A
- Loading weights: 70%|██████▉ | 137/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc2.bias][A
- Loading weights: 70%|███████ | 138/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc2.weight][A
- Loading weights: 70%|███████ | 138/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc2.weight][A
- Loading weights: 71%|███████ | 139/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.k_proj.bias][A
- Loading weights: 71%|███████ | 139/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.k_proj.bias][A
- Loading weights: 71%|███████▏ | 140/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.k_proj.weight][A
- Loading weights: 71%|███████▏ | 140/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.k_proj.weight][A
- Loading weights: 72%|███████▏ | 141/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.out_proj.bias][A
- Loading weights: 72%|███████▏ | 141/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.out_proj.bias][A
- Loading weights: 72%|███████▏ | 142/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.out_proj.weight][A
- Loading weights: 72%|███████▏ | 142/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.out_proj.weight][A
- Loading weights: 73%|███████▎ | 143/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.q_proj.bias] [A
- Loading weights: 73%|███████▎ | 143/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.q_proj.bias][A
- Loading weights: 73%|███████▎ | 144/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.q_proj.weight][A
- Loading weights: 73%|███████▎ | 144/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.q_proj.weight][A
- Loading weights: 74%|███████▍ | 145/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.v_proj.bias] [A
- Loading weights: 74%|███████▍ | 145/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.v_proj.bias][A
- Loading weights: 74%|███████▍ | 146/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.v_proj.weight][A
- Loading weights: 74%|███████▍ | 146/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.v_proj.weight][A
- Loading weights: 75%|███████▌ | 147/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm1.bias] [A
- Loading weights: 75%|███████▌ | 147/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm1.bias][A
- Loading weights: 76%|███████▌ | 148/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm1.weight][A
- Loading weights: 76%|███████▌ | 148/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm1.weight][A
- Loading weights: 76%|███████▌ | 149/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm2.bias] [A
- Loading weights: 76%|███████▌ | 149/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm2.bias][A
- Loading weights: 77%|███████▋ | 150/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm2.weight][A
- Loading weights: 77%|███████▋ | 150/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm2.weight][A
- Loading weights: 77%|███████▋ | 151/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc1.bias] [A
- Loading weights: 77%|███████▋ | 151/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc1.bias][A
- Loading weights: 78%|███████▊ | 152/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc1.weight][A
- Loading weights: 78%|███████▊ | 152/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc1.weight][A
- Loading weights: 78%|███████▊ | 153/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc2.bias] [A
- Loading weights: 78%|███████▊ | 153/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc2.bias][A
- Loading weights: 79%|███████▊ | 154/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc2.weight][A
- Loading weights: 79%|███████▊ | 154/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc2.weight][A
- Loading weights: 79%|███████▉ | 155/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.k_proj.bias][A
- Loading weights: 79%|███████▉ | 155/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.k_proj.bias][A
- Loading weights: 80%|███████▉ | 156/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.k_proj.weight][A
- Loading weights: 80%|███████▉ | 156/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.k_proj.weight][A
- Loading weights: 80%|████████ | 157/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.out_proj.bias][A
- Loading weights: 80%|████████ | 157/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.out_proj.bias][A
- Loading weights: 81%|████████ | 158/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.out_proj.weight][A
- Loading weights: 81%|████████ | 158/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.out_proj.weight][A
- Loading weights: 81%|████████ | 159/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.q_proj.bias] [A
- Loading weights: 81%|████████ | 159/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.q_proj.bias][A
- Loading weights: 82%|████████▏ | 160/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.q_proj.weight][A
- Loading weights: 82%|████████▏ | 160/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.q_proj.weight][A
- Loading weights: 82%|████████▏ | 161/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.v_proj.bias] [A
- Loading weights: 82%|████████▏ | 161/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.v_proj.bias][A
- Loading weights: 83%|████████▎ | 162/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.v_proj.weight][A
- Loading weights: 83%|████████▎ | 162/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.v_proj.weight][A
- Loading weights: 83%|████████▎ | 163/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm1.bias] [A
- Loading weights: 83%|████████▎ | 163/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm1.bias][A
- Loading weights: 84%|████████▎ | 164/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm1.weight][A
- Loading weights: 84%|████████▎ | 164/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm1.weight][A
- Loading weights: 84%|████████▍ | 165/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm2.bias] [A
- Loading weights: 84%|████████▍ | 165/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm2.bias][A
- Loading weights: 85%|████████▍ | 166/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm2.weight][A
- Loading weights: 85%|████████▍ | 166/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm2.weight][A
- Loading weights: 85%|████████▌ | 167/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc1.bias] [A
- Loading weights: 85%|████████▌ | 167/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc1.bias][A
- Loading weights: 86%|████████▌ | 168/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc1.weight][A
- Loading weights: 86%|████████▌ | 168/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc1.weight][A
- Loading weights: 86%|████████▌ | 169/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc2.bias] [A
- Loading weights: 86%|████████▌ | 169/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc2.bias][A
- Loading weights: 87%|████████▋ | 170/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc2.weight][A
- Loading weights: 87%|████████▋ | 170/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc2.weight][A
- Loading weights: 87%|████████▋ | 171/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.k_proj.bias][A
- Loading weights: 87%|████████▋ | 171/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.k_proj.bias][A
- Loading weights: 88%|████████▊ | 172/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.k_proj.weight][A
- Loading weights: 88%|████████▊ | 172/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.k_proj.weight][A
- Loading weights: 88%|████████▊ | 173/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.out_proj.bias][A
- Loading weights: 88%|████████▊ | 173/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.out_proj.bias][A
- Loading weights: 89%|████████▉ | 174/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.out_proj.weight][A
- Loading weights: 89%|████████▉ | 174/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.out_proj.weight][A
- Loading weights: 89%|████████▉ | 175/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.q_proj.bias] [A
- Loading weights: 89%|████████▉ | 175/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.q_proj.bias][A
- Loading weights: 90%|████████▉ | 176/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.q_proj.weight][A
- Loading weights: 90%|████████▉ | 176/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.q_proj.weight][A
- Loading weights: 90%|█████████ | 177/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.v_proj.bias] [A
- Loading weights: 90%|█████████ | 177/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.v_proj.bias][A
- Loading weights: 91%|█████████ | 178/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.v_proj.weight][A
- Loading weights: 91%|█████████ | 178/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.v_proj.weight][A
- Loading weights: 91%|█████████▏| 179/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm1.bias] [A
- Loading weights: 91%|█████████▏| 179/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm1.bias][A
- Loading weights: 92%|█████████▏| 180/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm1.weight][A
- Loading weights: 92%|█████████▏| 180/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm1.weight][A
- Loading weights: 92%|█████████▏| 181/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm2.bias] [A
- Loading weights: 92%|█████████▏| 181/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm2.bias][A
- Loading weights: 93%|█████████▎| 182/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm2.weight][A
- Loading weights: 93%|█████████▎| 182/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm2.weight][A
- Loading weights: 93%|█████████▎| 183/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc1.bias] [A
- Loading weights: 93%|█████████▎| 183/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc1.bias][A
- Loading weights: 94%|█████████▍| 184/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc1.weight][A
- Loading weights: 94%|█████████▍| 184/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc1.weight][A
- Loading weights: 94%|█████████▍| 185/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc2.bias] [A
- Loading weights: 94%|█████████▍| 185/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc2.bias][A
- Loading weights: 95%|█████████▍| 186/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc2.weight][A
- Loading weights: 95%|█████████▍| 186/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc2.weight][A
- Loading weights: 95%|█████████▌| 187/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.k_proj.bias][A
- Loading weights: 95%|█████████▌| 187/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.k_proj.bias][A
- Loading weights: 96%|█████████▌| 188/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.k_proj.weight][A
- Loading weights: 96%|█████████▌| 188/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.k_proj.weight][A
- Loading weights: 96%|█████████▋| 189/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.out_proj.bias][A
- Loading weights: 96%|█████████▋| 189/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.out_proj.bias][A
- Loading weights: 97%|█████████▋| 190/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.out_proj.weight][A
- Loading weights: 97%|█████████▋| 190/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.out_proj.weight][A
- Loading weights: 97%|█████████▋| 191/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.q_proj.bias] [A
- Loading weights: 97%|█████████▋| 191/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.q_proj.bias][A
- Loading weights: 98%|█████████▊| 192/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.q_proj.weight][A
- Loading weights: 98%|█████████▊| 192/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.q_proj.weight][A
- Loading weights: 98%|█████████▊| 193/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.v_proj.bias] [A
- Loading weights: 98%|█████████▊| 193/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.v_proj.bias][A
- Loading weights: 99%|█████████▉| 194/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.v_proj.weight][A
- Loading weights: 99%|█████████▉| 194/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.v_proj.weight][A
- Loading weights: 99%|█████████▉| 195/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.final_layer_norm.bias] [A
- Loading weights: 99%|█████████▉| 195/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.final_layer_norm.bias][A
- Loading weights: 100%|██████████| 196/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.final_layer_norm.weight][A
- Loading weights: 100%|██████████| 196/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.final_layer_norm.weight][A
- Loading weights: 100%|██████████| 196/196 [00:00<00:00, 1124.60it/s, Materializing param=text_model.final_layer_norm.weight]
- Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00, 6.14it/s]
- Traceback (most recent call last):
- File "/home/sayak/diffusers/check_group_offloading.py", line 16, in <module>
- pipe.transformer.enable_group_offload(
- File "/home/sayak/diffusers/src/diffusers/models/modeling_utils.py", line 571, in enable_group_offload
- apply_group_offloading(
- File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 612, in apply_group_offloading
- _apply_group_offloading(module, config)
- File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 619, in _apply_group_offloading
- _apply_group_offloading_leaf_level(module, config)
- File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 737, in _apply_group_offloading_leaf_level
- group = ModuleGroup(
- ^^^^^^^^^^^^
- File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 119, in __init__
- self.cpu_param_dict = self._init_cpu_param_dict()
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
- File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 134, in _init_cpu_param_dict
- cpu_param_dict[param] = param.data.cpu() if self.low_cpu_mem_usage else param.data.cpu().pin_memory()
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- File "/home/sayak/ao/torchao/utils.py", line 684, in _dispatch__torch_dispatch__
- raise NotImplementedError(
- NotImplementedError: NVFP4Tensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.is_pinned', overload='default')>, types=(<class 'torchao.prototype.mx_formats.nvfp4_tensor.NVFP4Tensor'>,), arg_types=(<class 'torchao.prototype.mx_formats.nvfp4_tensor.NVFP4Tensor'>,), kwarg_types={}