Untitled

a guest
Mar 9th, 2026
text 127.71 KB | None | 0 0
  1.  
  2. Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]
  3. Loading pipeline components...: 43%|████▎ | 3/7 [00:00<00:00, 24.21it/s]
  4.  
  5. Loading weights: 0%| | 0/219 [00:00<?, ?it/s]
  6.  
  7. Loading weights: 0%| | 1/219 [00:00<00:00, 3387.97it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.k.weight]
  8.  
  9. Loading weights: 0%| | 1/219 [00:00<00:00, 1611.95it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.k.weight]
  10.  
  11. Loading weights: 1%| | 2/219 [00:00<00:00, 1119.08it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.o.weight]
  12.  
  13. Loading weights: 1%| | 2/219 [00:00<00:00, 943.07it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.o.weight] 
  14.  
  15. Loading weights: 1%|▏ | 3/219 [00:00<00:00, 439.55it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.q.weight]
  16.  
  17. Loading weights: 1%|▏ | 3/219 [00:00<00:00, 413.19it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.q.weight]
  18.  
  19. Loading weights: 2%|▏ | 4/219 [00:00<00:00, 503.81it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight]
  20.  
  21. Loading weights: 2%|▏ | 4/219 [00:00<00:00, 489.80it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight]
  22.  
  23. Loading weights: 2%|▏ | 5/219 [00:00<00:00, 515.03it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.v.weight] 
  24.  
  25. Loading weights: 2%|▏ | 5/219 [00:00<00:00, 498.34it/s, Materializing param=encoder.block.0.layer.0.SelfAttention.v.weight]
  26.  
  27. Loading weights: 3%|▎ | 6/219 [00:00<00:00, 562.91it/s, Materializing param=encoder.block.0.layer.0.layer_norm.weight] 
  28.  
  29. Loading weights: 3%|▎ | 6/219 [00:00<00:00, 555.50it/s, Materializing param=encoder.block.0.layer.0.layer_norm.weight]
  30.  
  31. Loading weights: 3%|▎ | 7/219 [00:00<00:00, 637.39it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wi_0.weight]
  32.  
  33. Loading weights: 3%|▎ | 7/219 [00:00<00:00, 633.12it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wi_0.weight]
  34.  
  35. Loading weights: 4%|▎ | 8/219 [00:00<00:00, 704.04it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wi_1.weight]
  36.  
  37. Loading weights: 4%|▎ | 8/219 [00:00<00:00, 699.40it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wi_1.weight]
  38.  
  39. Loading weights: 4%|▍ | 9/219 [00:00<00:00, 776.26it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wo.weight] 
  40.  
  41. Loading weights: 4%|▍ | 9/219 [00:00<00:00, 771.77it/s, Materializing param=encoder.block.0.layer.1.DenseReluDense.wo.weight]
  42.  
  43. Loading weights: 5%|▍ | 10/219 [00:00<00:00, 444.82it/s, Materializing param=encoder.block.0.layer.1.layer_norm.weight] 
  44.  
  45. Loading weights: 5%|▍ | 10/219 [00:00<00:00, 441.45it/s, Materializing param=encoder.block.0.layer.1.layer_norm.weight]
  46.  
  47. Loading weights: 5%|▌ | 11/219 [00:00<00:00, 481.67it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.k.weight]
  48.  
  49. Loading weights: 5%|▌ | 11/219 [00:00<00:00, 480.10it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.k.weight]
  50.  
  51. Loading weights: 5%|▌ | 12/219 [00:00<00:00, 516.27it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.o.weight]
  52.  
  53. Loading weights: 5%|▌ | 12/219 [00:00<00:00, 514.66it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.o.weight]
  54.  
  55. Loading weights: 6%|▌ | 13/219 [00:00<00:00, 553.21it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.q.weight]
  56.  
  57. Loading weights: 6%|▌ | 13/219 [00:00<00:00, 551.56it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.q.weight]
  58.  
  59. Loading weights: 6%|▋ | 14/219 [00:00<00:00, 590.45it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.v.weight]
  60.  
  61. Loading weights: 6%|▋ | 14/219 [00:00<00:00, 588.87it/s, Materializing param=encoder.block.1.layer.0.SelfAttention.v.weight]
  62.  
  63. Loading weights: 7%|▋ | 15/219 [00:00<00:00, 627.49it/s, Materializing param=encoder.block.1.layer.0.layer_norm.weight] 
  64.  
  65. Loading weights: 7%|▋ | 15/219 [00:00<00:00, 625.63it/s, Materializing param=encoder.block.1.layer.0.layer_norm.weight]
  66.  
  67. Loading weights: 7%|▋ | 16/219 [00:00<00:00, 663.70it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wi_0.weight]
  68.  
  69. Loading weights: 7%|▋ | 16/219 [00:00<00:00, 661.53it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wi_0.weight]
  70.  
  71. Loading weights: 8%|▊ | 17/219 [00:00<00:00, 699.15it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wi_1.weight]
  72.  
  73. Loading weights: 8%|▊ | 17/219 [00:00<00:00, 696.99it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wi_1.weight]
  74.  
  75. Loading weights: 8%|▊ | 18/219 [00:00<00:00, 734.26it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wo.weight] 
  76.  
  77. Loading weights: 8%|▊ | 18/219 [00:00<00:00, 732.27it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wo.weight]
  78.  
  79. Loading weights: 9%|▊ | 19/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.1.layer.1.DenseReluDense.wo.weight]
  80.  
  81. Loading weights: 9%|▊ | 19/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.1.layer.1.layer_norm.weight] 
  82.  
  83. Loading weights: 9%|▊ | 19/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.1.layer.1.layer_norm.weight]
  84.  
  85. Loading weights: 9%|▉ | 20/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.k.weight]
  86.  
  87. Loading weights: 9%|▉ | 20/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.k.weight]
  88.  
  89. Loading weights: 10%|▉ | 21/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.o.weight]
  90.  
  91. Loading weights: 10%|▉ | 21/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.o.weight]
  92.  
  93. Loading weights: 10%|█ | 22/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.q.weight]
  94.  
  95. Loading weights: 10%|█ | 22/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.q.weight]
  96.  
  97. Loading weights: 11%|█ | 23/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.v.weight]
  98.  
  99. Loading weights: 11%|█ | 23/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.SelfAttention.v.weight]
  100.  
  101. Loading weights: 11%|█ | 24/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.layer_norm.weight] 
  102.  
  103. Loading weights: 11%|█ | 24/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.0.layer_norm.weight]
  104.  
  105. Loading weights: 11%|█▏ | 25/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wi_0.weight]
  106.  
  107. Loading weights: 11%|█▏ | 25/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wi_0.weight]
  108.  
  109. Loading weights: 12%|█▏ | 26/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wi_1.weight]
  110.  
  111. Loading weights: 12%|█▏ | 26/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wi_1.weight]
  112.  
  113. Loading weights: 12%|█▏ | 27/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wo.weight] 
  114.  
  115. Loading weights: 12%|█▏ | 27/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.DenseReluDense.wo.weight]
  116.  
  117. Loading weights: 13%|█▎ | 28/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.layer_norm.weight] 
  118.  
  119. Loading weights: 13%|█▎ | 28/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.2.layer.1.layer_norm.weight]
  120.  
  121. Loading weights: 13%|█▎ | 29/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.k.weight]
  122.  
  123. Loading weights: 13%|█▎ | 29/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.k.weight]
  124.  
  125. Loading weights: 14%|█▎ | 30/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.o.weight]
  126.  
  127. Loading weights: 14%|█▎ | 30/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.o.weight]
  128.  
  129. Loading weights: 14%|█▍ | 31/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.q.weight]
  130.  
  131. Loading weights: 14%|█▍ | 31/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.q.weight]
  132.  
  133. Loading weights: 15%|█▍ | 32/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.v.weight]
  134.  
  135. Loading weights: 15%|█▍ | 32/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.SelfAttention.v.weight]
  136.  
  137. Loading weights: 15%|█▌ | 33/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.layer_norm.weight] 
  138.  
  139. Loading weights: 15%|█▌ | 33/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.0.layer_norm.weight]
  140.  
  141. Loading weights: 16%|█▌ | 34/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wi_0.weight]
  142.  
  143. Loading weights: 16%|█▌ | 34/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wi_0.weight]
  144.  
  145. Loading weights: 16%|█▌ | 35/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wi_1.weight]
  146.  
  147. Loading weights: 16%|█▌ | 35/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wi_1.weight]
  148.  
  149. Loading weights: 16%|█▋ | 36/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wo.weight] 
  150.  
  151. Loading weights: 16%|█▋ | 36/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.DenseReluDense.wo.weight]
  152.  
  153. Loading weights: 17%|█▋ | 37/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.layer_norm.weight] 
  154.  
  155. Loading weights: 17%|█▋ | 37/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.3.layer.1.layer_norm.weight]
  156.  
  157. Loading weights: 17%|█▋ | 38/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.k.weight]
  158.  
  159. Loading weights: 17%|█▋ | 38/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.k.weight]
  160.  
  161. Loading weights: 18%|█▊ | 39/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.o.weight]
  162.  
  163. Loading weights: 18%|█▊ | 39/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.o.weight]
  164.  
  165. Loading weights: 18%|█▊ | 40/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.q.weight]
  166.  
  167. Loading weights: 18%|█▊ | 40/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.q.weight]
  168.  
  169. Loading weights: 19%|█▊ | 41/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.v.weight]
  170.  
  171. Loading weights: 19%|█▊ | 41/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.SelfAttention.v.weight]
  172.  
  173. Loading weights: 19%|█▉ | 42/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.layer_norm.weight] 
  174.  
  175. Loading weights: 19%|█▉ | 42/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.0.layer_norm.weight]
  176.  
  177. Loading weights: 20%|█▉ | 43/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wi_0.weight]
  178.  
  179. Loading weights: 20%|█▉ | 43/219 [00:00<00:01, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wi_0.weight]
  180.  
  181. Loading weights: 20%|██ | 44/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wi_1.weight]
  182.  
  183. Loading weights: 20%|██ | 44/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wi_1.weight]
  184.  
  185. Loading weights: 21%|██ | 45/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wo.weight] 
  186.  
  187. Loading weights: 21%|██ | 45/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.DenseReluDense.wo.weight]
  188.  
  189. Loading weights: 21%|██ | 46/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.layer_norm.weight] 
  190.  
  191. Loading weights: 21%|██ | 46/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.4.layer.1.layer_norm.weight]
  192.  
  193. Loading weights: 21%|██▏ | 47/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.k.weight]
  194.  
  195. Loading weights: 21%|██▏ | 47/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.k.weight]
  196.  
  197. Loading weights: 22%|██▏ | 48/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.o.weight]
  198.  
  199. Loading weights: 22%|██▏ | 48/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.o.weight]
  200.  
  201. Loading weights: 22%|██▏ | 49/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.q.weight]
  202.  
  203. Loading weights: 22%|██▏ | 49/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.q.weight]
  204.  
  205. Loading weights: 23%|██▎ | 50/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.v.weight]
  206.  
  207. Loading weights: 23%|██▎ | 50/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.SelfAttention.v.weight]
  208.  
  209. Loading weights: 23%|██▎ | 51/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.layer_norm.weight] 
  210.  
  211. Loading weights: 23%|██▎ | 51/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.0.layer_norm.weight]
  212.  
  213. Loading weights: 24%|██▎ | 52/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wi_0.weight]
  214.  
  215. Loading weights: 24%|██▎ | 52/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wi_0.weight]
  216.  
  217. Loading weights: 24%|██▍ | 53/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wi_1.weight]
  218.  
  219. Loading weights: 24%|██▍ | 53/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wi_1.weight]
  220.  
  221. Loading weights: 25%|██▍ | 54/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wo.weight] 
  222.  
  223. Loading weights: 25%|██▍ | 54/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.DenseReluDense.wo.weight]
  224.  
  225. Loading weights: 25%|██▌ | 55/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.layer_norm.weight] 
  226.  
  227. Loading weights: 25%|██▌ | 55/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.5.layer.1.layer_norm.weight]
  228.  
  229. Loading weights: 26%|██▌ | 56/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.k.weight]
  230.  
  231. Loading weights: 26%|██▌ | 56/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.k.weight]
  232.  
  233. Loading weights: 26%|██▌ | 57/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.o.weight]
  234.  
  235. Loading weights: 26%|██▌ | 57/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.o.weight]
  236.  
  237. Loading weights: 26%|██▋ | 58/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.q.weight]
  238.  
  239. Loading weights: 26%|██▋ | 58/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.q.weight]
  240.  
  241. Loading weights: 27%|██▋ | 59/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.v.weight]
  242.  
  243. Loading weights: 27%|██▋ | 59/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.SelfAttention.v.weight]
  244.  
  245. Loading weights: 27%|██▋ | 60/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.layer_norm.weight] 
  246.  
  247. Loading weights: 27%|██▋ | 60/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.0.layer_norm.weight]
  248.  
  249. Loading weights: 28%|██▊ | 61/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wi_0.weight]
  250.  
  251. Loading weights: 28%|██▊ | 61/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wi_0.weight]
  252.  
  253. Loading weights: 28%|██▊ | 62/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wi_1.weight]
  254.  
  255. Loading weights: 28%|██▊ | 62/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wi_1.weight]
  256.  
  257. Loading weights: 29%|██▉ | 63/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wo.weight] 
  258.  
  259. Loading weights: 29%|██▉ | 63/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.DenseReluDense.wo.weight]
  260.  
  261. Loading weights: 29%|██▉ | 64/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.layer_norm.weight] 
  262.  
  263. Loading weights: 29%|██▉ | 64/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.6.layer.1.layer_norm.weight]
  264.  
  265. Loading weights: 30%|██▉ | 65/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.k.weight]
  266.  
  267. Loading weights: 30%|██▉ | 65/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.k.weight]
  268.  
  269. Loading weights: 30%|███ | 66/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.o.weight]
  270.  
  271. Loading weights: 30%|███ | 66/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.o.weight]
  272.  
  273. Loading weights: 31%|███ | 67/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.q.weight]
  274.  
  275. Loading weights: 31%|███ | 67/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.q.weight]
  276.  
  277. Loading weights: 31%|███ | 68/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.v.weight]
  278.  
  279. Loading weights: 31%|███ | 68/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.SelfAttention.v.weight]
  280.  
  281. Loading weights: 32%|███▏ | 69/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.layer_norm.weight] 
  282.  
  283. Loading weights: 32%|███▏ | 69/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.0.layer_norm.weight]
  284.  
  285. Loading weights: 32%|███▏ | 70/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wi_0.weight]
  286.  
  287. Loading weights: 32%|███▏ | 70/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wi_0.weight]
  288.  
  289. Loading weights: 32%|███▏ | 71/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wi_1.weight]
  290.  
  291. Loading weights: 32%|███▏ | 71/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wi_1.weight]
  292.  
  293. Loading weights: 33%|███▎ | 72/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wo.weight] 
  294.  
  295. Loading weights: 33%|███▎ | 72/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.DenseReluDense.wo.weight]
  296.  
  297. Loading weights: 33%|███▎ | 73/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.layer_norm.weight] 
  298.  
  299. Loading weights: 33%|███▎ | 73/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.7.layer.1.layer_norm.weight]
  300.  
  301. Loading weights: 34%|███▍ | 74/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.k.weight]
  302.  
  303. Loading weights: 34%|███▍ | 74/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.k.weight]
  304.  
  305. Loading weights: 34%|███▍ | 75/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.o.weight]
  306.  
  307. Loading weights: 34%|███▍ | 75/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.o.weight]
  308.  
  309. Loading weights: 35%|███▍ | 76/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.q.weight]
  310.  
  311. Loading weights: 35%|███▍ | 76/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.q.weight]
  312.  
  313. Loading weights: 35%|███▌ | 77/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.v.weight]
  314.  
  315. Loading weights: 35%|███▌ | 77/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.SelfAttention.v.weight]
  316.  
  317. Loading weights: 36%|███▌ | 78/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.layer_norm.weight] 
  318.  
  319. Loading weights: 36%|███▌ | 78/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.0.layer_norm.weight]
  320.  
  321. Loading weights: 36%|███▌ | 79/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wi_0.weight]
  322.  
  323. Loading weights: 36%|███▌ | 79/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wi_0.weight]
  324.  
  325. Loading weights: 37%|███▋ | 80/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wi_1.weight]
  326.  
  327. Loading weights: 37%|███▋ | 80/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wi_1.weight]
  328.  
  329. Loading weights: 37%|███▋ | 81/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wo.weight] 
  330.  
  331. Loading weights: 37%|███▋ | 81/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.DenseReluDense.wo.weight]
  332.  
  333. Loading weights: 37%|███▋ | 82/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.layer_norm.weight] 
  334.  
  335. Loading weights: 37%|███▋ | 82/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.8.layer.1.layer_norm.weight]
  336.  
  337. Loading weights: 38%|███▊ | 83/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.k.weight]
  338.  
  339. Loading weights: 38%|███▊ | 83/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.k.weight]
  340.  
  341. Loading weights: 38%|███▊ | 84/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.o.weight]
  342.  
  343. Loading weights: 38%|███▊ | 84/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.o.weight]
  344.  
  345. Loading weights: 39%|███▉ | 85/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.q.weight]
  346.  
  347. Loading weights: 39%|███▉ | 85/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.q.weight]
  348.  
  349. Loading weights: 39%|███▉ | 86/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.v.weight]
  350.  
  351. Loading weights: 39%|███▉ | 86/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.SelfAttention.v.weight]
  352.  
  353. Loading weights: 40%|███▉ | 87/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.layer_norm.weight] 
  354.  
  355. Loading weights: 40%|███▉ | 87/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.0.layer_norm.weight]
  356.  
  357. Loading weights: 40%|████ | 88/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wi_0.weight]
  358.  
  359. Loading weights: 40%|████ | 88/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wi_0.weight]
  360.  
  361. Loading weights: 41%|████ | 89/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wi_1.weight]
  362.  
  363. Loading weights: 41%|████ | 89/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wi_1.weight]
  364.  
  365. Loading weights: 41%|████ | 90/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wo.weight] 
  366.  
  367. Loading weights: 41%|████ | 90/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.DenseReluDense.wo.weight]
  368.  
  369. Loading weights: 42%|████▏ | 91/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.layer_norm.weight] 
  370.  
  371. Loading weights: 42%|████▏ | 91/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.9.layer.1.layer_norm.weight]
  372.  
  373. Loading weights: 42%|████▏ | 92/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.k.weight]
  374.  
  375. Loading weights: 42%|████▏ | 92/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.k.weight]
  376.  
  377. Loading weights: 42%|████▏ | 93/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.o.weight]
  378.  
  379. Loading weights: 42%|████▏ | 93/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.o.weight]
  380.  
  381. Loading weights: 43%|████▎ | 94/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.q.weight]
  382.  
  383. Loading weights: 43%|████▎ | 94/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.q.weight]
  384.  
  385. Loading weights: 43%|████▎ | 95/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.v.weight]
  386.  
  387. Loading weights: 43%|████▎ | 95/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.SelfAttention.v.weight]
  388.  
  389. Loading weights: 44%|████▍ | 96/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.layer_norm.weight] 
  390.  
  391. Loading weights: 44%|████▍ | 96/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.0.layer_norm.weight]
  392.  
  393. Loading weights: 44%|████▍ | 97/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wi_0.weight]
  394.  
  395. Loading weights: 44%|████▍ | 97/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wi_0.weight]
  396.  
  397. Loading weights: 45%|████▍ | 98/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wi_1.weight]
  398.  
  399. Loading weights: 45%|████▍ | 98/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wi_1.weight]
  400.  
  401. Loading weights: 45%|████▌ | 99/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wo.weight] 
  402.  
  403. Loading weights: 45%|████▌ | 99/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.DenseReluDense.wo.weight]
  404.  
  405. Loading weights: 46%|████▌ | 100/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.layer_norm.weight] 
  406.  
  407. Loading weights: 46%|████▌ | 100/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.10.layer.1.layer_norm.weight]
  408.  
  409. Loading weights: 46%|████▌ | 101/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.k.weight]
  410.  
  411. Loading weights: 46%|████▌ | 101/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.k.weight]
  412.  
  413. Loading weights: 47%|████▋ | 102/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.o.weight]
  414.  
  415. Loading weights: 47%|████▋ | 102/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.o.weight]
  416.  
  417. Loading weights: 47%|████▋ | 103/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.q.weight]
  418.  
  419. Loading weights: 47%|████▋ | 103/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.q.weight]
  420.  
  421. Loading weights: 47%|████▋ | 104/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.v.weight]
  422.  
  423. Loading weights: 47%|████▋ | 104/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.SelfAttention.v.weight]
  424.  
  425. Loading weights: 48%|████▊ | 105/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.layer_norm.weight] 
  426.  
  427. Loading weights: 48%|████▊ | 105/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.0.layer_norm.weight]
  428.  
  429. Loading weights: 48%|████▊ | 106/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wi_0.weight]
  430.  
  431. Loading weights: 48%|████▊ | 106/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wi_0.weight]
  432.  
  433. Loading weights: 49%|████▉ | 107/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wi_1.weight]
  434.  
  435. Loading weights: 49%|████▉ | 107/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wi_1.weight]
  436.  
  437. Loading weights: 49%|████▉ | 108/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wo.weight] 
  438.  
  439. Loading weights: 49%|████▉ | 108/219 [00:00<00:00, 175.35it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wo.weight]
  440.  
  441. Loading weights: 50%|████▉ | 109/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.11.layer.1.DenseReluDense.wo.weight]
  442.  
  443. Loading weights: 50%|████▉ | 109/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.11.layer.1.layer_norm.weight] 
  444.  
  445. Loading weights: 50%|████▉ | 109/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.11.layer.1.layer_norm.weight]
  446.  
  447. Loading weights: 50%|█████ | 110/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.k.weight]
  448.  
  449. Loading weights: 50%|█████ | 110/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.k.weight]
  450.  
  451. Loading weights: 51%|█████ | 111/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.o.weight]
  452.  
  453. Loading weights: 51%|█████ | 111/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.o.weight]
  454.  
  455. Loading weights: 51%|█████ | 112/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.q.weight]
  456.  
  457. Loading weights: 51%|█████ | 112/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.q.weight]
  458.  
  459. Loading weights: 52%|█████▏ | 113/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.v.weight]
  460.  
  461. Loading weights: 52%|█████▏ | 113/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.SelfAttention.v.weight]
  462.  
  463. Loading weights: 52%|█████▏ | 114/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.layer_norm.weight] 
  464.  
  465. Loading weights: 52%|█████▏ | 114/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.0.layer_norm.weight]
  466.  
  467. Loading weights: 53%|█████▎ | 115/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wi_0.weight]
  468.  
  469. Loading weights: 53%|█████▎ | 115/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wi_0.weight]
  470.  
  471. Loading weights: 53%|█████▎ | 116/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wi_1.weight]
  472.  
  473. Loading weights: 53%|█████▎ | 116/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wi_1.weight]
  474.  
  475. Loading weights: 53%|█████▎ | 117/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wo.weight] 
  476.  
  477. Loading weights: 53%|█████▎ | 117/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.DenseReluDense.wo.weight]
  478.  
  479. Loading weights: 54%|█████▍ | 118/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.layer_norm.weight] 
  480.  
  481. Loading weights: 54%|█████▍ | 118/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.12.layer.1.layer_norm.weight]
  482.  
  483. Loading weights: 54%|█████▍ | 119/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.k.weight]
  484.  
  485. Loading weights: 54%|█████▍ | 119/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.k.weight]
  486.  
  487. Loading weights: 55%|█████▍ | 120/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.o.weight]
  488.  
  489. Loading weights: 55%|█████▍ | 120/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.o.weight]
  490.  
  491. Loading weights: 55%|█████▌ | 121/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.q.weight]
  492.  
  493. Loading weights: 55%|█████▌ | 121/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.q.weight]
  494.  
  495. Loading weights: 56%|█████▌ | 122/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.v.weight]
  496.  
  497. Loading weights: 56%|█████▌ | 122/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.SelfAttention.v.weight]
  498.  
  499. Loading weights: 56%|█████▌ | 123/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.layer_norm.weight] 
  500.  
  501. Loading weights: 56%|█████▌ | 123/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.0.layer_norm.weight]
  502.  
  503. Loading weights: 57%|█████▋ | 124/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wi_0.weight]
  504.  
  505. Loading weights: 57%|█████▋ | 124/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wi_0.weight]
  506.  
  507. Loading weights: 57%|█████▋ | 125/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wi_1.weight]
  508.  
  509. Loading weights: 57%|█████▋ | 125/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wi_1.weight]
  510.  
  511. Loading weights: 58%|█████▊ | 126/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wo.weight] 
  512.  
  513. Loading weights: 58%|█████▊ | 126/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.DenseReluDense.wo.weight]
  514.  
  515. Loading weights: 58%|█████▊ | 127/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.layer_norm.weight] 
  516.  
  517. Loading weights: 58%|█████▊ | 127/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.13.layer.1.layer_norm.weight]
  518.  
  519. Loading weights: 58%|█████▊ | 128/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.k.weight]
  520.  
  521. Loading weights: 58%|█████▊ | 128/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.k.weight]
  522.  
  523. Loading weights: 59%|█████▉ | 129/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.o.weight]
  524.  
  525. Loading weights: 59%|█████▉ | 129/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.o.weight]
  526.  
  527. Loading weights: 59%|█████▉ | 130/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.q.weight]
  528.  
  529. Loading weights: 59%|█████▉ | 130/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.q.weight]
  530.  
  531. Loading weights: 60%|█████▉ | 131/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.v.weight]
  532.  
  533. Loading weights: 60%|█████▉ | 131/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.SelfAttention.v.weight]
  534.  
  535. Loading weights: 60%|██████ | 132/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.layer_norm.weight] 
  536.  
  537. Loading weights: 60%|██████ | 132/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.0.layer_norm.weight]
  538.  
  539. Loading weights: 61%|██████ | 133/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wi_0.weight]
  540.  
  541. Loading weights: 61%|██████ | 133/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wi_0.weight]
  542.  
  543. Loading weights: 61%|██████ | 134/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wi_1.weight]
  544.  
  545. Loading weights: 61%|██████ | 134/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wi_1.weight]
  546.  
  547. Loading weights: 62%|██████▏ | 135/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wo.weight] 
  548.  
  549. Loading weights: 62%|██████▏ | 135/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.DenseReluDense.wo.weight]
  550.  
  551. Loading weights: 62%|██████▏ | 136/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.layer_norm.weight] 
  552.  
  553. Loading weights: 62%|██████▏ | 136/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.14.layer.1.layer_norm.weight]
  554.  
  555. Loading weights: 63%|██████▎ | 137/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.k.weight]
  556.  
  557. Loading weights: 63%|██████▎ | 137/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.k.weight]
  558.  
  559. Loading weights: 63%|██████▎ | 138/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.o.weight]
  560.  
  561. Loading weights: 63%|██████▎ | 138/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.o.weight]
  562.  
  563. Loading weights: 63%|██████▎ | 139/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.q.weight]
  564.  
  565. Loading weights: 63%|██████▎ | 139/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.q.weight]
  566.  
  567. Loading weights: 64%|██████▍ | 140/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.v.weight]
  568.  
  569. Loading weights: 64%|██████▍ | 140/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.SelfAttention.v.weight]
  570.  
  571. Loading weights: 64%|██████▍ | 141/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.layer_norm.weight] 
  572.  
  573. Loading weights: 64%|██████▍ | 141/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.0.layer_norm.weight]
  574.  
  575. Loading weights: 65%|██████▍ | 142/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wi_0.weight]
  576.  
  577. Loading weights: 65%|██████▍ | 142/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wi_0.weight]
  578.  
  579. Loading weights: 65%|██████▌ | 143/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wi_1.weight]
  580.  
  581. Loading weights: 65%|██████▌ | 143/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wi_1.weight]
  582.  
  583. Loading weights: 66%|██████▌ | 144/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wo.weight] 
  584.  
  585. Loading weights: 66%|██████▌ | 144/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.DenseReluDense.wo.weight]
  586.  
  587. Loading weights: 66%|██████▌ | 145/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.layer_norm.weight] 
  588.  
  589. Loading weights: 66%|██████▌ | 145/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.15.layer.1.layer_norm.weight]
  590.  
  591. Loading weights: 67%|██████▋ | 146/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.k.weight]
  592.  
  593. Loading weights: 67%|██████▋ | 146/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.k.weight]
  594.  
  595. Loading weights: 67%|██████▋ | 147/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.o.weight]
  596.  
  597. Loading weights: 67%|██████▋ | 147/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.o.weight]
  598.  
  599. Loading weights: 68%|██████▊ | 148/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.q.weight]
  600.  
  601. Loading weights: 68%|██████▊ | 148/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.q.weight]
  602.  
  603. Loading weights: 68%|██████▊ | 149/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.v.weight]
  604.  
  605. Loading weights: 68%|██████▊ | 149/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.SelfAttention.v.weight]
  606.  
  607. Loading weights: 68%|██████▊ | 150/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.layer_norm.weight] 
  608.  
  609. Loading weights: 68%|██████▊ | 150/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.0.layer_norm.weight]
  610.  
  611. Loading weights: 69%|██████▉ | 151/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wi_0.weight]
  612.  
  613. Loading weights: 69%|██████▉ | 151/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wi_0.weight]
  614.  
  615. Loading weights: 69%|██████▉ | 152/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wi_1.weight]
  616.  
  617. Loading weights: 69%|██████▉ | 152/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wi_1.weight]
  618.  
  619. Loading weights: 70%|██████▉ | 153/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wo.weight] 
  620.  
  621. Loading weights: 70%|██████▉ | 153/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.DenseReluDense.wo.weight]
  622.  
  623. Loading weights: 70%|███████ | 154/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.layer_norm.weight] 
  624.  
  625. Loading weights: 70%|███████ | 154/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.16.layer.1.layer_norm.weight]
  626.  
  627. Loading weights: 71%|███████ | 155/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.k.weight]
  628.  
  629. Loading weights: 71%|███████ | 155/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.k.weight]
  630.  
  631. Loading weights: 71%|███████ | 156/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.o.weight]
  632.  
  633. Loading weights: 71%|███████ | 156/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.o.weight]
  634.  
  635. Loading weights: 72%|███████▏ | 157/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.q.weight]
  636.  
  637. Loading weights: 72%|███████▏ | 157/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.q.weight]
  638.  
  639. Loading weights: 72%|███████▏ | 158/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.v.weight]
  640.  
  641. Loading weights: 72%|███████▏ | 158/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.SelfAttention.v.weight]
  642.  
  643. Loading weights: 73%|███████▎ | 159/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.layer_norm.weight] 
  644.  
  645. Loading weights: 73%|███████▎ | 159/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.0.layer_norm.weight]
  646.  
  647. Loading weights: 73%|███████▎ | 160/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wi_0.weight]
  648.  
  649. Loading weights: 73%|███████▎ | 160/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wi_0.weight]
  650.  
  651. Loading weights: 74%|███████▎ | 161/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wi_1.weight]
  652.  
  653. Loading weights: 74%|███████▎ | 161/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wi_1.weight]
  654.  
  655. Loading weights: 74%|███████▍ | 162/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wo.weight] 
  656.  
  657. Loading weights: 74%|███████▍ | 162/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.DenseReluDense.wo.weight]
  658.  
  659. Loading weights: 74%|███████▍ | 163/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.layer_norm.weight] 
  660.  
  661. Loading weights: 74%|███████▍ | 163/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.17.layer.1.layer_norm.weight]
  662.  
  663. Loading weights: 75%|███████▍ | 164/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.k.weight]
  664.  
  665. Loading weights: 75%|███████▍ | 164/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.k.weight]
  666.  
  667. Loading weights: 75%|███████▌ | 165/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.o.weight]
  668.  
  669. Loading weights: 75%|███████▌ | 165/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.o.weight]
  670.  
  671. Loading weights: 76%|███████▌ | 166/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.q.weight]
  672.  
  673. Loading weights: 76%|███████▌ | 166/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.q.weight]
  674.  
  675. Loading weights: 76%|███████▋ | 167/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.v.weight]
  676.  
  677. Loading weights: 76%|███████▋ | 167/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.SelfAttention.v.weight]
  678.  
  679. Loading weights: 77%|███████▋ | 168/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.layer_norm.weight] 
  680.  
  681. Loading weights: 77%|███████▋ | 168/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.0.layer_norm.weight]
  682.  
  683. Loading weights: 77%|███████▋ | 169/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wi_0.weight]
  684.  
  685. Loading weights: 77%|███████▋ | 169/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wi_0.weight]
  686.  
  687. Loading weights: 78%|███████▊ | 170/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wi_1.weight]
  688.  
  689. Loading weights: 78%|███████▊ | 170/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wi_1.weight]
  690.  
  691. Loading weights: 78%|███████▊ | 171/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wo.weight] 
  692.  
  693. Loading weights: 78%|███████▊ | 171/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.DenseReluDense.wo.weight]
  694.  
  695. Loading weights: 79%|███████▊ | 172/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.layer_norm.weight] 
  696.  
  697. Loading weights: 79%|███████▊ | 172/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.18.layer.1.layer_norm.weight]
  698.  
  699. Loading weights: 79%|███████▉ | 173/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.k.weight]
  700.  
  701. Loading weights: 79%|███████▉ | 173/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.k.weight]
  702.  
  703. Loading weights: 79%|███████▉ | 174/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.o.weight]
  704.  
  705. Loading weights: 79%|███████▉ | 174/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.o.weight]
  706.  
  707. Loading weights: 80%|███████▉ | 175/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.q.weight]
  708.  
  709. Loading weights: 80%|███████▉ | 175/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.q.weight]
  710.  
  711. Loading weights: 80%|████████ | 176/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.v.weight]
  712.  
  713. Loading weights: 80%|████████ | 176/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.SelfAttention.v.weight]
  714.  
  715. Loading weights: 81%|████████ | 177/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.layer_norm.weight] 
  716.  
  717. Loading weights: 81%|████████ | 177/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.0.layer_norm.weight]
  718.  
  719. Loading weights: 81%|████████▏ | 178/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wi_0.weight]
  720.  
  721. Loading weights: 81%|████████▏ | 178/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wi_0.weight]
  722.  
  723. Loading weights: 82%|████████▏ | 179/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wi_1.weight]
  724.  
  725. Loading weights: 82%|████████▏ | 179/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wi_1.weight]
  726.  
  727. Loading weights: 82%|████████▏ | 180/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wo.weight] 
  728.  
  729. Loading weights: 82%|████████▏ | 180/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.DenseReluDense.wo.weight]
  730.  
  731. Loading weights: 83%|████████▎ | 181/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.layer_norm.weight] 
  732.  
  733. Loading weights: 83%|████████▎ | 181/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.19.layer.1.layer_norm.weight]
  734.  
  735. Loading weights: 83%|████████▎ | 182/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.k.weight]
  736.  
  737. Loading weights: 83%|████████▎ | 182/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.k.weight]
  738.  
  739. Loading weights: 84%|████████▎ | 183/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.o.weight]
  740.  
  741. Loading weights: 84%|████████▎ | 183/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.o.weight]
  742.  
  743. Loading weights: 84%|████████▍ | 184/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.q.weight]
  744.  
  745. Loading weights: 84%|████████▍ | 184/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.q.weight]
  746.  
  747. Loading weights: 84%|████████▍ | 185/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.v.weight]
  748.  
  749. Loading weights: 84%|████████▍ | 185/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.SelfAttention.v.weight]
  750.  
  751. Loading weights: 85%|████████▍ | 186/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.layer_norm.weight] 
  752.  
  753. Loading weights: 85%|████████▍ | 186/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.0.layer_norm.weight]
  754.  
  755. Loading weights: 85%|████████▌ | 187/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wi_0.weight]
  756.  
  757. Loading weights: 85%|████████▌ | 187/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wi_0.weight]
  758.  
  759. Loading weights: 86%|████████▌ | 188/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wi_1.weight]
  760.  
  761. Loading weights: 86%|████████▌ | 188/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wi_1.weight]
  762.  
  763. Loading weights: 86%|████████▋ | 189/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wo.weight] 
  764.  
  765. Loading weights: 86%|████████▋ | 189/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.DenseReluDense.wo.weight]
  766.  
  767. Loading weights: 87%|████████▋ | 190/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.layer_norm.weight] 
  768.  
  769. Loading weights: 87%|████████▋ | 190/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.20.layer.1.layer_norm.weight]
  770.  
  771. Loading weights: 87%|████████▋ | 191/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.k.weight]
  772.  
  773. Loading weights: 87%|████████▋ | 191/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.k.weight]
  774.  
  775. Loading weights: 88%|████████▊ | 192/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.o.weight]
  776.  
  777. Loading weights: 88%|████████▊ | 192/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.o.weight]
  778.  
  779. Loading weights: 88%|████████▊ | 193/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.q.weight]
  780.  
  781. Loading weights: 88%|████████▊ | 193/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.q.weight]
  782.  
  783. Loading weights: 89%|████████▊ | 194/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.v.weight]
  784.  
  785. Loading weights: 89%|████████▊ | 194/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.SelfAttention.v.weight]
  786.  
  787. Loading weights: 89%|████████▉ | 195/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.layer_norm.weight] 
  788.  
  789. Loading weights: 89%|████████▉ | 195/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.0.layer_norm.weight]
  790.  
  791. Loading weights: 89%|████████▉ | 196/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wi_0.weight]
  792.  
  793. Loading weights: 89%|████████▉ | 196/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wi_0.weight]
  794.  
  795. Loading weights: 90%|████████▉ | 197/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wi_1.weight]
  796.  
  797. Loading weights: 90%|████████▉ | 197/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wi_1.weight]
  798.  
  799. Loading weights: 90%|█████████ | 198/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wo.weight] 
  800.  
  801. Loading weights: 90%|█████████ | 198/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.DenseReluDense.wo.weight]
  802.  
  803. Loading weights: 91%|█████████ | 199/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.layer_norm.weight] 
  804.  
  805. Loading weights: 91%|█████████ | 199/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.21.layer.1.layer_norm.weight]
  806.  
  807. Loading weights: 91%|█████████▏| 200/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.k.weight]
  808.  
  809. Loading weights: 91%|█████████▏| 200/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.k.weight]
  810.  
  811. Loading weights: 92%|█████████▏| 201/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.o.weight]
  812.  
  813. Loading weights: 92%|█████████▏| 201/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.o.weight]
  814.  
  815. Loading weights: 92%|█████████▏| 202/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.q.weight]
  816.  
  817. Loading weights: 92%|█████████▏| 202/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.q.weight]
  818.  
  819. Loading weights: 93%|█████████▎| 203/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.v.weight]
  820.  
  821. Loading weights: 93%|█████████▎| 203/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.SelfAttention.v.weight]
  822.  
  823. Loading weights: 93%|█████████▎| 204/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.layer_norm.weight] 
  824.  
  825. Loading weights: 93%|█████████▎| 204/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.0.layer_norm.weight]
  826.  
  827. Loading weights: 94%|█████████▎| 205/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wi_0.weight]
  828.  
  829. Loading weights: 94%|█████████▎| 205/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wi_0.weight]
  830.  
  831. Loading weights: 94%|█████████▍| 206/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wi_1.weight]
  832.  
  833. Loading weights: 94%|█████████▍| 206/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wi_1.weight]
  834.  
  835. Loading weights: 95%|█████████▍| 207/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wo.weight] 
  836.  
  837. Loading weights: 95%|█████████▍| 207/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.DenseReluDense.wo.weight]
  838.  
  839. Loading weights: 95%|█████████▍| 208/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.layer_norm.weight] 
  840.  
  841. Loading weights: 95%|█████████▍| 208/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.22.layer.1.layer_norm.weight]
  842.  
  843. Loading weights: 95%|█████████▌| 209/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.k.weight]
  844.  
  845. Loading weights: 95%|█████████▌| 209/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.k.weight]
  846.  
  847. Loading weights: 96%|█████████▌| 210/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.o.weight]
  848.  
  849. Loading weights: 96%|█████████▌| 210/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.o.weight]
  850.  
  851. Loading weights: 96%|█████████▋| 211/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.q.weight]
  852.  
  853. Loading weights: 96%|█████████▋| 211/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.q.weight]
  854.  
  855. Loading weights: 97%|█████████▋| 212/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.v.weight]
  856.  
  857. Loading weights: 97%|█████████▋| 212/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.SelfAttention.v.weight]
  858.  
  859. Loading weights: 97%|█████████▋| 213/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.layer_norm.weight] 
  860.  
  861. Loading weights: 97%|█████████▋| 213/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.0.layer_norm.weight]
  862.  
  863. Loading weights: 98%|█████████▊| 214/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wi_0.weight]
  864.  
  865. Loading weights: 98%|█████████▊| 214/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wi_0.weight]
  866.  
  867. Loading weights: 98%|█████████▊| 215/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wi_1.weight]
  868.  
  869. Loading weights: 98%|█████████▊| 215/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wi_1.weight]
  870.  
  871. Loading weights: 99%|█████████▊| 216/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wo.weight] 
  872.  
  873. Loading weights: 99%|█████████▊| 216/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.DenseReluDense.wo.weight]
  874.  
  875. Loading weights: 99%|█████████▉| 217/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.layer_norm.weight] 
  876.  
  877. Loading weights: 99%|█████████▉| 217/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.block.23.layer.1.layer_norm.weight]
  878.  
  879. Loading weights: 100%|█████████▉| 218/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.final_layer_norm.weight] 
  880.  
  881. Loading weights: 100%|█████████▉| 218/219 [00:00<00:00, 586.19it/s, Materializing param=encoder.final_layer_norm.weight]
  882.  
  883. Loading weights: 100%|██████████| 219/219 [00:00<00:00, 586.19it/s, Materializing param=shared.weight] 
  884.  
  885. Loading weights: 100%|██████████| 219/219 [00:00<00:00, 586.19it/s, Materializing param=shared.weight]
  886. Loading weights: 100%|██████████| 219/219 [00:00<00:00, 866.14it/s, Materializing param=shared.weight]
  887.  
  888.  
  889. Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
  890.  
  891. Loading checkpoint shards: 33%|███▎ | 1/3 [00:00<00:00, 5.36it/s]
  892. Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 13.00it/s]
  893.  
  894. Loading pipeline components...: 86%|████████▌ | 6/7 [00:00<00:00, 5.87it/s]
  895.  
  896. Loading weights: 0%| | 0/196 [00:00<?, ?it/s]
  897.  
  898. Loading weights: 1%| | 1/196 [00:00<00:00, 32513.98it/s, Materializing param=text_model.embeddings.position_embedding.weight]
  899.  
  900. Loading weights: 1%| | 1/196 [00:00<00:00, 1680.41it/s, Materializing param=text_model.embeddings.position_embedding.weight] 
  901.  
  902. Loading weights: 1%| | 2/196 [00:00<00:00, 272.93it/s, Materializing param=text_model.embeddings.token_embedding.weight] 
  903.  
  904. Loading weights: 1%| | 2/196 [00:00<00:00, 255.14it/s, Materializing param=text_model.embeddings.token_embedding.weight]
  905.  
  906. Loading weights: 2%|▏ | 3/196 [00:00<00:00, 345.67it/s, Materializing param=text_model.encoder.layers.0.layer_norm1.bias]
  907.  
  908. Loading weights: 2%|▏ | 3/196 [00:00<00:00, 335.02it/s, Materializing param=text_model.encoder.layers.0.layer_norm1.bias]
  909.  
  910. Loading weights: 2%|▏ | 4/196 [00:00<00:00, 407.75it/s, Materializing param=text_model.encoder.layers.0.layer_norm1.weight]
  911.  
  912. Loading weights: 2%|▏ | 4/196 [00:00<00:00, 401.82it/s, Materializing param=text_model.encoder.layers.0.layer_norm1.weight]
  913.  
  914. Loading weights: 3%|▎ | 5/196 [00:00<00:00, 453.08it/s, Materializing param=text_model.encoder.layers.0.layer_norm2.bias] 
  915.  
  916. Loading weights: 3%|▎ | 5/196 [00:00<00:00, 446.62it/s, Materializing param=text_model.encoder.layers.0.layer_norm2.bias]
  917.  
  918. Loading weights: 3%|▎ | 6/196 [00:00<00:00, 493.08it/s, Materializing param=text_model.encoder.layers.0.layer_norm2.weight]
  919.  
  920. Loading weights: 3%|▎ | 6/196 [00:00<00:00, 467.15it/s, Materializing param=text_model.encoder.layers.0.layer_norm2.weight]
  921.  
  922. Loading weights: 4%|▎ | 7/196 [00:00<00:00, 487.90it/s, Materializing param=text_model.encoder.layers.0.mlp.fc1.bias] 
  923.  
  924. Loading weights: 4%|▎ | 7/196 [00:00<00:00, 483.41it/s, Materializing param=text_model.encoder.layers.0.mlp.fc1.bias]
  925.  
  926. Loading weights: 4%|▍ | 8/196 [00:00<00:00, 527.08it/s, Materializing param=text_model.encoder.layers.0.mlp.fc1.weight]
  927.  
  928. Loading weights: 4%|▍ | 8/196 [00:00<00:00, 509.83it/s, Materializing param=text_model.encoder.layers.0.mlp.fc1.weight]
  929.  
  930. Loading weights: 5%|▍ | 9/196 [00:00<00:00, 557.08it/s, Materializing param=text_model.encoder.layers.0.mlp.fc2.bias] 
  931.  
  932. Loading weights: 5%|▍ | 9/196 [00:00<00:00, 548.11it/s, Materializing param=text_model.encoder.layers.0.mlp.fc2.bias]
  933.  
  934. Loading weights: 5%|▌ | 10/196 [00:00<00:00, 604.11it/s, Materializing param=text_model.encoder.layers.0.mlp.fc2.weight]
  935.  
  936. Loading weights: 5%|▌ | 10/196 [00:00<00:00, 601.36it/s, Materializing param=text_model.encoder.layers.0.mlp.fc2.weight]
  937.  
  938. Loading weights: 6%|▌ | 11/196 [00:00<00:00, 656.67it/s, Materializing param=text_model.encoder.layers.0.self_attn.k_proj.bias]
  939.  
  940. Loading weights: 6%|▌ | 11/196 [00:00<00:00, 642.75it/s, Materializing param=text_model.encoder.layers.0.self_attn.k_proj.bias]
  941.  
  942. Loading weights: 6%|▌ | 12/196 [00:00<00:00, 685.74it/s, Materializing param=text_model.encoder.layers.0.self_attn.k_proj.weight]
  943.  
  944. Loading weights: 6%|▌ | 12/196 [00:00<00:00, 643.82it/s, Materializing param=text_model.encoder.layers.0.self_attn.k_proj.weight]
  945.  
  946. Loading weights: 7%|▋ | 13/196 [00:00<00:00, 559.98it/s, Materializing param=text_model.encoder.layers.0.self_attn.out_proj.bias]
  947.  
  948. Loading weights: 7%|▋ | 13/196 [00:00<00:00, 546.64it/s, Materializing param=text_model.encoder.layers.0.self_attn.out_proj.bias]
  949.  
  950. Loading weights: 7%|▋ | 14/196 [00:00<00:00, 569.30it/s, Materializing param=text_model.encoder.layers.0.self_attn.out_proj.weight]
  951.  
  952. Loading weights: 7%|▋ | 14/196 [00:00<00:00, 563.84it/s, Materializing param=text_model.encoder.layers.0.self_attn.out_proj.weight]
  953.  
  954. Loading weights: 8%|▊ | 15/196 [00:00<00:00, 575.09it/s, Materializing param=text_model.encoder.layers.0.self_attn.q_proj.bias] 
  955.  
  956. Loading weights: 8%|▊ | 15/196 [00:00<00:00, 563.13it/s, Materializing param=text_model.encoder.layers.0.self_attn.q_proj.bias]
  957.  
  958. Loading weights: 8%|▊ | 16/196 [00:00<00:00, 537.23it/s, Materializing param=text_model.encoder.layers.0.self_attn.q_proj.weight]
  959.  
  960. Loading weights: 8%|▊ | 16/196 [00:00<00:00, 534.81it/s, Materializing param=text_model.encoder.layers.0.self_attn.q_proj.weight]
  961.  
  962. Loading weights: 9%|▊ | 17/196 [00:00<00:00, 543.48it/s, Materializing param=text_model.encoder.layers.0.self_attn.v_proj.bias] 
  963.  
  964. Loading weights: 9%|▊ | 17/196 [00:00<00:00, 530.42it/s, Materializing param=text_model.encoder.layers.0.self_attn.v_proj.bias]
  965.  
  966. Loading weights: 9%|▉ | 18/196 [00:00<00:00, 544.65it/s, Materializing param=text_model.encoder.layers.0.self_attn.v_proj.weight]
  967.  
  968. Loading weights: 9%|▉ | 18/196 [00:00<00:00, 537.06it/s, Materializing param=text_model.encoder.layers.0.self_attn.v_proj.weight]
  969.  
  970. Loading weights: 10%|▉ | 19/196 [00:00<00:00, 546.28it/s, Materializing param=text_model.encoder.layers.1.layer_norm1.bias] 
  971.  
  972. Loading weights: 10%|▉ | 19/196 [00:00<00:00, 543.87it/s, Materializing param=text_model.encoder.layers.1.layer_norm1.bias]
  973.  
  974. Loading weights: 10%|█ | 20/196 [00:00<00:00, 568.58it/s, Materializing param=text_model.encoder.layers.1.layer_norm1.weight]
  975.  
  976. Loading weights: 10%|█ | 20/196 [00:00<00:00, 561.26it/s, Materializing param=text_model.encoder.layers.1.layer_norm1.weight]
  977.  
  978. Loading weights: 11%|█ | 21/196 [00:00<00:00, 583.38it/s, Materializing param=text_model.encoder.layers.1.layer_norm2.bias] 
  979.  
  980. Loading weights: 11%|█ | 21/196 [00:00<00:00, 575.56it/s, Materializing param=text_model.encoder.layers.1.layer_norm2.bias]
  981.  
  982. Loading weights: 11%|█ | 22/196 [00:00<00:00, 538.59it/s, Materializing param=text_model.encoder.layers.1.layer_norm2.weight]
  983.  
  984. Loading weights: 11%|█ | 22/196 [00:00<00:00, 526.97it/s, Materializing param=text_model.encoder.layers.1.layer_norm2.weight]
  985.  
  986. Loading weights: 12%|█▏ | 23/196 [00:00<00:00, 526.67it/s, Materializing param=text_model.encoder.layers.1.mlp.fc1.bias] 
  987.  
  988. Loading weights: 12%|█▏ | 23/196 [00:00<00:00, 524.82it/s, Materializing param=text_model.encoder.layers.1.mlp.fc1.bias]
  989.  
  990. Loading weights: 12%|█▏ | 24/196 [00:00<00:00, 507.50it/s, Materializing param=text_model.encoder.layers.1.mlp.fc1.weight]
  991.  
  992. Loading weights: 12%|█▏ | 24/196 [00:00<00:00, 505.89it/s, Materializing param=text_model.encoder.layers.1.mlp.fc1.weight]
  993.  
  994. Loading weights: 13%|█▎ | 25/196 [00:00<00:00, 498.34it/s, Materializing param=text_model.encoder.layers.1.mlp.fc2.bias] 
  995.  
  996. Loading weights: 13%|█▎ | 25/196 [00:00<00:00, 493.00it/s, Materializing param=text_model.encoder.layers.1.mlp.fc2.bias]
  997.  
  998. Loading weights: 13%|█▎ | 26/196 [00:00<00:00, 500.69it/s, Materializing param=text_model.encoder.layers.1.mlp.fc2.weight]
  999.  
  1000. Loading weights: 13%|█▎ | 26/196 [00:00<00:00, 495.33it/s, Materializing param=text_model.encoder.layers.1.mlp.fc2.weight]
  1001.  
  1002. Loading weights: 14%|█▍ | 27/196 [00:00<00:00, 503.28it/s, Materializing param=text_model.encoder.layers.1.self_attn.k_proj.bias]
  1003.  
  1004. Loading weights: 14%|█▍ | 27/196 [00:00<00:00, 498.59it/s, Materializing param=text_model.encoder.layers.1.self_attn.k_proj.bias]
  1005.  
  1006. Loading weights: 14%|█▍ | 28/196 [00:00<00:00, 492.22it/s, Materializing param=text_model.encoder.layers.1.self_attn.k_proj.weight]
  1007.  
  1008. Loading weights: 14%|█▍ | 28/196 [00:00<00:00, 476.53it/s, Materializing param=text_model.encoder.layers.1.self_attn.k_proj.weight]
  1009.  
  1010. Loading weights: 15%|█▍ | 29/196 [00:00<00:00, 489.92it/s, Materializing param=text_model.encoder.layers.1.self_attn.out_proj.bias]
  1011.  
  1012. Loading weights: 15%|█▍ | 29/196 [00:00<00:00, 489.22it/s, Materializing param=text_model.encoder.layers.1.self_attn.out_proj.bias]
  1013.  
  1014. Loading weights: 15%|█▌ | 30/196 [00:00<00:00, 504.99it/s, Materializing param=text_model.encoder.layers.1.self_attn.out_proj.weight]
  1015.  
  1016. Loading weights: 15%|█▌ | 30/196 [00:00<00:00, 503.90it/s, Materializing param=text_model.encoder.layers.1.self_attn.out_proj.weight]
  1017.  
  1018. Loading weights: 16%|█▌ | 31/196 [00:00<00:00, 518.33it/s, Materializing param=text_model.encoder.layers.1.self_attn.q_proj.bias] 
  1019.  
  1020. Loading weights: 16%|█▌ | 31/196 [00:00<00:00, 517.61it/s, Materializing param=text_model.encoder.layers.1.self_attn.q_proj.bias]
  1021.  
  1022. Loading weights: 16%|█▋ | 32/196 [00:00<00:00, 532.90it/s, Materializing param=text_model.encoder.layers.1.self_attn.q_proj.weight]
  1023.  
  1024. Loading weights: 16%|█▋ | 32/196 [00:00<00:00, 530.14it/s, Materializing param=text_model.encoder.layers.1.self_attn.q_proj.weight]
  1025.  
  1026. Loading weights: 17%|█▋ | 33/196 [00:00<00:00, 542.83it/s, Materializing param=text_model.encoder.layers.1.self_attn.v_proj.bias] 
  1027.  
  1028. Loading weights: 17%|█▋ | 33/196 [00:00<00:00, 542.08it/s, Materializing param=text_model.encoder.layers.1.self_attn.v_proj.bias]
  1029.  
  1030. Loading weights: 17%|█▋ | 34/196 [00:00<00:00, 556.30it/s, Materializing param=text_model.encoder.layers.1.self_attn.v_proj.weight]
  1031.  
  1032. Loading weights: 17%|█▋ | 34/196 [00:00<00:00, 555.72it/s, Materializing param=text_model.encoder.layers.1.self_attn.v_proj.weight]
  1033.  
  1034. Loading weights: 18%|█▊ | 35/196 [00:00<00:00, 565.88it/s, Materializing param=text_model.encoder.layers.2.layer_norm1.bias] 
  1035.  
  1036. Loading weights: 18%|█▊ | 35/196 [00:00<00:00, 565.21it/s, Materializing param=text_model.encoder.layers.2.layer_norm1.bias]
  1037.  
  1038. Loading weights: 18%|█▊ | 36/196 [00:00<00:00, 578.11it/s, Materializing param=text_model.encoder.layers.2.layer_norm1.weight]
  1039.  
  1040. Loading weights: 18%|█▊ | 36/196 [00:00<00:00, 577.63it/s, Materializing param=text_model.encoder.layers.2.layer_norm1.weight]
  1041.  
  1042. Loading weights: 19%|█▉ | 37/196 [00:00<00:00, 588.22it/s, Materializing param=text_model.encoder.layers.2.layer_norm2.bias] 
  1043.  
  1044. Loading weights: 19%|█▉ | 37/196 [00:00<00:00, 586.67it/s, Materializing param=text_model.encoder.layers.2.layer_norm2.bias]
  1045.  
  1046. Loading weights: 19%|█▉ | 38/196 [00:00<00:00, 583.97it/s, Materializing param=text_model.encoder.layers.2.layer_norm2.weight]
  1047.  
  1048. Loading weights: 19%|█▉ | 38/196 [00:00<00:00, 582.14it/s, Materializing param=text_model.encoder.layers.2.layer_norm2.weight]
  1049.  
  1050. Loading weights: 20%|█▉ | 39/196 [00:00<00:00, 591.78it/s, Materializing param=text_model.encoder.layers.2.mlp.fc1.bias] 
  1051.  
  1052. Loading weights: 20%|█▉ | 39/196 [00:00<00:00, 590.06it/s, Materializing param=text_model.encoder.layers.2.mlp.fc1.bias]
  1053.  
  1054. Loading weights: 20%|██ | 40/196 [00:00<00:00, 581.93it/s, Materializing param=text_model.encoder.layers.2.mlp.fc1.weight]
  1055.  
  1056. Loading weights: 20%|██ | 40/196 [00:00<00:00, 577.67it/s, Materializing param=text_model.encoder.layers.2.mlp.fc1.weight]
  1057.  
  1058. Loading weights: 21%|██ | 41/196 [00:00<00:00, 585.71it/s, Materializing param=text_model.encoder.layers.2.mlp.fc2.bias] 
  1059.  
  1060. Loading weights: 21%|██ | 41/196 [00:00<00:00, 578.43it/s, Materializing param=text_model.encoder.layers.2.mlp.fc2.bias]
  1061.  
  1062. Loading weights: 21%|██▏ | 42/196 [00:00<00:00, 588.02it/s, Materializing param=text_model.encoder.layers.2.mlp.fc2.weight]
  1063.  
  1064. Loading weights: 21%|██▏ | 42/196 [00:00<00:00, 586.57it/s, Materializing param=text_model.encoder.layers.2.mlp.fc2.weight]
  1065.  
  1066. Loading weights: 22%|██▏ | 43/196 [00:00<00:00, 589.86it/s, Materializing param=text_model.encoder.layers.2.self_attn.k_proj.bias]
  1067.  
  1068. Loading weights: 22%|██▏ | 43/196 [00:00<00:00, 543.53it/s, Materializing param=text_model.encoder.layers.2.self_attn.k_proj.bias]
  1069.  
  1070. Loading weights: 22%|██▏ | 44/196 [00:00<00:00, 551.08it/s, Materializing param=text_model.encoder.layers.2.self_attn.k_proj.weight]
  1071.  
  1072. Loading weights: 22%|██▏ | 44/196 [00:00<00:00, 547.79it/s, Materializing param=text_model.encoder.layers.2.self_attn.k_proj.weight]
  1073.  
  1074. Loading weights: 23%|██▎ | 45/196 [00:00<00:00, 524.04it/s, Materializing param=text_model.encoder.layers.2.self_attn.out_proj.bias]
  1075.  
  1076. Loading weights: 23%|██▎ | 45/196 [00:00<00:00, 521.34it/s, Materializing param=text_model.encoder.layers.2.self_attn.out_proj.bias]
  1077.  
  1078. Loading weights: 23%|██▎ | 46/196 [00:00<00:00, 515.26it/s, Materializing param=text_model.encoder.layers.2.self_attn.out_proj.weight]
  1079.  
  1080. Loading weights: 23%|██▎ | 46/196 [00:00<00:00, 512.44it/s, Materializing param=text_model.encoder.layers.2.self_attn.out_proj.weight]
  1081.  
  1082. Loading weights: 24%|██▍ | 47/196 [00:00<00:00, 522.28it/s, Materializing param=text_model.encoder.layers.2.self_attn.q_proj.bias] 
  1083.  
  1084. Loading weights: 24%|██▍ | 47/196 [00:00<00:00, 519.86it/s, Materializing param=text_model.encoder.layers.2.self_attn.q_proj.bias]
  1085.  
  1086. Loading weights: 24%|██▍ | 48/196 [00:00<00:00, 522.74it/s, Materializing param=text_model.encoder.layers.2.self_attn.q_proj.weight]
  1087.  
  1088. Loading weights: 24%|██▍ | 48/196 [00:00<00:00, 521.92it/s, Materializing param=text_model.encoder.layers.2.self_attn.q_proj.weight]
  1089.  
  1090. Loading weights: 25%|██▌ | 49/196 [00:00<00:00, 524.86it/s, Materializing param=text_model.encoder.layers.2.self_attn.v_proj.bias] 
  1091.  
  1092. Loading weights: 25%|██▌ | 49/196 [00:00<00:00, 523.04it/s, Materializing param=text_model.encoder.layers.2.self_attn.v_proj.bias]
  1093.  
  1094. Loading weights: 26%|██▌ | 50/196 [00:00<00:00, 520.00it/s, Materializing param=text_model.encoder.layers.2.self_attn.v_proj.weight]
  1095.  
  1096. Loading weights: 26%|██▌ | 50/196 [00:00<00:00, 516.50it/s, Materializing param=text_model.encoder.layers.2.self_attn.v_proj.weight]
  1097.  
  1098. Loading weights: 26%|██▌ | 51/196 [00:00<00:00, 522.98it/s, Materializing param=text_model.encoder.layers.3.layer_norm1.bias] 
  1099.  
  1100. Loading weights: 26%|██▌ | 51/196 [00:00<00:00, 521.65it/s, Materializing param=text_model.encoder.layers.3.layer_norm1.bias]
  1101.  
  1102. Loading weights: 27%|██▋ | 52/196 [00:00<00:00, 531.13it/s, Materializing param=text_model.encoder.layers.3.layer_norm1.weight]
  1103.  
  1104. Loading weights: 27%|██▋ | 52/196 [00:00<00:00, 530.70it/s, Materializing param=text_model.encoder.layers.3.layer_norm1.weight]
  1105.  
  1106. Loading weights: 27%|██▋ | 53/196 [00:00<00:00, 539.91it/s, Materializing param=text_model.encoder.layers.3.layer_norm2.bias] 
  1107.  
  1108. Loading weights: 27%|██▋ | 53/196 [00:00<00:00, 538.33it/s, Materializing param=text_model.encoder.layers.3.layer_norm2.bias]
  1109.  
  1110. Loading weights: 28%|██▊ | 54/196 [00:00<00:00, 547.87it/s, Materializing param=text_model.encoder.layers.3.layer_norm2.weight]
  1111.  
  1112. Loading weights: 28%|██▊ | 54/196 [00:00<00:00, 547.55it/s, Materializing param=text_model.encoder.layers.3.layer_norm2.weight]
  1113.  
  1114. Loading weights: 28%|██▊ | 55/196 [00:00<00:00, 556.84it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.bias] 
  1115.  
  1116. Loading weights: 28%|██▊ | 55/196 [00:00<00:00, 556.46it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.bias]
  1117.  
  1118. Loading weights: 29%|██▊ | 56/196 [00:00<00:00, 560.25it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.weight]
  1119.  
  1120. Loading weights: 29%|██▊ | 56/196 [00:00<00:00, 559.15it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.weight]
  1121.  
  1122. Loading weights: 29%|██▉ | 57/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc1.weight]
  1123.  
  1124. Loading weights: 29%|██▉ | 57/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc2.bias] 
  1125.  
  1126. Loading weights: 29%|██▉ | 57/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc2.bias]
  1127.  
  1128. Loading weights: 30%|██▉ | 58/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc2.weight]
  1129.  
  1130. Loading weights: 30%|██▉ | 58/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.mlp.fc2.weight]
  1131.  
  1132. Loading weights: 30%|███ | 59/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.k_proj.bias]
  1133.  
  1134. Loading weights: 30%|███ | 59/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.k_proj.bias]
  1135.  
  1136. Loading weights: 31%|███ | 60/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.k_proj.weight]
  1137.  
  1138. Loading weights: 31%|███ | 60/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.k_proj.weight]
  1139.  
  1140. Loading weights: 31%|███ | 61/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.out_proj.bias]
  1141.  
  1142. Loading weights: 31%|███ | 61/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.out_proj.bias]
  1143.  
  1144. Loading weights: 32%|███▏ | 62/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.out_proj.weight]
  1145.  
  1146. Loading weights: 32%|███▏ | 62/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.out_proj.weight]
  1147.  
  1148. Loading weights: 32%|███▏ | 63/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.q_proj.bias] 
  1149.  
  1150. Loading weights: 32%|███▏ | 63/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.q_proj.bias]
  1151.  
  1152. Loading weights: 33%|███▎ | 64/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.q_proj.weight]
  1153.  
  1154. Loading weights: 33%|███▎ | 64/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.q_proj.weight]
  1155.  
  1156. Loading weights: 33%|███▎ | 65/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.v_proj.bias] 
  1157.  
  1158. Loading weights: 33%|███▎ | 65/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.v_proj.bias]
  1159.  
  1160. Loading weights: 34%|███▎ | 66/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.v_proj.weight]
  1161.  
  1162. Loading weights: 34%|███▎ | 66/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.3.self_attn.v_proj.weight]
  1163.  
  1164. Loading weights: 34%|███▍ | 67/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm1.bias] 
  1165.  
  1166. Loading weights: 34%|███▍ | 67/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm1.bias]
  1167.  
  1168. Loading weights: 35%|███▍ | 68/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm1.weight]
  1169.  
  1170. Loading weights: 35%|███▍ | 68/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm1.weight]
  1171.  
  1172. Loading weights: 35%|███▌ | 69/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm2.bias] 
  1173.  
  1174. Loading weights: 35%|███▌ | 69/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm2.bias]
  1175.  
  1176. Loading weights: 36%|███▌ | 70/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm2.weight]
  1177.  
  1178. Loading weights: 36%|███▌ | 70/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.layer_norm2.weight]
  1179.  
  1180. Loading weights: 36%|███▌ | 71/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc1.bias] 
  1181.  
  1182. Loading weights: 36%|███▌ | 71/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc1.bias]
  1183.  
  1184. Loading weights: 37%|███▋ | 72/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc1.weight]
  1185.  
  1186. Loading weights: 37%|███▋ | 72/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc1.weight]
  1187.  
  1188. Loading weights: 37%|███▋ | 73/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc2.bias] 
  1189.  
  1190. Loading weights: 37%|███▋ | 73/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc2.bias]
  1191.  
  1192. Loading weights: 38%|███▊ | 74/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc2.weight]
  1193.  
  1194. Loading weights: 38%|███▊ | 74/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.mlp.fc2.weight]
  1195.  
  1196. Loading weights: 38%|███▊ | 75/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.k_proj.bias]
  1197.  
  1198. Loading weights: 38%|███▊ | 75/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.k_proj.bias]
  1199.  
  1200. Loading weights: 39%|███▉ | 76/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.k_proj.weight]
  1201.  
  1202. Loading weights: 39%|███▉ | 76/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.k_proj.weight]
  1203.  
  1204. Loading weights: 39%|███▉ | 77/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.out_proj.bias]
  1205.  
  1206. Loading weights: 39%|███▉ | 77/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.out_proj.bias]
  1207.  
  1208. Loading weights: 40%|███▉ | 78/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.out_proj.weight]
  1209.  
  1210. Loading weights: 40%|███▉ | 78/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.out_proj.weight]
  1211.  
  1212. Loading weights: 40%|████ | 79/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.q_proj.bias] 
  1213.  
  1214. Loading weights: 40%|████ | 79/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.q_proj.bias]
  1215.  
  1216. Loading weights: 41%|████ | 80/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.q_proj.weight]
  1217.  
  1218. Loading weights: 41%|████ | 80/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.q_proj.weight]
  1219.  
  1220. Loading weights: 41%|████▏ | 81/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.v_proj.bias] 
  1221.  
  1222. Loading weights: 41%|████▏ | 81/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.v_proj.bias]
  1223.  
  1224. Loading weights: 42%|████▏ | 82/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.v_proj.weight]
  1225.  
  1226. Loading weights: 42%|████▏ | 82/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.4.self_attn.v_proj.weight]
  1227.  
  1228. Loading weights: 42%|████▏ | 83/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm1.bias] 
  1229.  
  1230. Loading weights: 42%|████▏ | 83/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm1.bias]
  1231.  
  1232. Loading weights: 43%|████▎ | 84/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm1.weight]
  1233.  
  1234. Loading weights: 43%|████▎ | 84/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm1.weight]
  1235.  
  1236. Loading weights: 43%|████▎ | 85/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm2.bias] 
  1237.  
  1238. Loading weights: 43%|████▎ | 85/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm2.bias]
  1239.  
  1240. Loading weights: 44%|████▍ | 86/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm2.weight]
  1241.  
  1242. Loading weights: 44%|████▍ | 86/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.layer_norm2.weight]
  1243.  
  1244. Loading weights: 44%|████▍ | 87/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc1.bias] 
  1245.  
  1246. Loading weights: 44%|████▍ | 87/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc1.bias]
  1247.  
  1248. Loading weights: 45%|████▍ | 88/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc1.weight]
  1249.  
  1250. Loading weights: 45%|████▍ | 88/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc1.weight]
  1251.  
  1252. Loading weights: 45%|████▌ | 89/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc2.bias] 
  1253.  
  1254. Loading weights: 45%|████▌ | 89/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc2.bias]
  1255.  
  1256. Loading weights: 46%|████▌ | 90/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc2.weight]
  1257.  
  1258. Loading weights: 46%|████▌ | 90/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.mlp.fc2.weight]
  1259.  
  1260. Loading weights: 46%|████▋ | 91/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.k_proj.bias]
  1261.  
  1262. Loading weights: 46%|████▋ | 91/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.k_proj.bias]
  1263.  
  1264. Loading weights: 47%|████▋ | 92/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.k_proj.weight]
  1265.  
  1266. Loading weights: 47%|████▋ | 92/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.k_proj.weight]
  1267.  
  1268. Loading weights: 47%|████▋ | 93/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.out_proj.bias]
  1269.  
  1270. Loading weights: 47%|████▋ | 93/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.out_proj.bias]
  1271.  
  1272. Loading weights: 48%|████▊ | 94/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.out_proj.weight]
  1273.  
  1274. Loading weights: 48%|████▊ | 94/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.out_proj.weight]
  1275.  
  1276. Loading weights: 48%|████▊ | 95/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.q_proj.bias] 
  1277.  
  1278. Loading weights: 48%|████▊ | 95/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.q_proj.bias]
  1279.  
  1280. Loading weights: 49%|████▉ | 96/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.q_proj.weight]
  1281.  
  1282. Loading weights: 49%|████▉ | 96/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.q_proj.weight]
  1283.  
  1284. Loading weights: 49%|████▉ | 97/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.v_proj.bias] 
  1285.  
  1286. Loading weights: 49%|████▉ | 97/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.v_proj.bias]
  1287.  
  1288. Loading weights: 50%|█████ | 98/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.v_proj.weight]
  1289.  
  1290. Loading weights: 50%|█████ | 98/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.5.self_attn.v_proj.weight]
  1291.  
  1292. Loading weights: 51%|█████ | 99/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm1.bias] 
  1293.  
  1294. Loading weights: 51%|█████ | 99/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm1.bias]
  1295.  
  1296. Loading weights: 51%|█████ | 100/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm1.weight]
  1297.  
  1298. Loading weights: 51%|█████ | 100/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm1.weight]
  1299.  
  1300. Loading weights: 52%|█████▏ | 101/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm2.bias] 
  1301.  
  1302. Loading weights: 52%|█████▏ | 101/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm2.bias]
  1303.  
  1304. Loading weights: 52%|█████▏ | 102/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm2.weight]
  1305.  
  1306. Loading weights: 52%|█████▏ | 102/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.layer_norm2.weight]
  1307.  
  1308. Loading weights: 53%|█████▎ | 103/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc1.bias] 
  1309.  
  1310. Loading weights: 53%|█████▎ | 103/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc1.bias]
  1311.  
  1312. Loading weights: 53%|█████▎ | 104/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc1.weight]
  1313.  
  1314. Loading weights: 53%|█████▎ | 104/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc1.weight]
  1315.  
  1316. Loading weights: 54%|█████▎ | 105/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc2.bias] 
  1317.  
  1318. Loading weights: 54%|█████▎ | 105/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc2.bias]
  1319.  
  1320. Loading weights: 54%|█████▍ | 106/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc2.weight]
  1321.  
  1322. Loading weights: 54%|█████▍ | 106/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.mlp.fc2.weight]
  1323.  
  1324. Loading weights: 55%|█████▍ | 107/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.k_proj.bias]
  1325.  
  1326. Loading weights: 55%|█████▍ | 107/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.k_proj.bias]
  1327.  
  1328. Loading weights: 55%|█████▌ | 108/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.k_proj.weight]
  1329.  
  1330. Loading weights: 55%|█████▌ | 108/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.k_proj.weight]
  1331.  
  1332. Loading weights: 56%|█████▌ | 109/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.out_proj.bias]
  1333.  
  1334. Loading weights: 56%|█████▌ | 109/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.out_proj.bias]
  1335.  
  1336. Loading weights: 56%|█████▌ | 110/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.out_proj.weight]
  1337.  
  1338. Loading weights: 56%|█████▌ | 110/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.out_proj.weight]
  1339.  
  1340. Loading weights: 57%|█████▋ | 111/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.q_proj.bias] 
  1341.  
  1342. Loading weights: 57%|█████▋ | 111/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.q_proj.bias]
  1343.  
  1344. Loading weights: 57%|█████▋ | 112/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.q_proj.weight]
  1345.  
  1346. Loading weights: 57%|█████▋ | 112/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.q_proj.weight]
  1347.  
  1348. Loading weights: 58%|█████▊ | 113/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.v_proj.bias] 
  1349.  
  1350. Loading weights: 58%|█████▊ | 113/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.v_proj.bias]
  1351.  
  1352. Loading weights: 58%|█████▊ | 114/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.v_proj.weight]
  1353.  
  1354. Loading weights: 58%|█████▊ | 114/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.6.self_attn.v_proj.weight]
  1355.  
  1356. Loading weights: 59%|█████▊ | 115/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm1.bias] 
  1357.  
  1358. Loading weights: 59%|█████▊ | 115/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm1.bias]
  1359.  
  1360. Loading weights: 59%|█████▉ | 116/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm1.weight]
  1361.  
  1362. Loading weights: 59%|█████▉ | 116/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm1.weight]
  1363.  
  1364. Loading weights: 60%|█████▉ | 117/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm2.bias] 
  1365.  
  1366. Loading weights: 60%|█████▉ | 117/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm2.bias]
  1367.  
  1368. Loading weights: 60%|██████ | 118/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm2.weight]
  1369.  
  1370. Loading weights: 60%|██████ | 118/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.layer_norm2.weight]
  1371.  
  1372. Loading weights: 61%|██████ | 119/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc1.bias] 
  1373.  
  1374. Loading weights: 61%|██████ | 119/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc1.bias]
  1375.  
  1376. Loading weights: 61%|██████ | 120/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc1.weight]
  1377.  
  1378. Loading weights: 61%|██████ | 120/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc1.weight]
  1379.  
  1380. Loading weights: 62%|██████▏ | 121/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc2.bias] 
  1381.  
  1382. Loading weights: 62%|██████▏ | 121/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc2.bias]
  1383.  
  1384. Loading weights: 62%|██████▏ | 122/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc2.weight]
  1385.  
  1386. Loading weights: 62%|██████▏ | 122/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.mlp.fc2.weight]
  1387.  
  1388. Loading weights: 63%|██████▎ | 123/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.k_proj.bias]
  1389.  
  1390. Loading weights: 63%|██████▎ | 123/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.k_proj.bias]
  1391.  
  1392. Loading weights: 63%|██████▎ | 124/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.k_proj.weight]
  1393.  
  1394. Loading weights: 63%|██████▎ | 124/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.k_proj.weight]
  1395.  
  1396. Loading weights: 64%|██████▍ | 125/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.out_proj.bias]
  1397.  
  1398. Loading weights: 64%|██████▍ | 125/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.out_proj.bias]
  1399.  
  1400. Loading weights: 64%|██████▍ | 126/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.out_proj.weight]
  1401.  
  1402. Loading weights: 64%|██████▍ | 126/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.out_proj.weight]
  1403.  
  1404. Loading weights: 65%|██████▍ | 127/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.q_proj.bias] 
  1405.  
  1406. Loading weights: 65%|██████▍ | 127/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.q_proj.bias]
  1407.  
  1408. Loading weights: 65%|██████▌ | 128/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.q_proj.weight]
  1409.  
  1410. Loading weights: 65%|██████▌ | 128/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.q_proj.weight]
  1411.  
  1412. Loading weights: 66%|██████▌ | 129/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.v_proj.bias] 
  1413.  
  1414. Loading weights: 66%|██████▌ | 129/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.v_proj.bias]
  1415.  
  1416. Loading weights: 66%|██████▋ | 130/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.v_proj.weight]
  1417.  
  1418. Loading weights: 66%|██████▋ | 130/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.7.self_attn.v_proj.weight]
  1419.  
  1420. Loading weights: 67%|██████▋ | 131/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm1.bias] 
  1421.  
  1422. Loading weights: 67%|██████▋ | 131/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm1.bias]
  1423.  
  1424. Loading weights: 67%|██████▋ | 132/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm1.weight]
  1425.  
  1426. Loading weights: 67%|██████▋ | 132/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm1.weight]
  1427.  
  1428. Loading weights: 68%|██████▊ | 133/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm2.bias] 
  1429.  
  1430. Loading weights: 68%|██████▊ | 133/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm2.bias]
  1431.  
  1432. Loading weights: 68%|██████▊ | 134/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm2.weight]
  1433.  
  1434. Loading weights: 68%|██████▊ | 134/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.layer_norm2.weight]
  1435.  
  1436. Loading weights: 69%|██████▉ | 135/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc1.bias] 
  1437.  
  1438. Loading weights: 69%|██████▉ | 135/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc1.bias]
  1439.  
  1440. Loading weights: 69%|██████▉ | 136/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc1.weight]
  1441.  
  1442. Loading weights: 69%|██████▉ | 136/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc1.weight]
  1443.  
  1444. Loading weights: 70%|██████▉ | 137/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc2.bias] 
  1445.  
  1446. Loading weights: 70%|██████▉ | 137/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc2.bias]
  1447.  
  1448. Loading weights: 70%|███████ | 138/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc2.weight]
  1449.  
  1450. Loading weights: 70%|███████ | 138/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.mlp.fc2.weight]
  1451.  
  1452. Loading weights: 71%|███████ | 139/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.k_proj.bias]
  1453.  
  1454. Loading weights: 71%|███████ | 139/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.k_proj.bias]
  1455.  
  1456. Loading weights: 71%|███████▏ | 140/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.k_proj.weight]
  1457.  
  1458. Loading weights: 71%|███████▏ | 140/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.k_proj.weight]
  1459.  
  1460. Loading weights: 72%|███████▏ | 141/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.out_proj.bias]
  1461.  
  1462. Loading weights: 72%|███████▏ | 141/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.out_proj.bias]
  1463.  
  1464. Loading weights: 72%|███████▏ | 142/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.out_proj.weight]
  1465.  
  1466. Loading weights: 72%|███████▏ | 142/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.out_proj.weight]
  1467.  
  1468. Loading weights: 73%|███████▎ | 143/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.q_proj.bias] 
  1469.  
  1470. Loading weights: 73%|███████▎ | 143/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.q_proj.bias]
  1471.  
  1472. Loading weights: 73%|███████▎ | 144/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.q_proj.weight]
  1473.  
  1474. Loading weights: 73%|███████▎ | 144/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.q_proj.weight]
  1475.  
  1476. Loading weights: 74%|███████▍ | 145/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.v_proj.bias] 
  1477.  
  1478. Loading weights: 74%|███████▍ | 145/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.v_proj.bias]
  1479.  
  1480. Loading weights: 74%|███████▍ | 146/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.v_proj.weight]
  1481.  
  1482. Loading weights: 74%|███████▍ | 146/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.8.self_attn.v_proj.weight]
  1483.  
  1484. Loading weights: 75%|███████▌ | 147/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm1.bias] 
  1485.  
  1486. Loading weights: 75%|███████▌ | 147/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm1.bias]
  1487.  
  1488. Loading weights: 76%|███████▌ | 148/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm1.weight]
  1489.  
  1490. Loading weights: 76%|███████▌ | 148/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm1.weight]
  1491.  
  1492. Loading weights: 76%|███████▌ | 149/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm2.bias] 
  1493.  
  1494. Loading weights: 76%|███████▌ | 149/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm2.bias]
  1495.  
  1496. Loading weights: 77%|███████▋ | 150/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm2.weight]
  1497.  
  1498. Loading weights: 77%|███████▋ | 150/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.layer_norm2.weight]
  1499.  
  1500. Loading weights: 77%|███████▋ | 151/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc1.bias] 
  1501.  
  1502. Loading weights: 77%|███████▋ | 151/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc1.bias]
  1503.  
  1504. Loading weights: 78%|███████▊ | 152/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc1.weight]
  1505.  
  1506. Loading weights: 78%|███████▊ | 152/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc1.weight]
  1507.  
  1508. Loading weights: 78%|███████▊ | 153/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc2.bias] 
  1509.  
  1510. Loading weights: 78%|███████▊ | 153/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc2.bias]
  1511.  
  1512. Loading weights: 79%|███████▊ | 154/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc2.weight]
  1513.  
  1514. Loading weights: 79%|███████▊ | 154/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.mlp.fc2.weight]
  1515.  
  1516. Loading weights: 79%|███████▉ | 155/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.k_proj.bias]
  1517.  
  1518. Loading weights: 79%|███████▉ | 155/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.k_proj.bias]
  1519.  
  1520. Loading weights: 80%|███████▉ | 156/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.k_proj.weight]
  1521.  
  1522. Loading weights: 80%|███████▉ | 156/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.k_proj.weight]
  1523.  
  1524. Loading weights: 80%|████████ | 157/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.out_proj.bias]
  1525.  
  1526. Loading weights: 80%|████████ | 157/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.out_proj.bias]
  1527.  
  1528. Loading weights: 81%|████████ | 158/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.out_proj.weight]
  1529.  
  1530. Loading weights: 81%|████████ | 158/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.out_proj.weight]
  1531.  
  1532. Loading weights: 81%|████████ | 159/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.q_proj.bias] 
  1533.  
  1534. Loading weights: 81%|████████ | 159/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.q_proj.bias]
  1535.  
  1536. Loading weights: 82%|████████▏ | 160/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.q_proj.weight]
  1537.  
  1538. Loading weights: 82%|████████▏ | 160/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.q_proj.weight]
  1539.  
  1540. Loading weights: 82%|████████▏ | 161/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.v_proj.bias] 
  1541.  
  1542. Loading weights: 82%|████████▏ | 161/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.v_proj.bias]
  1543.  
  1544. Loading weights: 83%|████████▎ | 162/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.v_proj.weight]
  1545.  
  1546. Loading weights: 83%|████████▎ | 162/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.9.self_attn.v_proj.weight]
  1547.  
  1548. Loading weights: 83%|████████▎ | 163/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm1.bias] 
  1549.  
  1550. Loading weights: 83%|████████▎ | 163/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm1.bias]
  1551.  
  1552. Loading weights: 84%|████████▎ | 164/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm1.weight]
  1553.  
  1554. Loading weights: 84%|████████▎ | 164/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm1.weight]
  1555.  
  1556. Loading weights: 84%|████████▍ | 165/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm2.bias] 
  1557.  
  1558. Loading weights: 84%|████████▍ | 165/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm2.bias]
  1559.  
  1560. Loading weights: 85%|████████▍ | 166/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm2.weight]
  1561.  
  1562. Loading weights: 85%|████████▍ | 166/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.layer_norm2.weight]
  1563.  
  1564. Loading weights: 85%|████████▌ | 167/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc1.bias] 
  1565.  
  1566. Loading weights: 85%|████████▌ | 167/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc1.bias]
  1567.  
  1568. Loading weights: 86%|████████▌ | 168/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc1.weight]
  1569.  
  1570. Loading weights: 86%|████████▌ | 168/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc1.weight]
  1571.  
  1572. Loading weights: 86%|████████▌ | 169/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc2.bias] 
  1573.  
  1574. Loading weights: 86%|████████▌ | 169/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc2.bias]
  1575.  
  1576. Loading weights: 87%|████████▋ | 170/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc2.weight]
  1577.  
  1578. Loading weights: 87%|████████▋ | 170/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.mlp.fc2.weight]
  1579.  
  1580. Loading weights: 87%|████████▋ | 171/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.k_proj.bias]
  1581.  
  1582. Loading weights: 87%|████████▋ | 171/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.k_proj.bias]
  1583.  
  1584. Loading weights: 88%|████████▊ | 172/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.k_proj.weight]
  1585.  
  1586. Loading weights: 88%|████████▊ | 172/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.k_proj.weight]
  1587.  
  1588. Loading weights: 88%|████████▊ | 173/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.out_proj.bias]
  1589.  
  1590. Loading weights: 88%|████████▊ | 173/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.out_proj.bias]
  1591.  
  1592. Loading weights: 89%|████████▉ | 174/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.out_proj.weight]
  1593.  
  1594. Loading weights: 89%|████████▉ | 174/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.out_proj.weight]
  1595.  
  1596. Loading weights: 89%|████████▉ | 175/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.q_proj.bias] 
  1597.  
  1598. Loading weights: 89%|████████▉ | 175/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.q_proj.bias]
  1599.  
  1600. Loading weights: 90%|████████▉ | 176/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.q_proj.weight]
  1601.  
  1602. Loading weights: 90%|████████▉ | 176/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.q_proj.weight]
  1603.  
  1604. Loading weights: 90%|█████████ | 177/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.v_proj.bias] 
  1605.  
  1606. Loading weights: 90%|█████████ | 177/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.v_proj.bias]
  1607.  
  1608. Loading weights: 91%|█████████ | 178/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.v_proj.weight]
  1609.  
  1610. Loading weights: 91%|█████████ | 178/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.10.self_attn.v_proj.weight]
  1611.  
  1612. Loading weights: 91%|█████████▏| 179/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm1.bias] 
  1613.  
  1614. Loading weights: 91%|█████████▏| 179/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm1.bias]
  1615.  
  1616. Loading weights: 92%|█████████▏| 180/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm1.weight]
  1617.  
  1618. Loading weights: 92%|█████████▏| 180/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm1.weight]
  1619.  
  1620. Loading weights: 92%|█████████▏| 181/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm2.bias] 
  1621.  
  1622. Loading weights: 92%|█████████▏| 181/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm2.bias]
  1623.  
  1624. Loading weights: 93%|█████████▎| 182/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm2.weight]
  1625.  
  1626. Loading weights: 93%|█████████▎| 182/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.layer_norm2.weight]
  1627.  
  1628. Loading weights: 93%|█████████▎| 183/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc1.bias] 
  1629.  
  1630. Loading weights: 93%|█████████▎| 183/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc1.bias]
  1631.  
  1632. Loading weights: 94%|█████████▍| 184/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc1.weight]
  1633.  
  1634. Loading weights: 94%|█████████▍| 184/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc1.weight]
  1635.  
  1636. Loading weights: 94%|█████████▍| 185/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc2.bias] 
  1637.  
  1638. Loading weights: 94%|█████████▍| 185/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc2.bias]
  1639.  
  1640. Loading weights: 95%|█████████▍| 186/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc2.weight]
  1641.  
  1642. Loading weights: 95%|█████████▍| 186/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.mlp.fc2.weight]
  1643.  
  1644. Loading weights: 95%|█████████▌| 187/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.k_proj.bias]
  1645.  
  1646. Loading weights: 95%|█████████▌| 187/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.k_proj.bias]
  1647.  
  1648. Loading weights: 96%|█████████▌| 188/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.k_proj.weight]
  1649.  
  1650. Loading weights: 96%|█████████▌| 188/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.k_proj.weight]
  1651.  
  1652. Loading weights: 96%|█████████▋| 189/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.out_proj.bias]
  1653.  
  1654. Loading weights: 96%|█████████▋| 189/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.out_proj.bias]
  1655.  
  1656. Loading weights: 97%|█████████▋| 190/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.out_proj.weight]
  1657.  
  1658. Loading weights: 97%|█████████▋| 190/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.out_proj.weight]
  1659.  
  1660. Loading weights: 97%|█████████▋| 191/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.q_proj.bias] 
  1661.  
  1662. Loading weights: 97%|█████████▋| 191/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.q_proj.bias]
  1663.  
  1664. Loading weights: 98%|█████████▊| 192/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.q_proj.weight]
  1665.  
  1666. Loading weights: 98%|█████████▊| 192/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.q_proj.weight]
  1667.  
  1668. Loading weights: 98%|█████████▊| 193/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.v_proj.bias] 
  1669.  
  1670. Loading weights: 98%|█████████▊| 193/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.v_proj.bias]
  1671.  
  1672. Loading weights: 99%|█████████▉| 194/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.v_proj.weight]
  1673.  
  1674. Loading weights: 99%|█████████▉| 194/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.encoder.layers.11.self_attn.v_proj.weight]
  1675.  
  1676. Loading weights: 99%|█████████▉| 195/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.final_layer_norm.bias] 
  1677.  
  1678. Loading weights: 99%|█████████▉| 195/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.final_layer_norm.bias]
  1679.  
  1680. Loading weights: 100%|██████████| 196/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.final_layer_norm.weight]
  1681.  
  1682. Loading weights: 100%|██████████| 196/196 [00:00<00:00, 562.74it/s, Materializing param=text_model.final_layer_norm.weight]
  1683. Loading weights: 100%|██████████| 196/196 [00:00<00:00, 1124.60it/s, Materializing param=text_model.final_layer_norm.weight]
  1684.  
  1685. Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00, 6.14it/s]
Traceback (most recent call last):
  File "/home/sayak/diffusers/check_group_offloading.py", line 16, in <module>
    pipe.transformer.enable_group_offload(
  File "/home/sayak/diffusers/src/diffusers/models/modeling_utils.py", line 571, in enable_group_offload
    apply_group_offloading(
  File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 612, in apply_group_offloading
    _apply_group_offloading(module, config)
  File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 619, in _apply_group_offloading
    _apply_group_offloading_leaf_level(module, config)
  File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 737, in _apply_group_offloading_leaf_level
    group = ModuleGroup(
            ^^^^^^^^^^^^
  File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 119, in __init__
    self.cpu_param_dict = self._init_cpu_param_dict()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 134, in _init_cpu_param_dict
    cpu_param_dict[param] = param.data.cpu() if self.low_cpu_mem_usage else param.data.cpu().pin_memory()
                                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sayak/ao/torchao/utils.py", line 684, in _dispatch__torch_dispatch__
    raise NotImplementedError(
NotImplementedError: NVFP4Tensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.is_pinned', overload='default')>, types=(<class 'torchao.prototype.mx_formats.nvfp4_tensor.NVFP4Tensor'>,), arg_types=(<class 'torchao.prototype.mx_formats.nvfp4_tensor.NVFP4Tensor'>,), kwarg_types={}