streaming_convnets model: the WER does not decrease during training (it stays close to 100% on both dev and train)
I tried the streaming_convnets recipe with two different lexicon and token sets (a word-piece based model and a character based model).
For training I used a small dataset (train-clean-100).
1) Lexicon and tokens from the seq2seq_tds model (word-piece based model):
a- Lexicon: librispeech-train+dev-unigram-10000-nbest10.lexicon
a _a
a _ a
a'azam _a ' az am
a'azam _a ' a z am
a'azam _a ' a za m
a'azam _a ' az a m
a'azam _ a ' az am
a'azam _a ' a z a m
...etc
b- Tokens: librispeech-train-all-unigram-10000.tokens
_the s _and ed _of _to _a _in _i _he _that ly _was ..etc
c- cfg file:
--train=lists/train-clean-100.lst
--valid=dev-clean:lists/dev-clean.lst
--lexicon=am/librispeech-train+dev-unigram-10000-nbest10.lexicon
--arch=am_500ms_future_context.arch
--tokens=am/librispeech-train-all-unigram-10000.tokens
--criterion=ctc
--batchsize=8
--lr=0.4
#--lrcrit=0.05
--momentum=0.0
--maxgradnorm=0.5
--reportiters=1000
--nthread=6
--mfsc=true
--usewordpiece=true
--wordseparator=_
--filterbanks=80
--minisz=200
--mintsz=2
--maxisz=33000
--enable_distributed=true
--pcttraineval=1
--minloglevel=0
--logtostderr
--onorm=target
--sqnorm
--localnrmlleftctx=300
--lr_decay=10000
d- Result:
epoch: 1 | nupdates: 1000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:08:53 | bch(ms): 533.32 | smp(ms): 354.76 | fwd(ms): 59.87 | crit-fwd(ms): 8.10 | bwd(ms): 105.77 | optim(ms): 12.29 | loss: 52.64304 | train-TER: 99.39 | train-WER: 99.66 | dev-clean-loss: 34.79316 | dev-clean-TER: 87.16 | dev-clean-WER: 95.75 | avg-isz: 1240 | avg-tsz: 049 | max-tsz: 093 | hrs: 27.57 | thrpt(sec/sec): 186.08 epoch: 1 | nupdates: 2000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:30 | bch(ms): 570.33 | smp(ms): 388.66 | fwd(ms): 59.99 | crit-fwd(ms): 8.11 | bwd(ms): 108.18 | optim(ms): 12.78 | loss: 46.04015 | train-TER: 82.20 | train-WER: 92.45 | dev-clean-loss: 33.09205 | dev-clean-TER: 79.57 | dev-clean-WER: 94.58 | avg-isz: 1265 | avg-tsz: 050 | max-tsz: 077 | hrs: 28.13 | thrpt(sec/sec): 177.54 epoch: 1 | nupdates: 3000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:32 | bch(ms): 572.96 | smp(ms): 390.00 | fwd(ms): 60.63 | crit-fwd(ms): 8.22 | bwd(ms): 108.94 | optim(ms): 12.76 | loss: 44.30390 | train-TER: 83.84 | train-WER: 94.23 | dev-clean-loss: 33.39086 | dev-clean-TER: 82.08 | dev-clean-WER: 94.03 | avg-isz: 1274 | avg-tsz: 050 | max-tsz: 083 | hrs: 28.31 | thrpt(sec/sec): 177.90 epoch: 2 | nupdates: 4000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:39 | bch(ms): 579.43 | smp(ms): 394.31 | fwd(ms): 61.53 | crit-fwd(ms): 8.40 | bwd(ms): 110.09 | optim(ms): 12.72 | loss: 43.88908 | train-TER: 84.09 | train-WER: 94.07 | dev-clean-loss: 33.34910 | dev-clean-TER: 85.63 | dev-clean-WER: 93.78 | avg-isz: 1286 | avg-tsz: 051 | max-tsz: 083 | hrs: 28.59 | thrpt(sec/sec): 177.63 epoch: 2 | nupdates: 5000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:19 | bch(ms): 559.75 | smp(ms): 380.93 | fwd(ms): 59.20 | crit-fwd(ms): 8.09 | bwd(ms): 106.35 | optim(ms): 12.67 | loss: 42.80119 | train-TER: 82.50 | train-WER: 93.65 | dev-clean-loss: 32.47028 | dev-clean-TER: 77.14 | dev-clean-WER: 92.30 | avg-isz: 1242 | avg-tsz: 049 | max-tsz: 
079 | hrs: 27.61 | thrpt(sec/sec): 177.55 epoch: 2 | nupdates: 6000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:29 | bch(ms): 569.36 | smp(ms): 388.02 | fwd(ms): 59.92 | crit-fwd(ms): 8.04 | bwd(ms): 107.96 | optim(ms): 12.79 | loss: 42.73933 | train-TER: 86.38 | train-WER: 94.57 | dev-clean-loss: 31.84283 | dev-clean-TER: 74.21 | dev-clean-WER: 91.30 | avg-isz: 1264 | avg-tsz: 050 | max-tsz: 093 | hrs: 28.10 | thrpt(sec/sec): 177.66 epoch: 2 | nupdates: 7000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:44 | bch(ms): 584.42 | smp(ms): 399.69 | fwd(ms): 61.31 | crit-fwd(ms): 8.25 | bwd(ms): 110.21 | optim(ms): 12.76 | loss: 42.74298 | train-TER: 96.69 | train-WER: 97.89 | dev-clean-loss: 31.64643 | dev-clean-TER: 100.00 | dev-clean-WER: 99.99 | avg-isz: 1288 | avg-tsz: 051 | max-tsz: 078 | hrs: 28.63 | thrpt(sec/sec): 176.35 epoch: 3 | nupdates: 8000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:34 | bch(ms): 574.30 | smp(ms): 391.78 | fwd(ms): 60.62 | crit-fwd(ms): 8.25 | bwd(ms): 108.75 | optim(ms): 12.75 | loss: 42.07528 | train-TER: 87.87 | train-WER: 95.96 | dev-clean-loss: 31.18573 | dev-clean-TER: 96.60 | dev-clean-WER: 96.69 | avg-isz: 1271 | avg-tsz: 050 | max-tsz: 082 | hrs: 28.25 | thrpt(sec/sec): 177.07 epoch: 3 | nupdates: 9000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:24 | bch(ms): 564.77 | smp(ms): 385.23 | fwd(ms): 59.44 | crit-fwd(ms): 8.01 | bwd(ms): 106.69 | optim(ms): 12.75 | loss: 41.44461 | train-TER: 90.83 | train-WER: 96.70 | dev-clean-loss: 31.17613 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1245 | avg-tsz: 049 | max-tsz: 078 | hrs: 27.67 | thrpt(sec/sec): 176.38 epoch: 3 | nupdates: 10000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:36 | bch(ms): 576.68 | smp(ms): 392.34 | fwd(ms): 61.16 | crit-fwd(ms): 8.39 | bwd(ms): 109.78 | optim(ms): 12.77 | loss: 42.13430 | train-TER: 93.27 | train-WER: 96.79 | dev-clean-loss: 30.90531 | dev-clean-TER: 100.00 | dev-clean-WER: 
100.00 | avg-isz: 1285 | avg-tsz: 051 | max-tsz: 078 | hrs: 28.56 | thrpt(sec/sec): 178.28 epoch: 4 | nupdates: 11000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:36 | bch(ms): 576.67 | smp(ms): 393.02 | fwd(ms): 60.90 | crit-fwd(ms): 8.28 | bwd(ms): 109.36 | optim(ms): 12.71 | loss: 41.95766 | train-TER: 92.75 | train-WER: 96.92 | dev-clean-loss: 30.97161 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1281 | avg-tsz: 051 | max-tsz: 093 | hrs: 28.47 | thrpt(sec/sec): 177.76 epoch: 4 | nupdates: 12000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:19 | bch(ms): 559.70 | smp(ms): 380.42 | fwd(ms): 59.34 | crit-fwd(ms): 8.06 | bwd(ms): 106.74 | optim(ms): 12.69 | loss: 41.06174 | train-TER: 92.63 | train-WER: 96.35 | dev-clean-loss: 30.87306 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1246 | avg-tsz: 049 | max-tsz: 093 | hrs: 27.71 | thrpt(sec/sec): 178.20 epoch: 4 | nupdates: 13000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:25 | bch(ms): 565.96 | smp(ms): 384.76 | fwd(ms): 59.99 | crit-fwd(ms): 8.03 | bwd(ms): 107.96 | optim(ms): 12.72 | loss: 41.39895 | train-TER: 95.72 | train-WER: 97.68 | dev-clean-loss: 30.91884 | dev-clean-TER: 99.90 | dev-clean-WER: 99.85 | avg-isz: 1261 | avg-tsz: 050 | max-tsz: 083 | hrs: 28.03 | thrpt(sec/sec): 178.31 epoch: 4 | nupdates: 14000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:41 | bch(ms): 581.31 | smp(ms): 396.95 | fwd(ms): 61.07 | crit-fwd(ms): 8.28 | bwd(ms): 109.74 | optim(ms): 12.83 | loss: 41.71505 | train-TER: 96.13 | train-WER: 97.64 | dev-clean-loss: 31.56983 | dev-clean-TER: 81.42 | dev-clean-WER: 92.58 | avg-isz: 1285 | avg-tsz: 051 | max-tsz: 078 | hrs: 28.56 | thrpt(sec/sec): 176.85 epoch: 5 | nupdates: 15000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:27 | bch(ms): 567.82 | smp(ms): 385.63 | fwd(ms): 60.30 | crit-fwd(ms): 8.17 | bwd(ms): 108.21 | optim(ms): 12.77 | loss: 41.21607 | train-TER: 94.72 | train-WER: 96.95 | 
dev-clean-loss: 31.37890 | dev-clean-TER: 83.88 | dev-clean-WER: 92.59 | avg-isz: 1265 | avg-tsz: 050 | max-tsz: 082 | hrs: 28.12 | thrpt(sec/sec): 178.28 epoch: 5 | nupdates: 16000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:23 | bch(ms): 563.07 | smp(ms): 382.86 | fwd(ms): 59.49 | crit-fwd(ms): 7.94 | bwd(ms): 107.36 | optim(ms): 12.80 | loss: 41.02894 | train-TER: 98.09 | train-WER: 98.70 | dev-clean-loss: 31.29246 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1253 | avg-tsz: 049 | max-tsz: 079 | hrs: 27.86 | thrpt(sec/sec): 178.11 epoch: 5 | nupdates: 17000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:31 | bch(ms): 571.93 | smp(ms): 389.97 | fwd(ms): 60.45 | crit-fwd(ms): 8.30 | bwd(ms): 108.12 | optim(ms): 12.80 | loss: 41.09602 | train-TER: 96.65 | train-WER: 98.50 | dev-clean-loss: 30.68892 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1267 | avg-tsz: 050 | max-tsz: 093 | hrs: 28.17 | thrpt(sec/sec): 177.30 epoch: 6 | nupdates: 18000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:42 | bch(ms): 582.33 | smp(ms): 397.57 | fwd(ms): 61.39 | crit-fwd(ms): 8.35 | bwd(ms): 110.05 | optim(ms): 12.73 | loss: 41.61017 | train-TER: 98.73 | train-WER: 99.46 | dev-clean-loss: 31.22991 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1287 | avg-tsz: 051 | max-tsz: 079 | hrs: 28.60 | thrpt(sec/sec): 176.82 epoch: 6 | nupdates: 19000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:23 | bch(ms): 563.01 | smp(ms): 383.89 | fwd(ms): 59.24 | crit-fwd(ms): 7.95 | bwd(ms): 106.46 | optim(ms): 12.75 | loss: 40.63184 | train-TER: 98.38 | train-WER: 99.32 | dev-clean-loss: 30.71613 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1243 | avg-tsz: 049 | max-tsz: 082 | hrs: 27.64 | thrpt(sec/sec): 176.75 epoch: 6 | nupdates: 20000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:24 | bch(ms): 564.71 | smp(ms): 383.85 | fwd(ms): 59.80 | crit-fwd(ms): 8.09 | bwd(ms): 107.67 | optim(ms): 12.78 | 
loss: 40.92923 | train-TER: 97.04 | train-WER: 98.12 | dev-clean-loss: 31.03355 | dev-clean-TER: 99.76 | dev-clean-WER: 99.65 | avg-isz: 1260 | avg-tsz: 050 | max-tsz: 093 | hrs: 28.01 | thrpt(sec/sec): 178.57 epoch: 6 | nupdates: 21000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:44 | bch(ms): 584.29 | smp(ms): 399.79 | fwd(ms): 61.23 | crit-fwd(ms): 8.34 | bwd(ms): 109.91 | optim(ms): 12.87 | loss: 41.38276 | train-TER: 97.88 | train-WER: 98.79 | dev-clean-loss: 31.04303 | dev-clean-TER: 95.50 | dev-clean-WER: 95.84 | avg-isz: 1285 | avg-tsz: 051 | max-tsz: 083 | hrs: 28.58 | thrpt(sec/sec): 176.06 epoch: 7 | nupdates: 22000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:37 | bch(ms): 577.05 | smp(ms): 393.18 | fwd(ms): 61.19 | crit-fwd(ms): 8.29 | bwd(ms): 109.36 | optim(ms): 12.72 | loss: 41.08938 | train-TER: 98.45 | train-WER: 99.47 | dev-clean-loss: 31.37402 | dev-clean-TER: 99.96 | dev-clean-WER: 99.95 | avg-isz: 1279 | avg-tsz: 050 | max-tsz: 093 | hrs: 28.43 | thrpt(sec/sec): 177.34 epoch: 7 | nupdates: 23000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:20 | bch(ms): 560.66 | smp(ms): 381.87 | fwd(ms): 59.08 | crit-fwd(ms): 8.01 | bwd(ms): 106.34 | optim(ms): 12.76 | loss: 40.65637 | train-TER: 96.29 | train-WER: 97.57 | dev-clean-loss: 30.48641 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1243 | avg-tsz: 049 | max-tsz: 083 | hrs: 27.62 | thrpt(sec/sec): 177.38 epoch: 7 | nupdates: 24000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:29 | bch(ms): 569.43 | smp(ms): 387.19 | fwd(ms): 60.40 | crit-fwd(ms): 8.27 | bwd(ms): 108.54 | optim(ms): 12.73 | loss: 41.15380 | train-TER: 93.92 | train-WER: 96.91 | dev-clean-loss: 30.38039 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1270 | avg-tsz: 050 | max-tsz: 077 | hrs: 28.23 | thrpt(sec/sec): 178.48 epoch: 8 | nupdates: 25000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:42 | bch(ms): 582.65 | smp(ms): 397.19 | fwd(ms): 61.48 | 
crit-fwd(ms): 8.35 | bwd(ms): 110.53 | optim(ms): 12.76 | loss: 41.39486 | train-TER: 99.64 | train-WER: 99.66 | dev-clean-loss: 30.50425 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1295 | avg-tsz: 051 | max-tsz: 079 | hrs: 28.80 | thrpt(sec/sec): 177.92 epoch: 8 | nupdates: 26000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:17 | bch(ms): 557.59 | smp(ms): 379.72 | fwd(ms): 58.85 | crit-fwd(ms): 8.03 | bwd(ms): 105.55 | optim(ms): 12.77 | loss: 40.27983 | train-TER: 97.86 | train-WER: 98.93 | dev-clean-loss: 30.66625 | dev-clean-TER: 95.50 | dev-clean-WER: 96.02 | avg-isz: 1236 | avg-tsz: 049 | max-tsz: 079 | hrs: 27.47 | thrpt(sec/sec): 177.39 epoch: 8 | nupdates: 27000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:22 | bch(ms): 562.07 | smp(ms): 382.26 | fwd(ms): 59.55 | crit-fwd(ms): 8.03 | bwd(ms): 107.06 | optim(ms): 12.71 | loss: 40.53588 | train-TER: 97.12 | train-WER: 98.48 | dev-clean-loss: 30.78142 | dev-clean-TER: 91.34 | dev-clean-WER: 94.41 | avg-isz: 1251 | avg-tsz: 049 | max-tsz: 083 | hrs: 27.81 | thrpt(sec/sec): 178.12 epoch: 8 | nupdates: 28000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:40 | bch(ms): 580.95 | smp(ms): 396.74 | fwd(ms): 61.22 | crit-fwd(ms): 8.27 | bwd(ms): 109.74 | optim(ms): 12.73 | loss: 41.12175 | train-TER: 96.11 | train-WER: 98.14 | dev-clean-loss: 30.36239 | dev-clean-TER: 99.23 | dev-clean-WER: 99.01 | avg-isz: 1284 | avg-tsz: 051 | max-tsz: 093 | hrs: 28.55 | thrpt(sec/sec): 176.91 epoch: 9 | nupdates: 29000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:41 | bch(ms): 581.99 | smp(ms): 397.44 | fwd(ms): 61.27 | crit-fwd(ms): 8.27 | bwd(ms): 109.83 | optim(ms): 12.83 | loss: 41.08982 | train-TER: 97.21 | train-WER: 98.21 | dev-clean-loss: 31.13091 | dev-clean-TER: 94.99 | dev-clean-WER: 95.58 | avg-isz: 1286 | avg-tsz: 051 | max-tsz: 078 | hrs: 28.59 | thrpt(sec/sec): 176.83 epoch: 9 | nupdates: 30000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:25 | 
bch(ms): 565.67 | smp(ms): 385.70 | fwd(ms): 59.70 | crit-fwd(ms): 8.07 | bwd(ms): 107.15 | optim(ms): 12.79 | loss: 40.52055 | train-TER: 99.09 | train-WER: 99.61 | dev-clean-loss: 30.38957 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1252 | avg-tsz: 050 | max-tsz: 083 | hrs: 27.84 | thrpt(sec/sec): 177.15 epoch: 9 | nupdates: 31000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:18 | bch(ms): 558.41 | smp(ms): 378.95 | fwd(ms): 59.45 | crit-fwd(ms): 8.10 | bwd(ms): 106.71 | optim(ms): 12.75 | loss: 40.41936 | train-TER: 97.79 | train-WER: 98.83 | dev-clean-loss: 30.25391 | dev-clean-TER: 94.91 | dev-clean-WER: 95.60 | avg-isz: 1251 | avg-tsz: 049 | max-tsz: 093 | hrs: 27.81 | thrpt(sec/sec): 179.26 epoch: 9 | nupdates: 32000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:48 | bch(ms): 588.45 | smp(ms): 401.55 | fwd(ms): 62.09 | crit-fwd(ms): 8.42 | bwd(ms): 111.00 | optim(ms): 12.89 | loss: 41.24431 | train-TER: 99.78 | train-WER: 99.74 | dev-clean-loss: 30.58442 | dev-clean-TER: 87.27 | dev-clean-WER: 93.85 | avg-isz: 1300 | avg-tsz: 051 | max-tsz: 078 | hrs: 28.90 | thrpt(sec/sec): 176.78 epoch: 10 | nupdates: 33000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:24 | bch(ms): 564.76 | smp(ms): 385.13 | fwd(ms): 59.55 | crit-fwd(ms): 8.12 | bwd(ms): 106.81 | optim(ms): 12.68 | loss: 40.33834 | train-TER: 99.53 | train-WER: 99.53 | dev-clean-loss: 30.72762 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1251 | avg-tsz: 049 | max-tsz: 093 | hrs: 27.82 | thrpt(sec/sec): 177.31 epoch: 10 | nupdates: 34000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:17 | bch(ms): 557.62 | smp(ms): 378.99 | fwd(ms): 59.04 | crit-fwd(ms): 7.99 | bwd(ms): 106.20 | optim(ms): 12.74 | loss: 40.44278 | train-TER: 98.70 | train-WER: 99.34 | dev-clean-loss: 30.59309 | dev-clean-TER: 97.98 | dev-clean-WER: 97.77 | avg-isz: 1245 | avg-tsz: 049 | max-tsz: 077 | hrs: 27.68 | thrpt(sec/sec): 178.68 epoch: 10 | nupdates: 35000 | lr: 
0.400000 | lrcriterion: 0.000000 | runtime: 00:09:54 | bch(ms): 594.67 | smp(ms): 408.18 | fwd(ms): 61.98 | crit-fwd(ms): 8.36 | bwd(ms): 111.12 | optim(ms): 12.84 | loss: 41.28943 | train-TER: 97.57 | train-WER: 98.73 | dev-clean-loss: 30.88223 | dev-clean-TER: 97.73 | dev-clean-WER: 99.32 | avg-isz: 1300 | avg-tsz: 051 | max-tsz: 082 | hrs: 28.89 | thrpt(sec/sec): 174.89 epoch: 11 | nupdates: 36000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:37 | bch(ms): 577.24 | smp(ms): 393.78 | fwd(ms): 60.88 | crit-fwd(ms): 8.17 | bwd(ms): 109.41 | optim(ms): 12.70 | loss: 40.86320 | train-TER: 94.50 | train-WER: 96.46 | dev-clean-loss: 30.46451 | dev-clean-TER: 86.25 | dev-clean-WER: 93.76 | avg-isz: 1278 | avg-tsz: 050 | max-tsz: 079 | hrs: 28.40 | thrpt(sec/sec): 177.12 epoch: 11 | nupdates: 37000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:24 | bch(ms): 564.83 | smp(ms): 384.10 | fwd(ms): 59.85 | crit-fwd(ms): 8.08 | bwd(ms): 107.48 | optim(ms): 12.79 | loss: 40.55057 | train-TER: 97.49 | train-WER: 98.33 | dev-clean-loss: 31.24545 | dev-clean-TER: 92.41 | dev-clean-WER: 95.04 | avg-isz: 1257 | avg-tsz: 050 | max-tsz: 077 | hrs: 27.94 | thrpt(sec/sec): 178.08 epoch: 11 | nupdates: 38000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:24 | bch(ms): 564.35 | smp(ms): 385.31 | fwd(ms): 59.25 | crit-fwd(ms): 8.00 | bwd(ms): 106.40 | optim(ms): 12.72 | loss: 40.20313 | train-TER: 98.99 | train-WER: 99.57 | dev-clean-loss: 30.61073 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1244 | avg-tsz: 049 | max-tsz: 082 | hrs: 27.65 | thrpt(sec/sec): 176.36 epoch: 11 | nupdates: 39000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:44 | bch(ms): 584.66 | smp(ms): 399.47 | fwd(ms): 61.39 | crit-fwd(ms): 8.38 | bwd(ms): 110.30 | optim(ms): 12.86 | loss: 40.98882 | train-TER: 96.95 | train-WER: 98.35 | dev-clean-loss: 30.20326 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1291 | avg-tsz: 051 | max-tsz: 083 | hrs: 28.70 | 
thrpt(sec/sec): 176.70 epoch: 12 | nupdates: 40000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:31 | bch(ms): 571.33 | smp(ms): 390.01 | fwd(ms): 60.29 | crit-fwd(ms): 8.26 | bwd(ms): 107.70 | optim(ms): 12.82 | loss: 40.45131 | train-TER: 97.57 | train-WER: 98.15 | dev-clean-loss: 30.25990 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1259 | avg-tsz: 050 | max-tsz: 093 | hrs: 27.98 | thrpt(sec/sec): 176.31 epoch: 12 | nupdates: 41000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:23 | bch(ms): 563.71 | smp(ms): 383.84 | fwd(ms): 59.53 | crit-fwd(ms): 8.03 | bwd(ms): 106.85 | optim(ms): 12.83 | loss: 40.22705 | train-TER: 95.51 | train-WER: 98.07 | dev-clean-loss: 30.33597 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1249 | avg-tsz: 049 | max-tsz: 082 | hrs: 27.77 | thrpt(sec/sec): 177.33 epoch: 12 | nupdates: 42000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:38 | bch(ms): 578.62 | smp(ms): 395.31 | fwd(ms): 60.67 | crit-fwd(ms): 8.12 | bwd(ms): 109.20 | optim(ms): 12.78 | loss: 40.73667 | train-TER: 98.60 | train-WER: 99.52 | dev-clean-loss: 30.72150 | dev-clean-TER: 88.17 | dev-clean-WER: 93.84 | avg-isz: 1278 | avg-tsz: 050 | max-tsz: 093 | hrs: 28.41 | thrpt(sec/sec): 176.78
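The log above is hard to read because the entries are run together. As a rough aid (not part of wav2letter), the sketch below splits such a dump into one record per report and summarizes one metric. The function names `parse_log` and `summarize` are hypothetical, and the sample string is an abbreviated copy of the first two entries above with most fields left out.

```python
import re


def parse_log(text):
    """Turn a flattened training log into a list of {field: value-string} dicts."""
    flat = " ".join(text.split())  # undo line wraps that fall inside an entry
    entries = []
    # Every report starts with "epoch:"; the lookahead keeps that keyword in the chunk.
    for chunk in re.split(r"(?=epoch:)", flat):
        fields = {}
        for part in chunk.split("|"):
            # Split at the first colon only, so "runtime: 00:08:53" keeps its value intact.
            key, sep, value = part.partition(":")
            if sep and value.strip():
                fields[key.strip()] = value.strip()
        if "nupdates" in fields:
            entries.append(fields)
    return entries


def summarize(entries, metric="dev-clean-WER"):
    """Return the first, best (lowest) and last value of a metric across entries."""
    values = [float(e[metric]) for e in entries if metric in e]
    return {"first": values[0], "best": min(values), "last": values[-1]}


# Abbreviated sample: only a few fields of the first two entries are kept.
sample = (
    "epoch: 1 | nupdates: 1000 | lr: 0.400000 | loss: 52.64304 | "
    "train-WER: 99.66 | dev-clean-WER: 95.75 "
    "epoch: 1 | nupdates: 2000 | lr: 0.400000 | loss: 46.04015 | "
    "train-WER: 92.45 | dev-clean-WER: 94.58"
)

if __name__ == "__main__":
    print(summarize(parse_log(sample)))
```

Pasting the full log into `parse_log` and calling `summarize` with `metric="train-WER"` or `metric="loss"` gives a quick view of whether anything is improving over the run.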
2) Lexicon and tokens from the tutorial (letter based, "lexicon-free" model):
a- Lexicon: lexicon.txt
a a |
a'azam a ' a z a m |
a'll a ' l l |
a'most a ' m o s t |
a'ready a ' r e a d y |
a'rony a ' r o n y |
a's a ' s |
a'terwards a ' t e r w a r d s |
a'thinkin a ' t h i n k i n |
a've a ' v e |
aaraaf a a r a a f |
...etc
b- Tokens: tokens.txt
| ' a b c d e f g h i j k l m n o p q ...etc
c- cfg file:
--train=lists/train-clean-100.lst
--valid=dev-clean:lists/dev-clean.lst
--lexicon=am/conf_C01/lexicon.txt
--arch=librispeech/am_500ms_future_context.arch
--tokens=tokens.txt
--surround=|
--criterion=ctc
--batchsize=8
--lr=0.4
#--lrcrit=0.05
--momentum=0.0
--maxgradnorm=0.5
--reportiters=1000
--nthread=6
--mfsc=true
#--usewordpiece=true
#--wordseparator=_
--filterbanks=80
--minisz=200
--mintsz=2
--maxisz=33000
--enable_distributed=true
--pcttraineval=1
--minloglevel=0
--logtostderr
--onorm=target
--sqnorm
--localnrmlleftctx=300
--lr_decay=10000
d- Result:
epoch: 1 | nupdates: 1000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:08:45 | bch(ms): 525.71 | smp(ms): 360.50 | fwd(ms): 52.14 | crit-fwd(ms): 2.68 | bwd(ms): 100.64 | optim(ms): 12.17 | loss: 32.39452 | train-TER: 89.39 | train-WER: 102.35 | dev-clean-loss: 23.24659 | dev-clean-TER: 86.44 | dev-clean-WER: 100.08 | avg-isz: 1240 | avg-tsz: 218 | max-tsz: 400 | hrs: 27.57 | thrpt(sec/sec): 188.78 epoch: 1 | nupdates: 2000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:20 | bch(ms): 560.47 | smp(ms): 392.67 | fwd(ms): 52.19 | crit-fwd(ms): 2.78 | bwd(ms): 102.95 | optim(ms): 12.46 | loss: 32.52234 | train-TER: 98.37 | train-WER: 99.93 | dev-clean-loss: 23.17374 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1265 | avg-tsz: 223 | max-tsz: 331 | hrs: 28.13 | thrpt(sec/sec): 180.67 epoch: 1 | nupdates: 3000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:25 | bch(ms): 565.28 | smp(ms): 396.01 | fwd(ms): 52.86 | crit-fwd(ms): 2.83 | bwd(ms): 103.76 | optim(ms): 12.46 | loss: 32.56369 | train-TER: 98.44 | train-WER: 99.96 | dev-clean-loss: 23.80183 | dev-clean-TER: 86.07 | dev-clean-WER: 101.73 | avg-isz: 1274 | avg-tsz: 224 | max-tsz: 340 | hrs: 28.31 | thrpt(sec/sec): 180.31 epoch: 2 | nupdates: 4000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:36 | bch(ms): 576.89 | smp(ms): 405.33 | fwd(ms): 53.70 | crit-fwd(ms): 2.85 | bwd(ms): 105.20 | optim(ms): 12.46 | loss: 32.63548 | train-TER: 96.88 | train-WER: 99.82 | dev-clean-loss: 23.54874 | dev-clean-TER: 95.90 | dev-clean-WER: 100.01 | avg-isz: 1286 | avg-tsz: 227 | max-tsz: 340 | hrs: 28.59 | thrpt(sec/sec): 178.41 epoch: 2 | nupdates: 5000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:16 | bch(ms): 556.57 | smp(ms): 390.92 | fwd(ms): 51.64 | crit-fwd(ms): 2.70 | bwd(ms): 101.41 | optim(ms): 12.40 | loss: 31.97152 | train-TER: 98.04 | train-WER: 99.90 | dev-clean-loss: 23.06855 | dev-clean-TER: 98.25 | dev-clean-WER: 99.99 | avg-isz: 1242 | avg-tsz: 220 | 
max-tsz: 338 | hrs: 27.61 | thrpt(sec/sec): 178.57 epoch: 2 | nupdates: 6000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:21 | bch(ms): 561.09 | smp(ms): 392.92 | fwd(ms): 52.47 | crit-fwd(ms): 2.81 | bwd(ms): 103.00 | optim(ms): 12.50 | loss: 32.40682 | train-TER: 95.22 | train-WER: 99.82 | dev-clean-loss: 23.43389 | dev-clean-TER: 95.94 | dev-clean-WER: 100.01 | avg-isz: 1264 | avg-tsz: 222 | max-tsz: 400 | hrs: 28.10 | thrpt(sec/sec): 180.27 epoch: 2 | nupdates: 7000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:32 | bch(ms): 572.70 | smp(ms): 401.47 | fwd(ms): 53.50 | crit-fwd(ms): 2.90 | bwd(ms): 104.98 | optim(ms): 12.55 | loss: 32.70524 | train-TER: 96.84 | train-WER: 99.82 | dev-clean-loss: 23.06044 | dev-clean-TER: 98.26 | dev-clean-WER: 99.99 | avg-isz: 1288 | avg-tsz: 227 | max-tsz: 337 | hrs: 28.63 | thrpt(sec/sec): 179.96 epoch: 3 | nupdates: 8000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:28 | bch(ms): 568.03 | smp(ms): 398.84 | fwd(ms): 52.82 | crit-fwd(ms): 2.76 | bwd(ms): 103.67 | optim(ms): 12.50 | loss: 32.36959 | train-TER: 98.32 | train-WER: 99.94 | dev-clean-loss: 23.04484 | dev-clean-TER: 98.22 | dev-clean-WER: 100.00 | avg-isz: 1271 | avg-tsz: 224 | max-tsz: 340 | hrs: 28.25 | thrpt(sec/sec): 179.02 epoch: 3 | nupdates: 9000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:13 | bch(ms): 553.29 | smp(ms): 387.38 | fwd(ms): 51.75 | crit-fwd(ms): 2.73 | bwd(ms): 101.51 | optim(ms): 12.44 | loss: 32.02084 | train-TER: 97.19 | train-WER: 99.91 | dev-clean-loss: 23.46285 | dev-clean-TER: 94.37 | dev-clean-WER: 100.06 | avg-isz: 1245 | avg-tsz: 219 | max-tsz: 338 | hrs: 27.67 | thrpt(sec/sec): 180.04 epoch: 3 | nupdates: 10000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:39 | bch(ms): 579.09 | smp(ms): 408.38 | fwd(ms): 53.32 | crit-fwd(ms): 2.86 | bwd(ms): 104.66 | optim(ms): 12.52 | loss: 32.67176 | train-TER: 98.66 | train-WER: 99.79 | dev-clean-loss: 23.44814 | dev-clean-TER: 93.35 | 
dev-clean-WER: 100.06 | avg-isz: 1285 | avg-tsz: 226 | max-tsz: 331 | hrs: 28.56 | thrpt(sec/sec): 177.54 epoch: 4 | nupdates: 11000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:37 | bch(ms): 577.91 | smp(ms): 408.18 | fwd(ms): 52.92 | crit-fwd(ms): 2.79 | bwd(ms): 104.04 | optim(ms): 12.57 | loss: 32.57986 | train-TER: 96.44 | train-WER: 99.78 | dev-clean-loss: 23.04585 | dev-clean-TER: 98.22 | dev-clean-WER: 100.00 | avg-isz: 1281 | avg-tsz: 226 | max-tsz: 400 | hrs: 28.47 | thrpt(sec/sec): 177.38 epoch: 4 | nupdates: 12000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:18 | bch(ms): 558.08 | smp(ms): 392.21 | fwd(ms): 51.76 | crit-fwd(ms): 2.74 | bwd(ms): 101.44 | optim(ms): 12.47 | loss: 32.05249 | train-TER: 98.16 | train-WER: 99.66 | dev-clean-loss: 23.16368 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1246 | avg-tsz: 220 | max-tsz: 400 | hrs: 27.71 | thrpt(sec/sec): 178.72 epoch: 4 | nupdates: 13000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:27 | bch(ms): 567.50 | smp(ms): 399.71 | fwd(ms): 52.35 | crit-fwd(ms): 2.78 | bwd(ms): 102.77 | optim(ms): 12.47 | loss: 32.29328 | train-TER: 98.45 | train-WER: 99.78 | dev-clean-loss: 23.08375 | dev-clean-TER: 98.26 | dev-clean-WER: 99.99 | avg-isz: 1261 | avg-tsz: 222 | max-tsz: 340 | hrs: 28.03 | thrpt(sec/sec): 177.82 epoch: 4 | nupdates: 14000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:36 | bch(ms): 576.85 | smp(ms): 406.54 | fwd(ms): 53.12 | crit-fwd(ms): 2.79 | bwd(ms): 104.48 | optim(ms): 12.52 | loss: 32.61915 | train-TER: 97.07 | train-WER: 99.52 | dev-clean-loss: 23.11742 | dev-clean-TER: 95.92 | dev-clean-WER: 100.01 | avg-isz: 1285 | avg-tsz: 227 | max-tsz: 338 | hrs: 28.56 | thrpt(sec/sec): 178.22 epoch: 5 | nupdates: 15000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:29 | bch(ms): 569.95 | smp(ms): 401.34 | fwd(ms): 52.68 | crit-fwd(ms): 2.86 | bwd(ms): 103.25 | optim(ms): 12.48 | loss: 32.31531 | train-TER: 98.37 | train-WER: 99.93 | 
dev-clean-loss: 23.26244 | dev-clean-TER: 95.04 | dev-clean-WER: 99.29 | avg-isz: 1265 | avg-tsz: 223 | max-tsz: 340 | hrs: 28.12 | thrpt(sec/sec): 177.61 epoch: 5 | nupdates: 16000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:23 | bch(ms): 563.97 | smp(ms): 397.21 | fwd(ms): 52.05 | crit-fwd(ms): 2.74 | bwd(ms): 102.07 | optim(ms): 12.44 | loss: 32.19752 | train-TER: 98.89 | train-WER: 99.68 | dev-clean-loss: 23.09425 | dev-clean-TER: 96.46 | dev-clean-WER: 100.00 | avg-isz: 1253 | avg-tsz: 221 | max-tsz: 336 | hrs: 27.86 | thrpt(sec/sec): 177.82 epoch: 5 | nupdates: 17000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:25 | bch(ms): 565.35 | smp(ms): 396.86 | fwd(ms): 52.57 | crit-fwd(ms): 2.88 | bwd(ms): 103.13 | optim(ms): 12.59 | loss: 32.38066 | train-TER: 97.09 | train-WER: 99.38 | dev-clean-loss: 23.06108 | dev-clean-TER: 98.22 | dev-clean-WER: 100.00 | avg-isz: 1267 | avg-tsz: 223 | max-tsz: 400 | hrs: 28.17 | thrpt(sec/sec): 179.36 epoch: 6 | nupdates: 18000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:33 | bch(ms): 573.98 | smp(ms): 403.00 | fwd(ms): 53.48 | crit-fwd(ms): 2.93 | bwd(ms): 104.75 | optim(ms): 12.53 | loss: 32.66110 | train-TER: 94.53 | train-WER: 99.61 | dev-clean-loss: 23.57501 | dev-clean-TER: 91.99 | dev-clean-WER: 100.25 | avg-isz: 1287 | avg-tsz: 226 | max-tsz: 335 | hrs: 28.60 | thrpt(sec/sec): 179.39 epoch: 6 | nupdates: 19000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:11 | bch(ms): 551.98 | smp(ms): 386.54 | fwd(ms): 51.55 | crit-fwd(ms): 2.73 | bwd(ms): 101.19 | optim(ms): 12.48 | loss: 31.99667 | train-TER: 97.30 | train-WER: 99.80 | dev-clean-loss: 23.05809 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1243 | avg-tsz: 219 | max-tsz: 331 | hrs: 27.64 | thrpt(sec/sec): 180.28 epoch: 6 | nupdates: 20000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:31 | bch(ms): 571.90 | smp(ms): 404.15 | fwd(ms): 52.37 | crit-fwd(ms): 2.76 | bwd(ms): 102.76 | optim(ms): 12.42 | loss: 
32.27088 | train-TER: 97.32 | train-WER: 99.57 | dev-clean-loss: 23.05034 | dev-clean-TER: 98.42 | dev-clean-WER: 100.00 | avg-isz: 1260 | avg-tsz: 223 | max-tsz: 400 | hrs: 28.01 | thrpt(sec/sec): 176.33 epoch: 6 | nupdates: 21000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:32 | bch(ms): 572.95 | smp(ms): 402.08 | fwd(ms): 53.44 | crit-fwd(ms): 2.92 | bwd(ms): 104.60 | optim(ms): 12.63 | loss: 32.68200 | train-TER: 98.90 | train-WER: 99.58 | dev-clean-loss: 23.34007 | dev-clean-TER: 95.04 | dev-clean-WER: 99.29 | avg-isz: 1285 | avg-tsz: 226 | max-tsz: 340 | hrs: 28.58 | thrpt(sec/sec): 179.55 epoch: 7 | nupdates: 22000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:33 | bch(ms): 573.67 | smp(ms): 403.87 | fwd(ms): 53.13 | crit-fwd(ms): 2.83 | bwd(ms): 103.96 | optim(ms): 12.50 | loss: 32.48077 | train-TER: 98.56 | train-WER: 99.64 | dev-clean-loss: 23.15174 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1279 | avg-tsz: 225 | max-tsz: 400 | hrs: 28.43 | thrpt(sec/sec): 178.38 epoch: 7 | nupdates: 23000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:17 | bch(ms): 557.41 | smp(ms): 391.75 | fwd(ms): 51.60 | crit-fwd(ms): 2.72 | bwd(ms): 101.34 | optim(ms): 12.53 | loss: 32.02057 | train-TER: 96.42 | train-WER: 98.97 | dev-clean-loss: 23.05276 | dev-clean-TER: 98.17 | dev-clean-WER: 100.00 | avg-isz: 1243 | avg-tsz: 219 | max-tsz: 336 | hrs: 27.62 | thrpt(sec/sec): 178.41 epoch: 7 | nupdates: 24000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:31 | bch(ms): 571.66 | smp(ms): 402.81 | fwd(ms): 52.72 | crit-fwd(ms): 2.89 | bwd(ms): 103.42 | optim(ms): 12.51 | loss: 32.46000 | train-TER: 98.67 | train-WER: 99.79 | dev-clean-loss: 23.04578 | dev-clean-TER: 98.06 | dev-clean-WER: 99.99 | avg-isz: 1270 | avg-tsz: 224 | max-tsz: 338 | hrs: 28.23 | thrpt(sec/sec): 177.79 epoch: 8 | nupdates: 25000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:48 | bch(ms): 588.23 | smp(ms): 415.77 | fwd(ms): 53.95 | crit-fwd(ms): 
2.89 | bwd(ms): 105.74 | optim(ms): 12.56 | loss: 32.83195 | train-TER: 98.43 | train-WER: 99.86 | dev-clean-loss: 23.15296 | dev-clean-TER: 94.86 | dev-clean-WER: 99.29 | avg-isz: 1295 | avg-tsz: 228 | max-tsz: 337 | hrs: 28.80 | thrpt(sec/sec): 176.24 epoch: 8 | nupdates: 26000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:14 | bch(ms): 554.12 | smp(ms): 389.28 | fwd(ms): 51.32 | crit-fwd(ms): 2.70 | bwd(ms): 100.79 | optim(ms): 12.53 | loss: 31.88365 | train-TER: 97.63 | train-WER: 99.69 | dev-clean-loss: 23.14110 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1236 | avg-tsz: 218 | max-tsz: 335 | hrs: 27.47 | thrpt(sec/sec): 178.50 epoch: 8 | nupdates: 27000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:21 | bch(ms): 561.55 | smp(ms): 394.89 | fwd(ms): 52.00 | crit-fwd(ms): 2.74 | bwd(ms): 101.97 | optim(ms): 12.48 | loss: 32.14686 | train-TER: 98.23 | train-WER: 100.04 | dev-clean-loss: 23.42147 | dev-clean-TER: 91.85 | dev-clean-WER: 99.49 | avg-isz: 1251 | avg-tsz: 220 | max-tsz: 336 | hrs: 27.81 | thrpt(sec/sec): 178.29 epoch: 8 | nupdates: 28000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:36 | bch(ms): 576.47 | smp(ms): 406.09 | fwd(ms): 53.23 | crit-fwd(ms): 2.91 | bwd(ms): 104.51 | optim(ms): 12.45 | loss: 32.65874 | train-TER: 97.83 | train-WER: 99.79 | dev-clean-loss: 23.10430 | dev-clean-TER: 97.50 | dev-clean-WER: 99.99 | avg-isz: 1284 | avg-tsz: 226 | max-tsz: 400 | hrs: 28.55 | thrpt(sec/sec): 178.28 epoch: 9 | nupdates: 29000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:32 | bch(ms): 572.38 | smp(ms): 401.84 | fwd(ms): 53.27 | crit-fwd(ms): 2.91 | bwd(ms): 104.54 | optim(ms): 12.54 | loss: 32.63521 | train-TER: 98.74 | train-WER: 99.93 | dev-clean-loss: 23.04945 | dev-clean-TER: 98.26 | dev-clean-WER: 99.99 | avg-isz: 1286 | avg-tsz: 226 | max-tsz: 338 | hrs: 28.59 | thrpt(sec/sec): 179.80 epoch: 9 | nupdates: 30000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:20 | bch(ms): 560.22 | 
smp(ms): 393.34 | fwd(ms): 52.03 | crit-fwd(ms): 2.78 | bwd(ms): 102.11 | optim(ms): 12.53 | loss: 32.17733 | train-TER: 96.97 | train-WER: 99.80 | dev-clean-loss: 23.04797 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1252 | avg-tsz: 221 | max-tsz: 331 | hrs: 27.84 | thrpt(sec/sec): 178.87 epoch: 9 | nupdates: 31000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:24 | bch(ms): 564.43 | smp(ms): 397.77 | fwd(ms): 51.99 | crit-fwd(ms): 2.77 | bwd(ms): 101.89 | optim(ms): 12.58 | loss: 32.10176 | train-TER: 96.57 | train-WER: 99.84 | dev-clean-loss: 23.08535 | dev-clean-TER: 96.76 | dev-clean-WER: 100.00 | avg-isz: 1251 | avg-tsz: 221 | max-tsz: 400 | hrs: 27.81 | thrpt(sec/sec): 177.35 epoch: 9 | nupdates: 32000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:43 | bch(ms): 583.40 | smp(ms): 410.53 | fwd(ms): 54.13 | crit-fwd(ms): 2.96 | bwd(ms): 105.98 | optim(ms): 12.53 | loss: 32.87311 | train-TER: 97.08 | train-WER: 99.80 | dev-clean-loss: 23.10452 | dev-clean-TER: 95.64 | dev-clean-WER: 96.97 | avg-isz: 1300 | avg-tsz: 229 | max-tsz: 336 | hrs: 28.90 | thrpt(sec/sec): 178.31 epoch: 10 | nupdates: 33000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:20 | bch(ms): 560.38 | smp(ms): 393.92 | fwd(ms): 51.88 | crit-fwd(ms): 2.78 | bwd(ms): 101.92 | optim(ms): 12.46 | loss: 32.13317 | train-TER: 97.37 | train-WER: 99.33 | dev-clean-loss: 23.14042 | dev-clean-TER: 96.53 | dev-clean-WER: 100.00 | avg-isz: 1251 | avg-tsz: 221 | max-tsz: 400 | hrs: 27.82 | thrpt(sec/sec): 178.69 epoch: 10 | nupdates: 34000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:21 | bch(ms): 561.44 | smp(ms): 395.41 | fwd(ms): 51.74 | crit-fwd(ms): 2.73 | bwd(ms): 101.63 | optim(ms): 12.45 | loss: 32.01955 | train-TER: 97.98 | train-WER: 99.65 | dev-clean-loss: 23.04906 | dev-clean-TER: 98.17 | dev-clean-WER: 100.00 | avg-isz: 1245 | avg-tsz: 220 | max-tsz: 338 | hrs: 27.68 | thrpt(sec/sec): 177.47 epoch: 10 | nupdates: 35000 | lr: 0.400000 | 
lrcriterion: 0.000000 | runtime: 00:09:40 | bch(ms): 580.88 | smp(ms): 408.28 | fwd(ms): 53.85 | crit-fwd(ms): 2.90 | bwd(ms): 105.94 | optim(ms): 12.61 | loss: 32.91233 | train-TER: 98.06 | train-WER: 99.72 | dev-clean-loss: 23.04463 | dev-clean-TER: 97.02 | dev-clean-WER: 99.99 | avg-isz: 1300 | avg-tsz: 229 | max-tsz: 331 | hrs: 28.89 | thrpt(sec/sec): 179.05 epoch: 11 | nupdates: 36000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:39 | bch(ms): 579.65 | smp(ms): 409.86 | fwd(ms): 53.07 | crit-fwd(ms): 2.84 | bwd(ms): 103.97 | optim(ms): 12.56 | loss: 32.54920 | train-TER: 97.21 | train-WER: 99.40 | dev-clean-loss: 23.30506 | dev-clean-TER: 90.16 | dev-clean-WER: 99.96 | avg-isz: 1278 | avg-tsz: 224 | max-tsz: 331 | hrs: 28.40 | thrpt(sec/sec): 176.39 epoch: 11 | nupdates: 37000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:26 | bch(ms): 566.95 | smp(ms): 399.40 | fwd(ms): 52.42 | crit-fwd(ms): 2.80 | bwd(ms): 102.47 | optim(ms): 12.46 | loss: 32.21070 | train-TER: 98.01 | train-WER: 99.85 | dev-clean-loss: 23.04511 | dev-clean-TER: 98.21 | dev-clean-WER: 99.74 | avg-isz: 1257 | avg-tsz: 222 | max-tsz: 338 | hrs: 27.94 | thrpt(sec/sec): 177.42 epoch: 11 | nupdates: 38000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:17 | bch(ms): 557.90 | smp(ms): 392.31 | fwd(ms): 51.70 | crit-fwd(ms): 2.80 | bwd(ms): 101.21 | optim(ms): 12.48 | loss: 32.03005 | train-TER: 96.65 | train-WER: 99.72 | dev-clean-loss: 23.06297 | dev-clean-TER: 96.42 | dev-clean-WER: 96.96 | avg-isz: 1244 | avg-tsz: 219 | max-tsz: 335 | hrs: 27.65 | thrpt(sec/sec): 178.40 epoch: 11 | nupdates: 39000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:45 | bch(ms): 585.01 | smp(ms): 413.43 | fwd(ms): 53.64 | crit-fwd(ms): 2.84 | bwd(ms): 105.16 | optim(ms): 12.56 | loss: 32.74588 | train-TER: 98.85 | train-WER: 99.78 | dev-clean-loss: 23.35425 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1291 | avg-tsz: 227 | max-tsz: 340 | hrs: 28.70 | thrpt(sec/sec): 
176.60 epoch: 12 | nupdates: 40000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:21 | bch(ms): 561.32 | smp(ms): 394.20 | fwd(ms): 52.10 | crit-fwd(ms): 2.77 | bwd(ms): 102.29 | optim(ms): 12.54 | loss: 32.20609 | train-TER: 95.48 | train-WER: 99.93 | dev-clean-loss: 23.07049 | dev-clean-TER: 99.08 | dev-clean-WER: 99.97 | avg-isz: 1259 | avg-tsz: 222 | max-tsz: 400 | hrs: 27.98 | thrpt(sec/sec): 179.45 epoch: 12 | nupdates: 41000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:21 | bch(ms): 561.74 | smp(ms): 395.54 | fwd(ms): 51.89 | crit-fwd(ms): 2.71 | bwd(ms): 101.67 | optim(ms): 12.44 | loss: 32.09764 | train-TER: 97.70 | train-WER: 99.68 | dev-clean-loss: 23.07327 | dev-clean-TER: 97.84 | dev-clean-WER: 99.59 | avg-isz: 1249 | avg-tsz: 220 | max-tsz: 340 | hrs: 27.77 | thrpt(sec/sec): 177.96 epoch: 12 | nupdates: 42000 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:09:37 | bch(ms): 577.11 | smp(ms): 407.44 | fwd(ms): 52.96 | crit-fwd(ms): 2.83 | bwd(ms): 103.93 | optim(ms): 12.58 | loss: 32.59089 | train-TER: 97.66 | train-WER: 99.48 | dev-clean-loss: 23.19812 | dev-clean-TER: 94.44 | dev-clean-WER: 98.99 | avg-isz: 1278 | avg-tsz: 225 | max-tsz: 400 | hrs: 28.41 | thrpt(sec/sec): 177.24
comments:
The WER and TER do not change during training on either set. Which parameters should be changed to get a better WER?
Your loss is going down, just very slowly. Try increasing the learning rate, say to 1.0 and 2.0.
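To check a trend like this without reading the raw log by eye, the report lines can be parsed into fields. Below is a minimal Python sketch, assuming the `key: value | key: value` layout shown in the logs above. `parse_log_entries` and `summarize` are hypothetical helper names, not part of wav2letter, and the sample string is an abbreviated copy of two reports from the word-piece run (several timing fields are left out).

```python
import re


def parse_log_entries(text):
    """Split a flattened training log into one dict of string fields per report."""
    # Every report starts with "epoch: N |", so split just before each occurrence.
    chunks = re.split(r"(?=epoch:\s*\d+\s*\|)", text)
    entries = []
    for chunk in chunks:
        fields = {}
        for part in chunk.split("|"):
            # Partition on the first colon only, so "runtime: 00:08:53" stays intact.
            key, sep, value = part.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        # Keep only complete-looking reports.
        if "nupdates" in fields and "loss" in fields:
            entries.append(fields)
    return entries


def summarize(entries, metric="loss"):
    """Return (first, last, minimum) of a numeric metric across reports."""
    values = [float(e[metric]) for e in entries if metric in e]
    return values[0], values[-1], min(values)


# Abbreviated copy of two reports from the log (assumption: same field layout).
sample = (
    "epoch: 1 | nupdates: 1000 | lr: 0.400000 | runtime: 00:08:53 | "
    "loss: 52.64304 | train-TER: 99.39 | train-WER: 99.66 | "
    "dev-clean-loss: 34.79316 | dev-clean-TER: 87.16 | dev-clean-WER: 95.75 "
    "epoch: 2 | nupdates: 5000 | lr: 0.400000 | runtime: 00:09:19 | "
    "loss: 42.80119 | train-TER: 82.50 | train-WER: 93.65 | "
    "dev-clean-loss: 32.47028 | dev-clean-TER: 77.14 | dev-clean-WER: 92.30"
)

entries = parse_log_entries(sample)
for metric in ("loss", "dev-clean-loss", "dev-clean-WER"):
    first, last, lowest = summarize(entries, metric)
    print(f"{metric}: first={first} last={last} min={lowest}")
```

Running the same helpers over a full log file makes it easier to see whether the loss is trending down or only fluctuating between reports.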
I changed the learning rate to 1.0 and 2.0, as you suggested, for both models (character-based and word-piece-based).
Here are the results for the character-based model:
a) free lexicon (character based model) with lr=1.0: epoch: 1 | nupdates: 17578 | lr: 1.000000 | lrcriterion: 0.000000 | runtime: 02:51:19 | bch(ms): 584.79 | smp(ms): 191.77 | fwd(ms): 49.26 | crit-fwd(ms): 2.62 | bwd(ms): 331.69 | optim(ms): 11.50 | loss: 35.86607 | train-TER: 93.83 | train-WER: 100.03 | dev-clean-loss: 23.91013 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | dev-other-loss: 22.55607 | dev-other-TER: 98.95 | dev-other-WER: 100.00 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 336.12 epoch: 2 | nupdates: 35156 | lr: 1.000000 | lrcriterion: 0.000000 | runtime: 02:51:16 | bch(ms): 584.64 | smp(ms): 189.63 | fwd(ms): 49.22 | crit-fwd(ms): 2.63 | bwd(ms): 332.64 | optim(ms): 11.54 | loss: 36.05096 | train-TER: 95.62 | train-WER: 100.26 | dev-clean-loss: 29.09443 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | dev-other-loss: 27.40213 | dev-other-TER: 98.95 | dev-other-WER: 98.43 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 336.21 epoch: 3 | nupdates: 52734 | lr: 1.000000 | lrcriterion: 0.000000 | runtime: 02:51:00 | bch(ms): 583.73 | smp(ms): 190.05 | fwd(ms): 49.21 | crit-fwd(ms): 2.62 | bwd(ms): 332.50 | optim(ms): 11.53 | loss: 36.11816 | train-TER: 94.46 | train-WER: 100.63 | dev-clean-loss: 28.67924 | dev-clean-TER: 92.65 | dev-clean-WER: 100.00 | dev-other-loss: 27.22012 | dev-other-TER: 91.88 | dev-other-WER: 100.00 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 336.74 epoch: 4 | nupdates: 70312 | lr: 1.000000 | lrcriterion: 0.000000 | runtime: 02:52:57 | bch(ms): 590.35 | smp(ms): 193.64 | fwd(ms): 49.24 | crit-fwd(ms): 2.62 | bwd(ms): 334.44 | optim(ms): 11.60 | loss: 36.17968 | train-TER: 95.51 | train-WER: 100.76 | dev-clean-loss: 31.38752 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | dev-other-loss: 29.56150 | dev-other-TER: 100.00 | dev-other-WER: 100.00 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 332.96 
epoch: 5 | nupdates: 87890 | lr: 1.000000 | lrcriterion: 0.000000 | runtime: 02:53:11 | bch(ms): 591.16 | smp(ms): 195.98 | fwd(ms): 49.25 | crit-fwd(ms): 2.63 | bwd(ms): 333.80 | optim(ms): 11.62 | loss: 36.07346 | train-TER: 95.08 | train-WER: 100.46 | dev-clean-loss: 23.28692 | dev-clean-TER: 94.02 | dev-clean-WER: 100.02 | dev-other-loss: 22.03997 | dev-other-TER: 93.24 | dev-other-WER: 100.01 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 332.50 epoch: 6 | nupdates: 105468 | lr: 1.000000 | lrcriterion: 0.000000 | runtime: 02:51:50 | bch(ms): 586.57 | smp(ms): 192.37 | fwd(ms): 49.22 | crit-fwd(ms): 2.62 | bwd(ms): 331.36 | optim(ms): 11.59 | loss: 36.12878 | train-TER: 95.59 | train-WER: 100.34 | dev-clean-loss: 24.88455 | dev-clean-TER: 98.22 | dev-clean-WER: 100.00 | dev-other-loss: 23.53914 | dev-other-TER: 97.97 | dev-other-WER: 100.00 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 335.11 epoch: 7 | nupdates: 123046 | lr: 1.000000 | lrcriterion: 0.000000 | runtime: 02:51:45 | bch(ms): 586.27 | smp(ms): 191.51 | fwd(ms): 49.20 | crit-fwd(ms): 2.62 | bwd(ms): 331.50 | optim(ms): 11.59 | loss: 36.15675 | train-TER: 95.24 | train-WER: 100.19 | dev-clean-loss: 43.52873 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | dev-other-loss: 41.11164 | dev-other-TER: 98.95 | dev-other-WER: 100.00 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 335.28 epoch: 8 | nupdates: 140624 | lr: 1.000000 | lrcriterion: 0.000000 | runtime: 02:53:51 | bch(ms): 593.45 | smp(ms): 195.02 | fwd(ms): 49.26 | crit-fwd(ms): 2.64 | bwd(ms): 334.40 | optim(ms): 11.67 | loss: 36.08983 | train-TER: 95.26 | train-WER: 99.98 | dev-clean-loss: 24.26945 | dev-clean-TER: 98.81 | dev-clean-WER: 100.00 | dev-other-loss: 22.93704 | dev-other-TER: 98.59 | dev-other-WER: 100.00 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 331.22 epoch: 9 | nupdates: 158202 | lr: 1.000000 | 
lrcriterion: 0.000000 | runtime: 02:54:31 | bch(ms): 595.70 | smp(ms): 197.50 | fwd(ms): 49.30 | crit-fwd(ms): 2.64 | bwd(ms): 334.96 | optim(ms): 11.68 | loss: 36.18343 | train-TER: 94.71 | train-WER: 100.55 | dev-clean-loss: 23.13983 | dev-clean-TER: 99.08 | dev-clean-WER: 99.44 | dev-other-loss: 21.87959 | dev-other-TER: 98.95 | dev-other-WER: 99.37 | avg-isz: 1228 | avg-tsz: 216 | max-tsz: 484 | hrs: 959.77 | thrpt(sec/sec): 329.97
b) free lexicon (character based model) with lr=2.0:
epoch: 1 | nupdates: 1000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:20 | bch(ms): 440.67 | smp(ms): 282.28 | fwd(ms): 50.00 | crit-fwd(ms): 2.57 | bwd(ms): 97.14 | optim(ms): 11.08 | loss: 70.31400 | train-TER: 97.93 | train-WER: 100.06 | dev-clean-loss: 114.64851 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1240 | avg-tsz: 218 | max-tsz: 400 | hrs: 27.57 | thrpt(sec/sec): 225.21 epoch: 1 | nupdates: 2000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:24 | bch(ms): 444.31 | smp(ms): 283.62 | fwd(ms): 50.01 | crit-fwd(ms): 2.61 | bwd(ms): 99.34 | optim(ms): 11.17 | loss: 77.84458 | train-TER: 98.14 | train-WER: 99.66 | dev-clean-loss: 35.89710 | dev-clean-TER: 97.31 | dev-clean-WER: 100.00 | avg-isz: 1265 | avg-tsz: 223 | max-tsz: 331 | hrs: 28.13 | thrpt(sec/sec): 227.90 epoch: 1 | nupdates: 3000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:29 | bch(ms): 449.70 | smp(ms): 288.20 | fwd(ms): 50.28 | crit-fwd(ms): 2.66 | bwd(ms): 99.86 | optim(ms): 11.19 | loss: 76.24076 | train-TER: 96.40 | train-WER: 99.93 | dev-clean-loss: 46.20064 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1274 | avg-tsz: 224 | max-tsz: 340 | hrs: 28.31 | thrpt(sec/sec): 226.66 epoch: 2 | nupdates: 4000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:39 | bch(ms): 459.46 | smp(ms): 296.14 | fwd(ms): 51.05 | crit-fwd(ms): 2.75 | bwd(ms): 100.92 | optim(ms): 11.17 | loss: 78.13109 | train-TER: 93.07 | train-WER: 100.74 | dev-clean-loss: 60.21857 | dev-clean-TER: 99.09 | dev-clean-WER: 99.97 | avg-isz: 1286 | avg-tsz: 227 | max-tsz: 340 | hrs: 28.59 | thrpt(sec/sec): 224.01 epoch: 2 | nupdates: 5000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:28 | bch(ms): 448.01 | smp(ms): 289.47 | fwd(ms): 49.39 | crit-fwd(ms): 2.60 | bwd(ms): 97.75 | optim(ms): 11.23 | loss: 78.09015 | train-TER: 95.83 | train-WER: 99.76 | dev-clean-loss: 24.35269 | dev-clean-TER: 85.33 | dev-clean-WER: 100.73 | avg-isz: 1242 | avg-tsz: 220 | 
max-tsz: 338 | hrs: 27.61 | thrpt(sec/sec): 221.84 epoch: 2 | nupdates: 6000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:26 | bch(ms): 446.62 | smp(ms): 285.95 | fwd(ms): 49.96 | crit-fwd(ms): 2.66 | bwd(ms): 99.25 | optim(ms): 11.28 | loss: 76.36320 | train-TER: 94.17 | train-WER: 102.82 | dev-clean-loss: 79.89963 | dev-clean-TER: 99.18 | dev-clean-WER: 100.00 | avg-isz: 1264 | avg-tsz: 222 | max-tsz: 400 | hrs: 28.10 | thrpt(sec/sec): 226.48 epoch: 2 | nupdates: 7000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:39 | bch(ms): 459.92 | smp(ms): 296.19 | fwd(ms): 51.00 | crit-fwd(ms): 2.69 | bwd(ms): 101.06 | optim(ms): 11.50 | loss: 81.61852 | train-TER: 96.22 | train-WER: 99.73 | dev-clean-loss: 50.72168 | dev-clean-TER: 90.99 | dev-clean-WER: 100.00 | avg-isz: 1288 | avg-tsz: 227 | max-tsz: 337 | hrs: 28.63 | thrpt(sec/sec): 224.09 epoch: 3 | nupdates: 8000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:34 | bch(ms): 454.97 | smp(ms): 293.30 | fwd(ms): 50.49 | crit-fwd(ms): 2.71 | bwd(ms): 99.79 | optim(ms): 11.21 | loss: 77.99796 | train-TER: 99.15 | train-WER: 99.94 | dev-clean-loss: 50.79997 | dev-clean-TER: 97.57 | dev-clean-WER: 100.00 | avg-isz: 1271 | avg-tsz: 224 | max-tsz: 340 | hrs: 28.25 | thrpt(sec/sec): 223.51 epoch: 3 | nupdates: 9000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:27 | bch(ms): 447.29 | smp(ms): 288.64 | fwd(ms): 49.34 | crit-fwd(ms): 2.56 | bwd(ms): 97.92 | optim(ms): 11.21 | loss: 78.76849 | train-TER: 98.54 | train-WER: 99.69 | dev-clean-loss: 39.74490 | dev-clean-TER: 94.23 | dev-clean-WER: 100.00 | avg-isz: 1245 | avg-tsz: 219 | max-tsz: 338 | hrs: 27.67 | thrpt(sec/sec): 222.71 epoch: 3 | nupdates: 10000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:34 | bch(ms): 454.40 | smp(ms): 291.17 | fwd(ms): 50.92 | crit-fwd(ms): 2.77 | bwd(ms): 100.81 | optim(ms): 11.29 | loss: 79.76194 | train-TER: 93.96 | train-WER: 99.90 | dev-clean-loss: 55.72755 | dev-clean-TER: 96.48 | 
dev-clean-WER: 100.00 | avg-isz: 1285 | avg-tsz: 226 | max-tsz: 331 | hrs: 28.56 | thrpt(sec/sec): 226.26 epoch: 4 | nupdates: 11000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:38 | bch(ms): 458.57 | smp(ms): 295.87 | fwd(ms): 50.69 | crit-fwd(ms): 2.69 | bwd(ms): 100.57 | optim(ms): 11.26 | loss: 80.32580 | train-TER: 93.77 | train-WER: 100.57 | dev-clean-loss: 38.92238 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1281 | avg-tsz: 226 | max-tsz: 400 | hrs: 28.47 | thrpt(sec/sec): 223.54 epoch: 4 | nupdates: 12000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:28 | bch(ms): 448.63 | smp(ms): 289.61 | fwd(ms): 49.54 | crit-fwd(ms): 2.62 | bwd(ms): 98.05 | optim(ms): 11.25 | loss: 79.87906 | train-TER: 96.20 | train-WER: 99.77 | dev-clean-loss: 43.59809 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1246 | avg-tsz: 220 | max-tsz: 400 | hrs: 27.71 | thrpt(sec/sec): 222.32 epoch: 4 | nupdates: 13000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:32 | bch(ms): 452.28 | smp(ms): 291.91 | fwd(ms): 49.90 | crit-fwd(ms): 2.61 | bwd(ms): 99.04 | optim(ms): 11.25 | loss: 80.79181 | train-TER: 95.95 | train-WER: 100.00 | dev-clean-loss: 25.04029 | dev-clean-TER: 81.98 | dev-clean-WER: 99.79 | avg-isz: 1261 | avg-tsz: 222 | max-tsz: 340 | hrs: 28.03 | thrpt(sec/sec): 223.13 epoch: 4 | nupdates: 14000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:37 | bch(ms): 457.53 | smp(ms): 294.51 | fwd(ms): 50.84 | crit-fwd(ms): 2.70 | bwd(ms): 100.71 | optim(ms): 11.30 | loss: 79.33066 | train-TER: 97.48 | train-WER: 99.96 | dev-clean-loss: 54.05744 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1285 | avg-tsz: 227 | max-tsz: 338 | hrs: 28.56 | thrpt(sec/sec): 224.69 epoch: 5 | nupdates: 15000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:34 | bch(ms): 454.64 | smp(ms): 293.68 | fwd(ms): 50.12 | crit-fwd(ms): 2.68 | bwd(ms): 99.35 | optim(ms): 11.31 | loss: 80.20424 | train-TER: 97.57 | train-WER: 99.93 | 
dev-clean-loss: 80.59340 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1265 | avg-tsz: 223 | max-tsz: 340 | hrs: 28.12 | thrpt(sec/sec): 222.65 epoch: 5 | nupdates: 16000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:28 | bch(ms): 448.42 | smp(ms): 288.97 | fwd(ms): 49.56 | crit-fwd(ms): 2.55 | bwd(ms): 98.49 | optim(ms): 11.24 | loss: 79.17398 | train-TER: 97.07 | train-WER: 99.88 | dev-clean-loss: 24.21229 | dev-clean-TER: 84.64 | dev-clean-WER: 97.89 | avg-isz: 1253 | avg-tsz: 221 | max-tsz: 336 | hrs: 27.86 | thrpt(sec/sec): 223.64 epoch: 5 | nupdates: 17000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:30 | bch(ms): 450.25 | smp(ms): 289.49 | fwd(ms): 50.01 | crit-fwd(ms): 2.64 | bwd(ms): 99.26 | optim(ms): 11.30 | loss: 79.83188 | train-TER: 93.87 | train-WER: 100.38 | dev-clean-loss: 23.34992 | dev-clean-TER: 95.72 | dev-clean-WER: 100.01 | avg-isz: 1267 | avg-tsz: 223 | max-tsz: 400 | hrs: 28.17 | thrpt(sec/sec): 225.21 epoch: 6 | nupdates: 18000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:37 | bch(ms): 457.68 | smp(ms): 294.34 | fwd(ms): 50.94 | crit-fwd(ms): 2.67 | bwd(ms): 100.91 | optim(ms): 11.32 | loss: 79.42595 | train-TER: 96.72 | train-WER: 99.61 | dev-clean-loss: 47.34291 | dev-clean-TER: 99.09 | dev-clean-WER: 99.97 | avg-isz: 1287 | avg-tsz: 226 | max-tsz: 335 | hrs: 28.60 | thrpt(sec/sec): 224.98 epoch: 6 | nupdates: 19000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:28 | bch(ms): 448.36 | smp(ms): 289.62 | fwd(ms): 49.48 | crit-fwd(ms): 2.63 | bwd(ms): 97.90 | optim(ms): 11.18 | loss: 77.76990 | train-TER: 96.37 | train-WER: 99.55 | dev-clean-loss: 23.45342 | dev-clean-TER: 98.21 | dev-clean-WER: 99.74 | avg-isz: 1243 | avg-tsz: 219 | max-tsz: 331 | hrs: 27.64 | thrpt(sec/sec): 221.95 epoch: 6 | nupdates: 20000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:31 | bch(ms): 451.37 | smp(ms): 291.19 | fwd(ms): 49.78 | crit-fwd(ms): 2.62 | bwd(ms): 98.85 | optim(ms): 11.35 | loss: 
80.06844 | train-TER: 94.24 | train-WER: 103.60 | dev-clean-loss: 65.97295 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1260 | avg-tsz: 223 | max-tsz: 400 | hrs: 28.01 | thrpt(sec/sec): 223.41 epoch: 6 | nupdates: 21000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:37 | bch(ms): 458.00 | smp(ms): 294.75 | fwd(ms): 50.84 | crit-fwd(ms): 2.71 | bwd(ms): 100.89 | optim(ms): 11.28 | loss: 79.39963 | train-TER: 90.41 | train-WER: 102.33 | dev-clean-loss: 32.86953 | dev-clean-TER: 91.10 | dev-clean-WER: 100.01 | avg-isz: 1285 | avg-tsz: 226 | max-tsz: 340 | hrs: 28.58 | thrpt(sec/sec): 224.62 epoch: 7 | nupdates: 22000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:36 | bch(ms): 456.52 | smp(ms): 294.19 | fwd(ms): 50.65 | crit-fwd(ms): 2.67 | bwd(ms): 100.27 | optim(ms): 11.26 | loss: 81.09687 | train-TER: 97.64 | train-WER: 99.96 | dev-clean-loss: 31.68626 | dev-clean-TER: 98.21 | dev-clean-WER: 100.00 | avg-isz: 1279 | avg-tsz: 225 | max-tsz: 400 | hrs: 28.43 | thrpt(sec/sec): 224.15 epoch: 7 | nupdates: 23000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:26 | bch(ms): 446.01 | smp(ms): 287.57 | fwd(ms): 49.27 | crit-fwd(ms): 2.56 | bwd(ms): 97.66 | optim(ms): 11.33 | loss: 78.08577 | train-TER: 91.59 | train-WER: 101.94 | dev-clean-loss: 31.55287 | dev-clean-TER: 98.17 | dev-clean-WER: 100.00 | avg-isz: 1243 | avg-tsz: 219 | max-tsz: 336 | hrs: 27.62 | thrpt(sec/sec): 222.98 epoch: 7 | nupdates: 24000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:33 | bch(ms): 453.67 | smp(ms): 292.49 | fwd(ms): 50.20 | crit-fwd(ms): 2.67 | bwd(ms): 99.58 | optim(ms): 11.20 | loss: 78.84847 | train-TER: 93.89 | train-WER: 101.18 | dev-clean-loss: 112.90509 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1270 | avg-tsz: 224 | max-tsz: 338 | hrs: 28.23 | thrpt(sec/sec): 224.02 epoch: 8 | nupdates: 25000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:39 | bch(ms): 459.78 | smp(ms): 295.43 | fwd(ms): 51.19 | 
crit-fwd(ms): 2.70 | bwd(ms): 101.52 | optim(ms): 11.43 | loss: 82.14781 | train-TER: 93.18 | train-WER: 102.36 | dev-clean-loss: 24.40559 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1295 | avg-tsz: 228 | max-tsz: 337 | hrs: 28.80 | thrpt(sec/sec): 225.47 epoch: 8 | nupdates: 26000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:25 | bch(ms): 445.27 | smp(ms): 287.66 | fwd(ms): 49.09 | crit-fwd(ms): 2.56 | bwd(ms): 97.13 | optim(ms): 11.20 | loss: 79.24170 | train-TER: 96.06 | train-WER: 99.91 | dev-clean-loss: 65.59103 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1236 | avg-tsz: 218 | max-tsz: 335 | hrs: 27.47 | thrpt(sec/sec): 222.14 epoch: 8 | nupdates: 27000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:27 | bch(ms): 447.74 | smp(ms): 288.73 | fwd(ms): 49.53 | crit-fwd(ms): 2.58 | bwd(ms): 98.17 | optim(ms): 11.11 | loss: 81.23465 | train-TER: 93.51 | train-WER: 104.37 | dev-clean-loss: 82.15044 | dev-clean-TER: 99.07 | dev-clean-WER: 100.00 | avg-isz: 1251 | avg-tsz: 220 | max-tsz: 336 | hrs: 27.81 | thrpt(sec/sec): 223.61 epoch: 8 | nupdates: 28000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:36 | bch(ms): 456.94 | smp(ms): 294.28 | fwd(ms): 50.66 | crit-fwd(ms): 2.69 | bwd(ms): 100.52 | optim(ms): 11.28 | loss: 82.82159 | train-TER: 93.99 | train-WER: 102.50 | dev-clean-loss: 118.46701 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1284 | avg-tsz: 226 | max-tsz: 400 | hrs: 28.55 | thrpt(sec/sec): 224.92 epoch: 9 | nupdates: 29000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:39 | bch(ms): 459.86 | smp(ms): 296.65 | fwd(ms): 50.90 | crit-fwd(ms): 2.69 | bwd(ms): 100.86 | optim(ms): 11.25 | loss: 80.23945 | train-TER: 95.20 | train-WER: 99.74 | dev-clean-loss: 70.43391 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1286 | avg-tsz: 226 | max-tsz: 338 | hrs: 28.59 | thrpt(sec/sec): 223.79 epoch: 9 | nupdates: 30000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:28 | 
bch(ms): 448.88 | smp(ms): 289.51 | fwd(ms): 49.58 | crit-fwd(ms): 2.64 | bwd(ms): 98.28 | optim(ms): 11.31 | loss: 80.16855 | train-TER: 92.10 | train-WER: 99.41 | dev-clean-loss: 32.49803 | dev-clean-TER: 89.91 | dev-clean-WER: 100.00 | avg-isz: 1252 | avg-tsz: 221 | max-tsz: 331 | hrs: 27.84 | thrpt(sec/sec): 223.24 epoch: 9 | nupdates: 31000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:28 | bch(ms): 448.56 | smp(ms): 289.45 | fwd(ms): 49.60 | crit-fwd(ms): 2.60 | bwd(ms): 98.06 | optim(ms): 11.25 | loss: 79.22703 | train-TER: 93.12 | train-WER: 102.34 | dev-clean-loss: 23.54675 | dev-clean-TER: 98.21 | dev-clean-WER: 99.74 | avg-isz: 1251 | avg-tsz: 221 | max-tsz: 400 | hrs: 27.81 | thrpt(sec/sec): 223.16 epoch: 9 | nupdates: 32000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:42 | bch(ms): 462.52 | smp(ms): 297.67 | fwd(ms): 51.41 | crit-fwd(ms): 2.74 | bwd(ms): 101.90 | optim(ms): 11.34 | loss: 81.35484 | train-TER: 95.64 | train-WER: 101.29 | dev-clean-loss: 106.97868 | dev-clean-TER: 95.67 | dev-clean-WER: 100.00 | avg-isz: 1300 | avg-tsz: 229 | max-tsz: 336 | hrs: 28.90 | thrpt(sec/sec): 224.91 epoch: 10 | nupdates: 33000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:31 | bch(ms): 451.31 | smp(ms): 291.98 | fwd(ms): 49.59 | crit-fwd(ms): 2.59 | bwd(ms): 98.32 | optim(ms): 11.23 | loss: 80.56866 | train-TER: 94.75 | train-WER: 99.49 | dev-clean-loss: 51.62118 | dev-clean-TER: 89.76 | dev-clean-WER: 100.00 | avg-isz: 1251 | avg-tsz: 221 | max-tsz: 400 | hrs: 27.82 | thrpt(sec/sec): 221.88 epoch: 10 | nupdates: 34000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:27 | bch(ms): 447.42 | smp(ms): 289.02 | fwd(ms): 49.38 | crit-fwd(ms): 2.60 | bwd(ms): 97.67 | optim(ms): 11.15 | loss: 79.56533 | train-TER: 93.88 | train-WER: 99.78 | dev-clean-loss: 38.09843 | dev-clean-TER: 97.38 | dev-clean-WER: 100.00 | avg-isz: 1245 | avg-tsz: 220 | max-tsz: 338 | hrs: 27.68 | thrpt(sec/sec): 222.69 epoch: 10 | nupdates: 35000 | lr: 
2.000000 | lrcriterion: 0.000000 | runtime: 00:07:39 | bch(ms): 459.92 | smp(ms): 295.26 | fwd(ms): 51.38 | crit-fwd(ms): 2.79 | bwd(ms): 101.70 | optim(ms): 11.37 | loss: 82.55240 | train-TER: 96.52 | train-WER: 99.60 | dev-clean-loss: 36.27242 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1300 | avg-tsz: 229 | max-tsz: 331 | hrs: 28.89 | thrpt(sec/sec): 226.14 epoch: 11 | nupdates: 36000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:40 | bch(ms): 460.51 | smp(ms): 298.29 | fwd(ms): 50.50 | crit-fwd(ms): 2.64 | bwd(ms): 100.23 | optim(ms): 11.32 | loss: 81.72366 | train-TER: 99.60 | train-WER: 99.52 | dev-clean-loss: 51.76071 | dev-clean-TER: 96.47 | dev-clean-WER: 100.00 | avg-isz: 1278 | avg-tsz: 224 | max-tsz: 331 | hrs: 28.40 | thrpt(sec/sec): 222.02 epoch: 11 | nupdates: 37000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:30 | bch(ms): 450.33 | smp(ms): 290.69 | fwd(ms): 49.61 | crit-fwd(ms): 2.59 | bwd(ms): 98.55 | optim(ms): 11.30 | loss: 78.84121 | train-TER: 96.02 | train-WER: 99.75 | dev-clean-loss: 134.50258 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1257 | avg-tsz: 222 | max-tsz: 338 | hrs: 27.94 | thrpt(sec/sec): 223.36 epoch: 11 | nupdates: 38000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:26 | bch(ms): 446.85 | smp(ms): 288.44 | fwd(ms): 49.41 | crit-fwd(ms): 2.59 | bwd(ms): 97.64 | optim(ms): 11.19 | loss: 79.47098 | train-TER: 96.98 | train-WER: 99.91 | dev-clean-loss: 39.25880 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1244 | avg-tsz: 219 | max-tsz: 335 | hrs: 27.65 | thrpt(sec/sec): 222.73 epoch: 11 | nupdates: 39000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:42 | bch(ms): 462.62 | smp(ms): 298.86 | fwd(ms): 51.06 | crit-fwd(ms): 2.71 | bwd(ms): 101.19 | optim(ms): 11.35 | loss: 79.18715 | train-TER: 97.95 | train-WER: 99.84 | dev-clean-loss: 38.98231 | dev-clean-TER: 95.83 | dev-clean-WER: 100.00 | avg-isz: 1291 | avg-tsz: 227 | max-tsz: 340 | hrs: 28.70 | 
thrpt(sec/sec): 223.32 epoch: 12 | nupdates: 40000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:32 | bch(ms): 452.61 | smp(ms): 292.92 | fwd(ms): 49.72 | crit-fwd(ms): 2.62 | bwd(ms): 98.44 | optim(ms): 11.33 | loss: 78.45905 | train-TER: 88.72 | train-WER: 100.33 | dev-clean-loss: 23.74148 | dev-clean-TER: 87.65 | dev-clean-WER: 100.47 | avg-isz: 1259 | avg-tsz: 222 | max-tsz: 400 | hrs: 27.98 | thrpt(sec/sec): 222.56 epoch: 12 | nupdates: 41000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:30 | bch(ms): 450.52 | smp(ms): 291.20 | fwd(ms): 49.65 | crit-fwd(ms): 2.63 | bwd(ms): 98.22 | optim(ms): 11.28 | loss: 80.94342 | train-TER: 96.27 | train-WER: 100.73 | dev-clean-loss: 28.18108 | dev-clean-TER: 94.86 | dev-clean-WER: 99.29 | avg-isz: 1249 | avg-tsz: 220 | max-tsz: 340 | hrs: 27.77 | thrpt(sec/sec): 221.89 epoch: 12 | nupdates: 42000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:33 | bch(ms): 453.58 | smp(ms): 291.64 | fwd(ms): 50.37 | crit-fwd(ms): 2.67 | bwd(ms): 100.04 | optim(ms): 11.33 | loss: 81.52560 | train-TER: 91.48 | train-WER: 102.58 | dev-clean-loss: 31.26792 | dev-clean-TER: 79.52 | dev-clean-WER: 100.39 | avg-isz: 1278 | avg-tsz: 225 | max-tsz: 400 | hrs: 28.41 | thrpt(sec/sec): 225.51 epoch: 13 | nupdates: 43000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:39 | bch(ms): 459.68 | smp(ms): 296.87 | fwd(ms): 50.77 | crit-fwd(ms): 2.70 | bwd(ms): 100.48 | optim(ms): 11.39 | loss: 80.26540 | train-TER: 96.45 | train-WER: 99.89 | dev-clean-loss: 24.28372 | dev-clean-TER: 88.70 | dev-clean-WER: 100.15 | avg-isz: 1282 | avg-tsz: 226 | max-tsz: 337 | hrs: 28.50 | thrpt(sec/sec): 223.23 epoch: 13 | nupdates: 44000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:27 | bch(ms): 447.62 | smp(ms): 288.32 | fwd(ms): 49.75 | crit-fwd(ms): 2.63 | bwd(ms): 98.15 | optim(ms): 11.21 | loss: 80.94006 | train-TER: 97.76 | train-WER: 99.77 | dev-clean-loss: 28.38509 | dev-clean-TER: 91.82 | dev-clean-WER: 100.13 | 
avg-isz: 1250 | avg-tsz: 220 | max-tsz: 340 | hrs: 27.78 | thrpt(sec/sec): 223.41 epoch: 13 | nupdates: 45000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:28 | bch(ms): 448.40 | smp(ms): 288.48 | fwd(ms): 49.73 | crit-fwd(ms): 2.59 | bwd(ms): 98.86 | optim(ms): 11.16 | loss: 80.46395 | train-TER: 97.61 | train-WER: 99.90 | dev-clean-loss: 58.06954 | dev-clean-TER: 96.48 | dev-clean-WER: 100.00 | avg-isz: 1256 | avg-tsz: 222 | max-tsz: 338 | hrs: 27.93 | thrpt(sec/sec): 224.22 epoch: 13 | nupdates: 46000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:38 | bch(ms): 458.51 | smp(ms): 295.67 | fwd(ms): 50.85 | crit-fwd(ms): 2.66 | bwd(ms): 100.61 | optim(ms): 11.21 | loss: 80.85963 | train-TER: 98.58 | train-WER: 99.96 | dev-clean-loss: 91.65457 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1286 | avg-tsz: 227 | max-tsz: 400 | hrs: 28.59 | thrpt(sec/sec): 224.44 epoch: 14 | nupdates: 47000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:41 | bch(ms): 461.43 | smp(ms): 298.13 | fwd(ms): 50.90 | crit-fwd(ms): 2.74 | bwd(ms): 100.87 | optim(ms): 11.35 | loss: 81.03719 | train-TER: 91.38 | train-WER: 101.37 | dev-clean-loss: 44.32747 | dev-clean-TER: 98.31 | dev-clean-WER: 100.00 | avg-isz: 1286 | avg-tsz: 226 | max-tsz: 329 | hrs: 28.60 | thrpt(sec/sec): 223.10 epoch: 14 | nupdates: 48000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:30 | bch(ms): 450.90 | smp(ms): 291.36 | fwd(ms): 49.70 | crit-fwd(ms): 2.61 | bwd(ms): 98.35 | optim(ms): 11.30 | loss: 78.07580 | train-TER: 94.12 | train-WER: 100.93 | dev-clean-loss: 27.15199 | dev-clean-TER: 96.60 | dev-clean-WER: 100.00 | avg-isz: 1253 | avg-tsz: 221 | max-tsz: 337 | hrs: 27.86 | thrpt(sec/sec): 222.40 epoch: 14 | nupdates: 49000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:33 | bch(ms): 453.55 | smp(ms): 293.60 | fwd(ms): 49.78 | crit-fwd(ms): 2.63 | bwd(ms): 98.67 | optim(ms): 11.29 | loss: 78.48264 | train-TER: 97.51 | train-WER: 99.67 | dev-clean-loss: 
33.38644 | dev-clean-TER: 90.34 | dev-clean-WER: 100.00 | avg-isz: 1258 | avg-tsz: 221 | max-tsz: 340 | hrs: 27.98 | thrpt(sec/sec): 222.06 epoch: 15 | nupdates: 50000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:43 | bch(ms): 463.40 | smp(ms): 299.08 | fwd(ms): 51.33 | crit-fwd(ms): 2.83 | bwd(ms): 101.54 | optim(ms): 11.28 | loss: 81.23402 | train-TER: 96.01 | train-WER: 100.00 | dev-clean-loss: 71.76475 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1297 | avg-tsz: 229 | max-tsz: 400 | hrs: 28.84 | thrpt(sec/sec): 224.01 epoch: 15 | nupdates: 51000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:31 | bch(ms): 451.48 | smp(ms): 292.16 | fwd(ms): 49.70 | crit-fwd(ms): 2.63 | bwd(ms): 98.26 | optim(ms): 11.17 | loss: 81.94340 | train-TER: 92.75 | train-WER: 100.43 | dev-clean-loss: 108.17205 | dev-clean-TER: 99.15 | dev-clean-WER: 100.00 | avg-isz: 1252 | avg-tsz: 220 | max-tsz: 400 | hrs: 27.84 | thrpt(sec/sec): 222.02 epoch: 15 | nupdates: 52000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:30 | bch(ms): 450.93 | smp(ms): 291.84 | fwd(ms): 49.60 | crit-fwd(ms): 2.58 | bwd(ms): 98.18 | optim(ms): 11.12 | loss: 78.77276 | train-TER: 94.00 | train-WER: 101.38 | dev-clean-loss: 96.86416 | dev-clean-TER: 99.07 | dev-clean-WER: 100.00 | avg-isz: 1252 | avg-tsz: 221 | max-tsz: 335 | hrs: 27.83 | thrpt(sec/sec): 222.17 epoch: 15 | nupdates: 53000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:35 | bch(ms): 455.90 | smp(ms): 293.43 | fwd(ms): 50.61 | crit-fwd(ms): 2.71 | bwd(ms): 100.28 | optim(ms): 11.39 | loss: 79.97819 | train-TER: 95.20 | train-WER: 99.71 | dev-clean-loss: 41.27049 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | avg-isz: 1278 | avg-tsz: 225 | max-tsz: 336 | hrs: 28.42 | thrpt(sec/sec): 224.39 epoch: 16 | nupdates: 54000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:37 | bch(ms): 457.72 | smp(ms): 295.34 | fwd(ms): 50.73 | crit-fwd(ms): 2.72 | bwd(ms): 100.18 | optim(ms): 11.30 | loss: 
79.15926 | train-TER: 99.30 | train-WER: 99.94 | dev-clean-loss: 41.66816 | dev-clean-TER: 92.68 | dev-clean-WER: 100.25 | avg-isz: 1277 | avg-tsz: 225 | max-tsz: 400 | hrs: 28.40 | thrpt(sec/sec): 223.35 epoch: 16 | nupdates: 55000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:26 | bch(ms): 446.81 | smp(ms): 289.72 | fwd(ms): 48.95 | crit-fwd(ms): 2.55 | bwd(ms): 96.75 | optim(ms): 11.21 | loss: 78.14921 | train-TER: 95.05 | train-WER: 101.92 | dev-clean-loss: 23.63302 | dev-clean-TER: 92.14 | dev-clean-WER: 98.65 | avg-isz: 1232 | avg-tsz: 217 | max-tsz: 340 | hrs: 27.39 | thrpt(sec/sec): 220.71 epoch: 16 | nupdates: 56000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:33 | bch(ms): 453.83 | smp(ms): 293.35 | fwd(ms): 49.96 | crit-fwd(ms): 2.62 | bwd(ms): 99.09 | optim(ms): 11.24 | loss: 80.67174 | train-TER: 97.23 | train-WER: 99.68 | dev-clean-loss: 41.78508 | dev-clean-TER: 94.17 | dev-clean-WER: 100.00 | avg-isz: 1263 | avg-tsz: 223 | max-tsz: 337 | hrs: 28.08 | thrpt(sec/sec): 222.76 epoch: 16 | nupdates: 57000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:44 | bch(ms): 464.56 | smp(ms): 298.97 | fwd(ms): 51.71 | crit-fwd(ms): 2.76 | bwd(ms): 102.44 | optim(ms): 11.27 | loss: 81.06220 | train-TER: 97.89 | train-WER: 99.60 | dev-clean-loss: 30.23222 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1308 | avg-tsz: 230 | max-tsz: 337 | hrs: 29.08 | thrpt(sec/sec): 225.38 epoch: 17 | nupdates: 58000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:30 | bch(ms): 450.90 | smp(ms): 291.74 | fwd(ms): 49.53 | crit-fwd(ms): 2.58 | bwd(ms): 98.26 | optim(ms): 11.20 | loss: 78.58495 | train-TER: 92.51 | train-WER: 99.83 | dev-clean-loss: 130.31794 | dev-clean-TER: 97.30 | dev-clean-WER: 97.95 | avg-isz: 1254 | avg-tsz: 221 | max-tsz: 400 | hrs: 27.87 | thrpt(sec/sec): 222.55 epoch: 17 | nupdates: 59000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:30 | bch(ms): 451.00 | smp(ms): 290.76 | fwd(ms): 49.92 | 
crit-fwd(ms): 2.61 | bwd(ms): 98.87 | optim(ms): 11.26 | loss: 80.08238 | train-TER: 86.26 | train-WER: 101.69 | dev-clean-loss: 54.10762 | dev-clean-TER: 99.10 | dev-clean-WER: 99.17 | avg-isz: 1261 | avg-tsz: 223 | max-tsz: 340 | hrs: 28.03 | thrpt(sec/sec): 223.77 epoch: 17 | nupdates: 60000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:35 | bch(ms): 455.45 | smp(ms): 293.48 | fwd(ms): 50.52 | crit-fwd(ms): 2.68 | bwd(ms): 99.92 | optim(ms): 11.31 | loss: 80.26236 | train-TER: 96.46 | train-WER: 101.03 | dev-clean-loss: 38.42426 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1275 | avg-tsz: 224 | max-tsz: 331 | hrs: 28.34 | thrpt(sec/sec): 223.99 epoch: 18 | nupdates: 61000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:38 | bch(ms): 458.51 | smp(ms): 296.17 | fwd(ms): 50.57 | crit-fwd(ms): 2.65 | bwd(ms): 100.14 | optim(ms): 11.43 | loss: 80.03189 | train-TER: 93.94 | train-WER: 99.91 | dev-clean-loss: 63.16160 | dev-clean-TER: 99.10 | dev-clean-WER: 99.17 | avg-isz: 1277 | avg-tsz: 225 | max-tsz: 337 | hrs: 28.39 | thrpt(sec/sec): 222.90 epoch: 18 | nupdates: 62000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:23 | bch(ms): 443.58 | smp(ms): 288.09 | fwd(ms): 48.37 | crit-fwd(ms): 2.52 | bwd(ms): 95.80 | optim(ms): 11.14 | loss: 80.08102 | train-TER: 94.12 | train-WER: 99.95 | dev-clean-loss: 57.71799 | dev-clean-TER: 99.10 | dev-clean-WER: 100.00 | avg-isz: 1219 | avg-tsz: 215 | max-tsz: 338 | hrs: 27.10 | thrpt(sec/sec): 219.97 epoch: 18 | nupdates: 63000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:35 | bch(ms): 455.20 | smp(ms): 293.54 | fwd(ms): 50.31 | crit-fwd(ms): 2.68 | bwd(ms): 99.84 | optim(ms): 11.31 | loss: 81.25633 | train-TER: 94.87 | train-WER: 99.62 | dev-clean-loss: 40.56295 | dev-clean-TER: 97.29 | dev-clean-WER: 100.00 | avg-isz: 1271 | avg-tsz: 223 | max-tsz: 336 | hrs: 28.26 | thrpt(sec/sec): 223.47 epoch: 18 | nupdates: 64000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:45 | 
bch(ms): 465.98 | smp(ms): 300.33 | fwd(ms): 51.71 | crit-fwd(ms): 2.81 | bwd(ms): 102.27 | optim(ms): 11.40 | loss: 81.18731 | train-TER: 94.87 | train-WER: 101.28 | dev-clean-loss: 24.49495 | dev-clean-TER: 97.45 | dev-clean-WER: 97.74 | avg-isz: 1308 | avg-tsz: 231 | max-tsz: 400 | hrs: 29.07 | thrpt(sec/sec): 224.60 epoch: 19 | nupdates: 65000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:07:31 | bch(ms): 451.38 | smp(ms): 292.33 | fwd(ms): 49.61 | crit-fwd(ms): 2.57 | bwd(ms): 97.98 | optim(ms): 11.30 | loss: 81.98406 | train-TER: 97.97 | train-WER: 99.87 | dev-clean-loss: 28.72688 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1248 | avg-tsz: 220 | max-tsz: 337 | hrs: 27.75 | thrpt(sec/sec): 221.28 epoch: 19 | nupdates: 66000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:32 | bch(ms): 392.66 | smp(ms): 236.87 | fwd(ms): 48.65 | crit-fwd(ms): 2.49 | bwd(ms): 96.63 | optim(ms): 10.34 | loss: 79.09839 | train-TER: 93.48 | train-WER: 102.27 | dev-clean-loss: 69.01516 | dev-clean-TER: 99.08 | dev-clean-WER: 99.99 | avg-isz: 1236 | avg-tsz: 218 | max-tsz: 400 | hrs: 27.48 | thrpt(sec/sec): 251.93 epoch: 19 | nupdates: 67000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:16 | bch(ms): 376.50 | smp(ms): 215.90 | fwd(ms): 50.34 | crit-fwd(ms): 2.59 | bwd(ms): 100.10 | optim(ms): 9.99 | loss: 81.07499 | train-TER: 95.08 | train-WER: 100.00 | dev-clean-loss: 32.69910 | dev-clean-TER: 96.23 | dev-clean-WER: 100.01 | avg-isz: 1284 | avg-tsz: 226 | max-tsz: 338 | hrs: 28.55 | thrpt(sec/sec): 273.01 epoch: 20 | nupdates: 68000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:21 | bch(ms): 381.67 | smp(ms): 219.09 | fwd(ms): 51.05 | crit-fwd(ms): 2.63 | bwd(ms): 101.35 | optim(ms): 10.02 | loss: 80.57166 | train-TER: 99.01 | train-WER: 100.00 | dev-clean-loss: 36.19828 | dev-clean-TER: 97.40 | dev-clean-WER: 100.00 | avg-isz: 1300 | avg-tsz: 229 | max-tsz: 340 | hrs: 28.89 | thrpt(sec/sec): 272.53 epoch: 20 | nupdates: 69000 | lr: 
2.000000 | lrcriterion: 0.000000 | runtime: 00:06:12 | bch(ms): 372.60 | smp(ms): 216.26 | fwd(ms): 48.95 | crit-fwd(ms): 2.48 | bwd(ms): 97.26 | optim(ms): 9.98 | loss: 77.83804 | train-TER: 97.90 | train-WER: 99.86 | dev-clean-loss: 47.56911 | dev-clean-TER: 98.27 | dev-clean-WER: 100.00 | avg-isz: 1247 | avg-tsz: 220 | max-tsz: 337 | hrs: 27.72 | thrpt(sec/sec): 267.82 epoch: 20 | nupdates: 70000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:12 | bch(ms): 372.70 | smp(ms): 215.66 | fwd(ms): 49.16 | crit-fwd(ms): 2.50 | bwd(ms): 97.73 | optim(ms): 10.00 | loss: 79.01311 | train-TER: 93.63 | train-WER: 102.74 | dev-clean-loss: 30.23248 | dev-clean-TER: 97.36 | dev-clean-WER: 98.94 | avg-isz: 1253 | avg-tsz: 221 | max-tsz: 340 | hrs: 27.86 | thrpt(sec/sec): 269.13 epoch: 20 | nupdates: 71000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:16 | bch(ms): 376.66 | smp(ms): 214.59 | fwd(ms): 50.85 | crit-fwd(ms): 2.62 | bwd(ms): 101.07 | optim(ms): 9.99 | loss: 80.81136 | train-TER: 97.04 | train-WER: 99.97 | dev-clean-loss: 25.25624 | dev-clean-TER: 89.18 | dev-clean-WER: 100.01 | avg-isz: 1300 | avg-tsz: 229 | max-tsz: 400 | hrs: 28.89 | thrpt(sec/sec): 276.11 epoch: 21 | nupdates: 72000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:19 | bch(ms): 379.85 | smp(ms): 220.57 | fwd(ms): 49.97 | crit-fwd(ms): 2.54 | bwd(ms): 99.14 | optim(ms): 10.02 | loss: 79.80162 | train-TER: 94.19 | train-WER: 98.95 | dev-clean-loss: 25.50102 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1271 | avg-tsz: 224 | max-tsz: 340 | hrs: 28.26 | thrpt(sec/sec): 267.86 epoch: 21 | nupdates: 73000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:11 | bch(ms): 371.71 | smp(ms): 215.62 | fwd(ms): 48.86 | crit-fwd(ms): 2.48 | bwd(ms): 97.08 | optim(ms): 10.00 | loss: 79.21552 | train-TER: 94.78 | train-WER: 99.81 | dev-clean-loss: 45.37678 | dev-clean-TER: 95.38 | dev-clean-WER: 100.00 | avg-isz: 1246 | avg-tsz: 220 | max-tsz: 400 | hrs: 27.71 | 
thrpt(sec/sec): 268.36 epoch: 21 | nupdates: 74000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:16 | bch(ms): 376.05 | smp(ms): 217.11 | fwd(ms): 49.82 | crit-fwd(ms): 2.54 | bwd(ms): 98.96 | optim(ms): 9.99 | loss: 80.06317 | train-TER: 94.53 | train-WER: 99.77 | dev-clean-loss: 63.00018 | dev-clean-TER: 95.84 | dev-clean-WER: 100.00 | avg-isz: 1271 | avg-tsz: 224 | max-tsz: 323 | hrs: 28.25 | thrpt(sec/sec): 270.40 epoch: 22 | nupdates: 75000 | lr: 2.000000 | lrcriterion: 0.000000 | runtime: 00:06:20 | bch(ms): 380.75 | smp(ms): 220.07 | fwd(ms): 50.40 | crit-fwd(ms): 2.60 | bwd(ms): 100.12 | optim(ms): 10.00 | loss: 80.27049 | train-TER: 97.41 | train-WER: 100.13 | dev-clean-loss: 77.67307 | dev-clean-TER: 99.09 | dev-clean-WER: 98.37 | avg-isz: 1286 | avg-tsz: 226 | max-tsz: 338 | hrs: 28.58 | thrpt(sec/sec): 270.21
As the results show, the WER is stuck at around 100% and the loss decreases only slowly.
I have another question regarding the LR: is it possible to use an LR larger than 1.0? I thought it should be in the range between 0.0 and 1.0.
I also tried changing the momentum to 0.8, but the loss became NaN. https://github.com/facebookresearch/wav2letter/issues/719#issuecomment-682351751 I am wondering why the configuration file and model from the recipes do not give results as good as those shown in the paper.
Thanks in advance; I look forward to your reply.
I have the same issue with the streaming convnets model as described in this comment: https://github.com/facebookresearch/wav2letter/issues/555#issuecomment-708277031
First, there is no restriction that lr should be in [0, 1]. Second, with lr = 1 and lr = 2 the loss in your log doesn't actually decrease, which means that these values of lr are too high.
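A quick way to check the loss trend in a log like the one above is to parse the report lines. This is a minimal sketch in plain Python, not part of wav2letter: the helper names `parse_report` and `loss_trend` are made up for illustration, it assumes one report per line, and the sample lines are abbreviated copies (a few fields only) of the lr=2.0 reports at 50000 to 53000 updates.

```python
# Abbreviated report lines taken from the lr=2.0 log (most fields dropped).
SAMPLE_LOG = """\
epoch: 15 | nupdates: 50000 | lr: 2.000000 | loss: 81.23402 | dev-clean-WER: 100.00
epoch: 15 | nupdates: 51000 | lr: 2.000000 | loss: 81.94340 | dev-clean-WER: 100.00
epoch: 15 | nupdates: 52000 | lr: 2.000000 | loss: 78.77276 | dev-clean-WER: 100.00
epoch: 15 | nupdates: 53000 | lr: 2.000000 | loss: 79.97819 | dev-clean-WER: 100.00
"""


def parse_report(line):
    """Split one 'key: value | key: value' report line into a dict of strings."""
    fields = {}
    for part in line.split("|"):
        # Split on the first colon only, so values like '00:07:43' stay intact.
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def loss_trend(log_text):
    """Return (nupdates, loss, dev-clean-WER) tuples, one per report line."""
    rows = [parse_report(line) for line in log_text.splitlines() if line.strip()]
    return [
        (int(r["nupdates"]), float(r["loss"]), float(r["dev-clean-WER"]))
        for r in rows
    ]


trend = loss_trend(SAMPLE_LOG)
first_loss, last_loss = trend[0][1], trend[-1][1]
print(f"loss change over {len(trend)} reports: {last_loss - first_loss:+.5f}")
for nupdates, loss, wer in trend:
    print(nupdates, loss, wer)
```

On the full log, the same loop makes it easy to see that the loss hovers around 80 while dev-clean-WER stays near 100%.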
Why does your latest log have a different number of updates in each epoch? Can you tell us the total batch size?
Recipes are provided for the full Librispeech, so the best settings can be different for the 100h or other sets (you need to tune the optimization itself, and probably also the dropout of the model). Also, we trained with a specific total batch size, which can also affect the optimization procedure.
cc @vineelpratap on optimization for streaming convnet arch.
Thanks for your reply. In the second log, "free lexicon (character based model) with lr=2.0", I set reportiters = 1000 to get faster feedback, so that I can take a quick look at the loss and WER and decide whether I am on the right track to minimize the loss. I am using two GPUs (RTX 2080 Ti) for training with a batch size of 8 (Librispeech: train-clean-100).
In the first log, "free lexicon (character based model) with lr=1.0", I used the whole dataset (Librispeech: train 960hr) with two GPUs (RTX 2080 Ti) and BS=8.
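For reference, the total batch size asked about above can be worked out as follows. This is only a sketch under stated assumptions: that `--batchsize` is applied per process and that one process runs per GPU, neither of which is confirmed in this thread. The function names and the utterance count are hypothetical placeholders, not values from the recipe.

```python
import math


def total_batch_size(per_process_batch, num_processes):
    """Total batch size, assuming the batch size flag is applied per process."""
    return per_process_batch * num_processes


def updates_per_epoch(num_utterances, total_batch):
    """Updates per epoch, assuming each utterance is seen once per epoch
    and a final partial batch still counts as one update."""
    return math.ceil(num_utterances / total_batch)


# Setup described above: batch size 8 on two GPUs.
total = total_batch_size(per_process_batch=8, num_processes=2)
print("total batch size:", total)

# Placeholder dataset size for illustration only, not the real list length.
HYPOTHETICAL_UTTERANCES = 10_000
print("updates per epoch:", updates_per_epoch(HYPOTHETICAL_UTTERANCES, total))
```

If reportiters does not divide the number of updates per epoch, the number of report lines per epoch will vary from one epoch to the next, as it does in the log.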
I've had the same issue with the streaming_convnets word-piece model applied to LibriSpeech 100h, which I've managed to solve by tweaking the SpecAugment parameters in the architecture file.
First, I tried removing SpecAugment by deleting the "SAUG 80 27 2 100 1.0 2" line in "am_500ms_future_context.arch". With this change, the loss started decreasing and the dev-clean WER went down to 26.35% by epoch 25.
Next, I re-enabled SpecAugment with a milder configuration that uses fewer and smaller time and frequency masks: "SAUG 20 15 1 35 0.2 1". With this setting, the dev-clean WER dropped to 25.02% by epoch 25.
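The two edits to the arch file can be sketched as a small Python helper. Treat this as an illustration rather than a tool from the recipe: `replace_saug` and `describe_saug` are made-up names, the surrounding arch lines are placeholders, and the field names in `SAUG_FIELDS` are my reading of the six SAUG numbers (time warp, frequency mask size and count, time mask size, time mask ratio, time mask count), which should be checked against the wav2letter documentation.

```python
# Assumed meaning of the six SAUG numbers, in order (not confirmed here).
SAUG_FIELDS = (
    "time_warp_w", "freq_mask_f", "n_freq_mask",
    "time_mask_t", "time_mask_p", "n_time_mask",
)

ORIGINAL_SAUG = "SAUG 80 27 2 100 1.0 2"  # line in the recipe arch file
MILD_SAUG = "SAUG 20 15 1 35 0.2 1"       # milder setting from this comment


def describe_saug(line):
    """Map the numbers of a SAUG line to the assumed field names."""
    values = [float(v) for v in line.split()[1:]]
    return dict(zip(SAUG_FIELDS, values))


def replace_saug(arch_text, new_saug=None):
    """Replace every SAUG line with new_saug, or drop it if new_saug is None."""
    out = []
    for line in arch_text.splitlines():
        if line.split()[:1] == ["SAUG"]:
            if new_saug is not None:
                out.append(new_saug)
        else:
            out.append(line)
    return "\n".join(out)


# Placeholder arch content; the real file has actual layer definitions.
arch = "\n".join(["<layers before>", ORIGINAL_SAUG, "<layers after>"])

print(replace_saug(arch, None))       # SpecAugment removed
print(replace_saug(arch, MILD_SAUG))  # milder SpecAugment
print(describe_saug(MILD_SAUG))
```

In practice you would read "am_500ms_future_context.arch" from disk, apply `replace_saug`, and write the result to a new arch file passed via --arch.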
My guess is that milder versions of SpecAugment could help achieve faster convergence on LibriSpeech 100h. The policy used in the recipe model, known as LibriSpeech double (LD), was originally optimized for LibriSpeech 960h and might be too aggressive for LibriSpeech 100h.
Anyway, other optimizations could be possible, but I hope this one helps you to start tweaking around! I'm attaching a plot of the WER during the first 25 epochs for three SpecAugment configurations: the original one (LD), the milder version (LibriSpeech Mild = LM) and no SpecAugment.
