# These results were obtained around svn revision 23 (just prior to # tagging kaldi-1.0). # Note: these results will vary somewhat from OS to OS, because # some algorithms call rand(). First, comparing with published results feb89 oct89 feb91 sep92 avg 2.77 4.02 3.30 6.29 4.10 % from my ICASSP'99 paper on Frame Discrimination (ML baseline) 3.20 4.10 2.86 6.06 4.06 % from decode_tri2c (which is triphone + CMN) exp/decode_mono/wer:Average WER is 14.234421 (1784 / 12533) # Monophone system, subset exp/decode_tri1/wer:Average WER is 4.420330 (554 / 12533) # First triphone pass exp/decode_tri1_fmllr/wer:Average WER is 3.837868 (481 / 12533) # + fMLLR exp/decode_tri1_regtree_fmllr/wer:Average WER is 3.789994 (475 / 12533) # + regression-tree exp/decode_tri2a/wer:Average WER is 3.973510 (498 / 12533) # Second triphone pass exp/decode_tri2a_fmllr/wer:Average WER is 3.590521 (450 / 12533) # + fMLLR exp/decode_tri2a_fmllr_utt/wer:Average WER is 3.933615 (493 / 12533) # [ fMLLR per utterance ] exp/decode_tri2a_dfmllr/wer:Average WER is 3.861805 (484 / 12533) # + diagonal fMLLR exp/decode_tri2a_dfmllr_utt/wer:Average WER is 3.933615 (493 / 12533) # [ diagonal fMLLR per utterance] exp/decode_tri2a_dfmllr_fmllr/wer:Average WER is 3.622437 (454 / 12533) # diagonal fMLLR, then estimate fMLLR and re-decode exp/decode_tri2b/wer:Average WER is 3.143701 (394 / 12533) # Exponential transform exp/decode_tri2b_fmllr/wer:Average WER is 3.055932 (383 / 12533) # +fMLLR exp/decode_tri2b_utt/wer:Average WER is 3.295300 (413 / 12533) # [adapt per-utt] exp/decode_tri2c/wer:Average WER is 3.957552 (496 / 12533) # Cepstral mean subtraction (per-spk) exp/decode_tri2d/wer:Average WER is 4.316604 (541 / 12533) # MLLT (= global STC) exp/decode_tri2e/wer:Average WER is 4.659698 (584 / 12533) # splice-9-frames + LDA features exp/decode_tri2f/wer:Average WER is 3.885742 (487 / 12533) # splice-9-frames + LDA + MLLT exp/decode_tri2g/wer:Average WER is 3.303279 (414 / 12533) # Linear VTLN exp/decode_tri2g_diag/wer:Average WER is 3.135722 (393 / 12533) # Linear VTLN; diagonal adapt in test exp/decode_tri2g_diag_fmllr/wer:Average WER is 3.063911 (384 / 12533) # as above but then est. fMLLR (another decoding pass) exp/decode_tri2g_diag_utt/wer:Average WER is 3.399027 (426 / 12533) exp/decode_tri2g_vtln/wer:Average WER is 3.239448 (406 / 12533) # Use warp factors -> feature-level VTLN + offset estimation exp/decode_tri2g_vtln_diag/wer:Average WER is 3.127743 (392 / 12533) # feature-level VTLN + diag fMLLR exp/decode_tri2g_vtln_diag_utt/wer:Average WER is 3.407006 (427 / 12533) # as above, per utt. exp/decode_tri2g_vtln_nofmllr/wer:Average WER is 3.694247 (463 / 12533) # feature-level VTLN but no fMLLR exp/decode_tri2h/wer:Average WER is 4.252773 (533 / 12533) # Splice-9-frames + HLDA exp/decode_tri2i/wer:Average WER is 3.981489 (499 / 12533) # Triple-deltas + HLDA exp/decode_tri2j/wer:Average WER is 3.853826 (483 / 12533) # Triple-deltas + LDA + MLLT exp/decode_tri2k/wer:Average WER is 3.071890 (385 / 12533) # LDA + exponential transform exp/decode_tri2k_utt/wer:Average WER is 3.039974 (381 / 12533) # per-utterance adaptation exp/decode_tri2k_fmllr/wer:Average WER is 2.641028 (331 / 12533) # fMLLR (per-spk) exp/decode_tri2k_regtree_fmllr/wer:Average WER is 2.688901 (337 / 12533) # +regression-tree exp/decode_tri2l/wer:Average WER is 2.704859 (339 / 12533) # Splice-9-frames + LDA + MLLT + SAT (fMLLR in test) exp/decode_tri2l_utt/wer:Average WER is 4.930982 (618 / 12533) # [ as decode_tri2l but per-utt in test. ] # sgmma is SGMM without speaker vectors. exp/decode_sgmma/wer:Average WER is 3.319237 (416 / 12533) exp/decode_sgmma_fmllr/wer:Average WER is 2.934308 (289 / 9849) exp/decode_sgmma_fmllr_utt/wer:Average WER is 3.303279 (414 / 12533) exp/decode_sgmma_fmllrbasis_utt/wer:Average WER is 3.191574 (400 / 12533) # sgmmb is SGMM with speaker vectors. exp/decode_sgmmb/wer:Average WER is 2.760712 (346 / 12533) exp/decode_sgmmb_fmllr/wer:Average WER is 2.585175 (324 / 12533) exp/decode_sgmmb_utt/wer:Average WER is 2.808585 (352 / 12533) # sgmmc is like sgmmb but with gender dependency exp/decode_sgmmc/wer:Average WER is 2.696880 (338 / 12533) exp/decode_sgmmc_fmllr/wer:Average WER is 2.457512 (308 / 12533) # "norm" is normalizing weights per gender.. exp/decode_sgmmc_norm/wer:Average WER is 2.696880 (338 / 12533) exp/decode_sgmmc_fmllr_norm/wer:Average WER is 2.425596 (304 / 12533) # sgmmd is like sgmmb but with LDA+MLLT features. exp/decode_sgmmd/wer:Average WER is 2.449533 (307 / 12533) exp/decode_sgmmd_fmllr/wer:Average WER is 2.305912 (289 / 12533) #### Note: stuff below this line may be out of date / not computed # with most recent version of toolkit. # note: when changing (phn,spk) dimensions from (40,39) -> (30,30), # WER in decode_sgmmb/ went from 2.62 to 2.92 # when changing from (40,39) -> (50,39) [40->50 on iter 3], # WER in decode_sgmmb/ went from 2.62 to 2.66 [and test likelihood # got worse]. # sgmmc is as sgmmb but with gender-dependent UBM, with 250 # Gaussians per gender instead of 400 Gaussians. Note: use # gender info in test. exp/decode_sgmmc/wer:Average WER is 2.784649 (349 / 12533) exp/decode_sgmmc_fmllr/wer:Average WER is 2.688901 (337 / 12533) # notes on timing of training with ATLAS vs. MKL: # all below are with -O0. # tested time taken with "time steps/train_tri2a.sh" # [on svatava]: with "default" compile (which is 32-bit+ATLAS) # real 14m19.458s # user 15m38.695s # 64-bit+ATLAS: # # 64-bit+MKL: # real 12m45.664s # user 13m53.770s # + removed -O0 -DKALDI_PARANOID: # [made almost no difference to training]: # real 12m31.829s # user 13m48.967s # sys 0m28.146s # 64-bit but ATLAS instead of MKL # [and with default options, which includes: -O0 -DKALI_PARANOID]. #real 10m50.088s #user 12m6.914s #sys 0m17.419s # Did this again: #real 10m17.891s #user 11m28.695s #sys 0m14.087s # But when I tested "fmllr-diag-gmm-test", all after removing # the options -O0 -DKALDI_PARANOID, the ordering of timing was # different: # 64-bit+ATLAS was 0.361s # 64-bit+MKL was 0.307s # 32-bit+ATLAS was 0.571s # Testing /homes/eva/q/qpovey/sourceforge/kaldi/trunk/src/gmm/am-diag-gmm-test: # 64-bit+ATLAS was 0.171s # 32-bit+ATLAS was 0.205s # 64-bit+MKL was 0.291s