RESULTS 3.11 KB
Newer Older
Dan Povey's avatar
Dan Povey committed
1 2 3 4 5 6

# These results were obtained around svn revision 23 (just prior to
# tagging kaldi-1.0).
# Note: these results will vary somewhat from OS to OS, because
# some algorithms call rand().

Dan Povey's avatar
Dan Povey committed
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
First, comparing with published results

feb89 oct89 feb91 sep92   avg
  2.77 4.02 3.30 6.29 4.10  % from my ICASSP'99 paper on Frame Discrimination (ML baseline)
  3.20 4.10 2.86 6.06 4.06  % from decode_tri2c (which is triphone + CMN)

exp/decode_mono/wer:Average WER is 14.234421 (1784 / 12533) 
exp/decode_tri1/wer:Average WER is 4.420330 (554 / 12533)   # First triphone pass
exp/decode_tri1_fmllr/wer:Average WER is 4.707572 (590 / 12533)  # + fMLLR
exp/decode_tri1_regtree_fmllr/wer:Average WER is 4.707572 (590 / 12533)  # + regression-tree
exp/decode_tri2a/wer:Average WER is 4.476183 (561 / 12533)  # Second triphone pass
exp/decode_tri2a_fmllr/wer:Average WER is 3.718184 (466 / 12533)  # + fMLLR
exp/decode_tri2a_fmllr_utt/wer:Average WER is 4.452246 (558 / 12533)  # [ fMLLR per utterance ]
exp/decode_tri2b/wer:Average WER is 3.103806 (389 / 12533)  # Exponential transform
exp/decode_tri2c/wer:Average WER is 3.789994 (475 / 12533)  # Cepstral mean subtraction (per-spk)
exp/decode_tri2d/wer:Average WER is 4.188941 (525 / 12533)  # MLLT (= global STC)
exp/decode_tri2e/wer:Average WER is 4.923003 (617 / 12533)  # splice-9-frames + LDA features
exp/decode_tri2f/wer:Average WER is 3.782015 (474 / 12533)  # splice-9-frames + LDA + MLLT
exp/decode_tri2g/wer:Average WER is 3.670310 (460 / 12533)  # Linear VTLN (LVTLN); includes mean-only fMLLR
exp/decode_tri2g_diag/wer:Average WER is 3.550626 (445 / 12533) # +change mean-only to diagonal fMLLR
exp/decode_tri2g_vtln/wer:Average WER is 3.534668 (443 / 12533) # More conventional VTLN (+mean-only fMLLR)
exp/decode_tri2g_vtln_diag/wer:Average WER is 3.438921 (431 / 12533) #+change mean-only to diagonal fMLLR
29
exp/decode_tri2g_vtln_diag_utt/wer:Average WER is 3.614458 (453 / 12533)  #[per-utt]
Dan Povey's avatar
Dan Povey committed
30 31 32 33 34
exp/decode_tri2g_vtln_nofmllr/wer:Average WER is 4.069257 (510 / 12533) # more conventional VTLN, no fMLLR
exp/decode_tri2h/wer:Average WER is 4.252773 (533 / 12533) # Splice-9-frames + HLDA
exp/decode_tri2i/wer:Average WER is 4.077236 (511 / 12533) # Triple-deltas + HLDA
exp/decode_tri2j/wer:Average WER is 3.694247 (463 / 12533) # Triple-deltas + LDA + MLLT
exp/decode_tri2k/wer:Average WER is 2.768691 (347 / 12533) # LDA + exponential transform
35
exp/decode_tri2k_utt/wer:Average WER is 3.024017 (379 / 12533)  # per-utterance adaptation.
Dan Povey's avatar
Dan Povey committed
36 37 38 39 40
exp/decode_tri2k_fmllr/wer:Average WER is 2.481449 (311 / 12533) # + fMLLR
exp/decode_tri2l/wer:Average WER is 2.688901 (337 / 12533)   # Splice-9-frames + LDA + MLLT + SAT (fMLLR in test)
exp/decode_tri2l_utt/wer:Average WER is 5.066624 (635 / 12533)  # [ as decode_tri2l but per-utt in test. ]


41
# sgmma is SGMM without speaker vectors.
42 43
exp/decode_sgmma/wer:Average WER is 3.151680 (395 / 12533) 
exp/decode_sgmma_fmllr/wer:Average WER is 2.768691 (347 / 12533) 
44 45

# sgmmb is SGMM with speaker vectors.
46
exp/decode_sgmmb/wer:Average WER is 2.680922 (336 / 12533) 
47
exp/decode_sgmmb_utt/wer:Average WER is 2.728796 (342 / 12533) 
Dan Povey's avatar
Dan Povey committed
48
exp/decode_sgmmb_fmllr/wer:Average WER is 2.553259 (320 / 12533)