# These results were obtained around svn revision 23 (just prior to
# tagging kaldi-1.0).
# Note: these results will vary somewhat from OS to OS, because
# some algorithms call rand().

First, comparing with published results:

        feb89   oct89   feb91   sep92   avg
        2.77    4.02    3.30    6.29    4.10   # from my ICASSP'99 paper on Frame Discrimination (ML baseline)
        3.20    4.10    2.86    6.06    4.06   # from decode_tri2c (which is triphone + CMN)

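# (In the table above, "avg" is the mean over the four test sets;
# e.g. (2.77 + 4.02 + 3.30 + 6.29) / 4 = 4.095, which rounds to 4.10.)
#
# The result lines below are grep output over the per-experiment
# scoring files; each has the form "Average WER is P (E / N)", where E
# is the total word errors, N the number of reference words, and
# P = 100*E/N.  A minimal sketch of how to regenerate such a summary
# (a guess at the exact command; the scoring layout is assumed):
#
#   grep 'Average WER' exp/decode_*/wer
#
# and to sanity-check a percentage against its counts, e.g. for the
# monophone system, 100 * 1784 / 12533 = 14.234421:
#
#   awk '/Average WER/ { e = $5; n = $7; gsub(/[()]/, "", e);
#        gsub(/[()]/, "", n); printf "%.6f%%\n", 100 * e / n }' \
#     exp/decode_mono/wer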
exp/decode_mono/wer:Average WER is 14.234421 (1784 / 12533) # Monophone system (trained on a data subset)

exp/decode_tri1/wer:Average WER is 4.420330 (554 / 12533)    # First triphone pass
exp/decode_tri1_fmllr/wer:Average WER is 3.837868 (481 / 12533) # + fMLLR
exp/decode_tri1_regtree_fmllr/wer:Average WER is 3.789994 (475 / 12533) # + regression-tree


exp/decode_tri2a/wer:Average WER is 3.973510 (498 / 12533)  # Second triphone pass
exp/decode_tri2a_fmllr/wer:Average WER is 3.590521 (450 / 12533) # + fMLLR
exp/decode_tri2a_fmllr_utt/wer:Average WER is 3.933615 (493 / 12533)  # [ fMLLR per utterance ]
exp/decode_tri2a_dfmllr/wer:Average WER is 3.861805 (484 / 12533)  # + diagonal fMLLR
exp/decode_tri2a_dfmllr_utt/wer:Average WER is 3.933615 (493 / 12533)  # [ diagonal fMLLR per utterance]
exp/decode_tri2a_dfmllr_fmllr/wer:Average WER is 3.622437 (454 / 12533)  # diagonal fMLLR, then estimate fMLLR and re-decode

exp/decode_tri2b/wer:Average WER is 3.303279 (414 / 12533) # Exponential transform
exp/decode_tri2b_fmllr/wer:Average WER is 3.047953 (382 / 12533) # +fMLLR
exp/decode_tri2b_utt/wer:Average WER is 3.335195 (418 / 12533) # [adapt per-utt]
exp/decode_tri2c/wer:Average WER is 3.957552 (496 / 12533) # Cepstral mean subtraction (per-spk)
exp/decode_tri2d/wer:Average WER is 4.316604 (541 / 12533) # MLLT (= global STC)
exp/decode_tri2e/wer:Average WER is 4.659698 (584 / 12533) # splice-9-frames + LDA features
exp/decode_tri2f/wer:Average WER is 3.885742 (487 / 12533) # splice-9-frames + LDA + MLLT
exp/decode_tri2g/wer:Average WER is 3.303279 (414 / 12533) # Linear VTLN
exp/decode_tri2g_diag/wer:Average WER is 3.135722 (393 / 12533) # Linear VTLN; diagonal adapt in test
exp/decode_tri2g_diag_fmllr/wer:Average WER is 3.063911 (384 / 12533) # as above but then est. fMLLR (another decoding pass)
exp/decode_tri2g_diag_utt/wer:Average WER is 3.399027 (426 / 12533) 
exp/decode_tri2g_vtln/wer:Average WER is 3.239448 (406 / 12533) # Use warp factors -> feature-level VTLN + offset estimation
exp/decode_tri2g_vtln_diag/wer:Average WER is 3.127743 (392 / 12533)  # feature-level VTLN  + diag fMLLR
exp/decode_tri2g_vtln_diag_utt/wer:Average WER is 3.407006 (427 / 12533)  # as above, per utt.
exp/decode_tri2g_vtln_nofmllr/wer:Average WER is 3.694247 (463 / 12533) # feature-level VTLN but no fMLLR

exp/decode_tri2h/wer:Average WER is 4.252773 (533 / 12533) # Splice-9-frames + HLDA
exp/decode_tri2i/wer:Average WER is 3.981489 (499 / 12533) # Triple-deltas + HLDA
exp/decode_tri2j/wer:Average WER is 3.853826 (483 / 12533) # Triple-deltas + LDA + MLLT
exp/decode_tri2k/wer:Average WER is 2.968164 (372 / 12533) # LDA + exponential transform
exp/decode_tri2k_utt/wer:Average WER is 3.175616 (398 / 12533) # per-utterance adaptation.
exp/decode_tri2k_fmllr/wer:Average WER is 2.505386 (314 / 12533) # +fMLLR (per-spk)
exp/decode_tri2k_regtree_fmllr/wer:Average WER is 2.513365 (315 / 12533) # +regression tree

exp/decode_tri2l/wer:Average WER is 2.704859 (339 / 12533) # Splice-9-frames + LDA + MLLT + SAT (fMLLR in test)
exp/decode_tri2l_utt/wer:Average WER is 4.930982 (618 / 12533) # [ as decode_tri2l but per-utt in test. ]

# sgmma is SGMM without speaker vectors.
exp/decode_sgmma/wer:Average WER is 3.319237 (416 / 12533) 
exp/decode_sgmma_fmllr/wer:Average WER is 2.934308 (289 / 9849) 
exp/decode_sgmma_fmllr_utt/wer:Average WER is 3.303279 (414 / 12533) 
exp/decode_sgmma_fmllrbasis_utt/wer:Average WER is 3.191574 (400 / 12533) 

# sgmmb is SGMM with speaker vectors.
exp/decode_sgmmb/wer:Average WER is 2.760712 (346 / 12533) 
exp/decode_sgmmb_utt/wer:Average WER is 2.808585 (352 / 12533) 
exp/decode_sgmmb_fmllr/wer:Average WER is 2.553259 (320 / 12533) 


# sgmmc is like sgmmb but with gender dependency [doesn't help here]
exp/decode_sgmmc/wer:Average WER is 2.776670 (348 / 12533) 
exp/decode_sgmmc_fmllr/wer:Average WER is 2.601133 (326 / 12533) 

# A second set of results for the same systems (evidently from a
# different run; the numbers differ slightly from the block above):

exp/decode_tri2a/wer:Average WER is 4.476183 (561 / 12533)
exp/decode_tri2a_fmllr/wer:Average WER is 3.718184 (466 / 12533)  
exp/decode_tri2a_fmllr_utt/wer:Average WER is 4.452246 (558 / 12533) 
exp/decode_tri2b/wer:Average WER is 2.992101 (375 / 12533)  
exp/decode_tri2b_utt/wer:Average WER is 3.247427 (407 / 12533)  
exp/decode_tri2c/wer:Average WER is 3.789994 (475 / 12533)  
exp/decode_tri2d/wer:Average WER is 4.188941 (525 / 12533)  
exp/decode_tri2e/wer:Average WER is 4.923003 (617 / 12533)  
exp/decode_tri2f/wer:Average WER is 3.782015 (474 / 12533)  
exp/decode_tri2g/wer:Average WER is 3.670310 (460 / 12533)  
exp/decode_tri2g_diag/wer:Average WER is 3.550626 (445 / 12533) # +change mean-only to diagonal fMLLR
exp/decode_tri2g_vtln/wer:Average WER is 3.534668 (443 / 12533) # More conventional VTLN (+mean-only fMLLR)
exp/decode_tri2g_vtln_diag/wer:Average WER is 3.438921 (431 / 12533) #+change mean-only to diagonal fMLLR
exp/decode_tri2g_vtln_diag_utt/wer:Average WER is 3.614458 (453 / 12533)  # [per-utt]
exp/decode_tri2g_vtln_nofmllr/wer:Average WER is 4.069257 (510 / 12533) # more conventional VTLN, no fMLLR
exp/decode_tri2h/wer:Average WER is 4.252773 (533 / 12533) # Splice-9-frames + HLDA
exp/decode_tri2i/wer:Average WER is 4.077236 (511 / 12533) # Triple-deltas + HLDA
exp/decode_tri2j/wer:Average WER is 3.694247 (463 / 12533) # Triple-deltas + LDA + MLLT
exp/decode_tri2k/wer:Average WER is 2.856459 (358 / 12533) # LDA + exponential transform
exp/decode_tri2k_utt/wer:Average WER is 3.071890 (385 / 12533) # per-utterance adaptation.
exp/decode_tri2k_fmllr/wer:Average WER is 2.585175 (324 / 12533) # +fMLLR (per-spk)
exp/decode_tri2k_regtree_fmllr/wer:Average WER is 2.561238 (321 / 12533)  # +regression tree
exp/decode_tri2l/wer:Average WER is 2.688901 (337 / 12533)   # Splice-9-frames + LDA + MLLT + SAT (fMLLR in test)
exp/decode_tri2l_utt/wer:Average WER is 5.066624 (635 / 12533)  # [ as decode_tri2l but per-utt in test. ]

exp/decode_tri2m/wer:Average WER is 3.223490 (404 / 12533)  # Splice + LDA + MLLT + Linear VTLN
exp/decode_tri2m_diag/wer:Average WER is 3.119764 (391 / 12533) # diagonal, not offset, CMLLR component
exp/decode_tri2m_diag_fmllr/wer:Average WER is 2.784649 (349 / 12533)  # diagonal CMLLR component; then est. CMLLR and re-decode.
exp/decode_tri2m_diag_utt/wer:Average WER is 3.279343 (411 / 12533)  # [per-utterance]
exp/decode_tri2m_vtln/wer:Average WER is 4.747467 (595 / 12533) # feature-level VTLN computation
exp/decode_tri2m_vtln_diag/wer:Average WER is 3.087848 (387 / 12533) # diagonal, not offset, adapt
exp/decode_tri2m_vtln_diag_utt/wer:Average WER is 4.340541 (544 / 12533) # per-utterance, diag adapt.


# sgmma is SGMM without speaker vectors.
exp/decode_sgmma/wer:Average WER is 3.151680 (395 / 12533) 
exp/decode_sgmma_fmllr/wer:Average WER is 2.728796 (342 / 12533) 
exp/decode_sgmma_fmllr_utt/wer:Average WER is 3.087848 (387 / 12533) 
exp/decode_sgmma_fmllrbasis_utt/wer:Average WER is 2.896354 (363 / 12533) 

# sgmmb is SGMM with speaker vectors.
exp/decode_sgmmb/wer:Average WER is 2.617091 (328 / 12533) 
exp/decode_sgmmb_utt/wer:Average WER is 2.696880 (338 / 12533) 
exp/decode_sgmmb_fmllr/wer:Average WER is 2.505386 (314 / 12533) 

# sgmmc is like sgmmb but with a gender-dependent UBM (250 Gaussians
# per gender instead of 400), using the gender info in test [doesn't
# help here].
exp/decode_sgmmc/wer:Average WER is 2.784649 (349 / 12533) 
exp/decode_sgmmc_fmllr/wer:Average WER is 2.688901 (337 / 12533) 

# Note: when changing the (phn,spk) dimensions from (40,39) -> (30,30),
# WER in decode_sgmmb/ went from 2.62 to 2.92; when changing from
# (40,39) -> (50,39) [40 -> 50 on iter 3], WER in decode_sgmmb/ went
# from 2.62 to 2.66 [and test likelihood got worse].


# Notes on timing of training with ATLAS vs. MKL.
# All timings below are with -O0 unless noted otherwise.
# Time taken was measured with "time steps/train_tri2a.sh".
#  [on svatava]: with "default" compile (which is 32-bit+ATLAS)
#   real    14m19.458s
#   user    15m38.695s
# 64-bit+ATLAS: [see the separate ATLAS timings further below]
# 64-bit+MKL:
# real    12m45.664s
# user    13m53.770s
# + removed -O0 -DKALDI_PARANOID:
# [made almost no difference to training]:
# real    12m31.829s
# user    13m48.967s
# sys 0m28.146s

# 64-bit but ATLAS instead of MKL
# [and with default options, which include: -O0 -DKALDI_PARANOID].
#real    10m50.088s
#user    12m6.914s
#sys 0m17.419s
# Did this again:
#real    10m17.891s
#user    11m28.695s
#sys     0m14.087s

# But when I tested "fmllr-diag-gmm-test" (all after removing the
# options -O0 -DKALDI_PARANOID), the ordering of the timings was
# different:
# 64-bit+ATLAS was 0.361s
# 64-bit+MKL was 0.307s
# 32-bit+ATLAS was 0.571s

# Testing /homes/eva/q/qpovey/sourceforge/kaldi/trunk/src/gmm/am-diag-gmm-test:
# 64-bit+ATLAS was 0.171s
# 32-bit+ATLAS was 0.205s
# 64-bit+MKL was 0.291s
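
# For reference, a sketch of how timings like the above can be gathered
# (the binary locations are assumed for this revision; the training
# script is run from the egs/rm/s1 directory, the unit-test binaries
# from their source directories):
#
#   time steps/train_tri2a.sh                          # end-to-end training time
#   (cd src/transform && time ./fmllr-diag-gmm-test)   # micro-benchmark
#   (cd src/gmm && time ./am-diag-gmm-test)            # micro-benchmark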