RESULTS 7.97 KB
Newer Older
Dan Povey's avatar
Dan Povey committed
1 2 3 4 5 6

# These results were obtained around svn revision 23 (just prior to
# tagging kaldi-1.0).
# Note: these results will vary somewhat from OS to OS, because
# some algorithms call rand().

Dan Povey's avatar
Dan Povey committed
7 8 9 10 11 12
First, comparing with published results

feb89 oct89 feb91 sep92   avg
  2.77 4.02 3.30 6.29 4.10  % from my ICASSP'99 paper on Frame Discrimination (ML baseline)
  3.20 4.10 2.86 6.06 4.06  % from decode_tri2c (which is triphone + CMN)

13 14 15 16 17 18 19
exp/decode_mono/wer:Average WER is 14.234421 (1784 / 12533) # Monophone system, subset

exp/decode_tri1/wer:Average WER is 4.420330 (554 / 12533)    # First triphone pass
exp/decode_tri1_fmllr/wer:Average WER is 3.837868 (481 / 12533) # + fMLLR
exp/decode_tri1_regtree_fmllr/wer:Average WER is 3.789994 (475 / 12533) # + regression-tree


20 21 22 23 24 25 26 27 28 29 30 31 32
# Note: looks like we had not tuned the acwt...
svatava:s1: grep WER exp/decode_tri1_latgen/wer{,_?,_??}
exp/decode_tri1_latgen/wer:Average WER is 4.420330 (554 / 12533) 
exp/decode_tri1_latgen/wer_6:Average WER is 3.917657 (491 / 12533)
exp/decode_tri1_latgen/wer_7:Average WER is 3.813931 (478 / 12533)
exp/decode_tri1_latgen/wer_8:Average WER is 3.869784 (485 / 12533)
exp/decode_tri1_latgen/wer_9:Average WER is 4.013405 (503 / 12533)
exp/decode_tri1_latgen/wer_10:Average WER is 4.085215 (512 / 12533)
exp/decode_tri1_latgen/wer_11:Average WER is 4.188941 (525 / 12533)
exp/decode_tri1_latgen/wer_12:Average WER is 4.420330 (554 / 12533)
exp/decode_tri1_latgen/wer_13:Average WER is 4.555972 (571 / 12533)


33 34 35
exp/decode_tri2a/wer:Average WER is 3.973510 (498 / 12533)  # Second triphone pass
exp/decode_tri2a_fmllr/wer:Average WER is 3.590521 (450 / 12533) # + fMLLR
exp/decode_tri2a_fmllr_utt/wer:Average WER is 3.933615 (493 / 12533)  # [ fMLLR per utterance ]
36 37 38 39
exp/decode_tri2a_dfmllr/wer:Average WER is 3.861805 (484 / 12533)  # + diagonal fMLLR
exp/decode_tri2a_dfmllr_utt/wer:Average WER is 3.933615 (493 / 12533)  # [ diagonal fMLLR per utterance]
exp/decode_tri2a_dfmllr_fmllr/wer:Average WER is 3.622437 (454 / 12533)  # diagonal fMLLR, then estimate fMLLR and re-decode

40 41 42
exp/decode_tri2b/wer:Average WER is 3.143701 (394 / 12533)  # Exponential transform
exp/decode_tri2b_fmllr/wer:Average WER is 3.055932 (383 / 12533)  # +fMLLR
exp/decode_tri2b_utt/wer:Average WER is 3.295300 (413 / 12533)  # [adapt per-utt]
43 44 45 46 47 48
exp/decode_tri2c/wer:Average WER is 3.957552 (496 / 12533) # Cepstral mean subtraction (per-spk)
exp/decode_tri2d/wer:Average WER is 4.316604 (541 / 12533) # MLLT (= global STC)
exp/decode_tri2e/wer:Average WER is 4.659698 (584 / 12533) # splice-9-frames + LDA features
exp/decode_tri2f/wer:Average WER is 3.885742 (487 / 12533) # splice-9-frames + LDA + MLLT
exp/decode_tri2g/wer:Average WER is 3.303279 (414 / 12533) # Linear VTLN
exp/decode_tri2g_diag/wer:Average WER is 3.135722 (393 / 12533) # Linear VTLN; diagonal adapt in test
49 50
exp/decode_tri2g_diag_fmllr/wer:Average WER is 3.063911 (384 / 12533) # as above but then est. fMLLR (another decoding pass)
exp/decode_tri2g_diag_utt/wer:Average WER is 3.399027 (426 / 12533) 
51 52 53 54 55 56 57 58
exp/decode_tri2g_vtln/wer:Average WER is 3.239448 (406 / 12533) # Use warp factors -> feature-level VTLN + offset estimation
exp/decode_tri2g_vtln_diag/wer:Average WER is 3.127743 (392 / 12533)  # feature-level VTLN  + diag fMLLR
exp/decode_tri2g_vtln_diag_utt/wer:Average WER is 3.407006 (427 / 12533)  # as above, per utt.
exp/decode_tri2g_vtln_nofmllr/wer:Average WER is 3.694247 (463 / 12533) # feature-level VTLN but no fMLLR

exp/decode_tri2h/wer:Average WER is 4.252773 (533 / 12533) # Splice-9-frames + HLDA
exp/decode_tri2i/wer:Average WER is 3.981489 (499 / 12533) # Triple-deltas + HLDA
exp/decode_tri2j/wer:Average WER is 3.853826 (483 / 12533) # Triple-deltas + LDA + MLLT
59 60


61
exp/decode_tri2k/wer:Average WER is 3.071890 (385 / 12533) # LDA + exponential transform (ET)
62 63 64
exp/decode_tri2k_utt/wer:Average WER is 3.039974 (381 / 12533)  # per-utterance adaptation
exp/decode_tri2k_fmllr/wer:Average WER is 2.641028 (331 / 12533) # fMLLR (per-spk)
exp/decode_tri2k_regtree_fmllr/wer:Average WER is 2.688901 (337 / 12533)  # +regression-tree
65 66 67 68

exp/decode_tri2l/wer:Average WER is 2.704859 (339 / 12533) # Splice-9-frames + LDA + MLLT + SAT (fMLLR in test)
exp/decode_tri2l_utt/wer:Average WER is 4.930982 (618 / 12533) # [ as decode_tri2l but per-utt in test. ]

69 70 71 72 73 74 75 76 77 78 79 80
# linear-VTLN on top of LDA+MLLT features.
exp/decode_tri2m/wer:Average WER is 3.223490 (404 / 12533) # offset-only transform after VTLN part of LVTLN
exp/decode_tri2m_diag/wer:Average WER is 3.119764 (391 / 12533)  # diagonal transform after VTLN part of LVTLN
exp/decode_tri2m_diag_fmllr/wer:Average WER is 2.784649 (349 / 12533) # + fMLLR
exp/decode_tri2m_diag_utt/wer:Average WER is 3.279343 (411 / 12533)   # [per-utt]
exp/decode_tri2m_vtln/wer:Average WER is 4.747467 (595 / 12533)   # feature-space VTLN, plus offset-only transform
                                                                  #  (for some reason it failed)
exp/decode_tri2m_vtln_diag/wer:Average WER is 3.087848 (387 / 12533)  # + diagonal transform
exp/decode_tri2m_vtln_diag_utt/wer:Average WER is 4.340541 (544 / 12533)  # [per-utterance]
exp/decode_tri2m_vtln_nofmllr/wer:Average WER is 5.784728 (725 / 12533)  # feature-space VTLN, with no fMLLR


81 82 83 84 85 86 87 88
# sgmma is SGMM without speaker vectors.
exp/decode_sgmma/wer:Average WER is 3.319237 (416 / 12533) 
exp/decode_sgmma_fmllr/wer:Average WER is 2.934308 (289 / 9849) 
exp/decode_sgmma_fmllr_utt/wer:Average WER is 3.303279 (414 / 12533) 
exp/decode_sgmma_fmllrbasis_utt/wer:Average WER is 3.191574 (400 / 12533) 

# sgmmb is SGMM with speaker vectors.
exp/decode_sgmmb/wer:Average WER is 2.760712 (346 / 12533) 
89
exp/decode_sgmmb_fmllr/wer:Average WER is 2.585175 (324 / 12533) 
90 91
exp/decode_sgmmb_utt/wer:Average WER is 2.808585 (352 / 12533) 

92 93 94 95 96 97
# sgmmc is like sgmmb but with gender dependency
exp/decode_sgmmc/wer:Average WER is 2.696880 (338 / 12533) 
exp/decode_sgmmc_fmllr/wer:Average WER is 2.457512 (308 / 12533) 
 # "norm" is normalizing weights per gender..
 exp/decode_sgmmc_norm/wer:Average WER is 2.696880 (338 / 12533) 
 exp/decode_sgmmc_fmllr_norm/wer:Average WER is 2.425596 (304 / 12533) 
98

99 100 101
# sgmmd is like sgmmb but with LDA+MLLT features.
exp/decode_sgmmd/wer:Average WER is 2.449533 (307 / 12533) 
exp/decode_sgmmd_fmllr/wer:Average WER is 2.305912 (289 / 12533) 
102

103 104 105
# sgmme is like sgmmb but with LDA+ET features.
exp/decode_sgmme/wer:Average WER is 2.321870 (291 / 12533) 
exp/decode_sgmme_fmllr/wer:Average WER is 2.154313 (270 / 12533) 
Dan Povey's avatar
Dan Povey committed
106

107

108 109
#### Note: stuff below this line may be out of date / not computed
# with most recent version of toolkit.
Dan Povey's avatar
Dan Povey committed
110 111 112 113 114
# note: when changing (phn,spk) dimensions from (40,39) -> (30,30),
# WER in decode_sgmmb/ went from 2.62 to 2.92
# when changing from (40,39) -> (50,39)  [40->50 on iter 3],
# WER in decode_sgmmb/ went from 2.62 to 2.66 [and test likelihood
# got worse].
Dan Povey's avatar
Dan Povey committed
115 116 117 118

# sgmmc is as sgmmb but with gender-dependent UBM, with 250
# Gaussians per gender instead of 400 Gaussians.  Note: use
# gender info in test.
Dan Povey's avatar
Dan Povey committed
119 120
exp/decode_sgmmc/wer:Average WER is 2.784649 (349 / 12533) 
exp/decode_sgmmc_fmllr/wer:Average WER is 2.688901 (337 / 12533) 
Dan Povey's avatar
Dan Povey committed
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160


# notes on timing of training with ATLAS vs. MKL:
# all below are with -O0.
# tested time taken with "time steps/train_tri2a.sh"
#  [on svatava]: with "default" compile (which is 32-bit+ATLAS)
#   real    14m19.458s
#   user    15m38.695s
# 64-bit+ATLAS:
#
# 64-bit+MKL:
# real    12m45.664s
# user    13m53.770s
# + removed -O0 -DKALDI_PARANOID:
# [made almost no difference to training]:
# real    12m31.829s
# user    13m48.967s
# sys 0m28.146s

# 64-bit but ATLAS instead of MKL
# [and with default options, which includes: -O0 -DKALI_PARANOID].
#real    10m50.088s
#user    12m6.914s
#sys 0m17.419s
# Did this again:
#real    10m17.891s
#user    11m28.695s
#sys     0m14.087s

# But when I tested "fmllr-diag-gmm-test", all after removing
# the options -O0 -DKALDI_PARANOID, the ordering of timing was
# different:
# 64-bit+ATLAS was 0.361s
# 64-bit+MKL was 0.307s
# 32-bit+ATLAS was 0.571s

# Testing /homes/eva/q/qpovey/sourceforge/kaldi/trunk/src/gmm/am-diag-gmm-test:
# 64-bit+ATLAS was 0.171s
# 32-bit+ATLAS was 0.205s
# 64-bit+MKL was 0.291s