
# These results were obtained around svn revision 23 (just prior to
# tagging kaldi-1.0).
# Note: these results will vary somewhat from OS to OS, because
# some algorithms call rand().

First, a comparison with published results:

feb89 oct89 feb91 sep92   avg
  2.77 4.02 3.30 6.29 4.10  % from my ICASSP'99 paper on Frame Discrimination (ML baseline)
  3.20 4.10 2.86 6.06 4.06  % from decode_tri2c (which is triphone + CMN)
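The "avg" column above is the unweighted mean over the four RM test sets. Not part of the original file, but a minimal sketch checking that arithmetic:

```python
# Sanity-check the quoted averages: each "avg" is the unweighted mean
# over the four test sets (feb89, oct89, feb91, sep92).
icassp_ml = [2.77, 4.02, 3.30, 6.29]  # ML baseline, ICASSP'99 paper
tri2c = [3.20, 4.10, 2.86, 6.06]      # decode_tri2c (triphone + CMN)

def avg(wers):
    return sum(wers) / len(wers)

# Quoted values are rounded to two decimals.
assert abs(avg(icassp_ml) - 4.10) < 0.01
assert abs(avg(tri2c) - 4.06) < 0.01
```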

exp/decode_mono/wer:Average WER is 14.234421 (1784 / 12533) # Monophone system, subset
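Each "Average WER is X (E / N)" line reports E word errors out of N reference words, i.e. X = 100 * E / N. A minimal sketch of that arithmetic (not part of the recipe), using the monophone line above:

```python
# WER percentage is just 100 * errors / reference-word-count.
errors, ref_words = 1784, 12533  # from the decode_mono line
wer = 100.0 * errors / ref_words
assert abs(wer - 14.234421) < 1e-4
```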

exp/decode_tri1/wer:Average WER is 4.420330 (554 / 12533)    # First triphone pass
exp/decode_tri1_fmllr/wer:Average WER is 3.837868 (481 / 12533) # + fMLLR
exp/decode_tri1_regtree_fmllr/wer:Average WER is 3.789994 (475 / 12533) # + regression-tree


exp/decode_tri2a/wer:Average WER is 3.973510 (498 / 12533)  # Second triphone pass
exp/decode_tri2a_fmllr/wer:Average WER is 3.590521 (450 / 12533) # + fMLLR
exp/decode_tri2a_fmllr_utt/wer:Average WER is 3.933615 (493 / 12533)  # [ fMLLR per utterance ]
exp/decode_tri2a_dfmllr/wer:Average WER is 3.861805 (484 / 12533)  # + diagonal fMLLR
exp/decode_tri2a_dfmllr_utt/wer:Average WER is 3.933615 (493 / 12533)  # [ diagonal fMLLR per utterance]
exp/decode_tri2a_dfmllr_fmllr/wer:Average WER is 3.622437 (454 / 12533)  # diagonal fMLLR, then estimate fMLLR and re-decode

exp/decode_tri2b/wer:Average WER is 3.143701 (394 / 12533)  # Exponential transform
exp/decode_tri2b_fmllr/wer:Average WER is 3.055932 (383 / 12533)  # +fMLLR
exp/decode_tri2b_utt/wer:Average WER is 3.295300 (413 / 12533)  # [adapt per-utt]
exp/decode_tri2c/wer:Average WER is 3.957552 (496 / 12533) # Cepstral mean subtraction (per-spk)
exp/decode_tri2d/wer:Average WER is 4.316604 (541 / 12533) # MLLT (= global STC)
exp/decode_tri2e/wer:Average WER is 4.659698 (584 / 12533) # splice-9-frames + LDA features
exp/decode_tri2f/wer:Average WER is 3.885742 (487 / 12533) # splice-9-frames + LDA + MLLT
exp/decode_tri2g/wer:Average WER is 3.303279 (414 / 12533) # Linear VTLN
exp/decode_tri2g_diag/wer:Average WER is 3.135722 (393 / 12533) # Linear VTLN; diagonal adapt in test
exp/decode_tri2g_diag_fmllr/wer:Average WER is 3.063911 (384 / 12533) # as above but then est. fMLLR (another decoding pass)
exp/decode_tri2g_diag_utt/wer:Average WER is 3.399027 (426 / 12533) # [as above, per utt.]
exp/decode_tri2g_vtln/wer:Average WER is 3.239448 (406 / 12533) # Use warp factors -> feature-level VTLN + offset estimation
exp/decode_tri2g_vtln_diag/wer:Average WER is 3.127743 (392 / 12533)  # feature-level VTLN  + diag fMLLR
exp/decode_tri2g_vtln_diag_utt/wer:Average WER is 3.407006 (427 / 12533)  # as above, per utt.
exp/decode_tri2g_vtln_nofmllr/wer:Average WER is 3.694247 (463 / 12533) # feature-level VTLN but no fMLLR

exp/decode_tri2h/wer:Average WER is 4.252773 (533 / 12533) # Splice-9-frames + HLDA
exp/decode_tri2i/wer:Average WER is 3.981489 (499 / 12533) # Triple-deltas + HLDA
exp/decode_tri2j/wer:Average WER is 3.853826 (483 / 12533) # Triple-deltas + LDA + MLLT


exp/decode_tri2k/wer:Average WER is 3.071890 (385 / 12533) # LDA + exponential transform
exp/decode_tri2k_utt/wer:Average WER is 3.039974 (381 / 12533)  # per-utterance adaptation
exp/decode_tri2k_fmllr/wer:Average WER is 2.641028 (331 / 12533) # fMLLR (per-spk)
exp/decode_tri2k_regtree_fmllr/wer:Average WER is 2.688901 (337 / 12533)  # +regression-tree

exp/decode_tri2l/wer:Average WER is 2.704859 (339 / 12533) # Splice-9-frames + LDA + MLLT + SAT (fMLLR in test)
exp/decode_tri2l_utt/wer:Average WER is 4.930982 (618 / 12533) # [ as decode_tri2l but per-utt in test. ]

# sgmma is SGMM without speaker vectors.
exp/decode_sgmma/wer:Average WER is 3.319237 (416 / 12533) 
exp/decode_sgmma_fmllr/wer:Average WER is 2.934308 (289 / 9849) # [note: denominator differs from the other runs]
exp/decode_sgmma_fmllr_utt/wer:Average WER is 3.303279 (414 / 12533) 
exp/decode_sgmma_fmllrbasis_utt/wer:Average WER is 3.191574 (400 / 12533) 

# sgmmb is SGMM with speaker vectors.
exp/decode_sgmmb/wer:Average WER is 2.760712 (346 / 12533) 
exp/decode_sgmmb_fmllr/wer:Average WER is 2.585175 (324 / 12533) 
exp/decode_sgmmb_utt/wer:Average WER is 2.808585 (352 / 12533) 

# sgmmc is like sgmmb but with gender dependency
exp/decode_sgmmc/wer:Average WER is 2.696880 (338 / 12533) 
exp/decode_sgmmc_fmllr/wer:Average WER is 2.457512 (308 / 12533) 
# "norm" means normalizing the weights per gender.
exp/decode_sgmmc_norm/wer:Average WER is 2.696880 (338 / 12533) 
exp/decode_sgmmc_fmllr_norm/wer:Average WER is 2.425596 (304 / 12533) 

# sgmmd is like sgmmb but with LDA+MLLT features.
exp/decode_sgmmd/wer:Average WER is 2.449533 (307 / 12533) 
exp/decode_sgmmd_fmllr/wer:Average WER is 2.305912 (289 / 12533) 




#### Note: stuff below this line may be out of date / not computed
# with most recent version of toolkit.
# note: when changing (phn,spk) dimensions from (40,39) -> (30,30),
# WER in decode_sgmmb/ went from 2.62 to 2.92
# when changing from (40,39) -> (50,39)  [40->50 on iter 3],
# WER in decode_sgmmb/ went from 2.62 to 2.66 [and test likelihood
# got worse].

# sgmmc is as sgmmb but with gender-dependent UBM, with 250
# Gaussians per gender instead of 400 Gaussians.  Note: use
# gender info in test.
exp/decode_sgmmc/wer:Average WER is 2.784649 (349 / 12533) 
exp/decode_sgmmc_fmllr/wer:Average WER is 2.688901 (337 / 12533) 


# Notes on training time with ATLAS vs. MKL:
# all timings below are with -O0.
# Measured the time taken with "time steps/train_tri2a.sh"
#  [on svatava]: with the "default" compile (which is 32-bit+ATLAS):
#   real    14m19.458s
#   user    15m38.695s
# 64-bit+ATLAS:
#
# 64-bit+MKL:
# real    12m45.664s
# user    13m53.770s
# + removed -O0 -DKALDI_PARANOID:
# [made almost no difference to training]:
# real    12m31.829s
# user    13m48.967s
# sys 0m28.146s

# 64-bit but ATLAS instead of MKL
# [and with default options, which include: -O0 -DKALDI_PARANOID].
#real    10m50.088s
#user    12m6.914s
#sys 0m17.419s
# Did this again:
#real    10m17.891s
#user    11m28.695s
#sys     0m14.087s
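Comparing these figures is easier in plain seconds. A small hypothetical helper (not part of the toolkit) to convert the `XmY.Zs` strings that `time` prints:

```python
import re

def to_seconds(t):
    """Convert a `time`-style duration like '12m45.664s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", t)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

# Real times quoted above for "time steps/train_tri2a.sh":
mkl = to_seconds("12m45.664s")    # 64-bit + MKL
atlas = to_seconds("10m50.088s")  # 64-bit + ATLAS
assert atlas < mkl  # for full training, the ATLAS build was faster here
```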

# But when I tested "fmllr-diag-gmm-test", all after removing
# the options -O0 -DKALDI_PARANOID, the ordering of timing was
# different:
# 64-bit+ATLAS was 0.361s
# 64-bit+MKL was 0.307s
# 32-bit+ATLAS was 0.571s

# Testing /homes/eva/q/qpovey/sourceforge/kaldi/trunk/src/gmm/am-diag-gmm-test:
# 64-bit+ATLAS was 0.171s
# 32-bit+ATLAS was 0.205s
# 64-bit+MKL was 0.291s