Commit 1db07651 authored by Ho Yin Chan's avatar Ho Yin Chan

trunk:egs/hkust/ some recent experiments setup and results

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@2906 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
parent 573f2789
About HKUST Mandarin Telephone Speech
The data below were collected by the Human Language Technology Center, HKUST:
LDC2005S15 : http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2005S15
LDC2005T32 : http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2005T32
s5: The experiments here were based on the above corpus.
s5b: The experiments here were based on 255 hours of Mandarin telephone speech (part of the
EARS Mandarin telephone data plus other telephone recordings), as described below:
Part of EARS - 793 sides of Mandarin telephone conversations recorded between
2004 and 2005, comprising ~84 hours of speech data (8000 Hz sampling rate,
8-bit sample depth).
Other telephone speech 1 - ~57000 recording segments from ~300 batches/speakers,
comprising ~66 hours of speech data.
Other telephone speech 2 - ~81000 recording segments covering 1800 unique sentences
read over the telephone, comprising ~105 hours of speech data.
987 recording segments from 2 speakers, held out from the training set, were used for evaluation.
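For reference, each Kaldi data directory lists its audio in a wav.scp file; for
LDC sphere audio a typical entry pipes through sph2pipe. A minimal, hypothetical
sketch (the utterance ID and paths are placeholders, not the recipe's actual values):
  utt001 /path/to/sph2pipe -f wav -p -c 1 /path/to/LDC2005S15/audio/file.sph |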
### 16k wordlist, partially-closed LM
tri1/decode_eval/cer_10:%CER 50.28 [ 3802 / 7562, 1547 ins, 403 del, 1852 sub ]
tri2/decode_eval/cer_10:%CER 47.09 [ 3561 / 7562, 1405 ins, 414 del, 1742 sub ]
tri3a/decode_eval/cer_10:%CER 44.18 [ 3341 / 7562, 1113 ins, 441 del, 1787 sub ]
tri4a/decode_eval/cer_10:%CER 30.23 [ 2286 / 7562, 530 ins, 492 del, 1264 sub ]
tri4a_20k/decode_eval/cer_10:%CER 32.43 [ 2452 / 7562, 537 ins, 480 del, 1435 sub ]
tri5a/decode_eval/cer_10:%CER 28.89 [ 2185 / 7562, 498 ins, 517 del, 1170 sub ]
tri5a_fmmi_b0.1/decode_eval_iter1/cer_10:%CER 28.00 [ 2117 / 7562, 460 ins, 524 del, 1133 sub ]
tri5a_fmmi_b0.1/decode_eval_iter2/cer_10:%CER 27.47 [ 2077 / 7562, 438 ins, 548 del, 1091 sub ]
tri5a_fmmi_b0.1/decode_eval_iter3/cer_10:%CER 26.59 [ 2011 / 7562, 447 ins, 539 del, 1025 sub ]
tri5a_fmmi_b0.1/decode_eval_iter4/cer_10:%CER 29.91 [ 2262 / 7562, 619 ins, 516 del, 1127 sub ]
tri5a_fmmi_b0.1/decode_eval_iter5/cer_10:%CER 29.24 [ 2211 / 7562, 655 ins, 479 del, 1077 sub ]
tri5a_fmmi_b0.1/decode_eval_iter6/cer_10:%CER 27.10 [ 2049 / 7562, 552 ins, 483 del, 1014 sub ]
tri5a_fmmi_b0.1/decode_eval_iter7/cer_10:%CER 24.97 [ 1888 / 7562, 462 ins, 549 del, 877 sub ]
tri5a_fmmi_b0.1/decode_eval_iter8/cer_10:%CER 25.23 [ 1908 / 7562, 445 ins, 613 del, 850 sub ]
tri5a_mmi_b0.1/decode_eval1/cer_10:%CER 24.93 [ 1885 / 7562, 408 ins, 466 del, 1011 sub ]
tri5a_mmi_b0.1/decode_eval2/cer_10:%CER 23.25 [ 1758 / 7562, 370 ins, 486 del, 902 sub ]
tri5a_mmi_b0.1/decode_eval3/cer_10:%CER 23.64 [ 1788 / 7562, 402 ins, 501 del, 885 sub ]
tri5a_mmi_b0.1/decode_eval4/cer_10:%CER 23.58 [ 1783 / 7562, 392 ins, 561 del, 830 sub ]
sgmm_5a/decode_eval/cer_10:%CER 26.40 [ 1996 / 7562, 418 ins, 701 del, 877 sub ]
sgmm_5a_mmi_b0.1/decode_eval1/cer_10:%CER 24.93 [ 1885 / 7562, 401 ins, 597 del, 887 sub ]
sgmm_5a_mmi_b0.1/decode_eval2/cer_10:%CER 24.52 [ 1854 / 7562, 386 ins, 596 del, 872 sub ]
sgmm_5a_mmi_b0.1/decode_eval3/cer_10:%CER 23.79 [ 1799 / 7562, 378 ins, 593 del, 828 sub ]
sgmm_5a_mmi_b0.1/decode_eval4/cer_10:%CER 23.87 [ 1805 / 7562, 380 ins, 597 del, 828 sub ]
nnet_8m_6l/decode_eval_iter50/cer_10:%CER 33.25 [ 2514 / 7562, 435 ins, 750 del, 1329 sub ]
nnet_8m_6l/decode_eval_iter100/cer_10:%CER 30.40 [ 2299 / 7562, 543 ins, 476 del, 1280 sub ]
nnet_8m_6l/decode_eval_iter150/cer_10:%CER 26.74 [ 2022 / 7562, 423 ins, 578 del, 1021 sub ]
nnet_8m_6l/decode_eval_iter200/cer_10:%CER 26.20 [ 1981 / 7562, 421 ins, 546 del, 1014 sub ]
nnet_8m_6l/decode_eval_iter210/cer_10:%CER 26.62 [ 2013 / 7562, 436 ins, 569 del, 1008 sub ]
nnet_8m_6l/decode_eval_iter220/cer_10:%CER 26.41 [ 1997 / 7562, 412 ins, 545 del, 1040 sub ]
nnet_8m_6l/decode_eval_iter230/cer_10:%CER 26.98 [ 2040 / 7562, 435 ins, 614 del, 991 sub ]
nnet_8m_6l/decode_eval_iter240/cer_10:%CER 27.86 [ 2107 / 7562, 468 ins, 552 del, 1087 sub ]
nnet_8m_6l/decode_eval_iter250/cer_10:%CER 26.01 [ 1967 / 7562, 409 ins, 565 del, 993 sub ]
nnet_8m_6l/decode_eval_iter260/cer_10:%CER 26.61 [ 2012 / 7562, 419 ins, 555 del, 1038 sub ]
nnet_8m_6l/decode_eval_iter270/cer_10:%CER 25.72 [ 1945 / 7562, 405 ins, 533 del, 1007 sub ]
nnet_8m_6l/decode_eval_iter280/cer_10:%CER 27.43 [ 2074 / 7562, 424 ins, 605 del, 1045 sub ]
nnet_8m_6l/decode_eval_iter290/cer_10:%CER 26.37 [ 1994 / 7562, 410 ins, 572 del, 1012 sub ]
nnet_8m_6l/decode_eval/cer_10:%CER 25.55 [ 1932 / 7562, 405 ins, 549 del, 978 sub ]
### 16k wordlist, closed LM; the perplexity of the LM was optimized on the sentences of the evaluation data
tri1/decode_eval_closelm/cer_10:%CER 46.69 [ 3531 / 7562, 1205 ins, 407 del, 1919 sub ]
tri2/decode_eval_closelm/cer_10:%CER 44.18 [ 3341 / 7562, 1136 ins, 421 del, 1784 sub ]
tri3a/decode_eval_closelm/cer_10:%CER 51.53 [ 3897 / 7562, 1218 ins, 467 del, 2212 sub ]
tri4a/decode_eval_closelm/cer_10:%CER 22.81 [ 1725 / 7562, 411 ins, 480 del, 834 sub ]
tri4a_20k/decode_eval_closelm/cer_10:%CER 25.17 [ 1903 / 7562, 439 ins, 467 del, 997 sub ]
tri5a/decode_eval_closelm/cer_10:%CER 22.60 [ 1709 / 7562, 384 ins, 520 del, 805 sub ]
tri5a_fmmi_b0.1/decode_eval_closelm_iter1/cer_10:%CER 21.81 [ 1649 / 7562, 363 ins, 524 del, 762 sub ]
tri5a_fmmi_b0.1/decode_eval_closelm_iter2/cer_10:%CER 21.17 [ 1601 / 7562, 358 ins, 487 del, 756 sub ]
tri5a_fmmi_b0.1/decode_eval_closelm_iter3/cer_10:%CER 21.81 [ 1649 / 7562, 387 ins, 473 del, 789 sub ]
tri5a_fmmi_b0.1/decode_eval_closelm_iter4/cer_10:%CER 27.07 [ 2047 / 7562, 519 ins, 493 del, 1035 sub ]
tri5a_fmmi_b0.1/decode_eval_closelm_iter5/cer_10:%CER 24.76 [ 1872 / 7562, 472 ins, 478 del, 922 sub ]
tri5a_fmmi_b0.1/decode_eval_closelm_iter6/cer_10:%CER 22.51 [ 1702 / 7562, 389 ins, 516 del, 797 sub ]
tri5a_fmmi_b0.1/decode_eval_closelm_iter7/cer_10:%CER 20.46 [ 1547 / 7562, 345 ins, 486 del, 716 sub ]
tri5a_fmmi_b0.1/decode_eval_closelm_iter8/cer_10:%CER 20.75 [ 1569 / 7562, 330 ins, 549 del, 690 sub ]
tri5a_mmi_b0.1/decode_eval_closelm1/cer_10:%CER 19.08 [ 1443 / 7562, 320 ins, 433 del, 690 sub ]
tri5a_mmi_b0.1/decode_eval_closelm2/cer_10:%CER 17.83 [ 1348 / 7562, 305 ins, 438 del, 605 sub ]
tri5a_mmi_b0.1/decode_eval_closelm3/cer_10:%CER 19.72 [ 1491 / 7562, 381 ins, 449 del, 661 sub ]
tri5a_mmi_b0.1/decode_eval_closelm4/cer_10:%CER 18.34 [ 1387 / 7562, 312 ins, 465 del, 610 sub ]
sgmm_5a/decode_eval_closelm/cer_10:%CER 23.00 [ 1739 / 7562, 473 ins, 633 del, 633 sub ]
sgmm_5a_mmi_b0.1/decode_eval_closelm1/cer_10:%CER 21.48 [ 1624 / 7562, 459 ins, 531 del, 634 sub ]
sgmm_5a_mmi_b0.1/decode_eval_closelm2/cer_10:%CER 21.17 [ 1601 / 7562, 449 ins, 530 del, 622 sub ]
sgmm_5a_mmi_b0.1/decode_eval_closelm3/cer_10:%CER 21.05 [ 1592 / 7562, 448 ins, 530 del, 614 sub ]
sgmm_5a_mmi_b0.1/decode_eval_closelm4/cer_10:%CER 21.03 [ 1590 / 7562, 446 ins, 530 del, 614 sub ]
nnet_8m_6l/decode_eval_closelm_iter50/cer_10:%CER 27.12 [ 2051 / 7562, 383 ins, 615 del, 1053 sub ]
nnet_8m_6l/decode_eval_closelm_iter100/cer_10:%CER 24.33 [ 1840 / 7562, 466 ins, 462 del, 912 sub ]
nnet_8m_6l/decode_eval_closelm_iter150/cer_10:%CER 21.34 [ 1614 / 7562, 364 ins, 476 del, 774 sub ]
nnet_8m_6l/decode_eval_closelm_iter200/cer_10:%CER 20.56 [ 1555 / 7562, 332 ins, 485 del, 738 sub ]
nnet_8m_6l/decode_eval_closelm_iter210/cer_10:%CER 20.67 [ 1563 / 7562, 349 ins, 494 del, 720 sub ]
nnet_8m_6l/decode_eval_closelm_iter220/cer_10:%CER 21.98 [ 1662 / 7562, 357 ins, 531 del, 774 sub ]
nnet_8m_6l/decode_eval_closelm_iter230/cer_10:%CER 22.30 [ 1686 / 7562, 360 ins, 539 del, 787 sub ]
nnet_8m_6l/decode_eval_closelm_iter240/cer_10:%CER 22.19 [ 1678 / 7562, 376 ins, 508 del, 794 sub ]
nnet_8m_6l/decode_eval_closelm_iter250/cer_10:%CER 21.52 [ 1627 / 7562, 354 ins, 523 del, 750 sub ]
nnet_8m_6l/decode_eval_closelm_iter260/cer_10:%CER 20.97 [ 1586 / 7562, 347 ins, 499 del, 740 sub ]
nnet_8m_6l/decode_eval_closelm_iter270/cer_10:%CER 20.50 [ 1550 / 7562, 348 ins, 465 del, 737 sub ]
nnet_8m_6l/decode_eval_closelm_iter280/cer_10:%CER 21.44 [ 1621 / 7562, 354 ins, 520 del, 747 sub ]
nnet_8m_6l/decode_eval_closelm_iter290/cer_10:%CER 20.40 [ 1543 / 7562, 323 ins, 492 del, 728 sub ]
nnet_8m_6l/decode_eval_closelm/cer_10:%CER 20.68 [ 1564 / 7562, 351 ins, 483 del, 730 sub ]
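Each line above is grep output over the best-scoring CER files: the character
error rate is total errors over total reference characters, broken down into
insertion, deletion and substitution counts. A summary of this form can be
regenerated with (assuming scoring has been run; executed from inside exp/):
  grep '%CER' */decode_eval*/cer_10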
# "queue.pl" uses qsub. The options to it are
# options to qsub. If you have GridEngine installed,
# change this to a queue you have access to.
# Otherwise, use "run.pl", which will run jobs locally
# (make sure your --num-jobs options are no more than
# the number of CPUs on your machine).
#a) JHU cluster options
#export train_cmd="queue.pl -l arch=*64*"
#export decode_cmd="queue.pl -l arch=*64* -l ram_free=4G,mem_free=4G"
#export cuda_cmd="..."
#export mkgraph_cmd="queue.pl -l arch=*64* ram_free=4G,mem_free=4G"
#b) BUT cluster options
#export train_cmd="queue.pl -q all.q@@blade -l ram_free=1200M,mem_free=1200M"
#export decode_cmd="queue.pl -q all.q@@blade -l ram_free=1700M,mem_free=1700M"
#export decodebig_cmd="queue.pl -q all.q@@blade -l ram_free=4G,mem_free=4G"
#export cuda_cmd="queue.pl -q long.q@@pco203 -l gpu=1"
#export cuda_cmd="queue.pl -q long.q@pcspeech-gpu"
#export mkgraph_cmd="queue.pl -q all.q@@servers -l ram_free=4G,mem_free=4G"
#c) run it locally...
export train_cmd=run.pl
export decode_cmd=run.pl
export cuda_cmd=run.pl
export mkgraph_cmd=run.pl
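# Usage reminder (a sketch, not part of the recipe): run.pl and queue.pl
# share the same interface - an optional JOB range, a log-file pattern and
# then the command to run, e.g.
#   run.pl JOB=1:4 exp/mono0a/log/demo.JOB.log echo running job JOB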
#!/bin/bash
#
if [ $# -ne 3 ]
then
echo "Usage: $0 <input-lang-dir> <output-lang-test-dir> <gzipped-arpa-lm>"
exit 1
fi
if [ -f path.sh ]; then . path.sh; fi
indir=$1
outdir=$2
silprob=0.5
mkdir -p ${outdir} data/train
arpa_lm=$3
[ ! -f $arpa_lm ] && echo No such file $arpa_lm && exit 1;
cp -r ${indir}/* ${outdir}
# grep -v '<s> <s>' etc. is only for future-proofing this script. Our
# LM doesn't have these "invalid combinations". These can cause
# determinization failures of CLG [ends up being epsilon cycles].
# Note: remove_oovs.pl takes a list of words in the LM that aren't in
# our word list. Since our LM doesn't have any, we just give it
# /dev/null [we leave it in the script to show how you'd do it].
gunzip -c "$arpa_lm" | \
grep -v '<s> <s>' | \
grep -v '</s> <s>' | \
grep -v '</s> </s>' | \
arpa2fst - | fstprint | \
utils/remove_oovs.pl /dev/null | \
utils/eps2disambig.pl | utils/s2eps.pl | fstcompile --isymbols=${outdir}/words.txt \
--osymbols=${outdir}/words.txt --keep_isymbols=false --keep_osymbols=false | \
fstrmepsilon > ${outdir}/G.fst
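# (eps2disambig.pl replaces the input-side epsilons that arpa2fst puts on
# back-off arcs with the #0 disambiguation symbol, and s2eps.pl maps the
# <s> and </s> symbols to epsilon, as needed before composition.)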
echo "Checking how stochastic G is (the first of these numbers should be small):"
fstisstochastic ${outdir}/G.fst
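# fstisstochastic prints two numbers, the minimum and maximum deviation from
# stochasticity over all states; both should be close to zero, meaning the
# probabilities leaving each state sum to about one.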
## Check lexicon.
## just have a look and make sure it seems sane.
echo "First few lines of lexicon FST:"
fstprint --isymbols=${indir}/phones.txt --osymbols=${indir}/words.txt ${indir}/L.fst | head
echo Performing further checks
# Checking that G.fst is determinizable.
fstdeterminize ${outdir}/G.fst /dev/null || echo Error determinizing G.
# Checking that L_disambig.fst is determinizable.
fstdeterminize ${outdir}/L_disambig.fst /dev/null || echo Error determinizing L.
# Checking that disambiguated lexicon times G is determinizable
# Note: we do this with fstdeterminizestar, not fstdeterminize, as
# fstdeterminize was taking forever (presumably related to a bug
# in this version of OpenFst that makes determinization slow in
# some cases).
fsttablecompose ${outdir}/L_disambig.fst ${outdir}/G.fst | \
fstdeterminizestar >/dev/null || echo Error determinizing LG.
# Checking that LG is stochastic:
fsttablecompose ${outdir}/L_disambig.fst ${outdir}/G.fst | \
fstisstochastic || echo LG is not stochastic
echo p1_format_data succeeded.
#!/bin/bash
# Apache2.0
# Prepared by Hong Kong University of Science and Technology (Author: Ricky Chan Ho Yin)
#
. cmd.sh
mkdir -p data data/train data/eval
### Data preparation - training and evaluation data. Please also refer to http://kaldi.sourceforge.net/data_prep.html
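# Note: prepare_lang.sh expects the dict directory (data/local/dict here) to
# contain lexicon.txt, silence_phones.txt, nonsilence_phones.txt and
# optional_silence.txt (and typically extra_questions.txt).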
utils/prepare_lang.sh data/local/dict "<UNK>" data/local/lang data/lang
utils/prepare_lang.sh data/local/dict.closelm "UNKNOWNGMM" data/local/lang.closelm data/lang.closelm
local/p1_format_data.sh data/lang data/lang_test data/local/lang/conv2_ears_16kwl.tg.gz
local/p1_format_data.sh data/lang.closelm data/lang_test_closelm data/local/lang/close_conv_ears_16kwl.tg.gz
### Feature extraction (training data)
mfccdir=mfcc
steps/make_mfcc.sh --nj 20 --cmd "$train_cmd" data/train exp/make_mfcc/train $mfccdir || exit 1;
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train $mfccdir
utils/fix_data_dir.sh data/train
### Feature extraction (evaluation data)
steps/make_mfcc.sh --cmd "$train_cmd" --nj 2 data/eval exp/make_mfcc/eval $mfccdir || exit 1;
utils/utt2spk_to_spk2utt.pl data/eval/utt2spk > data/eval/spk2utt
steps/compute_cmvn_stats.sh data/eval exp/make_mfcc/eval $mfccdir || exit 1;
utils/fix_data_dir.sh data/eval
### We start acoustic model training here, building up from an HMM-GMM system
### Monophone training
steps/train_mono.sh --nj 20 --cmd "$train_cmd" data/train data/lang exp/mono0a || exit 1;
steps/align_si.sh --nj 30 --cmd "$train_cmd" data/train data/lang exp/mono0a exp/mono0a_ali
### Triphone training
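# The two numeric arguments to train_deltas.sh are the number of
# context-dependent tree leaves (2500) and the total number of Gaussians (20000).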
steps/train_deltas.sh --cmd "$train_cmd" 2500 20000 data/train data/lang exp/mono0a_ali exp/tri1
steps/align_si.sh --nj 30 --cmd "$train_cmd" data/train data/lang exp/tri1 exp/tri1_ali || exit 1;
utils/mkgraph.sh data/lang_test exp/tri1 exp/tri1/graph
utils/mkgraph.sh data/lang_test_closelm exp/tri1 exp/tri1/graph_closelm
steps/decode.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri1/graph data/eval exp/tri1/decode_eval
steps/decode.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri1/graph_closelm data/eval exp/tri1/decode_eval_closelm
### Triphone training (with better alignments)
steps/train_deltas.sh --cmd "$train_cmd" 2500 20000 data/train data/lang exp/tri1_ali exp/tri2 || exit 1;
steps/align_si.sh --nj 30 --cmd "$train_cmd" data/train data/lang exp/tri2 exp/tri2_ali || exit 1;
utils/mkgraph.sh data/lang_test exp/tri2 exp/tri2/graph
utils/mkgraph.sh data/lang_test_closelm exp/tri2 exp/tri2/graph_closelm
steps/decode.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri2/graph data/eval exp/tri2/decode_eval
steps/decode.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri2/graph_closelm data/eval exp/tri2/decode_eval_closelm
### Training with LDA+MLLT feature-space transforms
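# The splice options below splice 3 frames of left and 3 frames of right
# context (7 frames in total) before estimating the LDA+MLLT transform.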
steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 20000 data/train data/lang exp/tri2_ali exp/tri3a || exit 1;
utils/mkgraph.sh data/lang_test exp/tri3a exp/tri3a/graph
utils/mkgraph.sh data/lang_test_closelm exp/tri3a exp/tri3a/graph_closelm
steps/decode.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri3a/graph data/eval exp/tri3a/decode_eval
steps/decode.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri3a/graph_closelm data/eval exp/tri3a/decode_eval_closelm
### SAT (speaker adaptive training)
steps/align_fmllr.sh --nj 30 --cmd "$train_cmd" data/train data/lang exp/tri3a exp/tri3a_ali || exit 1;
steps/train_sat.sh --cmd "$train_cmd" 4000 100000 data/train data/lang exp/tri3a_ali exp/tri4a || exit 1;
steps/train_sat.sh --cmd "$train_cmd" 2500 20000 data/train data/lang exp/tri3a_ali_100k exp/tri4a_20k || exit 1;
utils/mkgraph.sh data/lang_test exp/tri4a exp/tri4a/graph
utils/mkgraph.sh data/lang_test_closelm exp/tri4a exp/tri4a/graph_closelm
steps/decode_fmllr.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri4a/graph data/eval exp/tri4a/decode_eval
steps/decode_fmllr.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri4a/graph_closelm data/eval exp/tri4a/decode_eval_closelm
utils/mkgraph.sh data/lang_test exp/tri4a_20k exp/tri4a_20k/graph
utils/mkgraph.sh data/lang_test_closelm exp/tri4a_20k exp/tri4a_20k/graph_closelm
steps/decode_fmllr.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri4a_20k/graph data/eval exp/tri4a_20k/decode_eval
steps/decode_fmllr.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri4a_20k/graph_closelm data/eval exp/tri4a_20k/decode_eval_closelm
### SAT (speaker adaptive training on the 100k model, with better alignments)
steps/align_fmllr.sh --nj 30 --cmd "$train_cmd" data/train data/lang exp/tri4a exp/tri4a_ali_100k
steps/train_sat.sh --cmd "$train_cmd" 4000 100000 data/train data/lang exp/tri4a_ali_100k exp/tri5a || exit 1;
utils/mkgraph.sh data/lang_test exp/tri5a exp/tri5a/graph &
utils/mkgraph.sh data/lang_test_closelm exp/tri5a exp/tri5a/graph_closelm &
steps/decode_fmllr.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri5a/graph data/eval exp/tri5a/decode_eval &
steps/decode_fmllr.sh --nj 2 --cmd "$decode_cmd" --config conf/decode.config exp/tri5a/graph_closelm data/eval exp/tri5a/decode_eval_closelm &
### Discriminative training
## (feature-space MMI + boosted MMI)
steps/align_fmllr.sh --nj 25 --cmd "$train_cmd" data/train data/lang exp/tri5a exp/tri5a_ali_dt100k || exit 1;
steps/make_denlats.sh --nj 25 --cmd "$decode_cmd" --transform-dir exp/tri5a_ali_dt100k --config conf/decode.config --sub-split 25 data/train data/lang exp/tri5a exp/tri5a_denlats_dt100k || exit 1;
steps/train_diag_ubm.sh --silence-weight 0.5 --nj 25 --cmd "$train_cmd" 800 data/train data/lang exp/tri5a_ali_dt100k exp/tri5a_dubm_dt
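# --boost 0.1 below is the boosted-MMI boosting factor, which boosts the
# likelihood of denominator lattice paths in proportion to their frame error
# (Povey et al., "Boosted MMI", ICASSP 2008).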
steps/train_mmi_fmmi.sh --learning-rate 0.005 --boost 0.1 --cmd "$train_cmd" data/train data/lang exp/tri5a_ali_dt100k exp/tri5a_dubm_dt exp/tri5a_denlats_dt100k exp/tri5a_fmmi_b0.1 || exit 1;
for n in 1 2 3 4 5 6 7 8 ; do
steps/decode_fmmi.sh --nj 2 --cmd run.pl --iter $n --config conf/decode.config --transform-dir exp/tri5a/decode_eval exp/tri5a/graph data/eval exp/tri5a_fmmi_b0.1/decode_eval_iter${n} &
steps/decode_fmmi.sh --nj 2 --cmd run.pl --iter $n --config conf/decode.config --transform-dir exp/tri5a/decode_eval_closelm exp/tri5a/graph_closelm data/eval exp/tri5a_fmmi_b0.1/decode_eval_closelm_iter${n} &
done
## (boosted MMI only) (note: the lattices do not necessarily need to be regenerated in the two lines below, since exp/tri5a_ali_dt100k has already been generated)
steps/align_fmllr.sh --nj 40 --cmd "$train_cmd" data/train data/lang exp/tri5a exp/tri5a_ali_100k || exit 1;
steps/make_denlats.sh --nj 40 --cmd "$decode_cmd" --transform-dir exp/tri5a_ali_100k --config conf/decode.config --sub-split 40 data/train data/lang exp/tri5a exp/tri5a_denlats_100k || exit 1;
steps/train_mmi.sh --cmd "$decode_cmd" --boost 0.1 data/train data/lang exp/tri5a_{ali,denlats}_100k exp/tri5a_mmi_b0.1 || exit 1;
for n in 1 2 3 4; do
steps/decode.sh --nj 2 --iter $n --cmd "$decode_cmd" --config conf/decode.config --transform-dir exp/tri5a/decode_eval exp/tri5a/graph data/eval exp/tri5a_mmi_b0.1/decode_eval$n &
steps/decode.sh --nj 2 --iter $n --cmd "$decode_cmd" --config conf/decode.config --transform-dir exp/tri5a/decode_eval_closelm exp/tri5a/graph_closelm data/eval exp/tri5a_mmi_b0.1/decode_eval_closelm$n &
done
## SGMM (subspace Gaussian mixture model), excluding the "speaker-dependent weights"
steps/train_ubm.sh --silence-weight 0.5 --cmd "$train_cmd" 800 data/train data/lang exp/tri5a_ali_dt100k exp/ubm5a || exit 1;
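# The two numeric arguments to train_sgmm.sh are the number of tree leaves
# (4500) and the total number of substates (40000).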
steps/train_sgmm.sh --cmd "$train_cmd" 4500 40000 data/train data/lang exp/tri5a_ali_dt100k exp/ubm5a/final.ubm exp/sgmm_5a || exit 1;
utils/mkgraph.sh data/lang_test_closelm exp/sgmm_5a exp/sgmm_5a/graph_closelm
utils/mkgraph.sh data/lang_test exp/sgmm_5a exp/sgmm_5a/graph
steps/decode_sgmm.sh --nj 2 --cmd "$decode_cmd" --transform-dir exp/tri5a/decode_eval_closelm exp/sgmm_5a/graph_closelm data/eval exp/sgmm_5a/decode_eval_closelm
steps/decode_sgmm.sh --nj 2 --cmd "$decode_cmd" --transform-dir exp/tri5a/decode_eval exp/sgmm_5a/graph data/eval exp/sgmm_5a/decode_eval
# boosted MMI on SGMM
steps/align_sgmm.sh --nj 25 --cmd "$train_cmd" --transform-dir exp/tri5a_ali_dt100k --use-graphs true --use-gselect true data/train data/lang exp/sgmm_5a exp/sgmm_5a_ali
steps/make_denlats_sgmm.sh --nj 25 --sub-split 25 --cmd "$decode_cmd" --transform-dir exp/tri5a_ali_dt100k data/train data/lang exp/sgmm_5a_ali exp/sgmm_5a_denlats
steps/train_mmi_sgmm.sh --cmd "$decode_cmd" --transform-dir exp/tri5a_ali_dt100k --boost 0.1 data/train data/lang exp/sgmm_5a_ali exp/sgmm_5a_denlats exp/sgmm_5a_mmi_b0.1
for n in 1 2 3 4; do
steps/decode_sgmm_rescore.sh --cmd "$decode_cmd" --iter $n --transform-dir exp/tri5a/decode_eval_closelm data/lang_test_closelm data/eval exp/sgmm_5a/decode_eval_closelm exp/sgmm_5a_mmi_b0.1/decode_eval_closelm$n
steps/decode_sgmm_rescore.sh --cmd "$decode_cmd" --iter $n --transform-dir exp/tri5a/decode_eval data/lang_test data/eval exp/sgmm_5a/decode_eval exp/sgmm_5a_mmi_b0.1/decode_eval$n
done
### Neural Network (on top of LDA+MLLT+SAT model)
steps/train_nnet_cpu.sh --mix-up 8000 --initial-learning-rate 0.01 --final-learning-rate 0.001 --num-jobs-nnet 16 --num-hidden-layers 6 --num-parameters 8000000 --cmd "$decode_cmd" data/train data/lang exp/tri5a exp/nnet_8m_6l
# decoding with the final NN model
steps/decode_nnet_cpu.sh --cmd "$decode_cmd" --nj 2 --config conf/decode.config --transform-dir exp/tri5a/decode_eval exp/tri5a/graph data/eval exp/nnet_8m_6l/decode_eval
steps/decode_nnet_cpu.sh --cmd "$decode_cmd" --nj 2 --config conf/decode.config --transform-dir exp/tri5a/decode_eval_closelm exp/tri5a/graph_closelm data/eval exp/nnet_8m_6l/decode_eval_closelm
# decode intermediate iterations for a closer analysis; this shows why we average the parameters over the last ten iterations
for n in 290 280 270 260 250 240 230 220 210 200 150 100 50; do
steps/decode_nnet_cpu.sh --cmd "$decode_cmd" --nj 2 --iter $n --config conf/decode.config --transform-dir exp/tri5a/decode_eval exp/tri5a/graph data/eval exp/nnet_8m_6l/decode_eval_iter${n} &
steps/decode_nnet_cpu.sh --cmd "$decode_cmd" --nj 2 --iter $n --config conf/decode.config --transform-dir exp/tri5a/decode_eval_closelm exp/tri5a/graph_closelm data/eval exp/nnet_8m_6l/decode_eval_closelm_iter${n} &
done
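# A quick way to compare the per-iteration decodes above once scoring has
# finished (a sketch; run from the recipe root):
# grep '%CER' exp/nnet_8m_6l/decode_eval{,_closelm}_iter*/cer_10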
### Scoring ###
local/ext/score.sh data/eval exp/tri1/graph exp/tri1/decode_eval
local/ext/score.sh data/eval exp/tri1/graph_closelm exp/tri1/decode_eval_closelm
local/ext/score.sh data/eval exp/tri2/graph exp/tri2/decode_eval
local/ext/score.sh data/eval exp/tri2/graph_closelm exp/tri2/decode_eval_closelm
local/ext/score.sh data/eval exp/tri3a/graph exp/tri3a/decode_eval
local/ext/score.sh data/eval exp/tri3a/graph_closelm exp/tri3a/decode_eval_closelm
local/ext/score.sh data/eval exp/tri4a/graph exp/tri4a/decode_eval
local/ext/score.sh data/eval exp/tri4a/graph_closelm exp/tri4a/decode_eval_closelm
local/ext/score.sh data/eval exp/tri4a_20k/graph exp/tri4a_20k/decode_eval
local/ext/score.sh data/eval exp/tri4a_20k/graph_closelm exp/tri4a_20k/decode_eval_closelm
local/ext/score.sh data/eval exp/tri5a/graph exp/tri5a/decode_eval
local/ext/score.sh data/eval exp/tri5a/graph_closelm exp/tri5a/decode_eval_closelm
for n in 1 2 3 4 5 6 7 8; do local/ext/score.sh data/eval exp/tri5a/graph exp/tri5a_fmmi_b0.1/decode_eval_iter$n; done
for n in 1 2 3 4 5 6 7 8; do local/ext/score.sh data/eval exp/tri5a/graph_closelm exp/tri5a_fmmi_b0.1/decode_eval_closelm_iter$n; done
local/ext/score.sh data/eval exp/tri5a/graph exp/tri5a_mmi_b0.1/decode_eval
local/ext/score.sh data/eval exp/tri5a/graph_closelm exp/tri5a_mmi_b0.1/decode_eval_closelm
for n in 1 2 3 4; do local/ext/score.sh data/eval exp/tri5a/graph exp/tri5a_mmi_b0.1/decode_eval$n; done
for n in 1 2 3 4; do local/ext/score.sh data/eval exp/tri5a/graph_closelm exp/tri5a_mmi_b0.1/decode_eval_closelm$n; done
local/ext/score.sh data/eval data/lang_test exp/sgmm_5a/decode_eval;
local/ext/score.sh data/eval data/lang_test_closelm exp/sgmm_5a/decode_eval_closelm;
for n in 1 2 3 4; do
local/ext/score.sh data/eval data/lang_test exp/sgmm_5a_mmi_b0.1/decode_eval$n;
local/ext/score.sh data/eval data/lang_test_closelm exp/sgmm_5a_mmi_b0.1/decode_eval_closelm$n;
done
local/ext/score.sh data/eval exp/tri5a/graph exp/nnet_8m_6l/decode_eval
local/ext/score.sh data/eval exp/tri5a/graph_closelm exp/nnet_8m_6l/decode_eval_closelm
for n in 290 280 270 260 250 240 230 220 210 200 150 100 50; do
local/ext/score.sh data/eval exp/tri5a/graph exp/nnet_8m_6l/decode_eval_iter${n};
local/ext/score.sh data/eval exp/tri5a/graph_closelm exp/nnet_8m_6l/decode_eval_closelm_iter${n};
done
../../wsj/s5/steps/
../../wsj/s5/utils/