Commit 2a589477 authored by Dan Povey

Script fixes and additions, and minor fixes and README rearrangements.

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@114 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
parent 75883cff
About the Resource Management corpus:
Clean speech in a medium-vocabulary task consisting
of commands to a (presumably imaginary) computer system. About 3 hours
of training data.
Available from the LDC as catalog number LDC93S3A (it may be possible to
get the same data using combinations of other catalog numbers, but this
is the one we used).
Each subdirectory of this directory contains the
scripts for a sequence of experiments.
......
......@@ -20,7 +20,12 @@ exp/decode_tri1_regtree_fmllr/wer:Average WER is 3.789994 (475 / 12533) # + regr
exp/decode_tri2a/wer:Average WER is 3.973510 (498 / 12533) # Second triphone pass
exp/decode_tri2a_fmllr/wer:Average WER is 3.590521 (450 / 12533) # + fMLLR
exp/decode_tri2a_fmllr_utt/wer:Average WER is 3.933615 (493 / 12533) # [ fMLLR per utterance ]
exp/decode_tri2a_dfmllr/wer:Average WER is 3.861805 (484 / 12533) # + diagonal fMLLR
exp/decode_tri2a_dfmllr_utt/wer:Average WER is 3.933615 (493 / 12533) # [ diagonal fMLLR per utterance]
exp/decode_tri2a_dfmllr_fmllr/wer:Average WER is 3.622437 (454 / 12533) # diagonal fMLLR, then estimate fMLLR and re-decode
exp/decode_tri2b/wer:Average WER is 3.303279 (414 / 12533) # Exponential transform
exp/decode_tri2b_fmllr/wer:Average WER is 3.047953 (382 / 12533) # +fMLLR
exp/decode_tri2b_utt/wer:Average WER is 3.335195 (418 / 12533) # [adapt per-utt]
exp/decode_tri2c/wer:Average WER is 3.957552 (496 / 12533) # Cepstral mean subtraction (per-spk)
exp/decode_tri2d/wer:Average WER is 4.316604 (541 / 12533) # MLLT (= global STC)
......@@ -28,6 +33,8 @@ exp/decode_tri2e/wer:Average WER is 4.659698 (584 / 12533) # splice-9-frames + L
exp/decode_tri2f/wer:Average WER is 3.885742 (487 / 12533) # splice-9-frames + LDA + MLLT
exp/decode_tri2g/wer:Average WER is 3.303279 (414 / 12533) # Linear VTLN
exp/decode_tri2g_diag/wer:Average WER is 3.135722 (393 / 12533) # Linear VTLN; diagonal adapt in test
exp/decode_tri2g_diag_fmllr/wer:Average WER is 3.063911 (384 / 12533) # as above but then est. fMLLR (another decoding pass)
exp/decode_tri2g_diag_utt/wer:Average WER is 3.399027 (426 / 12533)
exp/decode_tri2g_vtln/wer:Average WER is 3.239448 (406 / 12533) # Use warp factors -> feature-level VTLN + offset estimation
exp/decode_tri2g_vtln_diag/wer:Average WER is 3.127743 (392 / 12533) # feature-level VTLN + diag fMLLR
exp/decode_tri2g_vtln_diag_utt/wer:Average WER is 3.407006 (427 / 12533) # as above, per utt.
......@@ -89,6 +96,8 @@ exp/decode_tri2l_utt/wer:Average WER is 5.066624 (635 / 12533) # [ as decode_tr
exp/decode_tri2m/wer:Average WER is 3.223490 (404 / 12533) # Splice + LDA + MLLT + Linear VTLN
exp/decode_tri2m_diag/wer:Average WER is 3.119764 (391 / 12533) # diagonal not offset CMLLR component
exp/decode_tri2m_diag_fmllr/wer:Average WER is 2.784649 (349 / 12533) # diagonal CMLLR component; then est. CMLLR and re-decode.
exp/decode_tri2m_diag_utt/wer:Average WER is 3.279343 (411 / 12533) # [per-utterance]
exp/decode_tri2m_vtln/wer:Average WER is 4.747467 (595 / 12533) # feature-level VTLN computation
exp/decode_tri2m_vtln_diag/wer:Average WER is 3.087848 (387 / 12533) # diagonal, not offset, adapt
exp/decode_tri2m_vtln_diag_utt/wer:Average WER is 4.340541 (544 / 12533) # per-utterance, diag adapt.
......
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test,
# per speaker. There is no SAT.
# To be run from ..
if [ -f path.sh ]; then . path.sh; fi
srcdir=exp/decode_tri2a
dir=exp/decode_tri2a_dfmllr
mkdir -p $dir
model=exp/tri2a/final.mdl
tree=exp/tri2a/tree
graphdir=exp/graph_tri2a
silphones=`cat data/silphones.csl`
mincount=500 # mincount before we estimate a transform.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
# Comment the two lines below to make this per-utterance.
# This would only work if $srcdir was also per-utterance [otherwise
# you'd have to mess with the script a bit].
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
gmm-est-fmllr --fmllr-update-type=diag --fmllr-min-count=$mincount $spk2utt_opt $model \
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
) &
done
wait
grep WER $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
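The final `grep | awk` step pools error and word counts over all test sets rather than averaging the per-set percentages. The sketch below reproduces that step on two made-up scoring files. The `%WER` line layout is an assumption inferred from the `$4` and `$6` fields the script reads, and the file names and counts are purely illustrative.

```shell
#!/bin/bash
# Made-up per-test-set scoring files; the "%WER ..." layout is an assumption.
dir=$(mktemp -d)
echo "%WER 5.00 [ 10 / 200, 2 ins, 3 del, 5 sub ]" > "$dir/wer_setA"
echo "%WER 1.67 [ 5 / 300, 1 ins, 1 del, 3 sub ]" > "$dir/wer_setB"

# grep glues "file:" onto the first field, so $4 is still the error count
# and $6 the word count (awk reads "200," as the number 200).
result=$(grep WER "$dir"/wer_* | \
  awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d)\n", 100.0*n/d, n, d); }')
echo "$result"
rm -rf "$dir"
```

With these inputs the pooled figure is 15 / 500 = 3.0%, whereas the plain mean of 5.00 and 1.67 would be about 3.33%, so larger test sets carry proportionally more weight.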
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test,
# per speaker. There is no SAT.
# To be run from ..
# Diagonal fMLLR followed by full fMLLR
if [ -f path.sh ]; then . path.sh; fi
srcdir=exp/decode_tri2a_dfmllr
dir=exp/decode_tri2a_dfmllr_fmllr
mkdir -p $dir
model=exp/tri2a/final.mdl
tree=exp/tri2a/tree
graphdir=exp/graph_tri2a
silphones=`cat data/silphones.csl`
mincount=500 # mincount before we estimate a transform.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
# Comment the two lines below to make this per-utterance.
# This would only work if $srcdir was also per-utterance [otherwise
# you'd have to mess with the script a bit].
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
gmm-est-fmllr --fmllr-update-type=full --fmllr-min-count=$mincount $spk2utt_opt $model \
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
) &
done
wait
grep WER $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
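These scripts decode each test set in a backgrounded subshell and then call a bare `wait`, which returns success no matter how the jobs ended; an `exit` inside `( ... ) &` only terminates that subshell. The following is a minimal sketch, not part of the commit, of how per-job failures could be collected. `run_job` is a hypothetical stand-in for the decoding pipeline, and the failure of `feb89` is simulated.

```shell
#!/bin/bash
# Hypothetical stand-in for one per-test-set decoding job.
run_job() {
  [ "$1" != "feb89" ]   # pretend the feb89 job fails
}

pids=""
for test in mar87 oct87 feb89; do
  ( run_job "$test" || exit 1 ) &   # exit 1 ends only this subshell
  pids="$pids $!"
done

# Waiting on each PID individually exposes that job's exit status.
failed=0
for pid in $pids; do
  wait "$pid" || failed=$((failed + 1))
done
echo "failed jobs: $failed"
```

A driver script could then refuse to write the aggregate `wer` file when `failed` is non-zero, instead of averaging over incomplete results.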
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# decode_tri_fmllr.sh is as decode_tri.sh but estimating fMLLR in test,
# per speaker. There is no SAT.
# To be run from ..
if [ -f path.sh ]; then . path.sh; fi
srcdir=exp/decode_tri2a
dir=exp/decode_tri2a_dfmllr_utt
mkdir -p $dir
model=exp/tri2a/final.mdl
tree=exp/tri2a/tree
graphdir=exp/graph_tri2a
silphones=`cat data/silphones.csl`
mincount=500 # mincount before we estimate a transform.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
# The two lines below are commented out to make this per-utterance.
# This would only work if $srcdir was also per-utterance [otherwise
# you'd have to mess with the script a bit].
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
sifeats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- |"
ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
weight-silence-post 0.01 $silphones $model ark:- ark:- | \
gmm-est-fmllr --fmllr-update-type=diag --fmllr-min-count=$mincount $spk2utt_opt $model \
"$sifeats" ark,o:- ark:$dir/${test}.fmllr 2>$dir/fmllr_${test}.log
feats="ark:add-deltas --print-args=false scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/${test}.fmllr ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.08333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra > $dir/wer_${test}
) &
done
wait
grep WER $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
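The per-utterance variants work because `$spk2utt_opt` and `$utt2spk_opt` are expanded unquoted: when the assignments are commented out, the variables expand to nothing and the option simply disappears from the command line. A small sketch of that shell behavior follows; `count_args` is a hypothetical helper that stands in for a Kaldi binary and just reports how many arguments it received.

```shell
#!/bin/bash
# count_args is a hypothetical stand-in for a Kaldi binary; it prints argc.
count_args() { echo $#; }

# Per-utterance case: the variable is unset, so it contributes no argument.
unset spk2utt_opt
per_utt=$(count_args --fmllr-min-count=500 $spk2utt_opt final.mdl)

# Per-speaker case: the variable holds one option, which becomes one argument.
spk2utt_opt=--spk2utt=ark:data/test_mar87.spk2utt
per_spk=$(count_args --fmllr-min-count=500 $spk2utt_opt final.mdl)

echo "per-utterance: $per_utt args, per-speaker: $per_spk args"
```

Quoting the expansion as `"$spk2utt_opt"` would instead pass an empty string as an extra argument in the per-utterance case, which is why the scripts leave it unquoted.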
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2b_fmllr
srcdir=exp/decode_tri2b
mkdir -p $dir
model=exp/tri2b/final.mdl
alignmodel=exp/tri2b/final.alimdl
et=exp/tri2b/final.et
defaultmat=exp/tri2b/default.mat
tree=exp/tri2b/tree
graphdir=exp/graph_tri2b
silphones=`cat data/silphones.csl`
mincount=300
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$srcdir/et_${test}.trans ark:- ark:- |"
( ali-to-post ark:$srcdir/test_${test}.ali ark:- | \
weight-silence-post 0.0 $silphones $model ark:- ark:- | \
gmm-est-fmllr $spk2utt_opt --fmllr-min-count=$mincount \
$model "$feats" ark:- ark,t:$dir/fmllr_${test}.trans ) \
2>$dir/fmllr_${test}.log
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$srcdir/et_${test}.trans ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/fmllr_${test}.trans ark:- ark:- |"
# Do final decoding...
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
grep WER $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2g_diag_fmllr
mkdir -p $dir
model=exp/tri2g/final.mdl
alignmodel=exp/tri2g/final.alimdl
lvtln=exp/tri2g/final.lvtln
tree=exp/tri2g/tree
graphdir=exp/graph_tri2g
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre1.tra ark,t:$dir/test_${test}_pre1.ali 2> $dir/predecode1_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre1.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --norm-type=diag --verbose=1 $spk2utt_opt $model $lvtln \
"$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}_pre2.tra ark,t:$dir/test_${test}_pre2.ali 2> $dir/predecode2_${test}.log
( ali-to-post ark:$dir/test_${test}_pre2.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-est-fmllr $spk2utt_opt $model "$feats" \
ark:- ark:$dir/fmllr_${test}.trans ) \
2>$dir/fmllr_${test}.log || exit 1;
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/fmllr_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
grep WER $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2g_diag_utt
mkdir -p $dir
model=exp/tri2g/final.mdl
alignmodel=exp/tri2g/final.alimdl
lvtln=exp/tri2g/final.lvtln
tree=exp/tri2g/tree
graphdir=exp/graph_tri2g
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:add-deltas scp:data/test_${test}.scp ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# The two lines below are commented out to make this per-utterance.
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --norm-type=diag --verbose=1 $spk2utt_opt $model $lvtln \
"$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
feats="ark:add-deltas scp:data/test_${test}.scp ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
grep WER $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2m_diag_fmllr
mkdir -p $dir
model=exp/tri2m/final.mdl
alignmodel=exp/tri2m/final.alimdl
lvtln=exp/tri2m/final.lvtln
mat=exp/tri2f/final.mat
tree=exp/tri2m/tree
graphdir=exp/graph_tri2m
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $mat ark:- ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre1.tra ark,t:$dir/test_${test}_pre1.ali 2> $dir/predecode1_${test}.log
# Comment the two lines below to make this per-utterance.
spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre1.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --norm-type=diag --verbose=1 $spk2utt_opt $model $lvtln \
"$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $mat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}_pre2.tra ark,t:$dir/test_${test}_pre2.ali 2> $dir/predecode2_${test}.log
( ali-to-post ark:$dir/test_${test}_pre2.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-est-fmllr $spk2utt_opt $model "$feats" \
ark:- ark:$dir/fmllr_${test}.trans ) \
2>$dir/fmllr_${test}.log || exit 1;
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $mat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/fmllr_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
grep WER $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
# Copyright 2010-2011 Microsoft Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# to be run from ..
if [ -f path.sh ]; then . path.sh; fi
dir=exp/decode_tri2m_diag_utt
mkdir -p $dir
model=exp/tri2m/final.mdl
alignmodel=exp/tri2m/final.alimdl
mat=exp/tri2f/final.mat
lvtln=exp/tri2m/final.lvtln
tree=exp/tri2m/tree
graphdir=exp/graph_tri2m
silphones=`cat data/silphones.csl`
# already made the graph.
scripts/mkgraph.sh $tree $model $graphdir
for test in mar87 oct87 feb89 oct89 feb91 sep92; do
(
sifeats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $mat ark:- ark:- |"
# First do SI decoding with alignment model.
# Use smaller beam for this, as less critical.
gmm-decode-faster --beam=15.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $alignmodel $graphdir/HCLG.fst "$sifeats" ark,t:$dir/test_${test}_pre.tra ark,t:$dir/test_${test}_pre.ali 2> $dir/predecode_${test}.log
# The two lines below are commented out to make this per-utterance.
#spk2utt_opt=--spk2utt=ark:data/test_${test}.spk2utt
#utt2spk_opt=--utt2spk=ark:data/test_${test}.utt2spk
( ali-to-post ark:$dir/test_${test}_pre.ali ark:- | \
weight-silence-post 0.0 $silphones $alignmodel ark:- ark:- | \
gmm-post-to-gpost $alignmodel "$sifeats" ark:- ark:- | \
gmm-est-lvtln-trans --norm-type=diag --verbose=1 $spk2utt_opt $model $lvtln \
"$sifeats" ark:- ark:$dir/lvtln_${test}.trans ark,t:$dir/lvtln_${test}.warp ) \
2>$dir/lvtln_${test}.log || exit 1;
feats="ark:splice-feats scp:data/test_${test}.scp ark:- | transform-feats $mat ark:- ark:- | transform-feats $utt2spk_opt ark:$dir/lvtln_${test}.trans ark:- ark:- |"
gmm-decode-faster --beam=20.0 --acoustic-scale=0.083333 --word-symbol-table=data/words.txt $model $graphdir/HCLG.fst "$feats" ark,t:$dir/test_${test}.tra ark,t:$dir/test_${test}.ali 2> $dir/decode_${test}.log
# the ,p option lets it score partial output without dying..
scripts/sym2int.pl --ignore-first-field data/words.txt data_prep/test_${test}_trans.txt | \
compute-wer --mode=present ark:- ark,p:$dir/test_${test}.tra >& $dir/wer_${test}
) &
done
wait
grep WER $dir/wer_* | \
awk '{n=n+$4; d=d+$6} END{ printf("Average WER is %f (%d / %d) \n", 100.0*n/d, n, d); }' \
> $dir/wer
About the Wall Street Journal corpus:
This is a corpus of read
sentences from the Wall Street Journal, recorded under clean conditions.
The vocabulary is quite large. About 80 hours of training data.
Available from the LDC as either: [ catalog numbers LDC93S6A (WSJ0) and LDC94S13A (WSJ1) ]
or: [ catalog numbers LDC93S6B (WSJ0) and LDC94S13B (WSJ1) ]
The latter option is cheaper and includes only the Sennheiser
microphone data (which is all we use in the example scripts).
Each subdirectory of this directory contains the
scripts for a sequence of experiments.
......