Commit b02ad40b authored by Dan Povey's avatar Dan Povey

trunk: merging sandbox/pawel to add the AMI recipe.

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@4276 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
parents 519109f4 f72d2a7d
About the AMI corpus:
WEB: http://groups.inf.ed.ac.uk/ami/corpus/
LICENCE: http://groups.inf.ed.ac.uk/ami/corpus/license.shtml
"The AMI Meeting Corpus consists of 100 hours of meeting recordings. The recordings use a range of signals synchronized to a common timeline. These include close-talking and far-field microphones, individual and room-view video cameras, and output from a slide projector and an electronic whiteboard. During the meetings, the participants also have unsynchronized pens available to them that record what is written. The meetings were recorded in English using three different rooms with different acoustic properties, and include mostly non-native speakers." See http://groups.inf.ed.ac.uk/ami/corpus/overview.shtml for more details.
About the recipe:
s5)
The scripts under this directory build systems using AMI data only; this includes the training, development and evaluation sets (following the Full ASR split on http://groups.inf.ed.ac.uk/ami/corpus/datasets.shtml). This is different from the RT evaluation campaigns, which usually combined several meeting datasets from multiple sources. In general, the recipe reproduces the baseline systems built in [1], but without proprietary components*; that means we use CMUDict [2] and in the future will try to use open texts to estimate the background language model.
Currently, one can build the systems for the close-talking scenario, which we refer to as
-- IHM (Individual Headset Microphones)
and two variants of distant speech:
-- SDM (Single Distant Microphone), using the first microphone of the array, and
-- MDM (Multiple Distant Microphones), where the microphones are combined using the BeamformIt [3] toolkit.
To run all sub-recipes, the following (non-standard) software is expected to be installed (a quick check sketch follows this list):
1) SRILM - to build language models (look at KALDI_ROOT/tools/install_srilm.sh)
2) BeamformIt (for MDM scenario, installed with Kaldi tools)
3) Java (optional, but if available will be used to extract transcripts from XML)
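Before running, it may help to check that these tools are actually visible. A minimal sketch, assuming the usual binary names (ngram-count for SRILM, BeamformIt for the beamformer, java for Java) and that path.sh puts them on your PATH:
#!/bin/bash
# Warn about any of the tools listed above that cannot be found (binary names assumed).
[ -f ./path.sh ] && . ./path.sh
for tool in ngram-count BeamformIt java; do
  command -v "$tool" >/dev/null 2>&1 || echo "WARNING: $tool not found in PATH"
done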
[1] "Hybrid acoustic models for distant and multichannel large vocabulary speech recognition", Pawel Swietojanski, Arnab Ghoshal and Steve Renals, In Proc. ASRU, December 2013
[2] http://www.speech.cs.cmu.edu/cgi-bin/cmudict
[3] "Acoustic beamforming for speaker diarization of meetings", Xavier Anguera, Chuck Wooters and Javier Hernando, IEEE Transactions on Audio, Speech and Language Processing, September 2007, volume 15, number 7, pp.2011-2023.
*) There is still an optional dependency on the Fisher transcripts (LDC2004T19, LDC2005T19) to build the background language model and closely reproduce [1].
# IHM (Individual Headset Microphones) results
dev
exp/ihm/tri2a/decode_dev_ami_fsh.o3g.kn.pr1-7/ascore_13/dev.ctm.filt.dtl:Percent Total Error = 38.0% (35925)
exp/ihm/tri3a/decode_dev_ami_fsh.o3g.kn.pr1-7/ascore_14/dev.ctm.filt.dtl:Percent Total Error = 35.3% (33329)
exp/ihm/tri4a/decode_dev_ami_fsh.o3g.kn.pr1-7/ascore_13/dev.ctm.filt.dtl:Percent Total Error = 32.1% (30364)
exp/ihm/tri4a_mmi_b0.1/decode_dev_3.mdl_ami_fsh.o3g.kn.pr1-7/ascore_12/dev.ctm.filt.dtl:Percent Total Error = 29.9% (28220)
eval
exp/ihm/tri2a/decode_eval_ami_fsh.o3g.kn.pr1-7/ascore_13/eval.ctm.filt.dtl:Percent Total Error = 43.7% (39330)
exp/ihm/tri3a/decode_eval_ami_fsh.o3g.kn.pr1-7/ascore_14/eval.ctm.filt.dtl:Percent Total Error = 40.4% (36385)
exp/ihm/tri4a/decode_eval_ami_fsh.o3g.kn.pr1-7/ascore_13/eval_o4.ctm.filt.dtl:Percent Total Error = 35.0% (31463)
exp/ihm/tri4a_mmi_b0.1/decode_eval_3.mdl_ami_fsh.o3g.kn.pr1-7/ascore_12/eval_o4.ctm.filt.dtl:Percent Total Error = 31.7% (28518)
# MDM (Multiple Distant Microphones): beamforming of 8 microphones, WER scores with up to 4 overlapping speakers
dev
exp/mdm8/tri2a/decode_dev_ami_fsh.o3g.kn.pr1-7/ascore_13/dev_o4.ctm.filt.dtl:Percent Total Error = 58.8% (55568)
exp/mdm8/tri3a/decode_dev_ami_fsh.o3g.kn.pr1-7/ascore_13/dev_o4.ctm.filt.dtl:Percent Total Error = 57.0% (53855)
exp/mdm8/tri3a_mmi_b0.1/decode_dev_3.mdl_ami_fsh.o3g.kn.pr1-7/ascore_10/dev_o4.ctm.filt.dtl:Percent Total Error = 54.9% (51926)
eval
exp/mdm8/tri2a/decode_eval_ami_fsh.o3g.kn.pr1-7/ascore_13/eval_o4.ctm.filt.dtl:Percent Total Error = 64.4% (57916)
exp/mdm8/tri3a/decode_eval_ami_fsh.o3g.kn.pr1-7/ascore_13/eval_o4.ctm.filt.dtl:Percent Total Error = 61.9% (55738)
exp/mdm8/tri3a_mmi_b0.1/decode_eval_3.mdl_ami_fsh.o3g.kn.pr1-7/ascore_10/eval_o4.ctm.filt.dtl:Percent Total Error = 59.3% (53370)
# SDM (Single Distant Microphone, 1st array mic): WER scores with up to 4 overlapping speakers
dev
exp/sdm1/tri2a/decode_dev_ami_fsh.o3g.kn.pr1-7/ascore_13/dev_o4.ctm.filt.dtl:Percent Total Error = 66.9% (63190)
exp/sdm1/tri3a/decode_dev_ami_fsh.o3g.kn.pr1-7/ascore_13/dev_o4.ctm.filt.dtl:Percent Total Error = 64.5% (60963)
exp/sdm1/tri3a_mmi_b0.1/decode_dev_3.mdl_ami_fsh.o3g.kn.pr1-7/ascore_10/dev_o4.ctm.filt.dtl:Percent Total Error = 62.2% (58772)
eval
exp/sdm1/tri2a/decode_eval_ami_fsh.o3g.kn.pr1-7/ascore_13/eval_o4.ctm.filt.dtl:Percent Total Error = 71.8% (64577)
exp/sdm1/tri3a/decode_eval_ami_fsh.o3g.kn.pr1-7/ascore_12/eval_o4.ctm.filt.dtl:Percent Total Error = 69.5% (62576)
exp/sdm1/tri3a_mmi_b0.1/decode_eval_3.mdl_ami_fsh.o3g.kn.pr1-7/ascore_10/eval_o4.ctm.filt.dtl:Percent Total Error = 67.2% (60447)
# "queue.pl" uses qsub. The options to it are
# options to qsub. If you have GridEngine installed,
# change this to a queue you have access to.
# Otherwise, use "run.pl", which will run jobs locally
# (make sure your --num-jobs options are no more than
# the number of cpus on your machine.
# On Eddie use:
#export train_cmd="queue.pl -P inf_hcrc_cstr_nst -l h_rt=08:00:00"
#export decode_cmd="queue.pl -P inf_hcrc_cstr_nst -l h_rt=05:00:00 -pe memory-2G 4"
#export highmem_cmd="queue.pl -P inf_hcrc_cstr_nst -l h_rt=05:00:00 -pe memory-2G 4"
#export scoring_cmd="queue.pl -P inf_hcrc_cstr_nst -l h_rt=00:20:00"
# To run locally, use:
export train_cmd=run.pl
export decode_cmd=run.pl
export highmem_cmd=run.pl
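# For reference, a minimal sketch of how these variables are consumed by the
# recipe scripts (directory and log names below are illustrative only):
#   . ./cmd.sh
#   $train_cmd JOB=1:2 exp/demo/log/hello.JOB.log echo "hello from job JOB"
# run.pl/queue.pl substitute JOB with 1..2 and write each job's stdout/stderr
# to the corresponding log file.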
#BeamformIt sample configuration file for AMI data (http://groups.inf.ed.ac.uk/ami/download/)
# scrolling size to compute the delays
scroll_size = 250
# cross correlation computation window size
window_size = 500
#maximum number of cross-correlation peaks (N-best) taken into account
nbest_amount = 4
#flag whether to apply an automatic noise thresholding
do_noise_threshold = 1
#Percentage of frames with lower xcorr taken as noisy
noise_percent = 10
######## acoustic modelling parameters
#transition probabilities weight for multichannel decoding
trans_weight_multi = 25
trans_weight_nbest = 25
###
#flag whether to print the features after setting them, or not
print_features = 1
#flag whether to use the bad frames in the sum process
do_avoid_bad_frames = 1
#flag to use the best channel (SNR) as a reference
#defined from command line
do_compute_reference = 1
#flag whether to use a uem file or not (otherwise process the whole file)
do_use_uem_file = 0
#flag whether to use an adaptive weights scheme or fixed weights
do_adapt_weights = 1
#flag whether to output the sph files or just run the system to create the auxiliary files
do_write_sph_files = 1
####directories where to store/retrieve info####
#channels_file = ./cfg-files/channels
#show needs to be passed as argument normally, here a default one is given just in case
#show_id = Ttmp
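# For reference, each line of the channels file (built by the ami_beamform.sh
# script further below) holds a meeting ID followed by the selected wav files,
# e.g. with 4 out of 8 microphones (meeting ID illustrative):
# ES2002a ES2002a/audio/ES2002a.Array1-01.wav ES2002a/audio/ES2002a.Array1-03.wav ES2002a/audio/ES2002a.Array1-05.wav ES2002a/audio/ES2002a.Array1-07.wav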
beam=11.0 # beam for decoding. Was 13.0 in the scripts.
first_beam=8.0 # beam for 1st-pass decoding in SAT.
--window-type=hamming # use a standard Hamming window instead of Kaldi's default "povey" window
--use-energy=false # only fbank outputs
--sample-frequency=16000 # AMI is sampled at 16kHz
#--low-freq=64 # typical setup from Frantisek Grezl
#--high-freq=3800
--dither=1
--num-mel-bins=40 # use 40 mel bins (AMI audio is 16kHz)
--htk-compat=true # try to make it compatible with HTK
--use-energy=false # only non-default option.
--sample-frequency=16000
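# For reference (a sketch; directory names are illustrative), these .conf files
# are typically passed to the Kaldi feature extraction wrappers as:
#   steps/make_mfcc.sh --nj 16 --cmd "$train_cmd" --mfcc-config conf/mfcc.conf \
#     data/ihm/train exp/make_mfcc/ihm/train mfcc
#   steps/make_fbank.sh --nj 16 --cmd "$train_cmd" --fbank-config conf/fbank.conf \
#     data/ihm/train exp/make_fbank/ihm/train fbank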
#!/bin/bash
#Copyright 2014, University of Edinburgh (Author: Pawel Swietojanski)
#Apache 2.0
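# This script builds the BeamformIt channel files for the AMI array recordings
# and runs the beamforming in parallel, taking <num-mics> <ami-dir> <wav-out-dir>
# as arguments. Example invocation (paths illustrative):
#   ami_beamform.sh --nj 16 8 /foo/amicorpus /foo/amicorpus_beamformed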
wiener_filtering=false
nj=4
cmd=run.pl
# End configuration section
echo "$0 $@" # Print the command line for logging
[ -f ./path.sh ] && . ./path.sh; # source the path.
. parse_options.sh || exit 1;
if [ $# != 3 ]; then
echo "Wrong #arguments ($#, expected 4)"
echo "Usage: steps/ami_beamform.sh [options] <num-mics> <ami-dir> <wav-out-dir>"
echo "main options (for others, see top of script file)"
echo " --nj <nj> # number of parallel jobs"
echo " --cmd <cmd> # Command to run in parallel with"
echo " --wiener-filtering <true/false> # Cancel noise with Wiener filter prior to beamforming"
exit 1;
fi
numch=$1
sdir=$2
odir=$3
wdir=data/local/beamforming
mkdir -p $odir
mkdir -p $wdir/log
meetings=$wdir/meetings.list
cat local/split_train.orig local/split_dev.orig local/split_eval.orig | sort > $meetings
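# pick every (8/numch)-th microphone of the 8-channel array,
# e.g. numch=2 -> channels "1 5", numch=4 -> "1 3 5 7", numch=8 -> all of them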
ch_inc=$((8/$numch))
bmf=
for ch in `seq 1 $ch_inc 8`; do
bmf="$bmf $ch"
done
echo "Will use the following channels: $bmf"
#make the channel file
if [ -f $wdir/channels_$numch ]; then
rm $wdir/channels_$numch
fi
touch $wdir/channels_$numch
while read line;
do
channels="$line "
for ch in $bmf; do
channels="$channels $line/audio/$line.Array1-0$ch.wav"
done
echo $channels >> $wdir/channels_$numch
done < $meetings
#do noise cancellation
if [ $wiener_filtering == "true" ]; then
echo "Wiener filtering not yet implemented."
exit 1;
fi
#do beamforming
echo -e "Beamforming\n"
$cmd JOB=1:$nj $wdir/log/beamform.JOB.log \
local/beamformit.sh $nj JOB $numch $meetings $sdir $odir
#!/bin/bash
# Copyright 2014, University of Edinburgh (Author: Pawel Swietojanski, Jonathan Kilgour)
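# Downloads the AMI annotations plus the audio for the requested microphone
# condition (ihm, sdm or mdm). Example invocation (download directory illustrative):
#   ami_download.sh mdm /foo/amicorpus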
if [ $# -ne 2 ]; then
echo "Usage: $0 <mic> <ami-dir>"
echo " where <mic> is either ihm, sdm or mdm and <ami-dir> is download space."
exit 1;
fi
mic=$1
adir=$2
amiurl=http://groups.inf.ed.ac.uk/ami
annotver=ami_public_manual_1.6.1
wdir=data/local/downloads
if [[ ! "$mic" =~ ^(ihm|sdm|mdm)$ ]]; then
echo "$0. Wrong <mic> option."
exit 1;
fi
mics="1 2 3 4 5 6 7 8"
if [ "$mic" == "sdm" ]; then
mics=1
fi
mkdir -p $adir
mkdir -p $wdir/log
#download annotations
annot="$adir/$annotver"
if [[ ! -d $adir/annotations || ! -f "$annot.zip" ]]; then
echo "Downloading annotiations..."
wget -nv -O $annot.zip $amiurl/AMICorpusAnnotations/$annotver.zip &> $wdir/log/download_ami_annot.log
mkdir -p $adir/annotations
unzip -o -d $adir/annotations $annot.zip &> /dev/null
fi
[ ! -f "$adir/annotations/AMI-metadata.xml" ] && echo "$0: File AMI-Metadata.xml not found under $adir/annotations." && exit 1;
#download waves
cat local/split_train.orig local/split_eval.orig local/split_dev.orig > $wdir/ami_meet_ids.flist
wgetfile=$wdir/wget_$mic.sh
manifest="wget -O $adir/MANIFEST.TXT http://groups.inf.ed.ac.uk/ami/download/temp/amiBuild-04237-Sun-Jun-15-2014.manifest.txt"
license="wget -O $adir/LICENCE.TXT http://groups.inf.ed.ac.uk/ami/download/temp/Creative-Commons-Attribution-NonCommercial-ShareAlike-2.5.txt"
echo "#!/bin/bash" > $wgetfile
echo $manifest >> $wgetfile
echo $license >> $wgetfile
while read line; do
if [ "$mic" == "ihm" ]; then
extra_headset= #some meetings have 5 speakers (headsets)
for mtg in EN2001a EN2001d EN2001e; do
[ "$mtg" == "$line" ] && extra_headset=4;
done
for m in 0 1 2 3 $extra_headset; do
echo "wget -nv -c -P $adir/$line/audio $amiurl/AMICorpusMirror/amicorpus/$line/audio/$line.Headset-$m.wav" >> $wgetfile
done
else
for m in $mics; do
echo "wget -nv -c -P $adir/$line/audio $amiurl/AMICorpusMirror/amicorpus/$line/audio/$line.Array1-0$m.wav" >> $wgetfile
done
fi
done < $wdir/ami_meet_ids.flist
chmod +x $wgetfile
echo "Downloading audio files for $mic scenario."
echo "Look at $wdir/log/download_ami_$mic.log for progress"
$wgetfile &> $wdir/log/download_ami_$mic.log
#do a rough check whether the number of wavs is as expected; the data prep stage will fail anyway if it is not
if [ "$mic" == "ihm" ]; then
num_files=`find $adir -iname '*Headset*' | wc -l`
if [ $num_files -ne 687 ]; then
echo "Warning: Found $num_files headset wavs but expected 687. Check $wdir/log/download_ami_$mic.log for details."
exit 1;
fi
else
num_files=`find $adir -iname '*Array1*' | wc -l`
if [[ $num_files -lt 1352 && "$mic" == "mdm" ]]; then
echo "Warning: Found $num_files distant Array1 waves but expected 1352 for mdm. Check $wdir/log/download_ami_$mic.log for details."
exit 1;
elif [[ $num_files -lt 169 && "$mic" == "sdm" ]]; then
echo "Warning: Found $num_files distant Array1 waves but expected 169 for sdm. Check $wdir/log/download_ami_$mic.log for details."
exit 1;
fi
fi
echo "Downloads of AMI corpus completed succesfully. License can be found under $adir/LICENCE.TXT"
exit 0;
#!/bin/bash
#
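# Formats an ARPA language model into data/lang_test/G.fst, starting from an
# existing data/lang directory (which is copied to data/lang_test).
# Example invocation (LM path illustrative):
#   ami_format_data.sh data/local/lm/ami_fsh.o3g.kn.pr1-7.gz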
if [ -f path.sh ]; then . path.sh; fi
if [ $# -ne 1 ]; then
echo "Usage: $0 <arpa-lm>"
exit 1;
fi
silprob=0.5
arpa_lm=$1
[ ! -f $arpa_lm ] && echo No such file $arpa_lm && exit 1;
cp -r data/lang data/lang_test
# grep -v '<s> <s>' etc. is only for future-proofing this script. Our
# LM doesn't have these "invalid combinations". These can cause
# determinization failures of CLG [ends up being epsilon cycles].
# Note: remove_oovs.pl takes a list of words in the LM that aren't in
# our word list. Since our LM doesn't have any, we just give it
# /dev/null [we leave it in the script to show how you'd do it].
gunzip -c "$arpa_lm" | \
grep -v '<s> <s>' | \
grep -v '</s> <s>' | \
grep -v '</s> </s>' | \
arpa2fst - | fstprint | \
utils/remove_oovs.pl /dev/null | \
utils/eps2disambig.pl | utils/s2eps.pl | fstcompile --isymbols=data/lang_test/words.txt \
--osymbols=data/lang_test/words.txt --keep_isymbols=false --keep_osymbols=false | \
fstrmepsilon > data/lang_test/G.fst
echo "Checking how stochastic G is (the first of these numbers should be small):"
fstisstochastic data/lang_test/G.fst
## Check lexicon.
## just have a look and make sure it seems sane.
echo "First few lines of lexicon FST:"
fstprint --isymbols=data/lang/phones.txt --osymbols=data/lang/words.txt data/lang/L.fst | head
echo Performing further checks
# Checking that G.fst is determinizable.
fstdeterminize data/lang_test/G.fst /dev/null || echo Error determinizing G.
# Checking that L_disambig.fst is determinizable.
fstdeterminize data/lang_test/L_disambig.fst /dev/null || echo Error determinizing L.
# Checking that disambiguated lexicon times G is determinizable.
# Note: we do this with fstdeterminizestar, not fstdeterminize, as
# fstdeterminize was taking forever (presumably related to a bug
# in this version of OpenFst that makes determinization slow for
# some cases).
fsttablecompose data/lang_test/L_disambig.fst data/lang_test/G.fst | \
fstdeterminizestar >/dev/null || echo Error
# Checking that LG is stochastic:
fsttablecompose data/lang/L_disambig.fst data/lang_test/G.fst | \
fstisstochastic || echo LG is not stochastic
echo AMI_format_data succeeded.
#!/bin/bash
# Copyright 2014, University of Edinburgh (Author: Pawel Swietojanski)
# AMI Corpus training data preparation
# Apache 2.0
# To be run from one directory above this script.
. path.sh
#check existing directories
if [ $# != 1 ]; then
echo "Usage: ami_ihm_data_prep.sh /path/to/AMI"
exit 1;
fi
AMI_DIR=$1
SEGS=data/local/annotations/train.txt
dir=data/local/ihm/train
mkdir -p $dir
# Audio data directory check
if [ ! -d $AMI_DIR ]; then
echo "Error: $AMI_DIR directory does not exists."
exit 1;
fi
# And transcripts check
if [ ! -f $SEGS ]; then
echo "Error: File $SEGS no found (run ami_text_prep.sh)."
exit 1;
fi
# find headset wav audio files only
find $AMI_DIR -iname '*.Headset-*.wav' | sort > $dir/wav.flist
n=`cat $dir/wav.flist | wc -l`
echo "In total, $n headset files were found."
[ $n -ne 687 ] && \
echo "Warning: expected 687 (168 mtgs x 4 mics + 3 mtgs x 5 mics) data files, found $n"
# (1a) Transcriptions preparation
# here we start with normalised transcriptions; the utt ids follow the convention
# AMI_MEETING_CHAN_SPK_STIME_ETIME
# AMI_ES2011a_H00_FEE041_0003415_0003484
# we use uniq as some (rare) entries are doubled in transcripts
awk '{meeting=$1; channel=$2; speaker=$3; stime=$4; etime=$5;
printf("AMI_%s_%s_%s_%07.0f_%07.0f", meeting, channel, speaker, int(100*stime+0.5), int(100*etime+0.5));
for(i=6;i<=NF;i++) printf(" %s", $i); printf "\n"}' $SEGS | sort | uniq > $dir/text
# (1b) Make segment files from transcript
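# segments file format is: utt-id reco-id start-time end-time, e.g. the example
# utterance above becomes:
# AMI_ES2011a_H00_FEE041_0003415_0003484 AMI_ES2011a_H00 34.15 34.84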
awk '{
segment=$1;
split(segment,S,"[_]");
audioname=S[1]"_"S[2]"_"S[3]; startf=S[5]; endf=S[6];
print segment " " audioname " " startf*10/1000 " " endf*10/1000 " "
}' < $dir/text > $dir/segments
# (1c) Make wav.scp file.
sed -e 's?.*/??' -e 's?.wav??' $dir/wav.flist | \
perl -ne 'split; $_ =~ m/(.*)\..*\-([0-9])/; print "AMI_$1_H0$2\n"' | \
paste - $dir/wav.flist > $dir/wav1.scp
#Keep only the training part of the waves
awk '{print $2}' $dir/segments | sort -u | join - $dir/wav1.scp > $dir/wav2.scp
#replace the path with an appropriate sox command that selects a single channel only
awk '{print $1" sox -c 1 -t wavpcm -s "$2" -t wavpcm - |"}' $dir/wav2.scp > $dir/wav.scp
# (1d) reco2file_and_channel
cat $dir/wav.scp \
| perl -ane '$_ =~ m:^(\S+)(H0[0-4])\s+.*\/([IETB].*)\.wav.*$: || die "bad label $_";
print "$1$2 $3 A\n"; ' > $dir/reco2file_and_channel || exit 1;
awk '{print $1}' $dir/segments | \
perl -ane '$_ =~ m:^(\S+)([FM][A-Z]{0,2}[0-9]{3}[A-Z]*)(\S+)$: || die "bad label $_";
print "$1$2$3 $1$2\n";' > $dir/utt2spk || exit 1;
sort -k 2 $dir/utt2spk | utils/utt2spk_to_spk2utt.pl > $dir/spk2utt || exit 1;
# Copy stuff into its final location
mkdir -p data/ihm/train
for f in spk2utt utt2spk wav.scp text segments reco2file_and_channel; do
cp $dir/$f data/ihm/train/$f || exit 1;
done
utils/validate_data_dir.sh --no-feats data/ihm/train || exit 1;
echo AMI IHM data preparation succeeded.
#!/bin/bash
# Copyright 2014, University of Edinburgh (Author: Pawel Swietojanski)
# AMI Corpus dev/eval data preparation
. path.sh
#check existing directories
if [ $# != 2 ]; then
echo "Usage: ami_*_scoring_data_prep_edin.sh /path/to/AMI set-name"
exit 1;
fi
AMI_DIR=$1
SET=$2
SEGS=data/local/annotations/$SET.txt
dir=data/local/ihm/$SET
mkdir -p $dir
# Audio data directory check
if [ ! -d $AMI_DIR ]; then
echo "Error: run.sh requires a directory argument"
exit 1;
fi
# And transcripts check
if [ ! -f $SEGS ]; then
echo "Error: File $SEGS no found (run ami_text_prep.sh)."
exit 1;
fi
# find headset wav audio files only; here we again get all
# the files in the corpus and filter to the specific sessions
# only while building the segments
find $AMI_DIR -iname '*.Headset-*.wav' | sort > $dir/wav.flist
n=`cat $dir/wav.flist | wc -l`
echo "In total, $n headset files were found."
[ $n -ne 687 ] && \
echo "Warning: expected 687 (168 mtgs x 4 mics + 3 mtgs x 5 mics) data files, found $n"
# (1a) Transcriptions preparation
# here we start with normalised transcriptions; the utt ids follow the convention
# AMI_MEETING_CHAN_SPK_STIME_ETIME
# AMI_ES2011a_H00_FEE041_0003415_0003484
awk '{meeting=$1; channel=$2; speaker=$3; stime=$4; etime=$5;
printf("AMI_%s_%s_%s_%07.0f_%07.0f", meeting, channel, speaker, int(100*stime+0.5), int(100*etime+0.5));
for(i=6;i<=NF;i++) printf(" %s", $i); printf "\n"}' $SEGS | sort | uniq > $dir/text
# (1c) Make segment files from transcript
#segments file format is: utt-id reco-id start-time end-time, e.g.:
#AMI_ES2011a_H00_FEE041_0003415_0003484 AMI_ES2011a_H00 34.15 34.84
awk '{
segment=$1;
split(segment,S,"[_]");
audioname=S[1]"_"S[2]"_"S[3]; startf=S[5]; endf=S[6];
print segment " " audioname " " startf*10/1000 " " endf*10/1000 " "
}' < $dir/text > $dir/segments
#prepare wav.scp
sed -e 's?.*/??' -e 's?.wav??' $dir/wav.flist | \
perl -ne 'split; $_ =~ m/(.*)\..*\-([0-9])/; print "AMI_$1_H0$2\n"' | \
paste - $dir/wav.flist > $dir/wav1.scp
#Keep only the $SET part of the waves
awk '{print $2}' $dir/segments | sort -u | join - $dir/wav1.scp > $dir/wav2.scp
#replace the path with an appropriate sox command that selects a single channel only
awk '{print $1" sox -c 1 -t wavpcm -s "$2" -t wavpcm - |"}' $dir/wav2.scp > $dir/wav.scp
# (1d) reco2file_and_channel
cat $dir/wav.scp \
| perl -ane '$_ =~ m:^(\S+)(H0[0-4])\s+.*\/([IETB].*)\.wav.*$: || die "bad label $_";
print "$1$2 $3 A\n"; ' > $dir/reco2file_and_channel || exit 1;
awk '{print $1}' $dir/segments | \
perl -ane '$_ =~ m:^(\S+)([FM][A-Z]{0,2}[0-9]{3}[A-Z]*)(\S+)$: || die "segments: bad label $_";
print "$1$2$3 $1$2\n";' > $dir/utt2spk || exit 1;
sort -k 2 $dir/utt2spk | utils/utt2spk_to_spk2utt.pl > $dir/spk2utt || exit 1;
#check and correct the cases where segment timings for a given speaker overlap each other
#(important for simultaneous asclite scoring to proceed).
#There is actually only one such case in the dev set with automatic segmentations.
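# each line of segments_to_fix has the form "<old segments entry>><fixed entry>",
# where the fixed entry moves the start time up to the previous segment's end;
# the loop below applies these fixes to $dir/segments with sed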
join $dir/utt2spk $dir/segments | \
perl -ne '{BEGIN{$pu=""; $pt=0.0;} split;
if ($pu eq $_[1] && $pt > $_[3]) {
print "$_[0] $_[2] $_[3] $_[4]>$_[0] $_[2] $pt $_[4]\n"
}
$pu=$_[1]; $pt=$_[4];
}' > $dir/segments_to_fix
if [ `cat $dir/segments_to_fix | wc -l` -gt 0 ]; then
echo "$0. Applying following fixes to segments"
cat $dir/segments_to_fix
while read line; do
p1=`echo $line | awk -F'>' '{print $1}'`
p2=`echo $line | awk -F'>' '{print $2}'`
sed -i "s!$p1!$p2!" $dir/segments
done < $dir/segments_to_fix
fi
# Copy stuff into its final locations
fdir=data/ihm/$SET
mkdir -p $fdir
for f in spk2utt utt2spk wav.scp text segments reco2file_and_channel; do
cp $dir/$f $fdir/$f || exit 1;
done
#Produce STMs for sclite scoring
local/convert2stm.pl $dir > $fdir/stm
cp local/english.glm $fdir/glm