LINAGORA / LGS / Labs / kaldi-modelgen
Commit 5f96bffa, authored Mar 20, 2017 by Abdelwahab HEBA
convert kaldi input to json file used for keyword extraction and recommendation
parent ba341bcf
Showing 2 changed files with 367 additions and 0 deletions
convert_inputkaldi_tojsonformat/convert_ink_to_json.sh +9 -0
convert_inputkaldi_tojsonformat/debat.json +358 -0
convert_inputkaldi_tojsonformat/convert_ink_to_json.sh 0 → 100755
#!/bin/bash
cat $1/segments | awk '{print $3,$4}' > $1/seg.tmp
cat $1/utt2spk | awk '{print $2}' > $1/spk.tmp
cat $1/text_microsoft | awk '{$1="";print $0}' | sed 's:^\s::g' | uniq > $1/text.tmp
paste -d " " $1/seg.tmp $1/spk.tmp > $1/segspk.tmp
paste -d " " $1/segspk.tmp $1/text.tmp > $1/segspktext.tmp
cat $1/segspktext.tmp | sort -V -k1 > $1/segspktext_sorted.tmp
cat $1/segspktext_sorted.tmp | awk -v m="\x0a" -v N="4" '{$N=m$N;printf "{\"from\": %s, \"until\": %s, \"speaker\": \"%s\",\"text\": \"%s\"},\n", $1 , $2, $3, substr($0,index($0,m)+1) }' > $1/$2.json
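A few properties of the script above may matter to reviewers. It pairs the three inputs by line position, so the `uniq` step can shift every later row if two adjacent utterances have the same transcript. Each output line ends with a comma and nothing wraps the lines in `[` and `]`, so the result is not a single JSON document. Quotes and backslashes in the transcript are written unescaped. The following is a minimal sketch of one possible alternative, not the author's method. The function names `kaldi_rows` and `rows_to_json` are hypothetical. It assumes the usual Kaldi layouts (`segments`: utterance ID, recording ID, start, end; `utt2spk`: utterance ID, speaker) and that `text_microsoft` starts each line with the utterance ID, as the original `awk '{$1=""; ...}'` step suggests. It also assumes none of the three files is empty and does not escape control characters.

```shell
#!/bin/bash

# Hypothetical helper, not part of the commit: join segments, utt2spk and the
# transcript file on the utterance ID (first column) instead of on line order,
# then print "<from> <until> <speaker> <text...>" rows sorted by start time.
kaldi_rows() {
  awk '
    FNR == 1 { file_no++ }                                # a new input file starts
    file_no == 1 { start[$1] = $3; stop[$1] = $4; next }  # segments
    file_no == 2 { spk[$1] = $2; next }                   # utt2spk
    ($1 in start) && ($1 in spk) {                        # transcript lines
      id = $1
      $1 = ""             # drop the utterance ID; $0 is rebuilt with a leading space
      sub(/^ +/, "")      # strip that leading space
      print start[id], stop[id], spk[id], $0
    }
  ' "$1/segments" "$1/utt2spk" "$1/text_microsoft" | sort -n -k1,1
}

# Hypothetical helper, not part of the commit: read rows of the form
# "<from> <until> <speaker> <text...>" on stdin and print one JSON array.
rows_to_json() {
  awk '
    # Put a backslash in front of every backslash and double quote.
    function json_escape(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\"
        out = out c
      }
      return out
    }
    BEGIN { print "[" }
    {
      # Rebuild the transcript from field 4 to the end of the line.
      text = $4
      for (i = 5; i <= NF; i++) text = text " " $i
      # Emit a comma before every object except the first one.
      printf "%s  {\"from\": %s, \"until\": %s, \"speaker\": \"%s\", \"text\": \"%s\"}", (NR > 1 ? ",\n" : ""), $1, $2, json_escape($3), json_escape(text)
    }
    END { print "\n]" }
  '
}
```

With both functions sourced, a call such as `kaldi_rows "$1" | rows_to_json > "$1/$2.json"` would stand in for the temporary-file pipeline, taking the same two positional arguments as the original script.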
convert_inputkaldi_tojsonformat/debat.json 0 → 100644
This diff is collapsed.