LINAGORA / LGS / Labs / kaldi-modelgen
Commit 5bcfb89f
Authored Sep 21, 2017 by Abdelwahab HEBA
fix parse : mohammed v to mohammed cinq | mohammed vi to mohammed six etc..
Parent: f61a95f2
Showing 2 changed files with 10 additions and 0 deletions
local/lm/parseESTERSyncV2_text.py  +5 -0
local/parseESTERSyncV2.py  +5 -0
local/lm/parseESTERSyncV2_text.py
...
@@ -9,6 +9,11 @@ import re
import os.path
def transformation_text(text):
    # ESTER Problem "Mohamed v" ===> "Mohammed cinq"
    text = re.sub("mohammed vi", "mohamed six", text)
    text = re.sub("mohammed v", "mohamed cinq", text)
    # map all "mohamed" to "mohammed"
    text = re.sub("mohamed", "mohammed", text)
    # character normalization:
    text = re.sub("&", "et", text)
    text = re.sub("\+", "plus", text)
...
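Note on the patterns above: "mohammed v" has no word boundary, so it would also match the start of a longer word (for example "mohammed viendra"), and the "vi" rule only works because it runs before the "v" rule. The following is a minimal sketch of a stricter variant, assuming lowercase input as in the original patterns. The name normalize_mohammed is hypothetical and is not part of this commit.

```python
import re

def normalize_mohammed(text):
    # Hypothetical helper for illustration; not the code from the commit.
    # Unify the spelling first so "mohamed v" is handled as well.
    text = re.sub(r"\bmohamed\b", "mohammed", text)
    # \b stops "v" / "vi" from matching the beginning of a longer word,
    # which also makes the order of the next two rules irrelevant.
    text = re.sub(r"\bmohammed vi\b", "mohammed six", text)
    text = re.sub(r"\bmohammed v\b", "mohammed cinq", text)
    return text

if __name__ == "__main__":
    print(normalize_mohammed("le roi mohamed vi"))
    print(normalize_mohammed("mohammed viendra demain"))
```

With this variant, a sentence such as "mohammed viendra demain" is left unchanged, while "mohamed vi" and "mohammed vi" both become "mohammed six".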
local/parseESTERSyncV2.py
...
@@ -9,6 +9,11 @@ import re
import os.path
def transformation_text(text):
    # ESTER Problem "Mohamed v" ===> "Mohammed cinq"
    text = re.sub("mohammed vi", "mohamed six", text)
    text = re.sub("mohammed v", "mohamed cinq", text)
    # map all "mohamed" to "mohammed"
    text = re.sub("mohamed", "mohammed", text)
    # character normalization:
    text = re.sub("&", "et", text)
    text = re.sub("æ", "ae", text)
...
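The same five lines are added to both files, and the commit message ends with "etc..", which suggests more numerals may follow. One possible way to keep such rules in one place is a lookup table applied in a single pass. This is only a sketch: REGNAL_NUMBERS and spell_out_regnal_number are hypothetical names, and only the "v" and "vi" entries come from the commit.

```python
import re

# Illustrative table; only "v" and "vi" appear in the commit.
REGNAL_NUMBERS = {"v": "cinq", "vi": "six"}

# Longest keys first so the alternation tries "vi" before "v".
_KEYS = sorted((re.escape(k) for k in REGNAL_NUMBERS), key=len, reverse=True)

# "moham+ed" accepts both the "mohamed" and "mohammed" spellings.
_PATTERN = re.compile(r"\bmoham+ed (" + "|".join(_KEYS) + r")\b")

def spell_out_regnal_number(text):
    # Replace each "<name> <roman numeral>" match with the spelled-out form.
    return _PATTERN.sub(
        lambda m: "mohammed " + REGNAL_NUMBERS[m.group(1)], text
    )

if __name__ == "__main__":
    print(spell_out_regnal_number("mohamed vi et mohammed v"))
```

This sketch only rewrites the name when a listed numeral follows it; a standalone "mohamed" would still need the separate spelling rule shown in the diff.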