Commit 921b8d66 authored by Dan Povey

Modified the introduction of the paper.

git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@515 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
parent 601911bf
@@ -83,10 +83,10 @@
 \makeatletter
 \def\name#1{\gdef\@name{#1\\}}
 \makeatother
-\name{ Daniel Povey$^1$, Mirko Hannemann$^2$, \\
+\name{ Daniel Povey$^1$, Mirko Hannemann$^{1,2}$, \\
 {Gilles Boulianne}$^3$, {Luk\'{a}\v{s} Burget}$^4$, {Arnab Ghoshal}$^5$, {Milos Janda}$^2$, {Stefan Kombrink}$^2$, \\
 {Petr Motl\'{i}\v{c}ek}$^6$, {Yanmin Qian}$^7$, {Ngoc Thang Vu}$^8$, {Korbinian Riedhammer}$^9$, {Karel Vesel\'{y}}$^2$
-\thanks{Thanks here}}
+\thanks{Thanks here.. remember Sanjeev.}}
 %\makeatletter
 %\def\name#1{\gdef\@name{#1\\}}
 %\makeatother
@@ -133,17 +133,17 @@ for each word sequence.
 \section{Introduction}
 
-The word ``lattice'' is used in the speech recognition literature to mean some
-kind of compact representation of the most likely transcriptions of an utterance,
-in the form of a graph structure and normally including score and alignment
-information in addition to the word labels. See
-for example~\cite{efficient_general,ney_word_graph,odell_thesis,saon2005anatomy}.
+In Section~\ref{sec:wfst} we give a Weighted Finite State Transducer
+(WFST) interpretation of the speech-recognition decoding problem, in order
+to introduce notation for the rest of the paper. In Section~\ref{sec:lattices}
+we define the lattice generation problem and review previous work.
+In Section~\ref{sec:overview} we give an overview of our method,
+and in Section~\ref{sec:details} we summarize some aspects of a determinization
+algorithm that we use in our method. In Section~\ref{sec:exp} we give
+experimental results, and in Section~\ref{sec:conc} we conclude.
 
-[more history + context here]
+\section{WFSTs and the decoding problem}
+\label{sec:wfst}
 
-\section{Decoding with WFSTs}
 The graph creation process we use in our toolkit, Kaldi~\cite{kaldi_paper},
 is very close to the standard recipe described in~\cite{wfst},
@@ -151,7 +151,7 @@ where the Weighted Finite State Transducer (WFST) decoding graph is
 \begin{equation}
 \HCLG = \min(\det(H \circ C \circ L \circ G)),
 \end{equation}
 where $\circ$ is WFST composition (note: view $\HCLG$ as a single symbol).
 For concreteness we will speak of ``costs'' rather
 than weights, where a cost is a floating point number that typically represents a negated
 log-probability. A WFST has a set of states with one distinguished
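For readers who want to see the construction concretely, the composition, determinization, and minimization above correspond directly to OpenFst operations. The following is a minimal sketch, assuming component transducers under the hypothetical file names `H.fst`, `C.fst`, `L.fst`, and `G.fst` already exist on disk; the real Kaldi recipe also inserts disambiguation symbols before determinization and removes them afterwards, which this sketch omits.

```cpp
// Minimal sketch of HCLG = min(det(H o C o L o G)) with OpenFst.
// File names are placeholders; error handling and the disambiguation
// symbols used in the real Kaldi recipe are omitted.
#include <memory>
#include <fst/fstlib.h>

int main() {
  using fst::StdArc;
  using fst::StdVectorFst;
  std::unique_ptr<StdVectorFst> H(StdVectorFst::Read("H.fst"));
  std::unique_ptr<StdVectorFst> C(StdVectorFst::Read("C.fst"));
  std::unique_ptr<StdVectorFst> L(StdVectorFst::Read("L.fst"));
  std::unique_ptr<StdVectorFst> G(StdVectorFst::Read("G.fst"));

  StdVectorFst LG, CLG, HCLGraw, HCLG;
  fst::ArcSort(G.get(), fst::ILabelCompare<StdArc>());  // sort for composition
  fst::Compose(*L, *G, &LG);                            // L o G
  fst::ArcSort(&LG, fst::ILabelCompare<StdArc>());
  fst::Compose(*C, LG, &CLG);                           // C o (L o G)
  fst::ArcSort(&CLG, fst::ILabelCompare<StdArc>());
  fst::Compose(*H, CLG, &HCLGraw);                      // H o C o L o G
  fst::Determinize(HCLGraw, &HCLG);                     // det(.)
  fst::Minimize(&HCLG);                                 // min(.)
  HCLG.Write("HCLG.fst");
  return 0;
}
```

The `ArcSort()` calls are there because `Compose()` requires one of its arguments to be arc-sorted on the matching side; sorting the right-hand input by input label is one standard way to satisfy this.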
@@ -196,7 +196,10 @@ Since the beam pruning is a part of any practical search procedure and cannot
 easily be avoided, we will define the desired outcome of lattice generation in terms
 of the visited subset $B$ of the search graph $S$.
 
-\section{Defining lattices and the lattice generation problem}
+\section{The lattice generation problem, and previous work}
+\label{sec:lattices}
+\subsection{Lattices, and the lattice generation problem}
 
 There is no generally accepted single definition of a lattice. In~\cite{efficient_general}
 and~\cite{sak2010fly}, it is defined as a labeled, weighted, directed acyclic graph
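As a concrete illustration of that graph view, one arc of such a lattice could be represented roughly as below. This is a hypothetical sketch with made-up field names, not any toolkit's actual type (Kaldi, for instance, keeps graph and acoustic costs as separate fields inside a weight structure).

```cpp
// Illustrative sketch of a single arc in a lattice viewed as a labeled,
// weighted, directed acyclic graph. Field names are hypothetical.
#include <cstdint>
#include <vector>

struct LatticeArcSketch {
  int32_t word;                    // word label on this arc
  float graph_cost;                // negated log LM/pronunciation probability
  float acoustic_cost;             // negated acoustic log-likelihood
  std::vector<int32_t> alignment;  // optional per-frame alignment information
  int32_t next_state;              // destination state in the DAG
};
```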
@@ -247,7 +250,7 @@ We note that by ``word-sequence'' we mean a sequence of whatever symbols are on
 output of $\HCLG$. In our experiments these symbols represent words, but not including
 silence, which we represent via alternative paths in $L$.
 
-\section{Previous lattice generation methods}
+\subsection{Previous lattice generation methods}
 
 Lattice generation algorithms tend to be closely linked to a particular type of decoder,
 but are often justified by the same kinds of ideas.
@@ -298,6 +301,7 @@ that would be within the lattice-generation beam. In addition, this algorithm w
 complex to implement efficiently.
 
 \section{Overview of our algorithm}
+\label{sec:overview}
 
 \subsection{Version without alignments}
@@ -429,6 +433,7 @@ encoded into the weights. Of course, the costs and alignments are not in any
 sense ``synchronized'' with the words.
 
 \section{Details of our $\epsilon$ removal and determinization algorithm}
+\label{sec:details}
 
 We implemented $\epsilon$ removal and determinization as a single algorithm
 because $\epsilon$-removal using the traditional approach would greatly
......
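The combined $\epsilon$-removal and determinization step is only summarized in the hunk above, so here is a hedged sketch of the classical idea it builds on: folding $\epsilon$-closure into the subset construction, so that no separate $\epsilon$-removal pass over the input is needed. This toy version handles an unweighted acceptor only; the algorithm described in the paper additionally tracks weights and output strings inside each subset.

```cpp
// Toy sketch: determinization with epsilon-removal folded in, via subset
// construction over epsilon-closures. Unweighted acceptor only; final
// states are omitted for brevity. Not the paper's weighted algorithm.
#include <map>
#include <queue>
#include <set>
#include <vector>

using State = int;
constexpr int kEps = 0;  // label 0 reserved for epsilon

struct Arc { int label; State next; };
using Automaton = std::vector<std::vector<Arc>>;  // out-arcs per state

// Expand a subset with all states reachable through epsilon arcs, so the
// determinized result never needs explicit epsilon transitions.
static std::set<State> EpsClosure(const Automaton &in, std::set<State> subset) {
  std::queue<State> q;
  for (State s : subset) q.push(s);
  while (!q.empty()) {
    State s = q.front(); q.pop();
    for (const Arc &a : in[s])
      if (a.label == kEps && subset.insert(a.next).second) q.push(a.next);
  }
  return subset;
}

static Automaton DeterminizeRemoveEps(const Automaton &in, State start) {
  std::map<std::set<State>, State> id_of;  // subset -> output state id
  std::vector<std::set<State>> subsets;    // output state id -> subset
  Automaton out;
  // Intern a subset: take its epsilon-closure, then find or create its id.
  auto intern = [&](std::set<State> subset) -> State {
    subset = EpsClosure(in, std::move(subset));
    auto it = id_of.find(subset);
    if (it != id_of.end()) return it->second;
    State id = static_cast<State>(out.size());
    id_of.emplace(subset, id);
    subsets.push_back(std::move(subset));
    out.emplace_back();
    return id;
  };
  intern({start});
  for (State id = 0; id < static_cast<State>(subsets.size()); ++id) {
    // Group non-epsilon successors of this subset by label.
    std::map<int, std::set<State>> by_label;
    for (State s : subsets[id])
      for (const Arc &a : in[s])
        if (a.label != kEps) by_label[a.label].insert(a.next);
    for (auto &entry : by_label) {
      State next = intern(std::move(entry.second));  // may create new states
      out[id].push_back({entry.first, next});
    }
  }
  return out;
}
```

In the weighted-transducer setting each subset element pairs a state with a residual weight (and output string), which is what makes the full algorithm considerably more involved than this sketch.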
@@ -16,11 +16,11 @@
 year = 1997
 }
-@article{ odell_thesis,
+@phdthesis{ odell_thesis,
 title={The use of context in large vocabulary speech recognition},
 author={Odell, J.J.},
 year={1995},
-publisher={Citeseer}
+school={Cambridge University Engineering Dept.}
 }
 @inproceedings{sak2010fly,
......