Commit 921b8d66 authored by Dan Povey's avatar Dan Povey

Modified the introduction of the paper.

git-svn-id: 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
parent 601911bf
@@ -83,10 +83,10 @@
\name{ Daniel Povey$^1$, Mirko Hannemann$^2$, \\
\name{ Daniel Povey$^1$, Mirko Hannemann$^{1,2}$, \\
{Gilles Boulianne}$^3$, {Luk\'{a}\v{s} Burget}$^4$, {Arnab Ghoshal}$^5$, {Milos Janda}$^2$, {Stefan Kombrink}$^2$, \\
{Petr Motl\'{i}\v{c}ek}$^6$, {Yanmin Qian}$^7$, {Ngoc Thang Vu}$^8$, {Korbinian Riedhammer}$^9$, {Karel Vesel\'{y}}$^2$
\thanks{Thanks here}}
\thanks{Thanks here.. remember Sanjeev.}}
@@ -133,17 +133,17 @@ for each word sequence.
The word ``lattice'' is used in the speech recognition literature to mean some
kind of compact representation of the most likely transcriptions of an utterance,
in the form of a graph structure and normally including score and alignment
information in addition to the word labels. See
for example~\cite{efficient_general,ney_word_graph,odell_thesis,saon2005anatomy}.
In Section~\ref{sec:wfst} we give a Weighted Finite State Transducer
(WFST) interpretation of the speech-recognition decoding problem, in order
to introduce notation for the rest of the paper. In Section~\ref{sec:lattices}
we define the lattice generation problem, and review previous work.
In Section~\ref{sec:overview} we give an overview of our method,
and in Section~\ref{sec:details} we summarize some aspects of a determinization
algorithm that we use in our method. In Section~\ref{sec:exp} we give
experimental results, and in Section~\ref{sec:conc} we conclude.
[more history + context here]
\section{Decoding with WFSTs}
\section{WFSTs and the decoding problem}
The graph creation process we use in our toolkit, Kaldi~\cite{kaldi_paper},
is very close to the standard recipe described in~\cite{wfst},
@@ -196,7 +196,10 @@ Since the beam pruning is a part of any practical search procedure and cannot
easily be avoided, we will define the desired outcome of lattice generation in terms
of the visited subset $B$ of the search graph $S$.
\section{Defining lattices and the lattice generation problem}
\section{The lattice generation problem, and previous work}
\subsection{Lattices, and the lattice generation problem}
There is no generally accepted single definition of a lattice. In~\cite{efficient_general}
and~\cite{sak2010fly}, it is defined as a labeled, weighted, directed acyclic graph
@@ -247,7 +250,7 @@ We note that by ``word-sequence'' we mean a sequence of whatever symbols are on
output of $\HCLG$. In our experiments these symbols represent words, but do not include
silence, which we represent via alternative paths in $L$.
\section{Previous lattice generation methods}
\subsection{Previous lattice generation methods}
Lattice generation algorithms tend to be closely linked to a particular type of decoder,
but are often justified by the same kinds of ideas.
@@ -298,6 +301,7 @@ that would be within the lattice-generation beam. In addition, this algorithm would be
complex to implement efficiently.
\section{Overview of our algorithm}
\subsection{Version without alignments}
@@ -429,6 +433,7 @@ encoded into the weights. Of course, the costs and alignments are not in any
sense ``synchronized'' with the words.
\section{Details of our $\epsilon$ removal and determinization algorithm}
We implemented $\epsilon$ removal and determinization as a single algorithm
because $\epsilon$-removal using the traditional approach would greatly
@@ -16,11 +16,11 @@
year = 1997
@article{ odell_thesis,
@phdthesis{ odell_thesis,
title={The use of context in large vocabulary speech recognition},
author={Odell, J.J.},
school={Cambridge University Engineering Dept.}