Commit 6ea57640 authored by Dan Povey

trunk: Documentation changes including improvements to the look of the website; minor script improvement (limit jobs of I/O intensive process)

git-svn-id: 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
parent 9e5816f0
......@@ -45,7 +45,9 @@ num_threads=1 # if >1, will use gmm-latgen-faster-parallel
parallel_opts= # If you supply num-threads, you should supply this too.
-# End configuration section
+max_fmllr_jobs=25 # I've seen the fMLLR jobs overload NFS badly if the decoding
+# was started with a large number of jobs, so we limit the number of
+# parallel jobs to 25 by default. End configuration section
echo "$0 $@" # Print the command line for logging
[ -f ./ ] && . ./; # source the path.
......@@ -147,7 +149,7 @@ esac
## Now get the first-pass fMLLR transforms.
if [ $stage -le 1 ]; then
echo "$0: getting first-pass fMLLR transforms."
-$cmd JOB=1:$nj $dir/log/fmllr_pass1.JOB.log \
+$cmd --max-jobs-run $max_fmllr_jobs JOB=1:$nj $dir/log/fmllr_pass1.JOB.log \
gunzip -c $si_dir/lat.JOB.gz \| \
lattice-to-post --acoustic-scale=$acwt ark:- ark:- \| \
weight-silence-post $silence_weight $silphonelist $alignment_model ark:- ark:- \| \
......@@ -183,7 +185,7 @@ fi
## $dir/trans.1, etc.
if [ $stage -le 3 ]; then
echo "$0: estimating fMLLR transforms a second time."
-$cmd JOB=1:$nj $dir/log/fmllr_pass2.JOB.log \
+$cmd --max-jobs-run $max_fmllr_jobs JOB=1:$nj $dir/log/fmllr_pass2.JOB.log \
lattice-determinize-pruned$thread_string --acoustic-scale=$acwt --beam=4.0 \
"ark:gunzip -c $dir/lat.tmp.JOB.gz|" ark:- \| \
lattice-to-post --acoustic-scale=$acwt ark:- ark:- \| \
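The idea behind this change (capping how many fMLLR jobs run at once so that shared storage is not overloaded) can be illustrated outside Kaldi. The following is a minimal sketch and not how `$cmd` implements `--max-jobs-run`: it assumes an `xargs` that supports the common `-P` extension, and `nj`, `max_jobs_run` and the echoed job body are hypothetical stand-ins for the real values and the real pipeline.

```shell
nj=8             # total number of jobs (assumed value for the demo)
max_jobs_run=3   # concurrency cap, playing the role of max_fmllr_jobs

# seq emits the job indices 1..nj, one per line. xargs starts one "sh -c"
# per index but keeps at most $max_jobs_run of them running at a time.
# The final sort restores job order, since parallel jobs finish in any order.
results=$(seq 1 "$nj" \
  | xargs -P "$max_jobs_run" -I{} sh -c 'echo "job $1 done"' _ {} \
  | sort -k2,2n)

echo "$results"
```

All `nj` jobs still run; only the number running simultaneously is bounded, which is the trade-off the new `max_fmllr_jobs` default makes.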
......@@ -17,7 +17,7 @@
# The PROJECT_NAME tag is a single word (or a sequence of words surrounded
# by quotes) that should identify the project.
# The PROJECT_NUMBER tag can be used to enter a project or revision number.
# This could be handy for archiving the generated documentation or
......@@ -25,6 +25,20 @@ PROJECT_NAME = 'KALDI'
# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give viewer
# a quick idea about the purpose of the project. Keep the description short.
# PROJECT_BRIEF = "Open source speech recognition"
# With the PROJECT_LOGO tag one can specify a logo or icon that is
# included in the documentation. The maximum height of the logo should not
# exceed 55 pixels and the maximum width should not exceed 200 pixels.
# Doxygen will copy the logo to the output directory.
PROJECT_LOGO = ../misc/logo/KaldiTextAndLogoSmall.png
# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
# base path where the generated documentation will be put.
# If a relative path is entered, it will be relative to the location
......@@ -639,7 +653,40 @@ HTML_FILE_EXTENSION = .html
# each generated HTML page. If it is left blank doxygen will generate a
# standard header.
## Note: doc/header.html is a modified version of the standard header which was generated
## using "doxygen -w html header.html footer.html stylesheet.css Doxyfile" (note, this may
## have to be updated if the doxygen version changes), and then the following line was added:
## <link rel="icon" type="image/png" href="">
HTML_HEADER = doc/header.html
# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output.
# Doxygen will adjust the colors in the style sheet and background images
# according to this color. Hue is specified as an angle on a colorwheel,
# see for more information.
# For instance the value 0 represents red, 60 is yellow, 120 is green,
# 180 is cyan, 240 is blue, 300 purple, and 360 is red again.
# The allowed range is 0 to 359.
# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of
# the colors in the HTML output. For a value of 0 the output will use
# grayscales only. A value of 255 will produce the most vivid colors.
# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to
# the luminance component of the colors in the HTML output. Values below
# 100 gradually make the output lighter, whereas values above 100 make
# the output darker. The value divided by 100 is the actual gamma applied,
# so 80 represents a gamma of 0.8, the value 220 represents a gamma of 2.2,
# and 100 does not change the gamma.
# The HTML_FOOTER tag can be used to specify a personal HTML footer for
# each generated HTML page. If it is left blank doxygen will generate a
......@@ -656,7 +703,6 @@ HTML_FOOTER =
# If the GENERATE_HTMLHELP tag is set to YES, additional index files
# will be generated that can be used as input for tools like the
# Microsoft HTML help workshop to generate a compressed HTML help file (.chm)
......@@ -24,6 +24,7 @@ doxygen
cp doc/*.pptx html/;
if [[ $(hostname -f) == * ]]; then
cp ../misc/logo/KaldiIco.png html/favicon.ico
tar -czf html.tar.gz html
scp html.tar.gz
......@@ -49,3 +50,10 @@ fi
# I added figures that I manually excerpted from
# and
# this is june 13, 2014, 6:11pm, check my email.
# Note (RE adding favicon): I generated the default header files like this (from
# src/) doxygen -w html header.html footer.html stylesheet.css Doxyfile then
# moved the header.html to doc/ and edited it to include the following snippet,
# and added it to the repo.
#<link rel="icon" type="image/png" href="">
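The manual edit described in the note above can also be scripted. This is a minimal sketch under stated assumptions: it does not call doxygen, so the file written to `$demo` is a hypothetical three-line stand-in for the much longer header that `doxygen -w html header.html footer.html stylesheet.css Doxyfile` generates, and the inserted line is the favicon link that appears in the header.html added by this commit.

```shell
# Hypothetical stand-in for a doxygen-generated header.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<head>
<link href="$relpath$tabs.css" rel="stylesheet" type="text/css"/>
</head>
EOF

# Print every line unchanged, and emit the favicon link right after the
# line that references tabs.css. Write to a temporary file, then replace.
awk '{ print }
     /tabs\.css/ { print "<link rel=\"icon\" href=\"favicon.ico\" type=\"image/x-icon\" />" }' \
  "$demo" > "$demo.new" && mv "$demo.new" "$demo"

cat "$demo"
```

Scripting the edit would make it easier to regenerate the header after a doxygen upgrade, which the note in the Doxyfile warns may be necessary.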
......@@ -144,7 +144,7 @@ preprocessor variables, setting compile options, linking with libraries, and so
\section build_setup_platforms Which platforms has Kaldi been compiled on?
We have compiled Kaldi on Windows, Cygwin, various flavors of Linux (including
-Ubuntu, CentOS, Debian and SUSE), and Darwin. We recommend you use g++ version
-4.4 or above for the source to compile.
+Ubuntu, CentOS, Debian, Red Hat and SUSE), and Darwin. We recommend you use g++ version
+4.4 or above, although other compilers such as llvm and Intel's icc are also known to work.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
<html xmlns="">
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<!--BEGIN PROJECT_NAME--><title>$projectname: $title</title><!--END PROJECT_NAME-->
<!--BEGIN !PROJECT_NAME--><title>$title</title><!--END !PROJECT_NAME-->
<link href="$relpath$tabs.css" rel="stylesheet" type="text/css"/>
<link rel="icon" href="favicon.ico" type="image/x-icon" />
<script type="text/javascript" src="$relpath$jquery.js"></script>
<script type="text/javascript" src="$relpath$dynsections.js"></script>
<link href="stylesheet.css" rel="stylesheet" type="text/css" />
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tr style="height: 56px;">
<td id="projectlogo"><img alt="Logo" src="$relpath$$projectlogo" style="padding: 4px 5px 1px 5px"/></td>
<td style="padding-left: 0.5em;">
<div id="projectname" style="display:none">$projectname
<!--BEGIN PROJECT_NUMBER-->&#160;<span id="projectnumber">$projectnumber</span><!--END PROJECT_NUMBER-->
<!--BEGIN PROJECT_BRIEF--><div id="projectbrief">$projectbrief</div><!--END PROJECT_BRIEF-->
<td style="padding-left: 0.5em;">
<div id="projectbrief" style="display:none">$projectbrief</div>
<!-- end header part -->
......@@ -30,10 +30,7 @@
\mainpage Kaldi
-Please see the \ref install_warning "instructions" on upgrading your repository to the
-new location, following our upgrade to the "new" Sourceforge.
-(see also Kaldi's <a href=> project page on Sourceforge </a>,
+See also Kaldi's <a href=>project page on Sourceforge</a>,
and <a href=></a> where you can download pre-built models.
......@@ -42,7 +39,6 @@
- \subpage install
- \subpage dependencies
- \subpage legal
- \subpage roadmap
- \subpage tutorial
- \subpage data_prep
- \subpage build_setup
// doc/roadmap.dox
// Copyright 2014 Johns Hopkins University (author: Daniel Povey)
// See ../../COPYING for clarification regarding multiple authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
\page roadmap Plans for Kaldi development
This page describes the features we are currently working on or have recently
completed, and how we aim to develop Kaldi in the future.
\section roadmap_current Features we are currently working on
\subsection roadmap_current_online Online decoding
By online decoding, we mean decoding that can take in audio data frame by frame and
output a result with minimal latency. Currently, in ^/sandbox/online (see src/online2
and egs/rm/s5/local/), we are working on an improved framework
for online decoding. GMM-based decoding is already functional.
The things that are currently preventing the setup from being merged back to trunk are:
- Issues with online pitch feature generation. We need to verify that the modified
pitch-feature extraction code does not cause a degradation for our existing offline
- We need to finalize the neural net version of the example scripts. Dan was hoping to
have finished this by now, but got involved with something else (speeding up the
neural network training). Also, he is undecided on whether to first work on an
improved version of the neural network recipe that works well without fMLLR adaptation
(e.g. using online-estimated iVectors).
It is not yet clear how long the merge back to trunk will take; it could be
anywhere from a couple of weeks to a couple of months. In the meantime, anyone
is free to use the sandbox/online version, and can be confident that it will
eventually be merged back to trunk.
\subsection roadmap_current_nnet Neural network related changes
There are various things that are being worked on by both Dan and Karel but are not
likely to be finalized soon. Some of this work relates to convolutional networks.
Dan has recently (June 2014) improved the speed of his neural network training
setup (see \ref dnn2) by improving the way the preconditioning was applied.
Also relating to Dan's neural setup, Samuel Zhang recently added an option
--first-component-power to; setting this to 0.5 seems to
give an improvement if there is a reasonable amount of data (this takes the
output of the first p-norm component and raises it to the power 0.5).
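As a sketch of what this option computes, assuming the usual definition of a p-norm nonlinearity over a group of inputs (the group \f$G\f$ and the symbols \f$x_i\f$, \f$p\f$, \f$y\f$ and \f$y'\f$ are notation introduced here for illustration, not taken from the script):

```latex
y = \Bigl( \sum_{i \in G} |x_i|^{p} \Bigr)^{1/p},
\qquad
y' = y^{0.5}
```

Since \f$y\f$ is non-negative, the power 0.5 is well defined; it compresses large outputs of the first component more than small ones.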
\subsection roadmap_current_server Accessing Kaldi over the internet
A couple of people are working on ways to access Kaldi over the Internet
(e.g. a REST API). Although it may not make sense to make these part of
Kaldi's repository, we hope to at least point to an external project that
makes it easy to do this. Note, we're not talking about setting up a public
server at this point, just showing others how to do so.
We're also interested in more telephone-oriented protocols such as MRCP, and
if anyone wants to work on that, we would welcome it.
\section roadmap_like Features we would like to work on, if we had the time
\subsection roadmap_like_faster Faster decoders
An outstanding issue is decoding speed: we would like decoding to be faster. The question is
how to achieve this in a way that is not too specific to one type of model. One way
to do it would be to convert the OpenFst decoding graph into some other,
more optimized data structure.
\subsection roadmap_like_examples More example scripts
We are constantly adding new example scripts for new databases, and many of these are added by
new contributors to the Kaldi project. We welcome such contributions.
\section roadmap_wont Features we do not currently plan to work on
\subsection roadmap_wont_scripting A scripting layer (Python etc.).
Our approach is that if you need something that's not supported in the
current command-line programs, you just add a new command-line program.
This works well for us. Maintaining a parallel set of, say, Python-wrapped
example scripts would be too much work. However, people have created
examples of calling Kaldi code from other languages (Python, Perl, Java) for
various specific purposes. Ask us for details.
\subsection roadmap_wont_nlp Natural language processing, language model estimation, etc.
We don't think it makes sense for Kaldi to try to be everything, and NLP and
language model estimation are two examples of things that we probably don't
plan to do in the near future. (However, new facilities for using externally
generated language models are definitely on the table.) Kaldi is primarily a
speech recognition project, and we plan to keep it that way, although we did
create code and example scripts for speaker identification and language
identification (see egs/sre08/ and egs/lre/), since these technologies have a
lot in common with speech recognition. We are also dabbling in computer
vision (there is an MNIST example in ^/sandbox/convnets); for now this is
mostly a convenient way to test our ideas related to convolutional networks
on vision tasks, rather than a major pivot towards computer vision.