Commit 08024e90 authored by kkm


Documentation changes: added Git tutorial, removed Subversion tutorial and updated multiple references from Subversion to Git and from SourceForge to Kaldi's own web site or GitHub as appropriate.
parent 1d24e8c9
......@@ -33,12 +33,12 @@
@section about_name The name Kaldi
According to legend, Kaldi was the Ethiopian goatherder who discovered the
coffee plant.
@section about_compare Kaldi versus other toolkits
Kaldi is similar in aims and scope to HTK. The goal is to have modern and
flexible code, written in C++, that is easy to modify and extend.
Important features include:
- Code-level integration with Finite State Transducers (FSTs)
- We compile against the OpenFst toolkit (using it as a library).
......@@ -49,21 +49,21 @@
- As far as possible, we provide our algorithms in the most generic
form possible. For instance, our decoders are templated on an
object that provides a score indexed by a (frame, fst-input-symbol)
tuple. This means the decoder could work from any suitable source
of scores, such as a neural net.
- Open license
- The code is licensed under Apache 2.0, which is one of the least
restrictive licenses available.
- Complete recipes
- Our goal is to make available complete recipes for building
speech recognition systems, that work from widely available
databases such as those provided by the Linguistic Data
Consortium (LDC).
The goal of releasing complete recipes is an important aspect of Kaldi.
Since the code is publicly available under a license that permits
modifications and re-release, we would like to encourage people to release
their code, along with their script directories, in a similar format to
Kaldi's own example script.
We have tried to make Kaldi's documentation as complete as possible given time
......@@ -75,7 +75,7 @@
to an expert. In the future we hope to make it somewhat more accessible,
bearing in mind that our intended audience is speech recognition researchers or
researchers-in-training. In general, Kaldi is not a speech recognition
toolkit "for dummies." It will allow you to do many kinds of operations that
don't make sense.
@section about_flavor The flavor of Kaldi
......@@ -88,39 +88,39 @@
- We emphasize generic algorithms and universal recipes
- By "generic algorithms" we mean things like linear
transforms, rather than those that are specific to speech
in some way. But we don't intend to be too dogmatic about this,
if more specific algorithms are useful.
- We would like recipes that can be run on any data-set, rather than
those that have to be customized.
- We prefer provably correct algorithms
- The recipes have been designed in such a way that in principle they
should never fail in a catastrophic way. There has been an effort to avoid recipes and
algorithms that could possibly fail, even if they don't fail in the
"normal case" (one example: FST weight-pushing, which normally helps but
can crash or make things much worse in certain cases).
- Kaldi code is thoroughly tested.
- The goal is for all or nearly all the code to have corresponding
test routines.
- We try to keep the simple cases simple.
- There is a danger when building a large speech toolkit that the
code can become a forest of rarely used alternatives. We are trying to avoid
this by structuring the toolkit in the following way. Each command-line
program generally works for a limited set of cases (e.g. a decoder
might just work for GMMs). Thus, when you add a new type of model, you create
a new command-line decoder (that calls the same underlying templated code).
- Kaldi code is easy to understand.
- Even though the Kaldi toolkit as a whole may get very large, we aim
for each individual part of it to be understandable without too much
effort. We will accept some code duplication if it improves the
understandability of individual pieces.
- Kaldi code is easy to reuse and refactor.
- We aim for the toolkit to be as loosely coupled as possible.
In general this means that any given header should need to \#include as
few other header files as possible. The matrix library, in particular,
only depends on code in one other subdirectory so it can be used independently
of almost all the rest of Kaldi.
@section about_status Status of the project
Currently, we have code and scripts for most standard techniques, including all standard
......@@ -134,12 +134,9 @@
Note: after an early phase in which we intended to use version numbers for
major releases of Kaldi ("v1" and so on), we realized that these types of
releases do not mesh well with the natural style of development, which is very
continuous. Currently we maintain two major versions of Kaldi: the "trunk"
version, and the "complete" version (which maintains some little-used features
that were deleted from trunk). We also maintain various sandboxes for feature
development; these are merged back into trunk when the feature is complete.
For most purposes, the "trunk" is the version you should use, and you should
frequently do "svn up" to keep it up to date; see \ref install for more details.
continuous. Currently we maintain only the "master" development branch, and
this is the version you should use. You should also run "git pull"
frequently to keep it up to date; see \ref install for more details.
See \ref roadmap for details of features we are currently working on.
......
......@@ -33,7 +33,7 @@
grid will have NVidia GPUs which you can use for neural net training,
and you can reserve these on the queue by adding some extra option to qsub.
See \ref queue for more information.
We have started a separate project called <a
href=https://sourceforge.net/projects/kluster/> Kluster </a> that shows you
how to create such a cluster on Amazon's EC2; MIT's <a
......@@ -65,7 +65,7 @@
will not work there, and we are not very actively maintaining the Windows
compatibility of the code or the Windows build scripts (we fix problems when
we are told about them though).
\section dependencies_packages Software packages required
......@@ -73,9 +73,9 @@
order to install Kaldi. The full list is not important since the installation
scripts will tell you what you are missing.
- Subversion (svn): this is needed to download Kaldi and other software that it depends on.
- Git: this is needed to download Kaldi and other software that it depends on.
- wget is required for the installation of some non-Kaldi components described below
- The example scripts require standard UNIX utilities such as bash,
perl, awk, grep, and make.
It can also be helpful if you have an ATLAS linear-algebra package installed
......@@ -91,7 +91,7 @@
non-exhaustive list).
- OpenFst: we compile against this and use it heavily.
- IRSTLM: this is a language modeling toolkit. Some of the example scripts require it but
it is not tightly integrated with Kaldi; we can convert any Arpa format
language model to an FST.
- The IRSTLM build process requires automake, aclocal, and libtoolize
......@@ -102,9 +102,9 @@
as wav. It's needed for the example scripts that use LDC data.
- sclite: this is for scoring and is not necessary as we have our own, simple
scoring program (compute-wer.cc).
- ATLAS, the linear algebra library. This is only needed for the headers; in
typical setups we expect that ATLAS will be on your system. However, if it is not
already on your system you can compile ATLAS as long as your machine does not
have CPU throttling enabled.
- CLAPACK, the linear algebra library (we download the headers).
This is useful only on systems where you don't have ATLAS and are
......
......@@ -22,54 +22,40 @@
\page install Downloading and installing Kaldi
\section install_transition Transition to github
\section install_download Downloading Kaldi
Due to the long recent sourceforge outage, we have now transitioned to
github for all future development. We still intend to maintain a
read-only subversion mirror of the github parent, located at sourceforge and mirrored
by us; however, we won't be able to set that up until Sourceforge comes back up.
We have now transitioned to
GitHub for all future development. We still intend to maintain a
read-only Subversion mirror of the GitHub parent, located at SourceForge and mirrored
by us.
While sourceforge is still down, the easiest way to access Kaldi is as follows:
\verbatim
git clone https://github.com/kaldi-asr/kaldi.git
\endverbatim
You can then keep it up-to-date using "git pull".
When sourceforge comes back up, you will be free to access it either through github
or through the subversion commands below.
If you want to contribute to Kaldi, this will mostly be done using pull requests.
You would first log in to github and go to https://github.com/kaldi-asr/kaldi and click on
"fork" to fork the repository. Then, in your local fork of the repository you would
do your work in a differently named branch, and generate a pull request through the
online interface of github. We will soon provide more detailed instructions on this.
\section install_download Downloading Kaldi (old instructions)
You first need to install Subversion (SVN). The most current version of Kaldi,
You first need to install Git. The most current version of Kaldi,
possibly including unfinished and experimental features, can
be downloaded by typing into a shell:
\verbatim
svn co https://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk
git clone https://github.com/kaldi-asr/kaldi.git kaldi-trunk --origin golden
cd kaldi-trunk
\endverbatim
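The <DFN>--origin golden</DFN> option in the clone command names the remote "golden" instead of Git's default name "origin". The following is a minimal sketch of that effect, not part of the official instructions: it assumes a throwaway local bare repository (the hypothetical <DFN>upstream.git</DFN>) as a stand-in for the GitHub URL so that it can be run offline.

```shell
#!/bin/sh
# Sketch only: a local bare repository stands in for the GitHub URL.
set -e
work=$(mktemp -d)
cd "$work"
git init --quiet --bare upstream.git

# Same shape as the clone command above; the warning about cloning an
# empty repository is discarded.
git clone --quiet --origin golden upstream.git kaldi-trunk 2>/dev/null
cd kaldi-trunk

# List the configured remotes: the only one is named "golden".
remote_name=$(git remote)
echo "remote: $remote_name"
```

With the remote named this way, commands that take a remote name refer to it as "golden" (for example "git fetch golden").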
If you want to get updates and bug fixes you can go to some checked-out
directory, and type
\verbatim
svn update
git pull
\endverbatim
If "svn update" prints out scary looking messages about conflicts (caused by
you changing parts of files that were later modified centrally),
you may have to resolve the conflicts; for that, we recommend that you
read about how svn works.
If "git pull" prints out a message telling you that it cannot pull the remote
changes because you have changed files locally,
you may have to commit locally and merge your changes, or stash them temporarily
and then reapply the stash; for that, we recommend that you
read about how Git works, possibly starting with the \ref tutorial_git.
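The stash route can be sketched as follows. This is an illustration under stated assumptions rather than a recommended procedure: the repositories <DFN>upstream</DFN> and <DFN>kaldi-trunk</DFN>, the file <DFN>notes.txt</DFN> and the placeholder identity are all hypothetical and created locally, so the script does not touch GitHub or a real checkout.

```shell
#!/bin/sh
# Sketch only: throwaway local repositories stand in for GitHub and your checkout.
set -e
work=$(mktemp -d)
cd "$work"

# Placeholder identity so commits and stashes work without any Git configuration.
GIT_AUTHOR_NAME="Example User"
GIT_AUTHOR_EMAIL="user@example.com"
GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# "Upstream" repository with one committed file.
git init --quiet upstream
cd upstream
printf 'one\ntwo\nthree\nfour\nfive\n' > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
cd ..

# Your checkout, with the remote named "golden" as in the clone command above.
git clone --quiet --origin golden upstream kaldi-trunk

# Upstream changes the first line...
cd upstream
printf 'one, revised upstream\ntwo\nthree\nfour\nfive\n' > notes.txt
git commit --quiet -a -m "Revise first line"
cd ../kaldi-trunk

# ...while you edit the last line of the same file without committing.
printf 'one\ntwo\nthree\nfour\nfive, edited locally\n' > notes.txt

# The pull is refused because it would overwrite the uncommitted edit.
if git pull --quiet --ff-only 2>/dev/null; then
  pull_refused=no
else
  pull_refused=yes
fi

# Stash the edit, pull, then reapply the stash on top of the updated files.
git stash --quiet
git pull --quiet --ff-only
git stash pop --quiet

first_line=$(head -n 1 notes.txt)
last_line=$(tail -n 1 notes.txt)
echo "pull refused at first: $pull_refused"
echo "first line: $first_line"
echo "last line:  $last_line"
```

Here the two edits touch different parts of the file, so reapplying the stash succeeds cleanly. If both sides had changed the same lines, "git stash pop" would report a conflict that you would have to resolve by hand.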
\section install_install Installing Kaldi
The top-level installation instructions are in the file INSTALL.
For Windows, there are separate instructions (unfortunately, not actively maintained and woefully out of date)
in windows/INSTALL.
The top-level installation instructions are in the file \c INSTALL.
For Windows, there are separate instructions (unfortunately, not actively
maintained and woefully out of date) in \c windows/INSTALL.
See also \ref build_setup which explains how the build process
works internally.
The example scripts are in egs/
The example scripts are in \c egs/
*/
......@@ -24,11 +24,11 @@
for that. In this page we explain what the legal stuff means (as we
understand it).
The code and other content (e.g. scripts, documentation) in
Kaldi is released under the Apache license, version 2.0. The Apache
license is a popular "BSD-like" license. This means you can use
Kaldi for free and redistribute it, even for commercial purposes,
although you can't take off the license headers (and under some
circumstances you may have to distribute a license document). Apache is
not a ``viral'' license like the GPL, which forces you to release
your modifications to the source code. Also note that this project has no
......@@ -54,22 +54,22 @@
header on them that says something like:
\verbatim
// Copyright 2012 Joe M. Schmo
//
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// etc.
\endverbatim
However, if Joe Schmo works for Acme Corporation and releases code
as part of his work, then (depending what country Joe lives in) the header
would probably look something like this:
\verbatim
// Copyright 2012 Acme Corporation
//
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// etc.
\endverbatim
This would be the case under some circumstances even if he did it
in his spare time. For example: the terms of Joe's employment with
......@@ -83,18 +83,18 @@
look something like this:
\verbatim
// Copyright 2012 Acme Corporation Jane Doe
//
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// etc.
\endverbatim
In this case Acme Corporation and Jane Doe (and we're assuming that they
agreed to this) jointly own the copyright on the code that they wrote.
The order of names has no legal meaning.
Joint ownership means that if either party chooses (and they don't have to
both agree), they can release the code themselves under a different
license. However, for Apache-licensed projects, there is
typically no point in doing this, since Apache already
allows for commercial use.
......@@ -108,17 +108,17 @@
The way this is normally handled is, Jane Doe
should add a new Apache header at the top of the file, above the one
mentioning Acme Corporation, and she should say something to the
effect that the work is derived from the original file from Acme
Corporation, and the whole modified file is being released under Apache.
We haven't done it this way, because the project is very
collaborative, and if we did it like this we would have extremely long copyright
headers. Instead we use the convention that if Jane makes a change,
she simply adds her name to the list of authors in the copyright
header. We are treating this as
a kind of shorthand for the whole multiple-header thing (this is
explained in the COPYING file). The way you can disambiguate
between joint copyright ownership and derivative work, is to
go back in the version history in subversion, and see what the original
go back in the version history in Git, and see what the original
release contained. We guess that most people won't care about
this distinction, which is why we have not bothered to disambiguate it.
For shell and perl scripts and other non-C++ content
......
......@@ -23,18 +23,15 @@
\page other Other Kaldi-related resources (and how to get help)
The main places where Kaldi knowledge can be found are this website,
and in the code repository (which we are currently in the process of moving
from subversion to git; see \ref install for instructions).
and in the code repository (see \ref install for instructions).
The repository contains the Kaldi code; the installation scripts;
and example scripts for a number of different datasets, which are located
in the sub-directory egs/).
Kaldi's <a href=http://sourceforge.net/projects/kaldi/>project page on Sourceforge</a> contains
a number of useful resources, but after the recent extended outage we are migrating away from
Sourceforge. <a href=http://kaldi-asr.org/>kaldi-asr.org/</a> is now the top-level
location you should go to; see in particular information about help forums and email
lists at <a href=http://kaldi-asr.org/forums.html>kaldi-asr.org/forums.html</a>.
in the sub-directory \c egs/.
Kaldi's <a href="http://kaldi-asr.org/">project page</a> contains
a number of useful resources; see in particular information about help forums and email
lists at <a href="http://kaldi-asr.org/forums.html">kaldi-asr.org/forums.html</a>.
......
......@@ -22,7 +22,7 @@
- \subpage tutorial_prereqs "Prerequisites"
- \subpage tutorial_setup "Getting started" (15 minutes)
- \subpage tutorial_svn "Version control with Subversion" (5 minutes)
- \subpage tutorial_git "Version control with Git" (5 minutes)
- \subpage tutorial_looking "Overview of the distribution" (25 minutes)
- \subpage tutorial_running "Running the example scripts" (40 minutes)
- \subpage tutorial_code "Reading and modifying the code" (30 minutes)
......
......@@ -36,7 +36,7 @@
Go to the top-level directory (we called it kaldi-1) and then into
src/.
First look at the file base/kaldi-common.h (don't follow the links within
this document; view it from the shell or from an editor). This \#includes a number of
things from the base/ directory that are used by almost every Kaldi program. You
......@@ -56,7 +56,7 @@
\section tutorial_code_matrix Matrix library (and modifying and debugging code)
Now look at the file matrix/matrix-lib.h. See what files it includes. This provides
an overview of the kinds of things that are in the matrix library. This library
is basically a C++ wrapper for BLAS and LAPACK, if that means anything to you (if not,
......@@ -69,7 +69,7 @@
These types of comments, and block comments that begin with /**, are interpreted by the
Doxygen software that automatically generates documentation. It also generates the
page you are reading right now (the source for this type of documentation
is in src/doc/).
At this point we would like you to modify the code and compile it. We will be
adding a test function to the file matrix/matrix-lib-test.cc. As mentioned
......@@ -92,13 +92,13 @@ void UnitTestAddVec() {
InitRand(&v);
InitRand(&w);
Vector<Real> w2(w); // w2 is a copy of w.
Real f = RandGauss();
w.AddVec(f, v); // w <-- w + f v
for (int32 i = 0; i < dim; i++) {
Real a = w(i), b = f * w2(i) + v(i);
AssertEqual(a, b); // will crash if not equal to within
// a tolerance.
}
}
\endverbatim
Add this code to the file matrix-lib-test.cc, just above the function
......@@ -109,7 +109,7 @@ MatrixUnitTest(). Then, inside MatrixUnitTest(), add the line:
It doesn't matter where in the function you add this.
Then type "make test". There should be an error (a semicolon that should be
a comma); fix it and try again.
Now type "./matrix-lib-test". This should crash with an assertion failure,
because there was another mistake in the unit-test code. Next we will debug it.
Type
\verbatim
......@@ -130,7 +130,7 @@ values of a and b ("p" is short for "print"). Your screen should look something
$5 = -0.931363404
(gdb) p b
$6 = -0.270584524
(gdb)
\endverbatim
The exact values are, of course, random, and may be different for you. Since
the numbers are considerably different, it's clear that it's not just a question
......@@ -145,7 +145,7 @@ $8 = 0.281656802
$9 = -0.931363404
(gdb) p w2.data_[0]
$10 = -1.07592916
(gdb)
\endverbatim
This may help you work out that the expression for "b" is wrong. Fix it in the code, recompile, and run
again (you can just type "r" in the gdb prompt to rerun). It should now run OK. Force gdb to break into the
......@@ -169,6 +169,7 @@ If you need to debug a program that takes command-line arguments, you can do it
\endverbatim
or you can invoke gdb without arguments and then type "r arg1 arg2..." at the prompt.
\todo This paragraph is full of lies!
When you are done, and it compiles, type
\verbatim
svn diff
......@@ -176,7 +177,7 @@ svn diff
to see what changes you made. If you are contributing to the Kaldi project and you
are planning to commit code in the near future, you
may want to revert the changes you made so you don't accidentally commit them. The following
commands will save the file you modified in case you need it later, and will revert to
the original version:
\verbatim
cp matrix-lib-test.cc matrix-lib-test.cc.tmp
......@@ -190,12 +191,12 @@ svn commit --username=your_sourceforge_username -m "Added a unit-test in matrix/
\section tutorial_code_acoustic Acoustic modeling code
Next look at gmm/diag-gmm.h (this class stores a Gaussian Mixture Model).
The class DiagGmm may look a bit confusing as
it has many different accessor functions. Search for "private" and look
at the class member variables (they always end with an underscore, as per
the Kaldi style). This should make it clear how we store the GMM.
This is just a single GMM, not a whole collection of GMMs.
Look at gmm/am-diag-gmm.h; this class stores a collection of GMMs.
Notice that it does not inherit from anything.
Search for "private" and you can see the member variables (there
......@@ -211,7 +212,7 @@ keeping the rest of the system the same. We'll come to this other stuff later.
Next look at feat/feature-mfcc.h. Focus on the MfccOptions struct.
The struct members give you some idea what kind of options are supported
in MFCC feature extraction.
Notice that some struct members are options structs themselves.
Look at the Register function. This is standard in Kaldi options classes.
Then look at featbin/compute-mfcc-feats.cc (this is a command-line
......@@ -219,8 +220,8 @@ program) and search for Register.
You can see where the Register function of the options struct is called.
To see a complete list of the options supported for MFCC feature extraction,
execute the program featbin/compute-mfcc-feats with no arguments.
Recall that you saw some of these options being registered in
the MfccOptions class, and others being registered in
featbin/compute-mfcc-feats.cc. The way to specify options is --option=value.
Type
\verbatim
......@@ -264,11 +265,11 @@ adding statistics together and evaluating some kind of objective function
(e.g. a likelihood). In the normal recipe, it actually points to a class
that contains sufficient statistics for estimating a diagonal Gaussian p.d.f.
Do
\verbatim
less exp/tri1/log/acc_tree.log
\endverbatim
There won't be much information in this file, but you can see the command
line. This program accumulates the single-Gaussian statistics for each HMM-state
(actually, pdf-class) of each seen triphone context.
The <DFN>--ci-phones</DFN> option is so that it knows to avoid accumulating separate
......@@ -285,7 +286,7 @@ This program does the decision-tree clustering; it reads in the statistics
that were output by. It is basically a wrapper for the BuildTree function discussed above.
The questions that it asks in the decision-tree clustering are automatically generated,
as you can see in the script steps/train_tri1.sh (look for the programs cluster-phones
and compile-questions).
......@@ -296,7 +297,7 @@ topologies for a number of phones. In general each phone can have a different
topology. The topology includes "default" transitions, used for initialization.
Look at the example topology in the extended comment at the top of the header.
There is a tag <PdfClass> (note: as with HTK text formats,
this file looks vaguely XML-like, but it is not really XML).
The <PdfClass> is always the same as the HMM-state (<State>) here; in
general, it doesn't have to be. This is a mechanism to enforce tying of
distributions between distinct HMM states; it's possibly useful if you want to
......
......@@ -21,12 +21,12 @@
\page tutorial_looking Kaldi tutorial: Overview of the distribution (20 minutes)
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_svn "Previous: Version control with Subversion" <BR>
\ref tutorial_git "Previous: Version control with Git" <BR>
\ref tutorial_running "Next: Running the example scripts" <BR>
Before we jump into the example scripts, let us take a few minutes to look at what
else is included in the Kaldi distribution. Go to the kaldi-1 directory and list it.
There are a few files and subdirectories.
The important subdirectories are "tools/", "src/", and "egs/" which we will
look at in the next section.
We will give an overview of "tools/" and "src/".
......@@ -53,7 +53,7 @@
of an abstract FST type. You can see that there are a lot of templates involved.
If templates are not your thing, you will probably have trouble understanding this code.
Change directory to bin/, or add it to your path.
We will be executing some simple example instructions from
<a href=http://www.openfst.org/twiki/bin/view/FST/FstQuickTour#CreatingFsts>here</a>.
......@@ -63,7 +63,7 @@
# arc format: src dest ilabel olabel [weight]
# final state format: state [weight]
# lines may occur in any order except initial state must be first line
# unspecified weights default to 0.0 (for the library-default Weight type)
cat >text.fst <<EOF
0 1 a x .5
0 1 b y 1.5
......@@ -118,7 +118,7 @@ rm *.fst *.txt
\section tutorial_looking_src The src/ directory (10 minutes)
Change directory back up to the top level (kaldi-1) and into src/.
List the directory. You will see a few files and a large number of
subdirectories. Look at the Makefile. At the top it sets the variable
SUBDIRS. This is a list of the subdirectories containing code.
Notice that some of them end in "bin". These are the ones that contain
......@@ -129,16 +129,16 @@ rm *.fst *.txt
Type "make test". This command goes into the various subdirectories and
runs test programs in there. All the tests should succeed. If you are
feeling lucky you can also type "make valgrind". This runs the same
tests with a memory checker, and takes longer, but will find more
errors. If this doesn't work, forget about it; it's not important
for now. If it is taking too long, stop it with ctrl-c.
Change directory to base/. Look at the Makefile. Notice the line
\verbatim
include ../kaldi.mk
\endverbatim
This line includes the file ../kaldi.mk verbatim whenever a Makefile in a
subdirectory is invoked (just like a C \#include directive).
Look at the file ../kaldi.mk. It will contain
some rules related to valgrind (for memory debugging), and then some
system-specific configuration in the form of variables such as CXXFLAGS.
......@@ -153,10 +153,10 @@ include ../kaldi.mk
it. Several other targets are defined, starting with "clean". Look for
them in the Makefile. To make "clean" you would type "make clean".
The target .valgrind is not something you would invoke from the command line;
you would type "make valgrind" (the target is defined in kaldi.mk).
Invoke all of these targets, i.e. type "make clean" and the same for the others,
and notice what commands are issued when you do this.
In the Makefile in the base/ directory: choose one of the binaries
listed in TESTFILES, and run it. Then briefly view the corresponding .cc file.
The math one is a good example (note: this excludes the majority of math functions
......@@ -182,12 +182,12 @@ code, cd to ../util, and view text-utils.h. Notice that the inputs of these
functions are always first, and are generally const references, while the
outputs (or inputs that are modified) are always last, and are pointer arguments. Non-const references
as function arguments are not allowed. You can read more about the Kaldi-specific
elements of the coding style \ref style "here" later if you are interested.
For now, just be aware that there is a coding style with quite specific rules.
Change directory to ../gmmbin and type
\verbatim
./gmm-init-model
\endverbatim
It prints out the usage, which should give you a generic idea of how Kaldi programs
are called. Note that while there is a --config option that can be used to
......@@ -195,7 +195,7 @@ pass a configuration file, in general Kaldi is not as config-driven as HTK and t
files are not widely used. You will see a --binary option. In general, Kaldi file
formats come in both binary and text forms, and the --binary option controls how
they are written. However, this only controls how single objects (e.g. acoustic models)
are written. For whole collections of objects (e.g. collections of feature files),
there is a different mechanism that we will come to later.
Type
\verbatim
......@@ -205,7 +205,7 @@ What do you see, and what does this tell you about what Kaldi does with logging-
output? The place that the usage message goes is the same place that all error and
logging messages go, and there is a reason for this, which should become apparent
when you start looking at the scripts.
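If you want to see the effect of the two output streams in isolation, here is a small sketch. It does not call any Kaldi program: <DFN>show_usage</DFN> is a hypothetical stand-in that, like the usage message described above, writes only to the standard error stream.

```shell
#!/bin/sh
# Hypothetical stand-in for a program that prints its usage message on stderr.
show_usage() {
  echo "Usage: example-tool [options] <input> <output>" >&2
  return 1
}

# Capture standard output only; standard error is thrown away.
stdout_text=$(show_usage 2>/dev/null) || true

# Capture standard error only; standard output is thrown away.
stderr_text=$(show_usage 2>&1 >/dev/null) || true

echo "stdout: '$stdout_text'"
echo "stderr: '$stderr_text'"
```

Because the message never reaches standard output, a program written this way can send its real output down a pipe while its messages go to the terminal or to a separate log file.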
To get a little insight into the build process, cd to ../matrix, and type
\verbatim
rm *.o
......@@ -234,7 +234,7 @@ a build process, one solution is to try modifying kaldi.mk by hand. In order to
probably understand how Kaldi makes use of external math libraries (see \ref matrixwrap).
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_svn "Previous: Version control with Subversion" <BR>
\ref tutorial_git "Previous: Version control with Git" <BR>
\ref tutorial_running "Next: Running the example scripts" <BR>
<P>
*/
......@@ -43,11 +43,11 @@
Management (RM) CDs from the Linguistic Data Consortium (LDC), in the original form
as distributed by the LDC. That is, we assume this data is sitting on your system
somewhere. We obtained this as catalog number LDC93S3A. It is
also available in two separate pieces. Be careful because there was previously
a different distribution of the RM data with a different layout.
The system requirements are fairly basic. We assume that you have tools
including wget, svn, awk, perl and so on, or that you know how to install them.
including wget, git, awk, perl and so on, or that you know how to install them.
The most difficult part of the installation process relates to the math library
ATLAS; if this is not already installed as a library on your system you will
have to compile it, and this requires that CPU throttling be turned off, which
......@@ -61,7 +61,7 @@
to try to keep to the posted schedule, if necessary by skipping steps and avoiding
following links to more information that we provide in the text. This will help ensure
that you get a balanced overview. You can always review the material in more
detail later on. If this tutorial is to be given in a classroom setting, it is
important that someone run through the tutorial on the relevant system beforehand in order
to verify that all the prerequisites are installed.
......
......@@ -22,7 +22,7 @@
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_prereqs "Previous: Prerequisites" <BR>
\ref tutorial_svn "Next: Version control with Subversion" <BR>
\ref tutorial_git "Next: Version control with Git" <BR>
The first step is to download and install Kaldi. We will be using version 1 of
the toolkit, so that this tutorial does not get out of date. However, be aware
......@@ -32,28 +32,28 @@
"s3" scripts mentioned in this tutorial. But be aware that if you do that some
aspects of the tutorial may be out of date.
Assuming Subversion (svn) is installed, to get the latest code you can type
Assuming Git is installed, to get the latest code you can type
\verbatim
svn co svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk
git clone https://github.com/kaldi-asr/kaldi.git kaldi-trunk --origin golden
\endverbatim
Then cd to kaldi-trunk. Look at the INSTALL file and follow the instructions
(it points you to two subdirectories). Look carefully at the output of the
installation scripts, as they try to tell you what to do. Some installation
errors are non-fatal, and the installation scripts will tell you so (i.e. there
are some things it installs which are nice to have but are not really needed).
The "best-case" scenario is that you do:
\verbatim
cd kaldi-trunk/tools/; make; cd ../src; ./configure; make
\endverbatim
and everything will just work; however, if this does not happen there are
fallback plans (e.g. you may have to install some package on your machine, or run
install_atlas.sh in tools/, or run some steps in tools/INSTALL manually,
or provide options to the configure script in src/). If there are problems,
there may be some information in \ref build_setup that will help you; otherwise,
feel free to contact the maintainers (\ref other) and we will be happy to help.
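Note that the one-line "best-case" command above separates its steps with ";", so a later step still runs even if an earlier one failed. The sketch below chains the same steps with "&&" so that the build stops at the first failure. It is an illustration only, not the project's official procedure; the function name build_kaldi is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with Kaldi): run the "best-case" build
# steps from the text, stopping at the first step that fails.
# $1 is the checkout directory (kaldi-trunk in the text above).
build_kaldi() {
  root=$1
  # Build the external tools first; the subshell keeps the caller's
  # working directory unchanged.
  (cd "$root/tools" && make) || { echo "tools build failed" >&2; return 1; }
  # Then configure and build the Kaldi sources.
  (cd "$root/src" && ./configure && make) || { echo "src build failed" >&2; return 1; }
}

# Usage: build_kaldi kaldi-trunk
```

If a step fails, the message on standard error says whether the problem was in tools/ or src/, which indicates which of the fallback plans to try.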
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_prereqs "Previous: Prerequisites" <BR>
\ref tutorial_svn "Next: Version control with Subversion" <BR>
\ref tutorial_git "Next: Version control with Git" <BR>
<P>
*/
// doc/tutorial_svn.dox
// Copyright 2009-2011 Microsoft Corporation
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
/**
\page tutorial_svn Kaldi Tutorial: Version control with Subversion (5 minutes)
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_setup "Previous: Getting started" <BR>
\ref tutorial_looking "Next: Overview of the distribution" <BR>
In case you are unfamiliar with the Subversion (svn) version control system, we
give a brief overview of some commands that might be useful to you. Subversion commands
always look like: "svn [command] [arguments]"; you can do "svn help" to see what
commands are available, or "svn help <command>" for help on a specific command.
In kaldi-1 or any subdirectory, type
\verbatim
svn up
\endverbatim
(this is short for "svn update"). If we have committed changes to the repository
in the several minutes since you installed Kaldi, you should see output like
the following:
\verbatim
kaldi-1: svn update
U src/lat/Makefile
U src/nnetbin/nnet-forward.cc
Updated to revision 191.
\endverbatim
More likely, it will just say something like "At revision 191."
To see if you have made any changes to anything, type
\verbatim
svn status
\endverbatim
This will list files that you have changed or added. Files that are present in
the directories but are not under version control (because you have not run
"svn add" on them) will appear with the descriptor '?'; all the binaries that
were compiled will show up this way. Next, edit a version-controlled file (for example,
src/Makefile; add a comment or something), and type
\verbatim
svn diff
\endverbatim
This should show how your version differs from the copy that you downloaded.
If you are going to be
contributing to the Kaldi project (and we do welcome new contributors),
then you should become familiar with other commands such
as "svn add", "svn commit" and so on. For this, there are tutorials available
online.
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_setup "Previous: Getting started" <BR>
\ref tutorial_looking "Next: Overview of the distribution" <BR>
<P>
*/