Commit 08024e90 authored by kkm

Documentation changes: added Git tutorial, removed Subversion tutorial and updated multiple references from Subversion to Git and from SourceForge to Kaldi's own web site or GitHub as appropriate.
parent 1d24e8c9
......@@ -134,12 +134,9 @@
Note: after an early phase in which we intended to use version numbers for
major releases of Kaldi ("v1" and so on), we realized that these types of
releases do not mesh well with the natural style of development, which is very
continuous. Currently we maintain two major versions of Kaldi: the "trunk"
version, and the "complete" version (which maintains some little-used features
that were deleted from trunk). We also maintain various sandboxes for feature
development; these are merged back into trunk when the feature is complete.
For most purposes, the "trunk" is the version you should use, and you should
frequently do "svn up" to keep it up to date; see \ref install for more details.
continuous. Currently we maintain only the "master" development branch, and
this is the version you should use. You should also
frequently do "git pull" to keep it up to date; see \ref install for more details.
See \ref roadmap for details of features we are currently working on.
......
......@@ -73,7 +73,7 @@
order to install Kaldi. The full list is not important since the installation
scripts will tell you what you are missing.
- Subversion (svn): this is needed to download Kaldi and other software that it depends on.
- Git: this is needed to download Kaldi and other software that it depends on.
- wget is required for the installation of some non-Kaldi components described below
- The example scripts require standard UNIX utilities such as bash,
perl, awk, grep, and make.
......
......@@ -22,54 +22,40 @@
\page install Downloading and installing Kaldi
\section install_transition Transition to github
\section install_download Downloading Kaldi
Due to the long recent sourceforge outage, we have now transitioned to
github for all future development. We still intend to maintain a
read-only subversion mirror of the github parent, located at sourceforge and mirrored
by us; however, we won't be able to set that up until Sourceforge comes back up.
We have now transitioned to
GitHub for all future development. We still intend to maintain a
read-only Subversion mirror of the GitHub parent, located at SourceForge and mirrored
by us.
While sourceforge is still down, the easiest way to access Kaldi is as follows:
\verbatim
git clone https://github.com/kaldi-asr/kaldi.git
\endverbatim
You can then keep it up-to-date using "git pull".
When sourceforge comes back up, you will be free to access it either through github
or through the subversion commands below.
If you want to contribute to Kaldi, this will mostly be done using pull requests.
You would first log in to github and go to https://github.com/kaldi-asr/kaldi and click on
"fork" to fork the repository. Then, in your local fork of the repository you would
do your work in a differently named branch, and generate a pull request through the
online interface of github. We will soon provide more detailed instructions on this.
\section install_download Downloading Kaldi (old instructions)
You first need to install Subversion (SVN). The most current version of Kaldi,
You first need to install Git. The most current version of Kaldi,
possibly including unfinished and experimental features, can
be downloaded by typing into a shell:
\verbatim
svn co https://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk
git clone https://github.com/kaldi-asr/kaldi.git kaldi-trunk --origin golden
cd kaldi-trunk
\endverbatim
If you want to get updates and bug fixes you can go to some checked-out
directory, and type
\verbatim
svn update
git pull
\endverbatim
If "svn update" prints out scary looking messages about conflicts (caused by
you changing parts of files that were later modified centrally),
you may have to resolve the conflicts; for that, we recommend that you
read about how svn works.
If "git pull" prints out a message telling you that it cannot pull the remote
changes because you have changed files locally,
you may have to commit locally and merge your changes, or stash them temporarily
and then re-apply the stash; for that, we recommend that you
read about how Git works, possibly starting with the \ref tutorial_git.
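The stash route can be sketched as follows. This is only an illustration under stated assumptions, not part of the Kaldi instructions: it builds a throwaway repository in a temporary directory, and the names \c upstream, \c work and \c README are hypothetical stand-ins, so it can be run without touching a real checkout.

```shell
#!/usr/bin/env bash
# Sketch: stash local edits, pull, then re-apply the stash.
# "upstream" is a hypothetical stand-in for the Kaldi repository.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

# Stand-in upstream repository with one commit on master.
git init -q "$sandbox/upstream"
git -C "$sandbox/upstream" symbolic-ref HEAD refs/heads/master
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > "$sandbox/upstream/README"
git -C "$sandbox/upstream" add README
git -C "$sandbox/upstream" commit -q -m 'Initial commit'

# Your clone, with an uncommitted edit to the first line.
git clone -q --origin golden "$sandbox/upstream" "$sandbox/work"
printf 'ALPHA (local edit)\nbeta\ngamma\ndelta\nepsilon\n' > "$sandbox/work/README"

# Meanwhile, upstream changes the last line of the same file.
printf 'alpha\nbeta\ngamma\ndelta\nEPSILON (upstream edit)\n' > "$sandbox/upstream/README"
git -C "$sandbox/upstream" commit -q -am 'Upstream change'

cd "$sandbox/work"
# The pull is refused because it would overwrite the locally edited file.
git pull -q --ff-only golden master 2>/dev/null || echo "pull refused; stashing"
git stash -q                          # set the local edit aside
git pull -q --ff-only golden master   # now the update goes through
git stash pop -q                      # re-apply the local edit on top
cat README
```

The two edits touch different parts of the file, so the stash re-applies cleanly; if they overlapped, <tt>git stash pop</tt> would report a conflict that you would have to resolve by hand.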
\section install_install Installing Kaldi
The top-level installation instructions are in the file INSTALL.
For Windows, there are separate instructions (unfortunately, not actively maintained and woefully out of date)
in windows/INSTALL.
The top-level installation instructions are in the file \c INSTALL.
For Windows, there are separate instructions (unfortunately, not actively
maintained and woefully out of date) in \c windows/INSTALL.
See also \ref build_setup which explains how the build process
works internally.
The example scripts are in egs/
The example scripts are in \c egs/
*/
......@@ -118,7 +118,7 @@
a kind of shorthand for the whole multiple-header thing (this is
explained in the COPYING file). The way you can disambiguate
between joint copyright ownership and derivative work, is to
go back in the version history in subversion, and see what the original
go back in the version history in Git, and see what the original
release contained. We guess that most people won't care about
this distinction, which is why we have not bothered to disambiguate it.
For shell and perl scripts and other non-C++ content
......
......@@ -23,18 +23,15 @@
\page other Other Kaldi-related resources (and how to get help)
The main places where Kaldi knowledge can be found are this website,
and in the code repository (which we are currently in the process of moving
from subversion to git; see \ref install for instructions).
and in the code repository (see \ref install for instructions).
The repository contains the Kaldi code; the installation scripts;
and example scripts for a number of different datasets, which are located
in the sub-directory egs/.
in the sub-directory \c egs/.
Kaldi's <a href=http://sourceforge.net/projects/kaldi/>project page on Sourceforge</a> contains
a number of useful resources, but after the recent extended outage we are migrating away from
Sourceforge. <a href=http://kaldi-asr.org/>kaldi-asr.org/</a> is now the top-level
location you should go to; see in particular information about help forums and email
lists at <a href=http://kaldi-asr.org/forums.html>kaldi-asr.org/forums.html</a>.
Kaldi's <a href="http://kaldi-asr.org/">project page</a> contains
a number of useful resources; see in particular information about help forums and email
lists at <a href="http://kaldi-asr.org/forums.html">kaldi-asr.org/forums.html</a>.
......
......@@ -22,7 +22,7 @@
- \subpage tutorial_prereqs "Prerequisites"
- \subpage tutorial_setup "Getting started" (15 minutes)
- \subpage tutorial_svn "Version control with Subversion" (5 minutes)
- \subpage tutorial_git "Version control with Git" (5 minutes)
- \subpage tutorial_looking "Overview of the distribution" (25 minutes)
- \subpage tutorial_running "Running the example scripts" (40 minutes)
- \subpage tutorial_code "Reading and modifying the code" (30 minutes)
......
......@@ -169,6 +169,7 @@ If you need to debug a program that takes command-line arguments, you can do it
\endverbatim
or you can invoke gdb without arguments and then type "r arg1 arg2..." at the prompt.
\todo This paragraph is full of lies!
When you are done, and it compiles, type
\verbatim
svn diff
......
// doc/tutorial_git.dox
// Copyright 2015 Smart Action Company LLC
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
/**
\page tutorial_git Kaldi Tutorial: Version control with Git (5 minutes)
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_setup "Previous: Getting started" <BR>
\ref tutorial_looking "Next: Overview of the distribution" <BR>
Git is a distributed version control system. This means that, unlike
Subversion, there are multiple copies of the repository, and changes are
transferred between these copies explicitly, in several different ways. Most
of the time, though, your work is backed by a single copy of the repository.
Because of this multiplicity of copies, there are multiple possible \em workflows
that you may want to follow. Here's one we think best suits you if you just want to
<i>compile and use</i> Kaldi at first, but then at some point optionally decide
to \em contribute your work back to the project.
\section tutorial_git_git_setup First-time Git setup
If you have never used Git before,
<a href="https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup">
perform some minimal configuration first</a>. At the very least, set up your
name and e-mail address:
\verbatim
$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com
\endverbatim
Also, set up short aliases for the Git commands you type most often.
\verbatim
$ git config --global alias.co checkout
$ git config --global alias.br branch
$ git config --global alias.st status
\endverbatim
Another very useful utility is <tt>git-prompt.sh</tt>,
a bash prompt extension utility for Git (if you do not have it,
search the internet for how to install it on your system).
When installed, it provides a shell function \c __git_ps1 that,
when added to the prompt,
expands into the current branch name and pending commit markers,
so you do not forget where you are.
You may modify your \c PS1 shell variable so that it includes literally
<tt>$(__git_ps1 "[%s]")</tt>.
I have this in my \c ~/.bashrc:
\code{.sh}
PS1='\[\033[00;32m\]\u@\h\[\033[0m\]:\[\033[00;33m\]\w\[\033[01;36m\]$(__git_ps1 "[%s]")\[\033[01;33m\]\$\[\033[00m\] '
export GIT_PS1_SHOWDIRTYSTATE=true GIT_PS1_SHOWSTASHSTATE=true
# fake __git_ps1 when git-prompt.sh is not installed
if [ -z "$(type -t __git_ps1)" ]; then
function __git_ps1() { :; }
fi
\endcode
\section tutorial_git_workflow The User Workflow
Set up your repository and the working directory with this command:
\verbatim
kkm@yupana:~$ git clone https://github.com/kaldi-asr/kaldi.git --branch master --single-branch --origin golden
Cloning into 'kaldi'...
remote: Counting objects: 51770, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 51770 (delta 2), reused 0 (delta 0), pack-reused 51762
Receiving objects: 100% (51770/51770), 67.72 MiB | 6.52 MiB/s, done.
Resolving deltas: 100% (41117/41117), done.
Checking connectivity... done.
kkm@yupana:~$ cd kaldi/
kkm@yupana:~/kaldi[master]$
\endverbatim
Now, you are ready to configure and compile Kaldi and work with it.
Once in a while you will want the latest changes in your local branch.
This is akin to what you used to do with <tt>svn update</tt>.
But first, let's agree on one thing:
you do not commit any files on the master branch.
So far, you are only using the code; we'll get to making changes below.
If you do not follow this rule, things will be hard to untangle later,
and branching in Git is so easy
that you should always do your work on a branch.
\verbatim
kkm@yupana:~/kaldi[master]$ git pull golden
remote: Counting objects: 148, done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 148 (delta 111), reused 130 (delta 93), pack-reused 0
Receiving objects: 100% (148/148), 18.39 KiB | 0 bytes/s, done.
Resolving deltas: 100% (111/111), completed with 63 local objects.
From https://github.com/kaldi-asr/kaldi
658e1b4..827a5d6 master -> golden/master
\endverbatim
The command you use is <tt>git pull</tt>,
and \c golden is the name we gave to the main replica of the Kaldi
repository when cloning it (with <tt>--origin golden</tt>).
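If you want to check that your master really carries no commits of its own before updating it, something like the following sketch may help. It is an illustration only: the sandbox repository and the names \c upstream, \c kaldi and \c file.txt are hypothetical, and the ahead/behind check is our suggestion rather than a step the workflow requires.

```shell
#!/usr/bin/env bash
# Sketch: confirm master has no local commits, then fast-forward it.
# "upstream" is a hypothetical stand-in for the golden repository.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

git init -q "$sandbox/upstream"
git -C "$sandbox/upstream" symbolic-ref HEAD refs/heads/master
echo 'first' > "$sandbox/upstream/file.txt"
git -C "$sandbox/upstream" add file.txt
git -C "$sandbox/upstream" commit -q -m 'Initial commit'
git clone -q --origin golden "$sandbox/upstream" "$sandbox/kaldi"

# Upstream moves on after the clone was made.
echo 'second' >> "$sandbox/upstream/file.txt"
git -C "$sandbox/upstream" commit -q -am 'Upstream change'

cd "$sandbox/kaldi"
git fetch -q golden
ahead=$(git rev-list --count golden/master..master)    # local-only commits
behind=$(git rev-list --count master..golden/master)   # commits to pick up
echo "master: $ahead ahead of, $behind behind golden/master"
if [ "$ahead" -eq 0 ]; then
  git merge -q --ff-only golden/master   # safe: master only moves forward
else
  echo "master has local commits; move them to a branch first" >&2
fi
```

A fast-forward-only merge never creates a merge commit, so master stays an exact copy of \c golden/master as long as the rule above is followed.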
\section tutorial_git_contributor From User To Contributor
At some point you decide to change Kaldi code,
be it scripts or source. Maybe you made a simple bug fix.
Maybe you are contributing a whole recipe. In any case,
you always do your work on a branch.
Even if you have uncommitted changes, Git handles that.
For example, you just realized that the \c fisher_english recipe does not
actually make use of \c hubscr.pl for scoring, but checks it exists and
fails.
You quickly fixed that in your work tree and want to share this change
with the project.
\subsection tutorial_git_branch Work locally on a branch
\verbatim
kkm@yupana:~/kaldi[master *]$ git fetch golden
kkm@yupana:~/kaldi[master *]$ git co golden/master -b fishfix --no-track
M egs/fisher_english/s5/local/score.sh
Switched to a new branch 'fishfix'
kkm@yupana:~/kaldi[fishfix *]$
\endverbatim
What we did here is first \em fetch the current changes from the golden
repository to your machine.
This did not update your master
(in fact, a pull can be refused when you have local worktree changes),
but it did update the remote reference \c golden/master.
In the second command, we forked off a branch in your local repository,
called \c fishfix.
Would it have been more logical to branch off \c master? Not at all!
First, that is one more operation. You do not \em need to update the master, so
why would you? Second, we agreed (remember?) that master will have no changes,
and you had some. Third, and believe me, this happens, you might have committed
something to your master by mistake, and you do not want to bring this feral
change into your new branch.
Now you examine your changes, and, since they are good, you commit them:
\code{.diff}
kkm@yupana:~/kaldi[fishfix *]$ git diff
diff --git a/egs/fisher_english/s5/local/score.sh b/egs/fisher_english/s5/local/score.sh
index 60e4706..552fada 100755
--- a/egs/fisher_english/s5/local/score.sh
+++ b/egs/fisher_english/s5/local/score.sh
@@ -27,10 +27,6 @@ dir=$3
model=$dir/../final.mdl # assume model one level up from decoding dir.
-hubscr=$KALDI_ROOT/tools/sctk/bin/hubscr.pl
-[ ! -f $hubscr ] && echo "Cannot find scoring program at $hubscr" && exit 1;
-hubdir=`dirname $hubscr`
-
for f in $data/text $lang/words.txt $dir/lat.1.gz; do
[ ! -f $f ] && echo "$0: expecting file $f to exist" && exit 1;
done
kkm@yupana:~/kaldi[fishfix *]$ git commit -am 'fisher_english scoring does not really need hubscr.pl from sctk.'
[fishfix d7d76fe] fisher_english scoring does not really need hubscr.pl from sctk.
1 file changed, 4 deletions(-)
kkm@yupana:~/kaldi[fishfix]$
\endcode
Note that the \c -a switch to <tt>git commit</tt> makes it commit all modified
files (we had only one changed, so why not?). If you want to separate file
modifications into multiple features to submit separately, use <tt>git add</tt>
on specific files followed by <tt>git commit</tt> without the \c -a switch, and
then start another branch off the same point as the first one for the next fix:
<tt>git co golden/master -b another-fix --no-track</tt>, where you can add and
commit other changed files. With Git, it is not uncommon to have a dozen
branches going. Remember that it is extremely easy to combine multiple feature
branches into one, but splitting one large changeset into many smaller features
involves more work.
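Splitting two unrelated edits into separate branches might look like the sketch below. It is an illustration under stated assumptions: the sandbox repository, the files \c a.txt and \c b.txt and the branch names \c fix-a and \c fix-b are all hypothetical, and it spells out <tt>git checkout</tt> because the \c co alias is not configured inside the sandbox.

```shell
#!/usr/bin/env bash
# Sketch: put two unrelated work-tree edits on two separate branches,
# both forked from golden/master. All names here are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

git init -q "$sandbox/upstream"
git -C "$sandbox/upstream" symbolic-ref HEAD refs/heads/master
echo 'recipe a' > "$sandbox/upstream/a.txt"
echo 'recipe b' > "$sandbox/upstream/b.txt"
git -C "$sandbox/upstream" add a.txt b.txt
git -C "$sandbox/upstream" commit -q -m 'Initial commit'
git clone -q --origin golden "$sandbox/upstream" "$sandbox/work"

cd "$sandbox/work"
# Two unrelated, uncommitted edits in the work tree.
echo 'fix for a' >> a.txt
echo 'fix for b' >> b.txt

git fetch -q golden
# First branch: stage and commit only a.txt.
git checkout -q -b fix-a --no-track golden/master
git add a.txt
git commit -q -m 'Fix a'

# Second branch from the same point; the b.txt edit is carried along.
git checkout -q -b fix-b --no-track golden/master
git add b.txt
git commit -q -m 'Fix b'

# Each branch differs from golden/master in exactly one file.
git diff --name-only golden/master fix-a
git diff --name-only golden/master fix-b
```

Because each branch starts from \c golden/master, either one can be submitted and merged without waiting for the other.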
Now you need to create a pull request to the maintainers of Kaldi, so that they
can pull the change from your repository. For that, <i>your repository</i> needs
to be available online to them. And for that, you need a GitHub account.
\subsection tutorial_git_github_setup One-time GitHub setup
\li Go to the <a href="https://github.com/kaldi-asr/kaldi">main Kaldi repository
page</a> and click on the Fork button. If you do not have an account, GitHub
will lead you through the necessary steps.
\li <a href="https://help.github.com/articles/generating-ssh-keys/">Generate and
register an SSH key</a> with GitHub so that GitHub can identify you. Everyone
can read everything on GitHub, but only you can write to your forked repository!
\subsection pull_request Creating a pull request
Make sure your fork is registered under the name \c origin (the name is
arbitrary; this is what we'll use here). If not, add it. The URL is listed on
your repository page under "SSH clone URL", and looks like
<tt>git@github.com:YOUR_USER_NAME/kaldi.git</tt>.
\verbatim
kkm@yupana:~/kaldi[fishfix]$ git remote -v
golden https://github.com/kaldi-asr/kaldi.git (fetch)
golden https://github.com/kaldi-asr/kaldi.git (push)
kkm@yupana:~/kaldi[fishfix]$ git remote add origin git@github.com:kkm000/kaldi.git
kkm@yupana:~/kaldi[fishfix]$ git remote -v
golden https://github.com/kaldi-asr/kaldi.git (fetch)
golden https://github.com/kaldi-asr/kaldi.git (push)
origin git@github.com:kkm000/kaldi.git (fetch)
origin git@github.com:kkm000/kaldi.git (push)
\endverbatim
Now push the branch into your fork of Kaldi:
\verbatim
kkm@yupana:~/kaldi[fishfix]$ git push origin HEAD -u
Counting objects: 632, done.
Delta compression using up to 12 threads.
Compressing objects: 100% (153/153), done.
Writing objects: 100% (415/415), 94.45 KiB | 0 bytes/s, done.
Total 415 (delta 324), reused 326 (delta 262)
To git@github.com:kkm000/kaldi.git
* [new branch] HEAD -> fishfix
Branch fishfix set up to track remote branch fishfix from origin.
\endverbatim
\c HEAD in <tt>git push</tt> tells Git "create a branch in the remote repo with
the same name as the current branch", and \c -u remembers the connection between
your local branch \c fishfix and \c origin/fishfix in your repository.
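The effect of \c -u can be checked locally with a sketch like the one below. It is an illustration only: a bare repository in a temporary directory plays the role of your GitHub fork, and the names \c upstream, \c fork.git and \c file.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: push a branch to a fork with -u and inspect the tracking setup.
# A local bare repository stands in for the GitHub fork (hypothetical).
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

git init -q "$sandbox/upstream"
git -C "$sandbox/upstream" symbolic-ref HEAD refs/heads/master
echo 'first' > "$sandbox/upstream/file.txt"
git -C "$sandbox/upstream" add file.txt
git -C "$sandbox/upstream" commit -q -m 'Initial commit'
git init -q --bare "$sandbox/fork.git"
git clone -q --origin golden "$sandbox/upstream" "$sandbox/work"

cd "$sandbox/work"
git checkout -q -b fishfix --no-track golden/master
echo 'fix' >> file.txt
git commit -q -am 'A small fix'

git remote add origin "$sandbox/fork.git"
git push -q -u origin HEAD     # creates fishfix in the fork and tracks it

# The branch now has an upstream, so this resolves to origin/fishfix.
git rev-parse --abbrev-ref 'fishfix@{upstream}'

# A later follow-up commit then needs only a plain push.
echo 'follow-up' >> file.txt
git commit -q -am 'Follow-up change'
git push -q
```

The plain <tt>git push</tt> at the end relies on the tracking information recorded by \c -u and on the default push behavior of recent Git versions; with a customized \c push.default setting you may need to name the remote and branch explicitly.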
Now go to your repository page and
<a href="https://help.github.com/articles/creating-a-pull-request/">create a
pull request</a>.
<a href="https://github.com/kaldi-asr/kaldi/pull/31">Examine your changes</a>,
and submit the request if everything looks good. The maintainers will receive
the request and either accept it or comment on it.
Follow the comments, commit fixes on your branch, push to \c origin again, and
GitHub will automatically update the pull request web page.
Then reply, e.g., "Done" under the comments that you received, so that they know
you followed up on their comments.
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_setup "Previous: Getting started" <BR>
\ref tutorial_looking "Next: Overview of the distribution" <BR>
<P>
*/
......@@ -21,7 +21,7 @@
\page tutorial_looking Kaldi tutorial: Overview of the distribution (20 minutes)
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_svn "Previous: Version control with Subversion" <BR>
\ref tutorial_git "Previous: Version control with Git" <BR>
\ref tutorial_running "Next: Running the example scripts" <BR>
Before we jump into the example scripts, let us take a few minutes to look at what
......@@ -234,7 +234,7 @@ a build process, one solution is to try modifying kaldi.mk by hand. In order to
probably understand how Kaldi makes use of external math libraries (see \ref matrixwrap).
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_svn "Previous: Version control with Subversion" <BR>
\ref tutorial_git "Previous: Version control with Git" <BR>
\ref tutorial_running "Next: Running the example scripts" <BR>
<P>
*/
......@@ -47,7 +47,7 @@
a different distribution of the RM data with a different layout.
The system requirements are fairly basic. We assume that you have tools
including wget, svn, awk, perl and so on, or that you know how to install them.
including wget, git, awk, perl and so on, or that you know how to install them.
The most difficult part of the installation process relates to the math library
ATLAS; if this is not already installed as a library on your system you will
have to compile it, and this requires that CPU throttling be turned off, which
......
......@@ -22,7 +22,7 @@
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_prereqs "Previous: Prerequisites" <BR>
\ref tutorial_svn "Next: Version control with Subversion" <BR>
\ref tutorial_git "Next: Version control with Git" <BR>
The first step is to download and install Kaldi. We will be using version 1 of
the toolkit, so that this tutorial does not get out of date. However, be aware
......@@ -32,9 +32,9 @@
"s3" scripts mentioned in this tutorial. But be aware that if you do that some
aspects of the tutorial may be out of date.
Assuming Subversion (svn) is installed, to get the latest code you can type
Assuming Git is installed, to get the latest code you can type
\verbatim
svn co svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk
git clone https://github.com/kaldi-asr/kaldi.git kaldi-trunk --origin golden
\endverbatim
Then cd to kaldi-trunk. Look at the INSTALL file and follow the instructions
(it points you to two subdirectories). Look carefully at the output of the
......@@ -54,6 +54,6 @@
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_prereqs "Previous: Prerequisites" <BR>
\ref tutorial_svn "Next: Version control with Subversion" <BR>
\ref tutorial_git "Next: Version control with Git" <BR>
<P>
*/
// doc/tutorial_svn.dox
// Copyright 2009-2011 Microsoft Corporation
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
/**
\page tutorial_svn Kaldi Tutorial: Version control with Subversion (5 minutes)
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_setup "Previous: Getting started" <BR>
\ref tutorial_looking "Next: Overview of the distribution" <BR>
In case you are unfamiliar with the Subversion (svn) version control system, we
give a brief overview of some commands that might be useful to you. Subversion commands
always look like: "svn [command] [arguments]"; you can do "svn help" to see what
commands are available, or "svn help <command>" for help on a specific command.
In kaldi-1 or any subdirectory, type
\verbatim
svn up
\endverbatim
(this is short for "svn update"). If we have committed changes to the repository
in the several minutes since you installed Kaldi, you should see output like
the following:
\verbatim
kaldi-1: svn update
U src/lat/Makefile
U src/nnetbin/nnet-forward.cc
Updated to revision 191.
\endverbatim
More likely, it will just say something like "At revision 191."
To see if you have made any changes to anything, type
\verbatim
svn status
\endverbatim
This will
list files that you changed or that have been added. Files that have been added
to the directories but are not under version control because you have not used the
"svn add" command, will appear with the descriptor '?' (you will see all the
binaries that were compiled). Next, edit a version-controlled file (for example,
src/Makefile; add a comment or something), and type
\verbatim
svn diff
\endverbatim
This should show how your version differs from the copy that you downloaded.
If you are going to be
contributing to the Kaldi project (and we do welcome new contributors),
then you should become familiar with other commands such
as "svn add", "svn commit" and so on. For this, there are tutorials available
online.
\ref tutorial "Up: Kaldi tutorial" <BR>
\ref tutorial_setup "Previous: Getting started" <BR>
\ref tutorial_looking "Next: Overview of the distribution" <BR>
<P>
*/