# linstt-offline-dispatch

This project aims to build a speech-to-text transcriber web service based on kaldi-offline-decoding.

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

The project is divided into 3 modules:
- [worker_offline] is the module in charge of the ASR (automatic speech recognition).
- [master_server] is the webserver that provides the ASR service.
- [client] is a simple client meant to transcribe an audio file. 

### Prerequisites

#### Python 2.7
This project runs on Python 2.7.
In order to run the [master_server] and the [client] you will need to install the following Python libraries:
- tornado>=4.5.2
- ws4py
```
pip install ws4py 
pip install tornado
```
Or

```
pip install -r requirements.txt
```
within the modules/server folder.

#### Kaldi model
The ASR server that will be set up here requires a Kaldi model; note that the model is not included in the repository.
You must have this model on your machine, and you must check that it contains the specific files below:
- final.alimdl
- final.mat
- final.mdl
- splice_opts
- tree
- Graph/HCLG.fst
- Graph/disambig_tid.int
- Graph/num_pdfs
- Graph/phones.txt
- Graph/words.txt
- Graph/phones/*
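
A quick way to sanity-check a model directory before pointing the worker at it is a short shell helper. This is a hypothetical snippet, not part of the repository; it verifies the files listed above are present (the `Graph/phones/*` contents are not checked here):

```shell
# check_model: hypothetical helper that verifies a Kaldi model directory
# contains the files the worker expects. Prints each missing file and
# returns non-zero if anything is absent.
check_model() {
    model_dir="$1"
    missing=0
    for f in final.alimdl final.mat final.mdl splice_opts tree \
             Graph/HCLG.fst Graph/disambig_tid.int Graph/num_pdfs \
             Graph/phones.txt Graph/words.txt; do
        if [ ! -e "$model_dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_model ~/speech/models/mymodel && echo "model looks complete"`.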

#### Docker
You must install docker on your machine. Refer to [docker doc](https://docs.docker.com/engine/installation)
```
# Debian/Ubuntu (the Docker package is named docker.io)
apt-get install docker.io
# Arch Linux (via an AUR helper)
yaourt -S docker
```

### Installing
You need to build the Docker image first.
Go to modules/worker_offline and build the image.
```
cd modules/worker_offline
docker build -t linagora/stt-offline .
```
## Running the tests

To run an automated test go to the test folder
``` 
cd tests
```
And run the test script: 
```
./deployement_test.sh <langageModelPath>
```
The test should display "Test succefull".

## Deployment

#### 1- Server
* Configure the server options by editing the server.conf file.
* Launch the server 
```
./master_server.py
``` 
 
#### 2- Worker
You can launch as many workers as you want on any machine that you want.
* Configure the worker by editing the server.conf file: provide the server IP address and server port.
* Launch the worker using the start_docker.sh command
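
The exact option names in server.conf are not documented here, so the fragment below is only a hypothetical sketch of what the worker configuration might look like; check the server.conf shipped with the module for the real key names:

```
# Hypothetical worker configuration sketch -- real key names may differ.
server_ip=192.168.0.10   # IP address of the master server
server_port=8888         # port the master server listens on
```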
```
cd modules/worker_offline
./start_docker.sh <langageModelPath>
```
For example, if your model is located at ~/speech/models/mymodel,
with the mymodel folder containing the following files:
- final.alimdl
- final.mat
- final.mdl
- splice_opts
- tree
- Graph/

```
cd modules/worker_offline
./start_docker.sh ~/speech/models/mymodel/
```

## Built With

* [tornado](http://www.tornadoweb.org/en/stable/index.html) - The web framework used
* [ws4py](https://ws4py.readthedocs.io/en/latest/) - WebSocket interfaces for python

## Authors

* **Abdelwahab Aheba** - *linstt-Offline-Decoding* - [Linagora](https://linagora.com/)
* **Rudy Baraglia** - *linstt-dispatch* - [Linagora](https://linagora.com/)


## License

See the [LICENSE.md](LICENSE.md) file for details.

## Acknowledgments

* The project has been largely inspired by Alumae's (https://github.com/alumae) project kaldi-gstreamer-server (https://github.com/alumae/kaldi-gstreamer-server) and uses chunks of his code.