Commit e7c927e3 authored by Rudy BARAGLIA's avatar Rudy BARAGLIA

Add proper README.md and reorganize repository

parent eba9be52
Copyright (c) 2014, alumae
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# linstt-dispatch
This project aims to build a speech-to-text transcriber web service based on kaldi-offline-decoding.
## Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
The project is divided into three modules:
- [worker_offline] is the module in charge of the ASR (automatic speech recognition).
- [master_server] is the web server that provides the ASR service.
- [client] is a simple client meant to transcribe an audio file.
### Prerequisites
#### Python 2.7
This project runs on Python 2.7.
To run [master_server] and [client], you will need to install the following Python libraries:
- tornado>=4.5.2
- ws4py
```
pip install ws4py
pip install tornado
```
Or, within the modules/server folder:
```
pip install -r requirements.txt
```
#### Kaldi model
The ASR server set up here requires a Kaldi model; note that the model is not included in the repository.
You must have this model on your machine and check that it contains the following files:
- final.alimdl
- final.mat
- final.mdl
- splice_opts
- tree
- Graph/HCLG.fst
- Graph/disambig_tid.int
- Graph/num_pdfs
- Graph/phones.txt
- Graph/words.txt
- Graph/phones/*
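As a sanity check before launching a worker, the list above can be verified with a short script. This is only an illustrative sketch; the `missing_model_files` helper is not part of the repository:
```
import os

# Required model files, as listed above (Graph/phones/* is a directory of extra files).
REQUIRED_FILES = [
    "final.alimdl", "final.mat", "final.mdl", "splice_opts", "tree",
    "Graph/HCLG.fst", "Graph/disambig_tid.int", "Graph/num_pdfs",
    "Graph/phones.txt", "Graph/words.txt",
]

def missing_model_files(model_dir):
    """Return the required files and directories missing from model_dir."""
    missing = [f for f in REQUIRED_FILES
               if not os.path.isfile(os.path.join(model_dir, f))]
    # Graph/phones/* refers to a directory of additional files; check it exists.
    if not os.path.isdir(os.path.join(model_dir, "Graph", "phones")):
        missing.append("Graph/phones/")
    return missing

print(missing_model_files("."))  # an empty list means the model looks complete
```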
#### Docker
You must install docker on your machine. Refer to [docker doc](https://docs.docker.com/engine/installation)
```
apt-get install docker    # Debian/Ubuntu
yaourt -S docker          # Arch Linux
```
### Installing
You need to build the Docker image first.
Go to modules/worker_offline and build the image.
```
cd modules/worker_offline
docker build -t linagora/stt-offline .
```
## Running the tests
To run an automated test, go to the tests folder:
```
cd tests
```
And run the test script:
```
./deployement_test.sh <languageModelPath>
```
The test should display "Test succefull".
## Deployment
#### 1- Server
* Configure the server options by editing the server.conf file.
* Launch the server
```
./master_server.py
```
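For reference, the configuration keys read by the server elsewhere in this repository suggest a server.conf along these lines (file name and values are illustrative):
```
[server_params]
listening_port : 8888
keep_temp_files : false
max_waiting_time : 10
debug : false

[machine_params]
temp_file_location : ./temp_files/
```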
#### 2- Worker
You can launch as many workers as you want, on any machine.
* Configure the worker by editing the server.conf file, providing the server IP address and port.
* Launch the worker using the start_docker.sh script:
```
cd modules/worker_offline
./start_docker.sh <languageModelPath>
```
For example, if your model is located at ~/speech/models/mymodel, with the mymodel folder containing the following files:
- final.alimdl
- final.mat
- final.mdl
- splice_opts
- tree
- graphs/
```
cd modules/worker_offline
./start_docker.sh ~/speech/models/mymodel/
```
## Built With
* [tornado](http://www.tornadoweb.org/en/stable/index.html) - The web framework used
* [ws4py](https://ws4py.readthedocs.io/en/latest/) - WebSocket interfaces for Python
## Authors
* **Abdelwahab Aheba** - *linstt-Offline-Decoding* - Linagora
* **Rudy Baraglia** - *linstt-dispatch* - Linagora
## License
See the [LICENSE.md](LICENSE.md) file for details.
## Acknowledgments
* The project was largely inspired by [Alumae](https://github.com/alumae)'s project [kaldi-gstreamer-server](https://github.com/alumae/kaldi-gstreamer-server) and uses chunks of his code.
Speech-to-Text Offline Decoding
--------
This project aims to build an automatic process for speech recognition from audio files (offline mode) using:
- Speaker diarization: speech activity detection, speech segmentation and speaker identification
- fMLLR decoding: using speaker information to adapt the acoustic model
DockerFile for LinSTT Service
--------
Dockerfile for [Offline-LinSTT](https://ci.linagora.com/aheba/offline-decoding).
This Dockerfile automatically builds an offline Speech-to-Text server using [Kaldi](http://kaldi-asr.org/doc/about.html).
Using this project, you will be able to run an offline Automatic Speech Recognition (ASR) server in a few minutes.
Attention
--------
The ASR server set up here requires a Kaldi model; the Docker image detailed below does not include one.
You must have this model on your machine and check that it contains the following files:
- final.alimdl
- final.mat
- final.mdl
- splice_opts
- tree
- Graph/HCLG.fst
- Graph/disambig_tid.int
- Graph/num_pdfs
- Graph/phones.txt
- Graph/words.txt
- Graph/phones/*
Install docker
---------
Please, refer to [docker doc](https://docs.docker.com/engine/installation).
Get the image
---------
Currently, the Docker image is about 4 GB, is based on Debian 8, and has not yet been pushed to DockerHub.
You need to build your own image:
```
docker build -t linagora/stt-offline .
```
How to use
----------
`start_docker.sh` builds and creates the container, assuming that your Kaldi model is located at `<Path_model>`:
```
./start_docker.sh <Path_model> <Port>
```
The `<Port>` parameter publishes a container port to the host; use the POST method to send a wav file to the server for transcription.
Run Example
----------
Simple call using curl:
```
curl -F "wav_file=@<wav_path>" http://<IP:PORT_service>/upload > <output_trans>
```
The `wav_file` attribute is required to submit the wav file to the server using the POST method.
A client script is available that connects to the server located at `http://localhost:<Port>/upload`:
```
./client/client <wav_path> <IP_server>:<Port> <Output>
```
# Run example
# Args: $1=<Path_wav> $2=<Ip:port_LinSTT_service> $3=<Output_dir>
wav="$1"
IP_service="$2"
curl -F "wav_file=@$wav" "http://$IP_service/upload" > "$3"
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Thu Jan 4 11:10:18 2018
@author: rbaraglia
"""
import requests
import json
import logging
SERVER_ADRESS = u"http://localhost"
SERVER_PORT = u":8888"
SERVER_REQUEST_PATH = u"/client/post/speech"
def main():
    with open('../linSTT-dispatch/tests/mocanu-Samy.wav', 'rb') as f:
        r = requests.post(SERVER_ADRESS+SERVER_PORT+SERVER_REQUEST_PATH, files={'wavFile': f})
        print(type(r))
        print(r.headers)
        print(r.status_code)

if __name__ == '__main__':
    main()
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Thu Jan 4 11:10:18 2018
@author: rbaraglia
"""
import requests
import json
import logging
import argparse
SERVER_IP = u"localhost"
SERVER_PORT = u"8888"
SERVER_TARGET = u"/client/post/speech"
def main():
    parser = argparse.ArgumentParser(description='Client for linstt-dispatch')
    parser.add_argument('-u', '--uri', default="http://"+SERVER_IP+":"+SERVER_PORT+SERVER_TARGET, dest="uri", help="Server address")
    parser.add_argument('audioFile', help="The .wav file to be transcribed")
    args = parser.parse_args()
    with open(args.audioFile, 'rb') as f:
        print("Sending request to transcribe file %s to server at %s" % (args.audioFile, args.uri))
        r = requests.post(args.uri, files={'wavFile': f})
        print(type(r))
        print(r.headers)
        print(r.status_code)

if __name__ == '__main__':
    main()
......@@ -25,6 +25,7 @@ server_settings.read('server.cfg')
SERVER_PORT = server_settings.get('server_params', 'listening_port')
TEMP_FILE_PATH = server_settings.get('machine_params', 'temp_file_location')
KEEP_TEMP_FILE = True if server_settings.get('server_params', 'keep_temp_files') == 'true' else False
LOGGING_LEVEL = logging.DEBUG if server_settings.get('server_params', 'debug') == 'true' else logging.INFO
class Application(tornado.web.Application):
......@@ -40,6 +41,7 @@ class Application(tornado.web.Application):
handlers = [
(r"/", MainHandler),
(r"/client/post/speech", DecodeRequestHandler),
(r"/upload", DecodeRequestHandler),
(r"/worker/ws/speech", WorkerWebSocketHandler)
]
tornado.web.Application.__init__(self, handlers, **settings)
......@@ -47,6 +49,7 @@ class Application(tornado.web.Application):
self.waiting_client = set()
self.num_requests_processed = 0
#TODO: Abort request when the client is waiting for a determined amount of time
def check_waiting_clients(self):
if len(self.waiting_client) > 0:
try:
......@@ -56,6 +59,11 @@ class Application(tornado.web.Application):
else:
client.waitWorker.notify()
def display_server_status(self):
logging.info('#'*50)
logging.info("Available workers: %s" % str(len(self.available_workers)))
logging.info("Waiting clients: %s" % str(len(self.waiting_client)))
logging.info("Requests processed: %s" % str(self.num_requests_processed))
# Return the README
......@@ -120,12 +128,12 @@ class DecodeRequestHandler(tornado.web.RequestHandler):
except:
self.worker = None
self.application.waiting_client.add(self)
logging.debug("Awaiting client: %s" % str(len(self.application.waiting_client)))
self.application.display_server_status()
yield self.waitWorker.wait()
else:
self.worker.client_handler = self
logging.debug("Worker allocated to client %s" % self.uuid)
logging.debug("Available workers: " + str(len(self.application.available_workers)))
self.application.display_server_status()
......@@ -134,6 +142,7 @@ class DecodeRequestHandler(tornado.web.RequestHandler):
logging.debug("Forwarding transcription to client")
self.add_header('result', message)
self.set_status(200, "Transcription succeded")
self.application.num_requests_processed += 1
self.waitResponse.notify()
......@@ -151,7 +160,7 @@ class WorkerWebSocketHandler(tornado.websocket.WebSocketHandler):
self.application.available_workers.add(self)
self.application.check_waiting_clients()
logging.debug("Worker connected")
logging.debug("Available workers: " + str(len(self.application.available_workers)))
self.application.display_server_status()
def on_message(self, message):
try:
......@@ -164,7 +173,7 @@ class WorkerWebSocketHandler(tornado.websocket.WebSocketHandler):
self.client_handler.receive_response(json.dumps({'transcript':json_msg['transcription']}))
self.client_handler = None
self.application.available_workers.add(self)
logging.debug("WORKER Available workers: " + str(len(self.application.available_workers)))
self.application.display_server_status()
self.application.check_waiting_clients()
elif 'error' in json_msg.keys():
......@@ -177,11 +186,11 @@ class WorkerWebSocketHandler(tornado.websocket.WebSocketHandler):
self.client_handler.send_error("Worker closed")
logging.debug("WORKER WebSocket closed")
self.application.available_workers.discard(self)
logging.debug("WORKER Available workers: " + str(len(self.application.available_workers)))
self.application.display_server_status()
def main():
logging.basicConfig(level=logging.DEBUG, format="%(levelname)8s %(asctime)s %(message)s ")
logging.basicConfig(level=LOGGING_LEVEL, format="%(levelname)8s %(asctime)s %(message)s ")
# Check that the temp_file directory exists
if not os.path.isdir(TEMP_FILE_PATH):
os.mkdir(TEMP_FILE_PATH)
......
ws4py
configparser
......@@ -2,6 +2,7 @@
listening_port : 8888
keep_temp_files : false
max_waiting_time : 10
debug : false
[machine_params]
temp_file_location : ../linSTT-dispatch/tests/temp_file/
temp_file_location : ./temp_files/
......@@ -40,7 +40,7 @@ RUN mkdir -p $BASE_DIR
WORKDIR $BASE_DIR
# Install Flask
# Install tornado
COPY requirements.txt .
RUN pip install -r requirements.txt
......@@ -49,4 +49,4 @@ COPY . .
RUN ./deploy-offline-decoding.sh /opt/kaldi /opt/lium_spkdiarization-8.4.1.jar /opt/models
# Set the default command
CMD ./worker.py
CMD ./worker_offline.py
ws4py
configparser
......@@ -41,8 +41,7 @@ class WorkerWebSocket(WebSocketClient):
except:
logging.debug("Message received: %s" % str(m))
else:
if 'uuid' in json_msg.keys(): #Receive the file path to process
if 'uuid' in json_msg.keys():
self.client_uuid = json_msg['uuid']
self.fileName = self.client_uuid.replace('-', '')
self.file = json_msg['file'].decode('base64')
......@@ -54,20 +53,23 @@ class WorkerWebSocket(WebSocketClient):
if PREPROCESSING:
pass
# Offline decoder call
logging.debug(os.listdir('./wavs'))
logging.debug(DECODER_COMMAND + ' ' + TEMP_FILE_PATH + self.fileName+'.wav')
subprocess.call("cd scripts; ./decode.sh ../systems/models "+self.fileName+".wav", shell=True)
# Delete temporary files
# Check
# Check result
if os.path.isfile('trans/decode_'+self.fileName+'.log'):
with open('trans/decode_'+self.fileName+'.log', 'r') as resultFile:
result = resultFile.read()
logging.debug("Transcription is: %s" % result)
self.send_result(result)
else:
logging.error("Worker Failed to create transcription file")
self.send_error("File was not created by worker")
# Delete temporary files
for file in os.listdir(TEMP_FILE_PATH):
os.remove(TEMP_FILE_PATH+file)
def post(self, m):
logging.debug('POST received')
......@@ -75,8 +77,8 @@ class WorkerWebSocket(WebSocketClient):
def send_result(self, result=None):
msg = json.dumps({u'uuid': self.client_uuid, u'transcription':result, u'trust_ind':u"0.1235"})
self.client_uuid = None
# TODO cleanup temp files.
self.send(msg)
def send_error(self, message):
msg = json.dumps({u'uuid': self.client_uuid, u'error':message})
self.send(msg)
......@@ -91,19 +93,18 @@ class WorkerWebSocket(WebSocketClient):
def main():
parser = argparse.ArgumentParser(description='Worker for linstt-dispatch')
parser.add_argument('-u', '--uri', default="ws://"+SERVER_IP+":"+SERVER_PORT+SERVER_TARGET, dest="uri", help="Server<-->worker websocket URI")
parser.add_argument('-f', '--fork', default=1, dest="fork", type=int)
args = parser.parse_args()
#thread.start_new_thread(loop.run, ())
if not os.path.isdir(TEMP_FILE_PATH):
os.mkdir(TEMP_FILE_PATH)
print('#'*50)
logging.basicConfig(level=logging.DEBUG, format="%(levelname)8s %(asctime)s %(message)s ")
logging.debug('Starting up worker')
logging.info('Starting up worker')
ws = WorkerWebSocket(args.uri)
try:
ws.connect()
logging.info("Worker successfully connected to server at %s:%s" % (SERVER_IP, SERVER_PORT))
ws.run_forever()
except KeyboardInterrupt:
ws.close()
......