Machine Learning - Speech

Job Responsibilities

Work in a high-growth company.
Build, improve and extend speech models, including speech-to-text, text-to-speech or speech analysis models.
Understand and implement state-of-the-art academic research papers, and apply novel algorithms to large volumes of real-world data.
Work closely on the product delivery roadmap, taking models from development to production in collaboration with engineers, researchers, technical leads and architects.
Help the team improve current methods and models.
Take a practical approach to bringing these models into a production environment.

Requirements and Qualifications

Master’s degree or PhD in computer science, mathematics, engineering or computational linguistics, or a minimum of 2-3 years of relevant work experience.
Demonstrable experience in deep learning, speech processing and NLP (e.g. Kaggle competitions or side projects).
Understanding of signal processing with application to speech and audio processing.
Experience with acoustic modeling and language modeling.
Good knowledge of and experience with Python and/or C/C++.
Strong linguistic background and analytical mindset.
A practical mindset: willing to get your hands dirty, and able to tell the difference between fundamental research and data-driven development.
Ability to work independently and take initiative.
The ability to quickly learn new technologies and successfully implement them is essential.


Working knowledge of TensorFlow or Keras.
Experience building or working with an automatic speech recognition (ASR) toolkit such as Kaldi or DeepSpeech is a strong plus.
Expertise in some of the following speech tasks: speech-to-text, text-to-speech, emotion recognition, personality recognition or speaker diarization.
Good understanding of, or hands-on experience with, speech preprocessing, noise-robust speech processing, normalization techniques and related methods (e.g. HMMs, weighted FSTs, Viterbi decoding).
Fluency in phonetics and phonetic transcription.
Experience with JIRA or similar agile tools.
Comfortable working in a Linux environment.


