SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly, and it achieves competitive or state-of-the-art performance across a variety of speech domains.
SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, transformers, and neural language models relying on recurrent neural networks and transformers.
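To make the CTC approach concrete, here is a minimal, self-contained sketch of CTC greedy decoding's collapse rule (merge consecutive repeats, then drop blanks). This is an illustration of the general technique, not SpeechBrain's actual decoder; the function name and blank index are hypothetical.

```python
def ctc_greedy_collapse(token_ids, blank=0):
    """Collapse a per-frame best path into a label sequence:
    merge consecutive repeated tokens, then remove blanks (the CTC rule)."""
    out = []
    prev = None
    for t in token_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# A best path like [3, 1, 1, 0, 20, 20] (0 = blank) collapses to [3, 1, 20].
print(ctc_greedy_collapse([3, 1, 1, 0, 20, 20]))  # → [3, 1, 20]
```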
Speaker recognition is already deployed in a wide variety of realistic applications. SpeechBrain provides several models for speaker recognition, including X-vectors, ECAPA-TDNN, PLDA scoring, and contrastive learning.
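At verification time, speaker embeddings produced by models such as X-vectors or ECAPA-TDNN are typically compared with a similarity score. Below is a minimal pure-Python sketch of cosine scoring between two embeddings; the function names and the threshold value are hypothetical, not SpeechBrain's API.

```python
import math

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embedding vectors."""
    dot = sum(a * b for a, b in zip(emb1, emb2))
    n1 = math.sqrt(sum(a * a for a in emb1))
    n2 = math.sqrt(sum(b * b for b in emb2))
    return dot / (n1 * n2)

def same_speaker(emb1, emb2, threshold=0.5):
    """Accept the trial if the cosine score exceeds a decision threshold."""
    return cosine_score(emb1, emb2) >= threshold
```

In practice the threshold is tuned on a development set (e.g., to minimize the equal error rate).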
Spectral masking, spectral mapping, and time-domain enhancement are different methods already available within SpeechBrain. Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well.
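The spectral masking approach can be illustrated in a few lines: an estimated mask (values in [0, 1]) is applied element-wise to a magnitude spectrogram to attenuate noisy time-frequency bins. This is a hypothetical sketch of the general idea, not SpeechBrain's implementation.

```python
def apply_mask(magnitude, mask):
    """Element-wise masking of a magnitude spectrogram
    (frames x frequency bins); mask values lie in [0, 1]."""
    return [[m * g for m, g in zip(frame, mask_frame)]
            for frame, mask_frame in zip(magnitude, mask)]

# A mask of 0.5 halves a bin; a mask of 0.0 removes it entirely.
print(apply_mask([[2.0, 4.0]], [[0.5, 0.0]]))  # → [[1.0, 0.0]]
```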
SpeechBrain provides efficient, GPU-friendly pipelines for speech augmentation, acoustic feature extraction, and feature normalization that can be applied on the fly during your experiments.
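As an example of the normalization step, features are often standardized per dimension (zero mean, unit variance) before being fed to a model. The sketch below shows this on a plain (frames x dims) list of lists; it is a hypothetical illustration, not SpeechBrain's GPU implementation.

```python
import math

def mean_var_normalize(feats, eps=1e-8):
    """Per-dimension mean/variance normalization of a
    (frames x dims) feature matrix."""
    n = len(feats)
    dims = len(feats[0])
    means = [sum(f[d] for f in feats) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in feats) / n)
            for d in range(dims)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
            for f in feats]
```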
Combining multiple microphones is a powerful approach to achieve robustness in adverse acoustic environments. SpeechBrain provides various techniques for beamforming (e.g., delay-and-sum, MVDR, and GeV) and speaker localization.
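The simplest of these, delay-and-sum beamforming, aligns each microphone channel by its time-of-arrival delay and averages the result, so the target signal adds coherently while noise partially cancels. Here is a hypothetical sketch for integer sample delays (SpeechBrain's own implementation operates on tensors and estimates the delays):

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamforming with known integer sample delays:
    advance each channel by its delay, then average the aligned part."""
    aligned = [ch[d:] for ch, d in zip(channels, delays)]
    length = min(len(a) for a in aligned)
    n = len(aligned)
    return [sum(a[i] for a in aligned) / n for i in range(length)]
```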
SpeechBrain is designed to speed up the research and development of speech technologies. It is modular, flexible, easy to customize, and contains several recipes for popular datasets. Documentation and tutorials are available to help newcomers get started with SpeechBrain.
SpeechBrain provides multiple pre-trained models that can easily be deployed through well-designed interfaces. Transcribing speech, verifying speakers, enhancing audio, and separating sources have never been easier!
SpeechBrain allows you to easily and quickly customize any part of your speech pipeline, ranging from data management up to the downstream task metric.
No existing speech toolkit provides such a level of accessibility.
# From PyPI
pip install speechbrain
# Local installation
git clone https://github.com/anonymspeechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .
cd recipes/{dataset}/{task}/train
# Train the model using the default recipe
python train.py hparams/train.yaml
# Train the model with a hyperparameter tweak
python train.py hparams/train.yaml --learning_rate=0.1
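Overrides such as `--learning_rate=0.1` replace the corresponding value from the YAML hyperparameter file at launch time. As a rough illustration of the idea (not SpeechBrain's actual parser, which is built on HyperPyYAML), such flags can be turned into an override dictionary like this:

```python
def parse_overrides(argv):
    """Turn flags like ['--learning_rate=0.1', '--epochs=20'] into a
    dict, coercing values to int or float when possible."""
    overrides = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue
        key, value = arg[2:].split("=", 1)
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass
        overrides[key] = value
    return overrides
```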
class ASR_Brain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Compute features (MFCCs, filterbanks, etc.) on the fly
        feats = self.hparams.compute_feats(batch.wavs)
        # Improve robustness with pre-built augmentations
        feats = self.hparams.augment(feats)
        # Apply your custom model
        return self.modules.myCustomModel(feats)
SpeechBrain thanks its generous sponsors. Sponsoring allows us to expand the team and further extend the functionalities of the toolkit.