SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly, and it achieves competitive or state-of-the-art performance across a variety of speech domains.
SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, transformers, and neural language models relying on recurrent neural networks and transformers.
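To make the CTC approach concrete, here is a minimal, self-contained sketch of CTC greedy decoding's collapse rule (merge consecutive repeats, then drop blanks). This is an illustration of the general technique, not SpeechBrain's actual decoder; the function name and blank index are hypothetical.

```python
def ctc_greedy_collapse(token_ids, blank=0):
    """Collapse a per-frame best path into a label sequence:
    merge consecutive repeated tokens, then remove blanks (the CTC rule)."""
    out = []
    prev = None
    for t in token_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# A best path like [3, 1, 1, 0, 20, 20] (0 = blank) collapses to [3, 1, 20].
print(ctc_greedy_collapse([3, 1, 1, 0, 20, 20]))  # → [3, 1, 20]
```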
Speaker recognition is already deployed in a wide variety of realistic applications. SpeechBrain provides several models for speaker recognition, including X-vectors, ECAPA-TDNN, PLDA scoring, and contrastive learning.
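At verification time, speaker embeddings produced by models such as X-vectors or ECAPA-TDNN are typically compared with a similarity score. Below is a minimal pure-Python sketch of cosine scoring between two embeddings; the function names and the threshold value are hypothetical, not SpeechBrain's API.

```python
import math

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embedding vectors."""
    dot = sum(a * b for a, b in zip(emb1, emb2))
    n1 = math.sqrt(sum(a * a for a in emb1))
    n2 = math.sqrt(sum(b * b for b in emb2))
    return dot / (n1 * n2)

def same_speaker(emb1, emb2, threshold=0.5):
    """Accept the trial if the cosine score exceeds a decision threshold."""
    return cosine_score(emb1, emb2) >= threshold
```

In practice the threshold is tuned on a development set (e.g., to minimize the equal error rate).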
Spectral masking, spectral mapping, and time-domain enhancement are different methods already available within SpeechBrain. Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well.
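The spectral masking approach can be illustrated in a few lines: an estimated mask (values in [0, 1]) is applied element-wise to a magnitude spectrogram to attenuate noisy time-frequency bins. This is a hypothetical sketch of the general idea, not SpeechBrain's implementation.

```python
def apply_mask(magnitude, mask):
    """Element-wise masking of a magnitude spectrogram
    (frames x frequency bins); mask values lie in [0, 1]."""
    return [[m * g for m, g in zip(frame, mask_frame)]
            for frame, mask_frame in zip(magnitude, mask)]

# A mask of 0.5 halves a bin; a mask of 0.0 removes it entirely.
print(apply_mask([[2.0, 4.0]], [[0.5, 0.0]]))  # → [[1.0, 0.0]]
```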
SpeechBrain provides efficient, GPU-friendly pipelines for speech augmentation, acoustic feature extraction, and feature normalization that can be applied on the fly during your experiments.
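As an example of the normalization step, features are often standardized per dimension (zero mean, unit variance) before being fed to a model. The sketch below shows this on a plain (frames x dims) list of lists; it is a hypothetical illustration, not SpeechBrain's GPU implementation.

```python
import math

def mean_var_normalize(feats, eps=1e-8):
    """Per-dimension mean/variance normalization of a
    (frames x dims) feature matrix."""
    n = len(feats)
    dims = len(feats[0])
    means = [sum(f[d] for f in feats) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in feats) / n)
            for d in range(dims)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
            for f in feats]
```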
Combining multiple microphones is a powerful approach to achieve robustness in adverse acoustic environments. SpeechBrain provides various techniques for beamforming (e.g., delay-and-sum, MVDR, and GeV) and speaker localization.
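The simplest of these, delay-and-sum beamforming, aligns each microphone channel by its time-of-arrival delay and averages the result, so the target signal adds coherently while noise partially cancels. Here is a hypothetical sketch for integer sample delays (SpeechBrain's own implementation operates on tensors and estimates the delays):

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamforming with known integer sample delays:
    advance each channel by its delay, then average the aligned part."""
    aligned = [ch[d:] for ch, d in zip(channels, delays)]
    length = min(len(a) for a in aligned)
    n = len(aligned)
    return [sum(a[i] for a in aligned) / n for i in range(length)]
```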
SpeechBrain is designed to speed up the research and development of speech technologies. It is modular, flexible, easy to customize, and contains several recipes for popular datasets. Documentation and tutorials are available to help newcomers get started with SpeechBrain.
SpeechBrain provides multiple pre-trained models that can easily be deployed through well-designed interfaces. Transcribing speech, verifying speakers, enhancing audio, and separating sources have never been easier!
SpeechBrain allows you to easily and quickly customize any part of your speech pipeline, ranging from data management up to the downstream task metric.
No existing speech toolkit provides such a level of accessibility.
# From PyPI
pip install speechbrain
# Local installation
git clone https://github.com/anonymspeechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .
cd recipes/{dataset}/{task}/train
# Train the model using the default recipe
python train.py hparams/train.yaml
# Train the model with a hyperparameter tweak
python train.py hparams/train.yaml --learning_rate=0.1
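Overrides such as `--learning_rate=0.1` replace the corresponding value from the YAML hyperparameter file at launch time. As a rough illustration of the idea (not SpeechBrain's actual parser, which is built on HyperPyYAML), such flags can be turned into an override dictionary like this:

```python
def parse_overrides(argv):
    """Turn flags like ['--learning_rate=0.1', '--epochs=20'] into a
    dict, coercing values to int or float when possible."""
    overrides = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue
        key, value = arg[2:].split("=", 1)
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass
        overrides[key] = value
    return overrides
```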
class ASR_Brain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Compute features (MFCCs, filterbanks, etc.) on the fly
        feats = self.hparams.compute_feats(batch.wavs)
        # Improve robustness with pre-built augmentations
        feats = self.hparams.augment(feats)
        # Apply your custom model
        return self.modules.myCustomModel(feats)
SpeechBrain thanks its generous sponsors. Sponsoring allows us to expand the team and further extend the functionalities of the toolkit.