Usage Examples for the Revoize SDK in Python

Example #1: Processing a Single Audio File

This example shows the core flow in Python: load a WAV file, fetch the model parameters, initialize the SDK, process the audio in chunks, and save the enhanced result. The setup is kept minimal so you can adapt it to your own pipeline.

WARNING

The input and output file paths and the model name are hardcoded in this example; adjust them to match your environment. The input WAV file must be at the sample rate required by the chosen model (see model_params.input_sample_rate). Audio is processed in chunks of model_params.input_chunk_size_samples samples, which varies by model (e.g. Capella uses 480).

Import Statements

First, we need to import the necessary modules and libraries to use the Revoize SDK and process audio files. We'll use numpy for array operations and soundfile for reading and writing WAV files.

python
import numpy as np
import soundfile as sf
import revoize_sdk

The most important line from the Revoize SDK usage perspective is:

python
import revoize_sdk

We use revoize_sdk.models.get_params(name) to obtain the model parameters (chunk sizes and sample rates), then call revoize_sdk.init(model_params) once and revoize_sdk.process(chunk) for each audio chunk.


Main Function

This example is simple, so we'll write it as a main function that can be run directly.

python
def main():
    ...

This function takes no arguments and returns nothing. In a real-life application you would likely accept the input file path, output file path, and model name as arguments, but that is out of scope for this example.
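As a sketch of that real-life variant, the hardcoded values could be lifted into command-line arguments with the standard argparse module. The argument names below are illustrative and not part of the Revoize SDK:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI wrapper around the hardcoded values used in this
    # example; accepts the input/output paths and the model name.
    parser = argparse.ArgumentParser(
        description="Enhance a single WAV file with the Revoize SDK"
    )
    parser.add_argument("input_wav", help="path to the input WAV file")
    parser.add_argument("output_wav", help="path for the enhanced output WAV file")
    parser.add_argument("--model", default="Capella",
                        help="model name to pass to get_params")
    return parser.parse_args(argv)
```

main() would then read its paths from the parsed arguments instead of the constants below.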


Define Some Hardcoded Values

We obtain model parameters by name (e.g. "Capella"). You can use revoize_sdk.models.list_names() to see available names. The parameters define the required input/output chunk sizes and sample rates.

python
    # Input WAV file path
    input_wav = "input.wav"
    # Output WAV file path
    output_wav = "output.wav"
    # Get model parameters by name
    model_params = revoize_sdk.models.get_params("Capella")
    chunk_size = model_params.input_chunk_size_samples

Initialize the Revoize SDK

Before we can start processing audio, we initialize the SDK with the model parameters.

python
    revoize_sdk.init(model_params)

The input WAV file should be at model_params.input_sample_rate (e.g. 48000 Hz for Capella). Output will have model_params.output_chunk_size_samples per chunk at model_params.output_sample_rate.


Load the Input WAV File

Next, we need to load the input WAV file from disk. We use the soundfile library to read the WAV file.

python
    # Read the WAV file
    audio_samples, sample_rate = sf.read(input_wav)

    # Ensure the audio is mono and float32
    if len(audio_samples.shape) > 1:
        audio_samples = audio_samples.mean(axis=1)
    audio_samples = audio_samples.astype(np.float32)

The soundfile.read() function returns both the audio samples and the file's sample rate. We ensure the audio is mono by averaging channels if necessary, and convert the samples to float32 so they are compatible with the Revoize SDK's process function. In a robust application you should also verify that sample_rate equals model_params.input_sample_rate before processing.
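If the file's sample rate does not match the model's required rate, the audio must be resampled before processing; as far as this example shows, the SDK does not resample for you. Below is a minimal linear-interpolation sketch in NumPy; for production audio, a polyphase resampler such as scipy.signal.resample_poly is preferable:

```python
import numpy as np

def resample_linear(samples: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    # Minimal linear-interpolation resampler for mono float32 audio.
    # Adequate for a sketch; a proper filter-based resampler avoids
    # the aliasing this simple approach can introduce.
    if src_rate == dst_rate:
        return samples
    duration = len(samples) / src_rate
    n_out = int(round(duration * dst_rate))
    src_times = np.arange(len(samples)) / src_rate
    dst_times = np.arange(n_out) / dst_rate
    return np.interp(dst_times, src_times, samples).astype(np.float32)
```

Called as resample_linear(audio_samples, sample_rate, model_params.input_sample_rate) before the chunking loop.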


Process the Audio in Chunks

Now that we have the audio samples in float32 format, we can process them in chunks using the Revoize SDK. We iterate over the audio samples in chunks of chunk_size and process each chunk using the process function.

python
    processed_audio = []
    num_chunks = len(audio_samples) // chunk_size

    for i in range(num_chunks):
        start = i * chunk_size
        end = start + chunk_size
        chunk = audio_samples[start:end]
        # Process each chunk using Revoize SDK
        output_chunk = revoize_sdk.process(chunk)
        processed_audio.extend(output_chunk)

The process function takes an input audio chunk and returns the processed audio chunk, which we accumulate in the processed_audio list. Note that because num_chunks uses floor division, any samples after the last full chunk are discarded; this keeps the example minimal.
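If you want to keep those trailing samples, one option is to zero-pad the input up to a whole number of chunks before the loop. This is a sketch; whether the model handles trailing padded silence gracefully is an assumption you should verify:

```python
import numpy as np

def pad_to_chunks(samples: np.ndarray, chunk_size: int) -> np.ndarray:
    # Zero-pad so the length is an exact multiple of chunk_size, so the
    # chunking loop covers every input sample instead of dropping the tail.
    remainder = len(samples) % chunk_size
    if remainder == 0:
        return samples
    pad = chunk_size - remainder
    return np.concatenate([samples, np.zeros(pad, dtype=samples.dtype)])
```

After padding, len(audio_samples) // chunk_size covers the entire file; you can trim the corresponding padded tail from the output if the exact original duration matters.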


Save the Processed Audio to a New WAV File

Finally, we save the processed audio to a new WAV file. We convert the list of processed samples to a NumPy array and use soundfile.write() to save it as a WAV file.

python
    processed_audio = np.array(processed_audio, dtype=np.float32)
    sf.write(output_wav, processed_audio, model_params.output_sample_rate)

Full Code Example

Below is the complete minimal source code example that demonstrates how to process a single audio file using the Revoize SDK.

python
import numpy as np
import soundfile as sf
import revoize_sdk

def main():
    # -------------------------------------------------
    # 1. Get model parameters and initialize
    # -------------------------------------------------
    input_wav = "input.wav"
    output_wav = "output.wav"
    model_params = revoize_sdk.models.get_params("Capella")
    chunk_size = model_params.input_chunk_size_samples
    revoize_sdk.init(model_params)

    # -------------------------------------------------
    # 2. Load the input WAV file
    # -------------------------------------------------
    # Read the WAV file
    audio_samples, sample_rate = sf.read(input_wav)

    # Ensure the audio is mono and float32
    if len(audio_samples.shape) > 1:
        audio_samples = audio_samples.mean(axis=1)
    audio_samples = audio_samples.astype(np.float32)

    # -------------------------------------------------
    # 3. Process the audio in chunks
    # -------------------------------------------------
    processed_audio = []
    num_chunks = len(audio_samples) // chunk_size

    for i in range(num_chunks):
        start = i * chunk_size
        end = start + chunk_size
        chunk = audio_samples[start:end]
        # Process each chunk using Revoize SDK
        output_chunk = revoize_sdk.process(chunk)
        processed_audio.extend(output_chunk)

    # -------------------------------------------------
    # 4. Save the processed audio to a new WAV file
    # -------------------------------------------------
    processed_audio = np.array(processed_audio, dtype=np.float32)
    sf.write(output_wav, processed_audio, model_params.output_sample_rate)

if __name__ == "__main__":
    main()

Example #2: Real-time Speech Enhancement

Coming Soon.