Usage Examples for the Revoize SDK in Python
Example #1: Processing a Single Audio File
This example shows the core flow in Python: load a WAV file, get model parameters and initialize the SDK, then process the audio in chunks and save the enhanced result. The setup is minimal so you can adapt it to your own pipeline.
WARNING
The input and output file paths and model name are hardcoded in this example. You may need to modify the input file path to match the location of your audio file. The input WAV file must be at the sample rate required by the chosen model (see model_params.input_sample_rate). Audio is processed in chunks whose size is given by model_params.input_chunk_size_samples (this varies by model; e.g. Capella uses 480, other models may use different sizes).
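The sample-rate requirement in the warning above can be verified up front. The helper below is not part of the Revoize SDK; it is a hypothetical sketch that reads the WAV header with Python's standard-library wave module and compares it against the model's required rate (model_params.input_sample_rate):

```python
import wave

def check_input_sample_rate(path, expected_rate):
    """Raise if the WAV file's sample rate differs from the model's
    required input rate (e.g. model_params.input_sample_rate)."""
    with wave.open(path, "rb") as f:
        actual_rate = f.getframerate()
    if actual_rate != expected_rate:
        raise ValueError(
            f"{path}: expected {expected_rate} Hz, got {actual_rate} Hz"
        )
```

Calling this right after loading the model parameters fails fast with a clear message instead of producing silently degraded output.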
The general sequence for this example is: load the input WAV file, get model parameters and initialize the SDK, process the audio chunk by chunk, then write the enhanced WAV file.
Import Statements
First, we need to import the necessary modules and libraries to use the Revoize SDK and process audio files. We'll use numpy for array operations and soundfile for reading and writing WAV files.
import numpy as np
import soundfile as sf
import revoize_sdk

The most important line from the Revoize SDK usage perspective is:

import revoize_sdk

We use revoize_sdk.models.get_params(name) to obtain model parameters (chunk sizes and sample rates), then revoize_sdk.init(model_params) and revoize_sdk.process(chunk).
Main Function
This example is simple, so we'll write it as a main function that can be run directly.
def main():
    ...

This function takes no arguments and returns nothing. In a real-life application, you would probably want to pass input arguments such as the path to the input file and the path to the output file, but this is out of scope for this example.
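The argument passing mentioned above could be wired up with the standard-library argparse module. The flag names and defaults here are illustrative, not part of the SDK:

```python
import argparse

def parse_args(argv=None):
    # Illustrative flags; adapt names and defaults to your pipeline.
    parser = argparse.ArgumentParser(
        description="Enhance a WAV file with the Revoize SDK"
    )
    parser.add_argument("--input", default="input.wav",
                        help="path to the input WAV file")
    parser.add_argument("--output", default="output.wav",
                        help="path for the enhanced WAV file")
    parser.add_argument("--model", default="Capella",
                        help="model name passed to revoize_sdk.models.get_params")
    return parser.parse_args(argv)
```

main() would then call parse_args() and use args.input, args.output, and args.model in place of the hardcoded values below.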
Define Some Hardcoded Values
We obtain model parameters by name (e.g. "Capella"). You can use revoize_sdk.models.list_names() to see available names. The parameters define the required input/output chunk sizes and sample rates.
# Input WAV file path
input_wav = "input.wav"
# Output WAV file path
output_wav = "output.wav"
# Get model parameters by name
model_params = revoize_sdk.models.get_params("Capella")
chunk_size = model_params.input_chunk_size_samples

Initialize the Revoize SDK
Before we can start processing audio, we initialize the SDK with the model parameters.

revoize_sdk.init(model_params)

The input WAV file should be at model_params.input_sample_rate (e.g. 48000 Hz for Capella). Each output chunk contains model_params.output_chunk_size_samples samples at model_params.output_sample_rate.
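If your recording is at a different sample rate, it must be resampled before processing. Below is a minimal linear-interpolation sketch, fine for quick experiments; for production quality use a proper resampler (e.g. scipy.signal.resample_poly). This helper is an assumption, not part of the Revoize SDK:

```python
import numpy as np

def resample_linear(audio, src_rate, dst_rate):
    """Crude linear-interpolation resampler (sketch only)."""
    n_out = int(round(len(audio) * dst_rate / src_rate))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio).astype(np.float32)
```

For example, 480 samples at 48000 Hz become 160 samples at 16000 Hz.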
Load the Input WAV File
Next, we need to load the input WAV file from disk. We use the soundfile library to read the WAV file.
# Read the WAV file
audio_samples, sample_rate = sf.read(input_wav)
# Ensure the audio is mono and float32
if len(audio_samples.shape) > 1:
    audio_samples = audio_samples.mean(axis=1)
audio_samples = audio_samples.astype(np.float32)

The soundfile.read() function returns both the audio samples and the sample rate. We ensure the audio is mono by averaging channels if necessary, and convert the samples to float32 format to make them compatible with the Revoize SDK's process function.
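To make the conversion concrete, here is the same logic applied to a tiny hand-made stereo buffer (shape frames x channels, as soundfile returns it):

```python
import numpy as np

# A 3-frame stereo buffer (frames x channels), as sf.read would return it
stereo = np.array([[0.2, 0.4],
                   [0.0, 1.0],
                   [-0.5, 0.5]], dtype=np.float64)

mono = stereo
if mono.ndim > 1:
    mono = mono.mean(axis=1)     # average the channels -> [0.3, 0.5, 0.0]
mono = mono.astype(np.float32)   # match the dtype the SDK expects
```

After this, mono is a one-dimensional float32 array of per-frame channel averages.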
Process the Audio in Chunks
Now that we have the audio samples in float32 format, we can process them in chunks using the Revoize SDK. We iterate over the audio samples in chunks of chunk_size and process each chunk using the process function.
processed_audio = []
num_chunks = len(audio_samples) // chunk_size
for i in range(num_chunks):
    start = i * chunk_size
    end = start + chunk_size
    chunk = audio_samples[start:end]
    # Process each chunk using Revoize SDK
    output_chunk = revoize_sdk.process(chunk)
    processed_audio.extend(output_chunk)

The process function takes an input audio chunk and returns the processed audio chunk. We store the processed audio chunks in a list called processed_audio.
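Note that len(audio_samples) // chunk_size silently drops any trailing samples that don't fill a whole chunk. One way to keep them is to zero-pad the final partial chunk, as sketched below; the enhance callback is a stand-in for revoize_sdk.process so the helper can be shown self-contained:

```python
import numpy as np

def process_in_chunks(audio_samples, chunk_size, enhance):
    """Process audio in fixed-size chunks, zero-padding the final
    partial chunk so no samples are dropped."""
    processed = []
    for start in range(0, len(audio_samples), chunk_size):
        chunk = audio_samples[start:start + chunk_size]
        if len(chunk) < chunk_size:
            # Pad the tail with zeros up to a full chunk
            chunk = np.pad(chunk, (0, chunk_size - len(chunk)))
        processed.extend(enhance(chunk))
    return np.asarray(processed, dtype=np.float32)
```

In the example you would call process_in_chunks(audio_samples, chunk_size, revoize_sdk.process); you may also want to trim the padded samples from the end of the output afterwards.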
Save the Processed Audio to a New WAV File
Finally, we save the processed audio to a new WAV file. We convert the list of processed samples to a NumPy array and use soundfile.write() to save it as a WAV file.
processed_audio = np.array(processed_audio, dtype=np.float32)
sf.write(output_wav, processed_audio, model_params.output_sample_rate)

Full Code Example
Below is the complete minimal source code example that demonstrates how to process a single audio file using the Revoize SDK.
import numpy as np
import soundfile as sf
import revoize_sdk
def main():
    # -------------------------------------------------
    # 1. Get model parameters and initialize
    # -------------------------------------------------
    input_wav = "input.wav"
    output_wav = "output.wav"
    model_params = revoize_sdk.models.get_params("Capella")
    chunk_size = model_params.input_chunk_size_samples
    revoize_sdk.init(model_params)

    # -------------------------------------------------
    # 2. Load the input WAV file
    # -------------------------------------------------
    # Read the WAV file
    audio_samples, sample_rate = sf.read(input_wav)
    # Ensure the audio is mono and float32
    if len(audio_samples.shape) > 1:
        audio_samples = audio_samples.mean(axis=1)
    audio_samples = audio_samples.astype(np.float32)

    # -------------------------------------------------
    # 3. Process the audio in chunks
    # -------------------------------------------------
    processed_audio = []
    num_chunks = len(audio_samples) // chunk_size
    for i in range(num_chunks):
        start = i * chunk_size
        end = start + chunk_size
        chunk = audio_samples[start:end]
        # Process each chunk using Revoize SDK
        output_chunk = revoize_sdk.process(chunk)
        processed_audio.extend(output_chunk)

    # -------------------------------------------------
    # 4. Save the processed audio to a new WAV file
    # -------------------------------------------------
    processed_audio = np.array(processed_audio, dtype=np.float32)
    sf.write(output_wav, processed_audio, model_params.output_sample_rate)

if __name__ == "__main__":
    main()

Example #2: Real-time Speech Enhancement
Coming Soon.