Usage Examples for the Revoize SDK in Python
Example #1: Processing a Single Audio File
In this example, we'll show you how to process a single audio file using the Revoize SDK. We'll load an audio file from disk, process it with the SDK, and save the enhanced audio to a new file.
This example focuses on processing a WAV file using the Revoize SDK in a minimal setup.
WARNING
The input and output file paths, model type, and chunk size are hardcoded in this example. You may need to modify the input file path to match the location of your audio file. The input WAV file must be recorded at 48 kHz. Audio is processed in 480-sample chunks.
Here's a general sequence diagram for this example:
Import Statements
First, we need to import the necessary modules and libraries to use the Revoize SDK and process audio files. We'll use numpy for array operations and soundfile for reading and writing WAV files.
import numpy as np
import soundfile as sf
import revoize_sdk
The most important line from the Revoize SDK usage perspective is:
import revoize_sdk
This ensures we can use the Revoize SDK functions init and process as well as the ModelType enum.
Main Function
This example is simple, so we'll write it as a main function that can be run directly.
def main():
    ...
This function does not take any arguments and returns nothing. In a real-life application, you would probably want to pass a list of input arguments like the path to the input file, the path to the output file, etc., but this is out of scope for this example.
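For readers who do want command-line arguments, the following is a minimal sketch using the standard library's argparse module. It is not part of the Revoize SDK: the function name parse_args, the positional argument names, and the --chunk-size flag are hypothetical choices made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (hypothetical names, not an SDK interface)."""
    parser = argparse.ArgumentParser(
        description="Enhance a WAV file with the Revoize SDK."
    )
    # Positional arguments replace the hardcoded file paths
    parser.add_argument("input_wav", help="path to the 48 kHz input WAV file")
    parser.add_argument("output_wav", help="path for the enhanced output WAV file")
    # Optional flag; the default matches the chunk size used in this example
    parser.add_argument(
        "--chunk-size", type=int, default=480, help="samples per processed chunk"
    )
    return parser.parse_args(argv)
```

Calling parse_args() with no argument reads from sys.argv, so main could start with args = parse_args() and use args.input_wav, args.output_wav, and args.chunk_size in place of the hardcoded values.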
Define Some Hardcoded Values
To keep this example minimal, we hardcode:
- the path to the input WAV file
- the path to the output WAV file
- the chunk size
# Input WAV file path
input_wav = "input.wav"
# Output WAV file path
output_wav = "output.wav"
# Chunk size (480 samples)
chunk_size = 480
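To put the chunk size in context, a 480-sample chunk at the required 48 kHz sample rate covers 10 ms of audio. The short sketch below only illustrates that arithmetic; chunk_duration_ms is a hypothetical helper, not an SDK function.

```python
SAMPLE_RATE = 48_000  # Hz, the input rate this example requires
CHUNK_SIZE = 480      # samples per chunk, as hardcoded in this example


def chunk_duration_ms(chunk_size: int, sample_rate: int) -> float:
    """Return the duration of one chunk in milliseconds."""
    return 1000.0 * chunk_size / sample_rate


def full_chunks(num_samples: int, chunk_size: int) -> int:
    """Return how many complete chunks fit into num_samples samples."""
    return num_samples // chunk_size


print(chunk_duration_ms(CHUNK_SIZE, SAMPLE_RATE))  # 10.0
print(full_chunks(SAMPLE_RATE, CHUNK_SIZE))        # 100 chunks per second
```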
Initialize the Revoize SDK
Before we can start processing audio, we need to initialize the Revoize SDK by calling the init function with the desired model type.
# Model Type is hardcoded to Capella
revoize_sdk.init(revoize_sdk.ModelType.CAPELLA)
There are various model types available in the Revoize SDK, but for this example, we are using the CAPELLA model, which is a lightweight discriminative model suitable for general denoising tasks.
Load the Input WAV File
Next, we need to load the input WAV file from disk. We use the soundfile library to read the WAV file.
# Read the WAV file
audio_samples, sample_rate = sf.read(input_wav)
# Ensure the audio is mono and float32
if len(audio_samples.shape) > 1:
    audio_samples = audio_samples.mean(axis=1)
audio_samples = audio_samples.astype(np.float32)
The soundfile.read() function returns both the audio samples and the sample rate. We ensure the audio is mono by averaging channels if necessary, and convert the samples to float32 format to make them compatible with the Revoize SDK's process function.
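The snippet above does not verify the 48 kHz requirement stated in the warning. As a minimal sketch, the loading step could be wrapped in a helper that checks the sample rate before downmixing and converting. The name prepare_audio and its expected_rate parameter are hypothetical and not part of the SDK; the helper takes the two values returned by soundfile.read().

```python
import numpy as np


def prepare_audio(audio_samples, sample_rate, expected_rate=48_000):
    """Validate the sample rate, downmix to mono, and convert to float32."""
    if sample_rate != expected_rate:
        raise ValueError(
            f"expected {expected_rate} Hz input, got {sample_rate} Hz"
        )
    samples = np.asarray(audio_samples)
    # Multichannel data from soundfile has shape (frames, channels)
    if samples.ndim > 1:
        samples = samples.mean(axis=1)
    return samples.astype(np.float32)
```

With this helper, the loading step becomes audio_samples = prepare_audio(*sf.read(input_wav)), and a file recorded at another rate fails early with a clear error.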
Process the Audio in Chunks
Now that we have the audio samples in float32 format, we can process them in chunks using the Revoize SDK. We iterate over the audio samples in chunks of chunk_size and process each chunk using the process function.
processed_audio = []
num_chunks = len(audio_samples) // chunk_size
# Note: samples left over after the last full chunk are not processed
for i in range(num_chunks):
    start = i * chunk_size
    end = start + chunk_size
    chunk = audio_samples[start:end]
    # Process each chunk using Revoize SDK
    output_chunk = revoize_sdk.process(chunk)
    processed_audio.extend(output_chunk)
The process function takes an input audio chunk and returns the processed audio chunk. We store the processed audio chunks in a list called processed_audio.
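Because num_chunks is computed with integer division, any samples after the last full chunk are dropped from the output. One possible way to keep them is to zero-pad the final chunk and trim the result back to the original length. The sketch below is an illustration only: process_in_chunks is a hypothetical helper, process_fn stands in for revoize_sdk.process, and the code assumes the processing function returns as many samples as it receives and that its output is time-aligned with its input, neither of which this document confirms.

```python
import numpy as np


def process_in_chunks(audio_samples, process_fn, chunk_size=480):
    """Process every sample, zero-padding the last chunk if it is short."""
    total = len(audio_samples)
    remainder = total % chunk_size
    if remainder:
        # Pad with silence so the last chunk has exactly chunk_size samples
        padding = np.zeros(chunk_size - remainder, dtype=audio_samples.dtype)
        audio_samples = np.concatenate([audio_samples, padding])

    outputs = []
    for start in range(0, len(audio_samples), chunk_size):
        chunk = audio_samples[start:start + chunk_size]
        # Assumption: process_fn returns one output sample per input sample
        outputs.append(np.asarray(process_fn(chunk), dtype=np.float32))

    if not outputs:
        return np.zeros(0, dtype=np.float32)
    # Drop the samples that correspond to the padding
    return np.concatenate(outputs)[:total]
```

In the example's main function this would be called as processed_audio = process_in_chunks(audio_samples, revoize_sdk.process, chunk_size). Collecting chunks in a list and concatenating once also avoids extending a Python list sample by sample.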
Save the Processed Audio to a New WAV File
Finally, we save the processed audio to a new WAV file. We convert the list of processed samples to a NumPy array and use soundfile.write() to save it as a WAV file.
processed_audio = np.array(processed_audio, dtype=np.float32)
sf.write(output_wav, processed_audio, sample_rate)
Full Code Example
Below is the complete minimal source code example that demonstrates how to process a single audio file using the Revoize SDK.
import numpy as np
import soundfile as sf
import revoize_sdk


def main():
    # -------------------------------------------------
    # 1. Hardcoded parameters and initialization
    # -------------------------------------------------
    # Input WAV file path
    input_wav = "input.wav"
    # Output WAV file path
    output_wav = "output.wav"
    # Chunk size (480 samples)
    chunk_size = 480
    # Model Type is hardcoded to Capella
    revoize_sdk.init(revoize_sdk.ModelType.CAPELLA)

    # -------------------------------------------------
    # 2. Load the input WAV file
    # -------------------------------------------------
    # Read the WAV file
    audio_samples, sample_rate = sf.read(input_wav)
    # Ensure the audio is mono and float32
    if len(audio_samples.shape) > 1:
        audio_samples = audio_samples.mean(axis=1)
    audio_samples = audio_samples.astype(np.float32)

    # -------------------------------------------------
    # 3. Process the audio in chunks
    # -------------------------------------------------
    processed_audio = []
    num_chunks = len(audio_samples) // chunk_size
    # Note: samples left over after the last full chunk are not processed
    for i in range(num_chunks):
        start = i * chunk_size
        end = start + chunk_size
        chunk = audio_samples[start:end]
        # Process each chunk using Revoize SDK
        output_chunk = revoize_sdk.process(chunk)
        processed_audio.extend(output_chunk)

    # -------------------------------------------------
    # 4. Save the processed audio to a new WAV file
    # -------------------------------------------------
    processed_audio = np.array(processed_audio, dtype=np.float32)
    sf.write(output_wav, processed_audio, sample_rate)


if __name__ == "__main__":
    main()
Example #2: Real-time Speech Enhancement
Coming Soon.