🔊 Audio

DocArray supports many different modalities including Audio. This section will show you how to load and handle audio data using DocArray.

You will also learn about DocArray's audio-specific types for representing audio data, such as AudioUrl, AudioBytes, and AudioNdArray.


Audio support requires the pydub dependency. You can install all necessary dependencies via:

pip install "docarray[audio]"

Additionally, you have to install ffmpeg:

# on Mac with brew:
brew install ffmpeg
# on Linux with apt-get
apt-get install ffmpeg libavcodec-extra
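
Before loading any audio, it can help to confirm that the ffmpeg executable is actually discoverable. The following is a minimal sketch using only the Python standard library. The helper name `find_executable` is hypothetical, and the check assumes that the tool needs to be reachable through your PATH.

```python
import shutil
from typing import Optional


def find_executable(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


# Look up ffmpeg and report whether it was found.
path = find_executable()
if path is None:
    print("ffmpeg was not found on PATH; install it before loading audio.")
else:
    print(f"ffmpeg found at {path}")
```

If the lookup fails right after installation, opening a new shell usually refreshes PATH.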

Load audio file

First, let's define a class that extends BaseDoc and has a url attribute of type AudioUrl, plus optional tensor and frame_rate attributes of type AudioNdArray and int.


Check out our predefined AudioDoc to get started and play around with our audio features.

Next, you can instantiate an object of that class with a local or remote URL:

from typing import Optional

from docarray import BaseDoc
from docarray.typing import AudioNdArray, AudioUrl


class MyAudio(BaseDoc):
    url: AudioUrl
    tensor: Optional[AudioNdArray] = None
    frame_rate: Optional[int] = None

doc = MyAudio(

Loading the content of the audio file is as easy as calling .load() on the AudioUrl instance.

This will return a tuple of:

  • An AudioNdArray representing the audio file content
  • An integer representing the frame rate (the number of samples per second)

doc.tensor, doc.frame_rate = doc.url.load()
📄 MyAudio : 2015696 ...
│ Attribute            │ Value                                                 │
│ url: AudioUrl        │    │
│                      │ ... (length: 90)                                      │
│ tensor: AudioNdArray │ AudioNdArray of shape (30833,), dtype: float64        │
│ frame_rate: int      │ 44100                                                 │
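
The shape and frame rate shown above are enough to work out how long the clip is. Here is a small worked example, assuming the one-dimensional tensor holds a single (mono) channel; the function name `duration_seconds` is hypothetical.

```python
def duration_seconds(num_samples: int, frame_rate: int) -> float:
    """Length of a mono clip in seconds: samples divided by samples per second."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return num_samples / frame_rate


# Values taken from the output above: shape (30833,) at 44100 Hz.
duration = duration_seconds(30833, 44100)
print(f"{duration:.3f} s")  # prints "0.699 s"
```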


DocArray offers several AudioTensors to store your data in:

  • AudioNdArray
  • AudioTorchTensor
  • AudioTensorFlowTensor

If you specify the type of your tensor as one of the above, it will be cast to that type automatically:

from typing import Optional

from docarray import BaseDoc
from docarray.typing import AudioTensorFlowTensor, AudioTorchTensor, AudioUrl


class MyAudio(BaseDoc):
    url: AudioUrl
    tf_tensor: Optional[AudioTensorFlowTensor] = None
    torch_tensor: Optional[AudioTorchTensor] = None

doc = MyAudio(

doc.tf_tensor, _ = doc.url.load()
doc.torch_tensor, _ = doc.url.load()

assert isinstance(doc.tf_tensor, AudioTensorFlowTensor)
assert isinstance(doc.torch_tensor, AudioTorchTensor)


Alternatively, you can load your AudioUrl instance to AudioBytes, and your AudioBytes instance to an AudioTensor of your choice:

from typing import Optional

from docarray import BaseDoc
from docarray.typing import AudioBytes, AudioTensor, AudioUrl


class MyAudio(BaseDoc):
    url: Optional[AudioUrl] = None
    bytes_: Optional[AudioBytes] = None
    tensor: Optional[AudioTensor] = None

doc = MyAudio(

doc.bytes_ = doc.url.load_bytes()  # type(doc.bytes_) = AudioBytes
doc.tensor, _ = doc.bytes_.load()  # type(doc.tensor) = AudioNdArray

Vice versa, you can also transform an AudioTensor to AudioBytes:

from docarray.typing import AudioBytes

bytes_from_tensor = doc.tensor.to_bytes()

assert isinstance(bytes_from_tensor, AudioBytes)
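
To make the idea of turning a tensor into bytes more concrete, here is a library-independent sketch of one common encoding: floating-point samples in [-1.0, 1.0] scaled to little-endian signed 16-bit PCM. This is an illustration under that assumption, not a description of what to_bytes() does internally, and the helper names `floats_to_pcm16` and `pcm16_to_floats` are hypothetical.

```python
import struct
from typing import List, Sequence

MAX_INT_16 = 2**15 - 1  # largest value of a signed 16-bit sample


def floats_to_pcm16(samples: Sequence[float]) -> bytes:
    """Encode floats in [-1.0, 1.0] as little-endian signed 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * MAX_INT_16)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_floats(data: bytes) -> List[float]:
    """Decode little-endian signed 16-bit PCM back to floats."""
    count = len(data) // 2
    ints = struct.unpack(f"<{count}h", data[: count * 2])
    return [i / MAX_INT_16 for i in ints]


samples = [0.0, 0.5, -0.5, 1.0, -1.0]
encoded = floats_to_pcm16(samples)   # two bytes per sample
decoded = pcm16_to_floats(encoded)   # close to the input, up to rounding
```

The round trip is lossy only by the quantization step of the 16-bit representation.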

Save audio to file

You can save your AudioTensor to an audio file of any format as follows:

tensor_reversed = doc.tensor[::-1]
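
As a library-independent illustration of reversing a signal and writing it to disk, the sketch below uses only Python's standard-library wave module rather than DocArray's own saving method. The sine-wave samples stand in for a loaded tensor, and the helper `write_wav`, the file name, and the 16-bit mono format are all assumptions made for the example.

```python
import math
import os
import struct
import tempfile
import wave

FRAME_RATE = 44100  # samples per second

# Hypothetical stand-in for a loaded tensor: 0.1 s of a 440 Hz sine wave.
samples = [
    math.sin(2 * math.pi * 440 * n / FRAME_RATE) for n in range(FRAME_RATE // 10)
]
reversed_samples = samples[::-1]  # same slicing idea as tensor[::-1]


def write_wav(path: str, data, frame_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in data]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(frame_rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))


out_path = os.path.join(tempfile.mkdtemp(), "reversed.wav")
write_wav(out_path, reversed_samples, FRAME_RATE)

# Read the header back to confirm what was written.
with wave.open(out_path, "rb") as wav:
    n_frames, rate = wav.getnframes(), wav.getframerate()
print(n_frames, rate)
```

WAV is used here because the standard library can write it without external tools; compressed formats such as MP3 need an encoder like ffmpeg.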

Play audio in a notebook

You can play your audio sound in a notebook from its URL or tensor, by calling .display() on either one.

Play from url:

doc.url.display()

Play from tensor:

doc.tensor.display()

Getting started - Predefined AudioDoc

To get started and play around with your audio data, DocArray provides a predefined AudioDoc, which includes all of the functionality described above:

class AudioDoc(BaseDoc):
    url: Optional[AudioUrl] = None
    tensor: Optional[AudioTensor] = None
    embedding: Optional[AnyEmbedding] = None
    bytes_: Optional[AudioBytes] = None
    frame_rate: Optional[int] = None

You can use this class directly or extend it to your preference:

from docarray.documents import AudioDoc
from typing import Optional

# extend AudioDoc
class MyAudio(AudioDoc):
    name: Optional[str] = None

audio = MyAudio(
)
audio.name = 'My first audio doc!'
audio.tensor, audio.frame_rate = audio.url.load()