🔊 Audio
DocArray supports many different modalities, including audio.
This section shows you how to load and handle audio data using DocArray.
You will also learn about DocArray's audio-specific types for representing your audio data, ranging from `AudioUrl` to `AudioBytes` and `AudioNdArray`.
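Note: This requires the `pydub` dependency. You can install all necessary dependencies via `pip install "docarray[audio]"`.
Additionally, you have to install ffmpeg, for example with `brew install ffmpeg` on macOS or `apt-get install ffmpeg libavcodec-extra` on Linux (see pydub's instructions on getting ffmpeg set up for more info).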
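Before running the examples on this page, it can help to confirm that both dependencies are visible to Python. The following is a minimal check using only the standard library; `audio_deps_available` is a hypothetical helper, and it only looks for an `ffmpeg` executable on `PATH` (pydub can also be pointed at ffmpeg or avconv in other ways, which this check does not cover).

```python
import importlib.util
import shutil


def audio_deps_available() -> dict:
    """Hypothetical helper: report whether the audio dependencies can be found."""
    return {
        # pydub is the Python package used to decode and encode audio
        "pydub": importlib.util.find_spec("pydub") is not None,
        # only checks for an ffmpeg executable on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in audio_deps_available().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```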
Load audio file
First, let's define a class which extends `BaseDoc` and has a `url` attribute of type `AudioUrl`, an optional `tensor` attribute of type `AudioNdArray` (one of DocArray's `AudioTensor` types), and an optional `frame_rate` attribute.
Tip: Check out our predefined `AudioDoc` to get started and play around with our audio features.
Next, you can instantiate an object of that class with a local or remote URL:
```python
from typing import Optional

from docarray import BaseDoc
from docarray.typing import AudioNdArray, AudioUrl


class MyAudio(BaseDoc):
    url: AudioUrl
    tensor: Optional[AudioNdArray] = None
    frame_rate: Optional[int] = None


doc = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.mp3?raw=true'
)
```
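Loading the content of the audio file is as easy as calling `.load()` on the `AudioUrl` instance, for example `doc.tensor, doc.frame_rate = doc.url.load()`.
This returns a tuple of:

- an `AudioNdArray` representing the audio file content
- an integer representing the frame rate (the number of audio frames per second, i.e. the sampling rate)

Calling `doc.summary()` afterwards prints the populated document: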
Output:

```text
📄 MyAudio : 2015696 ...
╭──────────────────────┬───────────────────────────────────────────────────────╮
│ Attribute            │ Value                                                 │
├──────────────────────┼───────────────────────────────────────────────────────┤
│ url: AudioUrl        │ https://github.com/docarray/docarray/blob/main/tes    │
│                      │ ... (length: 90)                                      │
│ tensor: AudioNdArray │ AudioNdArray of shape (30833,), dtype: float64        │
│ frame_rate: int      │ 44100                                                 │
╰──────────────────────┴───────────────────────────────────────────────────────╯
```
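Because the frame rate counts frames per second, the tensor length and the frame rate together give the clip duration. The helper below is a hypothetical sketch, assuming a 1-D tensor and, for multi-channel audio, interleaved channels so that one frame spans `channels` samples. The channel count of `hello.mp3` is not stated here, so the example treats it as mono.

```python
def duration_seconds(num_samples: int, frame_rate: int, channels: int = 1) -> float:
    """Hypothetical helper: clip length in seconds from a 1-D sample count.

    Assumes channels are interleaved in the 1-D array, so the number of
    frames is num_samples // channels.
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return (num_samples // channels) / frame_rate


# Values taken from the summary above (assuming mono): shape (30833,), frame rate 44100
print(round(duration_seconds(30833, 44100), 3))  # 0.699
```

With a loaded document, the same call reads `duration_seconds(doc.tensor.shape[0], doc.frame_rate)`.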
AudioTensor
DocArray offers several `AudioTensor` types to store your data in: `AudioNdArray`, `AudioTorchTensor`, and `AudioTensorFlowTensor`.
If you specify the type of your tensor as one of the above, it will be cast to that type automatically:
```python
from typing import Optional

from docarray import BaseDoc
from docarray.typing import AudioTensorFlowTensor, AudioTorchTensor, AudioUrl


# requires both torch and tensorflow to be installed
class MyAudio(BaseDoc):
    url: AudioUrl
    tf_tensor: Optional[AudioTensorFlowTensor] = None
    torch_tensor: Optional[AudioTorchTensor] = None


doc = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.mp3?raw=true'
)

# .load() returns an AudioNdArray; assigning it to a typed field casts it
doc.tf_tensor, _ = doc.url.load()
doc.torch_tensor, _ = doc.url.load()

assert isinstance(doc.tf_tensor, AudioTensorFlowTensor)
assert isinstance(doc.torch_tensor, AudioTorchTensor)
```
AudioBytes
Alternatively, you can load your `AudioUrl` instance into `AudioBytes`, and then load your `AudioBytes` instance into an `AudioTensor` of your choice:
```python
from typing import Optional

from docarray import BaseDoc
from docarray.typing import AudioBytes, AudioTensor, AudioUrl


class MyAudio(BaseDoc):
    url: Optional[AudioUrl] = None
    bytes_: Optional[AudioBytes] = None
    tensor: Optional[AudioTensor] = None


doc = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.mp3?raw=true'
)

doc.bytes_ = doc.url.load_bytes()  # type(doc.bytes_) = AudioBytes
doc.tensor, _ = doc.bytes_.load()  # type(doc.tensor) = AudioNdArray
```
Conversely, you can also convert an `AudioTensor` to `AudioBytes`:
```python
from docarray.typing import AudioBytes

# reuses `doc` from the previous example
bytes_from_tensor = doc.tensor.to_bytes()
assert isinstance(bytes_from_tensor, AudioBytes)
```
Save audio to file
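You can save your `AudioTensor` to an audio file in any format that pydub can export by calling its `.save()` method, for example `doc.tensor.save(file_path='hello.wav', format='wav')`. Formats other than WAV are encoded via ffmpeg.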
Play audio in a notebook
You can play your audio in a notebook from its URL or from its tensor by calling `.display()` on either one.
Play from URL: `doc.url.display()`
Play from tensor: `doc.tensor.display()`
Getting started - Predefined AudioDoc
To get started and play around with your audio data, DocArray provides a predefined `AudioDoc`, which includes all of the previously mentioned functionalities:
```python
class AudioDoc(BaseDoc):
    url: Optional[AudioUrl]
    tensor: Optional[AudioTensor]
    embedding: Optional[AnyEmbedding]
    bytes_: Optional[AudioBytes]
    frame_rate: Optional[int]
```
You can use this class directly or extend it to your preference:
```python
from typing import Optional

from docarray.documents import AudioDoc


# extend AudioDoc
class MyAudio(AudioDoc):
    name: Optional[str] = None


audio = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.mp3?raw=true'
)
audio.name = 'My first audio doc!'
audio.tensor, audio.frame_rate = audio.url.load()
```