
🎥 Video

DocArray supports many modalities including Video. This section will show you how to load and handle video data using DocArray.

Moreover, you will learn about DocArray's video-specific types for representing your video data, ranging from VideoUrl to VideoBytes and VideoNdArray.


This requires the av package. You can install all necessary dependencies via:

pip install "docarray[video]"

Load video data

In DocArray video data is represented by a video tensor, an audio tensor, and the key frame indices.


Check out our predefined VideoDoc to get started and play around with our video features.

First, let's define a MyVideo class with all of those attributes and instantiate an object with a local or remote URL:

from docarray import BaseDoc
from docarray.typing import AudioNdArray, NdArray, VideoNdArray, VideoUrl

class MyVideo(BaseDoc):
    url: VideoUrl
    video: VideoNdArray = None
    audio: AudioNdArray = None
    key_frame_indices: NdArray = None

doc = MyVideo(url="/path/my_video.mp4")

Now you can load the video file content by simply calling .load() on your VideoUrl instance. This returns a NamedTuple of a video tensor, an audio tensor, and the key frame indices:

  • The video tensor is a 4-dimensional array of shape (n_frames, height, width, channels).
    • The first dimension represents the frame id.
    • The last three dimensions represent the image data of the corresponding frame.
  • If the video contains audio, it will be stored as an AudioNdArray.
  • Additionally, the key frame indices will be stored. A key frame is defined as the starting point of any smooth transition.

doc.video, doc.audio, doc.key_frame_indices = doc.url.load()

assert isinstance(doc.video, VideoNdArray)
assert isinstance(doc.audio, AudioNdArray)
assert isinstance(doc.key_frame_indices, NdArray)

print(doc.video.shape)  # (250, 176, 320, 3)

For the given example you can infer from doc.video's shape that the video contains 250 frames of size 176x320 in RGB mode. Based on the overall length of the video (10 seconds), you can infer that the frame rate is approximately 250/10 = 25 frames per second (fps).
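As a sketch of that arithmetic, using a plain NumPy array of the same shape as a stand-in for the loaded video tensor:

```python
import numpy as np

# Stand-in for the loaded video tensor: 250 RGB frames of height 176 and width 320
video = np.zeros((250, 176, 320, 3), dtype=np.uint8)

n_frames, height, width, channels = video.shape
duration_s = 10  # overall length of the example video, in seconds
fps = n_frames / duration_s

print(n_frames, height, width, channels)  # 250 176 320 3
print(fps)  # 25.0
```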


DocArray offers several video tensor types to store your data in: VideoNdArray, VideoTorchTensor, and VideoTensorFlowTensor.

If you specify the type of your tensor as one of the above, it will be cast to that automatically:

from docarray import BaseDoc
from docarray.typing import VideoTensorFlowTensor, VideoTorchTensor, VideoUrl

class MyVideo(BaseDoc):
    url: VideoUrl
    tf_tensor: VideoTensorFlowTensor = None
    torch_tensor: VideoTorchTensor = None

doc = MyVideo(url="/path/my_video.mp4")

doc.tf_tensor = doc.url.load().video
doc.torch_tensor = doc.url.load().video

assert isinstance(doc.tf_tensor, VideoTensorFlowTensor)
assert isinstance(doc.torch_tensor, VideoTorchTensor)


Alternatively, you can load your VideoUrl instance to VideoBytes, and your VideoBytes instance to a VideoTensor of your choice:

from docarray import BaseDoc
from docarray.typing import VideoTensor, VideoUrl, VideoBytes

class MyVideo(BaseDoc):
    url: VideoUrl
    bytes_: VideoBytes = None
    video: VideoTensor = None

doc = MyVideo(url="/path/my_video.mp4")

doc.bytes_ = doc.url.load_bytes()
doc.video = doc.bytes_.load().video

Vice versa, you can also transform a VideoTensor to VideoBytes:

from docarray.typing import VideoBytes

bytes_from_tensor = doc.video.to_bytes()

assert isinstance(bytes_from_tensor, VideoBytes)

Key frame extraction

A key frame is defined as the starting point of any smooth transition. Given the key frame indices, you can access selected scenes:

indices = doc.key_frame_indices
first_scene = doc.video[indices[0] : indices[1]]

assert (indices == [0, 95]).all()
assert first_scene.shape == (95, 176, 320, 3)
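The same slicing generalizes to splitting the whole video into scenes at the key frames. A minimal sketch with NumPy, using a zero-filled array as a stand-in for the video tensor:

```python
import numpy as np

# Stand-in video tensor and key frame indices, matching the example above
video = np.zeros((250, 176, 320, 3), dtype=np.uint8)
key_frame_indices = np.array([0, 95])

# Each key frame starts a new scene; split along the frame axis
# (the leading index 0 is dropped, since np.split expects interior cut points)
scenes = np.split(video, key_frame_indices[1:], axis=0)

print([scene.shape[0] for scene in scenes])  # [95, 155]
```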

Or you can access the first frame of all new scenes and display them in a notebook:

from docarray.typing import ImageNdArray
from pydantic import parse_obj_as

key_frames = doc.video[doc.key_frame_indices]
for frame in key_frames:
    img = parse_obj_as(ImageNdArray, frame)
    img.display()

Save video to file

You can save your video tensor to a file. In the example below you save the video with a frame rate of 60 fps, which results in a roughly 4-second video, instead of the original 10-second video at 25 fps.
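The duration change works out as frame count divided by frame rate. A small sketch of that arithmetic; the commented save call is an assumption based on DocArray's video tensor API and is not executed here:

```python
# Assumed DocArray call (not executed in this sketch):
#     doc.video.save(file_path="/path/my_video.mp4", video_frame_rate=60)
# The tensor still holds the same frames; only the playback rate changes.

n_frames = 250  # frames in the loaded tensor, from the example above

original_duration = n_frames / 25  # 10.0 seconds at the original rate
new_duration = n_frames / 60       # ~4.17 seconds when saved at 60 fps

print(original_duration)
print(round(new_duration, 2))
```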

Display video in a notebook

You can play a video in a notebook from its URL as well as its tensor, by calling .display() on either one. For the latter, you can optionally give the corresponding AudioTensor as a parameter.

doc_fast = MyVideo(url="/path/my_video.mp4")
doc_fast.url.display()

Getting started - Predefined VideoDoc

To get started and play around with your video data, DocArray provides a predefined VideoDoc, which includes all of the previously mentioned functionalities:

class VideoDoc(BaseDoc):
    url: Optional[VideoUrl] = None
    audio: Optional[AudioDoc] = AudioDoc()
    tensor: Optional[VideoTensor] = None
    key_frame_indices: Optional[AnyTensor] = None
    embedding: Optional[AnyEmbedding] = None
    bytes_: Optional[bytes] = None

You can use this class directly or extend it to your preference:

from typing import Optional

from docarray.documents import VideoDoc

# extend it
class MyVideo(VideoDoc):
    name: Optional[str] = None

video = MyVideo(url="/path/my_video.mp4")
video.name = 'My first video doc!'
video.tensor = video.url.load().video