# 🎥 Video

DocArray supports many modalities, including video.
This section shows you how to load and handle video data using DocArray.

Moreover, you will learn about DocArray's video-specific types for representing your video data, ranging from `VideoUrl` to `VideoBytes` and `VideoNdArray`.
!!! note
    This requires the `av` dependency. You can install all necessary dependencies via `pip install "docarray[video]"`.
## Load video data

In DocArray, video data is represented by a video tensor, an audio tensor, and the key frame indices.

!!! tip
    Check out our predefined `VideoDoc` to get started and play around with our video features.

First, let's define a `MyVideo` class with all of those attributes and instantiate an object with a local or remote URL:
```python
from docarray import BaseDoc
from docarray.typing import AudioNdArray, NdArray, VideoNdArray, VideoUrl


class MyVideo(BaseDoc):
    url: VideoUrl
    video: VideoNdArray = None
    audio: AudioNdArray = None
    key_frame_indices: NdArray = None


doc = MyVideo(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
```
Now you can load the video file content by simply calling `.load()` on your `VideoUrl` instance.
This returns a `NamedTuple` of a video tensor, an audio tensor, and the key frame indices:
- The video tensor is a 4-dimensional array of shape `(n_frames, height, width, channels)`:
    - The first dimension represents the frame id.
    - The last three dimensions represent the image data of the corresponding frame.
- If the video contains audio, it will be stored as an `AudioNdArray`.
- Additionally, the key frame indices will be stored. A key frame is defined as the starting point of any smooth transition.
```python
doc.video, doc.audio, doc.key_frame_indices = doc.url.load()

assert isinstance(doc.video, VideoNdArray)
assert isinstance(doc.audio, AudioNdArray)
assert isinstance(doc.key_frame_indices, NdArray)

print(doc.video.shape)
```
For the given example, you can infer from `doc.video`'s shape that the video contains 250 frames of size 176x320 in RGB mode.
Based on the overall length of the video (10 seconds), you can infer that the frame rate is approximately 250/10 = 25 frames per second (fps).
## VideoTensor

DocArray offers several `VideoTensor` types to store your data in:

- `VideoNdArray`
- `VideoTorchTensor`
- `VideoTensorFlowTensor`

If you specify the type of your tensor as one of the above, it will be cast to that automatically:
```python
from docarray import BaseDoc
from docarray.typing import VideoTensorFlowTensor, VideoTorchTensor, VideoUrl


class MyVideo(BaseDoc):
    url: VideoUrl
    tf_tensor: VideoTensorFlowTensor = None
    torch_tensor: VideoTorchTensor = None


doc = MyVideo(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)

doc.tf_tensor = doc.url.load().video
doc.torch_tensor = doc.url.load().video

assert isinstance(doc.tf_tensor, VideoTensorFlowTensor)
assert isinstance(doc.torch_tensor, VideoTorchTensor)
```
## VideoBytes

Alternatively, you can load your `VideoUrl` instance to `VideoBytes`, and your `VideoBytes` instance to a `VideoTensor` of your choice:
```python
from docarray import BaseDoc
from docarray.typing import VideoBytes, VideoTensor, VideoUrl


class MyVideo(BaseDoc):
    url: VideoUrl
    bytes_: VideoBytes = None
    video: VideoTensor = None


doc = MyVideo(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)

doc.bytes_ = doc.url.load_bytes()
doc.video = doc.url.load().video
```
Vice versa, you can also transform a `VideoTensor` to `VideoBytes`:

```python
from docarray.typing import VideoBytes

bytes_from_tensor = doc.video.to_bytes()

assert isinstance(bytes_from_tensor, VideoBytes)
```
## Key frame extraction

A key frame is defined as the starting point of any smooth transition. Given the key frame indices, you can access selected scenes:

```python
indices = doc.key_frame_indices
first_scene = doc.video[indices[0] : indices[1]]

assert (indices == [0, 95]).all()
assert first_scene.shape == (95, 176, 320, 3)
```
Or you can access the first frame of all new scenes and display them in a notebook:

```python
from pydantic import parse_obj_as

from docarray.typing import ImageNdArray

key_frames = doc.video[doc.key_frame_indices]

for frame in key_frames:
    img = parse_obj_as(ImageNdArray, frame)
    img.display()
```
## Save video to file

You can save your video tensor to a file. In the example below, you save the video with a frame rate of 60 fps, which results in a 4-second video instead of the original 10-second video with a frame rate of 25 fps.
## Display video in a notebook

You can play a video in a notebook from its URL as well as from its tensor, by calling `.display()` on either one. For the latter, you can optionally pass the corresponding `AudioTensor` as a parameter.
## Getting started - Predefined VideoDoc

To get started and play around with your video data, DocArray provides a predefined `VideoDoc`, which includes all of the previously mentioned functionality:

```python
class VideoDoc(BaseDoc):
    url: Optional[VideoUrl]
    audio: Optional[AudioDoc] = AudioDoc()
    tensor: Optional[VideoTensor]
    key_frame_indices: Optional[AnyTensor]
    embedding: Optional[AnyEmbedding]
    bytes_: Optional[bytes]
```
You can use this class directly, or extend it to your preference:

```python
from typing import Optional

from docarray.documents import VideoDoc


# Extend it with an additional attribute
class MyVideo(VideoDoc):
    name: Optional[str]


video = MyVideo(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
video.name = 'My first video doc!'
video.tensor = video.url.load().video
```