

At the heart of DocArray lies the concept of BaseDoc.

A BaseDoc is very similar to a Pydantic BaseModel -- in fact it is a specialized Pydantic BaseModel. It allows you to define custom Document schemas (or Models in the Pydantic world) to represent your data.


Naming convention: When we refer to a BaseDoc, we mean a class that inherits from BaseDoc. When we refer to a Document, we mean an instance of a BaseDoc class.

Basic Doc usage

Before going into detail about what we can do with BaseDoc and how to use it, let's see what it looks like in practice.

The following Python code defines a BannerDoc class that can be used to represent the data of a website banner:

from docarray import BaseDoc
from docarray.typing import ImageUrl

class BannerDoc(BaseDoc):
    image_url: ImageUrl
    title: str
    description: str

You can then instantiate a BannerDoc object and access its attributes:

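banner = BannerDoc(
    image_url='https://example.com/image.png',  # placeholder URL for illustration
    title='Hello World',
    description='This is a banner',
)

assert banner.image_url == 'https://example.com/image.png'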
assert banner.title == 'Hello World'
assert banner.description == 'This is a banner'

BaseDoc is a Pydantic BaseModel

The BaseDoc class inherits from Pydantic BaseModel. This means you can use all the features of BaseModel in your Doc class. BaseDoc:

  • Will perform data validation: BaseDoc will check that the data you pass to it is valid. If not, it will raise an error. Data being "valid" is actually defined by the type used in the type hint itself, but we will come back to this concept later.
  • Can be configured using a nested Config class. See the Pydantic documentation for details on the configuration options Pydantic offers.
  • Can be used as a drop-in replacement for BaseModel in your code and is compatible with tools that use Pydantic, like FastAPI.

Representing multimodal and nested data

Let's say you want to represent a YouTube video in your application, perhaps to build a search system for YouTube videos. A YouTube video is not only composed of a video, but also has a title, description, thumbnail (and more, but let's keep it simple).

All of these elements are from different modalities: the title and description are text, the thumbnail is an image, and the video itself is, well, a video.

DocArray lets you represent all of this multimodal data in a single object.

Let's first create a BaseDoc for each of the elements that compose the YouTube video.

First for the thumbnail image:

from typing import Optional

from docarray import BaseDoc
from docarray.typing import ImageUrl, ImageBytes

class ImageDoc(BaseDoc):
    url: ImageUrl
    # bytes are not always loaded in memory, so we make the field optional
    bytes: Optional[ImageBytes] = None

Then for the video itself:

from typing import Optional

from docarray import BaseDoc
from docarray.typing import VideoUrl, VideoBytes

class VideoDoc(BaseDoc):
    url: VideoUrl
    # bytes are not always loaded in memory, so we make the field optional
    bytes: Optional[VideoBytes] = None

Then for the title and description (which are text) we'll just use a str type.

All the elements that compose a YouTube video are ready:

from docarray import BaseDoc

class YouTubeVideoDoc(BaseDoc):
    title: str
    description: str
    thumbnail: ImageDoc
    video: VideoDoc

We now have YouTubeVideoDoc, a Pythonic representation of a YouTube video.

This representation can be used to send or store data. You can even use it directly to train a PyTorch machine learning model.


You can see that ImageDoc and VideoDoc are themselves BaseDoc classes, and that they are used as field types inside another BaseDoc. This is what we call a nested data representation.

BaseDoc can be nested to represent any kind of data hierarchy.

See also:

  • The next part of the representing section
  • API reference for the BaseDoc class
  • The Storing section on how to store your data
  • The Sending section on how to send your data