At the heart of DocArray lies the concept of BaseDoc.
A BaseDoc is very similar to a Pydantic BaseModel; in fact, it is a specialized Pydantic BaseModel. It allows you to define custom Document schemas (or models, in the Pydantic world) to represent your data.
Naming convention: When we refer to a BaseDoc, we mean a class that inherits from BaseDoc. When we refer to a Document, we mean an instance of a BaseDoc class.
Before going into detail about what we can do with BaseDoc and how to use it, let's see what it looks like in practice.
The following Python code defines a
BannerDoc class that can be used to represent the data of a website banner:
You can then instantiate a
BannerDoc object and access its attributes:
BaseDoc is a Pydantic BaseModel, which means that it:
- Performs data validation: BaseDoc checks that the data you pass to it is valid and raises an error if it is not. What counts as "valid" is defined by the type used in the type hint itself, but we will come back to this concept later.
- Can be configured using a nested Config class; see the Pydantic documentation for more detail on the configuration options Pydantic offers.
- Can be used as a drop-in replacement for BaseModel in your code and is compatible with tools that use Pydantic, like FastAPI.
Representing multimodal and nested data
Let's say you want to represent a YouTube video in your application, perhaps to build a search system for YouTube videos. A YouTube video is not only composed of a video, but also has a title, description, thumbnail (and more, but let's keep it simple).
All of these elements are from different
modalities: the title and description are text, the thumbnail is an image, and the video itself is, well, a video.
DocArray lets you represent all of this multimodal data in a single object.
Let's first create a
BaseDoc for each of the elements that compose the YouTube video.
First for the thumbnail image:
Then for the video itself:
Then for the title and description (which are text) we'll just use a str.
All the elements that compose a YouTube video are ready:
We now have YouTubeVideoDoc, which is a Pythonic representation of a YouTube video.
BaseDoc can be nested to represent any kind of data hierarchy.
Setting a Pydantic Config class
Documents support setting a custom configuration like any other Pydantic BaseModel.
Here is an example of extending the Config of a Document, depending on which version of Pydantic you are using.