Introduction

In the previous sections we saw how to use BaseDoc, DocList and DocVec to represent multimodal data and send it over the wire. In this section we will see how to store and persist this data.

DocArray offers two ways of storing your data, each of which has its own documentation section:

Document Store for simple long-term storage
Document Index for fast retrieval using vector similarity

Document Store

A DocList can be persisted using the .push() and .pull() methods. Under the hood, DocStore is used to persist the DocList. You can either store your documents on disk or upload them to AWS S3 or MinIO.

This section covers the following two topics:

Storing BaseDoc, DocList and DocVec on-disk
Storing on S3

Document Index

A Document Index lets you store your documents and search through them using vector similarity.

This is useful if you want to store a bunch of data, and at a later point retrieve documents that are similar to a query that you provide. Relevant concrete examples are neural search applications, augmenting LLMs and chatbots with domain knowledge (Retrieval-Augmented Generation), or recommender systems.

DocArray's Document Index concept achieves this by providing a unified interface to a number of vector databases. In fact, you can think of Document Index as an ORM for vector databases.

Currently, DocArray supports the following vector indexes. Some of them wrap vector databases (such as Weaviate, Qdrant, and Elasticsearch) and act as a client for them, while others use a vector search library locally (HNSWLib, exact NN search):

Weaviate | Docs
Qdrant | Docs
Elasticsearch v7 and v8 | Docs
Redis | Docs
Milvus | Docs
Hnswlib | Docs
InMemoryExactNNSearch | Docs