Welcome to DocArray!

⬆️ DocArray v2: We are currently working on v2 of DocArray. Keep reading here if you are interested in the current (stable) version, or check out the v2 alpha branch and v2 roadmap!

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API.

🚪 Door to the multimodal world: a super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, and 3D mesh data. The foundational data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt, etc.

🧑‍🔬 Data science powerhouse: greatly accelerates data scientists’ work on embedding, k-NN matching, querying, visualizing, and evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.

🚡 Data in transit: optimized for network communication, ready-to-wire at any time with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame. Perfect for streaming and out-of-memory data.

🔎 One-stop k-NN: unified and consistent API for nearest neighbor search across mainstream vector databases, including Elasticsearch, Redis, AnnLite, Qdrant, and Weaviate.

👒 For modern apps: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.

🐍 Pythonic experience: as easy as a Python list. If you can Python, you can DocArray. Intuitive idioms and type annotation simplify the code you write.

🛸 IDE integration: pretty-print and visualization on Jupyter notebook and Google Colab; comprehensive autocomplete and type hints in PyCharm and VS Code.

Read more on why you should use DocArray and how it compares to alternatives.


The latest version is available on PyPI.

Make sure you have Python 3.7+ and numpy installed on Linux/Mac/Windows:

pip install docarray

No extra dependencies are installed.

conda install -c conda-forge docarray

No extra dependencies are installed.

pip install "docarray[common]"

The following dependencies are installed to enable the most common features. They are used in:

advanced serialization
compression in serialization
push/pull to Jina Cloud
visualizing image sprites
image data-related IO
embedding projector of DocumentArray (two dependencies)

pip install "docarray[full]"

In addition to common, the following dependencies are installed to enable full features. They are used in:

sparse embedding, tensors
video processing and IO
3D mesh processing and IO
GraphQL support

Alternatively, you can first do basic installation and then install missing dependencies on-demand.
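If you take the on-demand route, it can help to see which optional features are still missing a dependency. The sketch below is one way to check, using only the standard library. The helper `missing_features` is hypothetical, and the module names in `ASSUMED_FEATURE_MODULES` are assumptions for illustration; this page does not name the packages behind each feature, so adjust the mapping to your setup.

```python
from importlib.util import find_spec

# ASSUMPTION: feature -> importable top-level module name.
# These module names are illustrative guesses, not taken from this page.
ASSUMED_FEATURE_MODULES = {
    "sparse embedding, tensors": "scipy",
    "video processing and IO": "av",
    "3D mesh processing and IO": "trimesh",
    "GraphQL support": "strawberry",
}


def missing_features(feature_modules):
    """Return the sorted features whose backing module cannot be found."""
    return sorted(
        feature
        for feature, module in feature_modules.items()
        if find_spec(module) is None  # None means the module is not installed
    )


if __name__ == "__main__":
    for feature in missing_features(ASSUMED_FEATURE_MODULES):
        print(f"missing dependency for: {feature}")
```

The function only looks up whether a module can be located; it does not import it, so the check stays fast even for heavy packages.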

pip install "docarray[full,test]"

This installs all requirements for reproducing tests on your local dev environment.

>>> import docarray
>>> docarray.__version__
>>> from docarray import Document, DocumentArray
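The same check can be scripted, for example in a CI job or setup script. This is a minimal sketch using only the standard library; the function name `check_environment` is hypothetical, and `importlib.metadata` requires Python 3.8 or newer. It reports whether the Python version meets the 3.7+ requirement stated above, whether numpy can be found, and which docarray version (if any) is installed.

```python
import sys
from importlib import metadata
from importlib.util import find_spec


def check_environment(min_python=(3, 7)):
    """Return a dict describing whether the install prerequisites are met."""
    report = {
        # sys.version_info compares like a tuple, e.g. (3, 10, 4, ...) >= (3, 7)
        "python_ok": sys.version_info >= min_python,
        # find_spec locates the package without importing it
        "numpy_found": find_spec("numpy") is not None,
    }
    try:
        report["docarray_version"] = metadata.version("docarray")
    except metadata.PackageNotFoundError:
        # docarray is not installed in this environment
        report["docarray_version"] = None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `docarray_version` of `None` means the package is not installed in the active environment.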
