Documents
docarray.documents
AudioDoc
Bases: BaseDoc
Document for handling audio.
The Audio Document can contain:

- an `AudioUrl` (`AudioDoc.url`)
- an `AudioTensor` (`AudioDoc.tensor`)
- an `AnyEmbedding` (`AudioDoc.embedding`)
- an `AudioBytes` object (`AudioDoc.bytes_`)
- an integer representing the frame rate (`AudioDoc.frame_rate`)
You can use this Document directly:
```python
from docarray.documents import AudioDoc

# use it directly
audio = AudioDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
)
audio.tensor, audio.frame_rate = audio.url.load()
# model = MyEmbeddingModel()
# audio.embedding = model(audio.tensor)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import AudioDoc, TextDoc


# extend it
class MyAudio(AudioDoc):
    name: Optional[TextDoc]


audio = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
)
audio.name = TextDoc(text='my first audio')
audio.tensor, audio.frame_rate = audio.url.load()
# model = MyEmbeddingModel()
# audio.embedding = model(audio.tensor)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import AudioDoc, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    audio: AudioDoc
    text: TextDoc


mmdoc = MultiModalDoc(
    audio=AudioDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.audio.tensor, mmdoc.audio.frame_rate = mmdoc.audio.url.load()

# equivalent to
mmdoc.audio.bytes_ = mmdoc.audio.url.load_bytes()
mmdoc.audio.tensor, mmdoc.audio.frame_rate = mmdoc.audio.bytes_.load()
```
Source code in docarray/documents/audio.py
ImageDoc
Bases: BaseDoc
Document for handling images.
It can contain:

- an `ImageUrl` (`ImageDoc.url`)
- an `ImageTensor` (`ImageDoc.tensor`)
- an `AnyEmbedding` (`ImageDoc.embedding`)
- an `ImageBytes` object (`ImageDoc.bytes_`)
You can use this Document directly:
```python
from docarray.documents import ImageDoc

# use it directly
image = ImageDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
)
image.tensor = image.url.load()
# model = MyEmbeddingModel()
# image.embedding = model(image.tensor)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import ImageDoc
from docarray.typing import AnyEmbedding


# extend it
class MyImage(ImageDoc):
    second_embedding: Optional[AnyEmbedding]


image = MyImage(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
)
image.tensor = image.url.load()
# model = MyEmbeddingModel()
# image.embedding = model(image.tensor)
# image.second_embedding = model(image.tensor)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import ImageDoc, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    image: ImageDoc
    text: TextDoc


mmdoc = MultiModalDoc(
    image=ImageDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.image.tensor = mmdoc.image.url.load()

# or
mmdoc.image.bytes_ = mmdoc.image.url.load_bytes()
mmdoc.image.tensor = mmdoc.image.bytes_.load()
```
Source code in docarray/documents/image.py
Mesh3D
Bases: BaseDoc
Document for handling meshes for 3D data representation.
A mesh is a representation of 3D data consisting of vertices and faces. Vertices are points in 3D space, represented as a tensor of shape (n_points, 3). Faces are triangular surfaces, each defined by three points in 3D space that correspond to the three vertices of a triangle. Faces can be represented as a tensor of shape (n_faces, 3), where each entry is an index into the tensor of vertices.
The Mesh3D Document can contain:

- a `Mesh3DUrl` (`Mesh3D.url`)
- a `VerticesAndFaces` object (`Mesh3D.tensors`) containing the vertices and faces information
- an `AnyEmbedding` (`Mesh3D.embedding`)
- a `bytes` object (`Mesh3D.bytes_`)
You can use this Document directly:
```python
from docarray.documents import Mesh3D

# use it directly
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import Mesh3D


# extend it
class MyMesh3D(Mesh3D):
    name: Optional[str]


mesh = MyMesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.name = 'my first mesh'
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import Mesh3D, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    mesh: Mesh3D
    text: TextDoc


mmdoc = MultiModalDoc(
    mesh=Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.mesh.tensors = mmdoc.mesh.url.load()

# or
mmdoc.mesh.bytes_ = mmdoc.mesh.url.load_bytes()
```
You can display your 3D mesh in a notebook from either its url, or its tensors:
```python
from docarray.documents import Mesh3D

# display from url
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# mesh.url.display()

# display from tensors
mesh.tensors = mesh.url.load()
# mesh.tensors.display()
```
Source code in docarray/documents/mesh/mesh_3d.py
PointCloud3D
Bases: BaseDoc
Document for handling point clouds for 3D data representation.
A point cloud is a representation of a 3D mesh. It is made by repeatedly and uniformly sampling points within the surface of the 3D body. Compared to the mesh representation, the point cloud is a fixed-size ndarray of shape (n_samples, 3) and hence easier for deep learning algorithms to handle.
A PointCloud3D Document can contain:

- a `PointCloud3DUrl` (`PointCloud3D.url`)
- a `PointsAndColors` object (`PointCloud3D.tensors`)
- an `AnyEmbedding` (`PointCloud3D.embedding`)
- a `bytes` object (`PointCloud3D.bytes_`)
You can use this Document directly:
```python
from docarray.documents import PointCloud3D

# use it directly
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import PointCloud3D
from docarray.typing import AnyEmbedding


# extend it
class MyPointCloud3D(PointCloud3D):
    second_embedding: Optional[AnyEmbedding]


pc = MyPointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
# pc.second_embedding = model(pc.tensors.colors)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import PointCloud3D, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    point_cloud: PointCloud3D
    text: TextDoc


mmdoc = MultiModalDoc(
    point_cloud=PointCloud3D(
        url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.point_cloud.tensors = mmdoc.point_cloud.url.load(samples=100)

# or
mmdoc.point_cloud.bytes_ = mmdoc.point_cloud.url.load_bytes()
```
You can display your point cloud from either its url, or its tensors:
```python
from docarray.documents import PointCloud3D

# display from url
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# pc.url.display()

# display from tensors
pc.tensors = pc.url.load(samples=10000)
# pc.tensors.display()
```
Source code in docarray/documents/point_cloud/point_cloud_3d.py
PointsAndColors
Bases: BaseDoc
Document for handling the tensor data of a `PointCloud3D` object.

A PointsAndColors Document can contain:

- an `AnyTensor` containing the points in 3D space (`PointsAndColors.points`)
- an `AnyTensor` containing the points' color information (`PointsAndColors.colors`)
Source code in docarray/documents/point_cloud/points_and_colors.py
display()
Plot point cloud consisting of points in 3D space and optionally colors.
Source code in docarray/documents/point_cloud/points_and_colors.py
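You can also build the tensor data by hand instead of loading it from a URL. A minimal sketch, assuming numpy arrays are valid `AnyTensor` values and that display is backed by a 3D viewer such as trimesh:

```python
import numpy as np

from docarray.documents.point_cloud.points_and_colors import PointsAndColors

# 100 points in 3D space, plus optional per-point RGB values in [0, 1]
tensors = PointsAndColors(
    points=np.random.rand(100, 3),
    colors=np.random.rand(100, 3),
)
# tensors.display()  # opens an interactive 3D plot
```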
TextDoc
Bases: BaseDoc
Document for handling text.
It can contain:

- a `TextUrl` (`TextDoc.url`)
- a `str` (`TextDoc.text`)
- an `AnyEmbedding` (`TextDoc.embedding`)
- a `bytes` object (`TextDoc.bytes_`)
You can use this Document directly:
```python
from docarray.documents import TextDoc

# use it directly
txt_doc = TextDoc(url='http://www.jina.ai/')
txt_doc.text = txt_doc.url.load()
# model = MyEmbeddingModel()
# txt_doc.embedding = model(txt_doc.text)
```
You can initialize directly from a string:
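```python
from docarray.documents import TextDoc

txt_doc = TextDoc('hello world')
```

The positional string is assigned to the `text` attribute.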
You can extend this Document:
```python
from typing import Optional

from docarray.documents import TextDoc
from docarray.typing import AnyEmbedding


# extend it
class MyText(TextDoc):
    second_embedding: Optional[AnyEmbedding]


txt_doc = MyText(url='http://www.jina.ai/')
txt_doc.text = txt_doc.url.load()
# model = MyEmbeddingModel()
# txt_doc.embedding = model(txt_doc.text)
# txt_doc.second_embedding = model(txt_doc.text)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import ImageDoc, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    image_doc: ImageDoc
    text_doc: TextDoc


mmdoc = MultiModalDoc(
    image_doc=ImageDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
    ),
    text_doc=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.image_doc.tensor = mmdoc.image_doc.url.load()

# or
mmdoc.image_doc.bytes_ = mmdoc.image_doc.url.load_bytes()
mmdoc.image_doc.tensor = mmdoc.image_doc.bytes_.load()
```
This Document can be compared against another Document of the same type or against a string. When compared against another object of the same type, the pydantic BaseModel equality check applies, which checks the equality of every attribute excluding `id`. When compared against a str, the equality of the `text` attribute is checked against the given string.
```python
from docarray.documents import TextDoc

doc = TextDoc(text='This is the main text', url='exampleurl.com')
doc2 = TextDoc(text='This is the main text', url='exampleurl.com')

doc == 'This is the main text'  # True
doc == doc2  # True
```
Source code in docarray/documents/text.py
__contains__(item)
This method makes `TextDoc` behave the same as a `str`.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`item` | `str` | A string to be checked for being a substring of the `text` attribute | *required* |

Returns:

Type | Description |
---|---|
`bool` | A boolean indicating whether `item` is a substring of the `text` attribute |
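A short usage sketch of this substring check:

```python
from docarray.documents import TextDoc

doc = TextDoc(text='hello world')

assert 'world' in doc  # delegated to doc.text
assert 'goodbye' not in doc
```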
Source code in docarray/documents/text.py
VerticesAndFaces
Bases: BaseDoc
Document for handling the tensor data of a `Mesh3D` object.

A VerticesAndFaces Document can contain:

- an `AnyTensor` containing the vertices information (`VerticesAndFaces.vertices`)
- an `AnyTensor` containing the faces information (`VerticesAndFaces.faces`)
Source code in docarray/documents/mesh/vertices_and_faces.py
display()
Plot mesh consisting of vertices and faces.
Source code in docarray/documents/mesh/vertices_and_faces.py
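You can also build the tensor data by hand. A minimal sketch, assuming numpy arrays are valid `AnyTensor` values and that display is backed by a 3D viewer such as trimesh:

```python
import numpy as np

from docarray.documents.mesh.vertices_and_faces import VerticesAndFaces

# a single triangle: three vertices and one face referencing them by index
tensors = VerticesAndFaces(
    vertices=np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    faces=np.array([[0, 1, 2]]),
)
# tensors.display()  # renders the mesh
```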
VideoDoc
Bases: BaseDoc
Document for handling video.
The Video Document can contain:

- a `VideoUrl` (`VideoDoc.url`)
- an `AudioDoc` (`VideoDoc.audio`)
- a `VideoTensor` (`VideoDoc.tensor`)
- an `AnyTensor` representing the indices of the video's key frames (`VideoDoc.key_frame_indices`)
- an `AnyEmbedding` (`VideoDoc.embedding`)
- a `VideoBytes` object (`VideoDoc.bytes_`)
You can use this Document directly:
```python
from docarray.documents import VideoDoc

# use it directly
vid = VideoDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
vid.tensor, vid.audio.tensor, vid.key_frame_indices = vid.url.load()
# model = MyEmbeddingModel()
# vid.embedding = model(vid.tensor)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import TextDoc, VideoDoc


# extend it
class MyVideo(VideoDoc):
    name: Optional[TextDoc]


video = MyVideo(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
video.name = TextDoc(text='my first video')
video.tensor = video.url.load().video
# model = MyEmbeddingModel()
# video.embedding = model(video.tensor)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import TextDoc, VideoDoc


# compose it
class MultiModalDoc(BaseDoc):
    video: VideoDoc
    text: TextDoc


mmdoc = MultiModalDoc(
    video=VideoDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.video.tensor = mmdoc.video.url.load().video

# or
mmdoc.video.bytes_ = mmdoc.video.url.load_bytes()
mmdoc.video.tensor = mmdoc.video.bytes_.load().video
```
Source code in docarray/documents/video.py
audio
AudioDoc
Bases: BaseDoc
Document for handling audio.
The Audio Document can contain:

- an `AudioUrl` (`AudioDoc.url`)
- an `AudioTensor` (`AudioDoc.tensor`)
- an `AnyEmbedding` (`AudioDoc.embedding`)
- an `AudioBytes` object (`AudioDoc.bytes_`)
- an integer representing the frame rate (`AudioDoc.frame_rate`)
You can use this Document directly:
```python
from docarray.documents import AudioDoc

# use it directly
audio = AudioDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
)
audio.tensor, audio.frame_rate = audio.url.load()
# model = MyEmbeddingModel()
# audio.embedding = model(audio.tensor)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import AudioDoc, TextDoc


# extend it
class MyAudio(AudioDoc):
    name: Optional[TextDoc]


audio = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
)
audio.name = TextDoc(text='my first audio')
audio.tensor, audio.frame_rate = audio.url.load()
# model = MyEmbeddingModel()
# audio.embedding = model(audio.tensor)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import AudioDoc, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    audio: AudioDoc
    text: TextDoc


mmdoc = MultiModalDoc(
    audio=AudioDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.audio.tensor, mmdoc.audio.frame_rate = mmdoc.audio.url.load()

# equivalent to
mmdoc.audio.bytes_ = mmdoc.audio.url.load_bytes()
mmdoc.audio.tensor, mmdoc.audio.frame_rate = mmdoc.audio.bytes_.load()
```
Source code in docarray/documents/audio.py
helper
create_doc(__model_name, *, __config__=None, __base__=BaseDoc, __module__=__name__, __validators__=None, __cls_kwargs__=None, __slots__=None, **field_definitions)
Dynamically create a subclass of BaseDoc. This is a wrapper around pydantic's create_model.
```python
from docarray import BaseDoc
from docarray.documents import AudioDoc
from docarray.documents.helper import create_doc
from docarray.typing.tensor.audio import AudioNdArray

MyAudio = create_doc(
    'MyAudio',
    __base__=AudioDoc,
    title=(str, ...),
    tensor=(AudioNdArray, ...),
)

assert issubclass(MyAudio, BaseDoc)
assert issubclass(MyAudio, AudioDoc)
```
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`__model_name` | `str` | name of the created model | *required* |
`__config__` | `Optional[Type[BaseConfig]]` | config class to use for the new model | `None` |
`__base__` | `Type[T_doc]` | base class for the new model to inherit from; must be BaseDoc or a subclass of it | `BaseDoc` |
`__module__` | `str` | module of the created model | `__name__` |
`__validators__` | `Dict[str, AnyClassMethod]` | a dict of method names and `@validator` class methods | `None` |
`__cls_kwargs__` | `Dict[str, Any]` | a dict for class creation | `None` |
`__slots__` | `Optional[Tuple[str, ...]]` | deprecated; should not be passed | `None` |
`field_definitions` | `Any` | fields of the model (or extra fields if a base is supplied) in the format `<name>=(<type>, <default value>)` or `<name>=<default value>` | `{}` |
Returns:

Type | Description |
---|---|
`Type[T_doc]` | the new Document class |
Source code in docarray/documents/helper.py
create_doc_from_dict(model_name, data_dict)
Create a subclass of BaseDoc based on example data given as a dictionary.
If the example contains `None` as a value, the corresponding field is treated as the type `Any`.
```python
import numpy as np

from docarray import BaseDoc
from docarray.documents import ImageDoc
from docarray.documents.helper import create_doc_from_dict

data_dict = {'image': ImageDoc(tensor=np.random.rand(3, 224, 224)), 'author': 'me'}

MyDoc = create_doc_from_dict(model_name='MyDoc', data_dict=data_dict)

assert issubclass(MyDoc, BaseDoc)
```
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_name` | `str` | Name of the new Document class | *required* |
`data_dict` | `Dict[str, Any]` | Dictionary mapping field names to their corresponding example values | *required* |
Returns:

Type | Description |
---|---|
`Type[T_doc]` | the new Document class |
Source code in docarray/documents/helper.py
create_doc_from_typeddict(typeddict_cls, **kwargs)
Create a subclass of BaseDoc based on the fields of a `TypedDict`. This is a wrapper around pydantic's create_model_from_typeddict.
```python
from typing_extensions import TypedDict

from docarray import BaseDoc
from docarray.documents import AudioDoc
from docarray.documents.helper import create_doc_from_typeddict
from docarray.typing.tensor.audio import AudioNdArray


class MyAudio(TypedDict):
    title: str
    tensor: AudioNdArray


Doc = create_doc_from_typeddict(MyAudio, __base__=AudioDoc)

assert issubclass(Doc, BaseDoc)
assert issubclass(Doc, AudioDoc)
```
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`typeddict_cls` | `Type[TypedDict]` | TypedDict class to use for the new Document class | *required* |
`kwargs` | `Any` | extra arguments to pass to `create_model_from_typeddict` | `{}` |
Returns:

Type | Description |
---|---|
 | the new Document class |
Source code in docarray/documents/helper.py
image
ImageDoc
Bases: BaseDoc
Document for handling images.
It can contain:

- an `ImageUrl` (`ImageDoc.url`)
- an `ImageTensor` (`ImageDoc.tensor`)
- an `AnyEmbedding` (`ImageDoc.embedding`)
- an `ImageBytes` object (`ImageDoc.bytes_`)
You can use this Document directly:
```python
from docarray.documents import ImageDoc

# use it directly
image = ImageDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
)
image.tensor = image.url.load()
# model = MyEmbeddingModel()
# image.embedding = model(image.tensor)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import ImageDoc
from docarray.typing import AnyEmbedding


# extend it
class MyImage(ImageDoc):
    second_embedding: Optional[AnyEmbedding]


image = MyImage(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
)
image.tensor = image.url.load()
# model = MyEmbeddingModel()
# image.embedding = model(image.tensor)
# image.second_embedding = model(image.tensor)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import ImageDoc, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    image: ImageDoc
    text: TextDoc


mmdoc = MultiModalDoc(
    image=ImageDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.image.tensor = mmdoc.image.url.load()

# or
mmdoc.image.bytes_ = mmdoc.image.url.load_bytes()
mmdoc.image.tensor = mmdoc.image.bytes_.load()
```
Source code in docarray/documents/image.py
legacy
LegacyDocument
Bases: BaseDoc
This Document is the LegacyDocument. It follows the same schema as in DocArray v1. It can be useful to start migrating a codebase from v1 to v2.
Nevertheless, the API is not totally compatible with the DocArray v1 `Document`. Indeed, none of the methods associated with `Document` are present; only the schema of the data is similar.
```python
import numpy as np

from docarray import DocList
from docarray.documents.legacy import LegacyDocument

doc = LegacyDocument(text='hello')
doc.url = 'http://myimg.png'
doc.tensor = np.zeros((3, 224, 224))
doc.embedding = np.zeros((100, 1))

doc.tags['price'] = 10

# chunks and matches hold nested documents, as in v1
doc.chunks = DocList[LegacyDocument]([LegacyDocument() for _ in range(10)])
doc.matches = DocList[LegacyDocument]([LegacyDocument() for _ in range(10)])
```
Source code in docarray/documents/legacy/legacy_document.py
legacy_document
LegacyDocument
Bases: BaseDoc
This Document is the LegacyDocument. It follows the same schema as in DocArray v1. It can be useful to start migrating a codebase from v1 to v2.
Nevertheless, the API is not totally compatible with the DocArray v1 `Document`. Indeed, none of the methods associated with `Document` are present; only the schema of the data is similar.
```python
import numpy as np

from docarray import DocList
from docarray.documents.legacy import LegacyDocument

doc = LegacyDocument(text='hello')
doc.url = 'http://myimg.png'
doc.tensor = np.zeros((3, 224, 224))
doc.embedding = np.zeros((100, 1))

doc.tags['price'] = 10

# chunks and matches hold nested documents, as in v1
doc.chunks = DocList[LegacyDocument]([LegacyDocument() for _ in range(10)])
doc.matches = DocList[LegacyDocument]([LegacyDocument() for _ in range(10)])
```
Source code in docarray/documents/legacy/legacy_document.py
mesh
Mesh3D
Bases: BaseDoc
Document for handling meshes for 3D data representation.
A mesh is a representation of 3D data consisting of vertices and faces. Vertices are points in 3D space, represented as a tensor of shape (n_points, 3). Faces are triangular surfaces, each defined by three points in 3D space that correspond to the three vertices of a triangle. Faces can be represented as a tensor of shape (n_faces, 3), where each entry is an index into the tensor of vertices.
The Mesh3D Document can contain:

- a `Mesh3DUrl` (`Mesh3D.url`)
- a `VerticesAndFaces` object (`Mesh3D.tensors`) containing the vertices and faces information
- an `AnyEmbedding` (`Mesh3D.embedding`)
- a `bytes` object (`Mesh3D.bytes_`)
You can use this Document directly:
```python
from docarray.documents import Mesh3D

# use it directly
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import Mesh3D


# extend it
class MyMesh3D(Mesh3D):
    name: Optional[str]


mesh = MyMesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.name = 'my first mesh'
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import Mesh3D, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    mesh: Mesh3D
    text: TextDoc


mmdoc = MultiModalDoc(
    mesh=Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.mesh.tensors = mmdoc.mesh.url.load()

# or
mmdoc.mesh.bytes_ = mmdoc.mesh.url.load_bytes()
```
You can display your 3D mesh in a notebook from either its url, or its tensors:
```python
from docarray.documents import Mesh3D

# display from url
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# mesh.url.display()

# display from tensors
mesh.tensors = mesh.url.load()
# mesh.tensors.display()
```
Source code in docarray/documents/mesh/mesh_3d.py
VerticesAndFaces
Bases: BaseDoc
Document for handling the tensor data of a `Mesh3D` object.

A VerticesAndFaces Document can contain:

- an `AnyTensor` containing the vertices information (`VerticesAndFaces.vertices`)
- an `AnyTensor` containing the faces information (`VerticesAndFaces.faces`)
Source code in docarray/documents/mesh/vertices_and_faces.py
display()
Plot mesh consisting of vertices and faces.
Source code in docarray/documents/mesh/vertices_and_faces.py
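You can also build the tensor data by hand. A minimal sketch, assuming numpy arrays are valid `AnyTensor` values and that display is backed by a 3D viewer such as trimesh:

```python
import numpy as np

from docarray.documents.mesh.vertices_and_faces import VerticesAndFaces

# a single triangle: three vertices and one face referencing them by index
tensors = VerticesAndFaces(
    vertices=np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    faces=np.array([[0, 1, 2]]),
)
# tensors.display()  # renders the mesh
```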
mesh_3d
Mesh3D
Bases: BaseDoc
Document for handling meshes for 3D data representation.
A mesh is a representation of 3D data consisting of vertices and faces. Vertices are points in 3D space, represented as a tensor of shape (n_points, 3). Faces are triangular surfaces, each defined by three points in 3D space that correspond to the three vertices of a triangle. Faces can be represented as a tensor of shape (n_faces, 3), where each entry is an index into the tensor of vertices.
The Mesh3D Document can contain:

- a `Mesh3DUrl` (`Mesh3D.url`)
- a `VerticesAndFaces` object (`Mesh3D.tensors`) containing the vertices and faces information
- an `AnyEmbedding` (`Mesh3D.embedding`)
- a `bytes` object (`Mesh3D.bytes_`)
You can use this Document directly:
```python
from docarray.documents import Mesh3D

# use it directly
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import Mesh3D


# extend it
class MyMesh3D(Mesh3D):
    name: Optional[str]


mesh = MyMesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.name = 'my first mesh'
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import Mesh3D, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    mesh: Mesh3D
    text: TextDoc


mmdoc = MultiModalDoc(
    mesh=Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.mesh.tensors = mmdoc.mesh.url.load()

# or
mmdoc.mesh.bytes_ = mmdoc.mesh.url.load_bytes()
```
You can display your 3D mesh in a notebook from either its url, or its tensors:
```python
from docarray.documents import Mesh3D

# display from url
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# mesh.url.display()

# display from tensors
mesh.tensors = mesh.url.load()
# mesh.tensors.display()
```
Source code in docarray/documents/mesh/mesh_3d.py
vertices_and_faces
VerticesAndFaces
Bases: BaseDoc
Document for handling the tensor data of a `Mesh3D` object.

A VerticesAndFaces Document can contain:

- an `AnyTensor` containing the vertices information (`VerticesAndFaces.vertices`)
- an `AnyTensor` containing the faces information (`VerticesAndFaces.faces`)
Source code in docarray/documents/mesh/vertices_and_faces.py
display()
Plot mesh consisting of vertices and faces.
Source code in docarray/documents/mesh/vertices_and_faces.py
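You can also build the tensor data by hand. A minimal sketch, assuming numpy arrays are valid `AnyTensor` values and that display is backed by a 3D viewer such as trimesh:

```python
import numpy as np

from docarray.documents.mesh.vertices_and_faces import VerticesAndFaces

# a single triangle: three vertices and one face referencing them by index
tensors = VerticesAndFaces(
    vertices=np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    faces=np.array([[0, 1, 2]]),
)
# tensors.display()  # renders the mesh
```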
point_cloud
PointCloud3D
Bases: BaseDoc
Document for handling point clouds for 3D data representation.
A point cloud is a representation of a 3D mesh. It is made by repeatedly and uniformly sampling points within the surface of the 3D body. Compared to the mesh representation, the point cloud is a fixed-size ndarray of shape (n_samples, 3) and hence easier for deep learning algorithms to handle.
A PointCloud3D Document can contain:

- a `PointCloud3DUrl` (`PointCloud3D.url`)
- a `PointsAndColors` object (`PointCloud3D.tensors`)
- an `AnyEmbedding` (`PointCloud3D.embedding`)
- a `bytes` object (`PointCloud3D.bytes_`)
You can use this Document directly:
```python
from docarray.documents import PointCloud3D

# use it directly
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import PointCloud3D
from docarray.typing import AnyEmbedding


# extend it
class MyPointCloud3D(PointCloud3D):
    second_embedding: Optional[AnyEmbedding]


pc = MyPointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
# pc.second_embedding = model(pc.tensors.colors)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import PointCloud3D, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    point_cloud: PointCloud3D
    text: TextDoc


mmdoc = MultiModalDoc(
    point_cloud=PointCloud3D(
        url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.point_cloud.tensors = mmdoc.point_cloud.url.load(samples=100)

# or
mmdoc.point_cloud.bytes_ = mmdoc.point_cloud.url.load_bytes()
```
You can display your point cloud from either its url, or its tensors:
```python
from docarray.documents import PointCloud3D

# display from url
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# pc.url.display()

# display from tensors
pc.tensors = pc.url.load(samples=10000)
# pc.tensors.display()
```
Source code in docarray/documents/point_cloud/point_cloud_3d.py
PointsAndColors
Bases: BaseDoc
Document for handling the tensor data of a `PointCloud3D` object.

A PointsAndColors Document can contain:

- an `AnyTensor` containing the points in 3D space (`PointsAndColors.points`)
- an `AnyTensor` containing the points' color information (`PointsAndColors.colors`)
Source code in docarray/documents/point_cloud/points_and_colors.py
display()
Plot point cloud consisting of points in 3D space and optionally colors.
Source code in docarray/documents/point_cloud/points_and_colors.py
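You can also build the tensor data by hand instead of loading it from a URL. A minimal sketch, assuming numpy arrays are valid `AnyTensor` values and that display is backed by a 3D viewer such as trimesh:

```python
import numpy as np

from docarray.documents.point_cloud.points_and_colors import PointsAndColors

# 100 points in 3D space, plus optional per-point RGB values in [0, 1]
tensors = PointsAndColors(
    points=np.random.rand(100, 3),
    colors=np.random.rand(100, 3),
)
# tensors.display()  # opens an interactive 3D plot
```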
point_cloud_3d
PointCloud3D
Bases: BaseDoc
Document for handling point clouds for 3D data representation.
A point cloud is a representation of a 3D mesh. It is made by repeatedly and uniformly sampling points within the surface of the 3D body. Compared to the mesh representation, the point cloud is a fixed-size ndarray of shape (n_samples, 3) and hence easier for deep learning algorithms to handle.
A PointCloud3D Document can contain:

- a `PointCloud3DUrl` (`PointCloud3D.url`)
- a `PointsAndColors` object (`PointCloud3D.tensors`)
- an `AnyEmbedding` (`PointCloud3D.embedding`)
- a `bytes` object (`PointCloud3D.bytes_`)
You can use this Document directly:
```python
from docarray.documents import PointCloud3D

# use it directly
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import PointCloud3D
from docarray.typing import AnyEmbedding


# extend it
class MyPointCloud3D(PointCloud3D):
    second_embedding: Optional[AnyEmbedding]


pc = MyPointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
# pc.second_embedding = model(pc.tensors.colors)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import PointCloud3D, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    point_cloud: PointCloud3D
    text: TextDoc


mmdoc = MultiModalDoc(
    point_cloud=PointCloud3D(
        url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.point_cloud.tensors = mmdoc.point_cloud.url.load(samples=100)

# or
mmdoc.point_cloud.bytes_ = mmdoc.point_cloud.url.load_bytes()
```
You can display your point cloud from either its url, or its tensors:
```python
from docarray.documents import PointCloud3D

# display from url
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# pc.url.display()

# display from tensors
pc.tensors = pc.url.load(samples=10000)
# pc.tensors.display()
```
Source code in docarray/documents/point_cloud/point_cloud_3d.py
points_and_colors
PointsAndColors
Bases: BaseDoc
Document for handling the tensor data of a `PointCloud3D` object.

A PointsAndColors Document can contain:

- an `AnyTensor` containing the points in 3D space (`PointsAndColors.points`)
- an `AnyTensor` containing the points' color information (`PointsAndColors.colors`)
Source code in docarray/documents/point_cloud/points_and_colors.py
display()
Plot point cloud consisting of points in 3D space and optionally colors.
Source code in docarray/documents/point_cloud/points_and_colors.py
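You can also build the tensor data by hand instead of loading it from a URL. A minimal sketch, assuming numpy arrays are valid `AnyTensor` values and that display is backed by a 3D viewer such as trimesh:

```python
import numpy as np

from docarray.documents.point_cloud.points_and_colors import PointsAndColors

# 100 points in 3D space, plus optional per-point RGB values in [0, 1]
tensors = PointsAndColors(
    points=np.random.rand(100, 3),
    colors=np.random.rand(100, 3),
)
# tensors.display()  # opens an interactive 3D plot
```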
text
TextDoc
Bases: BaseDoc
Document for handling text.
It can contain:

- a `TextUrl` (`TextDoc.url`)
- a `str` (`TextDoc.text`)
- an `AnyEmbedding` (`TextDoc.embedding`)
- a `bytes` object (`TextDoc.bytes_`)
You can use this Document directly:
```python
from docarray.documents import TextDoc

# use it directly
txt_doc = TextDoc(url='http://www.jina.ai/')
txt_doc.text = txt_doc.url.load()
# model = MyEmbeddingModel()
# txt_doc.embedding = model(txt_doc.text)
```
You can initialize directly from a string:
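```python
from docarray.documents import TextDoc

txt_doc = TextDoc('hello world')
```

The positional string is assigned to the `text` attribute.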
You can extend this Document:
```python
from typing import Optional

from docarray.documents import TextDoc
from docarray.typing import AnyEmbedding


# extend it
class MyText(TextDoc):
    second_embedding: Optional[AnyEmbedding]


txt_doc = MyText(url='http://www.jina.ai/')
txt_doc.text = txt_doc.url.load()
# model = MyEmbeddingModel()
# txt_doc.embedding = model(txt_doc.text)
# txt_doc.second_embedding = model(txt_doc.text)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import ImageDoc, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    image_doc: ImageDoc
    text_doc: TextDoc


mmdoc = MultiModalDoc(
    image_doc=ImageDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
    ),
    text_doc=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.image_doc.tensor = mmdoc.image_doc.url.load()

# or
mmdoc.image_doc.bytes_ = mmdoc.image_doc.url.load_bytes()
mmdoc.image_doc.tensor = mmdoc.image_doc.bytes_.load()
```
This Document can be compared against another Document of the same type or against a string. When compared against another object of the same type, the pydantic BaseModel equality check applies, which checks the equality of every attribute excluding `id`. When compared against a str, the equality of the `text` attribute is checked against the given string.
```python
from docarray.documents import TextDoc

doc = TextDoc(text='This is the main text', url='exampleurl.com')
doc2 = TextDoc(text='This is the main text', url='exampleurl.com')

doc == 'This is the main text'  # True
doc == doc2  # True
```
Source code in docarray/documents/text.py
__contains__(item)
This method makes `TextDoc` behave the same as a `str`.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`item` | `str` | A string to be checked for being a substring of the `text` attribute | *required* |

Returns:

Type | Description |
---|---|
`bool` | A boolean indicating whether `item` is a substring of the `text` attribute |
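A short usage sketch of this substring check:

```python
from docarray.documents import TextDoc

doc = TextDoc(text='hello world')

assert 'world' in doc  # delegated to doc.text
assert 'goodbye' not in doc
```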
Source code in docarray/documents/text.py
video
VideoDoc
Bases: BaseDoc
Document for handling video.
The Video Document can contain:

- a `VideoUrl` (`VideoDoc.url`)
- an `AudioDoc` (`VideoDoc.audio`)
- a `VideoTensor` (`VideoDoc.tensor`)
- an `AnyTensor` representing the indices of the video's key frames (`VideoDoc.key_frame_indices`)
- an `AnyEmbedding` (`VideoDoc.embedding`)
- a `VideoBytes` object (`VideoDoc.bytes_`)
You can use this Document directly:
```python
from docarray.documents import VideoDoc

# use it directly
vid = VideoDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
vid.tensor, vid.audio.tensor, vid.key_frame_indices = vid.url.load()
# model = MyEmbeddingModel()
# vid.embedding = model(vid.tensor)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import TextDoc, VideoDoc


# extend it
class MyVideo(VideoDoc):
    name: Optional[TextDoc]


video = MyVideo(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
video.name = TextDoc(text='my first video')
video.tensor = video.url.load().video
# model = MyEmbeddingModel()
# video.embedding = model(video.tensor)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import TextDoc, VideoDoc


# compose it
class MultiModalDoc(BaseDoc):
    video: VideoDoc
    text: TextDoc


mmdoc = MultiModalDoc(
    video=VideoDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.video.tensor = mmdoc.video.url.load().video

# or
mmdoc.video.bytes_ = mmdoc.video.url.load_bytes()
mmdoc.video.tensor = mmdoc.video.bytes_.load().video
```
Source code in docarray/documents/video.py