Documents
docarray.documents
AudioDoc
Bases: BaseDoc
Document for handling audio data.
The Audio Document can contain:

- an AudioUrl (`AudioDoc.url`)
- an AudioTensor (`AudioDoc.tensor`)
- an AnyEmbedding (`AudioDoc.embedding`)
- an AudioBytes object (`AudioDoc.bytes_`)
- an integer representing the frame rate (`AudioDoc.frame_rate`)
You can use this Document directly:
```python
from docarray.documents import AudioDoc

# use it directly
audio = AudioDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
)
audio.tensor, audio.frame_rate = audio.url.load()
# model = MyEmbeddingModel()
# audio.embedding = model(audio.tensor)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import AudioDoc, TextDoc


# extend it
class MyAudio(AudioDoc):
    name: Optional[TextDoc] = None


audio = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
)
audio.name = TextDoc(text='my first audio')
audio.tensor, audio.frame_rate = audio.url.load()
# model = MyEmbeddingModel()
# audio.embedding = model(audio.tensor)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import AudioDoc, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    audio: AudioDoc
    text: TextDoc


mmdoc = MultiModalDoc(
    audio=AudioDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.audio.tensor, mmdoc.audio.frame_rate = mmdoc.audio.url.load()

# equivalent to
mmdoc.audio.bytes_ = mmdoc.audio.url.load_bytes()
mmdoc.audio.tensor, mmdoc.audio.frame_rate = mmdoc.audio.bytes_.load()
```
Source code in docarray/documents/audio.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.

Parameters:

Name | Type | Description | Default
---|---|---|---
`data` | `str` | a base64 encoded string | required
`protocol` | `Literal['pickle', 'protobuf']` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'pickle'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`T` | a Document object
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.

Parameters:

Name | Type | Description | Default
---|---|---|---
`data` | `bytes` | binary bytes | required
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`T` | a Document object
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.

Returns:

Type | Description
---|---
`T` | a Document object
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.

Parameters:

Name | Type | Description | Default
---|---|---|---
`pb_msg` | `DocProto` | the proto message of the Document | required

Returns:

Type | Description
---|---
`T` | a Document initialized with the proto data
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the `include` and `exclude` arguments behave as in `dict()`. `encoder` is an optional function to supply as `default` to `json.dumps()`; other arguments are passed through to `json.dumps()`.
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.

Parameters:

Name | Type | Description | Default
---|---|---|---
`b` | `StrBytes` | | required
`content_type` | `str` | | `None`
`encoding` | `str` | the encoding to use when parsing a string, defaults to `'utf8'` | `'utf8'`
`proto` | `Protocol` | protocol to use | `None`
`allow_pickle` | `bool` | allow pickle protocol | `False`

Returns:

Type | Description
---|---
`T` | a document
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the schema of this Document class.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.

Parameters:

Name | Type | Description | Default
---|---|---|---
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`str` | a base64 encoded string
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use `bytes(...)`.

Parameters:

Name | Type | Description | Default
---|---|---|---
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compression algorithm to use | `None`

Returns:

Type | Description
---|---
`bytes` | the binary serialization in bytes
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:

Type | Description
---|---
`DocProto` | the protobuf message
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following steps:

- Setting data properties of the second Document on the first if they are not None
- Concatenating lists and updating sets
- Updating Documents and DocLists recursively
- Updating the Dictionaries of the left with those of the right

It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is determined by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is overwritten with the value of other.
```python
from typing import List, Optional

from docarray import BaseDoc


class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List


doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
```
Parameters:

Name | Type | Description | Default
---|---|---|---
`other` | `T` | the Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py
ImageDoc
Bases: BaseDoc
Document for handling images.
It can contain:

- an ImageUrl (`ImageDoc.url`)
- an ImageTensor (`ImageDoc.tensor`)
- an AnyEmbedding (`ImageDoc.embedding`)
- an ImageBytes object (`ImageDoc.bytes_`)
You can use this Document directly:
```python
from docarray.documents import ImageDoc

# use it directly
image = ImageDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
)
image.tensor = image.url.load()
# model = MyEmbeddingModel()
# image.embedding = model(image.tensor)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import ImageDoc
from docarray.typing import AnyEmbedding


# extend it
class MyImage(ImageDoc):
    second_embedding: Optional[AnyEmbedding] = None


image = MyImage(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
)
image.tensor = image.url.load()
# model = MyEmbeddingModel()
# image.embedding = model(image.tensor)
# image.second_embedding = model(image.tensor)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import ImageDoc, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    image: ImageDoc
    text: TextDoc


mmdoc = MultiModalDoc(
    image=ImageDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.image.tensor = mmdoc.image.url.load()

# or
mmdoc.image.bytes_ = mmdoc.image.url.load_bytes()
mmdoc.image.tensor = mmdoc.image.bytes_.load()
```
Source code in docarray/documents/image.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.

Parameters:

Name | Type | Description | Default
---|---|---|---
`data` | `str` | a base64 encoded string | required
`protocol` | `Literal['pickle', 'protobuf']` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'pickle'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.

Parameters:

Name | Type | Description | Default
---|---|---|---
`data` | `bytes` | binary bytes | required
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.

Parameters:

Name | Type | Description | Default
---|---|---|---
`pb_msg` | `DocProto` | the proto message of the Document | required

Returns:

Type | Description
---|---
`T` | a Document initialized with the proto data
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the `include` and `exclude` arguments behave as in `dict()`. `encoder` is an optional function to supply as `default` to `json.dumps()`; other arguments are passed through to `json.dumps()`.

Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.

Parameters:

Name | Type | Description | Default
---|---|---|---
`b` | `StrBytes` | | required
`content_type` | `str` | | `None`
`encoding` | `str` | the encoding to use when parsing a string, defaults to `'utf8'` | `'utf8'`
`proto` | `Protocol` | protocol to use | `None`
`allow_pickle` | `bool` | allow pickle protocol | `False`

Returns:

Type | Description
---|---
`T` | a document

Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the schema of this Document class.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.

Parameters:

Name | Type | Description | Default
---|---|---|---
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`str` | a base64 encoded string

Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use `bytes(...)`.

Parameters:

Name | Type | Description | Default
---|---|---|---
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compression algorithm to use | `None`

Returns:

Type | Description
---|---
`bytes` | the binary serialization in bytes

Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.

Returns:

Type | Description
---|---
`DocProto` | the protobuf message
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following steps:

- Setting data properties of the second Document on the first if they are not None
- Concatenating lists and updating sets
- Updating Documents and DocLists recursively
- Updating the Dictionaries of the left with those of the right

It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is determined by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is overwritten with the value of other.

```python
from typing import List, Optional

from docarray import BaseDoc


class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List


doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
```

Parameters:

Name | Type | Description | Default
---|---|---|---
`other` | `T` | the Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py
Mesh3D
Bases: BaseDoc
Document for handling meshes for 3D data representation.
A mesh is a representation for 3D data and contains vertices and faces information. Vertices are points in a 3D space, represented as a tensor of shape (n_points, 3). Faces are triangular surfaces that can be defined by three points in 3D space, corresponding to the three vertices of a triangle. Faces can be represented as a tensor of shape (n_faces, 3). Each number in that tensor refers to an index of a vertex in the tensor of vertices.
The Mesh3D Document can contain:

- a Mesh3DUrl (`Mesh3D.url`)
- a VerticesAndFaces object (`Mesh3D.tensors`) containing:
    - an AnyTensor of vertices (`Mesh3D.tensors.vertices`)
    - an AnyTensor of faces (`Mesh3D.tensors.faces`)
- an AnyEmbedding (`Mesh3D.embedding`)
- a `bytes` object (`Mesh3D.bytes_`)
You can use this Document directly:
```python
from docarray.documents import Mesh3D

# use it directly
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import Mesh3D


# extend it
class MyMesh3D(Mesh3D):
    name: Optional[str] = None


mesh = MyMesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.name = 'my first mesh'
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import Mesh3D, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    mesh: Mesh3D
    text: TextDoc


mmdoc = MultiModalDoc(
    mesh=Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.mesh.tensors = mmdoc.mesh.url.load()

# or
mmdoc.mesh.bytes_ = mmdoc.mesh.url.load_bytes()
```
You can display your 3D mesh in a notebook from either its url, or its tensors:
```python
from docarray.documents import Mesh3D

# display from url
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# mesh.url.display()

# display from tensors
mesh.tensors = mesh.url.load()
# mesh.tensors.display()
```
Source code in docarray/documents/mesh/mesh_3d.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.

Parameters:

Name | Type | Description | Default
---|---|---|---
`data` | `str` | a base64 encoded string | required
`protocol` | `Literal['pickle', 'protobuf']` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'pickle'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.

Parameters:

Name | Type | Description | Default
---|---|---|---
`data` | `bytes` | binary bytes | required
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.

Parameters:

Name | Type | Description | Default
---|---|---|---
`pb_msg` | `DocProto` | the proto message of the Document | required

Returns:

Type | Description
---|---
`T` | a Document initialized with the proto data
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the `include` and `exclude` arguments behave as in `dict()`. `encoder` is an optional function to supply as `default` to `json.dumps()`; other arguments are passed through to `json.dumps()`.

Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.

Parameters:

Name | Type | Description | Default
---|---|---|---
`b` | `StrBytes` | | required
`content_type` | `str` | | `None`
`encoding` | `str` | the encoding to use when parsing a string, defaults to `'utf8'` | `'utf8'`
`proto` | `Protocol` | protocol to use | `None`
`allow_pickle` | `bool` | allow pickle protocol | `False`

Returns:

Type | Description
---|---
`T` | a document

Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the schema of this Document class.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.

Parameters:

Name | Type | Description | Default
---|---|---|---
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`str` | a base64 encoded string

Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use `bytes(...)`.

Parameters:

Name | Type | Description | Default
---|---|---|---
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compression algorithm to use | `None`

Returns:

Type | Description
---|---
`bytes` | the binary serialization in bytes

Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.

Returns:

Type | Description
---|---
`DocProto` | the protobuf message
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following steps:

- Setting data properties of the second Document on the first if they are not None
- Concatenating lists and updating sets
- Updating Documents and DocLists recursively
- Updating the Dictionaries of the left with those of the right

It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is determined by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is overwritten with the value of other.

```python
from typing import List, Optional

from docarray import BaseDoc


class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List


doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
```

Parameters:

Name | Type | Description | Default
---|---|---|---
`other` | `T` | the Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py
PointCloud3D
Bases: BaseDoc
Document for handling point clouds for 3D data representation.
A point cloud is a representation of a 3D mesh. It is made by repeatedly and uniformly sampling points within the surface of the 3D body. Compared to the mesh representation, the point cloud is a fixed-size ndarray of shape (n_samples, 3) and hence easier for deep learning algorithms to handle.

A PointCloud3D Document can contain:

- a PointCloud3DUrl (`PointCloud3D.url`)
- a PointsAndColors object (`PointCloud3D.tensors`)
- an AnyEmbedding (`PointCloud3D.embedding`)
- a `bytes` object (`PointCloud3D.bytes_`)
You can use this Document directly:
```python
from docarray.documents import PointCloud3D

# use it directly
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
```
You can extend this Document:
```python
from typing import Optional

from docarray.documents import PointCloud3D
from docarray.typing import AnyEmbedding


# extend it
class MyPointCloud3D(PointCloud3D):
    second_embedding: Optional[AnyEmbedding] = None


pc = MyPointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
# pc.second_embedding = model(pc.tensors.colors)
```
You can use this Document for composition:
```python
from docarray import BaseDoc
from docarray.documents import PointCloud3D, TextDoc


# compose it
class MultiModalDoc(BaseDoc):
    point_cloud: PointCloud3D
    text: TextDoc


mmdoc = MultiModalDoc(
    point_cloud=PointCloud3D(
        url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.point_cloud.tensors = mmdoc.point_cloud.url.load(samples=100)

# or
mmdoc.point_cloud.bytes_ = mmdoc.point_cloud.url.load_bytes()
```
You can display your point cloud from either its url, or its tensors:
```python
from docarray.documents import PointCloud3D

# display from url
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# pc.url.display()

# display from tensors
pc.tensors = pc.url.load(samples=10000)
# pc.tensors.display()
```
Source code in docarray/documents/point_cloud/point_cloud_3d.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.

Parameters:

Name | Type | Description | Default
---|---|---|---
`data` | `str` | a base64 encoded string | required
`protocol` | `Literal['pickle', 'protobuf']` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'pickle'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.

Parameters:

Name | Type | Description | Default
---|---|---|---
`data` | `bytes` | binary bytes | required
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.

Returns:

Type | Description
---|---
`T` | a Document object

Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.

Parameters:

Name | Type | Description | Default
---|---|---|---
`pb_msg` | `DocProto` | the proto message of the Document | required

Returns:

Type | Description
---|---
`T` | a Document initialized with the proto data
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the `include` and `exclude` arguments behave as in `dict()`. `encoder` is an optional function to supply as `default` to `json.dumps()`; other arguments are passed through to `json.dumps()`.

Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.

Parameters:

Name | Type | Description | Default
---|---|---|---
`b` | `StrBytes` | | required
`content_type` | `str` | | `None`
`encoding` | `str` | the encoding to use when parsing a string, defaults to `'utf8'` | `'utf8'`
`proto` | `Protocol` | protocol to use | `None`
`allow_pickle` | `bool` | allow pickle protocol | `False`

Returns:

Type | Description
---|---
`T` | a document

Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the schema of this Document class.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.

Parameters:

Name | Type | Description | Default
---|---|---|---
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compress method to use | `None`

Returns:

Type | Description
---|---
`str` | a base64 encoded string

Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use `bytes(...)`.

Parameters:

Name | Type | Description | Default
---|---|---|---
`protocol` | `ProtocolType` | protocol to use. It can be `'pickle'` or `'protobuf'` | `'protobuf'`
`compress` | `Optional[str]` | compression algorithm to use | `None`

Returns:

Type | Description
---|---
`bytes` | the binary serialization in bytes

Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.

Returns:

Type | Description
---|---
`DocProto` | the protobuf message
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Updating Documents and DocLists recursively
- Updating Dictionaries of the left with the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is replaced with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List
doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
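The per-field merge rules above can be sketched with plain Python values. This is an illustrative stand-in for the behavior described, not docarray's actual implementation:

```python
def merge_field(left, right):
    """Merge one field's values following the update() rules sketched above."""
    if right is None:               # absent fields (None) never overwrite
        return left
    if isinstance(left, list):      # lists (and DocLists) are concatenated
        return left + right
    if isinstance(left, set):       # sets are unioned
        return left | right
    if isinstance(left, dict):      # dicts are updated left <- right
        return {**left, **right}
    return right                    # everything else, including tuples, is replaced


assert merge_field('Title', None) == 'Title'
assert merge_field(['python', 'AI'], ['docarray']) == ['python', 'AI', 'docarray']
assert merge_field({'a': 1}, {'b': 2}) == {'a': 1, 'b': 2}
assert merge_field((1, 2), (3, 4)) == (3, 4)  # tuples are replaced, not merged
```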
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this | required |
Source code in docarray/base_doc/mixins/update.py
PointsAndColors
Bases: BaseDoc
Document for handling the tensor data of a PointCloud3D object.
A PointsAndColors Document can contain:
- an AnyTensor containing the points in 3D space information (PointsAndColors.points)
- an AnyTensor containing the points' color information (PointsAndColors.colors)
Source code in docarray/documents/point_cloud/points_and_colors.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
display()
Plot point cloud consisting of points in 3D space and optionally colors.
Source code in docarray/documents/point_cloud/points_and_colors.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; include and exclude arguments are as per dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are as per json.dumps().
Source code in docarray/base_doc/doc.py
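The encoder hook works like the standard default= fallback of the stdlib's json.dumps(): it is called for any value the serializer cannot handle natively. A minimal plain-Python illustration:

```python
import datetime
import json


def encoder(obj):
    """Fallback for values json.dumps cannot serialize natively."""
    if isinstance(obj, datetime.date):
        return obj.isoformat()
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')


payload = {'text': 'hello', 'created': datetime.date(2023, 1, 2)}
out = json.dumps(payload, default=encoder)
assert out == '{"text": "hello", "created": "2023-01-02"}'
```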
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Updating Documents and DocLists recursively
- Updating Dictionaries of the left with the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is replaced with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List
doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this | required |
Source code in docarray/base_doc/mixins/update.py
TextDoc
Bases: BaseDoc
Document for handling text.
It can contain:
- a TextUrl (TextDoc.url)
- a str (TextDoc.text)
- an AnyEmbedding (TextDoc.embedding)
- a bytes object (TextDoc.bytes_)
You can use this Document directly:
from docarray.documents import TextDoc
# use it directly
txt_doc = TextDoc(url='https://www.gutenberg.org/files/1065/1065-0.txt')
txt_doc.text = txt_doc.url.load()
# model = MyEmbeddingModel()
# txt_doc.embedding = model(txt_doc.text)
You can initialize directly from a string:
from docarray.documents import TextDoc
txt_doc = TextDoc('hello world')
You can extend this Document:
from docarray.documents import TextDoc
from docarray.typing import AnyEmbedding
from typing import Optional
# extend it
class MyText(TextDoc):
    second_embedding: Optional[AnyEmbedding] = None
txt_doc = MyText(url='https://www.gutenberg.org/files/1065/1065-0.txt')
txt_doc.text = txt_doc.url.load()
# model = MyEmbeddingModel()
# txt_doc.embedding = model(txt_doc.text)
# txt_doc.second_embedding = model(txt_doc.text)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import ImageDoc, TextDoc
# compose it
class MultiModalDoc(BaseDoc):
    image_doc: ImageDoc
    text_doc: TextDoc
mmdoc = MultiModalDoc(
    image_doc=ImageDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
    ),
    text_doc=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.image_doc.tensor = mmdoc.image_doc.url.load()
# or
mmdoc.image_doc.bytes_ = mmdoc.image_doc.url.load_bytes()
mmdoc.image_doc.tensor = mmdoc.image_doc.bytes_.load()
This Document can be compared against another Document of the same type or a string.
When compared against another object of the same type, the pydantic BaseModel equality check applies, which checks the equality of every attribute, excluding id. When compared against a str, it checks the equality of the text attribute against the given string.
from docarray.documents import TextDoc
doc = TextDoc(text='This is the main text', url='exampleurl.com/file')
doc2 = TextDoc(text='This is the main text', url='exampleurl.com/file')
doc == 'This is the main text' # True
doc == doc2 # True
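These comparison rules can be sketched with a plain class. This is a simplified stand-in for the pydantic machinery, not TextDoc's actual implementation:

```python
import uuid


class Text:
    """Minimal stand-in for TextDoc's comparison behaviour (not the real class)."""

    def __init__(self, text):
        self.text = text
        self.id = uuid.uuid4().hex   # unique per instance, excluded from equality

    def __eq__(self, other):
        if isinstance(other, str):   # compare against the text attribute
            return self.text == other
        if isinstance(other, Text):  # attribute-wise equality, excluding id
            return self.text == other.text
        return NotImplemented


a = Text('This is the main text')
b = Text('This is the main text')
assert a == 'This is the main text'
assert a == b  # ids differ, but id is excluded from the comparison
```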
Source code in docarray/documents/text.py
__contains__(item)
This method makes TextDoc behave the same as a str.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
item | str | A string to be checked as a possible substring of the text attribute | required |
Returns:
Type | Description |
---|---|
bool | A boolean indicating the presence of the substring |
Source code in docarray/documents/text.py
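The delegation pattern behind __contains__ can be sketched in plain Python (an illustrative stand-in, not TextDoc's actual code):

```python
class Text:
    """Sketch of how __contains__ can delegate to the text attribute."""

    def __init__(self, text):
        self.text = text

    def __contains__(self, item):
        # `item in doc` becomes `item in doc.text`
        return item in self.text


doc = Text('This is the main text')
assert 'main' in doc
assert 'absent' not in doc
```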
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; include and exclude arguments are as per dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are as per json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Updating Documents and DocLists recursively
- Updating Dictionaries of the left with the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is replaced with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List
doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this | required |
Source code in docarray/base_doc/mixins/update.py
VerticesAndFaces
Bases: BaseDoc
Document for handling the tensor data of a Mesh3D object.
A VerticesAndFaces Document can contain:
- an AnyTensor containing the vertices information (VerticesAndFaces.vertices)
- an AnyTensor containing the faces information (VerticesAndFaces.faces)
Source code in docarray/documents/mesh/vertices_and_faces.py
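A minimal sketch of what such vertices and faces tensors look like, using plain Python lists in place of AnyTensor (illustrative data, not from the library):

```python
# A tetrahedron: 4 vertices in 3D space, 4 triangular faces.
vertices = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Each face is a triple of indices into the vertices list.
faces = [
    [0, 1, 2],
    [0, 1, 3],
    [0, 2, 3],
    [1, 2, 3],
]

# Basic consistency checks: vertices are 3D, every face index references a vertex.
assert all(len(v) == 3 for v in vertices)
assert all(0 <= i < len(vertices) for face in faces for i in face)
```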
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
display()
Plot mesh consisting of vertices and faces.
Source code in docarray/documents/mesh/vertices_and_faces.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; include and exclude arguments are as per dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are as per json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Updating Documents and DocLists recursively
- Updating Dictionaries of the left with the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is replaced with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List
doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this | required |
Source code in docarray/base_doc/mixins/update.py
VideoDoc
Bases: BaseDoc
Document for handling video.
The Video Document can contain:
- a VideoUrl (VideoDoc.url)
- an AudioDoc (VideoDoc.audio)
- a VideoTensor (VideoDoc.tensor)
- an AnyTensor representing the indices of the video's key frames (VideoDoc.key_frame_indices)
- an AnyEmbedding (VideoDoc.embedding)
- a VideoBytes object (VideoDoc.bytes_)
You can use this Document directly:
from docarray.documents import VideoDoc, AudioDoc
# use it directly
vid = VideoDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
tensor, audio_tensor, key_frame_indices = vid.url.load()
vid.tensor = tensor
vid.audio = AudioDoc(tensor=audio_tensor)
vid.key_frame_indices = key_frame_indices
# model = MyEmbeddingModel()
# vid.embedding = model(vid.tensor)
You can extend this Document:
from typing import Optional
from docarray.documents import TextDoc, VideoDoc
# extend it
class MyVideo(VideoDoc):
    name: Optional[TextDoc] = None
video = MyVideo(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
video.name = TextDoc(text='my first video')
video.tensor = video.url.load().video
# model = MyEmbeddingModel()
# video.embedding = model(video.tensor)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import TextDoc, VideoDoc
# compose it
class MultiModalDoc(BaseDoc):
    video: VideoDoc
    text: TextDoc
mmdoc = MultiModalDoc(
    video=VideoDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.video.tensor = mmdoc.video.url.load().video
# or
mmdoc.video.bytes_ = mmdoc.video.url.load_bytes()
mmdoc.video.tensor = mmdoc.video.bytes_.load().video
Source code in docarray/documents/video.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; include and exclude arguments are as per dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are as per json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Updating Documents and DocLists recursively
- Updating Dictionaries of the left with the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is replaced with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List
doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this | required |
Source code in docarray/base_doc/mixins/update.py
audio
AudioDoc
Bases: BaseDoc
Document for handling audios.
The Audio Document can contain:
- an AudioUrl (AudioDoc.url)
- an AudioTensor (AudioDoc.tensor)
- an AnyEmbedding (AudioDoc.embedding)
- an AudioBytes object (AudioDoc.bytes_)
- an integer representing the frame_rate (AudioDoc.frame_rate)
You can use this Document directly:
from docarray.documents import AudioDoc
# use it directly
audio = AudioDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
)
audio.tensor, audio.frame_rate = audio.url.load()
# model = MyEmbeddingModel()
# audio.embedding = model(audio.tensor)
You can extend this Document:
from docarray.documents import AudioDoc, TextDoc
from typing import Optional
# extend it
class MyAudio(AudioDoc):
    name: Optional[TextDoc] = None

audio = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
)
audio.name = TextDoc(text='my first audio')
audio.tensor, audio.frame_rate = audio.url.load()
# model = MyEmbeddingModel()
# audio.embedding = model(audio.tensor)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import AudioDoc, TextDoc
# compose it
class MultiModalDoc(BaseDoc):
    audio: AudioDoc
    text: TextDoc

mmdoc = MultiModalDoc(
    audio=AudioDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.wav?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.audio.tensor, mmdoc.audio.frame_rate = mmdoc.audio.url.load()
# equivalent to
mmdoc.audio.bytes_ = mmdoc.audio.url.load_bytes()
mmdoc.audio.tensor, mmdoc.audio.frame_rate = mmdoc.audio.bytes_.load()
Source code in docarray/documents/audio.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.

Parameters:

Name | Type | Description | Default
---|---|---|---
data | str | a base64 encoded string | required
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
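The round trip can be sketched with stdlib primitives alone; this is a simplified model of what to_base64/from_base64 do when protocol='pickle' (a plain dict stands in for the Document's fields):

```python
import base64
import pickle

# Stand-in for a Document's field data; to_base64/from_base64 do roughly
# this internally (simplified sketch) when protocol='pickle'.
fields = {'content': 'hello', 'tags_': ['python', 'AI']}

encoded = base64.b64encode(pickle.dumps(fields)).decode()  # like to_base64()
decoded = pickle.loads(base64.b64decode(encoded))          # like from_base64()

assert decoded == fields
```

Note that the 'pickle' protocol should only be used with trusted data, since unpickling can execute arbitrary code.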
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.

Parameters:

Name | Type | Description | Default
---|---|---|---
data | bytes | binary bytes | required
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.

Returns:

Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
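The JSON round trip can be sketched with the stdlib json module on a dict stand-in for the Document's fields (simplified; the real json()/from_json also handle typed fields such as tensors):

```python
import json

# A dict stand-in for a Document's fields.
fields = {'text': 'hello', 'tags': {'price': 10}}

dumped = json.dumps(fields)    # analogous to doc.json()
restored = json.loads(dumped)  # analogous to MyDoc.from_json(dumped)

assert restored == fields
```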
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.

Parameters:

Name | Type | Description | Default
---|---|---|---
pb_msg | DocProto | the proto message of the Document | required

Returns:

Type | Description
---|---
T | a Document initialized with the proto data
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude arguments behave as in dict(). encoder is an optional function passed as default to json.dumps(); other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc
Parameters:

Name | Type | Description | Default
---|---|---|---
b | StrBytes | | required
content_type | str | | None
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8'
proto | Protocol | protocol to use | None
allow_pickle | bool | allow pickle protocol | False

Returns:

Type | Description
---|---
T | a document
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.

Parameters:

Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
str | a base64 encoded string
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes. For more Pythonic code, please use bytes(...).

Parameters:

Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compression algorithm to use | None

Returns:

Type | Description
---|---
bytes | the binary serialization in bytes
Source code in docarray/base_doc/mixins/io.py
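A simplified sketch of the bytes round trip with a compression step, using gzip as an example algorithm (which compress methods are accepted depends on the docarray version; a plain dict stands in for the Document):

```python
import gzip
import pickle

# Stand-in for a Document's field data.
fields = {'content': 'hi'}

raw = gzip.compress(pickle.dumps(fields))      # like to_bytes(compress=...)
restored = pickle.loads(gzip.decompress(raw))  # like from_bytes(compress=...)

assert restored == fields
assert isinstance(raw, bytes)
```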
to_protobuf()
Convert Document into a Protobuf message.
Returns:

Type | Description
---|---
DocProto | the protobuf message
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self in place with the content of other. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first if they are not None
- Concatenating lists and updating sets
- Recursively updating nested Documents and DocLists
- Updating dictionaries of the left with those of the right
It behaves like a dictionary update, except that, because it is applied to a static schema type, a field counts as present when its value is not None, and DocLists, lists and sets are concatenated. Tuples are not merged, since they are meant to be immutable: they behave as regular values, and the value of self is replaced by the value of other.

from typing import List, Optional
from docarray import BaseDoc

class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List

doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']

Parameters:

Name | Type | Description | Default
---|---|---|---
other | T | The Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py
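The field-merge rules above can be sketched in plain Python (a simplified model of the semantics, not docarray's actual implementation):

```python
# Simplified sketch of the per-field merge rules of BaseDoc.update.
def merge_field(old, new):
    if new is None:                 # absent field on the right: keep the left value
        return old
    if isinstance(old, list) and isinstance(new, list):
        return old + new            # lists (and DocLists) are concatenated
    if isinstance(old, set) and isinstance(new, set):
        return old | new            # sets are updated
    if isinstance(old, dict) and isinstance(new, dict):
        return {**old, **new}       # dicts: left updated with right
    return new                      # plain values (incl. tuples) are replaced

assert merge_field(['python', 'AI'], ['docarray']) == ['python', 'AI', 'docarray']
assert merge_field('Title', None) == 'Title'
assert merge_field((1, 2), (3,)) == (3,)
```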
helper
create_doc(__model_name, *, __config__=None, __base__=BaseDoc, __module__=__name__, __validators__=None, __cls_kwargs__=None, __slots__=None, **field_definitions)
Dynamically create a subclass of BaseDoc. This is a wrapper around pydantic's create_model.
Note: to pickle a dynamically created BaseDoc subclass:
- the class must be defined globally
- it must provide __module__

from docarray import BaseDoc
from docarray.documents import Audio
from docarray.documents.helper import create_doc
from docarray.typing.tensor.audio import AudioNdArray

MyAudio = create_doc(
    'MyAudio',
    __base__=Audio,
    title=(str, ...),
    tensor=(AudioNdArray, ...),
)

assert safe_issubclass(MyAudio, BaseDoc)
assert safe_issubclass(MyAudio, Audio)
Parameters:
Name | Type | Description | Default
---|---|---|---
__model_name | str | name of the created model | required
__config__ | Optional[Type[BaseConfig]] | config class to use for the new model | None
__base__ | Type[T_doc] | base class for the new model to inherit from, must be BaseDoc or its subclass | BaseDoc
__module__ | str | module of the created model | __name__
__validators__ | Dict[str, AnyClassMethod] | a dict of method names and @validator class methods | None
__cls_kwargs__ | Dict[str, Any] | a dict for class creation | None
__slots__ | Optional[Tuple[str, ...]] | Deprecated, | None
field_definitions | Any | fields of the model (or extra fields if a base is supplied) in the format | {}

Returns:

Type | Description
---|---
Type[T_doc] | the new Document class
Source code in docarray/documents/helper.py
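Under the hood this is dynamic class creation; the core idea can be sketched with plain Python's type() (no docarray needed, class names here are illustrative):

```python
# Dynamic class creation with type(), the same mechanism pydantic's
# create_model (and hence create_doc) builds on. Names are illustrative.
Base = type('Base', (), {})
MyDoc = type('MyDoc', (Base,), {'__annotations__': {'title': str}})

assert issubclass(MyDoc, Base)
assert MyDoc.__annotations__['title'] is str
```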
create_doc_from_dict(model_name, data_dict)
Create a subclass of BaseDoc based on example data given as a dictionary. If the example contains None as a value, the corresponding field is typed as Any.

import numpy as np
from docarray import BaseDoc
from docarray.documents import ImageDoc
from docarray.documents.helper import create_doc_from_dict

data_dict = {'image': ImageDoc(tensor=np.random.rand(3, 224, 224)), 'author': 'me'}
MyDoc = create_doc_from_dict(model_name='MyDoc', data_dict=data_dict)

assert safe_issubclass(MyDoc, BaseDoc)
Parameters:

Name | Type | Description | Default
---|---|---|---
model_name | str | Name of the new Document class | required
data_dict | Dict[str, Any] | Dictionary of field names to their corresponding values. | required

Returns:

Type | Description
---|---
Type[T_doc] | the new Document class
Source code in docarray/documents/helper.py
create_doc_from_typeddict(typeddict_cls, **kwargs)
Create a subclass of BaseDoc based on the fields of a TypedDict. This is a wrapper around pydantic's create_model_from_typeddict.

from typing_extensions import TypedDict
from docarray import BaseDoc
from docarray.documents import Audio
from docarray.documents.helper import create_doc_from_typeddict
from docarray.typing.tensor.audio import AudioNdArray

class MyAudio(TypedDict):
    title: str
    tensor: AudioNdArray

Doc = create_doc_from_typeddict(MyAudio, __base__=Audio)

assert safe_issubclass(Doc, BaseDoc)
assert safe_issubclass(Doc, Audio)
Parameters:
Name | Type | Description | Default
---|---|---|---
typeddict_cls | Type[TypedDict] | TypedDict class to use for the new Document class | required
kwargs | Any | extra arguments to pass to | {}

Returns:

Type | Description
---|---
the new Document class
Source code in docarray/documents/helper.py
image
ImageDoc
Bases: BaseDoc
Document for handling images.
It can contain:
- an ImageUrl (Image.url)
- an ImageTensor (Image.tensor)
- an AnyEmbedding (Image.embedding)
- an ImageBytes object (ImageDoc.bytes_)
You can use this Document directly:
from docarray.documents import ImageDoc
# use it directly
image = ImageDoc(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
)
image.tensor = image.url.load()
# model = MyEmbeddingModel()
# image.embedding = model(image.tensor)
You can extend this Document:
from docarray.documents import ImageDoc
from docarray.typing import AnyEmbedding
from typing import Optional
# extend it
class MyImage(ImageDoc):
    second_embedding: Optional[AnyEmbedding] = None

image = MyImage(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
)
image.tensor = image.url.load()
# model = MyEmbeddingModel()
# image.embedding = model(image.tensor)
# image.second_embedding = model(image.tensor)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import ImageDoc, TextDoc
# compose it
class MultiModalDoc(BaseDoc):
    image: ImageDoc
    text: TextDoc

mmdoc = MultiModalDoc(
    image=ImageDoc(
        url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
    ),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.image.tensor = mmdoc.image.url.load()
# or
mmdoc.image.bytes_ = mmdoc.image.url.load_bytes()
mmdoc.image.tensor = mmdoc.image.bytes_.load()
Source code in docarray/documents/image.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.

Parameters:

Name | Type | Description | Default
---|---|---|---
data | str | a base64 encoded string | required
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.

Parameters:

Name | Type | Description | Default
---|---|---|---
data | bytes | binary bytes | required
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.

Parameters:

Name | Type | Description | Default
---|---|---|---
pb_msg | DocProto | the proto message of the Document | required

Returns:

Type | Description
---|---
T | a Document initialized with the proto data

Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude arguments behave as in dict(). encoder is an optional function passed as default to json.dumps(); other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.

Parameters:

Name | Type | Description | Default
---|---|---|---
b | StrBytes | | required
content_type | str | | None
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8'
proto | Protocol | protocol to use | None
allow_pickle | bool | allow pickle protocol | False

Returns:

Type | Description
---|---
T | a document
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.

Parameters:

Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
str | a base64 encoded string

Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes. For more Pythonic code, please use bytes(...).

Parameters:

Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compression algorithm to use | None

Returns:

Type | Description
---|---
bytes | the binary serialization in bytes

Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.

Returns:

Type | Description
---|---
DocProto | the protobuf message

Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self in place with the content of other. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first if they are not None
- Concatenating lists and updating sets
- Recursively updating nested Documents and DocLists
- Updating dictionaries of the left with those of the right
It behaves like a dictionary update, except that, because it is applied to a static schema type, a field counts as present when its value is not None, and DocLists, lists and sets are concatenated. Tuples are not merged, since they are meant to be immutable: they behave as regular values, and the value of self is replaced by the value of other.

from typing import List, Optional
from docarray import BaseDoc

class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List

doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']

Parameters:

Name | Type | Description | Default
---|---|---|---
other | T | The Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py
legacy
LegacyDocument
Bases: BaseDoc
This Document is the LegacyDocument. It follows the same schema as in DocArray <=0.21. It can be useful to start migrating a codebase from v1 to v2.
Nevertheless, the API is not fully compatible with the DocArray <=0.21 Document: none of the methods associated with Document are present. Only the schema of the data is similar.
from docarray import DocList
from docarray.documents.legacy import LegacyDocument
import numpy as np
doc = LegacyDocument(text='hello')
doc.url = 'http://myimg.png'
doc.tensor = np.zeros((3, 224, 224))
doc.embedding = np.zeros((100, 1))
doc.tags['price'] = 10
doc.chunks = DocList[LegacyDocument]([LegacyDocument() for _ in range(10)])
Source code in docarray/documents/legacy/legacy_document.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.

Parameters:

Name | Type | Description | Default
---|---|---|---
data | str | a base64 encoded string | required
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.

Parameters:

Name | Type | Description | Default
---|---|---|---
data | bytes | binary bytes | required
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.

Parameters:

Name | Type | Description | Default
---|---|---|---
pb_msg | DocProto | the proto message of the Document | required

Returns:

Type | Description
---|---
T | a Document initialized with the proto data

Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude arguments behave as in dict(). encoder is an optional function passed as default to json.dumps(); other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.

Parameters:

Name | Type | Description | Default
---|---|---|---
b | StrBytes | | required
content_type | str | | None
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8'
proto | Protocol | protocol to use | None
allow_pickle | bool | allow pickle protocol | False

Returns:

Type | Description
---|---
T | a document
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.

Parameters:

Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
str | a base64 encoded string

Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes. For more Pythonic code, please use bytes(...).

Parameters:

Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compression algorithm to use | None

Returns:

Type | Description
---|---
bytes | the binary serialization in bytes

Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.

Returns:

Type | Description
---|---
DocProto | the protobuf message

Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self in place with the content of other. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first if they are not None
- Concatenating lists and updating sets
- Recursively updating nested Documents and DocLists
- Updating dictionaries of the left with those of the right
It behaves like a dictionary update, except that, because it is applied to a static schema type, a field counts as present when its value is not None, and DocLists, lists and sets are concatenated. Tuples are not merged, since they are meant to be immutable: they behave as regular values, and the value of self is replaced by the value of other.

from typing import List, Optional
from docarray import BaseDoc

class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List

doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']

Parameters:

Name | Type | Description | Default
---|---|---|---
other | T | The Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py
legacy_document
LegacyDocument
Bases: BaseDoc
This Document is the LegacyDocument. It follows the same schema as in DocArray <=0.21. It can be useful to start migrating a codebase from v1 to v2.
Nevertheless, the API is not fully compatible with the DocArray <=0.21 Document: none of the methods associated with Document are present. Only the schema of the data is similar.
from docarray import DocList
from docarray.documents.legacy import LegacyDocument
import numpy as np
doc = LegacyDocument(text='hello')
doc.url = 'http://myimg.png'
doc.tensor = np.zeros((3, 224, 224))
doc.embedding = np.zeros((100, 1))
doc.tags['price'] = 10
doc.chunks = DocList[LegacyDocument]([LegacyDocument() for _ in range(10)])
Source code in docarray/documents/legacy/legacy_document.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.

Parameters:

Name | Type | Description | Default
---|---|---|---
data | str | a base64 encoded string | required
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.

Parameters:

Name | Type | Description | Default
---|---|---|---
data | bytes | binary bytes | required
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.

Returns:

Type | Description
---|---
T | a Document object

Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.

Parameters:

Name | Type | Description | Default
---|---|---|---
pb_msg | DocProto | the proto message of the Document | required

Returns:

Type | Description
---|---
T | a Document initialized with the proto data

Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude arguments behave as in dict(). encoder is an optional function passed as default to json.dumps(); other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.

Parameters:

Name | Type | Description | Default
---|---|---|---
b | StrBytes | | required
content_type | str | | None
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8'
proto | Protocol | protocol to use | None
allow_pickle | bool | allow pickle protocol | False

Returns:

Type | Description
---|---
T | a document
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.

Parameters:

Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None

Returns:

Type | Description
---|---
str | a base64 encoded string

Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes. For more Pythonic code, please use bytes(...).

Parameters:

Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compression algorithm to use | None

Returns:

Type | Description
---|---
bytes | the binary serialization in bytes

Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.

Returns:

Type | Description
---|---
DocProto | the protobuf message

Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self in place with the content of other. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first if they are not None
- Concatenating lists and updating sets
- Recursively updating nested Documents and DocLists
- Updating dictionaries of the left with those of the right
It behaves like a dictionary update, except that, because it is applied to a static schema type, a field counts as present when its value is not None, and DocLists, lists and sets are concatenated. Tuples are not merged, since they are meant to be immutable: they behave as regular values, and the value of self is replaced by the value of other.

from typing import List, Optional
from docarray import BaseDoc

class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List

doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']

Parameters:

Name | Type | Description | Default
---|---|---|---
other | T | The Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py
mesh
Mesh3D
Bases: BaseDoc
Document for handling meshes for 3D data representation.
A mesh is a representation for 3D data and contains vertices and faces information. Vertices are points in a 3D space, represented as a tensor of shape (n_points, 3). Faces are triangular surfaces that can be defined by three points in 3D space, corresponding to the three vertices of a triangle. Faces can be represented as a tensor of shape (n_faces, 3). Each number in that tensor refers to an index of a vertex in the tensor of vertices.
The Mesh3D Document can contain:
- a Mesh3DUrl (Mesh3D.url)
- a VerticesAndFaces object (Mesh3D.tensors) containing the vertices and faces information
- an AnyEmbedding (Mesh3D.embedding)
- a bytes object (Mesh3D.bytes_)
You can use this Document directly:
from docarray.documents import Mesh3D
# use it directly
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
You can extend this Document:
from docarray.documents import Mesh3D
from docarray.typing import AnyEmbedding
from typing import Optional
# extend it
class MyMesh3D(Mesh3D):
    name: Optional[str] = None


mesh = MyMesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.name = 'my first mesh'
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import Mesh3D, TextDoc
# compose it
class MultiModalDoc(BaseDoc):
    mesh: Mesh3D
    text: TextDoc


mmdoc = MultiModalDoc(
    mesh=Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.mesh.tensors = mmdoc.mesh.url.load()
# or
mmdoc.mesh.bytes_ = mmdoc.mesh.url.load_bytes()
You can display your 3D mesh in a notebook from either its url, or its tensors:
from docarray.documents import Mesh3D
# display from url
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# mesh.url.display()
# display from tensors
mesh.tensors = mesh.url.load()
# mesh.tensors.display()
Source code in docarray/documents/mesh/mesh_3d.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude arguments behave as per dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the Document schema.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating dictionaries of the left with the right
It behaves like an update operation on dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists, and sets are concatenated. Note that tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is updated with the value of other.
from typing import List, Optional

from docarray import BaseDoc


class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List


doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this one | required |
Source code in docarray/base_doc/mixins/update.py
VerticesAndFaces
Bases: BaseDoc
Document for handling the tensor data of a Mesh3D
object.
A VerticesAndFaces Document can contain:
- an AnyTensor containing the vertices information (VerticesAndFaces.vertices)
- an AnyTensor containing the faces information (VerticesAndFaces.faces)
Source code in docarray/documents/mesh/vertices_and_faces.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
display()
Plot mesh consisting of vertices and faces.
Source code in docarray/documents/mesh/vertices_and_faces.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude arguments behave as per dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the Document schema.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating dictionaries of the left with the right
It behaves like an update operation on dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists, and sets are concatenated. Note that tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is updated with the value of other.
from typing import List, Optional

from docarray import BaseDoc


class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List


doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this one | required |
Source code in docarray/base_doc/mixins/update.py
mesh_3d
Mesh3D
Bases: BaseDoc
Document for handling meshes for 3D data representation.
A mesh is a representation for 3D data and contains vertices and faces information. Vertices are points in a 3D space, represented as a tensor of shape (n_points, 3). Faces are triangular surfaces that can be defined by three points in 3D space, corresponding to the three vertices of a triangle. Faces can be represented as a tensor of shape (n_faces, 3). Each number in that tensor refers to an index of a vertex in the tensor of vertices.
The Mesh3D Document can contain:
- a Mesh3DUrl (Mesh3D.url)
- a VerticesAndFaces object (Mesh3D.tensors) containing the vertices and faces information
- an AnyEmbedding (Mesh3D.embedding)
- a bytes object (Mesh3D.bytes_)
You can use this Document directly:
from docarray.documents import Mesh3D
# use it directly
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
You can extend this Document:
from docarray.documents import Mesh3D
from docarray.typing import AnyEmbedding
from typing import Optional
# extend it
class MyMesh3D(Mesh3D):
    name: Optional[str] = None


mesh = MyMesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
mesh.name = 'my first mesh'
mesh.tensors = mesh.url.load()
# model = MyEmbeddingModel()
# mesh.embedding = model(mesh.tensors.vertices)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import Mesh3D, TextDoc
# compose it
class MultiModalDoc(BaseDoc):
    mesh: Mesh3D
    text: TextDoc


mmdoc = MultiModalDoc(
    mesh=Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'),
    text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.mesh.tensors = mmdoc.mesh.url.load()
# or
mmdoc.mesh.bytes_ = mmdoc.mesh.url.load_bytes()
You can display your 3D mesh in a notebook from either its url, or its tensors:
from docarray.documents import Mesh3D
# display from url
mesh = Mesh3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# mesh.url.display()
# display from tensors
mesh.tensors = mesh.url.load()
# mesh.tensors.display()
Source code in docarray/documents/mesh/mesh_3d.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude arguments behave as per dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the Document schema.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating dictionaries of the left with the right
It behaves like an update operation on dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists, and sets are concatenated. Note that tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is updated with the value of other.
from typing import List, Optional

from docarray import BaseDoc


class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List


doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this one | required |
Source code in docarray/base_doc/mixins/update.py
vertices_and_faces
VerticesAndFaces
Bases: BaseDoc
Document for handling the tensor data of a Mesh3D
object.
A VerticesAndFaces Document can contain:
- an AnyTensor containing the vertices information (VerticesAndFaces.vertices)
- an AnyTensor containing the faces information (VerticesAndFaces.faces)
Source code in docarray/documents/mesh/vertices_and_faces.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
display()
Plot mesh consisting of vertices and faces.
Source code in docarray/documents/mesh/vertices_and_faces.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude arguments behave as per dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the Document schema.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating dictionaries of the left with the right
It behaves like an update operation on dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists, and sets are concatenated. Note that tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is updated with the value of other.
from typing import List, Optional

from docarray import BaseDoc


class MyDocument(BaseDoc):
    content: str
    title: Optional[str] = None
    tags_: List


doc1 = MyDocument(
    content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])

doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this one | required |
Source code in docarray/base_doc/mixins/update.py
point_cloud
PointCloud3D
Bases: BaseDoc
Document for handling point clouds for 3D data representation.
A point cloud is a representation of a 3D mesh. It is made by repeatedly and uniformly sampling points within the surface of the 3D body. Compared to the mesh representation, the point cloud is a fixed-size ndarray of shape (n_samples, 3) and hence easier for deep learning algorithms to handle.
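The sampling idea can be sketched in plain numpy (this is illustrative, not docarray code): drawing a fixed number of points uniformly from a triangle's surface yields an (n_samples, 3) array regardless of the mesh's size.

```python
import numpy as np

rng = np.random.default_rng(0)

# one triangle: three vertices in 3D space
v0, v1, v2 = np.eye(3)

n_samples = 100
# barycentric coordinates, folded back so the samples stay inside the triangle
u = rng.random((n_samples, 1))
v = rng.random((n_samples, 1))
flip = (u + v) > 1
u[flip], v[flip] = 1 - u[flip], 1 - v[flip]

# each sample is a point on the triangle's surface
points = v0 + u * (v1 - v0) + v * (v2 - v0)

assert points.shape == (n_samples, 3)
```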
A PointCloud3D Document can contain:
- a PointCloud3DUrl (PointCloud3D.url)
- a PointsAndColors object (PointCloud3D.tensors)
- an AnyEmbedding (PointCloud3D.embedding)
- a bytes object (PointCloud3D.bytes_)
You can use this Document directly:
from docarray.documents import PointCloud3D
# use it directly
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
You can extend this Document:
from docarray.documents import PointCloud3D
from docarray.typing import AnyEmbedding
from typing import Optional
# extend it
class MyPointCloud3D(PointCloud3D):
    second_embedding: Optional[AnyEmbedding] = None
pc = MyPointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
# pc.second_embedding = model(pc.tensors.colors)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import PointCloud3D, TextDoc
# compose it
class MultiModalDoc(BaseDoc):
point_cloud: PointCloud3D
text: TextDoc
mmdoc = MultiModalDoc(
point_cloud=PointCloud3D(
url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'
),
text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.point_cloud.tensors = mmdoc.point_cloud.url.load(samples=100)
# or
mmdoc.point_cloud.bytes_ = mmdoc.point_cloud.url.load_bytes()
You can display your point cloud from either its url, or its tensors:
from docarray.documents import PointCloud3D
# display from url
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# pc.url.display()
# display from tensors
pc.tensors = pc.url.load(samples=10000)
# pc.tensors.display()
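The uniform surface sampling that `url.load(samples=...)` performs can be sketched in a few lines. The helper below is a hypothetical, pure-Python illustration of uniform sampling on a single triangle via barycentric coordinates; docarray's own loader handles full meshes and is what you should use in practice:

```python
import random

def sample_triangle(a, b, c, n_samples):
    """Uniformly sample points on one triangle using barycentric coordinates.

    Hypothetical helper for illustration only; docarray's PointCloud3DUrl.load
    performs the real mesh sampling.
    """
    points = []
    for _ in range(n_samples):
        u, v = random.random(), random.random()
        if u + v > 1.0:  # fold the unit square back into the triangle
            u, v = 1.0 - u, 1.0 - v
        w = 1.0 - u - v
        # point = u*b + w*c + ... expressed per-coordinate
        points.append(tuple(v * bi + w * ci + u * ai
                            for ai, bi, ci in zip(a, b, c)))
    return points

# 100 points on a triangle in the z=0 plane -> a (100, 3)-shaped point cloud
cloud = sample_triangle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 100)
```

Repeating this over every triangle of a mesh (weighted by triangle area) yields the fixed-size `(n_samples, 3)` array described above.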
Source code in docarray/documents/point_cloud/point_cloud_3d.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use; it can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; include and exclude arguments as per dict(). encoder is an optional function to supply as default to json.dumps(); other arguments as per json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes. For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert the Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
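The pairing of the serialization methods above (to_bytes/from_bytes, to_base64/from_base64) follows a familiar pattern: serialize, optionally compress, optionally base64-encode, then reverse the steps to restore. The sketch below illustrates that round trip with only the standard library, using pickle and gzip as stand-ins for the 'pickle' protocol and a compress method; it is a conceptual illustration, not docarray's actual implementation:

```python
import base64
import gzip
import pickle

# Hypothetical stand-in for a Document: any picklable object works here.
doc = {'text': 'hello world', 'embedding': [0.1, 0.2, 0.3]}

# to_bytes / to_base64, sketched with a pickle protocol and gzip compression
raw = gzip.compress(pickle.dumps(doc))
encoded = base64.b64encode(raw).decode()  # a plain str, safe for JSON/text transport

# from_base64 / from_bytes reverse the steps in the opposite order
restored = pickle.loads(gzip.decompress(base64.b64decode(encoded)))
assert restored == doc
```

The base64 variant exists so that the binary form can travel through text-only channels; the cost is roughly a 33% size increase over the raw bytes.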
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating the Dictionaries of the left with those of the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is updated with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
content: str
title: Optional[str] = None
tags_: List
doc1 = MyDocument(
content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this one | required |
Source code in docarray/base_doc/mixins/update.py
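The merge rules listed above can be sketched on plain dicts. The helper below is a hypothetical illustration of the semantics (None means "absent", lists concatenate, sets union, nested mappings merge recursively, everything else is overwritten); it is not docarray's implementation:

```python
def update_fields(left: dict, right: dict) -> dict:
    """Sketch of Document.update() merge rules on plain dicts."""
    merged = dict(left)
    for key, value in right.items():
        if value is None:
            continue                           # absent field: keep the left value
        current = merged.get(key)
        if isinstance(current, list) and isinstance(value, list):
            merged[key] = current + value      # concatenate lists
        elif isinstance(current, set) and isinstance(value, set):
            merged[key] = current | value      # update sets
        elif isinstance(current, dict) and isinstance(value, dict):
            merged[key] = update_fields(current, value)  # recurse into mappings
        else:
            merged[key] = value                # plain overwrite (incl. tuples)
    return merged

doc1 = {'content': 'Core content', 'title': 'Title', 'tags_': ['python', 'AI']}
doc2 = {'content': 'Core content updated', 'title': None, 'tags_': ['docarray']}
merged = update_fields(doc1, doc2)
# merged == {'content': 'Core content updated', 'title': 'Title',
#            'tags_': ['python', 'AI', 'docarray']}
```

Note how `title=None` on the right leaves the left value untouched, mirroring the "presence is given by the field not being None" rule.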
PointsAndColors
Bases: BaseDoc
Document for handling the tensor data of a PointCloud3D
object.
A PointsAndColors Document can contain:
- an AnyTensor containing the points in 3D space information (PointsAndColors.points)
- an AnyTensor containing the points' color information (PointsAndColors.colors)
Source code in docarray/documents/point_cloud/points_and_colors.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
display()
Plot a point cloud consisting of points in 3D space, optionally with colors.
Source code in docarray/documents/point_cloud/points_and_colors.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use; it can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; include and exclude arguments as per dict(). encoder is an optional function to supply as default to json.dumps(); other arguments as per json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes. For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert the Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating the Dictionaries of the left with those of the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is updated with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
content: str
title: Optional[str] = None
tags_: List
doc1 = MyDocument(
content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this one | required |
Source code in docarray/base_doc/mixins/update.py
point_cloud_3d
PointCloud3D
Bases: BaseDoc
Document for handling point clouds for 3D data representation.
A point cloud is a representation of a 3D mesh. It is made by repeatedly and uniformly sampling points on the surface of the 3D body. Compared to the mesh representation, the point cloud is a fixed-size ndarray of shape (n_samples, 3) and hence easier for deep learning algorithms to handle.
A PointCloud3D Document can contain:
- a PointCloud3DUrl (PointCloud3D.url)
- a PointsAndColors object (PointCloud3D.tensors)
- an AnyEmbedding (PointCloud3D.embedding)
- a bytes object (PointCloud3D.bytes_)
You can use this Document directly:
from docarray.documents import PointCloud3D
# use it directly
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
You can extend this Document:
from docarray.documents import PointCloud3D
from docarray.typing import AnyEmbedding
from typing import Optional
# extend it
class MyPointCloud3D(PointCloud3D):
second_embedding: Optional[AnyEmbedding] = None
pc = MyPointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
pc.tensors = pc.url.load(samples=100)
# model = MyEmbeddingModel()
# pc.embedding = model(pc.tensors.points)
# pc.second_embedding = model(pc.tensors.colors)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import PointCloud3D, TextDoc
# compose it
class MultiModalDoc(BaseDoc):
point_cloud: PointCloud3D
text: TextDoc
mmdoc = MultiModalDoc(
point_cloud=PointCloud3D(
url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj'
),
text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.point_cloud.tensors = mmdoc.point_cloud.url.load(samples=100)
# or
mmdoc.point_cloud.bytes_ = mmdoc.point_cloud.url.load_bytes()
You can display your point cloud from either its url, or its tensors:
from docarray.documents import PointCloud3D
# display from url
pc = PointCloud3D(url='https://people.sc.fsu.edu/~jburkardt/data/obj/al.obj')
# pc.url.display()
# display from tensors
pc.tensors = pc.url.load(samples=10000)
# pc.tensors.display()
Source code in docarray/documents/point_cloud/point_cloud_3d.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use; it can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; include and exclude arguments as per dict(). encoder is an optional function to supply as default to json.dumps(); other arguments as per json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes. For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert the Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating the Dictionaries of the left with those of the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is updated with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
content: str
title: Optional[str] = None
tags_: List
doc1 = MyDocument(
content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this one | required |
Source code in docarray/base_doc/mixins/update.py
points_and_colors
PointsAndColors
Bases: BaseDoc
Document for handling the tensor data of a PointCloud3D
object.
A PointsAndColors Document can contain:
- an AnyTensor containing the points in 3D space information (PointsAndColors.points)
- an AnyTensor containing the points' color information (PointsAndColors.colors)
Source code in docarray/documents/point_cloud/points_and_colors.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
display()
Plot a point cloud consisting of points in 3D space, optionally with colors.
Source code in docarray/documents/point_cloud/points_and_colors.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64-encoded string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | str | a base64 encoded string | required |
protocol | Literal['pickle', 'protobuf'] | protocol to use; it can be 'pickle' or 'protobuf' | 'pickle' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build a Document object from binary bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | binary bytes | required |
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build a Document object from JSON data.
Returns:
Type | Description |
---|---|
T | a Document object |
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pb_msg | DocProto | the proto message of the Document | required |
Returns:
Type | Description |
---|---|
T | a Document initialized with the proto data |
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; include and exclude arguments as per dict(). encoder is an optional function to supply as default to json.dumps(); other arguments as per json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | StrBytes | | required |
content_type | str | | None |
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8' |
proto | Protocol | protocol to use | None |
allow_pickle | bool | allow pickle protocol | False |
Returns:
Type | Description |
---|---|
T | a document |
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compress method to use | None |
Returns:
Type | Description |
---|---|
str | a base64 encoded string |
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes. For more Pythonic code, please use bytes(...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protocol | ProtocolType | protocol to use; it can be 'pickle' or 'protobuf' | 'protobuf' |
compress | Optional[str] | compression algorithm to use | None |
Returns:
Type | Description |
---|---|
bytes | the binary serialization in bytes |
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert the Document into a Protobuf message.
Returns:
Type | Description |
---|---|
DocProto | the protobuf message |
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating the Dictionaries of the left with those of the right
It behaves like an update operation on Dictionaries, except that, since it is applied to a static schema type, the presence of a field is given by the field not having a None value, and DocLists, lists and sets are concatenated. Note that Tuples are not merged, since they are meant to be immutable; they behave as regular types, and the value of self is updated with the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
content: str
title: Optional[str] = None
tags_: List
doc1 = MyDocument(
content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | T | The Document with which to update the contents of this one | required |
Source code in docarray/base_doc/mixins/update.py
text
TextDoc
Bases: BaseDoc
Document for handling text.
It can contain:
- a TextUrl (TextDoc.url)
- a str (TextDoc.text)
- an AnyEmbedding (TextDoc.embedding)
- a bytes object (TextDoc.bytes_)
You can use this Document directly:
from docarray.documents import TextDoc
# use it directly
txt_doc = TextDoc(url='https://www.gutenberg.org/files/1065/1065-0.txt')
txt_doc.text = txt_doc.url.load()
# model = MyEmbeddingModel()
# txt_doc.embedding = model(txt_doc.text)
You can initialize directly from a string:
from docarray.documents import TextDoc
txt_doc = TextDoc('hello world')
You can extend this Document:
from docarray.documents import TextDoc
from docarray.typing import AnyEmbedding
from typing import Optional
# extend it
class MyText(TextDoc):
second_embedding: Optional[AnyEmbedding] = None
txt_doc = MyText(url='https://www.gutenberg.org/files/1065/1065-0.txt')
txt_doc.text = txt_doc.url.load()
# model = MyEmbeddingModel()
# txt_doc.embedding = model(txt_doc.text)
# txt_doc.second_embedding = model(txt_doc.text)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import ImageDoc, TextDoc
# compose it
class MultiModalDoc(BaseDoc):
image_doc: ImageDoc
text_doc: TextDoc
mmdoc = MultiModalDoc(
image_doc=ImageDoc(
url='https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true'
),
text_doc=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.image_doc.tensor = mmdoc.image_doc.url.load()
# or
mmdoc.image_doc.bytes_ = mmdoc.image_doc.url.load_bytes()
mmdoc.image_doc.tensor = mmdoc.image_doc.bytes_.load()
This Document can be compared against another Document of the same type, or against a string. When compared against another object of the same type, the pydantic BaseModel equality check applies, which checks the equality of every attribute excluding id. When compared against a str, it checks the equality of the text attribute against the given string.
from docarray.documents import TextDoc
doc = TextDoc(text='This is the main text', url='exampleurl.com/file')
doc2 = TextDoc(text='This is the main text', url='exampleurl.com/file')
doc == 'This is the main text' # True
doc == doc2 # True
Source code in docarray/documents/text.py
__contains__(item)
This method makes TextDoc behave the same as a str: it checks whether a given string occurs as a substring of the text attribute.
Parameters:
Name | Type | Description | Default
---|---|---|---
item | str | a string to be checked for being a substring of text | required
Returns:
Type | Description
---|---
bool | whether item occurs as a substring of text
Source code in docarray/documents/text.py
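Since the docstring above is truncated in places, the behavior can be illustrated with a minimal sketch. TextLike below is a hypothetical stand-in for TextDoc, assuming __contains__ simply delegates the substring check to the text attribute:

```python
# Hypothetical stand-in for TextDoc, assuming __contains__ delegates
# the substring check to the `text` attribute.
class TextLike:
    def __init__(self, text: str):
        self.text = text

    def __contains__(self, item: str) -> bool:
        return item in self.text


doc = TextLike('This is the main text')
assert 'main' in doc        # substring of doc.text
assert 'absent' not in doc  # not a substring of doc.text
```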
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64 encoded string.
Parameters:
Name | Type | Description | Default
---|---|---|---
data | str | a base64 encoded string | required
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle'
compress | Optional[str] | compress method to use | None
Returns:
Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
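The round trip between to_base64 and from_base64 can be sketched in plain Python. Doc, to_base64 and from_base64 here are simplified stand-ins assuming the 'pickle' protocol, not the real docarray implementation:

```python
import base64
import pickle

# Simplified stand-ins assuming the 'pickle' protocol; the real docarray
# methods also support 'protobuf' and optional compression.
class Doc:
    def __init__(self, text):
        self.text = text

def to_base64(doc, protocol='pickle'):
    return base64.b64encode(pickle.dumps(doc)).decode('utf-8')

def from_base64(data, protocol='pickle'):
    return pickle.loads(base64.b64decode(data))

restored = from_base64(to_base64(Doc('hello')))
assert restored.text == 'hello'
```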
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build Document object from binary bytes
Parameters:
Name | Type | Description | Default
---|---|---|---
data | bytes | binary bytes | required
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None
Returns:
Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build Document object from json data
Returns:
Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default
---|---|---|---
pb_msg | DocProto | the proto message of the Document | required
Returns:
Type | Description
---|---
T | a Document initialized with the proto data
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude
arguments work as in dict().
encoder is an optional function to supply as default to json.dumps();
other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
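The include/exclude semantics shared by dict() and json() can be sketched with a plain dictionary. doc_fields and doc_json below are hypothetical illustrations, not the real pydantic implementation:

```python
import json

# Hypothetical sketch of the exclude/exclude_none semantics;
# the real behavior lives in pydantic's BaseModel.
doc_fields = {'id': 'abc123', 'text': 'hello', 'url': None}

def doc_json(fields, exclude=None, exclude_none=False):
    exclude = exclude or set()
    kept = {k: v for k, v in fields.items()
            if k not in exclude and not (exclude_none and v is None)}
    return json.dumps(kept)

assert doc_json(doc_fields, exclude={'id'}, exclude_none=True) == '{"text": "hello"}'
```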
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc
Parameters:
Name | Type | Description | Default
---|---|---|---
b | StrBytes | | required
content_type | str | | None
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8'
proto | Protocol | protocol to use. | None
allow_pickle | bool | allow pickle protocol | False
Returns:
Type | Description
---|---
T | a document
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the Document's schema.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None
Returns:
Type | Description
---|---
str | a base64 encoded string
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...)
.
Parameters:
Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compression algorithm to use | None
Returns:
Type | Description
---|---
bytes | the binary serialization in bytes
Source code in docarray/base_doc/mixins/io.py
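Conceptually, to_bytes and from_bytes pair a serialization protocol with an optional compression step. A minimal sketch, assuming the 'pickle' protocol and using zlib as an illustrative compress method (the real docarray implementation and its supported compress values may differ):

```python
import pickle
import zlib

def to_bytes(obj, protocol='pickle', compress=None):
    # serialize first, then optionally compress the payload
    data = pickle.dumps(obj)
    return zlib.compress(data) if compress == 'zlib' else data

def from_bytes(data, protocol='pickle', compress=None):
    # undo the steps in reverse order: decompress, then deserialize
    if compress == 'zlib':
        data = zlib.decompress(data)
    return pickle.loads(data)

payload = {'text': 'hello'}
assert from_bytes(to_bytes(payload, compress='zlib'), compress='zlib') == payload
```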
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description
---|---
DocProto | the protobuf message
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating the dictionaries of the left Document with those of the right
It behaves like a dictionary update, except that, since it is applied to a
static schema type, the presence of a field is determined by the field not
being None, and DocLists, lists and sets are concatenated. Note that tuples
are not merged, since they are meant to be immutable; they behave like
regular types, and the value of self is simply replaced by the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
content: str
title: Optional[str] = None
tags_: List
doc1 = MyDocument(
content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default
---|---|---|---
other | T | the Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py
video
VideoDoc
Bases: BaseDoc
Document for handling video.
The Video Document can contain:
- a VideoUrl (VideoDoc.url)
- an AudioDoc (VideoDoc.audio)
- a VideoTensor (VideoDoc.tensor)
- an AnyTensor representing the indices of the video's key frames (VideoDoc.key_frame_indices)
- an AnyEmbedding (VideoDoc.embedding)
- a VideoBytes object (VideoDoc.bytes_)
You can use this Document directly:
from docarray.documents import VideoDoc, AudioDoc
# use it directly
vid = VideoDoc(
url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
tensor, audio_tensor, key_frame_indices = vid.url.load()
vid.tensor = tensor
vid.audio = AudioDoc(tensor=audio_tensor)
vid.key_frame_indices = key_frame_indices
# model = MyEmbeddingModel()
# vid.embedding = model(vid.tensor)
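The key_frame_indices returned by load() can then be used to select the key frames from the loaded tensor. A minimal sketch with plain lists (the frame count and indices below are invented for illustration):

```python
# Stand-ins for vid.tensor and vid.key_frame_indices; values are invented.
frames = [f'frame_{i}' for i in range(100)]
key_frame_indices = [0, 25, 50]

# select the key frames by index
key_frames = [frames[i] for i in key_frame_indices]
assert key_frames == ['frame_0', 'frame_25', 'frame_50']
```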
You can extend this Document:
from typing import Optional
from docarray.documents import TextDoc, VideoDoc
# extend it
class MyVideo(VideoDoc):
name: Optional[TextDoc] = None
video = MyVideo(
url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
)
video.name = TextDoc(text='my first video')
video.tensor = video.url.load().video
# model = MyEmbeddingModel()
# video.embedding = model(video.tensor)
You can use this Document for composition:
from docarray import BaseDoc
from docarray.documents import TextDoc, VideoDoc
# compose it
class MultiModalDoc(BaseDoc):
video: VideoDoc
text: TextDoc
mmdoc = MultiModalDoc(
video=VideoDoc(
url='https://github.com/docarray/docarray/blob/main/tests/toydata/mov_bbb.mp4?raw=true'
),
text=TextDoc(text='hello world, how are you doing?'),
)
mmdoc.video.tensor = mmdoc.video.url.load().video
# or
mmdoc.video.bytes_ = mmdoc.video.url.load_bytes()
mmdoc.video.tensor = mmdoc.video.bytes_.load().video
Source code in docarray/documents/video.py
dict(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False)
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
Source code in docarray/base_doc/doc.py
from_base64(data, protocol='pickle', compress=None)
classmethod
Build a Document object from a base64 encoded string.
Parameters:
Name | Type | Description | Default
---|---|---|---
data | str | a base64 encoded string | required
protocol | Literal['pickle', 'protobuf'] | protocol to use. It can be 'pickle' or 'protobuf' | 'pickle'
compress | Optional[str] | compress method to use | None
Returns:
Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
from_bytes(data, protocol='protobuf', compress=None)
classmethod
Build Document object from binary bytes
Parameters:
Name | Type | Description | Default
---|---|---|---
data | bytes | binary bytes | required
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None
Returns:
Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
from_json(data)
classmethod
Build Document object from json data
Returns:
Type | Description
---|---
T | a Document object
Source code in docarray/base_doc/mixins/io.py
from_protobuf(pb_msg)
classmethod
Create a Document from a protobuf message.
Parameters:
Name | Type | Description | Default
---|---|---|---
pb_msg | DocProto | the proto message of the Document | required
Returns:
Type | Description
---|---
T | a Document initialized with the proto data
Source code in docarray/base_doc/mixins/io.py
json(*, include=None, exclude=None, by_alias=False, skip_defaults=None, exclude_unset=False, exclude_defaults=False, exclude_none=False, encoder=None, models_as_dict=True, **dumps_kwargs)
Generate a JSON representation of the model; the include and exclude
arguments work as in dict().
encoder is an optional function to supply as default to json.dumps();
other arguments are passed through to json.dumps().
Source code in docarray/base_doc/doc.py
parse_raw(b, *, content_type=None, encoding='utf8', proto=None, allow_pickle=False)
classmethod
Parse a raw string or bytes into a base doc
Parameters:
Name | Type | Description | Default
---|---|---|---
b | StrBytes | | required
content_type | str | | None
encoding | str | the encoding to use when parsing a string, defaults to 'utf8' | 'utf8'
proto | Protocol | protocol to use. | None
allow_pickle | bool | allow pickle protocol | False
Returns:
Type | Description
---|---
T | a document
Source code in docarray/base_doc/doc.py
schema_summary()
classmethod
Print a summary of the Document's schema.
summary()
Print non-empty fields and nested structure of this Document object.
to_base64(protocol='protobuf', compress=None)
Serialize a Document object into a base64 string.
Parameters:
Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compress method to use | None
Returns:
Type | Description
---|---
str | a base64 encoded string
Source code in docarray/base_doc/mixins/io.py
to_bytes(protocol='protobuf', compress=None)
Serialize itself into bytes.
For more Pythonic code, please use bytes(...)
.
Parameters:
Name | Type | Description | Default
---|---|---|---
protocol | ProtocolType | protocol to use. It can be 'pickle' or 'protobuf' | 'protobuf'
compress | Optional[str] | compression algorithm to use | None
Returns:
Type | Description
---|---
bytes | the binary serialization in bytes
Source code in docarray/base_doc/mixins/io.py
to_protobuf()
Convert Document into a Protobuf message.
Returns:
Type | Description
---|---
DocProto | the protobuf message
Source code in docarray/base_doc/mixins/io.py
update(other)
Updates self with the content of other. Changes are applied to self. Updating one Document with another consists of the following:
- Setting data properties of the second Document on the first Document if they are not None
- Concatenating lists and updating sets
- Recursively updating Documents and DocLists
- Updating the dictionaries of the left Document with those of the right
It behaves like a dictionary update, except that, since it is applied to a
static schema type, the presence of a field is determined by the field not
being None, and DocLists, lists and sets are concatenated. Note that tuples
are not merged, since they are meant to be immutable; they behave like
regular types, and the value of self is simply replaced by the value of other.
from typing import List, Optional
from docarray import BaseDoc
class MyDocument(BaseDoc):
content: str
title: Optional[str] = None
tags_: List
doc1 = MyDocument(
content='Core content of the document', title='Title', tags_=['python', 'AI']
)
doc2 = MyDocument(content='Core content updated', tags_=['docarray'])
doc1.update(doc2)
assert doc1.content == 'Core content updated'
assert doc1.title == 'Title'
assert doc1.tags_ == ['python', 'AI', 'docarray']
Parameters:
Name | Type | Description | Default
---|---|---|---
other | T | the Document with which to update the contents of this one | required
Source code in docarray/base_doc/mixins/update.py