Store on-disk
When you want to use your DocList in another place, you can use:
Push and pull
To use the store locally, you need to pass a local file path to the function starting with 'file://'
.
from docarray import BaseDoc, DocList
class SimpleDoc(BaseDoc):
text: str
dl = DocList[SimpleDoc]([SimpleDoc(text=f'doc {i}') for i in range(8)])
dl.push('file://simple_dl')
dl_pull = DocList[SimpleDoc].pull('file://simple_dl')
A file with the name of simple_dl.docs
will be created in $HOME/.docarray/cache
to store the DocList
.
Push and pull with streaming
When you have a large amount of documents to push and pull, you can use the streaming method:
.push_stream()
and
.pull_stream()
stream the DocList
to save memory usage. You set multiple DocList
s to pull from the same source as well:
from docarray import BaseDoc, DocList
class SimpleDoc(BaseDoc):
text: str
store_docs = [SimpleDoc(text=f'doc {i}') for i in range(8)]
DocList[SimpleDoc].push_stream(
iter(store_docs),
'file://dl_stream',
)
dl_pull_stream_1 = DocList[SimpleDoc].pull_stream('file://dl_stream')
dl_pull_stream_2 = DocList[SimpleDoc].pull_stream('file://dl_stream')
for d1, d2 in zip(dl_pull_stream_1, dl_pull_stream_2):
print(f'get {d1}, get {d2}')
Output
get SimpleDoc(id='5a4b92af27aadbb852d636892506998b', text='doc 0'), get SimpleDoc(id='5a4b92af27aadbb852d636892506998b', text='doc 0')
get SimpleDoc(id='705e4f6acbab0a6ff10d11a07c03b24c', text='doc 1'), get SimpleDoc(id='705e4f6acbab0a6ff10d11a07c03b24c', text='doc 1')
get SimpleDoc(id='4fb5c01bd5f935bbe91cf73e271ad590', text='doc 2'), get SimpleDoc(id='4fb5c01bd5f935bbe91cf73e271ad590', text='doc 2')
get SimpleDoc(id='381498cef78f1d4f1d80415d67918940', text='doc 3'), get SimpleDoc(id='381498cef78f1d4f1d80415d67918940', text='doc 3')
get SimpleDoc(id='d968bc6fa235b1cfc69eded92926157e', text='doc 4'), get SimpleDoc(id='d968bc6fa235b1cfc69eded92926157e', text='doc 4')
get SimpleDoc(id='30bf347427a4bd50ce8ada1841320fe3', text='doc 5'), get SimpleDoc(id='30bf347427a4bd50ce8ada1841320fe3', text='doc 5')
get SimpleDoc(id='1389877ac97b3e6d0e8eb17568934708', text='doc 6'), get SimpleDoc(id='1389877ac97b3e6d0e8eb17568934708', text='doc 6')
get SimpleDoc(id='264b0eff2cd138d296f15c685e15bf23', text='doc 7'), get SimpleDoc(id='264b0eff2cd138d296f15c685e15bf23', text='doc 7')