feat: draft secondary index #403

Closed
davidbp wants to merge 19 commits into main from feat-secondary-index

davidbp commented Jun 16, 2022

This PR is a draft for storing secondary indices as a dict of separate DocumentArrays.

ToDo:

Design doc: https://docs.google.com/presentation/d/1rntTa1Ur2WmdAvOUEI01l2WeNSWgtqk2xjth_gdIr50/edit#slide=id.g13443d1ddb2_1_23
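
To make the "dict of separate DocumentArrays" idea concrete, here is a minimal sketch. It is a toy stand-in and not DocArray's API: `Doc`, `IndexedStore`, `secondary_fields` and `chunks` are hypothetical names, and plain lists stand in for the per-index DocumentArrays. It only illustrates the layout, where each secondary index is an independent collection keyed by the nested field it covers.

```python
# Toy stand-in, not DocArray's API: every name here is hypothetical.
from dataclasses import dataclass, field


@dataclass
class Doc:
    id: str
    chunks: dict = field(default_factory=dict)  # field name -> nested Doc


class IndexedStore:
    """Root-level docs plus one separate collection per secondary index."""

    def __init__(self, secondary_fields):
        self.docs = []
        # One independent collection per secondary index, keyed by field name.
        self.secondary_indices = {name: [] for name in secondary_fields}

    def add(self, doc):
        self.docs.append(doc)
        # Mirror each nested doc into the index that matches its field.
        for name, index in self.secondary_indices.items():
            if name in doc.chunks:
                index.append(doc.chunks[name])


store = IndexedStore(secondary_fields=["image", "text"])
store.add(Doc("d1", chunks={"image": Doc("d1-img"), "text": Doc("d1-txt")}))
store.add(Doc("d2", chunks={"image": Doc("d2-img")}))

image_ids = [d.id for d in store.secondary_indices["image"]]
text_ids = [d.id for d in store.secondary_indices["text"]]
```

Keeping each index as its own collection means a search on one nested field never touches the root-level documents or the other indices.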

codecov bot commented Jun 16, 2022

Codecov Report

Merging #403 (348ebb7) into main (21084e2) will increase coverage by 0.00%.
The diff coverage is 88.23%.

@@           Coverage Diff           @@
##             main     #403   +/-   ##
=======================================
  Coverage   86.64%   86.64%           
=======================================
  Files         134      134           
  Lines        6516     6561   +45     
=======================================
+ Hits         5646     5685   +39     
- Misses        870      876    +6     
| Flag | Coverage Δ | |
|---|---|---|
| docarray | 86.64% <88.23%> (+<0.01%) | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| docarray/array/storage/elastic/getsetdel.py | 100.00% <ø> (ø) | |
| docarray/array/mixins/match.py | 66.66% <55.55%> (-8.34%) | ⬇️ |
| docarray/array/storage/annlite/find.py | 90.47% <87.50%> (-2.86%) | ⬇️ |
| docarray/array/storage/annlite/backend.py | 95.12% <92.30%> (-0.60%) | ⬇️ |
| docarray/array/storage/annlite/getsetdel.py | 100.00% <100.00%> (ø) | |
| docarray/array/storage/annlite/seqlike.py | 85.29% <100.00%> (+3.81%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

JoanFM linked an issue Jun 21, 2022 that may be closed by this pull request
github-actions bot added size/m and removed size/s labels Jun 28, 2022
JohannesMessner commented Jul 28, 2022

I am considering the following changes to nomenclature:
secondary_index -> subindex
secondary_indices_configs -> subindex_configs
da.find(query=..., secondary_index='@.[image]') -> da.find(query=..., find_on='@.[image]')

Technically, secondary_index is probably more correct than subindex, but I much prefer the brevity.

JohannesMessner commented

Related to this: with this nomenclature, find() could support the same syntax on an in-memory DocumentArray without causing confusion. That would unify the experience between in-memory arrays and document stores.

The implementation could be as simple as this, since no actual secondary index is needed:

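def find(self, query, find_on=None, **kwargs):
    # In memory there is nothing to index: select the nested Documents
    # with the access path, then run the usual find() on that selection.
    if find_on:
        return self[find_on].find(query, **kwargs)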
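
The delegation idea above can be tried out with a self-contained toy. This is a sketch under stated assumptions, not DocArray's implementation: `TinyArray` is a hypothetical class, documents are plain dicts, only access paths of the form `'@.[field]'` are understood, and nearest-neighbour search is a brute-force Euclidean ranking.

```python
# Toy stand-in, not DocArray's implementation: all names are hypothetical.
import math


class TinyArray:
    def __init__(self, docs):
        # Each doc is a dict: {"id": str, "embedding": [...], <nested fields>}
        self.docs = list(docs)

    def __getitem__(self, path):
        """Return a TinyArray of the nested docs selected by '@.[field]'."""
        if not (path.startswith("@.[") and path.endswith("]")):
            raise ValueError(f"unsupported access path: {path!r}")
        name = path[3:-1]
        return TinyArray(d[name] for d in self.docs if name in d)

    def find(self, query, limit=1, find_on=None):
        if find_on:
            # Delegate: search the selected sub-array, no extra index required.
            return self[find_on].find(query, limit=limit)
        # Brute-force ranking by Euclidean distance to the query vector.
        ranked = sorted(self.docs, key=lambda d: math.dist(query, d["embedding"]))
        return ranked[:limit]


da = TinyArray([
    {"id": "a", "embedding": [0.0, 0.0],
     "image": {"id": "a-img", "embedding": [1.0, 1.0]}},
    {"id": "b", "embedding": [5.0, 5.0],
     "image": {"id": "b-img", "embedding": [4.0, 4.0]}},
])

nested_hit = da.find([1.0, 1.2], find_on="@.[image]")
root_hit = da.find([1.0, 1.2])
```

Because `find_on` only changes which documents are searched, the same call signature works whether or not a real secondary index sits behind it.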

JohannesMessner deleted the feat-secondary-index branch July 28, 2022 14:53
JohannesMessner restored the feat-secondary-index branch July 28, 2022 14:53
JohannesMessner commented

Continued here: #456

Linked issue (may be closed by merging this pull request): Implement ANNLite secondary index at deeper levels