Collection Group Queries w/ indexing#1440
Conversation
wilhuff
left a comment
There was a problem hiding this comment.
On the whole I think this is great.
One thing that's missing is a recording of the collection-group-parent entries for pending mutations. We need to index those too otherwise these queries won't work while offline. We probably need some tests that verify that they do work while offline.
packages/firestore/src/core/query.ts
Outdated
|
|
||
| /** | ||
| * Helper to convert a Collection Group query into a collection query at a | ||
| * specific path. |
There was a problem hiding this comment.
This doesn't seem like an intrinsically useful operation. Maybe add a comment about what it's useful for?
| * path for root-level collections). Index entries can be retrieved via | ||
| * getCollectionParents(). | ||
| */ | ||
| indexCollectionParent( |
There was a problem hiding this comment.
This name is slightly confusing because we're using "index" as both a noun and a verb in this context. I initially thought this might be a getter but was confused by the return type. Some other verb might help with that.
Maybe ensureCollectionParentIndexed? addToCollectionParentIndex?
However, see above. Maybe this doesn't need to be so specific.
There was a problem hiding this comment.
Good point. How about addCollectionParentEntry(), trying to be consistent with our RemoteDocumentCache.addEntry() naming? I don't care strongly though.
packages/firestore/src/core/query.ts
Outdated
| if (DocumentKey.isDocumentKey(this.path)) { | ||
| if (this.collectionGroup !== null) { | ||
| return ( | ||
| this.collectionGroup === docPath.secondToLastSegment() && |
There was a problem hiding this comment.
"secondToLastSegment" is a wonky thing to have on here.
Maybe add a hasCollectionId(this.collectinoGroup) method on documentPath?
There was a problem hiding this comment.
Done. I added the method to DocumentKey, since it wouldn't always apply to ResourcePath.
| collectionPath: ResourcePath | ||
| ): PersistencePromise<void> { | ||
| assert(collectionPath.length >= 1, 'Invalid collection path.'); | ||
| const collectionId = collectionPath.lastSegment(); |
There was a problem hiding this comment.
This seems to essentially duplicate the MemoryQueryIndexes implementation. Couldn't we just reuse that here?
There was a problem hiding this comment.
Maaaaybe?
It is very similar code, but I can't think of a clean way to reuse it that wouldn't involve creating an awkward abstraction just for the sake of code reuse, and I hate doing that. It's tempting to just use MemoryQueryIndexes directly, but I worry that'll be really awkward / confusing in the future when we have more indexes...
It's worth noting that here (unlike MemoryQueryIndexes) we only use the in-memory cache of indexes in the write path (to avoid re-writing indexes we know exist). We don't use it in the read path since we never populate it with existing index entries from disk. So although the data structure is the same, we use it differently.
Anyway, if you have a suggestion let me know, else I'm inclined to just live with the duplication for now.
There was a problem hiding this comment.
I ended up reworking this, extracting MemoryCollectionParentIndex as a standalone class used by MemoryIndexManager so that I could reuse it in both the IndexedDbIndexManager as well as in the schema migration that back-populates indexes.
| return collectionParentsStore(transaction) | ||
| .loadAll(range) | ||
| .next(entries => { | ||
| for (const { parent } of entries) { |
There was a problem hiding this comment.
What's the name of this construct so I can read more about it? (Not asking for any change--just want to understand this better.)
There was a problem hiding this comment.
Object destructuring. https://basarat.gitbooks.io/typescript/docs/destructuring.html
|
One other thought: could it be useful to have a possibly disabled by default test that exercises collection groups in combination with filters/order by? |
mikelehen
left a comment
There was a problem hiding this comment.
Thanks for the quick review! I've fixed the small, easy things, and commented on the rest of the feedback. I'll ping you in an hour or so to two to see if we can chat through the unresolved high-level feedback, in particular the relationship of QueryIndexes to the rest of the components.
| return collectionParentsStore(transaction) | ||
| .loadAll(range) | ||
| .next(entries => { | ||
| for (const { parent } of entries) { |
There was a problem hiding this comment.
Object destructuring. https://basarat.gitbooks.io/typescript/docs/destructuring.html
packages/firestore/src/core/query.ts
Outdated
|
|
||
| /** | ||
| * Helper to convert a Collection Group query into a collection query at a | ||
| * specific path. |
| collectionPath: ResourcePath | ||
| ): PersistencePromise<void> { | ||
| assert(collectionPath.length >= 1, 'Invalid collection path.'); | ||
| const collectionId = collectionPath.lastSegment(); |
There was a problem hiding this comment.
Maaaaybe?
It is very similar code, but I can't think of a clean way to reuse it that wouldn't involve creating an awkward abstraction just for the sake of code reuse, and I hate doing that. It's tempting to just use MemoryQueryIndexes directly, but I worry that'll be really awkward / confusing in the future when we have more indexes...
It's worth noting that here (unlike MemoryQueryIndexes) we only use the in-memory cache of indexes in the write path (to avoid re-writing indexes we know exist). We don't use it in the read path since we never populate it with existing index entries from disk. So although the data structure is the same, we use it differently.
Anyway, if you have a suggestion let me know, else I'm inclined to just live with the duplication for now.
| */ | ||
| constructor(private readonly sizer: DocumentSizer) {} | ||
| constructor( | ||
| private readonly queryIndexes: QueryIndexes, |
There was a problem hiding this comment.
FWIW the fullness of time is very fuzzy to me. I was imagining that we'd have a UnifiedDocumentCache component that would manage mutations and remote documents, and so it could own QueryIndexes and initiate the appropriate index updates (based on old/new values, etc.). So until then, my plan was to just have RemoteDocumentCache and MutationQueue separately do the appropriate index updates (though I forgot the MutationQueue, oops).
But I think this is probably worth talking through in chat or in-person because I don't think I'm fully grokking your suggestion.
| import { PersistencePromise } from './persistence_promise'; | ||
|
|
||
| /** | ||
| * Represents a set of indexes that are used to execute queries efficiently. |
There was a problem hiding this comment.
I added some comment text. I'm fuzzy on what this interface is going to look like over time though. I imagine we may end up removing indexCollectionParent() and instead have indexDocument(oldDoc, newDoc) or something (and it would implicitly index the collection parent as well).
| /** | ||
| * Represents a set of indexes that are used to execute queries efficiently. | ||
| */ | ||
| export interface QueryIndexes { |
There was a problem hiding this comment.
I'm not in love with it either. Your alternatives are the same ones I considered. :-)
I wasn't sure if Indexer was okay, since it also reads indexes (getCollectionParents()) and the rest of our persistence classes are nouns (QueryCache, RemoteDocumentCache, MutationQueue). So I think I'd slightly prefer IndexManager... but I'm kinda' allergic to managers...
I'm opposed to CollectionParentIndex because the file overhead is a pretty big pain and I'm not sure what the future of this all looks like (e.g. see comment about indexDocument(oldDoc, newDoc) above).
Anyway, I'll plan to rename this to IndexManager before final review unless you have a preference for something else.
| * path for root-level collections). Index entries can be retrieved via | ||
| * getCollectionParents(). | ||
| */ | ||
| indexCollectionParent( |
There was a problem hiding this comment.
Good point. How about addCollectionParentEntry(), trying to be consistent with our RemoteDocumentCache.addEntry() naming? I don't care strongly though.
|
Just to quickly summarize our discussion before we go away for a while:
|
* QueryIndexes => IndexManager * indexCollectionParent() => addToCollectionParentIndex()
|
@wilhuff I think this is ready for review. I did the renames we talked about, added tests, etc. |
wilhuff
left a comment
There was a problem hiding this comment.
LGTM. A few more nits and we're good to go!
packages/firebase/index.d.ts
Outdated
| doc(documentPath: string): DocumentReference; | ||
|
|
||
| /** | ||
| * Gets a Query instance that will include documents from all collections and |
There was a problem hiding this comment.
"Gets" is the wrong verb here. Elsewhere when talking about creating a new Query object we've written "Creates and returns a new Query".
Also, we've written "@return The created Query" instead of "The Query instance." below
There was a problem hiding this comment.
Changed to:
* Creates and returns a new Query that includes all documents in the
* database that are contained in a collection or subcollection with the
* given collectionId.
packages/firebase/index.d.ts
Outdated
| * Gets a Query instance that will include documents from all collections and | ||
| * subcollections in the database with the given collectionId. | ||
| * | ||
| * @param collectionId The collectionId specifying the group of collections to |
There was a problem hiding this comment.
"collectionId" is a term of art that's pretty specific to Firestore. Making this comment self-referential like this makes it more opaque than it needs to be. Could we include text here that indicates that a collectionId is the trailing component of a path to a collection? Also maybe note that collectionIds don't contain slashes?
There was a problem hiding this comment.
Changed to:
* @param collectionId Identifies the collections to query over. Every
* collection or subcollection with this ID as the last segment of its path
* will be included. Cannot contain a slash.
| @@ -0,0 +1,50 @@ | |||
| /** | |||
| * Copyright 2018 Google Inc. | |||
| * Represents a set of indexes that are used to execute queries efficiently. | ||
| * | ||
| * Currently the only index is a [collection id] => [parent path] index, used | ||
| * to execute Collection Group queries. When we implement property indexing in |
There was a problem hiding this comment.
All code is subject to change as time progresses. I realize you're not wild about the fog of war surrounding our eventual goals with indexing seems as if this last sentence isn't adding much value.
| if (query.collectionGroup !== null) { | ||
| assert( | ||
| path.length % 2 === 0, | ||
| 'Collection Group queries should be within a document path.' |
There was a problem hiding this comment.
Nit: Does the root of the database count as a document path? Maybe "within a document path or root"?
| }).to.throw(expectedError); | ||
| }); | ||
|
|
||
| it('support collection groups', async () => { |
There was a problem hiding this comment.
When reading this as English I would expect this to read "it supports collection groups". Would it be reasonable to make these it('supports ...', ...)?
There was a problem hiding this comment.
Eh. I think the verb here should match up with the noun in the describe() function so e.g. in this case the full test case reads "Queries support ..."
Following the example of other tests in this file I've chosen to side-step the issue with it('can query collection groups ...')
mikelehen
left a comment
There was a problem hiding this comment.
Thanks! Nits resolved. I think this is good to go, pending backend durability.
packages/firebase/index.d.ts
Outdated
| doc(documentPath: string): DocumentReference; | ||
|
|
||
| /** | ||
| * Gets a Query instance that will include documents from all collections and |
There was a problem hiding this comment.
Changed to:
* Creates and returns a new Query that includes all documents in the
* database that are contained in a collection or subcollection with the
* given collectionId.
packages/firebase/index.d.ts
Outdated
| * Gets a Query instance that will include documents from all collections and | ||
| * subcollections in the database with the given collectionId. | ||
| * | ||
| * @param collectionId The collectionId specifying the group of collections to |
There was a problem hiding this comment.
Changed to:
* @param collectionId Identifies the collections to query over. Every
* collection or subcollection with this ID as the last segment of its path
* will be included. Cannot contain a slash.
| @@ -0,0 +1,50 @@ | |||
| /** | |||
| * Copyright 2018 Google Inc. | |||
| * Represents a set of indexes that are used to execute queries efficiently. | ||
| * | ||
| * Currently the only index is a [collection id] => [parent path] index, used | ||
| * to execute Collection Group queries. When we implement property indexing in |
| if (query.collectionGroup !== null) { | ||
| assert( | ||
| path.length % 2 === 0, | ||
| 'Collection Group queries should be within a document path.' |
| }).to.throw(expectedError); | ||
| }); | ||
|
|
||
| it('support collection groups', async () => { |
There was a problem hiding this comment.
Eh. I think the verb here should match up with the noun in the describe() function so e.g. in this case the full test case reads "Queries support ..."
Following the example of other tests in this file I've chosen to side-step the issue with it('can query collection groups ...')
|
@Feiyang1 Can you approve for the change to packages/firebase/index.d.ts? [note the change is actually commented-out for now because we can't expose the API yet] |
@wilhuff If you have time can you do an initial sanity-check review of this before the holiday? I was hoping to have it 100% ready, but have been too randomized. The feature works, but it's missing the schema migration to populate the index and I need to add more tests. In particular, this PR contains: