Conversation
|
API appears to have been different in postgres 11, will look into it later this evening |
Ngalstyan4
left a comment
There was a problem hiding this comment.
Good! Left small comments. Should be ready soon
src/hnsw/build.c
Outdated
| int n_items = currDim; | ||
| int indexCol; | ||
|
|
||
| // If NumIndexAttrs isn't 1 there's no clear way to infer the dim |
There was a problem hiding this comment.
Can you clarify in the comment what NumIndexAttrs represents?
src/hnsw/build.c
Outdated
|
|
||
| // Get the indexed column out of the row and return it's dimensions | ||
| datum = heap_getattr(tuple, indexCol, RelationGetDescr(heap), &isNull); | ||
| if(!isNull) { |
There was a problem hiding this comment.
So it seems, if the first row in the table has NULL vector and all the ones afterwards have e.g. 3 dimensional real[] arrays, then we fail to infer. This is fair. Let's just add a test for it and document it
src/hnsw/build.c
Outdated
| if(buildstate->dimensions == HNSW_DEFAULT_DIMS) | ||
| buildstate->dimensions = GetArrayLengthFromHeap(heap, indexInfo, buildstate->dimensions); | ||
| /* Require column to have dimensions to be indexed */ | ||
| if(buildstate->dimensions < 1) elog(ERROR, "column does not have dimensions"); |
There was a problem hiding this comment.
Is this the error we are going to see if the table is empty? Is that case tested?
There was a problem hiding this comment.
It would be if DEFAULT_DIMS were changed from 3 to -1. Currently a number of tests create indices on empty tables, so I wasn't sure what the expected behavior was. I preserved the default to maximize compatibility with the existing API, so right now this doesn't generate an error. If it's alright to throw an error when the table being indexed is empty I can change the default per @var77's suggestion and update the relevant tests to make things more consistent
There was a problem hiding this comment.
It is all right and the expected behaviour. Just throw a helpful message asking the user to provide number of dimensions for the indexed column
src/hnsw/build.c
Outdated
| // If a dimension wasn't specified try to infer it | ||
| // this is unecessary when the actual size is 3 but in practice this shouldn't be too common | ||
| if(columnType == REAL_ARRAY || columnType == INT_ARRAY) | ||
| if(buildstate->dimensions == HNSW_DEFAULT_DIMS) |
There was a problem hiding this comment.
I think we should set HNSW_DEFAULT_DIMS to -1, NULL or something like that, as now if we pass (dims=3) on index creation and insert array with 4 dimensions it is going to overwrite the dims option.
|
There's actually an issue with EDIT: fixed this but I'm seeing odd results on the distance function regression tests these two diffs are exemplary but there are several others |
28320d5 to
65853b4
Compare
|
Hi @ezra-varady, This might have been a result of some issues on indexes with |
f047163 to
e6c1944
Compare
src/hnsw/build.c
Outdated
|
|
||
| // If a dimension wasn't specified try to infer it | ||
| if(buildstate->columnType == REAL_ARRAY || buildstate->columnType == INT_ARRAY) | ||
| if(buildstate->dimensions < 1) buildstate->dimensions = InferDimension(heap, indexInfo); |
There was a problem hiding this comment.
We should always use braces {,} with ifs and other basic blocs.
It is particularly important for cases like these where you have multiple ifs
I think the code above can be clarified as well.
What happens when buildstate->columnType is not REAL_ARRAY or INT_ARRAY?
There was a problem hiding this comment.
Ah you're right, the operators aren't defined, good catch
src/hnsw/build.c
Outdated
| HnswOptions *opts; | ||
| int attrNum; | ||
|
|
||
| // We know there's one key because we generate an error during inference if multiple keys are specified |
There was a problem hiding this comment.
Can you add an assert in place of this comment?
… fix pg11 compat, still some issues with tests
badc634 to
90cfbc9
Compare
This should address #55, allowing postgres to automatically decide the dimension index of arrays of
realandint. It preserves the user's ability to specify a dimension. Whendimsis left as the default lantern will now check the column's dimension and use that instead if it differs. I haven't tested against older versions of postgres, and I'm not familiar with the API, so this may need some additional work to be backwards compatible