Skip to content

fix: input function does not allow tailing spaces#148

Merged
ankane merged 4 commits intopgvector:masterfrom
yihong0618:master
Jun 9, 2023
Merged

fix: input function does not allow tailing spaces#148
ankane merged 4 commits intopgvector:masterfrom
yihong0618:master

Conversation

@yihong0618
Copy link
Contributor

@yihong0618 yihong0618 commented Jun 8, 2023

this fix #147

also add test case for it

passed CI can check: https://github.com/yihong0618/pgvector/actions/runs/5209060111

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Copy link
Contributor

@jkatz jkatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This aligns with current PostgreSQL behavior with arrays, e.g.:

postgres=# SELECT '{1   , 2   }'::int[];
 int4  
-------
 {1,2}
(1 row)

However, I'd recommend following the convention that PostgreSQL has with the array_isspace function for the character checking (which would have to be reimplemented, as it's static):

https://github.com/postgres/postgres/blob/378d73ef204d0dcbeab834d52478e8cb90578ab7/src/backend/utils/adt/arrayfuncs.c#L438

Here is an example of how it is used in the array_in code:

https://github.com/postgres/postgres/blob/378d73ef204d0dcbeab834d52478e8cb90578ab7/src/backend/utils/adt/arrayfuncs.c#L257

@yihong0618
Copy link
Contributor Author

yihong0618 commented Jun 9, 2023

This aligns with current PostgreSQL behavior with arrays, e.g.:

postgres=# SELECT '{1   , 2   }'::int[];
 int4  
-------
 {1,2}
(1 row)

However, I'd recommend following the convention that PostgreSQL has with the array_isspace function for the character checking (which would have to be reimplemented, as it's static):

https://github.com/postgres/postgres/blob/378d73ef204d0dcbeab834d52478e8cb90578ab7/src/backend/utils/adt/arrayfuncs.c#L438

Here is an example of how it is used in the array_in code:

https://github.com/postgres/postgres/blob/378d73ef204d0dcbeab834d52478e8cb90578ab7/src/backend/utils/adt/arrayfuncs.c#L257

thanks will use array_isspace

@jkatz
fixed thanks for the info, and add front and end spaces support like pg array passed CI: https://github.com/yihong0618/pgvector/actions/runs/5217272065

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
src/vector.c Outdated
* https://github.com/postgres/postgres/blob/378d73ef204d0dcbeab834d52478e8cb90578ab7/src/backend/utils/adt/arrayfuncs.c#L438
*/
static inline bool
array_isspace(char ch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's name this vector_isspace

src/vector.c Outdated

/*
* Note: we currently allow whitespace between, but not within,
* dimension items.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this comment makes sense for vectors, so let's remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy that

Copy link
Member

@ankane ankane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yihong0618, thanks for reporting and the PR! Looks great. Added a few comments inline (accidentally submitted the first review early).

src/vector.c Outdated
CheckElement(x[dim]);
dim++;
// stringEnd is space is OK to support SELECT '[ 1 , 2 ]'::vector(2);
while (array_isspace((unsigned char)*stringEnd))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to cast here or below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we need that, this is handle [1 , 2 ] which has spaces between 1 and ,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unsigned char) shouldn't be needed. Also, I think it should come after the if (stringEnd == pt) check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy that, thanks for the info~

Comment on lines +3 to +6
SELECT '[1,2,3 ]'::vector;
SELECT '[1, 2, 3 ]'::vector;
SELECT '[1. , 2, 3 ]'::vector(3);
SELECT ' [1,2,3] '::vector(3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine this into a single test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy that

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
@yihong0618
Copy link
Contributor Author

yihong0618 commented Jun 9, 2023

@ankane thanks very much for the review, the comments has been fixed now, and one CI on ubuntu seems failed,
I am not sure its flaky test will check: https://github.com/yihong0618/pgvector/actions/runs/5217584573/jobs/9417561780


rerun and passed

@ankane ankane merged commit 41c68bf into pgvector:master Jun 9, 2023
@ankane
Copy link
Member

ankane commented Jun 9, 2023

Awesome, big thanks @yihong0618!

@yihong0618
Copy link
Contributor Author

@ankane thanks for your great repo~

jkatz pushed a commit to jkatz/pgvector that referenced this pull request Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Input function does not allow tailing spaces

3 participants