Skip to content

DateTime payload index#3395

Merged
timvisee merged 18 commits intoqdrant:devfrom
xzfc:3320-datetime-index
Jan 31, 2024
Merged

DateTime payload index#3395
timvisee merged 18 commits intoqdrant:devfrom
xzfc:3320-datetime-index

Conversation

@xzfc
Copy link
Member

@xzfc xzfc commented Jan 15, 2024

This PR adds datetime payload index. /claim #3320

Open for discussion/feedback:

  • Internal representation. Current format: milliseconds in i64. Possible alternatives: nanoseconds in u64 or seconds in f64.
  • Timestamp representation. Current format: RFC3339, e.g., 2019-01-01T00:00:00.000Z.
  • Currently datetime_range is a separate key. Need to investigate if this could be merged with the range key in a straightforward way.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@algora-pbc
Copy link

algora-pbc bot commented Jan 15, 2024

💵 To receive payouts, sign up on Algora, link your Github account and connect with Stripe/Alipay.

Copy link
Contributor

@coszio coszio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here @xzfc!
My only concern is the at the api layer. We should keep the Range type at this level to not break backward compatibility in the clients. A newtype instead of generic for datetime range may fix this.

Also, I would love the tests to include some non-UTC timezone examples as a sanity check.

@timvisee
Copy link
Member

timvisee commented Jan 16, 2024

Some comments on your open questions, but I haven't gone through your code yet:

Internal representation. Current format: milliseconds in i64. Possible alternatives: nanoseconds in u64 or seconds in f64.

I think this is a good choice. This way we have a constant resolution of 1 millisecond, while we'd loose precision with f64.

Though, what's the maximum value we can represent here if milliseconds and i64?

Later we can add parameterization to this to support different resolutions, seconds or nanoseconds for example. Each could have an optimized index variant.

I do wonder whether we'd somehow want to make this timezone aware as well. Maybe we shouldn't care for the index and always use UTC.

Timestamp representation. Current format: RFC3339, e.g., 2019-01-01T00:00:00.000Z.

Perfect. Following a solid standard.

@xzfc
Copy link
Member Author

xzfc commented Jan 16, 2024

@timvisee

Though, what's the maximum value we can represent here if milliseconds and i64?

I've checked the supported datetime ranges for chrono library and i64 values.

unit range in years
seconds 584,942,417,355
milliseconds 584,942,417
microseconds 584,942
nanoseconds 584
  • Microseconds in i64 is enough to cover the whole range of dates supported by chrono (~524288 years). So if we stick to chrono, no reason to use milliseconds instead of microseconds.
  • Nanoseconds in i64 would give us a range of about 584 years, from 1677 to 2262.
  • Chrono doesn't support sub-nanosecond accuracy.
  • Seconds in f64 would give us sub-microsecond precision for dates in the current year, and sub-millisecond precision for dates in the max year supported by chrono (262143).
    calculations
    $ python3 -c 'import math,time;t=time.time();print(f"{(math.nextafter(t, t+1)-t) * 1e6:.10f} µs")'
    0.2384185791 µs
    $ python3 -c 'import math,time;t=8210298412799.0;print(f"{(math.nextafter(t, t+1)-t) * 1e6:.10f} µs")'
    976.5625000000 µs

I'll switch to microseconds for now if you don't have objections.

@timvisee
Copy link
Member

@xzfc Thank you so much for diving into this.

I'll switch to microseconds for now if you don't have objections.

No objections, we can extend/change this later if desired.

@generall generall requested a review from coszio January 22, 2024 13:54
@coszio coszio force-pushed the 3320-datetime-index branch 2 times, most recently from 62381ba to 41d5134 Compare January 22, 2024 20:26
Copy link
Contributor

@coszio coszio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving this @xzfc, looks much better now.

I made a few improvements to push this to the endline:

  • restored the enum variant approach
  • other minor stuff

I'm not sure how much I like approving macros, which are hard to maintain, but we can refactor that later.

Comment on lines +1327 to +1328
#[macro_rules_attribute::macro_rules_derive(crate::common::macros::schemars_rename_generics)]
#[derive_args(<FloatPayloadType> => "Range", <DateTimePayloadType> => "DatetimeRange")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧙

@coszio coszio requested a review from generall January 23, 2024 15:20
@timvisee timvisee force-pushed the 3320-datetime-index branch 2 times, most recently from d4c8d81 to 03d3a9a Compare January 25, 2024 15:09
@timvisee timvisee changed the title Datetime payload index DateTime payload index Jan 25, 2024
@IvanPleshkov IvanPleshkov self-requested a review January 26, 2024 13:28
@coszio coszio requested a review from timvisee January 26, 2024 13:45
Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xzfc Thanks a lot for your effort on implementing this! ✔️

@timvisee timvisee force-pushed the 3320-datetime-index branch from 12b3ac6 to 469c060 Compare January 30, 2024 09:57
@timvisee timvisee force-pushed the 3320-datetime-index branch from 469c060 to 11788ad Compare January 31, 2024 09:44
@timvisee timvisee merged commit 41784a2 into qdrant:dev Jan 31, 2024
generall pushed a commit that referenced this pull request Mar 5, 2024
* Datetime payload index

* Introduce IndexMapItem

* Drop FieldIndex::DatetimeIndex

* Rename OpenAPI struct names

* Switch to microseconds

* Validate and serialize grpc timestamps

* Add tests with different timezones

* minor review fixes

* Revert "Drop FieldIndex::DatetimeIndex"

This reverts commit d55f251.

* Revert "Introduce IndexMapItem"

This reverts commit c5255f6.

* fix: back to microseconds after reverts

* extract range conversion from boxed checker fn

* add log to deps

* don't run macro doctest

* no_run -> ignore

* remove prost-types in favor of prost-wkt-types

* better assertion on test_payload_indexing.py

* propagate unparsable datetime

---------

Co-authored-by: Luis Cossío <luis.cossio@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants