Skip to content

opt: extend SplitScanIntoUnionScans to support splitting a constraint with multi-key spans #55156

@agarzas

Description

@agarzas

I have this table

CREATE TABLE IF NOT EXISTS  user_checklist_items (
    tenant_id UUID,
    location_id UUID,
    checklist_item_id UUID,
    user_id STRING,
    configuration_maintenance_id UUID NOT NULL,
    configuration_maintenance_item_id UUID NOT NULL,
    date_should_be_completed DATE NOT NULL,
    is_recurrent_assignation BOOL,
    location_name STRING NOT NULL,
    order_item FLOAT NOT NULL,
    title STRING NOT NULL,
    create_date TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (tenant_id, location_id, checklist_item_id, user_id)
) INTERLEAVE IN PARENT tenants (tenant_id);

CREATE INDEX IF NOT EXISTS userchecklistitems_tenantid_userid_dateshouldbecompleted_locationname_orderitem ON user_checklist_items (tenant_id, user_id, date_should_be_completed, location_name, title, order_item, checklist_item_id) USING HASH WITH BUCKET_COUNT = 8;
CREATE INDEX IF NOT EXISTS userchecklistitems_tenantid_locationid_configurationmaintenanceid_configurationmaintenanceitemid_dateshouldbecompleted ON user_checklist_items (tenant_id, location_id, configuration_maintenance_id, configuration_maintenance_item_id, date_should_be_completed) USING HASH WITH BUCKET_COUNT = 8;

The secondary indexes are using a hash shard index because they are sequentially.

If I run this query:

select tenant_id, user_id, date_should_be_completed
from user_checklist_items@userchecklistitems_tenantid_userid_dateshouldbecompleted_locationname_orderitem
where 
tenant_id = 'a2a0dd49-23cf-4cf2-b823-61701c416e60' and
user_id = '01603523-c6f0-4e12-a43f-524c76b0fa8f' and 
date_should_be_completed >= '2020-10-01'
order by index user_checklist_items@userchecklistitems_tenantid_userid_dateshouldbecompleted_locationname_orderitem
limit 5

keyset pagination works, it´s limmiting the scan to just 5 records, but data is not ordered by the "date_should_be_completed" field, as the index indicates. (I guess due to the hash function to distribute load)

Screen Shot 2020-10-02 at 0 50 33

Screen Shot 2020-10-02 at 0 52 01

If I run this other query

select tenant_id, user_id, date_should_be_completed
from user_checklist_items@userchecklistitems_tenantid_userid_dateshouldbecompleted_locationname_orderitem
where 
tenant_id = 'a2a0dd49-23cf-4cf2-b823-61701c416e60' and
user_id = '01603523-c6f0-4e12-a43f-524c76b0fa8f' and 
date_should_be_completed >= '2020-10-01'
order by date_should_be_completed
limit 5

The query works, but it´s running a scan to the entire table (in this example, i´m just showing a table which contains 365 records), and after the scan is completed, it´s limiting to 5 records, which is not efficient although all the fields are indexed.

Screen Shot 2020-10-02 at 0 53 16

Screen Shot 2020-10-02 at 0 54 05

Is it possible to run a keyset pagination using a hash shard index?

Metadata

Metadata

Assignees

No one assigned

    Labels

    O-communityOriginated from the community

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions