Skip to content

[Question] how to make funnel query run faster? #2096

@lamberken

Description

@lamberken

hi, I need to use clickhouse to funnels query

Data

  • the table trajectory_trackless_complex has 2894082787 rows
  • pageid: 2538 --> 3242 --> 1682, ( homepage --> listpage --> detailpage )

Use sequenceMatch, throw error

SELECT 
    sum(step1_condition) AS home, 
    sum(step2_condition) AS list, 
    sum(step3_condition) AS detail
FROM 
(
    SELECT 
        loginkey, 
        max(pageid = 2538) AS step1_condition, 
        sequenceMatch('(?1).*(?2)')(toDateTime(createtime), pageid = 2538, pageid = 3242) AS step2_condition, 
        sequenceMatch('(?1).*(?2).*(?3)')(toDateTime(createtime), pageid = 2538, pageid = 3242, pageid = 1682) AS step3_condition
    FROM datacenter.trajectory_trackless_complex 
    WHERE intotime = '2018-03-01'
    GROUP BY loginkey
) 

↗ Progress: 875.45 million rows, 12.42 GB (76.01 million rows/s., 1.08 GB/s.) ██████████████████████████████████████████▊ 98%Received exception from server (version 1.1.54342):
Code: 160. DB::Exception: Received from hadoo:9000, 10.12.180.114. DB::Exception: Pattern application proves too difficult, exceeding max iterations (1000000). 

Query sql

SELECT 
    sum(step1_time != 0) AS home, 
    sum(step2_time != 0) AS list, 
    sum(step3_time != 0) AS detail
FROM 
(
    SELECT 
        loginkey, 
        groupArray(pageid) AS events, 
        groupArray(createtime) AS events_times, 
        arrayFilter((time, name) -> (name = 2538), events_times, events)[1] AS step1_time, 
        arrayFilter((time, name) -> ((name = 3242) AND (time >= step1_time) AND (step1_time != 0)), events_times, events)[1] AS step2_time, 
        arrayFilter((time, name) -> ((name = 1682) AND (time >= step2_time) AND (step2_time != 0)), events_times, events)[1] AS step3_time
    FROM 
    (
        SELECT 
            pageid, 
            createtime, 
            loginkey
        FROM datacenter.trajectory_trackless_complex 
        WHERE pageid IN (2538, 3242, 1682)
        ORDER BY createtime ASC
    ) 
    GROUP BY loginkey
) 

Execute result

it uses 377sec
image


Question

how to optimize the query ?
Thank you very match

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions