GQL

Blog

  • Movie recommendations based on what 1st and 2nd degree connections have watched

    Graph databases are excellent for building network based recommendations systems. The main reason is that network based recommendations require some level of path traversal e.g. friend of a friend, 1st degree connection, 2nd degree connection etc. While this possible using a Relational Database, it would require writing complex recursive queries.

    Path traversal and recursion are native features of Graph Database. Let’s take a example of a Social Video Streaming service. A graph could be built as following:

    We are interested in the recommending movies based on the person’s 1st and 2nd degree connection.

    Let’s start by creating some sample data

    create graph streaming_service{
      
      node account ({name string})
      , node movie ({title string})
      , edge watched (account)-[{watched_at datetime}]->(movie)
      , edge friends (account)-[]->(account)
    };
    
    use streaming_service;
    INSERT (n1:account {_id:'fatima', name:"Fatima"})
      , (n2:account {_id:'maryam', name:"Maryam"})
      , (n3:account {_id:'uroosa', name:"Uroosa"})
      , (n4:account {_id:'sara', name:"Sara"})
      , (n5:account {_id:'zainab', name:"Zainab"})
      , (n6:account {_id:'hannah', name:"Hannah"})
      , (m1:movie {_id:'time_hoppers', title:"Time Hoppers"})
      , (m2:movie {_id:'despicable_me', title:"Despicable Me"})
      , (m3:movie {_id:'penguins_of_madagascar', title:"Penguins of Madagascar"})
      , (m4:movie {_id:'zootopia', title:"Zootopia"})
      , (n1)-[:friends]->(n2) 
      , (n2)-[:friends]->(n1) 
    
      , (n2)-[:friends]->(n3) 
      , (n3)-[:friends]->(n2) 
    
      , (n3)-[:friends]->(n4) 
      , (n4)-[:friends]->(n3) 
    
      , (n5)-[:friends]->(n6)
      , (n6)-[:friends]->(n5)
    
      , (n1)-[:watched {watched_at: '2024-01-01T20:00:00Z'}]->(m1)
      , (n1)-[:watched {watched_at: '2024-01-02T20:00:00Z'}]->(m2)
      , (n2)-[:watched {watched_at: '2024-01-01T21:00:00Z'}]->(m1)
      , (n3)-[:watched {watched_at: '2024-01-01T22:00:00Z'}]->(m3)
      , (n4)-[:watched {watched_at: '2024-01-01T23:00:00Z'}]->(m4)
      , (n5)-[:watched {watched_at: '2024-01-01T19:30:00Z'}]->(m1)
      , (n6)-[:watched {watched_at: '2024-01-01T18:30:00Z'}]->(m2);

    This will create the following graph:

    Now let’s write GQL query to get the recommendation for each person based on what their 1st and 2nd degree connections watched

    
    MATCH friends = ALL  (account_a)-[:friends]->{0,2}(account_b)
    CALL {
      MATCH path = (account_a)-[:watched]->(movie_a)
        , (account_b)-[:watched]->(movie_b)
      FILTER account_a <> account_b and movie_a <> movie_b
      return path
        , COLLECT_LIST(distinct movie_a.title) as recommended_movies
    }
    FILTER account_a <> account_b
    return table(account_b.name as recommend_to , account_a.name as based_on, recommended_movies);

    This will output

    Note that we use the inline CALL function to iterate over each connections pair.

  • GQL Validator

    Albert Verdeny has built a GQL Validator. Try it out:
    https://graphglot.com/playground

    It is pretty awesome. You can write GQL queries and see which of the 228 optional features your query requires.

    Key features:

    • Write a GQL query and instantly see if it’s valid
    • See which of the 228 optional features your query requires
    • Toggle features on/off to simulate different database engines
  • Support for GQL is coming to Google BigQuery!

    Labeled property graphs have emerged as a prominent form of knowledge representations in graph databases, seeing extensive use in a wide variety of contexts, ranging from social network analysis to fraud detection. Property Graphs and GQL are finally coming to Google BigQuery.

    In this blog post we will look at how to build Property Graphs in BigQuery and then query them using GQL. We will use our example of Social Streaming platform:

    We are interested in the recommending movies based on the person’s 1st and 2nd degree connection. Let’s start by creating some sample data:

    BigQuery Graph creates graphs from input BigQuery Tables. We will use the following tables to create the Social Streaming platform.

    CREATE OR REPLACE TABLE graph_db.account (
     account_name                STRING  NOT NULL,
     PRIMARY KEY (account_name) NOT ENFORCED
    );
    
    CREATE OR REPLACE TABLE graph_db.movie (
     movie_name                STRING NOT NULL,
     PRIMARY KEY (movie_name) NOT ENFORCED
    );
    
    
    CREATE OR REPLACE TABLE graph_db.watched (
     account_name                STRING NOT NULL,
     movie_name                  STRING NOT NULL,
     PRIMARY KEY (account_name, movie_name) NOT ENFORCED
    );
    
    
    CREATE OR REPLACE TABLE graph_db.friend (
     account_name_1                STRING NOT NULL,
     account_name_2                  STRING NOT NULL,
     PRIMARY KEY (account_name_1, account_name_2) NOT ENFORCED
    );

    Now we will insert some graph data into above tables. You can use your familiar SQL INSERT statements to insert data into the input tables.

    insert into graph_db.account values ('Uroosa'), ('Sara'), ('Maryam'), ('Fatima'), ('Zainab'), ('Hannah');
    
    insert into graph_db.movie values ('Penguins of Madagascar'), ('Zootopia'), ('Time Hoppers'), ('Despicable Me');
    
    insert into graph_db.watched values 
      ('Uroosa', 'Penguins of Madagascar')
      , ('Sara', 'Zootopia')
      , ('Maryam', 'Time Hoppers')
      , ('Fatima', 'Time Hoppers')
      , ('Zainab', 'Time Hoppers')
      , ('Fatima', 'Despicable Me')
      , ('Hannah', 'Despicable Me');
    
    insert into graph_db.friend values 
      ('Sara', 'Uroosa')
      , ('Uroosa', 'Maryam')
      , ('Maryam', 'Fatima')
      , ('Zainab', 'Hannah');

    BigQuery Graph supports the property graph data model with nodes and edges annotated with labels and properties. 

    In this Social Streaming graph schema, we will create 2 types of nodes and 2 types of edges based on the relational input tables defined previously:

    We define 2 types of nodes based on 2 input tables:

    • Account node, based on Account table
    • Movie node, based on Movie table

    Edges

    We define 2 types of edges (relationships) between nodes:

    • Watched edge, based on Watched table, connecting Account node to Movie node.
    • Friend edge, based on Friend table, connecting Account node to another Account node.
    CREATE or replace PROPERTY GRAPH graph_db.streaming_network_graph
     NODE TABLES (
       graph_db.account
         KEY (account_name)
         LABEL account
         PROPERTIES (account_name),
       graph_db.movie
         KEY (movie_name)
         LABEL movie
         PROPERTIES (movie_name),
     )
     EDGE TABLES(
       graph_db.watched
         KEY (account_name, movie_name)
         SOURCE KEY (account_name) REFERENCES account (account_name)
         DESTINATION KEY (movie_name) REFERENCES movie (movie_name)
         LABEL watched
         PROPERTIES (account_name, movie_name),
       graph_db.friend
         KEY (account_name_1, account_name_2)
         SOURCE KEY (account_name_1) REFERENCES account (account_name)
         DESTINATION KEY (account_name_2) REFERENCES account (account_name)
         LABEL friend
         PROPERTIES (account_name_1, account_name_2),     
     );

    Once your Property Graph is created, you can find a new property graph object added to your dataset where you created the graph in the BigQuery Console:

    The graph can also be visualized using the BigQuery Notebook:

    %%bigquery --graph
    GRAPH graph_db.streaming_network_graph
    MATCH (a)-[e]->(b)
    RETURN
    TO_JSON(a) AS c1, TO_JSON(b) AS c2, TO_JSON(e) AS c3;

    Now let’s use GQL to query the graph to get all Movie Recommendations for each Account based on what the the 1st and the 2nd Degree connections watched.

    GRAPH graph_db.streaming_network_graph
    MATCH ALL (based_on:account)-[:friend]-{0,2}(recommend_to:account)-[:watched]-(movie_b:movie)
    , (based_on)-[:watched]->(movie_a)
    FILTER movie_a <> movie_b and based_on <> recommend_to
    RETURN distinct recommend_to.account_name as recommend_to
      , based_on.account_name as based_on
      , STRING_AGG(distinct movie_a.movie_name) as recommended_movies
    group by recommend_to, based_on;

  • Recursive Queries in SQL vs. GQL

    SQL Recursive CTE queries in are commonly used when working with hierarchical data in Relational Databases. For e.g. the following Highway Network:

    Let’s say we are interested in getting all the path from each Highway to other Highways. We would use a Recursive CTE for that. Let’s create some sample data and write a Recursive CTE Queries to extract all the PATHs.

    create table flows_to (
        highway_from varchar(100)
        , highway_to varchar(100)
    );
    
    insert INTO FLOWS_TO (highway_from, highway_to) VALUES
    (null, 'highway_1'), -- starting node
    (null, 'highway_2'), -- starting node
    (null, 'highway_9'), -- starting node
    (null, 'highway_11'), -- starting node
    ('highway_1', 'highway_3'),
    ('highway_2', 'highway_3'),
    ('highway_3', 'highway_4'),
    ('highway_3', 'highway_6'),
    ('highway_4', 'highway_5'),
    ('highway_6', 'highway_7'),
    ('highway_7', 'highway_8'),
    ('highway_5', 'highway_8'),
    ('highway_9', 'highway_10'),
    ('highway_10', 'highway_12'),
    ('highway_8', 'highway_12'),
    ('highway_11', 'highway_12'),
    ('highway_12', 'highway_13');

    Now let’s construct a Recursive CTE for this:

    WITH  outcome(starting, highway_to, highway_from, path) AS (
    
      -- ANCHOR: Initial query to establish the base result set. 
      SELECT -- Anchor 
        highway_to as starting
        , highway_to
        , highway_from
        , highway_to as path
      FROM flows_to
      WHERE highway_from is null
      
      UNION ALL
      
      -- RECURSIVE: References outcome CTE to build on previous results
      SELECT starting, flows_to.highway_to, flows_to.highway_from, outcome.path || '->' || flows_to.highway_to
      FROM outcome
      JOIN flows_to ON flows_to.highway_from = outcome.highway_to
    )
    SELECT DISTINCT starting,  highway_to, path FROM outcome;

    The above query will output:

    Sample of the PATHs returned by Rercursive CTE

    While the recursive CTE is not complex, it is not trivial to write.

    Enter GQL.

    Hierarchical data is a Graph. As such a Graph Query Language (GQL) may be a more elegant way to extract the above Paths. GQL, like SQL, is a high-level, declarative, programming language. It is specifically designed for graph structures. It uses nodes and relations as first-class citizens, although the output to a query can be a graph or a set of tuples.

    Let’s explore using the same dataset and query using GQL. We will start with generating sample data

    drop graph freeways;
    create graph freeways{
      node freeway ({name string})
      , edge flows_to (freeway)-[]->(freeway)
    };
    
    
    use freeways;
    INSERT (n1:freeway {_id:'highway_1', name:"Highway 1"})
      , (n2:freeway {_id:'highway_2', name:"Highway 2"})
      , (n3:freeway {_id:'highway_3', name:"Highway 3"})
      , (n4:freeway {_id:'highway_4', name:"Highway 4"})
      , (n5:freeway {_id:'highway_5', name:"Highway 5"})
      , (n6:freeway {_id:'highway_6', name:"Highway 6"})
      , (n7:freeway {_id:'highway_7', name:"Highway 7"})
      , (n8:freeway {_id:'highway_8', name:"Highway 8"})
      , (n9:freeway {_id:'highway_9', name:"Highway 9"})
      , (n10:freeway {_id:'highway_10', name:"Highway 10"})
      , (n11:freeway {_id:'highway_11', name:"Highway 11"})
      , (n12:freeway {_id:'highway_12', name:"Highway 12"})
      , (n13:freeway {_id:'highway_13', name:"Highway 13"})
      , (n1)-[:flows_to]->(n3) 
      , (n2)-[:flows_to]->(n3) 
      , (n3)-[:flows_to]->(n4) 
      , (n3)-[:flows_to]->(n6) 
      , (n4)-[:flows_to]->(n5) 
      , (n6)-[:flows_to]->(n7) 
      , (n5)-[:flows_to]->(n8) 
      , (n7)-[:flows_to]->(n8) 
      , (n9)-[:flows_to]->(n10) 
      , (n11)-[:flows_to]->(n12) 
      , (n8)-[:flows_to]->(n12) 
      , (n10)-[:flows_to]->(n12) 
      , (n12)-[:flows_to]->(n13);

    Next let’s write a GQL query to extract all the PATHs. Note that in the case of GQL, to compute and list the paths, it suffices to write:

    MATCH nodes = (a)
    CALL {
      MATCH path = (a)-[edges:flows_to]->{0,}(b)
      return reduce(path = a._id, edge in edges | path || '->' || edge._to) as route
      , b
    }
    return table(a.name, b.name, route);

    This will output

    Sample of the PATHs returned by the GQL.

    It can be seen that the structure of the GQL query is far less complicated and more intuitive than its SQL counterpart, since it takes advantage of the graph structure. In this case, a basic MATCH.. RETURN structure suffices to express a recursive query. This is mainly because GQL is developed as a query language for
    graphs and recursion is typical in these cases. The(a:highway)-[:flows_to]->{0,}(b:highway) path pattern selects all nodes b that are reachable by following one or more edges in the graph, traversing the graph using the :flows_to relation.

    Path patterns describe multi-hop relationships in the graph:
    {2,4}: Exactly 2 to 4 hops
    {0,}: 0 or more hops (unbounded)
    {,3}: Up to 3 hops
    {5}: Exactly 5 hops

    SQL CONNECT BY clause.

    Some SQL based databases support CONNECT BY Clause. This clause allows queries on hierarchical data without the need for composing complex recursive CTEs. For e.g. the following CONNECT BY clause query will find all the paths from each to all others nodes

    SELECT
        level,
        highway_from,
        SYS_CONNECT_BY_PATH(highway_from, '->') || '->' || highway_to  AS path,
        highway_to
    FROM
        FLOWS_TO
    START WITH
        highway_from IS NULL
    CONNECT BY
        PRIOR highway_to = highway_from;

    While this query is easier to read than its recursive CTE counterpart, it is not intuitive as GQL…

  • Association Rule Mining using GQL

    What is the most frequently bought item with Infant Formula and Infant Diapers? Association Rule Mining is a key to Market Basket Analysis. While Association Rule Mining is a complex topic, we can use SQL to figure out what items most frequently bought together.

    An association rule is a implication expression of the form X → Y, where X and Y are itemset. X is the Antecedent and Y is the Consequent

    Example of Association Rules

    {Infant Diaper} → {Coke},
    {Milk, Bread} → {Eggs, Coke},
    {Coke, Bread} → {Infant Milk}
    Antecedent → Consequent

    Now let’s assume we take {Infant Diaper, Infant Milk} as the Antecedent, and we want to figure out the Consequent i.e. what ITEM is most often bought with Infant Diaper and Infant Milk.

    Association Rule are, by definition, extracted from the data i.e. they are core properties of the Data. We are not looking for correlation in the Item sets in a Rule. This makes Graph Databases a good platform for Association Rule Mining. While there are lot of Python and R packages that are designed for Association Rule Mining, we want to start exploring the use of GQL to perform Exploratory Data Mining exercise.

    We will start with the following sample Orders and the Items contained in those Orders

    ORDER_NUMBERITEMS
    3infant formula, infant diaper, coffee, coke
    7infant formula, infant diaper, mountain dew, coke, coffee
    1bread, infant formula
    4bread, infant formula, infant diaper, coffee
    6infant formula, infant diaper, mountain dew
    5infant formula, infant diaper, coke
    2bread, infant diaper, coffee, eggs

    We are interested in Items that are bought together with Infant Diaper and Infant Milk.

    Let’s start by creating the dataset

    create graph order_items {
      node orders ({order_number string})
      , node item ({description string})
      , edge includes (orders)-[]->(items)
    };
    
    use graph order_items;
    insert (n_o1:orders {_id: "order_1", order_number:'1'})
      , (n_o2:orders {_id: "order_2", order_number:'2'})
      , (n_o3:orders {_id: "order_3", order_number:'3'})
      , (n_o4:orders {_id: "order_4", order_number:'4'})
      , (n_o5:orders {_id: "order_5", order_number:'5'})
      , (n_o6:orders {_id: "order_6", order_number:'6'})
      , (n_o7:orders {_id: "order_7", order_number:'7'})
      , (n_infant_formula:item {_id: 'infant_formula', description:'infant formula'})
      , (n_infant_diaper:item {_id: 'infant_diaper', description:'infant diaper'})
      , (n_coffee:item {_id: 'coffee', description:'coffee'})
      , (n_coke:item {_id: 'coke', description:'coke'})
      , (n_mountain_dew:item {_id: 'mountain_dew', description:'mountain dew'})
      , (n_bread:item {_id: 'bread', description:'bread'})
      , (n_eggs:item {_id: 'eggs', description:'eggs'})
      , (n_o1)-[:includes]->(n_infant_formula)
      , (n_o1)-[:includes]->(n_bread)
      , (n_o2)-[:includes]->(n_bread)
      , (n_o2)-[:includes]->(n_infant_diaper)
      , (n_o2)-[:includes]->(n_coffee)
      , (n_o2)-[:includes]->(n_eggs)
      , (n_o3)-[:includes]->(n_infant_formula)
      , (n_o3)-[:includes]->(n_infant_diaper)
      , (n_o3)-[:includes]->(n_coffee)
      , (n_o3)-[:includes]->(n_coke)
      , (n_o4)-[:includes]->(n_bread)
      , (n_o4)-[:includes]->(n_infant_formula)
      , (n_o4)-[:includes]->(n_infant_diaper)
      , (n_o4)-[:includes]->(n_coffee)
      , (n_o5)-[:includes]->(n_infant_formula)
      , (n_o5)-[:includes]->(n_infant_diaper)
      , (n_o5)-[:includes]->(n_coke)
      , (n_o6)-[:includes]->(n_infant_formula)
      , (n_o6)-[:includes]->(n_infant_diaper)
      , (n_o6)-[:includes]->(n_mountain_dew)
      , (n_o7)-[:includes]->(n_infant_formula)
      , (n_o7)-[:includes]->(n_infant_diaper)
      , (n_o7)-[:includes]->(n_mountain_dew)
      , (n_o7)-[:includes]->(n_coke)
      , (n_o7)-[:includes]->(n_coffee)
    ;

    Let’s use GQL to query the Items that are bought with Infant Diaper and Infant Formula the most.

    match (o:orders)-[:includes]->(third_item:item)
    CALL (third_item) {
      match (o:orders)-[:includes]->(i1:item {_id:'infant_formula'})
        , (o)-[:includes]->(i2:item {_id:'infant_diaper'})
        , (o)-[:includes]->(third_item)
      return third_item, count(o) as counter  
    }
    filter counter > 0
    return distinct table(third_item.description, counter)
    order by counter desc;

    Notice the use of CALL statement. The CALL statement is used to invoke an inline procedure or subquery. An inline procedure is a user-defined procedure embedded within a query, commonly used to execute subqueries or perform data modifications. It enables complex logic such as looping. In short, CALL statement enables modular query design and the invocation of complex logic in a graph query.

    In the above GQL query we first get a list of all items and then loop through each of them using the CALL statement to count the number of Orders where that item is included along with Infant Formula and Infant Diaper.

    This query will output:

    Based on the above query, we can extract the following Association Rule:

    {Infant Milk, Infant Diaper} → {Coke}
    {Infant Milk, Infant Diaper} → {Coffee}

    This doesn’t mean that there is correlation between purchase of Infant Milk, Infant Diaper and Coffee. This is just a property if of the Data.

    Alternate way is to the GROUP BY clause in GQL:

    match ALL (o:orders)-[:includes]->(i1:item {_id:'infant_formula'})
      , (o)-[:includes]->(i2:item {_id:'infant_diaper'})
      , (o)-[:includes]->(i3:item)
    return table(i1.description, i2.description, i3.description, count(distinct o))
    group by i3.description;

    Both the CALL function and the GROUP BY will produce the same results in this case. However CALL function is more versatile allows for complex MATCH statement as part of the sub-query.

  • Aggregating Edge Weights in GQL

    Often times we have to aggregate the weights along paths (edges) in a Graph. GQL provides a REDUCE function for this. We will use Ultimate Beneficial Ownership (UBO) graph to demonstrate this.

    A beneficial owner is defined as a physical person “who directly or indirectly owns or controls a sufficient part of the shares or the voting rights, or who practices control through other means.”.

    If a company is owned by another company (e.g. a holding company), it is the physical person(s) who ultimately own(s) the company who shall be registered as beneficial owner(s). Identifying Ultimate Beneficial Ownership (UBO) can be challenging due to complex and opaque ownership structures. But Property Graphs and GQL make this process easier.

    Let’s take a look at the following Beneficial Ownership chart

    Let’s create a the dataset based on the above Beneficial Ownership chart:

    
    create graph ubo_graph {
      node person ({name string})
      , node business ({name string})
      , edge owns ()-[{percentage float}]->()
    };
    
    use ubo_graph;
    
    insert 
      (n_uroosa:person {_id:'uroosa', name:'Uroosa'})
      , (n_fatima:person {_id:'fatima', name:'Fatima'})
      , (n_maryam:person {_id:'maryam', name:'Maryam'})
      , (n_zainab:person {_id:'zainab', name:'Zainab'})
      , (n_hannah:person {_id:'hannah', name:'Hannah'})
      , (n_acme_corp:business {_id:'acme corp', name:'Acme Corp.'})
      , (n_northwind:business {_id:'northwind', name:'Northwind'})
      , (n_fourth_cofee:business {_id:'fourth coffee', name:'Fourth Coffee'})
      , (n_tasmanian_traders:business {_id:'tasmanian traders', name:'Tasmanian Traders'})
      , (n_coffee_corp:business {_id:'coffee corp', name:'Coffe Corp.'})
      , (n_blue_yonder:business {_id:'blue yonder', name:'Blue Yonder'})
      , (n_hotel_kitchen_sink:business {_id:'hotel kitchen sink', name:'Hotel Kitchen Sink'})
      , (n_uroosa)-[e1:owns {percentage:0.80}]->(n_acme_corp)
      , (n_uroosa)-[e2:owns {percentage:0.15}]->(n_hotel_kitchen_sink)
      , (n_fatima)-[e3:owns {percentage:0.20}]->(n_acme_corp)
      , (n_zainab)-[e4:owns {percentage:0.19}]->(n_fourth_cofee)
      , (n_maryam)-[e5:owns {percentage:1.0}]->(n_coffee_corp)
      , (n_acme_corp)-[e6:owns {percentage:0.30}]->(n_fourth_cofee)
      , (n_northwind)-[e7:owns {percentage:0.09}]->(n_hotel_kitchen_sink)
      , (n_tasmanian_traders)-[e8:owns {percentage:0.49}]->(n_blue_yonder)
      , (n_coffee_corp)-[e9:owns {percentage:0.51}]->(n_blue_yonder)
      , (n_blue_yonder)-[e10:owns {percentage:0.51}]->(n_fourth_cofee)
      , (n_fourth_cofee)-[e11:owns {percentage:0.51}]->(n_hotel_kitchen_sink)
      , (n_hannah)-[e12:owns {percentage:0.25}]->(n_hotel_kitchen_sink)
    ;

    Now let’s see how the REDUCE function can be used to calculate the aggregate along the path. We will calculate the Uroosa’s total Ownership for the Hotel Kitchen Sink. Notice that Uroosa directly owns a share of the Hotel Kitchen Sink and also through a separate business entities (Acme Corp. and Fourth Coffee). We will need to account for both of these paths:

    We can use the following GQL to Calculated the Ownership Percentage along each Path:

    match path = all (a:person {_id:'uroosa'})
      -[owns]
      ->{0,} (b:business {_id:'hotel kitchen sink'})
    return path, 
      reduce(total_ownership = 1 , own in owns | total_ownership * own.percentage);
    

    Notice the use of the REDUCE function to calculate the total_ownership. This will output:

    This corresponds to 0.8 x 0.3 x 0.51 = 0.1224 through Acme Corp. and Fourth Coffee and direct share of 0.15 in Hotel Kitchen Sink.

    We can also use the Linear Composition in GQL to sum up the Ownership percentages (0.15 + 0.1224) as following

    match path = all (a:person {_id:'uroosa'})-[owns]->{0,} (b:business {_id:'hotel kitchen sink'})
    return path, 
    reduce(total_ownership = 1 , own in owns | total_ownership * own.percentage) as ownership
    next 
    return sum(ownership)
    ;

    Notice the use of NEXT to formulate the Linear Statement Composition. This will output:

  • Decomposing path into steps in Property Graphs – Part 2

    In the post titled, , we looked at how we can unnest / flatten a path in Graph using GQL. In this post we will look at the how to identify each match when there are multiple paths . Let’s start with our basic Social Network example.

    We are interested in path(s) from Saqib to Uroosa. Notice that are two paths for that 1) Saqib -> Angela -> Scott -> Uroosa; 2) Saqib -> Fatima -> Eleni -> Uroosa. We are also interested in the exact nature of the relationship i.e. co-worker vs. friend.

    Let’s start by create a Graph that represents that above relationships

    
    create graph social_graph{
    
     node person ({name string})
    , edge relationship (person)-[{type string}]->(person)
    };
    
    use social_graph;
    
    
    INSERT (n1:person {_id:'saqib', name:"Saqib"})
      , (n2:person {_id:'angela', name:"Angela"})
      , (n3:person {_id:'scott', name:"Scott"})
      , (n4:person {_id:'uroosa', name:"Uroosa"})
      , (n5:person {_id:'fatima', name:"Fatima"})
      , (n6:person {_id:'eleni', name:"Eleni"})
      , (n1)-[link1:relationship {type:"friend"}]->(n2)
      , (n2)-[link2:relationship {type:"co-worker"}]->(n3)
      , (n3)-[link3:relationship {type:"friend"}]->(n4)
      , (n1)-[link4:relationship {type:"co-worker"}]->(n5)
      , (n5)-[link5:relationship {type:"friend"}]->(n6)
      , (n6)-[link6:relationship {type:"friend"}]->(n4)
    

    Next use the pnodes function to get the two possible PATH(s) as JSON

    use social_graph;
    MATCH path = all SHORTEST  (a {_id:'saqib'})-[links]->{1,4}(b {_id:'uroosa'})
    return pnodes(path);
    

    This will output:

    ------------ path 1    
    [{"schema":"person","id":"saqib","uuid":"11024814086826229762","values":{"name":"Saqib"}},{"schema":"person","id":"angela","uuid":"7205761602816049155","values":{"name":"Angela"}},{"schema":"person","id":"scott","uuid":"2666133178426589188","values":{"name":"Scott"}},{"schema":"person","id":"uroosa","uuid":"14771808976798482437","values":{"name":"Uroosa"}}]
        
    ------------ path 2
    [{"schema":"person","id":"saqib","uuid":"11024814086826229762","values":{"name":"Saqib"}},{"schema":"person","id":"fatima","uuid":"6413128068398841862","values":{"name":"Fatima"}},{"schema":"person","id":"eleni","uuid":"9583662206067671047","values":{"name":"Eleni"}},{"schema":"person","id":"uroosa","uuid":"14771808976798482437","values":{"name":"Uroosa"}}]

    Now let’s see look at how to get the each hop as row and also get the relationship type:

    MATCH path = all SHORTEST  (a {_id:'saqib'})-[links]->{1,}(b {_id:'uroosa'})
    for link in links WITH ORDINALITY hop_number
    return table(link._from, link._to, link.type, hop_number, pnodeIds(path), PATH_LENGTH(path));
    

    This will output:

    Notice that the output contains two possible paths. pnodeIds are Path Identifiers. hop_number indicates the ordinal hop number in the path.

  • Decomposing path into steps in GQL

    When working with Property Graphs, resolving the data path aka. decomposing Graph path is key to understanding the path traversed by a MATCH query in GQL. To facilitate this unnesting / flattening of path data, GQL provides few options. We will look at two options and apply them to following Graph to unnest the SHORTEST path between Alice and Diana i.e. Alice -> Chalie -> Diana

    Let’s start by creating a sample Sample Graph:

    create graph social_network {
      NODE PERSON ({name STRING, age int}), 
      EDGE FOLLOWS ()-[]->()
    };
    use graph social_network;
    
    SHOW NODE SCHEMA;
    SHOW EDGE SCHEMA;
    INSERT (n1:PERSON {_id:'alice', name: "Alice", age:25})
      , (n2:PERSON {_id:'bob', name: "Bob", age:30})
      , (n3:PERSON {_id:'diana', name: "Diana", age:28})
      , (n4:PERSON {_id:'charlie', name: "Charlie", age:32});
    MATCH (n1:PERSON {_id:'alice'})
      , (n2:PERSON {_id:'bob'})
      , (n3:PERSON {_id:'diana'})
      , (n4:PERSON {_id:'charlie'})
    INSERT (n1)-[e1:FOLLOWS]->(n2)
      , (n2)-[e2:FOLLOWS]->(n1)
      , (n2)-[e3:FOLLOWS]->(n4)
      , (n1)-[e4:FOLLOWS]->(n4)
      , (n3)-[e5:FOLLOWS]->(n4)
      , (n4)-[e6:FOLLOWS]->(n3)
      , (n3)-[e7:FOLLOWS]->(n2) 
    RETURN n1, n2, n3, n4;
    

    First let’s use pnodes function to get the unnested path as a JSON.

    MATCH paths =  SHORTEST 1 (n1 {_id:'alice'})-[links]->{0,4}(n2 {_id:'diana'}) 
    LET length = PATH_LENGTH(paths)
    RETURN pnodes(paths);

    This will output a JSON array as following:

        [{"schema":"PERSON","id":"alice","uuid":"5116091375716139010","values":{"age":25,"name":"Alice"}},{"schema":"PERSON","id":"charlie","uuid":"11961562809319292933","values":{"age":32,"name":"Charlie"}},{"schema":"PERSON","id":"diana","uuid":"16789421609860464644","values":{"age":28,"name":"Diana"}}]

    Another way to UNNEST is to produce one row per hop. We can use the FOR loop for this

    MATCH paths =  (n1 {_id:'alice'})-[links]->{0,4}(n2 {_id:'diana'})
    FOR link in links
    RETURN link;

    This will output:

    Notice that we used Variable Length Path Pattern {0,4} in the aforementioned query. GQL supports the following Variable Length Path Patterns:

    QuantifierDescription
    {n}Exactly n
    {n, m}Between n and m (inclusive)
    {, m}Between 0 and m (inclusive)
    {m,} m (inclusive) or higher
  • Decomposing PATH into Steps using GQL

    Returning one or more paths during a graph traversal is a central feature of regular path queries in the GQL. To illustrate we will use the following example of a travel network:

    We are interested in all the SHORTEST PATHS available from Bayreuth to Santiago.

    Let’s start by creating a sample dataset

    create graph travel_network {
      node city ({name string, country string})
      , edge train (city)-[{duration int}]->(city) -- duration in minutes
      , edge flight (city)-[{code string, duration int}]->(city) -- duration in minutes
      , edge car (city)-[{tolls float}]->(city)
    };
    
    use travel_network;
    
    -- Create City nodes
    INSERT 
      (n_bayreuth:city {_id: 'Bayreuth', name: 'Bayreuth', country: 'Germany'})
      , (n_nürnberg:city {_id: 'Nürnberg', name: 'Nürnberg', country: 'Germany'})
      , (n_stuttgart:city {_id: 'Stuttgart', name: 'Stuttgart', country: 'Germany'})
      , (n_frankfurt:city {_id: 'Frankfurt', name: 'Frankfurt', country: 'Germany'})
      , (n_paris:city {_id:'Paris', name: 'Paris', country: 'France'})
      , (n_amsterdam:city {_id:'Amsterdam', name: 'Amsterdam', country: 'Netherlands'})
      , (n_rome:city {_id:'Rome', name: 'Rome', country: 'Italy'})
      , (n_santiago:city {_id:'Santiago', name: 'Santiago', country: 'Chile'})
      , (n_bayreuth)-[:train]->(n_nürnberg)
      , (n_nürnberg)-[:train]->(n_bayreuth)
      , (n_nürnberg)-[:train {duration: 120}]->(n_stuttgart)
      , (n_nürnberg)-[:train]->(n_frankfurt)
      , (n_frankfurt)-[:train]->(n_paris)
      , (n_stuttgart)-[:train]->(n_paris)
      , (n_paris)-[:flight {code: 'AF406', duration: 780}]->(n_santiago)
      , (n_amsterdam)-[:flight]->(n_santiago)
      , (n_amsterdam)-[:flight]->(n_rome)
      , (n_rome)-[:car {tolls: 74}]->(n_bayreuth)
      , (n_frankfurt)-[:train]->(n_amsterdam)
       
    ;

    We can use the following GQL to get all the possible SHORTEST PATHS.

    match paths = all shortest  (a:city {_id:'Bayreuth'})-[edges:train|flight]->{1,}(b:city {_id:'Santiago'})
    for edge in edges
    return distinct table(edge._from, edge._to, labels(edge), pnodeIds(paths), length(paths));

    This will return each of the path decomposed into the each hop (step) as following

    In the above GQL Query a is matched to the city of Bayreuth, and b to Santiago city. The PATH(s) are all possible combinations of the edges between between a and b which minimize the number of hops. In our case 4 is the minimum number of hops and there 3 possible PATHs that qualify.

  • GQL Playground

    We are working on a interactive GQL Playground. The GQL Playground will provide interactive learning experience with guided examples. Stay tuned!