# General Graph Algorithm

`algo(khop_all)`

** Advanced **

Khop_all is a whole-graph algorithm that does K-hop against all nodes of the entire graph, and returns the total number of k-hop neighbors found for each node.

Obviously, this algorithm is rather an expensive operation, especially on large graphs. Khop_all will be executed as a task (asynchronous execution mode, it will be pipelined if there are other tasks before it).

Configuration items for whole graph K-hop operation:

Item | Data Type | Specification | Description |
---|---|---|---|

`<depth>` |
int | >0 | A reasonable value, e.g., starting from 1 and grow incrementally |

`<direction>` |
string | 'left' or 'right' | (Optional) To search paths where all edges are in the same direction along the path, either left or right, or ignore edge direction if not configured |

Validity of `write_back()`

:

Approach | Destination |
---|---|

To database | node property `#khop_all` |

To disk file | RPC port |

`Ultipa Graph has optimized regular node-based K-Hop and whole graph k_hop with full parallelization, in most graphs, 1-hop takes microseconds to complete, 2-hop and deeper hops take milliseconds. However, on large graphs (having hundreds of millions of nodes/edges and many hotspots, the whole-graph k-hop can take longer time to complete. The computational complexity is astronomical, and here is the simple math: Assuming we have a graph with (# of edges / # of nodes) = 100, and doing just 1 node’s 5-hop takes: 100*100*100*100*100 = 10B in terms of complexity, if we are doing this against the entire graph’s 10M nodes, this is 10B * 10M, if doing 1 node’s 5-hop takes 100ms (0.1s), doing 10M nodes’ 5-hop takes: 0.1 * 10M = 1M seconds ~= 12 Days(& nights). The idea here is that: before you do whole-graph 5-hop (or deeper), think again!`

Example 1: Whole graph 2-hop:

```
algo(khop_all).params({ depth: 2 })
```

`algo(connected_component)`

** Basic **

A connected component (shortened as CC hereafter) is a maximal subset of interconnected nodes and edges of a graph. If one CC of a graph includes all the nodes and edges of the graph, that means the number of CCs of the graph is one, in which case the graph is also said to be connected that every pair of nodes in it have path(s) connecting them.

There are two types of CCs in terms of connection direction, WCC vs. SCC:

- WCC: Weakly Connected Component, where the edge direction is ignored when judging whether two nodes are connected.
- SCC: Strongly Connected Component, where a pair of nodes are considered connected only when the paths connecting them are bidirectional (meaning for any pair of node {u, v}, there is a path from u→v, and a path from v→u).

Connected component algorithm in Ultipa Graph counts the total number of connected components in the graph and returns which nodes belong to which CC.

Configuration item for whole connected component operation:

Item | Data Type | Specification | Description |
---|---|---|---|

`<cc_type>` |
int | 1; 2 | 1: WCC (weakly connected) 2: SCC (strongly connected) |

Validity of `write_back()`

:

Approach | Destination |
---|---|

To database | node property `#connected_component` |

To disk file | / |

Example 1: Calculate WCC of a garph

```
algo(connected_component).params({ cc_type: 1 })
```

Apply this uQL command to a graph with below structure:

*Figure: A dis-connected Graph*

And acquire below results:

*Figure: Running Weakly Connected Component*

`algo(triangle_counting)`

** Basic **

Triangle counting algorithm counts how many triangles are there in the graphset and returns the nodes or edges forming each trangle.

Triangle counting is an algorithm that's used to determine and gauge the stability of a graph, and is often used to construct more sophisticated graph algorithms. It's widely applied to measure the cohesiveness of communities within social networks, to measure how closely connected are the accounts (account holders) to each other in a banking system, etc.

Configurations item for triangle counting operation:

Item | Data Type | Specification | Description |
---|---|---|---|

`<type>` |
int | 1; 2 | 1: Count by edge 2: Count by node |

`<limit>` |
int | >0; -1 | The maximum number of triangles to return; -1: only return the total count of triangles without listing their nodes/edges |

Validity of `write_back()`

:

Approach | Destination |
---|---|

To database | / |

To disk file | RPC port |

It’s imperative to understand the difference between the `<type>`

values:

- Count by edge: Each triangle is defined by its edges.
- Count by node: Each triangle is defined by its nodes.

Obviously, a triangle defined by nodes will be counted as different triangles when being defined by edges if there are multiple edges between any two of its nodes. In other words, 'count by node' will churn out a number smaller than 'count by edge'.

To best illustrate this, check out the below screenshot:

*Figure: Triangle Counting Illustrated*

When aggregating by nodes, there are only 2 triangles; on the other hand, there are 5 triangles if we aggregate by edges (1 + 1*2*2).

It’s worthwhile to point out that some graph databases can NOT count and aggregate triangles by edges, which just misses the point of a graph database. Mathematically, a unique triangle is determined by 3 circularly connected edges sharing endpoints with each other.

Example: Launch Triangle Counting by edge against the graph in the previous illustration

```
algo(triangle_counting).params({type: 1, limit: 20}).write_back()
```

*Figure: The 2 Modes of Triangle Counting (Note the Difference)*

`algo(common_neighbours)`

** Real-time **

Common Neighbors calculates the shared (common) neighbors (1-hop) between two nodes, returns both the total number and the neighbors list.

Configuration items for Common Neighbors operation:

Item | Data Type | Specification | Description |
---|---|---|---|

`<node_id1>` |
int | Ultipa ID | / |

`<node_id2>` |
int | Ultipa ID | / |

Validity of `write_back()`

:

Not supported.

Example: Calculate the common neighbors of node 12 and 13

```
algo(common_neighbours).params({node_id1: 12, node_id2: 13})
```

*Figure: Calculating Common Neighbors*

To validate the accuracy of Common Neighbors, use template query to the rescue - calculate 2-hop paths from node 12 to 13 and use `$count_distinct`

to count the distinctive in-between nodes:

```
t(p).n(12).e().n(a).e().n(13).limit(10000).return({$count_distinct: a})
```

*Figure: Using Template Query to Validate Common Neighbors*

`algo(subgraph)`

** Real-time **

Subgraph algorithm calculates the induced subgraph based on the given subset of nodes, by returning all the edges whose endpoints both belong to the given subset of nodes.

Configuration item for Subgraph operation:

Item | Data Type | Specification | Description |
---|---|---|---|

`<node_ids>` |
[]int | Ultipa ID | The subset of nodes |

Validity of `write_back()`

:

Approach | Destination |
---|---|

To database | / |

To disk file | RPC port |

Example: Calculate the induced subgraph using nodes [1,7,8,9]

```
algo(subgraph).params({node_ids: [1,7,8,9]})
```

Run this example in uQL CLI:

*Figure: Running Subgraph Algorithm in CLI*

The screenshot below shows how subgraph algorithm is executed with Algos module in Ultipa Manager:

*Figure: Running Subgraph Algorithm with Algos*

Subgraph algorithm by definition is equivalent to running a 1-hop `autoNet()`

query with the given nodes as sources. However, exception occurs when there are self-loops (edges starting and pointing to the same nodes) on the given nodes, and these self-loops will not be calculated by the 1-hop `autoNet()`

query. The reason is that `autoNet()`

does not calculate cycles, which are the paths whose source and destination are identical. Compare the difference by running `autoNet()`

query in Ultipa Manager:

```
autoNet().srcs(1,7,8,9).depth(1).limit(100).select(*)
```

*Using autoNet() to Validate Subgraph Algorithm Calculation Correctness*

## No Comments