wrote:
This is long, so bear with me.
I am trying to come up with a way to model a general graph in SQL
which does not use the adjacency list model. Her is what I have so
far and I'd like to get any ideas on it.
What if we attempt to store all the paths in a directed graph in a
single table in an RDBMS? The table for this would look like this,
with all the edges directed downward (pardon the ASCII artwork):
A
/ \
B C
>\ |
F D
|
E
So start with a simple table of paths.
CREATE TABLE Path
(path_nbr INTEGER NT NULL,
step_nbr INTEGER NT NULL
CHECK (path_nbr >= 0),
node_id CHAR(1) NT NULL,
PRIMARY KEY (path_nbr, step_nbr));
--
Each path is assigned an id number and the steps are numbered from
zero (the start of the path) to (k), the final step. Using the simple
six node graph, the one edge paths are:
1 0 A
1 1 B
2 0 B
2 1 F
3 0 C
3 1 D
4 0 B
4 1 D
5 0 D
5 1 E
Now we can add the two edge paths:
6 0 A
6 1 B
6 2 F
7 0 A
7 1 B
7 2 D
8 0 A
8 1 C
8 2 D
9 0 B
9 1 D
9 2 E
And finally the three Edge paths:
10 0 A
10 1 B
10 2 D
10 3 E
11 0 A
11 1 B
11 2 D
11 3 E
These rows can be generated from the single edge paths using a CTE
(Common Table Expression) or with a loop in a procedural language,
such as SQL/PSM. , there are fewer longer paths, but as the
number of edges increases so does the number of paths. By the time
you get to a realistic sized graph, the number of rows is huge.
However, it is easy to find a path between two nodes:
SELECT DISTINCT start_node, :end_node,
(P2.step_nbr- P1.step_nbr) AS distance
FRM Paths AS P1, Paths AS P2
WHERE P1.path_nbr = P2.path_nbr
AND P1.step_nbr <= P2.step_nbr
AND P1.node_id = :start_node
AND P2.node_id = :end_node;
Notice the use of SELECT DISTINCT because most paths will be a
sub-path of one or more longer paths. Without it, the search for all
paths from A to D in this simple graph would return:
7 0 A
7 1 B
7 2 D
8 0 A
8 1 C
8 2 D
10 0 A
10 1 B
10 2 D
11 0 A
11 1 B
11 2 D
However, there are only two distinct paths, namely (A, B, D) and (A,
C, D). In a realistic graph with lots of connections, there is a good
chance that a large percentage of the table will be returned.
Can we do anything to avoid the size problems? Yes, and no.
In this graph, most of the paths are redundant and can be removed and
replaced by a set of sub-paths that cover all of the paths in the
original graph. This is easy enough to do by hand for this simple
graph:
1 0 A
1 1 B
1 2 F
2 0 A
2 1 B
2 2 D
2 3 E
3 0 A
3 1 C
3 2 D
3 3 E
I have no idea if there is already a name in the literature for such a
set of paths, so I will call it a "covering path set" and hope that
it sticks. Furthermore, I have no idea how to construct it; no idea
how to test that the set has the minimal number of paths or how to
find all possible such sets other than brute force.
Here are some random thoughts I have had so far:
1) The first observations are that if we have a cycle of size (k) in
the graph, we would split them into two paths of (k-1) nodes that
overlap. If the graph has cycles that overlap or form a lattice, the
number of possible paths increases; notice the way that this graph has
two paths between node A and D that split and re-converge.
2) While search queries are easier in this model, dropping, changing
or adding a single edge can alter the entire structure, forcing us to
re-build the entire table. The combinatory explosion problem again!
3) Can I use Dijkstra's minimal spanning tree algorithm in some way
to get a starting point?
A quick first thought without fully digesting your posting: as you might
know there is a fairly easy way to store a tree in a RDBMS which allows
for fast access to subtrees and paths (even without 's CNNECT BY
PRIR). Maybe that can be used somehow in this context for storing a DFS
traversal. The tricky part will be the storage of backwards edges
From memory:
CREATE TABLE SME_TREE (
node_id int,
pre int,
post int
)
subtree
SELECT node_id
FRM SME_TREE ST, (SELECT pre, post FRM SME_TREE WHERE node_id =
&rootid) RT
WHERE ST.pre >= RT.pre
AND ST.post <= RT.post
path
SELECT node_id
FRM SME_TREE ST, (SELECT pre, post FRM SME_TREE WHERE node_id =
&childid) CT
WHERE ST.pre <= CT.pre
AND ST.post >= CT.post
Updates are also fairly easy.
Kind regards
robert