Databases

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Modeling General Graphs in SQL

    13 answers - 3960 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    This is long, so bear with me.
    I am trying to come up with a way to model a general graph in SQL which
    does not use the adjacency list model. Her is what I have so far and
    I'd like to get any ideas on it.
    What if we attempt to store all the paths in a directed graph in a
    single table in an RDBMS? The table for this would look like this,
    with all the edges directed downward (pardon the ASCII artwork):
    A
    / \
    B C
    | \ |
    F D
    |
    E
    So start with a simple table of paths.
    CREATE TABLE Path
    (path_nbr INTEGER NT NULL,
    step_nbr INTEGER NT NULL
    CHECK (path_nbr >= 0),
    node_id CHAR(1) NT NULL,
    PRIMARY KEY (path_nbr, step_nbr));
    Each path is assigned an id number and the steps are numbered from zero
    (the start of the path) to (k), the final step. Using the simple six
    node graph, the one edge paths are:
    1 0 A
    1 1 B
    2 0 B
    2 1 F
    3 0 C
    3 1 D
    4 0 B
    4 1 D
    5 0 D
    5 1 E
    Now we can add the two edge paths:
    6 0 A
    6 1 B
    6 2 F
    7 0 A
    7 1 B
    7 2 D
    8 0 A
    8 1 C
    8 2 D
    9 0 B
    9 1 D
    9 2 E
    And finally the three Edge paths:
    10 0 A
    10 1 B
    10 2 D
    10 3 E
    11 0 A
    11 1 B
    11 2 D
    11 3 E
    These rows can be generated from the single edge paths using a CTE
    (Common Table Expression) or with a loop in a procedural language, such
    as SQL/PSM. , there are fewer longer paths, but as the number
    of edges increases so does the number of paths. By the time you get to
    a realistic sized graph, the number of rows is huge.
    However, it is easy to find a path between two nodes:
    SELECT DISTINCT start_node, :end_node,
    (P2.step_nbr- P1.step_nbr) AS distance
    FRM Paths AS P1, Paths AS P2
    WHERE P1.path_nbr = P2.path_nbr
    AND P1.step_nbr <= P2.step_nbr
    AND P1.node_id = :start_node
    AND P2.node_id = :end_node;
    Notice the use of SELECT DISTINCT because most paths will be a sub-path
    of one or more longer paths. Without it, the search for all paths from
    A to D in this simple graph would return:
    7 0 A
    7 1 B
    7 2 D
    8 0 A
    8 1 C
    8 2 D
    10 0 A
    10 1 B
    10 2 D
    11 0 A
    11 1 B
    11 2 D
    However, there are only two distinct paths, namely (A, B, D) and (A, C,
    D). In a realistic graph with lots of connections, there is a good
    chance that a large percentage of the table will be returned.
    Can we do anything to avoid the size problems? Yes, and no.
    In this graph, most of the paths are redundant and can be removed and
    replaced by a set of sub-paths that cover all of the paths in the
    original graph. This is easy enough to do by hand for this simple
    graph:
    1 0 A
    1 1 B
    1 2 F
    2 0 A
    2 1 B
    2 2 D
    2 3 E
    3 0 A
    3 1 C
    3 2 D
    3 3 E
    I have no idea if there is already a name in the literature for such a
    set of paths, so I will call it a "covering path set" and hope that
    it sticks. Furthermore, I have no idea how to construct it; no idea
    how to test that the set has the minimal number of paths or how to find
    all possible such sets other than brute force.
    Here are some random thoughts I have had so far:
    1) The first observations are that if we have a cycle of size (k) in
    the graph, we would split them into two paths of (k-1) nodes that
    overlap. If the graph has cycles that overlap or form a lattice, the
    number of possible paths increases; notice the way that this graph has
    two paths between node A and D that split and re-converge.
    2) While search queries are easier in this model, dropping, changing or
    adding a single edge can alter the entire structure, forcing us to
    re-build the entire table. The combinatory explosion problem again!
    3) Can I use Dijkstra's minimal spanning tree algorithm in some way
    to get a starting point?
  • No.1 | | 4892 bytes | |

    wrote:
    This is long, so bear with me.

    I am trying to come up with a way to model a general graph in SQL
    which does not use the adjacency list model. Her is what I have so
    far and I'd like to get any ideas on it.
    What if we attempt to store all the paths in a directed graph in a
    single table in an RDBMS? The table for this would look like this,
    with all the edges directed downward (pardon the ASCII artwork):

    A
    / \
    B C
    >\ |

    F D
    |
    E

    So start with a simple table of paths.

    CREATE TABLE Path
    (path_nbr INTEGER NT NULL,
    step_nbr INTEGER NT NULL
    CHECK (path_nbr >= 0),
    node_id CHAR(1) NT NULL,
    PRIMARY KEY (path_nbr, step_nbr));
    --
    Each path is assigned an id number and the steps are numbered from
    zero (the start of the path) to (k), the final step. Using the simple
    six node graph, the one edge paths are:

    1 0 A
    1 1 B
    2 0 B
    2 1 F
    3 0 C
    3 1 D
    4 0 B
    4 1 D
    5 0 D
    5 1 E

    Now we can add the two edge paths:

    6 0 A
    6 1 B
    6 2 F
    7 0 A
    7 1 B
    7 2 D
    8 0 A
    8 1 C
    8 2 D
    9 0 B
    9 1 D
    9 2 E

    And finally the three Edge paths:

    10 0 A
    10 1 B
    10 2 D
    10 3 E
    11 0 A
    11 1 B
    11 2 D
    11 3 E

    These rows can be generated from the single edge paths using a CTE
    (Common Table Expression) or with a loop in a procedural language,
    such as SQL/PSM. , there are fewer longer paths, but as the
    number of edges increases so does the number of paths. By the time
    you get to a realistic sized graph, the number of rows is huge.

    However, it is easy to find a path between two nodes:

    SELECT DISTINCT start_node, :end_node,
    (P2.step_nbr- P1.step_nbr) AS distance
    FRM Paths AS P1, Paths AS P2
    WHERE P1.path_nbr = P2.path_nbr
    AND P1.step_nbr <= P2.step_nbr
    AND P1.node_id = :start_node
    AND P2.node_id = :end_node;

    Notice the use of SELECT DISTINCT because most paths will be a
    sub-path of one or more longer paths. Without it, the search for all
    paths from A to D in this simple graph would return:

    7 0 A
    7 1 B
    7 2 D
    8 0 A
    8 1 C
    8 2 D
    10 0 A
    10 1 B
    10 2 D
    11 0 A
    11 1 B
    11 2 D

    However, there are only two distinct paths, namely (A, B, D) and (A,
    C, D). In a realistic graph with lots of connections, there is a good
    chance that a large percentage of the table will be returned.

    Can we do anything to avoid the size problems? Yes, and no.

    In this graph, most of the paths are redundant and can be removed and
    replaced by a set of sub-paths that cover all of the paths in the
    original graph. This is easy enough to do by hand for this simple
    graph:

    1 0 A
    1 1 B
    1 2 F

    2 0 A
    2 1 B
    2 2 D
    2 3 E

    3 0 A
    3 1 C
    3 2 D
    3 3 E

    I have no idea if there is already a name in the literature for such a
    set of paths, so I will call it a "covering path set" and hope that
    it sticks. Furthermore, I have no idea how to construct it; no idea
    how to test that the set has the minimal number of paths or how to
    find all possible such sets other than brute force.

    Here are some random thoughts I have had so far:

    1) The first observations are that if we have a cycle of size (k) in
    the graph, we would split them into two paths of (k-1) nodes that
    overlap. If the graph has cycles that overlap or form a lattice, the
    number of possible paths increases; notice the way that this graph has
    two paths between node A and D that split and re-converge.

    2) While search queries are easier in this model, dropping, changing
    or adding a single edge can alter the entire structure, forcing us to
    re-build the entire table. The combinatory explosion problem again!

    3) Can I use Dijkstra's minimal spanning tree algorithm in some way
    to get a starting point?

    A quick first thought without fully digesting your posting: as you might
    know there is a fairly easy way to store a tree in a RDBMS which allows
    for fast access to subtrees and paths (even without 's CNNECT BY
    PRIR). Maybe that can be used somehow in this context for storing a DFS
    traversal. The tricky part will be the storage of backwards edges

    From memory:

    CREATE TABLE SME_TREE (
    node_id int,
    pre int,
    post int
    )

    subtree

    SELECT node_id
    FRM SME_TREE ST, (SELECT pre, post FRM SME_TREE WHERE node_id =
    &rootid) RT
    WHERE ST.pre >= RT.pre
    AND ST.post <= RT.post

    path

    SELECT node_id
    FRM SME_TREE ST, (SELECT pre, post FRM SME_TREE WHERE node_id =
    &childid) CT
    WHERE ST.pre <= CT.pre
    AND ST.post >= CT.post

    Updates are also fairly easy.

    Kind regards

    robert

  • No.2 | | 1004 bytes | |


    Robert Klemme wrote:
    wrote:
    to get a starting point?

    A quick first thought without fully digesting your posting: as you might
    know there is a fairly easy way to store a tree in a RDBMS which allows
    for fast access to subtrees and paths (even without 's CNNECT BY
    PRIR). Maybe that can be used somehow in this context for storing a DFS
    traversal. The tricky part will be the storage of backwards edges

    From memory:

    CREATE TABLE SME_TREE (
    node_id int,
    pre int,
    post int
    )

    subtree

    SELECT node_id
    FRM SME_TREE ST, (SELECT pre, post FRM SME_TREE WHERE node_id =
    &rootid) RT
    WHERE ST.pre >= RT.pre
    AND ST.post <= RT.post

    path
    --
    SELECT node_id
    FRM SME_TREE ST, (SELECT pre, post FRM SME_TREE WHERE node_id =
    &childid) CT
    WHERE ST.pre <= CT.pre
    AND ST.post >= CT.post

    Updates are also fairly easy.

    Celko has been referenced to his own method! Remarkable.

  • No.3 | | 795 bytes | |

    I know this approach -- see my book TREES & HIERARCHIES IN SQL :)

    approach I was considering was to get a minimal spanning tree
    (Dijkstra's algorithm? something else?). The next step would be to
    look at the edges not in the first tree and find another spanning tree
    rooted at a second node that covers those remaining edges.

    The problem is that I have no idea if the second step is possible
    because one edge could be left out of both trees, so I have to add a
    third tree to the forest, etc. I see a horrible upper limit of (n^2)
    rows table because I must insert one tree per node to get the coverage
    I want.

    I have posted on a math newsgroup to see if I can find a graph theory
    expert and then adapt an algorithm to SQL code.

  • No.4 | | 1068 bytes | |

    wrote:
    I know this approach -- see my book TREES & HIERARCHIES IN SQL :)

    I didn't know *cough* Sorry for the noise. :-)

    approach I was considering was to get a minimal spanning tree
    (Dijkstra's algorithm? something else?). The next step would be to
    look at the edges not in the first tree and find another spanning tree
    rooted at a second node that covers those remaining edges.

    The problem is that I have no idea if the second step is possible
    because one edge could be left out of both trees, so I have to add a
    third tree to the forest, etc. I see a horrible upper limit of (n^2)
    rows table because I must insert one tree per node to get the coverage
    I want.

    Yeah, that doesn't really sound attractive.

    I have posted on a math newsgroup to see if I can find a graph theory
    expert and then adapt an algorithm to SQL code.

    Let us know the outcome. My graph theory is a bit rusty now but this is
    certainly an interesting topic!

    Kind regards

    robert

  • No.5 | | 1172 bytes | |

    "" <jcelko212@earthlink.netwrote in message
    news:1134680948.716427.27410@
    3) Can I use Dijkstra's minimal spanning tree algorithm in some way
    to get a starting point?

    You don't mention assigning weights to the edges, so can we assume that the
    weights are all equal? Therefore a minimal spanning tree of a graph with n
    vertices is any tree covering those vertices with the number of edges m
    equal to n-1. It doesn't matter where you put the edges. You could, for
    example, connect any arbitrary vertex to all the other vertices and be done
    with it.

    You need to decide what aspect of graphs are you trying to model, because
    the storage design may depend on what operations you intend to do with the
    data.

    For example, you might only need to be able to test if a path exists between
    two vertices.
    you might need to test if the graph is cyclic or acyclic. calculate
    the graph density. you might need to assign weights to the edges, and
    return the minimum path between two vertices (wow, that would be a *cool*
    SQL expression that could do that!).

    Regards,
    Bill K.

  • No.6 | | 347 bytes | |

    >You don't mention assigning weights to the edges, so can we assume that the
    weights are all equal? <<

    In the product, there are colored edges for various relationships,
    which can be treated as weights. I remember some of the classic graph
    algorithms, but I do not know the enternals of the engine.

  • No.7 | | 517 bytes | |

    Bill Karwin wrote:
    you might need to assign weights to the edges, and
    return the minimum path between two vertices (wow, that would be a *cool*
    SQL expression that could do that!).

    The minimum path query is the same as transitive closure, which is
    widely believed to be inexpresible by standard SQL. By "standard SQL"
    I've meant to exclude certain "non-cool" extensions like "connect by"
    and "recursive with" (even though the latter is formally a part of
    ANSI SQL).

  • No.8 | | 384 bytes | |

    mikharakiri_nospaum@yahoo.com wrote:

    The minimum path query is the same as transitive closure, which is
    widely believed to be inexpresible by standard SQL.

    I was under the impression that it was proven, rather than "widely
    believed." Am I mistaken? Is this one of those things that's not
    been proven one way or the other?

    Marshall

  • No.9 | | 714 bytes | |


    Marshall Spight wrote:
    mikharakiri_nospaum@yahoo.com wrote:

    The minimum path query is the same as transitive closure, which is
    widely believed to be inexpresible by standard SQL.

    I was under the impression that it was proven, rather than "widely
    believed." Am I mistaken? Is this one of those things that's not
    been proven one way or the other?

    You are correct. It has been proven that transitive closure is not
    possible in SQL. There are ACM papers and others. If you google for

    sql transitive-closure

    there are many relevant resources, some with a proof and others which
    introduce possible extensions to SQL for transitive closure.

  • No.10 | | 1141 bytes | |

    dawn wrote:
    Marshall Spight wrote:
    >
    >>mikharakiri_nospaum@yahoo.com wrote:
    >>

    The minimum path query is the same as transitive closure, which is
    widely believed to be inexpresible by standard SQL.
    >>
    >>I was under the impression that it was proven, rather than "widely
    >>believed." Am I mistaken? Is this one of those things that's not
    >>been proven one way or the other?

    >
    >

    You are correct. It has been proven that transitive closure is not
    possible in SQL. There are ACM papers and others. If you google for

    sql transitive-closure

    there are many relevant resources, some with a proof and others which
    introduce possible extensions to SQL for transitive closure.

    From "Universality of Data Retrieval Languages" by Alfred V. Aho and
    Jeffrey D. Ullman:

    "In this appendix. we prove that the transitive closure of a relation
    cannot be couched as an expression of relational algebra."

    p
  • No.11 | | 379 bytes | |

    paul c wrote:
    which is
    widely believed to be inexpresible by standard
    SQL.
    ^^^^^^^^^^

    From "Universality of Data Retrieval Languages" by Alfred V. Aho and
    Jeffrey D. Ullman:

    "In this appendix. we prove that the transitive closure of a relation
    cannot be couched as an expression of
    relational algebra."
    ^^^^^^^^^^^^^^^^^^^

  • No.12 | | 545 bytes | |

    Mikito Harakiri wrote:
    paul c wrote:

    which is
    widely believed to be inexpresible by standard
    SQL.

    ^^^^^^^^^^
    >
    >
    >From "Universality of Data Retrieval Languages" by Alfred V. Aho and
    >>Jeffrey D. Ullman:
    >>
    >>"In this appendix. we prove that the transitive closure of a relation
    >>cannot be couched as an expression of

    >relational algebra."
    >

    ^^^^^^^^^^^^^^^^^^^

    Yes.

    p
  • No.13 | | 37 bytes | |


    -- Jan Hidders

Re: Modeling General Graphs in SQL


max 4000 letters.
Your nickname that display:
In order to stop the spam: 2 + 1 =
QUESTION ON "Databases"

EMSDN.COM