h a l f b a k e r yQuis custodiet the custard?
add, search, annotate, link, view, overview, recent, by name, random
news, help, about, links, report a problem
browse anonymously,
or get an account
and write.
register,
|
|
|
SQL for trees
UDA in SQL as a mean to work with recursive hierarchies | |
The problem: unlimited depth tree represented as (id, pid) relation is hard to use in RDBMS.
The solution:
First you need a logical description of your UDA (user defined aggregation) functions recursivly traversing unlimited depth tree by one traverse pattern per function.
Second, you need an
index optimizing of and used by such UDA functions.
Third, you need such UDA implementation.
I've got a prototype already.
[link]
|
|
Ooooo a prototype! I suppose you're going to use your swell new database to destroy a major North American city if we don't meet your demands, aren't you? |
|
|
[panserg] you might have to wait a while for the more technical users here to make worthwhile comments. I wish I knew enough about DBs to comment but, alas, I don't. |
|
|
Firstly, [panserg] claims to have baked this idea.
Secondly, there's no description or indication as to how or why this method is better. |
|
|
Slaked Quick Lime is not good
for trees. It makes them drop
their leaves prematurely and
gives them that ghostly look as
they stand there bare in the
moonlight. |
|
|
That's right neelandan, but it also masks those annoying dead fish odors. |
|
|
I'm working with trees in SQL right now, and it's an ABSOLUTE BITCH. I wish there was something better. Are you telling me you have such a thing? |
|
|
I've found that you can do trees in SQL by adding two extra columns to the table, sort of like a tree index. This stores the tree "path" and a "depth" of the placement. When you pull out the statement, you need to sort by the "path" and monitor the "depth". If the depth changes, move in or out. The path is really just used for sorting (so that should include your sort index). You could also add another column that stores the actual IDs so you can easily look up parents, siblings, delete whole branch, yada... |
|
|
The real trick is saving this data and converting it in your host application. Here's a sample tree... |
|
|
ID, name, description, sort, depth, tree, path
1, 'parent', '', 'a', 1, '/a', '/1'
2, 'child', '', 'a', '/a/a', '/1/2'
3, 'another child', 'b', '/a/b', '1/3'
etc... |
|
|
I also have done some stuff with joining tables, returning the ID from multiple tables and then monitor those field, indenting and stuff when they change. I agree that if some software function existed where you can define that, it would be really cool. |
|
|
Can I check out that prototype? I could send you something as well. |
|
|
What I'm saying is, SQL, SchmeQL. This application is right up OO's alley. Forget the relational model here. Recursive hierarchies are a snap with an OODBMS. |
|
|
Representing a tree structure in SQL isn't the problem, [jkichline]. The problem is navigating it in any reasonable way, because in SQL that gets incredibly clumsy. |
|
|
It's even worse when you have a lattice, a many-to-many tree that must be represented with a separate table. |
|
|
Oracle SQL has a thingie called CONNECT BY
(look it up in the manual). It sorta can do
what you want (it screws up if you have a
loop, which you wouldn't have in a tree,
but still...). Of course, I vote for this
feature; I've been in a few situations where
it would have been very welcome. |
|
|
This idea is most disappointing. I was hoping for some sort of database query performed by vegetables. |
|
|
Algorithms exist that associate numeric integer-based ranges with each node in a hierarchy such any child is guaranteed to be within the range of its parent. For example, the child could have a range of 100-200 and the parent a range of 80-1000. In practice, the ranges are *much* larger. Also each child node also has a reference to its parent. Of course each node is just modeled as a row in a relational table. |
|
|
Assigning ranges to a fixed tree that have this property is relatively simple, and with proper web searching you should find descriptions of such algorithms. Once a tree has range assignments, it becomes trivial to determine if one node is contained in another, or to search for all nodes contained in the subtree of another. And this can all be done in standard SQL. |
|
|
But lots of trees are *not* fixed. Instead, trees often need to be updated. Handling dynamic trees requires an incremental algorithm to assign ranges to the tree. This incremental assignment is much trickier to do than the fixed tree version. |
|
|
Still algorithms do exist to incrementally maintain such range information and have been successfully implemented in trees with millions of nodes. I know because I invented an incremental algorithm for range-based hierarchies and implemented it. There are no open source versions at this time. |
|
|
Doesn't the CONNECT BY construction in Oracle already come close to this (as of at least ten years ago)? |
|
|
Maybe, I've heard about, but never tried to use that one (I remember trying to write a single-shot script that traversed a node-tree - one table showing nodes, the other showing "edges" or connections - which, given two node names as parameters, determined the best route between them as an ordered result set of nodes in the path - it was pretty tricky as I was trying to avoid using Oracle specific code like CONNECT BY) - but the idea is 8 years old so that syntax may not have percolated up into common(ish) knowledge by then. |
|
|
However, representing a "graph" (directed or otherwise) seems to be a slightly different thing to this sort of dynamic "recursive aggregation" described in the idea - which at face value might be about (for example) a series of heirarchical sums.
e.g. Assuming you have a database of the people in the world, did a select count(*) and grouped by various fields, you'd be able to show different aggregations - i.e. Number of People per Individual = (Normally) 1, Number of People per Household = p, Number of People in Street = q, Number of People in County = r, Number of People in Region = s, Number of People in Country = t, Number of People in Continent = u, Number of People in Hemisphere = v, Number of People in World = w. Which is fine, but each aggregation is (notionally) a sort of child of the next, I'm wondering whether [panserg] is talking about sums of sums. |
|
|
Or maybe they're talking about something else entirely - to be honest, it's not clear. |
|
|
I'm not sure I completely understand the issue, but if I understand
correctly, one way I've seen this done is to concatenate the levels
as a path string. It's slow, but it works. Your primary key would be
(id), and you'd have unique indexes (id, pid) and (pid, id) for
traversing up and down. |
|
|
Example records:
id = 2000
pid = 1000
path = '0,20,50,1000,2000' |
|
|
id = 1000
pid = 50
path = '0,20,50,1000' |
|
|
id = 50
pid = 20
path = '0,20,50' |
|
|
id = 20
pid = 0
path = '0,20' |
|
|
The problem, surely, is that the QDL of an APSOP
can't map back, via SQL, to the originating MB3? |
|
|
Or am I missing something? Are we saying it's a
function of the limits or recursive depth, or is this
just a software power-law problem? |
|
|
I think I see through your schema, [MB] |
|
|
//The problem, surely, is that the QDL of an APSOP can't
map back, via SQL, to the originating MB3? // |
|
| |