Wednesday, November 21, 2007

If only it were that simple...

Both Kim and Paul picked up this post by Francis Shanahan about the fragmentation of our online information. The centerpiece of his post is his diagram representing our information spheres... here it is:


I like the diagram inasmuch as it STARTS to show the problem we face. I dislike it because it implies a structure and solution that WAY oversimplifies the problem.

Consider these two points and you'll see what I mean...

Think about the next level out beyond the blue boxes... the attributes. You'll notice that there is massive duplication of information all around the circle. This diagram totally fails to represent the interconnectedness of the data. The DataWeb is NOT a set of nicely ordered hierarchies, and diagrams that lock us into that way of thinking do us, I think, a disservice.

While this diagram neatly implies that the blue boxes can be canonically categorized, that is simply not true. My guess is that if we gave each of you the job of categorizing the blue boxes, you would come up with not only different groupings but different semantics for those groupings. Don't get me wrong; Francis's diagram is as valid a projection of order onto the mess as any. My complaint is that we, the people trying to solve these problems, must not get lulled into seeing only one dimension of this problem.

I think about the problem like this... Each of you does your own version of Francis's diagram but includes all the lines between the blue boxes that have data duplication. Also, don't limit yourself to putting each blue box as a child of only one green box... embrace the fact that World of Warcraft is a community, a social network and a gaming site. Once I have gathered all of your diagrams, I make them all semi-transparent and put them on top of each other. That diagram is a fair representation of the DataWeb.
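For those who think better in code, here is a rough Python sketch of that stacking step. It is only an illustration under my own assumptions: each person's diagram is modeled as a dictionary from green-box category to the set of blue boxes placed under it, and the `overlay` function and the category names are made up for the example. A blue box is allowed to sit under several green boxes at once.

```python
from collections import Counter


def overlay(diagrams):
    """Stack several diagrams and count how often each category-to-site line appears.

    Each diagram is a dict mapping a category name (green box) to a set of
    site names (blue boxes). A site may appear under more than one category.
    """
    counts = Counter()
    for diagram in diagrams:
        for category, sites in diagram.items():
            for site in sites:
                counts[(category, site)] += 1
    return counts


# Two hypothetical people categorizing the same site differently.
diagram_one = {
    "Gaming": {"World of Warcraft"},
    "Community": {"World of Warcraft"},
}
diagram_two = {
    "Social network": {"World of Warcraft"},
    "Gaming": {"World of Warcraft"},
}

stacked = overlay([diagram_one, diagram_two])
for (category, site), weight in sorted(stacked.items()):
    print(f"{category} -> {site}: drawn by {weight} of 2 people")
```

The counts play the role of the semi-transparency: a line that many people drew shows up dark, and a line only one person drew shows up faint.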

An interesting thing to notice is that the lines that go around the outer rim of the diagram are not, on the whole, subjective. We can build a 'rule' that says if two data points have the same value and the same update rules then they should be linked. In other words, the lines at the third level should line up fairly well from one person's diagram to the next. Notice that the linking rule is based on values, not on labels, because the semantic issues involved in looking at the labels add another level of complexity.
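The linking rule is simple enough to sketch in a few lines of Python. This is a minimal illustration, not a real implementation: the `Attribute` type, the site names, and the way an update rule is reduced to a plain string tag are all assumptions I am making for the example. The label is carried along but deliberately ignored by the rule.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Attribute(NamedTuple):
    site: str         # the blue box this attribute hangs off
    label: str        # what that site calls the attribute; not used for linking
    value: str        # the actual data
    update_rule: str  # hypothetical tag describing how the value may change


def link_attributes(attributes):
    """Return every pair of attributes sharing both value and update rule."""
    groups = defaultdict(list)
    for attr in attributes:
        groups[(attr.value, attr.update_rule)].append(attr)

    links = []
    for members in groups.values():
        # Every pair inside a group gets a line drawn between them.
        links.extend(combinations(members, 2))
    return links


attributes = [
    Attribute("SiteA", "email", "pat@example.com", "owner-edits"),
    Attribute("SiteB", "contact", "pat@example.com", "owner-edits"),
    Attribute("SiteC", "email", "pat@example.com", "admin-edits"),
]

links = link_attributes(attributes)
for left, right in links:
    print(f"{left.site}.{left.label} <-> {right.site}.{right.label}")
```

In this made-up data, SiteA and SiteB are linked even though their labels differ, while SiteC is left out despite sharing a label and a value with SiteA, because its update rule is different.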

So... to the point... What Francis described is exactly what I have been talking about for the last 3 years. If you take Francis's diagram, with my radial additions, and put it into a linear form instead of a radial one, you get exactly the graphs that I have been drawing for years: three levels, lines going up and down the levels, and lines going across the levels. This is no coincidence; there is a fundamental 'truth' about that representation that is much like, in my mind, Kim's laws. This truth is not me, or anyone else, saying 'you must do it this way'; it is us trying to point out 'this is the nature of the DataWeb'. What we have done, and I would be happy to spend time with any of you showing you this in detail, is specify a syntax for describing, precisely, the relationships in that web. Relational databases NEEDED ERD in order to get wide adoption; if people couldn't simply communicate, capture and represent the data models they were working with, how could they ever build large, complex systems? We need not just an abstract data model but a clear way to graphically represent that model.
