Saturday, March 31, 2007

The quality of data is not strained

“The quality of data is not strained; It droppeth as the gentle rain from heaven Upon the place beneath. It is twice blessed- It blesseth him that gives, and him that takes.”– A bastardization of William Shakespeare.

I am often faced with the question; “Why don’t we just do this with our Web Services?” Generally when I’m asked that question it’s in relation to what I call Dataweb technologies. When I’m asked it in other contexts it makes even less sense.

There are many answers to this question and different ones tend to resonate with different people. One of the main qualities of the Dataweb that I strive for is the richness of interaction that one gets when accessing data, through ANSII SQL that is in a well designed schema. This is a quality that you only grock if you have spent time writing database reports or very data intensive apps. Those of us that have been there know that extracting information from a well written schema is a joy. In fact, given a little imagination and a reporting tool you can learn stuff from a well built data set that you didn’t know you knew. This phenomenon fueled a whole industry starting back in the mid 80’s when ODBC first hit our radar. We still build big data warehouses that we troll and derive new information and stats from, but only inside closed systems.

Back in the early 80’s all the data was locked up on the mainframes and we started writing PC apps that needed to access that data. Each time we wrote an app, we wrote a data driver to access the data we needed off the mainframe. There was very little reusability and no appreciation of the ‘value of data’. Then, along came ODBC, the first widely adopted manifestation of ANSII SQL, and everything changed. Now, you built an ODBC driver that could access your mainframe and went to town, you never had to write another custom driver again. This was the inflection point where we discovered that using a fluid, abstract, data access mechanism let us learn new things from the data we had already collected. The difference between those custom data drivers and the ODBC data access paradigm was that the drivers tightly bound the purpose of the access of the data to the mechanism for accessing it, while ODBC (SQL) provided an abstract mechanism that didn’t care what the data was or how it was going to be used. These qualities were inherent in the way we thought about those custom data drivers; when we designed and built them we built interface definitions getUser(), getInvoice(), etc… We used method invocation to access the data we needed. SQL provided us a way to query any schema in any way and ‘try new stuff’ without having to re-program our data access layer.

Given my example of getUser() and getInvoice(), what happened if I wanted to find out if there was any correlation between geographic region and annual total purchases… I was basically stuck waiting for the mainframe guys. With SQL in place I could slice and dice my 2 table schema (users and invoices) any way I wanted. I could look for patterns and play to my hearts content… but it wasn’t really play, it was the birth of business intelligence. Now that I could work out the profile of my best customers, I could target other people with that profile to become my new customers. How’s that as an unexpected outcome from a higher level of data access abstraction?

The way that we conventionally use Web Services today is not just akin to those old data drivers, it is the same thing. We know this, it’s inherent in the names of the protocols that we use, XML RPC, Remote Procedure Calls; method invocation. getUser() and getInvoice() would be very reasonable methods to see defined in a WSDL.

Now sometimes you need the quality of RPC, you don’t want people to be trolling through you data and deriving all sorts of stuff, you want to keep them on a tight leash, so use conventional Web Services. I call this integration pattern ‘application integration’, not data integration.

The protocols that support the Dataweb, XRI, Higgins, XDI, SAML, OpenID, WS-*, etc… provide mechanisms to access a distributed network of data with the same richness as if you were accessing a single data source via SQL, but with more control. Imagine doing a database join between two tables, now imagine doing a join between two heterogeneous, distributed systems… wouldn’t it be cool.

The qualities of an abstract data layer are; a well defined query language that can be used to access a well defined abstract data model that in turn returns a persistence-schema agnostic data representation. These qualities are shared by SQL, XDI and Higgins.

When contemplating a data abstraction for a distributed data network there are some other things that we have to add to the mix; trust frameworks, finer grain security, social and legal agreements, network optimization, fault tolerance, to name but a few… And that is what I spend a lot of my time thinking about.

So I hope that this describes somewhat why Dataweb technology is different from conventional Web Services implementations, although they run on the exact same infrastructure.

It is interesting to note, and I may be way off line here so if you know better please correct me, from what I’ve seen: SalesForce agrees with me. What I mean by that is that their new generation Web Services are some of the most abstract interfaces you are likely to see in a system that derives so much of its value from its programmatic interfaces. (Along with Kintera who we are working with). The only downside with the SalesForce approach is that it’s proprietary, which is a shame, when there are open standards that, appear, on the face of it, to satisfy their requirements. (SalesForce, I’d love to hear from you if you want to talk about this.)

Wednesday, March 14, 2007

Higgins IdAS and XDI

The more I look at the Higgins IdAS the more I recognize that it is the part of the puzzle in the Higgins world that maps fairly closely to what I call the XDI Engine. They both present abstract data interfaces that are meant to be put in front of legacy persistence. I have been telling Paul for a while that I think that IdAS is going to need indexing capability to be really useful. I realized, not long ago, that we need to replace the ooTao specific ‘plugin’ engine with an IdAS implementation. I am seeing more and more that once xdi takes into account the Higgins IdAS use cases and Higgins IdAS consumes the xdi use cases as sub sets of the complete ‘dataweb’ use cases that an xdi engine and an IdAS implementation are going to end up being pretty much identical. I watch the Higgins-dev list and learn and hope to contribute where I can.

In that light I am going to start putting more Higgins musings on this blog as well as xdi stuff.

Here is a thought provoked by a current discussion on the Higgins list:

The way that we are dealing with systemic and semantic mapping in xdi is by introducing an xri abstraction into the mix... attribute types are xris, generally in the '+' namespace, known as the 'dictionary space', like +email, or +first.name. Unlike the '=' and '@' namespaces the '+' namespace is not a rooted space, but I'll get back to that.

So in xdi land, any attribute name is resolvable in dictionary space to a dictionary entry, a dictionary entry may include a bunch of different stuff, including:

synonyms (both semantic (street and rue) and systemic(phone_number and phoneNumber))
schematic constraints (+address = must link to 1 or 2 streets, 1 city, 1 state and 1 zip... I KNOW that +address is a bad example because it's not a global construct)
validations (validation lists, real expressions (masks), executable validation scripts (different implementations in different languages))
UI implementations (for building rich UIs for arbitrary attribute types; +eye.color may provide a color picker that limits color choices to natural human color range as dcom, .class, xul, etc...)

So, in xdi land, as we build indexes of the various contexts (one of the primary 'qualities' of xdi is indexing the contexts it knows about so that you don't have to go trolling 200 contexts to find the attribute that you need about a given subject); rather than indexing the attribute type '+email' we index the canonicallized i-number that the i-name resolves to.. +!3215.2154.1254.

example:

xri://=andy/+email, 'andy's email address', points to a specific attribute in a specific context but what we persist in the index is xri://=andy/+!3215.2154.1254

Now when anyone wants to do a get against the index they can search for xri://=andy/+email or xri://=andy/+e_mail or xri://=andy/+Email or xri://=andy/+doar.hashmali (transliteration from Hebrew) and get back the desired record because the type is always resolved back to the i-number. On set operations the xdi engine checks the validations and schema constraints of the type before parsing the operation back into the 'context provider' to persist the new data.

I said that the '+' space is not rooted, so how does it resolve? Well, just like with English, you can look up a word in whatever dictionary you want, you might prefer Webster’s, personally I like the Oxford Standard. This quality lends itself to supporting a seamless continuum of global, community and personal dictionaries so you can be as precise or as vague with any given term as you like. A person can specify the intended dictionary for a given type: @ootao*(+email) would be ootao’s definition of +email and IS resolvable in the global @ namespace.

The early dictionary implementations that we are working with use a folksonomy approach to building the communal knowledge… anyone can edit the dictionary. So if your system uses a field name for an attribute that hasn’t been mapped yet, you just add it to the dictionary. Once one person has added the ldap schema and one person has added the vcard schema the world now knows that +cn, is the same as +fn and they are both instances of +!3211.5485.3656, which is also +full_name, +person.name, etc…

I’m not saying we have all of the problems solved. Off the top of my head I don’t know how we would express the transformation between givenName, sn and cn… But I could propose a few suggestions if anyone was interested.

Tuesday, March 13, 2007

More on CardSpace and XRI

I like CardSpace. I finally got it installed on my XP machine at home and have used it to log into Kims Blog. Installing it wasn’t as easy as I would have liked, it was a big download, a long install and then I had to get ‘special’ tech support in order to get it to work (by special I mean I had to call someone I know over at MS, in that department to help me). Now it turned out that it was an ‘obvious’ problem but on-line help was not easy to find and the error messages were not helpful. All I had to do was install IE7… another big download and install… BUT… that’s the price we pay for security :-)

I have seen CardSpace demos for years now, and have pondered the paradigm shift in the user login experience and have always liked it… Now that I’ve tried it I like it even more!! As a user experience this makes a lot of sense to me… and with some xri and xdi integration this thing could really rock :-0

There are 3 places that I would like to see xri and xdi integration into the CardSpace world. These opinions are based on a deep knowledge of xri and xdi and a pitiful understanding of anything beyond the basic mechanisms of WS-* that make up the CardSpace. I will try to explain the use cases and the properties of the interactions that I am looking for as I talk about these integration points and if there is alternate (better?) ways of solving the same problems I would love to hear about it.

Integration 1: Portability
One problem I still have with CardSpace is that my cards seem to be bound to a specific machine. If I create self issued cards at home and at the office and log in to Kim’s blog from both places; how do I get recognized as the same person?

In the Higgins project HBX card selector the Card Store is not on the client machine, it is ‘in the cloud’. I think that using i-names to bootstrap authenticating me and finding my card store would make CardSpace better. I want to walk up to any machine that is CardSpace enabled, enter my i-name, authenticate (using the multi-factor mechanism of MY choice) and have trusted resolution (not spoofable like DNS resolution) find my Card Store and let me use my cards. Now, I only have to log in once using my i-name, after that I just pick cards. Because I only have to log in once I’m fine jumping through a few multi-factor hoops to make sure that the authentication is solid. That would be cool!!

Integration 2: I-Name Authentication
OpenID is great a way to authenticate an i-name… but not the only way; I really like the ease of picking a card to login. BUT just because a card says “my i-name is =andy” does NOT mean it should be trusted. This is just the same as on Kim’s blog, my card asserted my email address but I still had to go through an email validation…you can’t trust self asserted claims!

So who should be able to make claims about i-name ownership… whoever the i-name owner wants…who the relying party is willing to trust… and here’s how that can work:

EZIBroker (an ooTao business) is about to start offering managed cards with i-name assertions. We hope that through our XDI.org accreditation and our general reputation we will become a trusted provider of assertions. But that isn’t enough… The owner of the i-name needs to ‘show’ that they have selected EZIBroker as their token service. They can do that by adding a Service Block of type “managed card’ to their i-name record (XRDS). So, a relying party, on receipt of an assertion that the ‘bearer’ of this card is the rightful user of the i-name ‘=andy’ should do 2 validation checks… 1) They should check that the asserting party is who they say they are and that they are trusted by the RP to make the claim and 2) They should perform xri resolution to check that the XRDS for that i-name does, in deed, designate that Token Service as the claims provider for that i-name (the theory being that only the i-name ‘holder’ can change the XRDS). XRI resolution should be performed by the RP anyway to persist the i-number as well as, or in stead of, the i-name.

Integration 3: Pointers as Data
This is close to the heart of my real passion… distributed data management. When an RP asks for an email address I want to be able to return either an email address OR a pointer to an email address. Today if an RP asked for an email address and got back an xri (or uri) I would expect it to be upset… and that’s why we need integration. There are use cases where you want to push the data to the RP, but there also use cases where having the RP be able to pull data on demand can be very useful (like current temp in your location so we know how much beer to deliver). In XDI land the response to any request can be one of 2 things… data or a pointer to data. In card space land the response can only be data (as I understand it). If the response is a pointer to data then the RP has to know how to dereference the pointer.. of course you want the protocols that support the pointer to protect privacy, have fine grained security, have link contracts, have pull and push cache syncornization… be xdi :-)

So at a REALY high level those are the 3 points of integration that I am interested in seeing between XRI, XDI and CardSpace. (The 4^th one that I have talked about on the TC calls is really an integration with the Higgins IdAS service not CardSpace so that will go in a different post).

I will dig into these more as time lets me… I’ll let you know when you can get i-name cards at EZIBroker i-brokers.

The Tao of XDI

Saturday, March 31, 2007