ebxml-regrep message

Subject: Re: A DBA perspective.
From: David RR Webber <Gnosis_@compuserve.com>
To: Michael Rowley <mrowley@exceloncorp.com>
Date: Thu, 11 Jan 2001 12:17:43 -0500
Michael,

Reading your replies below I'm concerned that there is a major
disconnect here.   I am grappling with the metrics here and perhaps
I can lay out some of them so that I can see the differences correctly?

My particular view is based on a high level conceptual business
functional view, coupled with nearly 25 years of working with 
literally every major database system on the planet, in some of the
largest installations on the planet, and then some research stuff
nobody has ever heard of - plus now working for a
vendor of a leading XML search tool - and having evaluated
much of the XML competition head-to-head.

So - where does this all take us?  I'd like to think we can deliver 
something here that is completely implementation agnostic.
Therefore to achieve that means explicitly NOT diving down into
specific foobarSQL or foobarXML, or whatever syntax specifics.
Letting go of that - and saying - what business functional API 
should we expose?

On the one hand we have been told - ad hoc querying - query 
anything you like.

Hard experience is teaching us, and showing us here, that
this does not work.  Why?  First of all - users will need to 
have far too much knowledge of the internals of the 
Registry information model - and frankly this stuff is non-trivial
to ingest - as can be seen from your comments below - 
stating this won't work, and that won't work - but this is 
HIGHLY dependent on the information content itself and
how it is structured.  Sometimes it may work, sometimes
not - depending who built the underlying coupling and how.

I suspect the only people running true ad hoc queries 
will be the DBA's inside the firewall of the registry host - and
then we do not care what query language they pick for that!

So what is the solution?   I believe we do need to listen to
the DBA world - and that stored procedures that have
documented behaviour just plain makes sense.

Yes you end up with a ton of these.  Ever seen the 
library the average DBA has?   Bottom line is the 
END USERS could not care less - all they want is
business results - and they do not want to have to
understand all the internals to get them - and frankly
neither do I at this point!

I've been in those trenches long enough - building SQL
for Oracle, Sybase, SQLServer, DB2, and having to 
tweak statements to make them work across 
implementations, and then having switch statements
inside of stored procedures for this, that and the other.

But at the end of the day - the end user invokes the
stored procedure - passes in the business values
they want to query on - and gets the results.  To them
it should just be an API.

Also - notice we should NOT be trying to solve world
hunger.   This is version 1.0 - there is still so much 
that is in flux around us - we should be focusing on a
limited, simple and clear approach.  TRP chose 
MIME for these same reasons - it's not perfect - but
it can get a job done - not all jobs.

A limited subset of ad hoc querying based on 
pre-defined building blocks now appears to 
be the best hope for us.  Using these set queries
we should aim to answer the obvious business
needs.  Why would someone want an ad hoc 
query anyway?   Let's explore this (and BTW - 
there are registries out there already with EDI
flavourings - so you can get a feel for this). These
show you that highly focused queries solve
a great deal of the access problems not covered
by drilldown alone, and the overwhelming 
detail problem that a simple keyword search
delivers.  That's the middle ground we need.

The three business cases that jump out are
business process and transactional, and
then business specific - i.e.

1) show me business process entries to do with 
     invoicing for blue cross blue shield (this may
     well be a menu option off a drill-down interface).

2) show me all element definitions for a
     transaction schema foobar (obvious get details
     shortcut to save time).

3) show me CPP entries for plumbers in 
     Cleveland, Ohio (content specific querying).
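
Seen from the caller's side, case 3 might look something like
the Python sketch below.  This is only an illustration under
stated assumptions: every name in it (ProcedureCatalog,
find_cpp_entries, the sample rows) is hypothetical, and the
in-memory list stands in for whatever storage and query
syntax an implementor actually picks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Procedure:
    """A named query with documented behaviour (hypothetical structure)."""
    name: str
    doc: str
    params: List[str]
    run: Callable[..., List[dict]]


class ProcedureCatalog:
    """Holds the procedures a registry host chooses to expose."""

    def __init__(self) -> None:
        self._procs: Dict[str, Procedure] = {}

    def register(self, proc: Procedure) -> None:
        self._procs[proc.name] = proc

    def invoke(self, name: str, **values: str) -> List[dict]:
        proc = self._procs[name]  # KeyError if the procedure is not exposed
        missing = [p for p in proc.params if p not in values]
        if missing:
            raise ValueError(f"missing values: {missing}")
        # Pass only the documented business values through to the query.
        return proc.run(**{p: values[p] for p in proc.params})


# Invented sample data; a real registry would query its own store here.
_CPP_ROWS = [
    {"party": "Acme Plumbing", "trade": "plumber",
     "city": "Cleveland", "state": "OH"},
    {"party": "Lake Erie Electric", "trade": "electrician",
     "city": "Cleveland", "state": "OH"},
    {"party": "Buckeye Pipes", "trade": "plumber",
     "city": "Columbus", "state": "OH"},
]


def _find_cpp_entries(trade: str, city: str, state: str) -> List[dict]:
    return [r for r in _CPP_ROWS
            if r["trade"] == trade
            and r["city"] == city
            and r["state"] == state]


catalog = ProcedureCatalog()
catalog.register(Procedure(
    name="find_cpp_entries",
    doc="Return CPP entries for a trade in a given city and state.",
    params=["trade", "city", "state"],
    run=_find_cpp_entries,
))

results = catalog.invoke("find_cpp_entries",
                         trade="plumber", city="Cleveland", state="OH")
print([r["party"] for r in results])  # ['Acme Plumbing']
```

The caller supplies business values only; nothing about the
information model or the query language leaks through.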

Foul - I hear you crying - we can't possibly 
build all those procedures - and yes indeed
we cannot.  But if we empower the deployment
staff who understand their industry and their
business needs - then we do not have to.

Reuse is all about this - and the Registry should
leverage reuse for its own purpose.

Exposing a drilldown access path to a Registry 
that shows you the available ad hoc query
procedures is a simple and obvious mechanism.
These can be tied to forms to allow users to 
point and click and add search values, and they
can be combined to allow smart filtering.  Notice
DBA's are very good at all this stuff.

In an A2A environment - you can discover these 
procedures and access that API set remotely,
(DBA's may choose to limit these procedures
depending on business volume and performance
impacts).
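
A rough Python sketch of that discovery and combining step
follows.  It assumes, purely for illustration, that each
exposed procedure returns a set of object identifiers, so
that combining procedures is a set intersection.  The
procedure names, the URNs and the sample data are all
invented.

```python
from typing import Callable, Dict, List, Set, Tuple

# Invented registry content: object id -> attributes.
OBJECTS = {
    "urn:example:1": {"topic": "invoicing", "owner": "Blue Cross Blue Shield"},
    "urn:example:2": {"topic": "claims", "owner": "Blue Cross Blue Shield"},
    "urn:example:3": {"topic": "invoicing", "owner": "Some Other Payer"},
}


def by_topic(topic: str) -> Set[str]:
    """Ids of business process entries on the given topic."""
    return {oid for oid, o in OBJECTS.items() if o["topic"] == topic}


def by_owner(owner: str) -> Set[str]:
    """Ids of business process entries owned by the given organisation."""
    return {oid for oid, o in OBJECTS.items() if o["owner"] == owner}


# The set of procedures the registry host has decided to expose.
PROCEDURES: Dict[str, Callable[[str], Set[str]]] = {
    "by_topic": by_topic,
    "by_owner": by_owner,
}


def describe() -> List[dict]:
    """What a drilldown page or a remote application could discover."""
    return [{"name": name, "doc": func.__doc__}
            for name, func in sorted(PROCEDURES.items())]


def combine(*calls: Tuple[str, str]) -> List[str]:
    """Intersect the results of several (procedure name, value) calls."""
    sets = [PROCEDURES[name](value) for name, value in calls]
    return sorted(set.intersection(*sets)) if sets else []


print([d["name"] for d in describe()])
print(combine(("by_topic", "invoicing"),
              ("by_owner", "Blue Cross Blue Shield")))
```

A form generator only needs the output of describe(); a host
that wants to limit load simply leaves a procedure out of
PROCEDURES.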

Therefore I am focusing on creating a mechanism
that is NOT tied to any one query syntax, and 
specifically does not allow completely random 
ad hoc querying, and does not rely on any 
specific implementation tricks with highly specific
low-level storage details in XML.  

The storage relationships and partitioning should 
conform to the information model - that is all we 
care to know - and notice that
is also only the level of detail that OASIS have 
specified.

OK - enough typing already - I need to get back
to the summary WP, I hope I have painted a 
general picture here.

Looking again at the Q&A below - the exact details
of the querying 'how to' with a specific query 
technology should not concern us.   Experience has
taught me that I can solve any information 
management problem given the ability to work from
a sound conceptual design.  We have the design - 
we need to stick with that - and not engage in this
foobar syntax can or cannot do ABC.  Implementors
will take whatever steps are needed to solve those
issues - we do not have the bandwidth to solve them
for them - nor should we be trying to anticipate 
what problems they may find - because we cannot.

What we can specify is the high-level business 
functional API.

I vote for a constrained external ad hoc querying 
mechanism based on discrete stored procedures.

Internally developers can use whatever ad hoc
querying syntax they care to.

Thanks, DW.

=============================================
Message text written by Michael Rowley
>
Farrukh and I tried to come up with a procedure-based scheme last Friday 
and ran into problems, so that by the time we were out of time, it was 
looking dubious.

> Here's the updated DTD and sample, and 
> a first cut at a list of RIM Procedures and Groups.
> 
> RIM Procedures
> 
> o  Classification
> 
> - getClassifedObjects

Wouldn't you want to be able to get objects that have some specified 
classification?  Then, the next question is how to specify the 
classification.  You could do it by a "geography.asia.japan.tokyo" kind 
of scheme, but you would probably want to be able to have wild cards for 
a level or even for a set of levels.
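
As a sketch of what such wild cards could mean, the Python below assumes 
a syntax that is not defined anywhere in this thread: "*" stands for 
exactly one level and "**" for any run of levels, including none.  The 
function name is also made up.

```python
from typing import List


def matches(pattern: str, path: str) -> bool:
    """True if a dotted classification path satisfies the pattern."""
    return _match(pattern.split("."), path.split("."))


def _match(pat: List[str], parts: List[str]) -> bool:
    if not pat:
        # The pattern is used up: succeed only if the path is too.
        return not parts
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Let "**" swallow 0, 1, 2, ... levels and try the rest each time.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])
    return False


path = "geography.asia.japan.tokyo"
print(matches("geography.asia.japan.tokyo", path))  # True
print(matches("geography.asia.*.tokyo", path))      # True
print(matches("geography.**.tokyo", path))          # True
print(matches("geography.*.tokyo", path))           # False
```

Even this small matcher has to be specified somewhere, which is part of 
the cost of a procedure-based scheme.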

> o  Association
> 
> - getAssociatedObjects

Is the intent here that you are getting the objects that have been 
associated with a designated object?  Is the designated object the 
source or target?

There are many attributes on associations.  You may want to get all 
associations from a designated source or target role, or only 
associations of a certain type, etc.  This gets hairy pretty fast.
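
To show how quickly the parameters pile up, here is a sketch in Python.  
The field names are my own guesses for illustration and are not taken 
from the RIM; the point is only that every attribute becomes one more 
optional argument.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Association:
    # Hypothetical fields, chosen only to illustrate the problem.
    source: str
    target: str
    assoc_type: str
    source_role: str
    target_role: str


def get_associations(
    assocs: Iterable[Association],
    *,
    source: Optional[str] = None,
    target: Optional[str] = None,
    assoc_type: Optional[str] = None,
    source_role: Optional[str] = None,
    target_role: Optional[str] = None,
) -> List[Association]:
    """Return associations matching every filter that was supplied."""
    wanted = {
        "source": source,
        "target": target,
        "assoc_type": assoc_type,
        "source_role": source_role,
        "target_role": target_role,
    }
    # Filters left as None are ignored; the others must all match.
    active = {k: v for k, v in wanted.items() if v is not None}
    return [a for a in assocs
            if all(getattr(a, k) == v for k, v in active.items())]


sample = [
    Association("doc1", "schema1", "uses", "consumer", "definition"),
    Association("doc1", "doc2", "supersedes", "new", "old"),
    Association("doc3", "schema1", "uses", "consumer", "definition"),
]

print(len(get_associations(sample, source="doc1")))
print(len(get_associations(sample, target="schema1", assoc_type="uses")))
```

Five optional filters already, and that is before deciding whether the 
call should return the associations or the objects at the other end.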

> o  Package
> 
> - get ObjectsByPackage

How would you specify the package?  Packages don't have a global 
namespace, do they?

Wouldn't you also want to be able to get the packages that a specified 
object is in?

In general, I believe this approach can work.  The number of procedures 
will be very large and some will be awkward due to a large number of 
parameters that need to be specified.

> 
> The ebXML Registry Ad Hoc Query Accessing 
> 
> Pass an XML command container that consists
> of:
> 
> 1)  Locator + comparator + (value | valuelist) = query term(s).
> 
>      a) Valid Locators:
> 
>          (i)   A tag name (any path context)
>          (ii)   Tag root path (specific path context)
>          (iii)  A supported Registry Procedure name.
>          (iv)  GUID or UID or URN
>          (v)  "*"   =  match any text  (HTML style on content)

This section confuses me.  By "tag name" do you mean any XML tag name 
(i.e. element name) that exists in the registry?  As soon as you are doing 
things along the lines of querying for XML elements whose contents match 
some string, you are practically reinventing XPath.

I would have guessed that if you did not want to use a general purpose 
query mechanism (like XPath) that you would not include queries of 
arbitrary XML elements in the query language.  Instead, I would expect 
you to limit the search to the procedures you mention above, possibly 
with the addition of unstructured keyword searching.

Michael



<