ebxml-regrep message

Subject: RE: Simple Boolean Query Proposal.
From: JP Morgenthal <jp.morgenthal@xmls.com>
To: David RR Webber <Gnosis_@compuserve.com>, <matt@xmlglobal.com>
Date: Mon, 08 Jan 2001 10:57:19 -0500
I like it. Good research work, David, and an excellent job explaining it.

As I stated in my first proposal on this point, OQL/SQL is too heavyweight
for what we're doing here. At this time we need only simple boolean
queries. I say we move in this direction and let the POC tell us if they
need more power for ad hoc queries.

JP 

-----Original Message-----
From: David RR Webber
To: INTERNET:matt@xmlglobal.com
Cc: duane; mrowley@exceloncorp.com; RegRep
Sent: 1/7/2001 6:39 PM
Subject: Simple Boolean Query Proposal.

Message text written by INTERNET:matt@xmlglobal.com
>Fine by me.  XPath is easier to learn than QUILT, and I think XPath
would do just fine.

Now for something different.....

Why wouldn't ebxml regrep simply define a set of calls which allow a
user/app to search for certain things?  The UDDI initiative did this with
their find_*() methods.  If the group were to do this, there would be no
QL holy war, as vendors could use whatever technology they are married
to to fulfill queries.  Vendors could extend the core to perform more
complex queries (they will anyway).

Comments?

-Matt<

>>>>>>>>>>>>>>>>>>>>>>>

OK, Matt, remember what was published four months ago?

This is based on work that Mike Kass at NIST and I did,
and on the Smithsonian query application, which had to be
database-agnostic.

Also, if you go to http://www.monster.com, there are 445,000
job vacancies, which equates pretty well to a directory
of 445,000 businesses that need something or have a
speciality.

Of course, they are using HTML-style searching right now,
but it is ad hoc querying nonetheless.

Let's revisit this and see what people reckon is the best approach,
given what we've learned in the last four months.

All this XPath/XSL techie detail is well and good, but what
about the actual business requirements here?!

As I see it, the main reason Farrukh wanted OQL was so
that we could head off UDDI and do ad hoc queries against
CPP (Collaboration Protocol Profile) company profiles in the
ebXML Registry. Do we really need complex searching
technologies here to get the business job done?

Having looked at Monster.com, my assertion is that
we can get away with a simple boolean search as a first pass
and get 95% of the business functionality, if not 100%!

What is a simple boolean search?

1)  Locator + comparator + value = query term(s).

     a) Valid Locators:

         (i)    A tag name (any path context)
         (ii)   Tag root path (specific path context)
         (iii)  GUID or UID or URN
         (iv)   "*" = match any text (HTML-style search on content)

2)  Valid Comparators:

         (i) EQUALS-STRING 
         (ii) EQUALS-DATE
         (iii) LESS-THAN
         (iv) GREATER-THAN
         (v) IS-EARLIER-THAN
         (vi) IS-LATER-THAN
         (vii) CONTAINS-STRING

   These are based on the OASIS registry query list drafts.
   Note that if we wanted a subset, we could drop the three
   date-specific comparators (EQUALS-DATE, IS-EARLIER-THAN,
   IS-LATER-THAN) and do string comparisons only, since current
   XML is not typed on dates. Given that we're looking for
   businesses a la UDDI, this is most likely acceptable for
   version 1.0.

3) Value = string or date string (YYYY-MM-DD only)

4) Valid joins: OR, AND, NOT OR, NOT AND
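To make the term model concrete, here is a minimal Python sketch of how a registry might evaluate such a query against one record. It rests on assumptions the proposal does not pin down: a record is a flat mapping from tag name to text value (so only the tag-name locator is modelled), terms are combined strictly left to right with each term's join linking it to the next, NOT AND / NOT OR mean "AND NOT next term" / "OR NOT next term", and LESS-THAN / GREATER-THAN compare numerically. `Term`, `evaluate` and the sample record are illustrative names only.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical comparator table. Date values are YYYY-MM-DD strings (item 3);
# LESS-THAN / GREATER-THAN are assumed to be numeric comparisons.
COMPARATORS = {
    "EQUALS-STRING": lambda actual, wanted: actual == wanted,
    "CONTAINS-STRING": lambda actual, wanted: wanted in actual,
    "EQUALS-DATE": lambda a, w: date.fromisoformat(a) == date.fromisoformat(w),
    "IS-EARLIER-THAN": lambda a, w: date.fromisoformat(a) < date.fromisoformat(w),
    "IS-LATER-THAN": lambda a, w: date.fromisoformat(a) > date.fromisoformat(w),
    "LESS-THAN": lambda a, w: float(a) < float(w),
    "GREATER-THAN": lambda a, w: float(a) > float(w),
}


@dataclass
class Term:
    tag: str       # tag-name locator only (any path context)
    operator: str  # a key of COMPARATORS
    value: str
    join: str      # OR, AND, NOT AND, NOT OR, or END-QUERY on the last term


def term_matches(term: Term, record: dict) -> bool:
    """True if the record holds the tag and its value satisfies the operator."""
    actual = record.get(term.tag)
    return actual is not None and COMPARATORS[term.operator](actual, term.value)


def evaluate(terms: list, record: dict) -> bool:
    """Fold terms left to right; each term's join links it to the next term."""
    if not terms or terms[-1].join != "END-QUERY":
        raise ValueError("query must end with an END-QUERY term")
    result = term_matches(terms[0], record)
    for prev, term in zip(terms, terms[1:]):
        hit = term_matches(term, record)
        join = prev.join.replace("-", " ")  # accept "NOT AND" or "NOT-AND"
        if join == "AND":
            result = result and hit
        elif join == "OR":
            result = result or hit
        elif join == "NOT AND":  # assumed to mean: AND NOT <next term>
            result = result and not hit
        elif join == "NOT OR":   # assumed to mean: OR NOT <next term>
            result = result or not hit
        else:
            raise ValueError(f"unexpected join {prev.join!r} before the last term")
    return result


record = {"country": "FRANCE", "grape": "MERLOT", "listed": "2000-11-15"}
query = [
    Term("country", "EQUALS-STRING", "FRANCE", "AND"),
    Term("grape", "CONTAINS-STRING", "MERLO", "END-QUERY"),
]
print(evaluate(query, record))  # True
```

Left-to-right folding keeps the model as flat as the proposal intends: no parentheses and no operator precedence to specify.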

Now, to insert these into the DTDs for registry access,
you simply need to add:

<!ELEMENT RequestItem (RegistryItem)>
<!ATTLIST RequestItem
  Action (queryURI | queryContent | queryRAW ) #REQUIRED
>

<!-- Allows specification of Registry information model -->
<!-- compliant references to actual content.            -->

<!ELEMENT Locator (term+)>

<!ELEMENT term EMPTY>
<!ATTLIST term
  tagpath CDATA #REQUIRED
  tagmode ( TAG | PATH | GUID | UID | URN | ANY ) #REQUIRED
  operator ( EQUALS-STRING | EQUALS-DATE | CONTAINS-STRING |
             IS-EARLIER-THAN | IS-LATER-THAN | GREATER-THAN |
             LESS-THAN ) #REQUIRED
  value CDATA #REQUIRED
  join ( OR | AND | NOT-OR | NOT-AND | END-QUERY ) #REQUIRED
>

A sample query would then be:
 <RequestItem
    Action="queryURI">
    <RegistryItem/>
 </RequestItem>
 <Locator>
    <term  tagpath="country" tagmode="TAG"
           operator="EQUALS-STRING" value="FRANCE"
           join="AND" />
    <term  tagpath="grape" tagmode="TAG"
           operator="CONTAINS-STRING" value="MERLO"
           join="END-QUERY" />
 </Locator>
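A query in this form can be read with Python's standard `xml.etree.ElementTree` and run against an XML entry, which shows how the TAG, PATH and "*" (ANY) locators might resolve. This is a sketch only: the `<Query>` wrapper element and the `<Profile>` entry are invented for illustration; the GUID/UID/URN locators, the date and numeric comparators, and the NOT joins are left out; and a term is assumed to match if any located element satisfies it.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the sample query, wrapped in a hypothetical <Query>
# root so that the two top-level elements parse as one document.
QUERY_XML = """
<Query>
  <RequestItem Action="queryURI">
    <RegistryItem/>
  </RequestItem>
  <Locator>
    <term tagpath="country" tagmode="TAG"
          operator="EQUALS-STRING" value="FRANCE" join="AND"/>
    <term tagpath="grape" tagmode="TAG"
          operator="CONTAINS-STRING" value="MERLO" join="END-QUERY"/>
  </Locator>
</Query>
"""

# Invented registry entry to search; not taken from any ebXML schema.
ENTRY_XML = """
<Profile>
  <Address><country>FRANCE</country></Address>
  <Products><grape>MERLOT</grape><grape>SYRAH</grape></Products>
</Profile>
"""


def locate(entry, tagpath, tagmode):
    """Return the text values that a locator points at in the entry."""
    if tagmode == "TAG":   # tag name in any path context
        return [e.text or "" for e in entry.iter(tagpath)]
    if tagmode == "PATH":  # specific path below the root, e.g. "Address/country"
        return [e.text or "" for e in entry.findall(tagpath)]
    if tagmode == "ANY":   # "*": all text content of the entry
        return ["".join(entry.itertext())]
    raise NotImplementedError(f"tagmode {tagmode} is not covered by this sketch")


def term_matches(entry, term):
    """A term matches if any located value satisfies its operator."""
    values = locate(entry, term.get("tagpath"), term.get("tagmode"))
    operator, wanted = term.get("operator"), term.get("value")
    if operator == "EQUALS-STRING":
        return any(v == wanted for v in values)
    if operator == "CONTAINS-STRING":
        return any(wanted in v for v in values)
    raise NotImplementedError(f"operator {operator} is not covered by this sketch")


def run_query(query_root, entry):
    """Fold the <term> elements left to right using AND/OR joins only."""
    terms = query_root.find("Locator").findall("term")
    result = term_matches(entry, terms[0])
    for prev, term in zip(terms, terms[1:]):
        hit = term_matches(entry, term)
        join = prev.get("join")
        if join == "AND":
            result = result and hit
        elif join == "OR":
            result = result or hit
        else:
            raise NotImplementedError(f"join {join} is not covered by this sketch")
    return result


query = ET.fromstring(QUERY_XML.strip())
entry = ET.fromstring(ENTRY_XML.strip())
print(run_query(query, entry))  # True
```

Note that CONTAINS-STRING with "MERLO" matches the MERLOT entry, so a substring comparator already gives the loose, HTML-style matching a directory search needs.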

Since Farrukh has already implemented a set of functions
to retrieve context (drill-down) that give you the tag name,
tag root path, or GUID/UID, this is all we need.

A couple of pages in the spec and it's done. Use whatever
backend storage or database technology you want.
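Because each term is only a locator, a comparator and a value, a backend can translate queries into its own query language. As one illustration, here is a hedged Python sketch that turns tag-name terms into a parameterized SQL WHERE clause and runs it with the standard `sqlite3` module. The `items`/`attrs` tables (an entity-attribute-value layout), the sample rows and `to_sql` are hypothetical; joins are applied left to right by parenthesizing, and NOT AND / NOT OR are again assumed to mean AND NOT / OR NOT.

```python
import sqlite3

# SQL condition for each comparator; "a.value" is the stored text value.
# YYYY-MM-DD strings sort in date order, so date comparators compare text.
SQL_OPS = {
    "EQUALS-STRING": "a.value = ?",
    "CONTAINS-STRING": "instr(a.value, ?) > 0",  # case-sensitive substring test
    "EQUALS-DATE": "a.value = ?",
    "IS-EARLIER-THAN": "a.value < ?",
    "IS-LATER-THAN": "a.value > ?",
    "LESS-THAN": "CAST(a.value AS REAL) < CAST(? AS REAL)",
    "GREATER-THAN": "CAST(a.value AS REAL) > CAST(? AS REAL)",
}
# Assumed meaning of the NOT joins: negate the following term.
SQL_JOINS = {"AND": "AND", "OR": "OR", "NOT AND": "AND NOT", "NOT OR": "OR NOT"}


def to_sql(terms):
    """Turn (tag, operator, value, join) tuples into a WHERE clause and parameters."""
    clause, params, pending_join = None, [], None
    for tag, operator, value, join in terms:
        cond = ("EXISTS (SELECT 1 FROM attrs a WHERE a.item_id = items.id"
                f" AND a.tag = ? AND {SQL_OPS[operator]})")
        params += [tag, value]
        if clause is None:
            clause = cond
        else:
            # Parenthesize what we have so far, so joins apply left to right.
            clause = f"({clause}) {SQL_JOINS[pending_join.replace('-', ' ')]} {cond}"
        pending_join = join
    return clause, params


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attrs (item_id INTEGER, tag TEXT, value TEXT);
    INSERT INTO items VALUES (1, 'Chateau A'), (2, 'Bodega B');
    INSERT INTO attrs VALUES
        (1, 'country', 'FRANCE'), (1, 'grape', 'MERLOT'),
        (2, 'country', 'SPAIN'), (2, 'grape', 'TEMPRANILLO');
""")
where, params = to_sql([
    ("country", "EQUALS-STRING", "FRANCE", "AND"),
    ("grape", "CONTAINS-STRING", "MERLO", "END-QUERY"),
])
rows = db.execute(f"SELECT name FROM items WHERE {where}", params).fetchall()
print(rows)  # [('Chateau A',)]
```

The same terms could just as well be mapped onto an XML store or an OQL engine; nothing in the term format ties it to SQL.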
       
Any questions?!

Thanks, DW.