ebxml-regrep message

Subject: Perspective on Ad Hoc Queries
From: "Mark A. Hale" <mark.hale@interwoven.com>
To: 'ebXML Registry-Repository' <ebxml-regrep@lists.ebxml.org>
Date: Tue, 09 Jan 2001 20:43:26 -0800
Reg/Rep,

I have reviewed the specifications being discussed for the Registry and
Repository and have also been following the e-mails on the list.

I think that the members of RegRep are doing a great job, and frankly I
cannot offer anywhere near the expertise that you all have.  I would like to
stand back and, as an outside observer, offer another perspective on the
query debate that I saw posted.

There are two basic services that I desire when using a registry service for
lookup.  These are:

	Query:  The ability to locate some information based on a
              pre-defined taxonomy or classification (I believe that
              this is referred to as browse, drilldown, and focused
              query in your context)

	Search: The ability to locate some information based on
              the information content (I believe that this is
              referred to as Ad Hoc Query in your context)
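
As a rough illustration of the distinction, here is a minimal Python sketch.
Everything in it is an assumption made for the example: the Entry record, the
taxonomy paths, and the sample registry contents come from no ebXML
specification.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """A hypothetical registered item (not an ebXML-defined structure)."""
    name: str
    classification: tuple  # path in an invented taxonomy
    content: str           # free text describing the item


def query(entries, prefix):
    """Query (browse/drilldown): items classified at or under a taxonomy path."""
    prefix = tuple(prefix)
    return [e.name for e in entries if e.classification[:len(prefix)] == prefix]


def search(entries, term):
    """Search (ad hoc): items whose content mentions the term, ignoring case."""
    term = term.lower()
    return [e.name for e in entries if term in e.content.lower()]


# Invented sample data, for illustration only.
REGISTRY = [
    Entry("PurchaseOrder.dtd", ("Commerce", "Ordering"),
          "Purchase order with buyer, seller and line items"),
    Entry("Invoice.dtd", ("Commerce", "Billing"),
          "Invoice referencing a purchase order"),
    Entry("ShipNotice.dtd", ("Logistics", "Shipping"),
          "Advance ship notice for a delivery"),
]

by_class = query(REGISTRY, ["Commerce", "Billing"])
by_content = search(REGISTRY, "purchase order")
```

Here query() depends only on how an item was classified, while search()
depends only on what the item says, so the two can return different sets
for the same user intent.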

We have excellent models of both of these services on the internet today:
Google, Lycos, and the other search engines.  I would also note that almost
every major search engine today can respond to a search request made through
an HTTP POST request and return a reasonable sampling of results.
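
To make the HTTP POST case concrete, a search request of this kind might be
built roughly as below.  This is only a sketch: the endpoint URL and the
"q" and "max" form field names are hypothetical, and no real search engine
or registry is claimed to accept exactly this form.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(endpoint, terms, max_results=10):
    """Build (but do not send) a form-encoded HTTP POST search request.

    The field names "q" and "max" are assumptions made for this example.
    """
    body = urlencode({"q": terms, "max": max_results}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical endpoint; example.org is reserved for documentation use.
req = build_search_request("http://registry.example.org/search",
                           "purchase order")
```

Passing req to urllib.request.urlopen() would send it; the point is only
that a plain form POST is all the client needs to know how to do.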

In terms of an ebXML registry service, I anticipate that we can match if not
exceed the efficacy of these lookup services.  We have the following
advantages that they do not:

	Controlled Content:  All content is explicitly registered with
              a repository.  We do not need to rely on web crawlers
              to find it.

	Controlled Taxonomy:  We have the brightest people in the world
              in ebXML working on classifications and core components
              with taxonomies in their business domains.

	Controlled Structure:  A fair percentage of our documents will be
              in XML format.  We have access to an implicit set of meta-
              data in the markup alone.  What a tremendous opportunity
              to go beyond traditional search engines in capability.
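
As one possible reading of "implicit meta-data in the markup", the sketch
below indexes an XML document by element path, so that a search can be
limited to a particular part of the structure.  The sample document and its
element names are invented for the example and are not ebXML core
components.

```python
import xml.etree.ElementTree as ET


def markup_metadata(xml_text):
    """Map each element path to the non-empty text values found at that path."""
    root = ET.fromstring(xml_text)
    index = {}

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            index.setdefault(path, []).append(text)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return index


# Invented sample document, for illustration only.
SAMPLE = """
<Order>
  <Buyer><Name>Acme</Name></Buyer>
  <Item><Name>Widget</Name></Item>
  <Item><Name>Gadget</Name></Item>
</Order>
"""

index = markup_metadata(SAMPLE)
buyer_names = index.get("/Order/Buyer/Name", [])
```

With such an index a search for "Acme" can be restricted to buyer names
rather than matching the word anywhere in the text, which is something a
plain full-text engine cannot do.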

Given the current status of search engine capability, I think that we should
ensure that the capabilities of the de facto engines are met and then look
to excel.

I pose the following questions:

	- Given some of the debate on query languages (SQL, OQL, etc.):
        can I use my level zero interface (aka HTTP POST) to
        interface with the registry and get results equal to or
        better than those of current search engines?

	- Given that the de facto standard is for a search engine to be
        able to search content that it does not even own, can
        I search all content in the registry?

	- Given that we are using XML, what are we doing to harness
        what we know about the information structure to improve
        the registry query and search capabilities?

Again, this view is with respect to search and query.

I saw some good things in Tokyo and in recent weeks:

	- The ability to use an XML registry protocol defined well
        enough to reproduce a taxonomy on a client

	- The ability to register content that is both XML and non-XML

	- An interest in both content and classification level security
        within the registry

and many others.

	IMHO and thanks,

	Mark

References:
- RE: Review of thoughts on Ad Hoc Queries
  - From: David RR Webber <Gnosis_@compuserve.com>