Re: Proposal for ebXML Regrep Query

ebxml-regrep message

Subject: Re: Proposal for ebXML Regrep Query

From: Farrukh Najmi <najmi@east.sun.com>
To: Duane Nickull <duane@xmlglobal.com>
Date: Thu, 01 Feb 2001 15:56:49 -0500

Duane,

DISTINCT in itself is not dangerous to performance in a tuned
implementation. In a tuned schema (as opposed to a toy implementation) one
or more attributes would be declared as primary keys, which are always
indexed, and additional attributes would be indexed as well. The result
would be blazing fast comparisons, because duplicates can be detected from
the index rather than by sorting the full result set in memory.
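
As a rough illustration of the indexing point, here is a minimal sketch using
SQLite from Python as a stand-in engine. The thread does not name a database or
a schema, so the table name `registry`, the index name `idx_party`, and the
column layout are assumptions built from the Party/ID sample quoted below. It
runs the same DISTINCT query with and without an index on the queried column
and prints the plan the engine reports for each; the exact plan text varies by
SQLite version.

```python
import sqlite3

# Hypothetical sample data, taken from the Party/ID listing quoted in this thread.
ROWS = [
    ("CommerceOne", 12334),
    ("XML Global", 2334),
    ("DataChannel", 9993),
    ("IBM", 2119),
    ("CommerceOne", 23334),
]

QUERY = "SELECT DISTINCT party FROM registry"


def build(indexed: bool) -> sqlite3.Connection:
    """Create an in-memory table; optionally index the non-key column."""
    conn = sqlite3.connect(":memory:")
    # The primary key is indexed implicitly by the engine.
    conn.execute(
        "CREATE TABLE registry (id INTEGER PRIMARY KEY, party TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO registry (party, id) VALUES (?, ?)", ROWS)
    if indexed:
        # The "additional attribute that is also indexed".
        conn.execute("CREATE INDEX idx_party ON registry (party)")
    return conn


def distinct_parties(conn: sqlite3.Connection) -> list:
    """Run the DISTINCT query and return the values in sorted order."""
    return sorted(row[0] for row in conn.execute(QUERY))


def query_plan(conn: sqlite3.Connection) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


plain = build(indexed=False)
tuned = build(indexed=True)

print("no index:  ", query_plan(plain))
print("with index:", query_plan(tuned))
print("result:    ", distinct_parties(tuned))
```

Both variants return the same four parties. What differs is how the engine
gets there, which is the part worth comparing on a realistically sized table
before deciding whether DISTINCT is a performance risk.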

Fear around DISTINCT is unwarranted. The real question is whether the need
for DISTINCT semantics can be justified by real-world use cases.

Duane Nickull wrote:

> Len:
>
> The work you have submitted has some good points and also some points
> which I believe could cripple the performance of the Registry,
> should they be implemented.  The particular OQL construct I am concerned
> about is the use of an SQL "DISTINCT" in the query string.  In order for
> us all to understand this,  here is a quick tutorial on what DISTINCT
> does and why it is so dangerous to performance:
>
> ************************
>
> The DISTINCT keyword is used to return only distinct (different) values.
>
> To understand how it differs from a plain SELECT, let's look at SELECT.
> The SQL SELECT statement returns all information from table columns,
> regardless of whether there are duplicates in the columns. DISTINCT takes
> the basic results from SELECT, loads them into memory and runs a compare
> type function against them to weed out duplicates.  The process is
> *very* processor expensive.
>
> For SQL, all we need to do is add the DISTINCT keyword to the SELECT
> statement, as the example below shows.
>
> Imagine we have the following in a Registry:
>
> Party            ID
> CommerceOne      12334
> XML Global       2334
> DataChannel      9993
> IBM              2119
> CommerceOne      23334
>
> If we run the statement:
>
> SELECT  Party FROM ID
>
> It will return:
>
> Party
> CommerceOne
> XML Global
> DataChannel
> IBM
> CommerceOne
>
> Note that CommerceOne has appeared twice in the results.
>
> By using the statement below instead, duplicates are removed from the result:
>
> SELECT  DISTINCT Party FROM ID
>
> Party
> CommerceOne
> XML Global
> DataChannel
> IBM
>
> This time CommerceOne is listed only once.  We may want this
> sometimes, but why do we need it in ebXML?
>
> In order to run DISTINCT against a large table and column structure, a
> *HUGE* amount of memory needs to be allocated for sort and compare
> utilities.
>
> USE CASE QUESTION:
>
> ebXML has mandated the use of Globally Unique Identifiers.  A query on
> a UID key will therefore only return one result.  If it returns two or
> more results,  then there is an error and the UID key is not unique.
>
> As far as I know,  there are no other keys which are required to be
> unique by ebXML.  Therefore,  why is the select DISTINCT proposed?
>
> Also, contextual searching works great here.  If we have Core Components
> and the structure of the XML document has a node like this:
>
> <Component>
>   <UID>12345</UID>
> ...
> </Component>
>
> all we really need to do is ask our query manager for a "URI" for
> instances of metadata where 12345 is in the context of <UID>.
>
> This does work.  The actual syntax of the query is TBD.
>
> Let's discuss this.  Comments?
>
> Duane Nickull
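
To make the two cases in the quoted message concrete, the sketch below loads
the Party/ID sample into an in-memory SQLite table from Python. The table name
`registry` is hypothetical (the quoted statements give no usable table name),
and the integer `id` column merely stands in for an ebXML globally unique
identifier. It compares a plain SELECT with SELECT DISTINCT on the non-unique
party column, and then on a lookup by the unique key.

```python
import sqlite3

# Hypothetical schema; rows are the Party/ID sample from the quoted message.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registry (id INTEGER PRIMARY KEY, party TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO registry (party, id) VALUES (?, ?)",
    [
        ("CommerceOne", 12334),
        ("XML Global", 2334),
        ("DataChannel", 9993),
        ("IBM", 2119),
        ("CommerceOne", 23334),
    ],
)


def fetch(sql: str, params: tuple = ()) -> list:
    """Return the first column of every row produced by the query."""
    return [row[0] for row in conn.execute(sql, params)]


# Non-unique column: DISTINCT changes the result.
all_parties = fetch("SELECT party FROM registry ORDER BY id")
unique_parties = fetch("SELECT DISTINCT party FROM registry ORDER BY party")

# Lookup by the unique key: at most one row either way, so DISTINCT is redundant.
by_id = fetch("SELECT party FROM registry WHERE id = ?", (2334,))
by_id_distinct = fetch("SELECT DISTINCT party FROM registry WHERE id = ?", (2334,))

print(len(all_parties), "rows without DISTINCT:", all_parties)
print(len(unique_parties), "rows with DISTINCT:", unique_parties)
print("lookup by key:", by_id, "with DISTINCT:", by_id_distinct)
```

Under these assumptions, DISTINCT only matters for queries over non-unique
attributes, which is why the use-case question is the one that needs answering.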

--
Regards,
Farrukh

Follow-Ups:
- Re: Proposal for ebXML Regrep Query
  - From: Duane Nickull <duane@xmlglobal.com>

References:
- Proposal for ebXML Regrep Query
  - From: Len Gallagher <LGallagher@nist.gov>
- Re: Proposal for ebXML Regrep Query
  - From: Duane Nickull <duane@xmlglobal.com>