Subject: Re: Proposal for ebXML Regrep Query
Duane,

DISTINCT in itself is not dangerous to performance in a tuned implementation. In a tuned schema (as opposed to a toy implementation), one or more attributes would be declared as primary keys, which are always indexed, and additional attributes would be indexed as well. The result would be blazing fast comparisons. Fear around DISTINCT is unwarranted.

The real question is whether the need for DISTINCT semantics can be justified by real-world use cases.

Duane Nickull wrote:

> Len:
>
> The work you have submitted has some good points and also some ones
> which I believe could cripple the performance of the Registry,
> should they be implemented. The particular OQL construct I am concerned
> about is the use of an SQL "DISTINCT" in the query string. In order for
> us all to understand this, here is a quick tutorial on what DISTINCT
> does and why it is so dangerous to performance:
>
> ************************
>
> The DISTINCT keyword is used to return only distinct (different) values.
>
> To understand how it differs from SELECT, let's look at SELECT. The
> SQL SELECT statement returns all information from table columns,
> regardless if there are duplicates in the columns. DISTINCT takes the
> basic results from SELECT, loads them into memory and runs a compare
> type function against them to weed out duplicates. The process is
> *very* processor expensive.
>
> For SQL, all we need to do is to add a DISTINCT keyword to the SELECT
> statement with the following syntax:
>
> Imagine we have the following in a Registry:
>
> Party          ID
> CommerceOne    12334
> XML Global     2334
> DataChannel    9993
> IBM            2119
> CommerceOne    23334
>
> If we run the statement:
>
> SELECT Party FROM ID
>
> It will return:
>
> Party
> CommerceOne
> XML Global
> DataChannel
> IBM
> CommerceOne
>
> Note that CommerceOne has appeared twice in the results.
>
> By using
>
> SELECT DISTINCT Party FROM ID
>
> Party
> CommerceOne
> XML Global
> DataChannel
> IBM
>
> This time it is only listed once. It is possible that we want this
> sometimes but why do we need it in ebXML?
>
> In order to run the DISTINCT from a large table and column structure, a
> *HUGE* amount of memory needs to be allocated for sort and compare
> utilities.
>
> USE CASE QUESTION:
>
> ebXML has mandated the use of Globally Unique Identifiers. The data
> itself will only return one result. If it returns two or more results
> for a UID key, then there is an error and the UID key is not unique.
>
> As far as I know, there are no other keys which are required to be
> unique by ebXML. Therefore, why is the select DISTINCT proposed?
>
> Also contextual searching works great here. If we have Core Components
> and the structure of the XML document has a node like this:
>
> <Component>
>   <UID>12345</UID>
>   ...
> </Component>
>
> all we really need to do is ask our query manager for a "URI" for
> instances of meta data where 12345 is in the context of <UID>.
>
> This does work. The actual syntax of the query TBD.
>
> Let's discuss this. Comments?
>
> Duane Nickull

--
Regards,
Farrukh
begin:vcard
n:Najmi;Farrukh
tel;work:781-442-0703
x-mozilla-html:FALSE
url:www.sun.com
org:Sun Microsystems;Java Software
adr:;;1 Network Dr. MS BUR02-302;Burlington;MA;01803-0902;USA
version:2.1
email;internet:najmi@east.sun.com
fn:Farrukh Najmi
end:vcard
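
The disagreement in this thread, whether DISTINCT must sort and compare rows in memory or can lean on an index, can be checked on a small scale. The following is only a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for whatever database a registry would really use. The table name registry_party, the index name idx_party, and the helper functions are hypothetical; the rows are the ones from the quoted example.

```python
import sqlite3

# Sample rows from the quoted example (Party, ID).
ROWS = [
    ("CommerceOne", 12334),
    ("XML Global", 2334),
    ("DataChannel", 9993),
    ("IBM", 2119),
    ("CommerceOne", 23334),
]

DISTINCT_SQL = "SELECT DISTINCT party FROM registry_party"


def build_db(with_index):
    """Create an in-memory database; registry_party is a hypothetical table name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE registry_party (party TEXT NOT NULL, id INTEGER PRIMARY KEY)"
    )
    conn.executemany("INSERT INTO registry_party (party, id) VALUES (?, ?)", ROWS)
    if with_index:
        # Secondary index on the column that DISTINCT is applied to.
        conn.execute("CREATE INDEX idx_party ON registry_party (party)")
    return conn


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


indexed = build_db(with_index=True)
unindexed = build_db(with_index=False)

# Plain SELECT keeps duplicates; DISTINCT removes them.
all_parties = [r[0] for r in indexed.execute("SELECT party FROM registry_party")]
distinct_parties = sorted(r[0] for r in indexed.execute(DISTINCT_SQL))

# Compare how the engine says it will evaluate DISTINCT in each case.
plan_indexed = query_plan(indexed, DISTINCT_SQL)
plan_unindexed = query_plan(unindexed, DISTINCT_SQL)

print("all:", all_parties)
print("distinct:", distinct_parties)
print("plan with index:", plan_indexed)
print("plan without index:", plan_unindexed)
```

Comparing the two printed plans shows whether the engine reports using the index or a temporary structure for de-duplication. The exact plan wording varies between SQLite versions, and other databases may behave differently, so this illustrates the argument rather than measuring any registry implementation.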