ebxml-regrep message

Subject: Problems with Classifications
From: Len Gallagher <LGallagher@nist.gov>
To: "Nieman, Scott" <Scott.Nieman@NorstanConsulting.com>
Date: Fri, 29 Sep 2000 13:31:25 -0400

ebXML RegRep group,

At our teleconference yesterday we agreed to go with version 0.2 of the
Information Model specification for the Proof of Concept (poc)
demonstration in Tokyo. That seems OK since a demo can always work around
problems.  However, I think the representation of classification schemes in
version 0.2 is broken and implementors should be aware of that as they
prepare their products for the demo.

Consider the following simple classification scheme for wood (W) that could
be used to classify all wood as either hardwood (HW) or softwood (SW).

           W
         /   \
        /     \
       HW     SW

This would be represented in the version 0.2 information model as three
ClassificationNodes, where each node is a ManagedObject with one
additional attribute, namely a pointer to its parent. So the three nodes
would be represented as:

    (W, -)
    (HW,W)
    (SW,W)

These three nodes are stand-alone objects; the model does not capture the
fact that the definer thought of them as a group of three nodes that
together determine a classification scheme.
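
A minimal sketch of this representation, assuming a node is nothing more
than a name plus a parent pointer. The class name is borrowed from the
model, but the field names and the path_from_root helper are illustrative
only and do not come from the specification:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ClassificationNode:
    name: str
    parent: Optional[str]  # name of the parent node; None marks a root


# The three stand-alone nodes (W, -), (HW, W), (SW, W).
# Nothing here records that they form one scheme.
nodes: Dict[str, ClassificationNode] = {
    "W": ClassificationNode("W", None),
    "HW": ClassificationNode("HW", "W"),
    "SW": ClassificationNode("SW", "W"),
}


def path_from_root(name: str, nodes: Dict[str, ClassificationNode]) -> List[str]:
    """Follow parent pointers upward, then reverse to get root-to-node order."""
    path = []
    current: Optional[str] = name
    while current is not None:
        path.append(current)
        current = nodes[current].parent
    path.reverse()
    return path


print(path_from_root("HW", nodes))
```

With only one scheme in the registry, the parent pointers alone are enough
to recover the path W-->HW.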

Later a different definer wants to create a classification scheme for
materials (M) and wants to re-use the existing scheme for wood. The intent
is that all materials will be classified as plastic (P), wood (W) and its
subclassifications, or other (O), and the intended classification scheme is:

            M
          / | \
         /  |  \
        P   W   O
           / \
          /   \
         HW   SW

The new definer creates the new nodes for M, P, and O, but wants to
reference W for wood. There is no way the second definer can have W point
to M as parent because W already has a null pointer for its parent
attribute, and that node is owned by a different definer. So the definer
will have to create a new node W' that points to M as its parent and
somehow has a Ref to W as that portion of the hierarchy. We'll get the
following representation:

    (M, -)
    (P, M)
    (W',M) with Ref to W
    (O, M)

The only way to capture this reference to W in the existing model is for
there to be a strong 1:1 association from W' to W captured in the metadata
for W'. This is possible, but the association would have to be uniquely typed.

Now suppose a product is classified as HW, and a user asks the registry to
provide the path to HW from the root node. The representation is broken
because there is no unique path to a root; instead, there are two choices:
W-->HW  derived from the first classification scheme or  M-->W'-->HW
derived from the second classification scheme.

There is no way for the system to determine which path is intended because
the system didn't retain the information that there were two separate
classification schemes involved in the definitions! And the user has no way
to communicate to the system which classification scheme is intended.
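
The ambiguity can be reproduced with a small sketch. It assumes the parent
pointers and the W'-to-W association are held in two plain dictionaries;
the names parent, ref, and root_paths are hypothetical and chosen only for
illustration:

```python
from typing import Dict, List, Optional

# Parent pointers for all seven nodes from both definers.
parent: Dict[str, Optional[str]] = {
    "W": None, "HW": "W", "SW": "W",             # first definer
    "M": None, "P": "M", "W'": "M", "O": "M",    # second definer
}

# The 1:1 association from W' to W.
ref: Dict[str, str] = {"W'": "W"}


def root_paths(name: str) -> List[List[str]]:
    """Return every root-to-node path that ends at `name`."""
    paths: List[List[str]] = []

    def visit(node: str, below: List[str]) -> None:
        # The node itself, or any node holding a Ref to it, may fill this slot.
        candidates = [node] + [a for a, target in ref.items() if target == node]
        for cand in candidates:
            path = [cand] + below
            if parent[cand] is None:
                paths.append(path)        # reached a root
            else:
                visit(parent[cand], path)

    visit(name, [])
    return paths


for p in root_paths("HW"):
    print("-->".join(p))
```

Two paths come back for HW, and nothing in the data says which one the
user meant.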

SOLUTION  

It was a mistake to try to simplify the notion of classification scheme by
deleting the notion of a classification scheme being a separate object with
a fixed set of nodes as its content. We need to go back to the concepts as
discussed in version 0.1 and think of a classification scheme as a distinct
object, with a fixed number of nodes, and a partial ordering over those
nodes to determine a fixed hierarchy. 

All that is needed to achieve the desired result is an identifier for each
separate classification scheme, say A for the original scheme and B for the
derived one. Then the classification HW specified by the user could be
qualified by the intended classification scheme, A or B, to determine which
path is the correct root-to-node path for that node. 
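
A rough sketch of this idea follows, assuming each scheme is a distinct
object that owns its own parent ordering. The class and method names are
illustrative, and the way scheme B lists HW and SW under W' is an
assumption made for the example rather than something the model defines:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClassificationScheme:
    scheme_id: str
    parent: Dict[str, Optional[str]]  # ordering over this scheme's nodes only

    def path_from_root(self, node: str) -> List[str]:
        if node not in self.parent:
            raise KeyError(f"{node!r} is not a node of scheme {self.scheme_id}")
        path = []
        current: Optional[str] = node
        while current is not None:
            path.append(current)
            current = self.parent[current]
        path.reverse()
        return path


schemes = {
    "A": ClassificationScheme("A", {"W": None, "HW": "W", "SW": "W"}),
    # Assumption: B carries the wood subtree under its own node W'.
    "B": ClassificationScheme(
        "B",
        {"M": None, "P": "M", "W'": "M", "O": "M", "HW": "W'", "SW": "W'"},
    ),
}

# The user qualifies the classification HW with a scheme identifier.
print(schemes["A"].path_from_root("HW"))
print(schemes["B"].path_from_root("HW"))
```

Once the request carries a scheme identifier, each lookup has exactly one
answer.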

As a follow-on, we could relax the requirement that each node be a
ManagedObject and instead only require that each classification scheme be a
Managed <space> Object with metadata captured in a corresponding
ManagedObject. Then one could easily create new classification schemes
using existing classification schemes for their various nodes, with no
ambiguity when questions involving predecessors, descendants, or levels are
posed. The UML diagram I distributed earlier this week defines the
necessary subtype relationships and associations among
ClassificationScheme, ClassificationItem (or Node), and
ClassificationLevel. The diagram is missing the notion that a
ClassificationItem could reference another ClassificationScheme, but that
is an upward compatible extension.

-- Len

p.s.  May I again respectfully submit that Managed <space> Object be called
a RegisteredObject, or possibly a ManagedObject if that fits better with
ebXML terminology in other working groups, and that what's currently called
ManagedObject be re-named a RegistryEntry. This would relieve untold
confusion!  Too many people think of the managed object being the Profile
or BusinessProcess that is registered instead of the metadata for that object.




At 06:18 PM 9/28/00 , Nieman, Scott wrote:
>The dialup information is:
>USA: 800.892.0357  
>Sorry no toll-free for International callers: usa 612.352.7899
>25 call-in sites are reserved	
>
>Meeting ID: 0942
>
>Agenda:
>1) registry service v0.8 review
>2) repository information model
>
>Scott

**************************************************************
Len Gallagher                             LGallagher@nist.gov
NIST                                      Work: 301-975-3251
Bldg 820  Room 562                        Home: 301-424-1928
Gaithersburg, MD 20899-8970 USA           Fax: 301-948-6213
**************************************************************