ebxml-regrep message

Subject: Topic 1: Terminology alignment
From: Farrukh Najmi - JavaSoft East <Farrukh.Najmi@East.Sun.COM>
To: ebxml repository <ebxml-regrep@lists.ebxml.org>
Date: Thu, 21 Sep 2000 01:19:23 -0400
Background
--------------

As I had indicated, 4 major areas were identified in the excellent review meetings we have had on the 2 draft specs. We need team
discussion before we can reach agreement on them.

To recap, these 4 areas were:

1. Terminology alignment
2. Managed Object Model
3. Association Model
4. Classification Model

The intent is for us to start separate threads of discussion on each of these major areas with the goal of resolving the issues in each area. If
an issue happens to overlap threads then let's put it in the more specific area. This email starts the discussion on the first topic.

------------------------

Rationale And Philosophy
------------------------------
First I would like to give the detailed rationale and thought process that led me to the terminology choices made in the Repository
Information Model (RIS) document. Much of this thought process was developed in earlier drafts of the RS spec (v0.1 to v0.4) that were
reviewed by the team.

I used all the relevant work that I knew about (ISO 11179, OASIS, UDDI, UML, BP, CC, TP working groups, past experience) as input, 
with the understanding that I would pick the gems from each, improve as needed and avoid the mistakes I perceived they had made. I tried to 
be consistent in terminology where it made sense and improved it where it made sense. I found the OASIS document to be better than the 
ISO document. So I borrowed more from OASIS.

Some examples of exact terminology from OASIS are (description, globalName, mimeType, Organization, Classification, ClassificationItem, 
ClassificationScheme, ClassificationLevel, Query, Request, Association, SubmittingOrganization). As I said earlier, OASIS should be quite 
pleased at the amount of work that is already leveraged in our current specs.

One example where we could obviously avoid a past mistake was regStatus => registryStatus, since the longer name is more obvious. Is
regStatus registry status or registration status? If it is actually registration status, then my misunderstanding of what it meant makes the point.
And besides, what is the cost of a few characters in a class definition?

Some basic conventions I followed were those that transcend RegRep and are established standards today. For example, I used what's called
the camel case naming convention, where class names begin with upper case (upper camel case) and attribute names begin with lower case
(lower camel case). It is called camel case because of the humps: multi-word names start each word after the first with an upper case
letter (e.g. 'registryStatus' for an attribute and 'RegistryStatus' for a class name). So, for example, description was in lower case rather than
upper case as Len suggested it should be. I am a little surprised by that level of expected consistency ('d' Vs. 'D') with OASIS or anything
else. I hope that the team does not expect us to use OASIS verbatim.
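
To make the convention concrete, here is a minimal Java sketch. The class and attribute names echo ones used in this thread; the field types, the "submitted" status value, and the two helper methods are assumptions made up for illustration and are not taken from any spec.

```java
// Illustrative sketch only. RegistryStatus and ManagedObject echo names from this
// thread; the fields and helper methods are assumptions, not spec text.

// Class names use upper camel case.
class RegistryStatus {
    private final String value;

    RegistryStatus(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }
}

class ManagedObject {
    // Attribute names use lower camel case.
    String description;
    String mimeType;
    RegistryStatus registryStatus;
}

class NamingDemo {
    // Hypothetical helper: turns an attribute-style name into a class-style name.
    static String toUpperCamel(String name) {
        if (name.isEmpty()) {
            return name;
        }
        return Character.toUpperCase(name.charAt(0)) + name.substring(1);
    }

    // Hypothetical helper: turns a class-style name into an attribute-style name.
    static String toLowerCamel(String name) {
        if (name.isEmpty()) {
            return name;
        }
        return Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    public static void main(String[] args) {
        ManagedObject mo = new ManagedObject();
        mo.description = "Sample DTD";
        mo.mimeType = "text/xml";
        mo.registryStatus = new RegistryStatus("submitted"); // made-up status value
        System.out.println(toUpperCamel("registryStatus"));
        System.out.println(toLowerCamel("MimeType"));
    }
}
```

The only difference between the attribute and the class spelling is the case of the first letter, which is what makes the two easy to tell apart at a glance.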

Team, please share your thoughts on the correctness of the above philosophy.

Attribute name choices
---------------------------
> Excerpt from Len's suggestions

> 16) I'm OK with the attributes for RegistryItem in Section 3.2 as a
> starting point, but think ebXML will want to add more registry-specific
> attributes similar to what OASIS does. In particular, the following maps
> the proposed ebXML attribute names to OASIS attribute names:

A general issue I feel strongly about is that attribute names should start with lower case and follow camel case. This is an established
convention and it makes it easy to distinguish attributes from classes.

>
>             id  -->  RAitemId

RAitemId is not obvious. I assume it means Registry Authority Item Id. But what it really is, I assume, is the id for the RegisteredItem in
OASIS. If so it should be called Registered Item Item Id. But then we have two items. My rationale was simply that the ManagedObject class
has an id, so let's call it 'id'. In fact I have since changed it for v0.2 to 'guid' based on feedback from the Tech Architecture (TA) team that
ebXML TA wants to institutionalize Global Unique IDs across the board. GUIDs are also an established standard. I liked that idea a lot. So I
recommend we change id to guid and we will have alignment with TA for a reason that I can support in good conscience.

>            uri  -->  ObjectLocation

uri is more obvious in this age of the internet. It gives you an idea of the data type and underlying concept without even saying it. Team share 
thoughts.

>           type  -->  DefnSource, PrimaryClass, SubClass

I actually would recommend getting rid of type altogether. The class already knows its type. That is why we have a class hierarchy and not
just ManagedObject. Also, we had established in earlier meetings that the OASIS PrimaryClass, SubClass was inadequate. My
approach is to identify ty[...] Team share thoughts.

>           name  -->  CommonName

I will next recommend we get rid of globalName. If that is done there is only one name associated with the object in its metadata and calling it
name should suffice. After all what does 'Common' add or convey? Team share thoughts.


>     globalName  -->  AssignedURN

In the review I pointed out that OASIS had this mixing of 2 distinct concepts and needs:

-The need for unique identification for reference purposes (a la database primary key)
-The need for a user friendly name associated with that object

I pointed out that the user friendly name does not need to be unique. Only the guid needs to be. ebXML TA has an established position on
GUIDs, as I said earlier, and one that I strongly endorse. I will go out on a limb here and say we get rid of globalName altogether. Team
share thoughts.
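
The separation of the two needs can be sketched in a few lines of Java. This is only an illustration under stated assumptions: NamedEntry and GuidRegistry are hypothetical names, and java.util.UUID stands in for whatever GUID scheme TA settles on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical entry: guid is the unique key, name is only a friendly label.
class NamedEntry {
    final String guid;
    final String name;

    NamedEntry(String guid, String name) {
        this.guid = guid;
        this.name = name;
    }
}

// Hypothetical in-memory index keyed by guid, much like a database primary key.
class GuidRegistry {
    private final Map<String, NamedEntry> byGuid = new HashMap<>();

    // Assumption: a random UUID stands in for the real GUID scheme.
    NamedEntry register(String name) {
        String guid = UUID.randomUUID().toString();
        NamedEntry entry = new NamedEntry(guid, name);
        byGuid.put(guid, entry);
        return entry;
    }

    NamedEntry findByGuid(String guid) {
        return byGuid.get(guid);
    }

    // Names need not be unique, so a lookup by name can match several entries.
    long countByName(String name) {
        return byGuid.values().stream().filter(e -> e.name.equals(name)).count();
    }
}

class GuidDemo {
    public static void main(String[] args) {
        GuidRegistry registry = new GuidRegistry();
        NamedEntry first = registry.register("Purchase Order");
        NamedEntry second = registry.register("Purchase Order");
        System.out.println(first.guid.equals(second.guid));
        System.out.println(registry.countByName("Purchase Order"));
    }
}
```

Two entries can share the name "Purchase Order" without any conflict, because references always go through the guid.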

>    description  -->  Description
>       mimetype  -->  MimeType

Camel case is the clear choice. This sort of critique makes me uneasy. OASIS should be happy we are consistent to the degree that this 
document already is.


>   majorVersion  -->  partially to Version
>   minorVersion  -->  partially to Version

This is not a naming issue but a modeling one, but ... I chose not to have a separate Version object in the model encapsulating the two
values because it seemed to be unnecessary overhead. That would have led to a reference from a ManagedObject to a Version object. In a
relational DB implementation it would be an extra join for no good reason that I could think of. So unless we identify a good reason I vote we
opt for simplicity (fewer classes in model, fewer joins, less complicated queries in implementations). Team share thoughts.
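
The two modeling options can be put side by side in a short Java sketch. All class names here (FlatManagedObject, Version, VersionedByReference) are hypothetical and only illustrate the trade-off; the comments about tables and joins describe one plausible relational mapping, not a required one.

```java
// Option 1 (the one argued for above): version numbers are plain attributes.
// In a relational mapping this could be a single table with two extra columns.
class FlatManagedObject {
    final String guid;
    final int majorVersion;
    final int minorVersion;

    FlatManagedObject(String guid, int majorVersion, int minorVersion) {
        this.guid = guid;
        this.majorVersion = majorVersion;
        this.minorVersion = minorVersion;
    }

    String versionLabel() {
        return majorVersion + "." + minorVersion;
    }
}

// Option 2: a separate Version class that the object refers to.
// In a relational mapping this could become a second table and a join.
class Version {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    String label() {
        return major + "." + minor;
    }
}

class VersionedByReference {
    final String guid;
    final Version version;

    VersionedByReference(String guid, Version version) {
        this.guid = guid;
        this.version = version;
    }

    String versionLabel() {
        return version.label();
    }
}

class VersionDemo {
    public static void main(String[] args) {
        FlatManagedObject flat = new FlatManagedObject("g1", 1, 2);
        VersionedByReference ref = new VersionedByReference("g1", new Version(1, 2));
        System.out.println(flat.versionLabel());
        System.out.println(ref.versionLabel());
    }
}
```

Both options carry the same information; the second adds one class and one level of indirection.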

> registryStatus  -->  RegStatus

I have already made the point on obviousness. Is RegStatus registry status or registration status? Whichever it is, why don't we spell it out?
What do we save by abbreviating? 4 bytes in a repository that is likely to hold terabytes? FWIW, I originally called it just plain status. I
took from OASIS the sensible idea that it is better to say it is a registry status, so we should use the better name from OASIS and only get
rid of the abbreviation. This is a good example of borrowing a good thing but improving upon it if needed.

ExternalData Vs. RelatedData
------------------------------------
> 9) With the above clarifications, I'm OK with the content of Section 2.3.2,
> although I'd prefer a term for these objects as something other than
> "External Object".  How about "Related Data".  Note that there's no
> requirement that these things even have a persistent object identifier,
> just a Name and a URL is sufficient.

The concept is exactly that of OASIS RelatedData. I thought it was a good idea, and I borrowed. It is a named hyperlink with none of the 
overhead of the managed object.

It was a good idea but I felt it needed improvement. My problem with RelatedData was that it was ambiguous. The key concept is that the
object is external and that its life cycle is not managed by RS. Related is not the key concept because any two objects that are associated
could be considered related. A TPA that uses a Process could be considered Related to the innocent bystander. External was my first stab. I am
open to other suggestions. However, Related is ambiguous and ambiguity is something I would like us to avoid.

The 'Data' part did not feel right because I felt that everything in the model is an Object (more on this in topic 2 tomorrow). Having a
common base class for everything in the model has been a very valuable thing in Smalltalk and Java compared to C++. It gives you a place
to put functionality that you want available consistently in the entire model.

Some alternatives that I had thought of were UnmanagedObject, HyperLink, LinkedObject. In retrospect I think I actually prefer 
LinkedObject more than ExternalObject. Team share your thoughts.
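
The common base class idea and the managed/external distinction could look roughly like this in Java. This is a sketch under assumptions: ModelObject is a hypothetical name for the shared base class, LinkedObject uses the alternative name floated above, and the describe() method is just an example of functionality placed in the base class.

```java
// Hypothetical common base class: a single place for behavior shared by
// everything in the model.
abstract class ModelObject {
    final String name;

    ModelObject(String name) {
        this.name = name;
    }

    // Is the object's life cycle managed by the repository?
    abstract boolean isLifeCycleManaged();

    // Example of functionality that every subclass inherits.
    String describe() {
        return name + " (managed=" + isLifeCycleManaged() + ")";
    }
}

// Content submitted to, and managed by, the repository.
class ManagedObject extends ModelObject {
    final String guid;

    ManagedObject(String guid, String name) {
        super(name);
        this.guid = guid;
    }

    @Override
    boolean isLifeCycleManaged() {
        return true;
    }
}

// A named hyperlink to something outside the repository's control.
class LinkedObject extends ModelObject {
    final String uri;

    LinkedObject(String name, String uri) {
        super(name);
        this.uri = uri;
    }

    @Override
    boolean isLifeCycleManaged() {
        return false;
    }
}

class BaseClassDemo {
    public static void main(String[] args) {
        ModelObject managed = new ManagedObject("guid-1", "Order DTD");
        ModelObject linked = new LinkedObject("Spec home page", "http://example.org/spec");
        System.out.println(managed.describe());
        System.out.println(linked.describe());
    }
}
```

The name LinkedObject says how the thing is reached, while the isLifeCycleManaged() flag carries the key concept that RS does not manage it.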


Term Object
---------------
I believe it was Scott or Len who said in a meeting that we should get rid of the use of 'Object' everywhere.

In initial drafts I used the term Document instead of Object. So there was a DocumentManager instead of an ObjectManager. It was pointed
out in previous reviews that, based on decisions made in the Brussels meeting, the term Object was better since it generalized documents and
allowed for content that was not documents (e.g. a Java JAR file with code implementing some process). I immediately loved the excellent idea
and changed all references from Document to Object. IMHO, nothing represents an instance of 'stuff' better than 'Object'. It is more
obvious than 'item' or 'component' or 'entry'. It is intuitively obvious to most programmers. BTW we need to make sure that we have a
process in place that does not revisit old decisions without a solid reason. It will slow us down otherwise. Scott, please take note of this
process issue.

'Managed Object' Vs. 'Registered Object'
-------------------------------------------------

> 4) In Section 3, I'm having some difficulty with the treatment of nearly
> everything as a Managed Object. Clearly, everything the information model
> talks about is an object managed by the Registry/Repository, but not
> everything has the attributes that are specified for Managed Objects.
> Figure 4 says that things like Associations, Classifications,
> ClassificationLevels, etc., are all subclasses of ManagedObject, but
> clearly not every instance of one of these items has all of the attributes
> specified for Managed Objects in Section 3.2.  I think we need to scrap the
> notion of Managed Object as being too broad.  A better notion is Registered
> Object.  Then we could say that every instance of a class in the UML model
> is a managed object, but not every managed object is a registered object.

> 5) I'd prefer to make a further distinction between Registered Object and
> an item in the Registry. Normally, every item in the registry would point
> to a registered object - but there are exceptions!  A registered object may
> be withdrawn, but there still is an item in the registry explaining what it
> was.  Other applications may be pointing to a registry item and we want
> that pointer to make sense, even if the registered object itself is
> withdrawn.  The registry item would still have metadata, giving the status
> of the registered object as "withdrawn" and the effective date of the
> withdrawal. Even after an item is replaced, deprecated, or withdrawn, a
> user could ask "does my registered DTD point to any registered objects that
> have since been modified, enhanced, or withdrawn?".
>
>
> 6) I think the distinction between registered object and registry item can
> be clarified by doing something analogous to what OASIS does.  I.e. a
> registry item is an instance of the RegistryItem class defined in the UML
> information model, and a registered object is an instance of some virtual
> Repository class defined elsewhere. The only access to the Repository class
> is through Registry Services. In my subsequent comments, I'll use the terms
> "registered object" and "registry item" with this meaning. I'll also assume
> that there's been a global replacement in the specification of "Registry
> Item" for "Managed Object", of "registry item" for "managed object", and in
> Section 2.3 of "registry item" for "object".  I believe all of this is
> consistent with the basic definition in Section 2.3 of managed object as an
> instance of metadata, not the content of a registered object.

These detailed comments overlap with topics 2, 3 and 4. However, I will try to address the terminology aspects here, with details on
later topics tomorrow.

Here is my thought process behind the choice of ManagedObject.

I see two types of content in the repository: content whose life cycle is managed by the repository (i.e. submitted objects) and content
whose life cycle is not managed by the repository. The first I refer to as a managed object. The second I refer to as an external object (which
we have already discussed earlier). ManagedObject is also consistent with the Object Management Service in RS. It is a service that manages
the life cycle of managed objects.

I think the real issue is that there is confusion between the use of the term 'managed object' when referring to:

-The actual content (e.g. a DTD, a TPA) which in many cases is an XML document and not an instance of a class in the model
-The class in the model that provides meta-data for the actual content, which is in the model and may be implemented as a relational table, a
Java class, or a class in an OODB in an implementation of RR.

I actually tried to avoid the confusion in lines 155-158 of RIF v0.1. I guess I failed. I also tried to use 'ManagedObject' Vs. 'managed object'
where the former was the metadata while the latter was the actual content. FWIW, earlier versions of the RS spec referred to ManagedObject as
ObjectMetaData. I then changed it for the following reasons:

-It is really ManagedObjectMetadata
-Once we have sub-classes of ManagedObject (e.g. Process, TPA) then do we now have ProcessMetaData, TPAMetaData? It seemed 
that if we could use the same name the relationship would be more obvious.

I see a simple solution to the above confusion. We make sure that we state the 'ManagedObject' Vs. 'managed object' convention in the
General Conventions section and make sure all use is consistent with the convention. Team share your thoughts.
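
The 'ManagedObject' Vs. 'managed object' distinction can be illustrated with a small Java sketch. Everything here is an assumption for illustration: RepositorySketch, its methods, and the sequential "guid-N" placeholder identifiers are made up, and the DTD fragment is sample content only.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// 'ManagedObject': the class in the model that holds metadata.
class ManagedObject {
    final String guid;
    final String name;
    final String mimeType;

    ManagedObject(String guid, String name, String mimeType) {
        this.guid = guid;
        this.name = name;
        this.mimeType = mimeType;
    }
}

// Hypothetical repository that keeps the metadata and the actual content
// (the 'managed object') in separate stores, both keyed by guid.
class RepositorySketch {
    private final Map<String, ManagedObject> metadata = new HashMap<>();
    private final Map<String, byte[]> content = new HashMap<>();
    private int nextId = 1;

    ManagedObject submit(String name, String mimeType, byte[] bytes) {
        String guid = "guid-" + nextId++; // placeholder, not a real GUID scheme
        ManagedObject mo = new ManagedObject(guid, name, mimeType);
        metadata.put(guid, mo);
        content.put(guid, bytes.clone());
        return mo;
    }

    ManagedObject getMetadata(String guid) {
        return metadata.get(guid);
    }

    byte[] getContent(String guid) {
        return content.get(guid);
    }
}

class MetadataDemo {
    public static void main(String[] args) {
        RepositorySketch repo = new RepositorySketch();
        byte[] dtd = "<!ELEMENT order (#PCDATA)>".getBytes(StandardCharsets.UTF_8);
        ManagedObject mo = repo.submit("Order DTD", "text/xml", dtd);
        // The ManagedObject instance is metadata; the DTD bytes are the content.
        System.out.println(mo.name + " " + mo.mimeType);
        System.out.println(new String(repo.getContent(mo.guid), StandardCharsets.UTF_8));
    }
}
```

The ManagedObject instance never contains the DTD itself; it only describes it, which is the convention the General Conventions section would need to spell out.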

Closing Remarks
--------------------

Len (and others) has made a ton of excellent suggestions in his very detailed comments (e.g. Organization has roles of RA, SO etc. and is not a
base class for those concepts). Many of those comments fall in the future topics for discussion. Those will be thought provoking discussions
and will not be obvious or easy to resolve.

However, he has also made quite a few comments on this terminology topic which I frankly feel were overly prescriptive. I was thinking that the
OASIS team would be very supportive of v0.1 because of the obvious salute to the quality of their work in the number of concepts,
terminologies and ideas borrowed. I was actually quite surprised by the level of prescriptiveness and expectation of conformance to the level of
'd' Vs. 'D'.

If the expectation is that we should adopt OASIS then that is not just an RR issue. It has huge implications for almost all WGs because they
would then have to align their meta-model work to be consistent. Decisions they have made, such as TA's guid decision, would have to be
scrapped.

IMHO, we will do ebXML a disservice with an approach like that. It will unravel much of the progress we have made so far. The philosophy
I followed, of beg/borrow whatever makes sense and improve as needed, is one that the other WGs are also following.

Finally, I implore the team to focus on the real major issues of getting the modeling issues resolved (next few topics), many of which are in the
rest of Len's email. They are complex issues that need our collective energies. We need to bring our life experiences to help get this right,
but we also need to bring an open mind to do things better than we have done in our past projects. So let us resolve to quickly settle the
terminology issues and begin the hard work ahead.

--

Humbly submitted,
Farrukh
Follow-Ups:
- Re: Topic 1: Terminolgy alignment
  - From: Len Gallagher <LGallagher@nist.gov>
- RE: Topic 1: Terminolgy alignment
  - From: Matthew MacKenzie <matt@xmlglobal.com>