Subject: FW: re "What is a Repository?"
<Response> <Intro>Yes, everyone, I have been on vacation, and will be on vacation tomorrow (12/31) as well. I will TRY to respond to some of the messages today. Terry, here are some responses to your notes. And YES, this MAY seem like technical mumbo jumbo, but I believe we need to get down to this level to deliver a specification/system that provides SIMPLE interfaces. Quite often, the SIMPLER the interface, the more difficult the implementation - "information hiding" at its best. </Intro>

<Subject>Mapping to the M3 level <Terry>"unnecessary work to do the decomposition"</Terry> <Scott>I do not agree that it is unnecessary to map to M3. If a goal is to support the software development life cycle (SDLC [ISO 11179]), it seems mandatory to me unless you have an infinite budget. As Giuseppe Facchetti from IBM pointed out in his XMI presentation, there is a significant time, labor, and cost savings in mapping from M2 to M3. Mathematically this is combinatorics: there are [n!]/[r!*[n-r]!] mappings needed to map M2 to M2, where n is the number of M2 metamodels to map, taken r at a time with r=2 (that is, n(n-1)/2 mappings). Therefore the M2 to M2 approach ultimately results in unmanageable spaghetti. By mapping each M2 metamodel to M3 instead, the total number of maps is 2n (one to M3 and one from M3 for each metamodel). In other words, if I wanted to generate Java code from XML Schema, I only need to map XML Schema to the M3 level ONCE and let the M3 to Java mapping accomplish the code generation. Therefore, once I have developed a map from the XML Schema metamodel to the M3, I am done with my work and I can use a transformation tool to persist an XML Schema document into the repository. Otherwise I would be busy mapping XML Schema to Java syntax one day (more like months, which I see some XML vendors attempting now -- oops!), C++ the next, then Smalltalk, Perl, Python, and so forth. This approach suggests that it also becomes the responsibility of the Java community to manage the M3 to Java mapping (which they have already signed up for from a MOF perspective).
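To put numbers on the mapping count above, here is a minimal Python sketch. It assumes exactly the counting used in this note: one map per pair of M2 metamodels for the direct approach, and one map to plus one map from M3 per metamodel for the hub approach. The function names are illustrative only and not part of any specification.

```python
from math import comb


def pairwise_maps(n: int) -> int:
    """Maps needed when every pair of n M2 metamodels is mapped directly.

    This is n! / (r! * (n - r)!) with r = 2, i.e. n * (n - 1) / 2.
    """
    return comb(n, 2)


def hub_maps(n: int) -> int:
    """Maps needed when each of n M2 metamodels maps to and from a shared M3."""
    return 2 * n


if __name__ == "__main__":
    # Print the two counts side by side for a few sizes.
    for n in (3, 5, 10, 20):
        print(f"n={n}: M2-to-M2 maps={pairwise_maps(n)}, via M3 maps={hub_maps(n)}")
```

Under this counting the two approaches break even at n = 5 (10 maps each). Beyond that the direct count grows quadratically while the M3 count grows linearly, for example 190 versus 40 maps at n = 20.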
The same goes for the C++ community, etc.; everyone manages their own destiny. DTDs and Schemas could easily be served up in the M3 form, but they would likely be parsed and reconstructed on demand, and if performance becomes an issue, parsed AND stored in their original form as a blob or file. Finally, I believe, even though this is open to debate, that a truly bidirectional, isomorphic repository cannot exist at the M2 level, since one M2 metamodel may produce more or fewer artifacts than another M2 metamodel [ref: Object Oriented Metamethods, Brian Henderson-Sellers 1997]. That means the M2 repository has a specific, limited focus, instead of focusing on the SDLC. It is also not extensible. Therefore an M2-level repository may not be able to generate a mapping to another M2 metamodel, since the M2 repository may not be able to completely hold the specific information required by the other M2 metamodel. An M3-level repository allows one to "fill in the blanks" and physically store the information required for the targeted M2 in the repository. Another M2 metamodel may never need that stored information, so it would be ignored in its mapping from the M3. I will try to put a page up on the site after the new year to clarify this. </Scott> </Subject>

<Subject>Scope of ebXML <Terry>"For ebxml purposes, which I take to be the storage of schemas for defined e-commerce documents"</Terry> <Scott>I would like to relate this to the eCo Framework if possible. In that context, I believe our domain is the Services, Interactions, Documents, and Information Items, not just documents. Certainly, TMWG has been focused on the Services and Interaction layers through the promotion of UML business process modeling, as well as on generating the Documents and Information Items from these models. (Personally I view the eCo Framework as the closest 'physical' thing to the ISO 14662 Open-edi Reference Model, if minor tweaks are made to it.)
In the formation of the ebXML project, there was much discussion surrounding reuse of the OASIS repository effort, to see if it could also fit the needs of TMWG for storing UML business process models. One could argue that we may need to be responsible for all the eCo "type registries" as well, if the Architecture group wants to adopt eCo. </Scott> </Subject>

<Subject>Why <Terry> "I am not sure that "why" is required, although I can see the point of it."</Terry> <Scott>Well, to be honest, this uncertainty is one of the very reasons why traditional EDI has failed, because the EDI abuser-base needs to know what context data should be used and what problem it solves (the why). We learned that the best way to put data into context was through process modeling, and that business process improvements can be made through analysis of these models and evaluation of alternatives. This type of analysis is why larger, established organizations have been able to eliminate the transmission of purchase orders, invoices, etc. I relate the "why" to the "Services" and "Interaction" layers of eCo. I want to be able to post the question: "I want to buy a DSL modem, using my corporate purchase card." Parsing this, the "DSL modem" relates to the "services layer", and the "buy" and "corporate purchase card" relate to the "interaction layer". "Why" comes into play because additional information may be required, since the corporate purchase card is not exactly a VISA card and we may need to know who the corporation is and qualify them (potentially another interaction). </Scott> </Subject>

<Subject>How <Terry>"Even "how" isn't obviously required."</Terry> <Scott>The "how" allows traceability throughout the SDLC to the "tools" that were used in refining a particular model. For example: someone had an old SGML DTD (M1 level), and loaded it into an M3 repository.
I now realize through searching that some of that DTD is useful for extending my UML model, since it covers a specialization that is new to me. I add its contribution to the model and I can accurately track my UML model's history through versioning. The "how" could also distinguish an XML Schema that was manually created using an editor from one that was autogenerated from a UML model. </Scott> </Subject>

<Subject>Component Diagram / Locator Service <Terry>"I can't see any need for the M2 if the objects in the M3 have identifiers". It's true if the M1 uses, for example, URNs, it needs a URN resolution service, but I don't see why that's an M2.</Terry> <Scott>I knew this would throw people off. Associated with the URN is a fragment of the metadata, right? (e.g. a tagname like "firstName" or a DTD itself). ANYTIME you are storing M1 level metadata instances in a database, you need a DB schema that describes "data about metadata". This DB schema is at the M2 level, which is defined as a "metamodel" (X3.285 is accurately titled). The URN is a pointer/identifier for that metadata instance, pointing to the specific repository instance. Therefore the resolution/locator service has certain metadata search capabilities and links to repository instances if more information is needed. The locator service contains enough of a subset of the metadata to enable this search AND contains the URN to point at the repository. The registry is purely administrative, storing ONLY information like "Joe Young from Norstan Consulting registered XYZ DTD on December 28, 1999" etc. Its database is at the M1 level since its instances are actual company, people, date/time object instances.
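As a rough illustration of the split described above, here is a minimal Python sketch. It assumes a registry that holds only administrative records and a locator service that holds a small searchable subset of the metadata plus the URN-to-repository link. The class names, the URN "urn:example:dtd:xyz", and the repository address are all hypothetical and do not come from any ebXML or OASIS specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Administrative record only: who registered what, and when."""
    urn: str
    registrant: str
    organization: str
    registered_on: str


class LocatorService:
    """Keeps a searchable subset of the metadata and the link to the repository."""

    def __init__(self) -> None:
        # urn -> (repository address, lower-cased keywords); hypothetical layout
        self._index: dict[str, tuple[str, frozenset[str]]] = {}

    def publish(self, urn: str, repository: str, keywords: list[str]) -> None:
        """Record where the full metadata lives and which terms can find it."""
        self._index[urn] = (repository, frozenset(k.lower() for k in keywords))

    def search(self, keyword: str) -> list[str]:
        """Return the URNs whose metadata subset contains the keyword."""
        wanted = keyword.lower()
        return sorted(u for u, (_, kws) in self._index.items() if wanted in kws)

    def resolve(self, urn: str) -> str:
        """Return the repository instance that holds the full metadata."""
        return self._index[urn][0]


if __name__ == "__main__":
    entry = RegistryEntry("urn:example:dtd:xyz", "Joe Young",
                          "Norstan Consulting", "1999-12-28")
    locator = LocatorService()
    locator.publish(entry.urn, "https://repo.example.org/", ["firstName", "XYZ DTD"])
    for urn in locator.search("firstname"):
        print(urn, "->", locator.resolve(urn))
```

The point of the sketch is only that the two components carry different kinds of data and therefore expose different interfaces. Whether they run on the same server is a separate deployment decision.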
What we are trying to show through the component diagram is that, since there are different types of information to be stored (M0 administrative object instances and M1 metadata object instances), it is best to partition them into separate software components, because their functional scope is unique and their software interfaces are going to be different (which will become clearer as the modeling continues). They MAY physically be deployed on the same server, which may be considered the "registry", but that is not necessarily the case since different technologies may be used in its implementation. </Scott> </Subject> </Response>

-----Original Message-----
From: Terry Allen [mailto:email@example.com]
Sent: Thursday, December 23, 1999 12:57 PM
To: firstname.lastname@example.org
Subject: re "What is a Repository?"

Notes on "What is a Repository Anyhow" (metalevels.html)
Terry Allen

This makes sense to me; I take note that the goal is "to store the model so that a development tool could use the information", which is more specific than what OASIS is doing, and requires, as the OASIS spec does not, that the contents of data element dictionaries be decomposed into a common format - or, in the words of this document, "mapped to the M3 meta-metamodel level." OASIS isn't requiring this because one of our goals is to enable DTDs and schemas to be served for parsing of instances, and it seems like unnecessary work to do the decomposition. For ebxml purposes, which I take to be the storage of schemas for defined e-commerce documents, the decomposition makes sense (although for e-commerce in general there will be some documents that are treated as attachments and the schemas for which needn't be stored in this repository).

"The question is whether these documents should be parsed into their subatomic artifacts and stored into indexed tables in a relational database, or do we need to wait for query technologies such as XQL to enable high performance search capabilities?"
Not my area of expertise, though we can ask for opinions from others of the OASIS Regrep TC. It isn't a new question, and I believe the SGML world has learned to live with relational databases just fine. I do know at least one person who's not using a database at all (and it's not me using my file system and grep ...). The issue is probably one of what tech is available for use at our target date (whenever that is).

"The Registry - The main aspect is that a registry implies "to register" meaning: what am I registering, who am I, when did I register it (also implies what version), how was it created, why did I create it (what problem domain) and where is this information located. The "who", "when" and "how" are administrative functions. The complications occur with the "what" and the "where". If the "what" really resides in a repository "somewhere", there must be sufficient information about the "what" to point to the "where" WITHOUT a complete replication of the repository or multiple repositories for that matter (try stating that 100 times fast). Specifically some metamodel representations such as UML can generate great VOLUMES of rich, semantic information. What is the answer to limiting the amount of information in the registry, but ensuring enough to find the appropriate information within the repository?"

I am not sure that "why" is required, although I can see the point of it. Even "how" isn't obviously required. For OASIS we made this the 11179 administrative metadata plus a couple of classifications (see some of the .ent files at the OASIS site for these). I suppose the answer here relates to "what do you want to see in the interface to the registry?"

X3.285 and metamodels. I agree it's confusing, and in fact I'm waiting for the next Open Forum to get caught up on X3.285 again. Intellectually I'm uncomfortable with the OMG requirement for an M3 level, but not enough to worry me. Especially as others seem to like it!
The diagram with M1, M2, and M3 doesn't make sense to me; I can't see any need for the M2 if the objects in the M3 have identifiers. It's true if the M1 uses, for example, URNs, it needs a URN resolution service, but I don't see why that's an M2. Reading the description below the diagram, I have to say I think the "locator service" is just part of the registry (the M1). Why isn't it?

As for the conclusions:

1. the UML Use Case modeling continue to serve as the basis of the ebXML Registry and Repository effort without jumping into an implementation,
2. X3.285 information be utilized as much as possible with potential convergence of the repository functionality to a pure meta-metamodel such as the Meta Object Facility, and
3. the revised work effort be submitted to SC32 and X3L8 for review and incorporation into ISO 11179.

These are all fine by me. I believe that what's in the OASIS spec is a bottom-up approach to the problem that ebxml is attacking from the top down, and that as we're in agreement on the core metadata, we'll be able to hook up in the middle, perhaps with some adjustments on either end. I'm still concerned about XMI, but I'm sure I'll learn more in Santa Fe, if not sooner.

regards, Terry