Bizcodes - was (Re: SubmissionPackage DTD)

ebxml-regrep message

Subject: Bizcodes - was (Re: SubmissionPackage DTD)
From: David RR Webber <Gnosis_@compuserve.com>
To: Jon Bosak <bosak@boethius.eng.sun.com>
Date: Tue, 23 May 2000 16:55:37 -0400
Jon,

Somewhere in between a few too many Belgian Leffe beers in
Brussels this whole message got totally twisted around and
screwed up.  Some major clarifications and corrections are needed.

Hopefully I can clarify all this now.  Here are the facts:

1) Bizcodes are explicitly designed to let end users name/label
     business entities ANYTHING they like - therefore they are
     quintessentially mnemonic-enabling in nature.

2) From 1) it follows this is an instant Tower of Babel, and this is
     the historical problem EDI has faced. To solve it EDI folded
     multiple definitions into single business elements.  Now
     you have to know CONTEXT in order to deduce meaning and
     usage.  This, as EDI discovered, is both expensive to maintain
     and difficult to use (i.e. you need an expensive consultant to
     map your local usage to the 'standard' usage).  Ken Steel coined
     the term 'dispersed semantics' and designed a repository system
     to solve this.  Unfortunately, while Ken's solution is part of the
     answer, it's not all of it, and in fact using XML to its fullest
     potential is necessary - to couple meaning, context, and
     process into a repository reference and retrieval system.

     Worse - people spend years arguing over names and
     mnemonics exactly because they see they have implicit
     meanings, and they want to avoid two names meaning
     different things colliding out of context.

3) Historical backdrop - DOI - this is an attempt to provide 
    globally addressable labelling for HTML content - see
    http://www.doi.org

4) OK - now we begin to see the premise behind Bizcodes, but
     there's more.  Let's examine current W3C Schema syntax.
     We see overhead, tons of it.  Instead of a simple 
     <!ELEMENT personsname (#PCDATA) >  
     we have potentially lines and lines of syntax to specify all
     kinds of properties and behaviours.

     Now 4GL vendors have seen this all before.  The problem is
     simple.  What if I want to make 'personsname' 55 bytes long
     now instead of 50 bytes?   Ooops, I have to track down all
     those schemas everywhere that have hardcoded definitions
     wired into them.   Therefore - we need to DECOUPLE the 
     definition of the actual entity from its use in a schema.  No
     surprises here.
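
As a rough sketch of the decoupling in 4) - in Python, and with the
dictionary layout, field names and schema names all hypothetical
rather than taken from any Bizcodes specification - each schema holds
only a reference, and the field length lives in one shared definition:

```python
# Hypothetical shared dictionary: the ONE place a definition lives.
# "UNE03004" is the code this message attaches to personsname in 7);
# the field names and the dict layout are assumptions for illustration.
definitions = {
    "UNE03004": {"label": "personsname", "max_length": 50},
}

# Two hypothetical schemas.  They reference the definition by key
# instead of hardcoding the length themselves.
order_schema = ["UNE03004"]
invoice_schema = ["UNE03004"]


def max_length(schema, position):
    """Resolve the length through the shared definition at lookup time."""
    return definitions[schema[position]]["max_length"]


before = (max_length(order_schema, 0), max_length(invoice_schema, 0))

# One change, in one place ...
definitions["UNE03004"]["max_length"] = 55

# ... and every schema that references the entity picks it up.
after = (max_length(order_schema, 0), max_length(invoice_schema, 0))
```

Nothing in either schema had to be touched to go from 50 to 55 bytes.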

5) Human usability quotient:  humans remember things that are
     relevant to their own domain, and that are intuitive and
     simple.  So as Jon noted we want things in
     context, with nice meaningful names.  This is EXACTLY the
     system that Bizcodes is designed to empower.  Now - back
     to Bizcodes again.  I give you the English dictionary and
     ask "give me a word that means 'complicated'".  You offer me
     50 different words, and some generic ones like '#$%@ed up'.
     Which do I choose?  Here's where we need to know context.
     The XML Topic Map work here is vital.  Couple this to a
     Repository - now the repository is NOT a jumble of words in
     alphabetic order - but instead words based on context of use
     and ordered and arranged accordingly.   Now we can use our
     Bizcodes.  For each domain - I want to know what set that label
     belongs to.   If you tell me it is 'stockMaturityDate' I have some
     clues.  But is it wine, cheese, beer, or mutual funds we are talking
     about?  If I expose the Bizcode as MFB01001 the MFB tells me
     it is a mutual fund and the 01001 references its definition.

     So I would expect MFB01002 to be related, and so on.   Notice
     also that if 6 months from now someone decides stockMaturityDate
     really is too unclear, and that this really is assetMaturityDate, how
     simple this becomes to re-map - leave the Bizcode as MFB01001
     and simply change the human readable label.

6) Assigning, managing and utilizing Bizcodes.   Industry groups,
     standards associations and large corporates can develop
     Bizcode based definitions - just like they do today with barcodes.
     You don't go into a store and ask for a '4520-1200-79001'.  But
     every computer system that touches it knows it as exactly that!
     Notice the packaging can change, the size, weight, price, et al.
     That is the power of a neutral labelling scheme.   Every Bizcode
     prefixed with UCC tells me it is the grocery industry domain,
     and so on.

      We picked 3 bytes alpha and 5 numeric as a manageable limit,
      and also because 8 bytes seems a good pointer length 
      for legacy systems and the right order of combinations.
      Notice in the UCC case there are about 7 million barcodes, 
      but that about 3,500 Bizcodes are needed to describe all
      the business attributes of those 7 million (weight, colour, size,
      price, etc) because they are re-usable.

      Notice you can also assign Bizcodes in a logical way 
      (UNSPSC codes are an example - but not one I'm wildly in love
        with - too many #'s - not enough alphas!), and this is in part
        what Jon is after - don't give me a random string - give me
       something a warehouse quartermaster will love to classify
       with.... 'all those racks over there have got 10-93's on them....'.

       The more you do this however - the more expensive the
       codes themselves become to assign, maintain and 
       validate.  This is why the barcode system of simple
       sequential numbers is the simplest.  If people want to add
       implied meaning - then they have to accept that cost
       and overhead themselves - see http://www.udef.org
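
To put a number on "the right order of combinations" in 6), here is
the arithmetic as a small Python sketch.  It assumes the 3 alpha
positions are the 26 letters A-Z (case ignored) and the 5 numeric
positions are decimal digits; the variable names are mine:

```python
LETTERS = 26   # assumed: A-Z, case ignored
DIGITS = 10    # assumed: 0-9

# Number of distinct 3-alpha domain prefixes (MFB, UCC, UNE, ...).
prefixes = LETTERS ** 3

# Number of sequential codes available under each prefix.
codes_per_prefix = DIGITS ** 5

# Total size of the 8-byte code space.
total_codes = prefixes * codes_per_prefix

print(prefixes, codes_per_prefix, total_codes)
```

Under those assumptions that is 17,576 prefixes of 100,000 codes
each, so the roughly 3,500 codes mentioned for the UCC case fit
comfortably under a single prefix.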

7)  Now we come back to the Schema syntax burden.  What if
      only one line of the schema syntax was needed to reference
      each element?  And what if that line always had the same
      format?  And what if the line was parameterized by namespace?
      Wow!  We have something which is simple, consistent, and
      predictable.  That is how Bizcodes work.

       There are lots of ways of doing this - here's one using a
       simple attribute in a DTD.  So back to our ELEMENT definition:

       <!ELEMENT personsname (#PCDATA) >
       <!ATTLIST personsname
                              bizcode CDATA #FIXED "UNE03004">

      Also - when migrating legacy EDI dictionaries we can
      use automated processes to assign Bizcodes, rather
      than an expensive manual process being needed.
      Then there are all those lines of COBOL copybooks &
      CICS maps to tackle too....
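
Because the line in 7) always has the same format, even a very
simple tool can pull the element-to-Bizcode mapping out of a DTD.
The Python sketch below is only that: the pattern handles just the
single-attribute ATTLIST shape shown above, it is not a DTD parser,
and the function name is hypothetical:

```python
import re

# Matches only declarations shaped like the example in 7):
#   <!ATTLIST name bizcode CDATA #FIXED "CODE">
ATTLIST_RE = re.compile(
    r'<!ATTLIST\s+([^\s>]+)\s+bizcode\s+CDATA\s+#FIXED\s+"([^"]+)"\s*>'
)


def bizcodes_in_dtd(dtd_text: str) -> dict:
    """Return {element name: Bizcode} for each matching ATTLIST."""
    return dict(ATTLIST_RE.findall(dtd_text))


DTD_TEXT = """
<!ELEMENT personsname (#PCDATA) >
<!ATTLIST personsname
          bizcode CDATA #FIXED "UNE03004">
"""

mapping = bizcodes_in_dtd(DTD_TEXT)
print(mapping)
```

A real tool would want a proper DTD parser, but the point stands:
one predictable line per element is easy for software to find.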

8)  Repository linkage with Bizcode and Parser.

      Armed with the fact that 'personsname' is UNE03004 
      I can now query the repository via its API and return 
      whatever semantic definitions (in XML) that I require.

      Notice if instead (as some are proposing) I had used
      the 'personsname' I would have a much tougher time.
      
      First 'personsname' may occur in multiple repositories
      with different meanings and versions.  UNE03004 tells
      me that the owner is UNEDIFACT - so I go straight to
      the right repository, and entry 03004.

       Using XLink I can insert selected pieces of the XML
       definition retrieved from the Repository (Bizcode being
       an 8 byte label means it is XML ID name compliant) 
       straight into the DOM where application software can then
       use it.  This then dovetails beautifully into what ebXML core 
       components are looking to allow software to do.
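
The routing step in 8) might look something like this in Python.
Everything here is a stand-in: the repositories are in-memory
dictionaries rather than a real repository API, and the stored XML
is a placeholder, not an actual UNEDIFACT definition:

```python
# Hypothetical registry: 3-letter owner prefix -> that owner's
# repository, modelled here as {entry number: XML definition}.
REPOSITORIES = {
    "UNE": {
        # Placeholder content, NOT a real repository entry.
        "03004": '<definition label="personsname"/>',
    },
}


def resolve(bizcode: str, repositories: dict) -> str:
    """Go straight to the owner's repository, then to the entry."""
    owner, entry = bizcode[:3], bizcode[3:]
    if owner not in repositories:
        raise KeyError(f"no repository registered for owner {owner!r}")
    return repositories[owner][entry]


definition_xml = resolve("UNE03004", REPOSITORIES)
print(definition_xml)
```

A lookup by the bare name 'personsname' has no such prefix to route
on, which is the tougher time described above.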

In conclusion, please see http://www.bizcodes.org for tons
of related materials and more details.   

Thanks, DW.
======================================================      

Message text written by Jon Bosak
>Having said that, I would like to venture a modest proposal that
may help to allay the concerns of both those who fear a
terminology struggle between EDI factions as well as those who
seek political correctness in the choice of language.  I suggest
that wherever we need a list of terms in circumstances where the
choice of language is computationally insignificant, we use Latin.
The character set is even smaller than US ASCII; it's as
politically neutral a choice as can be achieved in terms that are
mnemonic to anyone; and if the terms are intelligently selected,
they will be mnemonic to almost everyone actually involved in this
work.

I originally put this proposal forward as a joke, but compared to
the absurd conclusion that the optimum solution is the one that
inconveniences the greatest number of people, it's starting to
look genuinely attractive.

Jon

<
Follow-Ups:
- Re: Bizcodes - was (Re: SubmissionPackage DTD)
  - From: Jon Bosak <bosak@boethius.eng.sun.com>