ebxml-core message

Subject: RE: Getting Back to Basics - How to describe Dates and Times and Events?
From: Todd Boyle <tboyle@rosehill.net>
To: "William J. Kammerer" <wkammerer@foresightcorp.com>, ebXML Core <ebxml-core@lists.ebxml.org>
Date: Mon, 09 Apr 2001 23:32:34 -0700

W Kammerer said, regarding my date object,
> <transactionDate> snippet still has remnants of specificity in the
> name - the "transaction" should be dropped from the label, or maybe it
> should be called <DateTime>.

That's right, but here's how that happened: there's a transactionDate
on the header and entryDate on each row.  I want the GL messages
to be read or written by very low-tech parsers.  If every element 
name is unique, you can search for a tag and know exactly what 
you're dealing with when you find it. 
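
The tag-search idea above can be sketched in a few lines. This is only
an illustration, assuming plain string search with no XML parser at
all; the function name find_tag_value and the sample message layout are
hypothetical, and only the two element names come from the text above.

```python
from typing import Optional


def find_tag_value(message: str, tag: str) -> Optional[str]:
    """Return the text between <tag> and </tag>, or None if the tag is absent.

    This relies on every element name being unique in the message, so the
    first match is the only match.
    """
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = message.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = message.find(close_tag, start)
    if end == -1:
        return None
    return message[start:end].strip()


# Hypothetical message; the wrapper element names are made up.
sample = (
    "<glMessage>"
    "<transactionDate>2001-04-09</transactionDate>"
    "<row><entryDate>2001-04-10</entryDate></row>"
    "</glMessage>"
)

header_date = find_tag_value(sample, "transactionDate")
row_date = find_tag_value(sample, "entryDate")
```

With a generic <DateTime> used in both places, the same search could
not tell the header date from the row date without tracking nesting.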
 
> Todd is "...still not convinced to support multiple 8601 datetime
> formats, or the xsd:datetime strings."  But he doesn't have to
> explicitly support multiple formats - the ISO 8601 Date Time format can
> unambiguously represent all sorts of dates, times, Date-Times or
> periods - there's no need for qualifiers.  I suspect we can get by just
> defining a sub-element <value> as xsd:dateTime. 

That may be true, but I want the string parser to be able to find
dates and segment qualifiers without having to support many kinds
of datetime formats.  When you're actually writing code to interface
the GL schema with some desktop app, you are dead in the water if
you have to support all the format options in xsd:dateTime or
ISO 8601.  Stubbornly, I insist on treating these XML ledger messages
as just another ASCII-delimited format, except that it is also
accessible to XML parsers going forward.  I could be wrong.
Any comments are welcome.
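
To make the single-format argument concrete, here is a minimal sketch
of what the receiving side has to do when exactly one datetime layout
is allowed.  The chosen layout (YYYY-MM-DDThh:mm:ss, no time zone) and
the name parse_fixed_datetime are assumptions for illustration, not
something the GL schema specifies.

```python
from datetime import datetime

# Assumption: the one and only layout the messages may use.
FIXED_FORMAT = "%Y-%m-%dT%H:%M:%S"
FIXED_LENGTH = len("2001-04-09T23:32:34")  # 19 characters


def parse_fixed_datetime(text: str) -> datetime:
    """Parse the single agreed layout; reject everything else.

    The length check rules out unpadded fields such as "2001-4-9",
    which strptime would otherwise accept.
    """
    text = text.strip()
    if len(text) != FIXED_LENGTH:
        raise ValueError(f"not in the fixed datetime layout: {text!r}")
    return datetime.strptime(text, FIXED_FORMAT)


stamp = parse_fixed_datetime("2001-04-09T23:32:34")
```

Week dates, ordinal dates, basic (undelimited) forms, durations and
time zone suffixes all raise ValueError here, which is the point: the
desktop app integrator writes one branch, not a dozen.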

> follow along in http://www.unece.org/trade/untdid/d01a/content.htm.
> 
> The <docRef> example avoids having separate derivations for
> "invoiceNumber", "poNumber", etc. and instead uses something like
> Edifact 1153 strings:
> 
> <docRef>
>   <docType>Invoice</docType>       (required)
>   <docNum>ABC-12345</docNum>       (required)
> </docRef>
> 
> The <docType> element presumably is somehow derived from EDIFACT D.E.
> 1153.  For simplicity, the ultimate type should use an EDIFACT code
> value (rather than the English name of the code).  And documents are
> better described by D.E. 1001  (Document name code) as used in composite
> C002 DOCUMENT/MESSAGE NAME, rather than D.E. 1153 (Reference code
> qualifier) as used in the composite C506 REFERENCE.


Yes, my docType element is intended to enclose a fixed, finite set of
strings having normative meanings such as EDIFACT 1001 document types.
Did I say 1153?  Thanks for the correction.

So we have something which is actually metadata, but it is outside the
tags.  I don't see a problem with this and am relieved to see XEDI and
others using this approach.

Frankly, the users of my GL schema would agree on far fewer strings
than the full selection of EDIFACT 1001 document types.  I hope
ebXML or EWG identifies a smaller subset.  I hope to identify
20 or 30 document types as a starter set for SMEs.  The lightweight,
fast, cheap model blows up if it has to support ten lists of 25K
each, with over 500 items like "Declaration regarding the inward and
outward movement".  Remember we're hoping for something that does
not require customization *in most cases.*  This is not the same
goal as EDIFACT's "all encompassing".  (If it's so bloated that it
cannot be handled with SMEs' resources, then that's not "all
encompassing" anyway.)
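
A small starter set makes the docType check trivial.  The sketch below
assumes a four-entry list; only "Invoice" appears in this thread, and
the other three strings, along with the name make_doc_ref, are
hypothetical placeholders rather than agreed EDIFACT 1001 names.

```python
from xml.sax.saxutils import escape

# Hypothetical starter set; only "Invoice" comes from the thread.
STARTER_DOC_TYPES = frozenset(
    {"Invoice", "PurchaseOrder", "CreditNote", "Payment"}
)


def make_doc_ref(doc_type: str, doc_num: str) -> str:
    """Build a <docRef> element, refusing docType strings outside the set."""
    if doc_type not in STARTER_DOC_TYPES:
        raise ValueError(f"unknown docType: {doc_type!r}")
    if not doc_num:
        raise ValueError("docNum is required")
    return (
        "<docRef>"
        f"<docType>{doc_type}</docType>"
        f"<docNum>{escape(doc_num)}</docNum>"
        "</docRef>"
    )


doc_ref = make_doc_ref("Invoice", "ABC-12345")
```

Swapping in the full D.E. 1001 list would only change the contents of
the set, not the code around it.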

For a reality check, notice that Intuit has 80 or 90% of the SME
market and has published their QBXML vocabulary, having a core
vocabulary of 391 words (5.4K).  Their whole vocabulary is smaller than
the 579 metadata types just in D.E. 1001 Document types (15.4K).

If 391 elements is enough for 90% of the SME market, that's 
good enough for me.  Nobody needs a GL with a vocabulary 
of 10,000 words.

But the docType class will do EDIFACT D.E. 1001 if you want it to.

> Todd's last example:
> 
> <code>
>   <codeList>DUNS</codeList>              (required)
>   <codeValue>999999999</codeValue>       (required)
> </code>
> 
> looks more like a Party ID based on a DUNS.   So instead of the generic
> name <code>, it should be something like <PartyID>.  

Oh yes - of course.  The "code" class is used for product and a bunch
of other crap.  I believe the same "code" class needs to be used internally
by a webledger or web app for most codes, because they are flowing across
company lines now.  For example, many POs have things like

 <buyerPartNum/>
 <sellerPartNum/> 

Why shouldn't the values be namespace:value pairs?  Why shouldn't
everybody just start using shorthand, like DUNS:2342342342 or
EAN:1231123123?  This would simplify everything so much.  I thought
this was what XML Namespaces was supposed to give us, back in 1999.
The technology is just sitting there.  Oh well.  It is not within my
power to influence these things at all.  So I chisel away with
my tag structures.
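
The shorthand and the tag structure carry the same two fields, so
converting between them is mechanical.  A minimal sketch, assuming the
first colon separates the code list from the value; the names
QualifiedCode, parse_qualified_code and to_code_element are
hypothetical.

```python
from typing import NamedTuple
from xml.sax.saxutils import escape


class QualifiedCode(NamedTuple):
    code_list: str   # e.g. "DUNS" or "EAN"
    code_value: str  # the identifier issued under that list


def parse_qualified_code(text: str) -> QualifiedCode:
    """Split shorthand such as "DUNS:2342342342" at the first colon."""
    code_list, sep, code_value = text.partition(":")
    code_list, code_value = code_list.strip(), code_value.strip()
    if not sep or not code_list or not code_value:
        raise ValueError(f"expected list:value, got {text!r}")
    return QualifiedCode(code_list, code_value)


def to_code_element(code: QualifiedCode) -> str:
    """Render the pair in the tag structure used by the "code" class."""
    return (
        "<code>"
        f"<codeList>{escape(code.code_list)}</codeList>"
        f"<codeValue>{escape(code.code_value)}</codeValue>"
        "</code>"
    )


duns = parse_qualified_code("DUNS:2342342342")
element = to_code_element(duns)
```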

Anyway, my idea of a party is that it uses the same "code" class
as product and a lot of other components, something like

<party>
   <code>
     <codeList>DUNS</codeList>              (required)
     <codeValue>999999999</codeValue>       (required)
  </code>
</party>

So, you and I are still using the specific word, "party"! Do you
feel unclean?  You could be meta meta meta and have a three-column
general ledger like this:

Date
Amount
Code+   and define the code class to have (codetype, codevalue, codelist)
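
As a data structure, the three-column layout might look like the sketch
below.  The field names follow the (codetype, codevalue, codelist)
triple; the class names, the sample values and the "chartOfAccounts"
list name are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Code:
    code_type: str   # what role the code plays, e.g. "accountCode"
    code_value: str  # the identifier itself
    code_list: str   # who issued it, e.g. "DUNS"


@dataclass
class LedgerRow:
    entry_date: date
    amount: Decimal
    codes: List[Code] = field(default_factory=list)  # the "Code+" column

    def codes_of_type(self, code_type: str) -> List[Code]:
        """Return every code on this row that plays the given role."""
        return [c for c in self.codes if c.code_type == code_type]


# Hypothetical row: one account code and one party code.
row = LedgerRow(
    entry_date=date(2001, 4, 9),
    amount=Decimal("125.00"),
    codes=[
        Code("accountCode", "4000", "chartOfAccounts"),
        Code("partyCode", "999999999", "DUNS"),
    ],
)
party_codes = row.codes_of_type("partyCode")
```

Everything specific to a row lives in the repeating code column, so a
new kind of code needs a new code_type string rather than a new field.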

CodeType would include "accountCode", "partyCode", "journalCode",
"taxCode", everything in the GL. Why isn't everybody doing this... :-O

> The EDIFACT
> composite C082 PARTY IDENTIFICATION DETAILS depends exclusively on the
> ubiquitous 1131/3055 D.E. pair to specify the type of Party ID.  For
> example, if you have D.E. 3055 Code list responsible agency code set to
> '9' - EAN (International Article Numbering association), it's clear that
> you're using a UCC/EAN GLN (Global Location Number - kind of like a
> D-U-N-S). Alternatively, if D.E. 3055 is '16', we're relying on Dun &
> Bradstreet to supply the DUNS.

This is interesting. I am too tired to get all the implications of that
right now.  The one thing that pushes my buttons is the substitution of
EDIFACT alphanumeric codes for the underlying string.

A word about compression.  If today's CPUs and compression algorithms
had been available when EDI was conceived, does anybody seriously think
EDI would look the same as it does today?  Especially if we had
XML with a fixed vocabulary of maybe 100 words, as in the SME subset of
my GL schema.  The compression software would not even need to
index the known tags; it could tokenize them on the fly, and it would
only have to compress the string values in the XML file.  The thing would
scream.  And the files would be smaller than even a compressed
EDIFACT file, because EDIFACT compression would have to both
index and compress the 2 and 3 digit codes.  If my schema has 100
words, it just substitutes a 1 byte token for each tag or endTag,
because it knows them all.
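
The token substitution can be sketched as follows.  This is a toy under
stated assumptions, not a real codec: tags carry no attributes, string
values are plain ASCII, and tokens live in the byte range 0x80-0xFF,
which leaves room for only 64 tag names (a 100-word vocabulary would
need a different token layout or an escape byte).  The vocabulary here
is just the four element names from the party example.

```python
import re

# Known vocabulary, shared in advance by sender and receiver.
VOCABULARY = ["party", "code", "codeList", "codeValue"]

# One byte per start tag and one per end tag, all >= 0x80 so they can
# never collide with ASCII text.
START = {name: 0x80 + 2 * i for i, name in enumerate(VOCABULARY)}
END = {name: 0x81 + 2 * i for i, name in enumerate(VOCABULARY)}
TOKEN_TO_TAG = {tok: f"<{name}>" for name, tok in START.items()}
TOKEN_TO_TAG.update({tok: f"</{name}>" for name, tok in END.items()})

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def tokenize(xml_text: str) -> bytes:
    """Replace each known tag with its 1-byte token; copy values as-is."""
    out = bytearray()
    pos = 0
    for match in TAG_RE.finditer(xml_text):
        out += xml_text[pos:match.start()].encode("ascii")
        table = END if match.group(1) else START
        out.append(table[match.group(2)])  # KeyError on an unknown tag
        pos = match.end()
    out += xml_text[pos:].encode("ascii")
    return bytes(out)


def detokenize(data: bytes) -> str:
    """Invert tokenize(): expand tokens back into tags."""
    return "".join(TOKEN_TO_TAG[b] if b >= 0x80 else chr(b) for b in data)


sample = (
    "<party><code><codeList>DUNS</codeList>"
    "<codeValue>999999999</codeValue></code></party>"
)
packed = tokenize(sample)
restored = detokenize(packed)
```

After this step only the string values remain as text, so a general
purpose compressor has nothing left to do with the tags.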

Anyway, readability dominates the equation.  My belief is that the
readability of the XML document is so important, especially over
the coming years of implementation, that the original strings should
be used.

Everything will not go smoothly when SMEs start sending and receiving
orders, invoices and payments over the internet.  Your office PC guru
has to be able to find and fix things.  Ordinary developers who don't
know EDIFACT need never learn it, or even know that the document
metadata is a subset of EDIFACT. This saves billions in training costs.

Human-readable XML also may have some comfort factor for SMEs,
which is important in this adoption phase.  For example, they will
want to know what's going in/out to the internet.  With XML, they
don't have to trust the vendor quite as much; they can read it
with a browser, and apply non-EDI tools for logging, reports etc.

I want to see lots of people write simple little interfaces on their
business applications, and amortize the cost over 10 years of stable
use, just like EDI users today. An invoice is an invoice.  An amount
is an amount.

Thanks, William, for all your tips and references.  Whew, you have me
working overtime this week.

Todd

References:
- Re: Getting Back to Basics - How to describe Dates and Times and Events?
  - From: "William J. Kammerer" <wkammerer@foresightcorp.com>