ebxml-transport message

Subject: resend: Re: Feedback on Reliable Messaging v0-06
From: Christopher Ferris <chris.ferris@east.sun.com>
To: ebxml transport <ebxml-transport@lists.ebxml.org>,Marty Sachs <mwsachs@us.ibm.com>
Date: Mon, 14 Aug 2000 06:06:11 -0400
Joe,

We now have a tech writer focusing on the
Message Services spec (formerly message hdrs and
packaging). 

More comments below.

Chris

Joe Lapp wrote:
> 
> Wow, I have to say that this document seems to be the most organized and
> most readable of the bunch.  And the diagrams are terrific!  I've so far
> been withholding my comment that we should have a tech writer pretty things
> up and reorg things before declaring anything public and final.  So
> consider it said for the Packaging and Header specs (sorry guys -- it is
> great material, though).  :O
> 
> Here are my comments and questions.  The only overlap I could find with
> David's feedback was on the issue of retry counts.
> 
> 1) Why does the Reliability spec say that it "changes" the "existing ebXML
> specifications?"  Is implementing the reliability spec optional?  If it is
> optional to support the AtMostOnce property found in the header, the Header
> spec should probably say this.  (See Reliability lines 62 and 86, verbiage
> such as "new element" on lines 87 and 88, and all of section 3 "Changes to
> Existing ebXML Specifications.")

This question has been raised this week as feedback from
the POC and in other discussions we've been having regarding
optionality.

We are going to be focusing on a more carefully chosen usage
of the RFC2119 terms (MUST, SHOULD, OPTIONAL, etc.) in the
edits applied this week to our various outstanding documents.
Still not perfect, but we're trying very hard;-)

Anyway, it would seem to me that optional in the sense of
reliable messaging was meant to mean that one need not
USE reliable messaging features. We are going to be looking
for cases where this isn't clear and cleaning them up. We
hope that we have handled this in the version of the MS spec
that Ralph sent out last night. If there are still any questions
in this regard, we need to raise them as issues and we'll
correct them as part of the final revision process with
the other feedback received.

> 
> 2) Why does ebXML support both a message unique identity and a recovery
> number?  I'm mostly just curious, but part of me is a little concerned.  If
> ever the identifier and the recovery number get out of synch (an error, I
> suppose), and if communicating endpoints are not both synchronizing on the
> same value, havoc could be loosed.  Also, although I think the definition
> is complete as is, it might be helpful to clarify that the identity doesn't
> change on retransmissions (that is, to say that the identity is associated
> with the message, not with the message-transmission).

This will most certainly be addressed. Jim should be sending out
a revised version of the RM spec next week incorporating feedback
and outstanding issues raised here this week. 
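
On the point that the identity belongs to the message and not to each
transmission, here is a minimal sketch of what that buys the receiver. The
names Message, Receiver and message_id are hypothetical and not taken from
the RM spec, and acknowledging a duplicate again is an assumption rather
than anything the spec is known to require.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """A message whose identity is fixed when it is created (hypothetical)."""
    payload: bytes
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Receiver:
    """Hands each message identity to the application at most once."""

    def __init__(self):
        self.seen_ids = set()
        self.delivered = []

    def receive(self, message):
        # A retransmission carries the same message_id, so it is recognised
        # as a duplicate and is not delivered a second time.
        if message.message_id not in self.seen_ids:
            self.seen_ids.add(message.message_id)
            self.delivered.append(message.payload)
        # Assumption: the id is echoed back in the ack either way.
        return message.message_id
```

Because the sender reuses the same Message object on a retry, the receiver
needs nothing beyond the identity to detect the repeat.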

> 
> 3) The ACK request must ride in a normal request.  What happens if too few
> messages are sent in a window to get an ACK in the time required by the
> application?  That is, the application must move on with full knowledge
> that the messages were delivered.  Do we need a way to request an ACK
> without requiring it to ride in a normal request?

Jim, could you add this to the issues list?

> 
> 4) Why define maximum message size? I'm sure there's a good reason.  More
> working implementations, perhaps, since few people know how to accommodate
> arbitrarily sized messages?  (See note on line 153 in 2.4 "Message Transfer
> Sequence.")

In fact, the max message size is something which belongs in the
TPA. In terms of why it is important w/r/t RM, since persistence is
involved, message size could be a factor.


Suffice it to say that the max msg size should be an agreed upon
value reflected w/r/t both participants in a TPA.

I'll forward this to Marty Sachs for inclusion in the TPA
requirements for TR&P.

> 
> 5) The verbiage of section 2.5 "Error Detection" (line 163) assumes that
> there is a timeout period for waiting for an ack.  The spec probably ought
> to assert that there shall be a timeout period so that it is a normative
> requirement of the protocol.  And what is the timeout period for a recovery
> message?  Is this left to the TPA to specify?  In BizTalk the timeout
> period is given by the delivery deadline; but BizTalk has a separate time
> for message expiration.  (See line 163 of 2.5 "Error Detection.")
> 
> 6) Why are all messages in the window resent on a transport error?  Won't a
> transport error apply to just one message?  (See line 175 in section 2.6
> "Window Recovery Sequence.")

I understand that all messages are resent if the ack isn't
received in the expected timeframe. In the case where an
error is detected and successfully transmitted to the
original sender only the unreceived messages would be resent.
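
As I read it, the two cases could be sketched roughly as follows. This is
only an illustration of my understanding above, not the spec's algorithm;
the function name and the idea of passing the reported ids as a list are
assumptions.

```python
def messages_to_resend(window, unreceived_ids=None):
    """Pick which messages in the current window to send again.

    window         -- ids of the messages sent in this window, in send order
    unreceived_ids -- ids the receiver reported as missing, or None when
                      no ack or error report arrived before the timeout
    """
    if unreceived_ids is None:
        # Ack timed out: the sender cannot tell what arrived, so it
        # resends the whole window.
        return list(window)
    # An error report reached the sender: resend only what was missed,
    # keeping the original send order.
    missing = set(unreceived_ids)
    return [msg_id for msg_id in window if msg_id in missing]
```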

> 
> 7) Section 1.7 "Detection of Repeated Messages by the Receiver" (lines
> 187+) outlines a "suggested" implementation.  For clarity's sake, we should
> probably separate normative from non-normative material.  That way it
> becomes easy to identify compliance requirements.  Might want to do this
> across all specifications.  The W3C does this.

Agreed. We discussed this today. I think that explicit identification
of what is non-normative (such as a suggested implementation approach
to help clarify) is a good idea.

> 
> 8) What if the transport errors don't go away?  The spec, as is, indicates
> that the protocol handler would go into an infinite loop.  Likewise, what
> if the recovery sequence never succeeds?  We might want the spec to assert
> that the receiver may engage in multiple recovery retries.  Likewise, what
> if the receipts are never received?  Should there be a maximum retry count
> that applies to all kinds of errors?  Should the protocol state that such a

Yes, the max retries has been discussed. This should be included in the
TPA. I think that it may already do so; if not, I'll add it to the
list of required elements for the TPA and send to Marty.
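
To make the infinite-loop concern concrete, here is a minimal sketch of a
bounded retry loop. It assumes the limit is read from the TPA and counts
retries after the first attempt; send_once and RetriesExhausted are
hypothetical names, not part of any ebXML draft.

```python
class RetriesExhausted(Exception):
    """Raised when the agreed retry limit is reached without an ack."""


def send_with_retries(send_once, max_retries):
    """Call send_once() until it reports an ack or the retries run out.

    send_once   -- hypothetical callable; returns True once an ack arrives
    max_retries -- retries allowed after the first attempt (assumed to
                   come from the TPA)
    Returns the number of attempts that were made.
    """
    for attempt in range(1, max_retries + 2):
        if send_once():
            return attempt
    # The loop is bounded, so a persistent transport error ends here
    # and can be reported instead of being retried forever.
    raise RetriesExhausted(f"no ack after {max_retries} retries")
```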


> maximum must exist, but not state what that maximum should be?  I assume
> this information would be available in the TPA, and no self-respecting TPA
> would be without a maximum retry count, but I think the protocol really
> must constrain the TPA.  I'd think the protocol should at least state the
> kinds of errors to which retry counts should be applied.
> 
> 9) Regarding section 2.8 "Garbage Collection":  First, I'd like to suggest
> renaming the section to something else, because garbage collection has a
> very definitive meaning in distributed communications -- the recycling of
> remotely referenced objects -- and this is not the meaning intended here.

hmmmm, yeah, I guess so.... How about 'Persistence Handling'?

> Second, why isn't the expiration of a counter sufficient for its removal?
> Regardless of what else is going on, if it has expired, it must be trashed.
>  I don't understand why the other conditions must also be satisfied.  Maybe
> I'm just not understanding something.  Third, why can't a counter be
> removed after the window has closed?  Is this because the receiver can't
> know that the sender will get the ack in time or even get the ack at all?
> 
> 10) I realize that this is an "open issue," but I'm really concerned about
> time synchronization.  This is also one of my concerns about BizTalk.  The
> BizTalk spec claims that time synchronization is not an issue because of
> the latencies involved in net access.  Well, it is possible to engineer
> those latencies down to nearly whatever is required, so I don't buy the
> BizTalk answer.  Time synchronization is a really really tough issue.

I thoroughly concur! David and I have discussed this as well and
I think that we are in agreement that there needs to be something
which addresses time synchronization. It simply cannot be ignored IMHO.

> 
> 11) BizTalk has the notion of a "processing deadline," which is the time by
> which the recipient application must complete processing the message.  The
> message must carry its processing deadline.  BizTalk also provides a
> "delivery deadline," which appears to be the same as the ebXML "Message
> Expiration Timestamp."  Basically, the sender heeds the delivery deadline,
> taking action on failure to meet the deadline, while the receiver ignores
> it.  The receiver heeds the processing deadline, refusing to ack or forward
> messages that fail to meet the deadline, while the sender ignores it.  I
> think the idea is that the only real deadline is an application-level
> deadline, and the intervening middleware and even the applications should
> be given freedom to apply heuristics in order to try to meet the deadline.
> Does this approach have any merit?

For purposes of RM, you need some notion of a timeout for receipt of
an ack so that you can begin retry/recovery processing.
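
A minimal sketch of that timeout check, assuming the sender records a send
time per unacknowledged message and that a single ack timeout value applies
to all of them. The function name, the dict layout and the use of seconds
are all assumptions for illustration.

```python
def overdue_messages(pending, now, ack_timeout):
    """Return ids of messages whose ack has not arrived in time.

    pending     -- dict mapping message id to the time it was sent (seconds)
    now         -- current time on the sender's clock (seconds)
    ack_timeout -- how long the sender waits for an ack (seconds)
    """
    return [
        msg_id
        for msg_id, sent_at in pending.items()
        if now - sent_at >= ack_timeout
    ]
```

Only the sender's own clock is involved here, so this particular check does
not depend on the two parties having synchronized time.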

> 
> 12) Are acks to be sent for messages that arrive at the recipient after the
> message expires?  The spec says that such messages are dropped, suggesting
> that no ack should be sent, but we should make an explicit statement.
> Otherwise some implementations may ack and others won't.  It seems
> problematic to provide an ack for a message that gets dropped.

We discussed this, but I don't recall a resolution. Jim, if this isn't
in the issues list (and hasn't been addressed yet) would you please
add it? Thanks!

> 
> 13) When a message is retransmitted, must the entire packaged retransmitted
> message be identical to the originally transmitted message?  Some

Yes!

> implementations may be inclined to insert a new timestamp somewhere, or
> perhaps reorder elements, or maybe there is some custom header that the
> protocol is using that it is inclined to update.  Should there be an
> explicit requirement that a retransmitted message be identical to the
> original?  BizTalk goes out of its way to assert this requirement.

It should be made explicitly clear and unambiguous, IMHO.
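
One way an implementation might guarantee this is to persist the packaged
bytes once and always retransmit from that stored copy, never repackaging.
The sketch below assumes an in-memory store for brevity; OutboundStore and
its methods are hypothetical names, and the SHA-256 digest is only there to
show the byte-for-byte comparison.

```python
import hashlib


class OutboundStore:
    """Keeps the exact bytes of each packaged message for later resends."""

    def __init__(self):
        self._packaged = {}

    def first_send(self, message_id, packaged_bytes):
        # Store an immutable copy so later changes to the caller's
        # buffer cannot alter what gets retransmitted.
        self._packaged[message_id] = bytes(packaged_bytes)
        return self._packaged[message_id]

    def retransmit(self, message_id):
        # No new timestamp, no reordering: the stored bytes go out as-is.
        return self._packaged[message_id]


def digest(packaged_bytes):
    """Hex SHA-256 of a packaged message, for comparing transmissions."""
    return hashlib.sha256(packaged_bytes).hexdigest()
```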

> 
> 14) We should consider identifying all the kinds of information that must
> be passed to an application by a TR&P engine that fronts an application.
> I'll call this the "TR&P processor."  Section 2.5 "Error Detection" line
> 169 seems to be the only place that provides such information.  It says
> that when a message is lost the loss must be reported to the sending
> application.  What if the recipient receives an invalid message?  What
> about when retry counts are exceeded, say by the sender sending normal
> messages or the receiver failing to get retransmits after attempting error
> recoveries?

I think that this deserves some discussion. However, it isn't clear
to me that this is in our charter and scope. Firstly, it suggests
an implementation. Secondly, in a synchronous scenario, I can see
error reporting as being handled at the API level. It isn't clear
to me that there's any common handling of this in an asynchronous
case which would be the case for most b2b exchanges.

If a message is undeliverable, what is the application to do with 
this? How will it be reported in a way that ALL applications can
do something meaningful? I think that this is something that you
guys (EAI vendors) would be providing as adaptors to the 
ebXML Messaging Service for the wide variety of applications
that exist.

> 
> What should I tackle next?  I'm not even sure where to find the specs.
> I've been pulling these out of this discussion list.  I need one that
> covers synchronous vs. asynchronous processing, along with one that
> describes error messages.

TPA. Look in the mail list archives if you haven't already got
a copy. 1.0.6 would be the latest draft. It is definitely TR&P related
but is being handled as a separate WG and deliverable. It is
key that we (TR&P) all understand TPA and its relationship to
TR&P.


> 
> Hope you're having fun at the F2F.  Sorry I can't be there.  Too much fish
> to fry in too short amount of time.

Maybe you can make it to our interim F2F in Dallas w/o 9/25?

> 
> - Joe
-- 
    _/_/_/_/ _/    _/ _/    _/ Christopher Ferris - Enterprise Architect
   _/       _/    _/ _/_/  _/  Phone: 781-442-3063 or x23063
  _/_/_/_/ _/    _/ _/ _/ _/   Email: chris.ferris@East.Sun.COM
       _/ _/    _/ _/  _/_/    Sun Microsystems,  Mailstop: UBUR03-313
_/_/_/_/  _/_/_/  _/    _/     1 Network Drive Burlington, MA 01803-0903
Follow-Ups:
- Re: resend: Re: Feedback on Reliable Messaging v0-06
  - From: Jim Hughes <jfh@fs.fujitsu.com>