ebxml-transport message

Subject: Re: resend: Re: Feedback on Reliable Messaging v0-06
From: Jim Hughes <jfh@fs.fujitsu.com>
To: Christopher Ferris <chris.ferris@east.sun.com>,Joe Lapp <jlapp@webMethods.com>
Date: Tue, 15 Aug 2000 21:58:40 -0700

Chris & Joe,

Below are my comments on your comments... Don't have all the answers but 
some of this may help!

Jim

At 06:06 AM 8/14/00 -0400, Christopher Ferris wrote:
>Joe,
>
>We now have a tech writer focusing on the
>Message Services spec (formerly message hdrs and
>packaging).
>
>More comments below.
>
>Chris
>
>Joe Lapp wrote:
> >
> > Wow, I have to say that this document seems to be the most organized and
> > most readable of the bunch.  And the diagrams are terrific!  I've so far
> > been withholding my comment that we should have a tech writer pretty things
> > up and reorg things before declaring anything public and final.  So
> > consider it said for the Packaging and Header specs (sorry guys -- it is
> > great material, though).  :O
> >
> > Here are my comments and questions.  The only overlap I could find with
> > David's feedback was on the issue of retry counts.
> >
> > 1) Why does the Reliability spec say that it "changes" the "existing ebXML
> > specifications?"  Is implementing the reliability spec optional?  If it is
> > optional to support the AtMostOnce property found in the header, the Header
> > spec should probably say this.  (See Reliability lines 62 and 86, verbiage
> > such as "new element" on lines 87 and 88, and all of section 3 "Changes to
> > Existing ebXML Specifications.")
>
>This question has been raised this week as feedback from
>the POC and in other discussions we've been having regarding
>optionality.
>
>We are going to be focusing on a more carefully chosen usage
>of the RFC2119 terms (MUST, SHOULD, OPTIONAL, etc.) in the
>edits applied this week to our various outstanding documents.
>Still not perfect, but we're trying very hard;-)
>
>Anyway, it would seem to me that optional in the sense of
>reliable messaging was meant to mean that one need not
>USE reliable messaging features. We are going to be looking
>for cases where this isn't clear and cleaning them up. We
>hope that we have handled this in the version of the MS spec
>that Ralph sent out last night. If there are still any questions
>in this regard, we need to raise them as issues and we'll
>correct them as part of the final revision process with
>the other feedback received.

I agree with your sense: implementors of Messaging Services SHALL implement 
the reliable messaging functions (for interoperability), but the actual use 
of the functions is dependent on the applications.

> >
> > 2) Why does ebXML support both a message unique identity and a recovery
> > number?  I'm mostly just curious, but part of me is a little concerned.  If
> > ever the identifier and the recovery number get out of synch (an error, I
> > suppose), and if communicating endpoints are not both synchronizing on the
> > same value, havoc could be loosed.  Also, although I think the definition
> > is complete as is, it might be helpful to clarify that the identity doesn't
> > change on retransmissions (that is, to say that the identity is associated
> > with the message, not with the message-transmission).
>
>This will most certainly be addressed. Jim should be sending out
>a revised version of the RM spec next week incorporating feedback
>and outstanding issues raised here this week.

Global identity (based on MessageID) is defined in the Messaging Service 
Specification. SequenceNumber is unique to the Sender-Receiver-Transport 
combination of Message Service Handlers processing the message; the reason 
for using SequenceNumber is that it can result in faster testing for 
duplication and is easier to use. However, there was some objection that 
using a SequenceNumber might be an unnecessary implementation constraint, 
so the Receiver can use MessageID if it prefers to detect duplicates.

I'm not sure of your point on "retransmissions": MessageID is fixed by the 
sending PartyID for a particular message, and SequenceNumber is fixed by 
the Message Service Handler. If, because of error situations, the Sender 
MSH must retransmit the message to the Receiver, the MessageID is not 
touched ("we never alter the Message Header in Reliable Messaging"), and 
the same Sequence Number is reused (because the Receiver must be able to 
check again for this message under the same localized id). Does this help?
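
To make the duplicate check concrete, here is a rough, non-normative sketch in Python. It is not taken from the spec: the class names, the `channel` field (standing in for the Sender-Receiver-Transport combination), and the in-memory set are all assumptions made for illustration. It only shows that a retransmitted message carrying the same MessageID and SequenceNumber is recognized as a duplicate whichever key the Receiver chooses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str       # global identity, fixed for the life of the message
    channel: str          # hypothetical label for the Sender-Receiver-Transport combination
    sequence_number: int  # assigned by the sending MSH, reused on retransmission
    payload: str


class Receiver:
    """Hypothetical receiving MSH that drops messages it has already seen."""

    def __init__(self, use_sequence_number=True):
        self.use_sequence_number = use_sequence_number
        self.seen = set()
        self.delivered = []

    def _key(self, msg):
        # Either key works because neither value changes on retransmission.
        if self.use_sequence_number:
            return (msg.channel, msg.sequence_number)
        return msg.message_id

    def accept(self, msg):
        """Return True if msg is new and was delivered, False if it is a duplicate."""
        key = self._key(msg)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.delivered.append(msg.payload)
        return True


original = Message("mid-0001", "A->B/http", 1, "purchase order")
retransmission = original  # identical header: same MessageID, same SequenceNumber

by_sequence = Receiver(use_sequence_number=True)
by_message_id = Receiver(use_sequence_number=False)
results = [
    (r.accept(original), r.accept(retransmission))
    for r in (by_sequence, by_message_id)
]
```

A real Receiver would need a persistent store rather than a set in memory; that part is deliberately left out here.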

> > 3) The ACK request must ride in a normal request.  What happens if too few
> > messages are sent in a window to get an ACK in the time required by the
> > application?  That is, the application must move on with full knowledge
> > that the messages were delivered.  Do we need a way to request an ACK
> > without requiring it to ride in a normal request?
>
>Jim, could you add this to the issues list.

TRP hasn't yet defined the semantics/syntax of the interface between the 
Messaging Service Handler and the upper layer. The sending MSH will know 
that a message or a group of messages has been sent reliably to the receiving 
MSH; I don't know if this information is expected to be propagated upwards. 
Of course, I am speaking of MSH-level ACKs.

I think you may be talking about business-process ACKs, which are at a 
higher level than MSH. If so, then the BP-ACK would be a Normal Message 
sent in the reverse direction, and any discussion of it would be a business 
level discussion... Does this make sense?

> > 4) Why define maximum message size? I'm sure there's a good reason.  More
> > working implementations, perhaps, since few people know how to accommodate
> > arbitrarily sized messages?  (See note on line 153 in 2.4 "Message Transfer
> > Sequence.")
>
>In fact, the max message size is something which belongs in the
>TPA. In terms of why it is important w/r/t RM, since persistence is
>involved, message size could be a factor.
>
>
>Suffice to say that the max msg size should be an agreed upon
>value reflected w/r/t both participants in a TPA.
>
>I'll forward this to Marty Sachs for inclusion in the TPA
>requirements for TR&P.

Reliable Messaging doesn't introduce any new ideas on message size. The 
number of messages which could be in a Reliable Messaging Group (formerly, 
"Window") might be important to the Receiver, to limit table size...


> >
> > 5) The verbiage of section 2.5 "Error Detection" (line 163) assumes that
> > there is a timeout period for waiting for an ack.  The spec probably ought
> > to assert that there shall be a timeout period so that it is a normative
> > requirement of the protocol.  And what is the timeout period for a recovery
> > message?  Is this left to the TPA to specify?  In BizTalk the timeout
> > period is given by the delivery deadline; but BizTalk has a separate time
> > for message expiration.  (See line 163 of 2.5 "Error Detection.")
> >

We need to discuss this. The Sender's MSH needs to set a timeout to catch 
errors on sending the final message in a group of messages. How the Sender 
finds out about a suitable timeout value is undecided...

> > 6) Why are all messages in the window resent on a transport error?  Won't a
> > transport error apply to just one message?  (See line 175 in section 2.6
> > "Window Recovery Sequence.")
>
>I understand that all messages are resent if the ack isn't
>received in the expected timeframe. In the case where an
>error is detected and successfully transmitted to the
>original sender only the unreceived messages would be resent.

The protocol is for the Receiver to tell the Sender about messages actually 
received, if the Receiver gets a message indicating that all messages have 
been sent but in fact the Receiver has gotten fewer messages. The Sender 
could selectively send the missing messages, but there was some comment in 
the meeting last week that it is easier for the Sender just to send *all* 
messages in the message group. The algorithm instructs the Receiver to 
throw away duplicate messages, so this kind of massive response would work.
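
As a quick illustration of why both approaches end in the same place, here is a small Python sketch. It is only a model under my own assumptions: messages are reduced to bare sequence numbers, and the function names are invented for this example, not taken from the spec.

```python
def select_for_resend(group, reported_received, resend_all=True):
    """Sender side: pick what to retransmit after the Receiver's report.

    group             -- sequence numbers of every message in the group
    reported_received -- sequence numbers the Receiver says it already has
    resend_all        -- True: resend the whole group; False: only the gaps
    """
    if resend_all:
        return list(group)
    return [seq for seq in group if seq not in reported_received]


def receive(already_received, incoming):
    """Receiver side: keep new messages, silently drop duplicates.

    Returns the sequence numbers that were actually new.
    """
    newly_accepted = []
    for seq in incoming:
        if seq in already_received:
            continue  # duplicate, thrown away
        already_received.add(seq)
        newly_accepted.append(seq)
    return newly_accepted


group = [1, 2, 3, 4]
outcomes = {}
for resend_all in (True, False):
    receiver_state = {1, 2, 4}  # message 3 was lost
    resent = select_for_resend(group, receiver_state, resend_all=resend_all)
    accepted = receive(receiver_state, resent)
    outcomes[resend_all] = (resent, accepted, sorted(receiver_state))
```

The trade-off is bandwidth against Sender simplicity: resending everything costs extra transfers but needs no per-message bookkeeping on the sending side.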

> >
> > 7) Section 1.7 "Detection of Repeated Messages by the Receiver" (lines
> > 187+) outlines a "suggested" implementation.  For clarity's sake, we should
> > probably separate normative from non-normative material.  That way it
> > becomes easy to identify compliance requirements.  Might want to do this
> > across all specifications.  The W3C does this.
>
>Agreed. We discussed this today. I think that explicit identification
>of what is non-normative (such as a suggested implementation approach
>to help clarify) is a good idea.
>
> >
> > 8) What if the transport errors don't go away?  The spec, as is, indicates
> > that the protocol handler would go into an infinite loop.  Likewise, what
> > if the recovery sequence never succeeds?  We might want the spec to assert
> > that the receiver may engage in multiple recovery retries.  Likewise, what
> > if the receipts are never received?  Should there be a maximum retry count
> > that applies to all kinds of errors?  Should the protocol state that such a
>
>Yes, the max retries has been discussed. This should be included in the
>TPA. I think that it may already do so, if not, I'll add it to the
>list of required elements for the TPA and send to Marty.
>
>
> > maximum must exist, but not state what that maximum should be?  I assume
> > this information would be available in the TPA, and no self-respecting TPA
> > would be without a maximum retry count, but I think the protocol really
> > must constrain the TPA.  I'd think the protocol should at least state the
> > kinds of errors to which retry counts should be applied.
> >
> > 9) Regarding section 2.8 "Garbage Collection":  First, I'd like to suggest
> > renaming the section to something else, because garbage collection has a
> > very definitive meaning in distributed communications -- the recycling of
> > remotely referenced objects -- and this is not the meaning intended here.
>
>hmmmm, yeah, I guess so.... How about 'Persistence Handling'?

Or, just delete the section since it is neither normative nor an issue for 
interoperability... That's what I did!


> > Second, why isn't the expiration of a counter sufficient for its removal?
> > Regardless of what else is going on, if it has expired, it must be trashed.
> >  I don't understand why the other conditions must also be satisfied.  Maybe
> > I'm just not understanding something.  Third, why can't a counter be
> > removed after the window has closed?  Is this because the receiver can't
> > know that the sender will get the ack in time or even get the ack at all?
> >
> > 10) I realize that this is an "open issue," but I'm really concerned about
> > time synchronization.  This is also one of my concerns about BizTalk.  The
> > BizTalk spec claims that time synchronization is not an issue because of
> > the latencies involved in net access.  Well, it is possible to engineer
> > those latencies down to nearly whatever is required, so I don't buy the
> > BizTalk answer.  Time synchronization is a really really tough issue.
>
>I thoroughly concur! David and I have discussed this as well and
>I think that we are in agreement that there needs to be something
>which addresses time synchronization. It simply cannot be ignored IMHO.
>
> >
> > 11) BizTalk has the notion of a "processing deadline," which is the time by
> > which the recipient application must complete processing the message.  The
> > message must carry its processing deadline.  BizTalk also provides a
> > "delivery deadline," which appears to be the same as the ebXML "Message
> > Expiration Timestamp."  Basically, the sender heeds the delivery deadline,
> > taking action on failure to meet the deadline, while the receiver ignores
> > it.  The receiver heeds the processing deadline, refusing to ack or forward
> > messages that fail to meet the deadline, while the sender ignores it.  I
> > think the idea is that the only real deadline is an application-level
> > deadline, and the intervening middleware and even the applications should
> > be given freedom to apply heuristics in order to try to meet the deadline.
> > Does this approach have any merit?
>
>For purposes of RM, you need some notion of a timeout for receipt of
>an ack so that you can begin retry/recovery processing.
>
> >
> > 12) Are acks to be sent for messages that arrive at the recipient after the
> > message expires?  The spec says that such messages are dropped, suggesting
> > that no ack should be sent, but we should make an explicit statement.
> > Otherwise some implementations may ack and others won't.  It seems
> > problematic to provide an ack for a message that gets dropped.
>
>We discussed this, but I don't recall a resolution. Jim, if this isn't
>in the issues list (and hasn't been addressed yet) would you please
>add it? Thanks!

We deleted the concept of message expiration timestamp for the purposes of 
Reliable Messaging. It could be reinstated for general messaging, of 
course. Starting from the sending of the initial message of a group, the 
sequence is deterministically bound by either (a) the Receiver sending a 
MSH-level ACK back to the Sender after the final message, or (b) the Sender 
detecting [TBD] number of timeouts when trying to send the final message.
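
The two ways the sequence can end can be sketched as a bounded loop. This is an illustration only: the timeout count is still [TBD], so `max_timeouts` below is a placeholder parameter, and `send_and_wait_for_ack` is a hypothetical callable standing in for "send the final message and wait one timeout period for the MSH-level ACK".

```python
def send_final_message(send_and_wait_for_ack, max_timeouts):
    """Try to get the final message of a group acknowledged.

    send_and_wait_for_ack -- hypothetical callable; returns True if the
                             MSH-level ACK arrived before the timeout
    max_timeouts          -- placeholder for the still-undecided limit

    Returns "acked" for outcome (a) or "failed" for outcome (b).
    """
    timeouts = 0
    while timeouts < max_timeouts:
        if send_and_wait_for_ack():
            return "acked"   # (a) Receiver acknowledged the group
        timeouts += 1        # no ACK within the timeout period; retry
    return "failed"          # (b) limit reached, so the Sender gives up


def scripted(outcomes):
    """Build a fake transport that replays a fixed list of ACK results."""
    remaining = iter(outcomes)
    calls = []

    def attempt():
        calls.append(1)
        return next(remaining)

    return attempt, calls


attempt_ok, calls_ok = scripted([False, False, True])
result_ok = send_final_message(attempt_ok, max_timeouts=3)

attempt_bad, calls_bad = scripted([False] * 5)
result_bad = send_final_message(attempt_bad, max_timeouts=3)
```

Either way the loop terminates, which is the point: the Sender never waits indefinitely on the final message.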

> > 13) When a message is retransmitted, must the entire packaged retransmitted
> > message be identical to the originally transmitted message?  Some
>
>Yes!
>
> > implementations may be inclined to insert a new timestamp somewhere, or
> > perhaps reorder elements, or maybe there is some custom header that the
> > protocol is using that it is inclined to update.  Should there be an
> > explicit requirement that a retransmitted message be identical to the
> > original?  BizTalk goes out of its way to assert this requirement.
>
>It should be made explicitly clear and unambiguous, IMHO.

Asserted in the revised spec. The only change would be in a recovery 
sequence where the Sender is retransmitting one missing message, and that 
message was not the final message of a group of messages. In this case, the 
Sender's MSH needs to set the RM-GroupCount to the number of messages in 
the group so that the Receiver knows that it is the last message in the 
group and to check again for completeness.
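
Here is a small Python sketch of that one exception. The details are assumptions for the sake of the example, not spec text: I model RM-GroupCount as an optional field that is absent (`None`) on non-final messages, and I number the messages of a group 1..N so that completeness is easy to test.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GroupMessage:
    sequence_number: int                  # assumed to run 1..N within a group
    payload: str
    rm_group_count: Optional[int] = None  # assumed absent on non-final messages


def prepare_retransmission(original, group_size):
    """Copy the original unchanged except for RM-GroupCount."""
    return replace(original, rm_group_count=group_size)


def receive(received, msg):
    """Record msg, ignoring duplicates.

    Returns None when msg carries no group count; otherwise returns
    True or False depending on whether the whole group has now arrived.
    """
    received.setdefault(msg.sequence_number, msg.payload)
    if msg.rm_group_count is None:
        return None
    expected = set(range(1, msg.rm_group_count + 1))
    return expected <= set(received)


group = [
    GroupMessage(1, "a"),
    GroupMessage(2, "b"),
    GroupMessage(3, "c", rm_group_count=3),  # final message of the group
]

received = {}
first_pass = [receive(received, m) for m in (group[0], group[2])]  # message 2 lost

retry = prepare_retransmission(group[1], group_size=len(group))
second_pass = receive(received, retry)
```

Because the retransmitted message now carries the count, the Receiver re-runs its completeness check without waiting for another copy of the final message.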


> >
> > 14) We should consider identifying all the kinds of information that must
> > be passed to an application by a TR&P engine that fronts an application.
> > I'll call this the "TR&P processor."  Section 2.5 "Error Detection" line
> > 169 seems to be the only place that provides such information.  It says
> > that when a message is lost the loss must be reported to the sending
> > application.  What if the recipient receives an invalid message?  What
> > about when retry counts are exceeded, say by the sender sending normal
> > messages or the receiver failing to get retransmits after attempting error
> > recoveries?
>
>I think that this deserves some discussion. However, it isn't clear
>to me that this is in our charter and scope. Firstly, it suggests
>an implementation. Secondly, in a synchronous scenario, I can see
>error reporting as being handled at the API level. It isn't clear
>to me that there's any common handling of this in an asynchronous
>case which would be the case for most b2b exchanges.
>
>If a message is undeliverable, what is the application to do with
>this? How will it be reported in a way that ALL applications can
>do something meaningful? I think that this is something that you
>guys (EAI vendors) would be providing as adaptors to the
>ebXML Messaging Service for the wide variety of applications
>that exist.
>
> >
> > What should I tackle next?  I'm not even sure where to find the specs.
> > I've been pulling these out of this discussion list.  I need one that
> > covers synchronous vs. asynchronous processing, along with one that
> > describes error messages.
>
>TPA. Look in the mail list archives if you haven't already got
>a copy. 1.0.6 would be the latest draft. It is definitely TR&P related
>but is being handled as a separate WG and deliverable. It is
>key that we (TR&P) all understand TPA and its relationship to
>TR&P.
>
>
> >
> > Hope you're having fun at the F2F.  Sorry I can't be there.  Too much fish
> > to fry in too short amount of time.
>
>Maybe you can make it to our interim F2F in Dallas w/o 9/25?
>
> >
> > - Joe
>--
>     _/_/_/_/ _/    _/ _/    _/ Christopher Ferris - Enterprise Architect
>    _/       _/    _/ _/_/  _/  Phone: 781-442-3063 or x23063
>   _/_/_/_/ _/    _/ _/ _/ _/   Email: chris.ferris@East.Sun.COM
>        _/ _/    _/ _/  _/_/    Sun Microsystems,  Mailstop: UBUR03-313
>_/_/_/_/  _/_/_/  _/    _/     1 Network Drive Burlington, MA 01803-0903
References:
- resend: Re: Feedback on Reliable Messaging v0-06
  - From: Christopher Ferris <chris.ferris@east.sun.com>