ebxml-transport message

Subject: Feedback on Reliable Messaging v0-06
From: Joe Lapp <jlapp@webMethods.com>
To: ebxml-transport@lists.ebxml.org
Date: Mon, 07 Aug 2000 18:55:36 -0400
Wow, I have to say that this document seems to be the most organized and
most readable of the bunch.  And the diagrams are terrific!  I've so far
been withholding my comment that we should have a tech writer pretty things
up and reorg things before declaring anything public and final.  So
consider it said for the Packaging and Header specs (sorry guys -- it is
great material, though).  :O

Here are my comments and questions.  The only overlap I could find with
David's feedback was on the issue of retry counts.

1) Why does the Reliability spec say that it "changes" the "existing ebXML
specifications"?  Is implementing the Reliability spec optional?  If it is
optional to support the AtMostOnce property found in the header, the Header
spec should probably say so.  (See Reliability lines 62 and 86, wording
such as "new element" on lines 87 and 88, and all of section 3, "Changes to
Existing ebXML Specifications.")

2) Why does ebXML support both a unique message identity and a recovery
number?  I’m mostly just curious, but part of me is a little concerned.  If
the identifier and the recovery number ever get out of sync (an error, I
suppose), and the communicating endpoints are not both synchronizing on the
same value, havoc could ensue.  Also, although I think the definition is
complete as is, it might help to clarify that the identity doesn’t change
on retransmissions (that is, to say that the identity is associated with
the message, not with the message transmission).
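
Here's a minimal Python sketch of the distinction I mean, assuming a receiver that detects duplicates by message identity alone.  The names (Transmission, DuplicateFilter, the "msg-N" id format) are hypothetical and don't come from the spec.

```python
import itertools
from dataclasses import dataclass

_ids = itertools.count(1)

@dataclass(frozen=True)
class Transmission:
    message_id: str  # identity of the message, fixed when the message is created
    attempt: int     # which transmission of that message this is

def new_message_id() -> str:
    return f"msg-{next(_ids)}"

def transmissions(message_id: str, attempts: int) -> list:
    """Every retransmission reuses the same message_id; only attempt changes."""
    return [Transmission(message_id, n) for n in range(1, attempts + 1)]

class DuplicateFilter:
    """Receiver side: deliver each message once, however many copies arrive."""

    def __init__(self) -> None:
        self._seen = set()

    def accept(self, t: Transmission) -> bool:
        if t.message_id in self._seen:
            return False  # a retransmission of a message already delivered
        self._seen.add(t.message_id)
        return True
```

If the identity were tied to the transmission instead, the filter above would deliver every retransmission as a new message.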

3) The ACK request must ride in a normal request.  What happens if too few
messages are sent in a window to get an ACK within the time the application
requires?  That is, what if the application cannot move on until it knows
that the messages were delivered?  Do we need a way to request an ACK
without it having to ride in a normal request?

4) Why define a maximum message size?  I’m sure there’s a good reason.  More
working implementations, perhaps, since few people know how to accommodate
arbitrarily sized messages?  (See the note on line 153 in 2.4, "Message
Transfer Sequence.")


5) The wording of section 2.5, "Error Detection" (line 163), assumes that
there is a timeout period for waiting for an ack.  The spec probably ought
to state that there shall be a timeout period so that it is a normative
requirement of the protocol.  And what is the timeout period for a recovery
message?  Is this left to the TPA to specify?  In BizTalk the timeout
period is given by the delivery deadline, but BizTalk has a separate time
for message expiration.
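
Stated normatively, the rule would simply be that no wait for an ack is unbounded.  A minimal Python sketch, assuming the timeout value is supplied by configuration such as the TPA; AckWaiter and its methods are hypothetical names.

```python
import threading

class AckWaiter:
    """Tracks outstanding acks; every wait is bounded by a timeout."""

    def __init__(self) -> None:
        self._events = {}
        self._lock = threading.Lock()

    def expect(self, message_id: str) -> None:
        with self._lock:
            self._events[message_id] = threading.Event()

    def ack_received(self, message_id: str) -> None:
        with self._lock:
            event = self._events.get(message_id)
        if event is not None:
            event.set()

    def wait_for_ack(self, message_id: str, timeout_s: float) -> bool:
        """True if the ack arrived within timeout_s; False means an error to handle."""
        with self._lock:
            event = self._events[message_id]
        return event.wait(timeout_s)  # Event.wait returns False on timeout
```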

6) Why are all messages in the window resent on a transport error?  Won’t a
transport error apply to just one message?  (See line 175 in section 2.6
"Window Recovery Sequence.")

7) Section 2.7, "Detection of Repeated Messages by the Receiver" (lines
187+), outlines a "suggested" implementation.  For clarity's sake, we
should probably separate normative from non-normative material.  That way
it becomes easy to identify compliance requirements.  We might want to do
this across all the specifications; the W3C does.

8) What if the transport errors don’t go away?  As written, the spec implies
that the protocol handler would go into an infinite loop.  Likewise, what
if the recovery sequence never succeeds?  We might want the spec to state
that the receiver may engage in multiple recovery retries.  And what if
the receipts are never received?  Should there be a maximum retry count
that applies to all kinds of errors?  Should the protocol state that such a
maximum must exist, without stating what that maximum should be?  I assume
this information would be available in the TPA, and no self-respecting TPA
would be without a maximum retry count, but I think the protocol really
must constrain the TPA.  At a minimum, the protocol should state the
kinds of errors to which retry counts apply.
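
A minimal sketch of the kind of constraint I mean, assuming the TPA supplies max_retries and that each attempt reports which kind of error occurred.  The error kinds are just the three cases above, and all names are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional

class ErrorKind(Enum):
    # The failure cases discussed above; the spec might enumerate others.
    TRANSPORT_ERROR = "transport error"
    ACK_TIMEOUT = "no ack received"
    RECOVERY_FAILED = "recovery sequence failed"

class RetriesExhausted(Exception):
    """Raised when the TPA-supplied retry limit is exceeded."""

    def __init__(self, kind: ErrorKind, attempts: int) -> None:
        super().__init__(f"{kind.value}: gave up after {attempts} attempts")
        self.kind = kind
        self.attempts = attempts

def send_with_retries(attempt: Callable[[], Optional[ErrorKind]], max_retries: int) -> int:
    """Call attempt() until it returns None (success) or the retries run out.

    Returns the number of attempts used; raises RetriesExhausted otherwise,
    so the handler can never loop forever.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    last: Optional[ErrorKind] = None
    for n in range(1, max_retries + 2):  # the first try plus max_retries retries
        last = attempt()
        if last is None:
            return n
    raise RetriesExhausted(last, max_retries + 1)
```

The point is only that the loop has an exit the protocol guarantees; the actual limit stays in the TPA.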

9) Regarding section 2.8, "Garbage Collection":  First, I’d like to suggest
renaming the section, because garbage collection has a very specific
meaning in distributed communications (the recycling of remotely referenced
objects), and that is not the meaning intended here.  Second, why isn’t the
expiration of a counter sufficient for its removal?  Regardless of what else
is going on, if it has expired, it must be trashed.  I don’t understand why
the other conditions must also be satisfied.  Maybe I’m just missing
something.  Third, why can’t a counter be removed after the window has
closed?  Is this because the receiver can’t know that the sender will get
the ack in time, or even get the ack at all?
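
The rule I’m proposing is that expiration alone triggers removal.  A minimal sketch, assuming each counter carries an absolute expiration time; RecoveryCounters and its methods are hypothetical names.

```python
import time
from typing import Optional

class RecoveryCounters:
    """Receiver-side counters, each with its own expiration time."""

    def __init__(self) -> None:
        self._expires_at = {}

    def add(self, counter_id: str, expires_at: float) -> None:
        self._expires_at[counter_id] = expires_at

    def purge_expired(self, now: Optional[float] = None) -> list:
        """Remove every expired counter; no other condition is checked."""
        if now is None:
            now = time.time()
        expired = [cid for cid, t in self._expires_at.items() if t <= now]
        for cid in expired:
            del self._expires_at[cid]
        return expired

    def __contains__(self, counter_id: str) -> bool:
        return counter_id in self._expires_at
```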


10) I realize that this is an "open issue," but I’m really concerned about
time synchronization.  This is also one of my concerns about BizTalk.  The
BizTalk spec claims that time synchronization is not an issue because of
the latencies involved in net access.  Well, it is possible to engineer
those latencies down to nearly whatever is required, so I don’t buy the
BizTalk answer.  Time synchronization is a really, really tough issue.

11) BizTalk has the notion of a "processing deadline," which is the time by
which the recipient application must complete processing the message.  The
message must carry its processing deadline.  BizTalk also provides a
"delivery deadline," which appears to be the same as the ebXML "Message
Expiration Timestamp."  Basically, the sender heeds the delivery deadline,
taking action on failure to meet the deadline, while the receiver ignores
it.  The receiver heeds the processing deadline, refusing to ack or forward
messages that fail to meet the deadline, while the sender ignores it.  I
think the idea is that the only real deadline is an application-level
deadline, and the intervening middleware and even the applications should
be given freedom to apply heuristics in order to try to meet the deadline.
Does this approach have any merit?
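
As I read it, the BizTalk split amounts to two independent checks.  A minimal sketch under that reading; the field and function names are hypothetical and are not BizTalk's actual API.

```python
from dataclasses import dataclass

@dataclass
class Message:
    message_id: str
    delivery_deadline: float    # heeded by the sender only
    processing_deadline: float  # heeded by the receiver only

def sender_should_give_up(msg: Message, now: float) -> bool:
    # The sender ignores the processing deadline.
    return now > msg.delivery_deadline

def receiver_should_accept(msg: Message, now: float) -> bool:
    # The receiver ignores the delivery deadline; when this returns False,
    # it neither acks nor forwards the message.
    return now <= msg.processing_deadline
```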

12) Should acks be sent for messages that arrive at the recipient after
they have expired?  The spec says that such messages are dropped, suggesting
that no ack should be sent, but we should make an explicit statement.
Otherwise some implementations may ack and others won’t.  It seems
problematic to provide an ack for a message that gets dropped.

13) When a message is retransmitted, must the entire packaged retransmitted
message be identical to the originally transmitted message?  Some
implementations may be inclined to insert a new timestamp somewhere, or
perhaps reorder elements, or update some custom header they use.  Should
there be an explicit requirement that a retransmitted message be identical
to the original?  BizTalk goes out of its way to assert this requirement.
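
One simple way an implementation could satisfy such a requirement is to serialize the packaged message once and resend those exact bytes.  A minimal sketch, assuming the packaged message is available as bytes; the names are hypothetical.

```python
class OutboundMessage:
    """Holds one packaged message, serialized exactly once."""

    def __init__(self, wire_bytes: bytes) -> None:
        # Take an immutable copy so later code cannot alter what gets resent.
        self._wire = bytes(wire_bytes)

    def payload_for_transmission(self) -> bytes:
        # Every call, first send or retry, returns the same bytes: no fresh
        # timestamps, no element reordering, no header rewrites.
        return self._wire

def is_identical_retransmission(original: bytes, retransmitted: bytes) -> bool:
    """The byte-for-byte comparison a strict conformance test would make."""
    return original == retransmitted
```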

14) We should consider identifying all the kinds of information that must
be passed to an application by a TR&P engine that fronts the application.
I’ll call this the "TR&P processor."  Section 2.5, "Error Detection," line
169, seems to be the only place that provides such information.  It says
that when a message is lost, the loss must be reported to the sending
application.  What if the recipient receives an invalid message?  What
about when retry counts are exceeded, say, by the sender when sending
normal messages, or by the receiver when it fails to get retransmissions
after attempting error recoveries?
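
To make the list concrete, here is a minimal sketch of the interface I have in mind: the TR&P processor hands each event to an application callback.  Only the lost-message case comes from the spec; the other events are the cases I’m asking about, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable

class TrpEvent(Enum):
    MESSAGE_LOST = auto()               # required by section 2.5 (line 169)
    INVALID_MESSAGE_RECEIVED = auto()   # open question: not in the spec
    SEND_RETRIES_EXCEEDED = auto()      # open question: not in the spec
    RECOVERY_RETRIES_EXCEEDED = auto()  # open question: not in the spec

@dataclass
class Notification:
    event: TrpEvent
    message_id: str
    detail: str = ""

class TrpProcessor:
    """Fronts an application and reports protocol events to it."""

    def __init__(self, notify_application: Callable[[Notification], None]) -> None:
        self._notify = notify_application

    def report(self, event: TrpEvent, message_id: str, detail: str = "") -> None:
        self._notify(Notification(event, message_id, detail))
```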

What should I tackle next?  I'm not even sure where to find the specs.
I've been pulling these out of this discussion list.  I need one that
covers synchronous vs. asynchronous processing, along with one that
describes error messages.

Hope you're having fun at the F2F.  Sorry I can't be there.  Too many fish
to fry in too little time.

- Joe