ebxml-transport message

Subject: Re: Comments on Reliable Messaging Specification, Aug. 11, 2000
From: mwsachs@us.ibm.com
To: Jim Hughes <jfh@fs.fujitsu.com>
Date: Wed, 16 Aug 2000 17:21:09 -0400
Jim,

I am generally pleased with your responses to my comments.  I do have a few
rejoinders, embedded in the following extracts from your posting.

Regards,
Marty

*************************************************************************************

IBM T. J. Watson Research Center
P. O. B. 704
Yorktown Hts, NY 10598
914-784-7287;  IBM tie line 863-7287
Notes address:  Martin W Sachs/Watson/IBM
Internet address:  mwsachs @ us.ibm.com
*************************************************************************************



Jim Hughes <jfh@fs.fujitsu.com> on 08/16/2000 01:02:36 AM

To:   Martin W Sachs/Watson/IBM@IBMUS
cc:   ebxml-transport@lists.ebxml.org
Subject:  Re: Comments on Reliable Messaging Specification, Aug. 11, 2000



Marty,

Inserted below are my comments on your email, especially how I resolved
them in the latest version of the RM spec. Thanks for the comments...

Jim


>  NOTE WELL:  because each item in the window is a complete
>application-level message, any implementation limit on the window size
>sets a limit on the maximum application-level message size, which may be
>unacceptable.  We must be very careful about imposing message size limits
>on the application.  The application design may prevent splitting one
>message into smaller messages; hence window size limits could prevent
>support of some applications.  Reliable transport protocols deal with this
>issue by segmenting the messages underneath the application and windowing
>the segments.  Think about IP underneath TCP and the sliding window
>protocols in HDLC and the LLC layer of the LANs.

Again, we are not covering logical message splitting in this RM spec.

MWS:  I agree with not covering logical message splitting.  My concern is
that implementation limits on the message size or total storage capacity of
the RM-group may arise, in which case the spec will have to at least
provide guidance.  The suggestion that maximum message size may have to be
specified in the TPA troubles me because, as I said, that becomes a matter
of what applications cannot be supported.  I do not have a good solution
other than using transparent segmentation of messages such that the total
message size remains an application matter.  Perhaps an editor's note
warning of a possible message size issue might be appropriate in order to
get people thinking.
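
A minimal sketch of the transparent segmentation idea, in Python. It is only an illustration: the names `segment` and `reassemble`, the tuple layout, and the `max_segment` parameter are hypothetical and do not come from the RM spec. The point is that the size limit applies to segments underneath the application, so the total message size stays an application matter.

```python
from typing import Dict, List, Tuple

# Hypothetical segment layout: (message_id, index, total, payload)
Segment = Tuple[str, int, int, bytes]


def segment(message_id: str, body: bytes, max_segment: int) -> List[Segment]:
    """Split one application message into segments of at most max_segment bytes."""
    if max_segment <= 0:
        raise ValueError("max_segment must be positive")
    # An empty message still produces one (empty) segment.
    chunks = [body[i:i + max_segment]
              for i in range(0, len(body), max_segment)] or [b""]
    total = len(chunks)
    return [(message_id, i, total, chunk) for i, chunk in enumerate(chunks)]


def reassemble(segments: List[Segment]) -> bytes:
    """Rebuild the original message; segments may arrive in any order."""
    if not segments:
        raise ValueError("no segments")
    total = segments[0][2]
    by_index: Dict[int, bytes] = {idx: payload for _, idx, _, payload in segments}
    missing = [i for i in range(total) if i not in by_index]
    if missing:
        raise ValueError(f"missing segments: {missing}")
    return b"".join(by_index[i] for i in range(total))
```

The application hands over one message of any size; only the segments would be subject to an implementation limit.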

>
>Line 123, item 7:  Observation: The usual sliding window protocols are
>full duplex with regards to messages and ACKs, and there is a pause only on
>detection of a lost message.  The protocol specified in this document is
>not a sliding window at all;  it is more like a "jumping window"
>protocol - it is half duplex and there is a pause on every window.  That
>is a serious degradation of message latency and throughput compared to
>sliding window protocols.

Another reason why I changed the name to "RM-Group".

MWS:  If "sliding" is also gone, I am content except for the latency
question.
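
To make the latency question concrete, here is a rough back-of-the-envelope model in Python. It is an idealized sketch under stated assumptions, not a measurement: no messages are lost, ACK transmission time is ignored, and the sliding window is assumed large enough that the sender never stalls. The function names and parameters are hypothetical.

```python
import math


def jumping_window_time(n: int, group_size: int,
                        send_time: float, one_way_delay: float) -> float:
    """Half-duplex model: after every group the sender pauses one round trip."""
    groups = math.ceil(n / group_size)
    return n * send_time + groups * 2 * one_way_delay


def sliding_window_time(n: int, send_time: float, one_way_delay: float) -> float:
    """Full-duplex model: ACKs overlap with sending, so only the
    final round trip is exposed (assumes the window never fills)."""
    return n * send_time + 2 * one_way_delay
```

With 100 messages, a group size of 10, a send time of 1 unit, and a one-way delay of 50 units, the first model gives 1100 units and the second gives 200. The gap grows with the delay and shrinks as the group size grows.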


>Line 136, item 9:  Please replace "For only the last message..." by "To
>detect loss of the last message..."  The statement in the specification is
>an implementation statement.  For example, the sender could choose to set
>a deadline for each message and slide the deadline forward until the last
>message of the window.  This would enable early detection of "hard"
>failures.  My suggested change avoids stating a requirement that the
>timeout may only be set on the last message.

Change made at beginning of the sentence.

The reason for saying that a timeout is specified for *only* the last
message of an RM-Group is to avoid having timeouts for *all* messages in
the RM-Group. The Sender finds out that messages (other than the last) in
an RM-Group never arrived by getting an error message in response to the
last message. The Sender recovers from non-delivery of the last message by
using the timeout.

MWS:  My concern is about appearing to constrain the application.  If the
change eliminated the word "only", then I am satisfied.
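
The sliding deadline mentioned in my original comment could look roughly like the Python sketch below. It is one possible sender-side choice, not something the spec requires; the class name `GroupDeadline` and its methods are hypothetical, and time is passed in explicitly so the logic is easy to check.

```python
from typing import Optional


class GroupDeadline:
    """One deadline per RM-Group, pushed forward each time a message is sent."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.deadline: Optional[float] = None

    def on_send(self, now: float) -> None:
        # Every send slides the deadline forward; after the last message
        # of the group it simply stays where that send left it.
        self.deadline = now + self.timeout

    def on_group_ack(self) -> None:
        # The response to the last message arrived; nothing left to time.
        self.deadline = None

    def expired(self, now: float) -> bool:
        return self.deadline is not None and now >= self.deadline
```

If the sender cannot send the next message at all, the deadline fires before the group is complete, which is the early detection of "hard" failures. If everything flows, the only deadline that matters is the one left by the last message.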


>Line 137, item 9  ("information from the TPA"):  It is not obvious that a
>separate timeout is needed for reliable messaging.  The existing
>transport-level timeout as defined in tpaML section 2.6.4 may serve the
>purpose.  However, this point requires considerably more thought. As it
>stands, it is not clear to me that the complexity of the window timeout is
>worth the value added.  A much simpler solution for this 1-out-of-N case
>(loss of the last message) is to rely on the normal transport-level
>timeout (e.g. the time to the HTTP response).  Simply terminate the window.  The
>messaging service will simply time out at the transport level and re-send
>the message, starting a new window. This, however, leads to the following
>considerations:

One of the major rationales of this proposal is to make *no* assumptions on
the underlying transport (the "carrier pigeon model"). Thus, we don't
introduce the concept of a "normal transport-level timeout". If we lift
this assumption, then obviously other solutions are possible...

MWS:  But ACKs are a fact of life for at least some of the transports,
including HTTP. Some discussion of possible interactions between the
reliable messaging protocol and the underlying transport is needed. A
clarification is needed, for example, that if an RM-group size of 1 is
used, there will be both an RM ACK and the HTTP ACK.  A recommendation
is needed about whether the transport ACK should be suppressed when
reliable messaging is used, for protocols which permit suppressing the
transport ACK.

>In this protocol, there seem to be two possibilities regarding the
>timeout:
>
>    The normal per-message transport-level timeout is not used with
>    reliable
>    messaging - but this extends the time to retry a lost message to the
>    time to fill the window.

Yes, you are correct. However, the Sender MSH can minimize the number of
messages in an RM-Group if this is a problem (or even turn off RM functions
in the MSH layer if he wants to just use known transport layer functions
and not expect any kind of RM-layer ACK/error message from the receiving
MSH). I would expect that scenario if the transport is inherently reliable.

MWS:  Some discussion of this should be in the specification.  Perhaps an
Editor's note on the need to add this discussion in the future would be
advisable.

>    The per-message transport-level timeout is still used on top of the
>    reliable messaging protocol.  In this case, the reliable messaging
>    protocol must NEVER retransmit a message in the window if it was
>    successfully received since the upper level already knows that the
>    message was successfully received. (Perhaps discarding the duplicate
>    is sufficient; I am not certain of this.)
>

I haven't formed a firm opinion on the TPA and its use in ebXML
transactions, but I am troubled by its size and complexity. How do we
implement things such as "it is strongly recommended that the framework
implement an end-to-end acknowledgment" (Note, section 2.6.7.3)?
Especially, it seems to me that the TPA is present to describe the profiles
of two parties, and there is no TPA mandate that the parties SHALL
implement some kind of reliability function or other protocol... that's the
function of other documents.

MWS:  "Strongly recommended" is an informative (non-normative) statement.
It may indeed be that a lot of the text in the tpaML proposal really
belongs in other documents.  Given the scope of ebXML, a document which
provides guidance to implementers of the messaging service would be a very
valuable document. I view the RosettaNet Implementation Framework document
as an example of such a document. I felt it important to capture all these
points in my proposal until I could eventually determine where they belong.
As to TPA size and complexity, our experience in IBM Research was that all
these elements are needed for B2B between large enterprises.  The TP team
will need to determine how to structure the specification to not be
forbidding to SMEs (e.g. by making virtually all elements optional in the
XML sense).  In addition, part of the complexity problem can be addressed
by a tpa-aware authoring tool which guides the tpa writer.  My research
team prototyped such a tool.

MWS:  Incidentally, Reliable Messaging probably makes it unnecessary to
implement the tpaML "strongly recommended" end to end ACK with SMTP.  SMTP
is one case where reliable messaging is a clear win.

If both MSHs operate on a "persist and ACK" each message, as you describe,
then you just need to define if the ACK is a transport-ACK or an RM-ACK. In
the latter case, we would use RM functions and set the RM-Group size to 1.
Does this make sense?

MWS:  The discussion in tpaML relates to a supposed implementation that
does not have the reliable messaging function that we are defining.  Your
proposal sounds good.  Some explanatory words would be useful.

>
>Line 173, editor note 12:  As discussed earlier, the window count should
>not be visible to the parties.  It must be established and managed by the
>message service handlers.

This is not entirely true. The From-Party (see Figure 1) may have valid
reasons to tell the Sending MSH that a group of messages must be sent
reliably, and it would have nothing to do with the characteristics of the
underlying transport. Quite possibly the From-Party is interested to know
only when the group of messages was reliably sent. We need to define the
interface to the From-Party to lock this down.

MWS:  I agree with "send this message reliably".  I would prefer that the
applications not have to deal with the RM-Group count which, as noted
above, I view as a function of the characteristics of the underlying
transport and perhaps implementation factors.  I view "send reliably" as
something to indicate for each message via the as-yet-undefined BP to TRP
service interface.  The first message without "send reliably" would
terminate the final RM-group without error.  That way, the number of
messages to send reliably is not dependent on knowing the RM-group size.
Saying "send the next N messages reliably" is a problem since after saying
"send the next N reliably", the application could take different paths with
different numbers of messages.  Another alternative would be to turn on
reliable messaging once and keep sending until an explicit turning off of
reliable messaging.  However there is still the need to deal with a short
final RM-group.
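
The per-message "send reliably" indication can be sketched as follows in Python. This is an illustration under assumptions: the BP to TRP service interface is not yet defined, so the function name `plan_groups`, the `(payload, reliable)` pairs, and the "rm-group" and "plain" labels are all hypothetical. Closing a short group when the input ends is also an assumption, since how to handle the short final RM-group is still open.

```python
from typing import Iterable, List, Tuple


def plan_groups(messages: Iterable[Tuple[str, bool]],
                group_size: int) -> List[Tuple[str, List[str]]]:
    """Group consecutive 'send reliably' messages into RM-groups.

    The application never sees group_size; it only flags each message.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    plan: List[Tuple[str, List[str]]] = []
    current: List[str] = []
    for payload, reliable in messages:
        if reliable:
            current.append(payload)
            if len(current) == group_size:
                plan.append(("rm-group", current))
                current = []
        else:
            # First message without "send reliably" terminates the
            # (possibly short) RM-group without error.
            if current:
                plan.append(("rm-group", current))
                current = []
            plan.append(("plain", [payload]))
    if current:  # assumption: end of input also closes a short group
        plan.append(("rm-group", current))
    return plan
```

With a group size of 2, the flagged messages a, b, c followed by an unflagged d yield the groups [a, b] and [c], then d sent plainly. The application never had to say how many messages would follow.
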

>2.3.3  Routing Header
>
>Line 179, Editor Note 13:  If it is intended that the messages in a single
>window can be from various TPAs and various conversations, then the
>message service instance must be identified.  Be careful, however, because the
>latency created by such a window affects all TPAs and conversations,
>especially when retries are performed.  If there is a separate message
>service instance for each conversation, then the window can be smaller and
>retries in one window need not delay other conversations.  In this case,
>the conversation ID is sufficient to identify the message service
>instance.

I'm not sure what you are proposing here. RM doesn't know about
conversations and other items identified in the Header.

MWS:  I agree that RM as currently specified doesn't know about
conversations, which is why I am concerned that the RM-group latency
affects all conversations.  It appears, from the current spec, that once
reliable messaging is turned on by one application, it is applied to all
concurrently running conversations over that transport channel in all
applications, whether they want it or not.
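
The alternative from my original comment, one RM-group per conversation, might be sketched like this in Python. It assumes the MSH can see a conversation ID, which RM as currently specified does not; the class name `ConversationGroups` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional


class ConversationGroups:
    """Keep a separate RM-group buffer for each conversation ID."""

    def __init__(self, group_size: int) -> None:
        if group_size <= 0:
            raise ValueError("group_size must be positive")
        self.group_size = group_size
        self._buffers: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, conversation_id: str, message: str) -> Optional[List[str]]:
        """Buffer a message; return the full group once it is complete."""
        buf = self._buffers[conversation_id]
        buf.append(message)
        if len(buf) == self.group_size:
            del self._buffers[conversation_id]
            return buf
        return None

    def pending(self, conversation_id: str) -> List[str]:
        """Messages still waiting in this conversation's open group."""
        return list(self._buffers.get(conversation_id, []))
```

A group that fills or retries in one conversation leaves the buffers of the others untouched, so the latency of one RM-group does not spread to every conversation on the channel.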


>Regards,
>Marty
>*************************************************************************************

>
>IBM T. J. Watson Research Center
>P. O. B. 704
>Yorktown Hts, NY 10598
>914-784-7287;  IBM tie line 863-7287
>Notes address:  Martin W Sachs/Watson/IBM
>Internet address:  mwsachs @ us.ibm.com
>*************************************************************************************