ebxml-transport message

Subject: Re: Comments on Reliable Messaging Specification, Aug. 11, 2000
From: Jim Hughes <jfh@fs.fujitsu.com>
To: mwsachs@us.ibm.com
Date: Wed, 16 Aug 2000 14:54:57 -0700

Marty -

Comments below.

Jim

At 05:21 PM 8/16/00 -0400, mwsachs@us.ibm.com wrote:

> >  NOTE WELL:  because each item in the window is a complete
> >application-level message, any implementation limit on the window size sets
> >a limit on the maximum application-level message size, which may be
> >unacceptable.  We must be very careful about imposing message size limits
> >on the application.  The application design may prevent splitting one
> >message into smaller messages; hence window size limits could prevent
> >support of some applications.  Reliable transport protocols deal with this
> >issue by segmenting the messages underneath the application and windowing
> >the segments.  Think about IP underneath TCP and the sliding window
> >protocols in HDLC and the LLC layer of the LANs.
>
>Again, we are not covering logical message splitting in this RM spec.
>
>MWS:  I agree with not covering logical message splitting.  My concern is
>that implementation limits on the message size or total storage capacity of
>the RM-group may arise, in which case the spec will have to at least
>provide guidance.  The suggestion that maximum message size may have to be
>specified in the TPA troubles me because, as I said, that becomes a matter
>of what applications cannot be supported.  I do not have a good solution
>other than using transparent segmentation of messages such that the total
>message size remains an application matter.  Perhaps an editor's note
>warning of a possible message size issue might be appropriate in order to
>get people thinking.

Let's see first if we have a need to discuss "maximum message size" in the 
context of the draft Messaging Service Spec, before we look at 
modifications of the RM Spec. Presuming that a Receiver (Messaging Service 
Handler) will pass upwards any received message, the only RM implementation 
constraint is to have enough local storage to store the message Sequence 
Numbers (not the message contents) during processing. I think the Editor's 
Note you suggest is appropriate for the basic Messaging Service Spec...
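
To illustrate the storage point above, here is a minimal sketch of a Receiver that passes every message upward on arrival and keeps only Sequence Numbers. The class `SequenceTracker`, its methods, and the idea that the Receiver is told which Sequence Numbers belong to the group are assumptions made for illustration; none of them come from the RM Spec.

```python
class SequenceTracker:
    """Receiver-side bookkeeping for one RM-Group (hypothetical helper)."""

    def __init__(self, expected_seqs):
        # Assumption: the Receiver somehow learns which Sequence Numbers
        # belong to the group; the thread does not say how.
        self.expected = set(expected_seqs)
        self.seen = set()

    def on_message(self, seq, payload, pass_upward):
        pass_upward(payload)  # contents go straight up; nothing is buffered
        self.seen.add(seq)    # only the Sequence Number is retained

    def missing(self):
        """Sequence Numbers that were expected but have not arrived."""
        return sorted(self.expected - self.seen)


delivered = []
tracker = SequenceTracker([101, 102, 103])
tracker.on_message(101, "order", delivered.append)
tracker.on_message(103, "invoice", delivered.append)
```

In this sketch the memory held by the tracker grows with the number of messages in the group, not with their size.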

> >Line 136, item 9:  Please replace "For only the last message..." by "To
> >detect loss of the last message..."  The statement in the specification is
> >an implementation statement.  For example, the sender could choose to set a
> >deadline for each message and slide the deadline forward until the last
> >message of the window.  This would enable early detection of "hard"
> >failures.  My suggested change avoids stating a requirement that the
> >timeout may only be set on the last message.
>
>Change made at beginning of the sentence.
>
>The reason for saying that a timeout is specified for *only* the last
>message of an RM-Group is to avoid having timeouts for *all* messages in
>the RM-Group. The Sender finds out that messages (other than the last) in
>an RM-Group never arrived by getting an error message in response to the
>last message. The Sender recovers from non-delivery of the last message by
>using the timeout.
>
>MWS:  My concern is about appearing to constrain the application.  If the
>change eliminated the word "only", then I am satisfied.

If you're only interested in removing the "only", then we may not be in 
agreement. The proposed algorithm has the Sender setting a timeout for 
*only* the last message of the RM-Group. If the application wants to 
time out each message, then it should request an RM-Group size of 
exactly 1. Does this resolve your question?
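
The rule just described (a timeout on only the last message of each RM-Group, with a group size of 1 giving a timeout per message) can be sketched as follows. The function names, the use of plain integers as Sequence Numbers, and the simplification that every deadline is computed from the same `now` are assumptions for illustration, not part of the proposal.

```python
def rm_groups(seqs, group_size):
    """Split Sequence Numbers into consecutive RM-Groups of at most group_size."""
    return [seqs[i:i + group_size] for i in range(0, len(seqs), group_size)]


def timeout_deadlines(seqs, group_size, timeout_s, now=0.0):
    """Map Sequence Number -> deadline, for the last message of each group only.

    Simplification (assumption): all deadlines are measured from the same
    'now' rather than from each message's actual send time.
    """
    return {group[-1]: now + timeout_s for group in rm_groups(seqs, group_size)}


# One RM-Group of four messages: only message 4 carries a timeout.
one_group = timeout_deadlines([1, 2, 3, 4], group_size=4, timeout_s=30.0)

# RM-Group size of exactly 1: every message carries a timeout.
per_message = timeout_deadlines([1, 2, 3, 4], group_size=1, timeout_s=30.0)
```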

> >Line 137, item 9  ("information from the TPA"):  It is not obvious that a
> >separate timeout is needed for reliable messaging.  The existing
> >transport-level timeout as defined in tpaML section 2.6.4 may serve the
> >purpose.  However, this point requires considerably more thought. As it
> >stands, it is not clear to me that the complexity of the window timeout is
> >worth the value added.  A much simpler solution for this 1-out-of-N case
> >(loss of the last message) is to rely on the normal transport-level timeout
> >(e.g. the time to the HTTP response).  Simply terminate the window.  The
> >messaging service will simply time out at the transport level and re-send
> >the message, starting a new window. This, however, leads to the following
> >considerations:
>
>One of the major rationales of this proposal is to make *no* assumptions on
>the underlying transport (the "carrier pigeon model"). Thus, we don't
>introduce the concept of a "normal transport-level timeout". If we lift
>this assumption, then obviously other solutions are possible...
>
>MWS:  But ACKs are a fact of life for at least some of the transports,
>including HTTP. Some discussion of possible interactions between the
>reliable messaging protocol and the underlying transport is needed. A
>clarification is needed, for example, as to whether, if an RM-group size of
>1 is used, there will be both an RM ACK and the HTTP ACK.  A recommendation
>is needed about whether the transport ACK should be suppressed when
>reliable messaging is used, for protocols which permit suppressing the
>transport ACK.

Again, a fundamental principle is transport agnosticism. Maybe we could 
make an exception that in cases where the RM-Group size is 1 message, and 
the Sending/Receiving MSHs can determine that the underlying transport 
provides reliable acknowledgment of each message, then the MSHs would be 
permitted to substitute that ACK sequence for the prescribed one at the 
Messaging Service Layer...

> >Line 173, editor note 12:  As discussed earlier, the window count should
> >not be visible to the parties.  It must be established and managed by the
> >message service handlers.
>
>This is not entirely true. The From-Party (see Figure 1) may have valid
>reasons to tell the Sending MSH that a group of messages must be sent
>reliably, and it would have nothing to do with the characteristics of the
>underlying transport. Quite possibly the From-Party is interested to know
>only when the group of messages was reliably sent. We need to define the
>interface to the From-Party to lock this down.
>
>MWS:  I agree with "send this message reliably".  I would prefer that the
>applications not have to deal with the RM-Group count which, as noted
>above, I view as a function of the characteristics of the underlying
>transport and perhaps implementation factors.  I view "send reliably" as
>something to indicate for each message via the as-yet-undefined BP to TRP
>service interface.  The first message without "send reliably" would
>terminate the final RM-group without error.

Initially, we had proposed something similar: once an RM-Group started, 
only reliable messages were sent until the end of the RM-Group, and the 
Sequence Numbers assigned to these messages were monotonically increasing 
integers. It made it easier for the Receiver to quickly check for errors.

But, there was pushback on this in our meetings, so in the current spec it 
is possible to intermix non-reliable with reliable (the non-reliable 
messages are just passed through and don't affect the RM algorithm), and 
Sequence Numbers just need to be unique with respect to the 
Sender-Receiver-Transport triple.
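
A minimal sketch of the current behavior as described here: non-reliable messages pass through untouched, and reliable ones get a Sequence Number that is unique within the Sender-Receiver-Transport triple. The class `SequenceAllocator`, the `(seq, payload)` tuple shape, and the example link names are hypothetical. A per-triple counter is only one way to get uniqueness; the spec as described requires unique numbers, not increasing ones.

```python
from itertools import count


class SequenceAllocator:
    """Hands out Sequence Numbers per (sender, receiver, transport) triple."""

    def __init__(self):
        self._counters = {}

    def tag(self, sender, receiver, transport, payload, reliable):
        if not reliable:
            # Non-reliable traffic is passed through and does not
            # take part in the RM algorithm.
            return (None, payload)
        key = (sender, receiver, transport)
        counter = self._counters.setdefault(key, count(1))
        return (next(counter), payload)


alloc = SequenceAllocator()
link = ("mshA", "mshB", "http")  # hypothetical triple
stream = [
    alloc.tag(*link, "purchase-order", True),
    alloc.tag(*link, "status-ping", False),
    alloc.tag(*link, "invoice", True),
]
other_link = alloc.tag("mshA", "mshB", "smtp", "purchase-order", True)
```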

>That way, the number of
>messages to send reliably is not dependent on knowing the RM-group size.
>Saying "send the next N messages reliably" is a problem since after saying
>"send the next N reliably", the application could take different paths with
>different numbers of messages.  Another alternative would be to turn on
>reliable messaging once and keep sending until an explicit turning off of
>reliable messaging.  However there is still the need to deal with a short
>final RM-group.
>
> >2.3.3  Routing Header
> >
> >Line 179, Editor Note 13:  If it is intended that the messages in a single
> >window can be from various TPAs and various conversations, then the message
> >service instance must be identified.  Be careful, however, because the
> >latency created by such a window affects all TPAs and conversations,
> >especially when retries are performed.  If there is a separate message
> >service instance for each conversation, then the window can be smaller and
> >retries in one window need not delay other conversations.  In this case,
> >the conversation ID is sufficient to identify the message service instance.
>
>I'm not sure what you are proposing here. RM doesn't know about
>conversations and other items identified in the Header.
>
>MWS:  I agree that RM as currently specified doesn't know about
>conversations, which is why I am concerned that the RM-group latency
>affects all conversations.  It appears, from the current spec, that once
>reliable messaging is turned on by one application, it is applied to all
>concurrently running conversations over that transport channel in all
>applications, whether they want it or not.

Yes, it applies to all traffic between the Sender's MSH and the Receiver's 
MSH on that particular transport link. RM enhances the *transport* of 
messages between the two MSHs. If two upper-level "From-Party"s want to 
send reliable messages on the same Sender/Receiver/Transport link, the 
proposal would support it - the Sender would multiplex the messages onto 
the link and the Receiver would know how to deliver the (reliably received) 
messages at the other end.
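
The multiplexing described above might look like the sketch below. The round-robin interleaving, the party label carried with each message, and the function names are all assumptions for illustration; the proposal does not say how the Receiver tells the parties apart.

```python
from collections import defaultdict


def multiplex(outboxes):
    """Interleave per-party outboxes onto one link as (seq, party, payload).

    One Sequence Number space is shared by all parties on the link.
    """
    link = []
    seq = 0
    queues = {party: list(msgs) for party, msgs in outboxes.items()}
    while any(queues.values()):
        for party, msgs in queues.items():
            if msgs:
                seq += 1
                link.append((seq, party, msgs.pop(0)))
    return link


def demultiplex(link):
    """Hand each received message back to the party it belongs to."""
    inboxes = defaultdict(list)
    for _seq, party, payload in link:
        inboxes[party].append(payload)
    return dict(inboxes)


outboxes = {"partyA": ["a1", "a2"], "partyB": ["b1"]}
link = multiplex(outboxes)
inboxes = demultiplex(link)
```

Because the Sequence Numbers are shared, a retry for one party's message holds up the whole RM-Group on that link, which is the latency concern raised in the quoted comment.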

I do agree that we need to get working on the BP-MSH interface, so we know 
what options are available to the BP.

We also took a conscious decision to keep it simple in this first spin, so 
I wouldn't be surprised if we needed to improve the design at the BP level 
as well as deal with the upcoming "multi-node" issues!

Jim