Subject: Comments on Reliable Messaging Specification, Aug. 11, 2000
1.1 Purpose and Scope

Line 55: This paragraph should state whether all implementers shall provide reliable messaging or whether it is optional. This statement will be an important clarification if the reliable messaging spec is moved into the ebXML messaging spec.

2.1 Base Concepts

Line 105, Editor Note 5: The reliable messaging protocol details should be transparent to the sending and receiving parties. Therefore, the service interface should not be concerned with the window width. The service interface might provide an abstract quality of service parameter, but the details of window size etc. should be determined by the message service handler based on the requested quality of service and the details of the selected low-level transport. The low-level transport details are invisible to the two parties other than the information which must be stated in the TPA. If there are any details which must be agreed to between the message service handlers, these might be stated in the TPA. However, since they don't directly concern the parties, it would be preferable to exchange initialization messages between message service handlers in order to reach the agreement.

Line 112, Editor Note 7: If there is an implementation limit to the window size, this has to be agreed to by the two message service handlers and perhaps by the two parties. The agreement is stated in the TPA, if it has to be visible to the parties, or it is arrived at by means of an exchange of initialization messages between the message service handlers when they first make contact with each other.

There are three interacting variables related to the window size:
(1) maximum buffer size for the window;
(2) desired number of messages in the window;
(3) maximum message size which the application requires.

Note that (3) could be very large. (1) is probably an implementation limit that the parties need not know, but the message service handlers must set it to the smaller of their two capabilities.
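
To make the interaction of the three variables concrete, here is a minimal sketch. It is only an illustration, not anything taken from the specification: the function names are hypothetical, and it assumes the buffer limit is measured in bytes and that every message in a window may be of the maximum size.

```python
def negotiate_window(buffer_a: int, buffer_b: int, max_message_size: int) -> dict:
    """Derive window limits from the two handlers' buffer capabilities.

    Assumption (not from the specification): sizes are in bytes and every
    message in the window may be as large as max_message_size.
    """
    if max_message_size <= 0:
        raise ValueError("max_message_size must be positive")
    # Variable (1): the handlers settle on the smaller of their two capabilities.
    buffer_limit = min(buffer_a, buffer_b)
    # Given (1) and (3), this bounds variable (2), the number of messages.
    max_messages = buffer_limit // max_message_size
    return {"buffer_limit": buffer_limit, "max_messages": max_messages}


def max_message_size_for(buffer_limit: int, desired_messages: int) -> int:
    """Given (1) and (2), return the bound this places on variable (3)."""
    if desired_messages <= 0:
        raise ValueError("desired_messages must be positive")
    return buffer_limit // desired_messages


limits = negotiate_window(1_000_000, 400_000, 100_000)
size_bound = max_message_size_for(limits["buffer_limit"], 8)
```

With these illustrative numbers, asking for 8 messages per window would cap each application message at one eighth of the agreed buffer, which is the kind of restriction on the application that concerns me.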
Given limit (1), the window negotiation can be based on (2) or (3), each of which sets a limit on the other.

NOTE WELL: because each item in the window is a complete application-level message, any implementation limit on the window size sets a limit on the maximum application-level message size, which may be unacceptable. We must be very careful about imposing message size limits on the application. The application design may prevent splitting one message into smaller messages; hence window size limits could prevent support of some applications.

Reliable transport protocols deal with this issue by segmenting the messages underneath the application and windowing the segments. Think about IP underneath TCP and the sliding window protocols in HDLC and the LLC layer of the LANs. If we really need a windowing protocol in the message service handler, the windowing protocol should segment the messages in order to avoid restricting the application message sizes. The segmentation could be accomplished by enveloping the message header inside the routing header and adding to the routing information whatever identifiers and other information are needed for the windowing protocol. The windowing is done with these segments rather than with complete application-level messages.

If I am right in the foregoing, then we may have reached the point where reliable messaging is adding complexity which may not be needed, given that most transport protocols are inherently reliable, being built on TCP. The major exception for us is SMTP. See section 2.6.7.3 of the IBM tpaML proposal for a discussion of SMTP and a suggested means of layering an end to end ACK on top of SMTP to achieve end to end at-most-once delivery. Please also note the suggestion in section 2.6.6 that a received message be hardened before returning the transport-level ACK. This appears to be sufficient to assure guaranteed delivery and failure recovery even with HTTP.
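
The "harden, then ACK" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the tpaML design: the class and function names are hypothetical, a plain dictionary stands in for a durable store, the lost ACK is simulated, and the duplicate check on the message ID is my own addition.

```python
class Receiver:
    """Receiver that hardens each message before acknowledging it."""

    def __init__(self):
        self.store = {}      # stands in for a durable store (assumption)
        self.delivered = []  # message IDs passed up to the application

    def on_message(self, msg_id, payload):
        if msg_id not in self.store:
            self.store[msg_id] = payload   # harden first
            self.delivered.append(msg_id)  # then pass upward exactly once
        # The ACK is returned only after the message has been hardened.
        # A retried duplicate is acknowledged again but not re-delivered.
        return "ACK"


def send_with_retry(receiver, msg_id, payload, ack_lost_on=(), max_attempts=3):
    """Send until an ACK arrives; return the number of attempts used.

    ack_lost_on lists the attempt numbers whose ACK is (hypothetically)
    lost, which the sender experiences as a transport-level timeout.
    """
    for attempt in range(1, max_attempts + 1):
        ack = receiver.on_message(msg_id, payload)
        if attempt in ack_lost_on:
            continue  # simulated timeout: the ACK never reached the sender
        if ack == "ACK":
            return attempt
    raise TimeoutError(f"no ACK for {msg_id} after {max_attempts} attempts")


receiver = Receiver()
attempts = send_with_retry(receiver, "m1", b"order", ack_lost_on={1})
```

In this sketch the first ACK is lost, the sender retries, and the receiver still holds and delivers the message only once.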
We should look at what reliability gaps currently exist at the message service level and see if we can deal with them in a much simpler way. The discussions in the tpaML proposal (section 2.6) may provide guidance.

Line 123, item 7: Observation: The usual sliding window protocols are full duplex with regard to messages and ACKs, and there is a pause only on detection of a lost message. The protocol specified in this document is not a sliding window at all; it is more like a "jumping window" protocol - it is half duplex and there is a pause on every window. That is a serious degradation of message latency and throughput compared to sliding window protocols.

Line 134, Editor Note 9: The persistent store used for reliable messaging is (conceptually) independent of the long-lived persistent stores needed at the message service level for managing conversation state and long-term logging. An implementation may choose to use the same store for both purposes or use separate stores. If an implementation uses the same store, then statements in this specification about discarding messages from the persistent store must not be normative. An implementation which uses the same store for both purposes may need to mark messages as "windowing processing complete" but it cannot actually erase the messages.

Line 136, item 9: Please replace "For only the last message..." by "To detect loss of the last message..." The statement in the specification is an implementation statement. For example, the sender could choose to set a deadline for each message and slide the deadline forward until the last message of the window. This would enable early detection of "hard" failures. My suggested change avoids stating a requirement that the timeout may only be set on the last message.

Line 137, item 9 ("information from the TPA"): It is not obvious that a separate timeout is needed for reliable messaging. The existing transport-level timeout as defined in tpaML section 2.6.4 may serve the purpose.
However, this point requires considerably more thought. As it stands, it is not clear to me that the complexity of the window timeout is worth the value added. A much simpler solution for this 1-out-of-N case (loss of the last message) is to rely on the normal transport-level timeout (e.g. the time to the HTTP response). Simply terminate the window. The messaging service will then time out at the transport level and re-send the message, starting a new window.

This, however, leads to the following considerations. In this protocol, there seem to be two possibilities regarding the timeout:

- The normal per-message transport-level timeout is not used with reliable messaging, but this extends the time to retry a lost message to the time to fill the window.
- The per-message transport-level timeout is still used on top of the reliable messaging protocol. In this case, the reliable messaging protocol must NEVER retransmit a message in the window if it was successfully received, since the upper level already knows that the message was successfully received. (Perhaps discarding the duplicate is sufficient; I am not certain of this.)

It is essential that we understand what additional reliability is provided by this protocol over the much simpler one described in the tpaML proposal: persist each message and then ACK it. Note that with the exception of SMTP, the transport-level ACKs are present whether or not reliable messaging is used, so for transports which have their own ACKs, reliable messaging seems only to delay the retry of a missing message until the end of the window. In addition to increasing latency, the retry causes the retried message to be out of order, which may cause trouble higher up in the system. Aside from retries, the protocol in this specification increases latency by preventing a message from being passed upward in the receiving system until the window is filled.
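
The latency cost of holding messages until the window fills can be illustrated with a small calculation. This is a simplified sketch: the function name is hypothetical, arrival times are made-up numbers in arbitrary units, and a partly filled final window is assumed to be released when its last message arrives (timeouts and retries are ignored).

```python
def added_latency(arrival_times, window_size):
    """Extra wait per message when delivery upward is held until the window fills.

    Each message is compared against immediate per-message delivery, where
    it would be passed upward as soon as it arrives.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    delays = []
    for start in range(0, len(arrival_times), window_size):
        window = arrival_times[start:start + window_size]
        release = window[-1]  # the window is released on its last arrival
        delays.extend(release - t for t in window)
    return delays


delays = added_latency([0, 1, 2, 3, 4, 5], window_size=3)
```

For six messages arriving one time unit apart and a window of three, the added delays are 2, 1, 0, 2, 1, 0: every message except the last one in each window waits.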
This protocol may have some value for SMTP but, as mentioned earlier, the tpaML proposal suggests a much simpler means of adding reliability to SMTP.

2.2 Features

Line 161 (High Performance): As mentioned earlier, the protocol in this specification is not a sliding window. It is a batching protocol which increases the latency for all messages except the last one in each window. See the above discussion.

2.3 Message Envelope Elements

Line 167, title: Shouldn't this be "Message Header Elements"?

2.3.2 Message Header - Reliable Messaging Info Element

Line 173, Editor Note 12: As discussed earlier, the window count should not be visible to the parties. It must be established and managed by the message service handlers.

2.3.3 Routing Header

Line 179, Editor Note 13: If it is intended that the messages in a single window can be from various TPAs and various conversations, then the message service instance must be identified. Be careful, however, because the latency created by such a window affects all TPAs and conversations, especially when retries are performed. If there is a separate message service instance for each conversation, then the window can be smaller and retries in one window need not delay other conversations. In this case, the conversation ID is sufficient to identify the message service instance.

2.4 Message Transfer Sequence

Line 212, Editor Note 14: So far, the only payload in the message is the application payload. The error message should be expressed using elements in the routing header.

Line 213, Item 5: It should be made clear that the persistent store described in this specification is logically distinct from any persistent storage used to store message state and logging information.

2.8 Garbage Collection

Line 254 and following: Non-normative implementation text is useful when it helps to explain the protocol. I believe that this section just describes a storage management algorithm.
The basic rule that should be described is that messages MAY be eliminated from the conceptual persistent store after they are acknowledged. It should be made clear that the store used for reliable messaging is logically distinct from the higher-level long-term persistent store, but there is nothing preventing an implementation from using one store for both purposes.

5. References

Lines 296 and 298: Please replace these two references by a reference to the combined specification.

Regards,
Marty

*************************************************************************************
IBM T. J. Watson Research Center
P. O. B. 704
Yorktown Hts, NY 10598
914-784-7287; IBM tie line 863-7287
Notes address: Martin W Sachs/Watson/IBM
Internet address: mwsachs @ us.ibm.com
*************************************************************************************