Subject: Feedback on Reliable Messaging v0-06
Wow, I have to say that this document seems to be the most organized and most readable of the bunch. And the diagrams are terrific! I've so far been withholding my comment that we should have a tech writer pretty things up and reorganize things before declaring anything public and final. So consider it said for the Packaging and Header specs (sorry guys -- it is great material, though). :O

Here are my comments and questions. The only overlap I could find with David's feedback was on the issue of retry counts.

1) Why does the Reliability spec say that it "changes" the "existing ebXML specifications"? Is implementing the reliability spec optional? If it is optional to support the AtMostOnce property found in the header, the Header spec should probably say this. (See Reliability lines 62 and 86, verbiage such as "new element" on lines 87 and 88, and all of section 3 "Changes to Existing ebXML Specifications.")

2) Why does ebXML support both a message unique identity and a recovery number? I’m mostly just curious, but part of me is a little concerned. If the identifier and the recovery number ever get out of sync (an error, I suppose), and if communicating endpoints are not both synchronizing on the same value, havoc could be loosed. Also, although I think the definition is complete as is, it might be helpful to clarify that the identity doesn’t change on retransmissions (that is, to say that the identity is associated with the message, not with the message transmission).

3) The ACK request must ride in a normal request. What happens if too few messages are sent in a window to get an ACK in the time required by the application? That is, the application must move on with full knowledge that the messages were delivered. Do we need a way to request an ACK without requiring it to ride in a normal request?

4) Why define a maximum message size? I’m sure there’s a good reason. More working implementations, perhaps, since few people know how to accommodate arbitrarily sized messages? (See note on line 153 in 2.4 "Message Transfer Sequence.")

5) The verbiage of section 2.5 "Error Detection" (line 163) assumes that there is a timeout period for waiting for an ack. The spec probably ought to assert that there shall be a timeout period so that it is a normative requirement of the protocol. And what is the timeout period for a recovery message? Is this left to the TPA to specify? In BizTalk the timeout period is given by the delivery deadline, but BizTalk has a separate time for message expiration. (See line 163 of 2.5 "Error Detection.")

6) Why are all messages in the window resent on a transport error? Won’t a transport error apply to just one message? (See line 175 in section 2.6 "Window Recovery Sequence.")

7) Section 1.7 "Detection of Repeated Messages by the Receiver" (lines 187+) outlines a "suggested" implementation. For clarity's sake, we should probably separate normative from non-normative material. That way it becomes easy to identify compliance requirements. We might want to do this across all specifications. The W3C does this.

8) What if the transport errors don’t go away? The spec, as is, indicates that the protocol handler would go into an infinite loop. Likewise, what if the recovery sequence never succeeds? We might want the spec to assert that the receiver may engage in multiple recovery retries. Likewise, what if the receipts are never received? Should there be a maximum retry count that applies to all kinds of errors? Should the protocol state that such a maximum must exist, but not state what that maximum should be? I assume this information would be available in the TPA, and no self-respecting TPA would be without a maximum retry count, but I think the protocol really must constrain the TPA. I’d think the protocol should at least state the kinds of errors to which retry counts should be applied.
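
To make items 5 and 8 concrete, here is a minimal sketch of a sender that applies both an ack timeout and a single retry budget shared by transport errors and missing acks. It is only an illustration of the behavior being asked about, not anything the draft defines: `RetryPolicy`, `send_reliably`, `transmit`, `wait_for_ack`, and `DeliveryFailure` are all hypothetical names, and the values would presumably come from the TPA.

```python
from dataclasses import dataclass
from typing import Callable


class DeliveryFailure(Exception):
    """Raised when the retry budget runs out without an acknowledgment."""


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical fields; the draft does not name or fix these values.
    ack_timeout_seconds: float
    max_retries: int


def send_reliably(
    message_id: str,
    payload: bytes,
    transmit: Callable[[str, bytes], None],
    wait_for_ack: Callable[[str, float], bool],
    policy: RetryPolicy,
) -> int:
    """Send one message, retrying until acked or the budget is spent.

    Returns the number of transmission attempts made. The same
    message_id and payload are reused on every attempt, so a
    retransmission is identical to the original.
    """
    attempts = 0
    while attempts <= policy.max_retries:
        attempts += 1
        try:
            transmit(message_id, payload)
        except OSError:
            # A transport error counts against the same budget as a
            # missing ack, so the loop cannot run forever.
            continue
        if wait_for_ack(message_id, policy.ack_timeout_seconds):
            return attempts
    # Budget exhausted: surface the loss to the sending application.
    raise DeliveryFailure(
        f"message {message_id} not acknowledged after {attempts} attempts"
    )
```

The point of the sketch is that the loop terminates no matter which kind of error persists, and that the application hears about the failure. Whether the protocol itself should mandate that bound, or leave it entirely to the TPA, is the open question.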
9) Regarding section 2.8 "Garbage Collection": First, I’d like to suggest renaming the section to something else, because garbage collection has a very definite meaning in distributed communications -- the recycling of remotely referenced objects -- and this is not the meaning intended here. Second, why isn’t the expiration of a counter sufficient for its removal? Regardless of what else is going on, if it has expired, it must be trashed. I don’t understand why the other conditions must also be satisfied. Maybe I’m just not understanding something. Third, why can’t a counter be removed after the window has closed? Is this because the receiver can’t know that the sender will get the ack in time or even get the ack at all?

10) I realize that this is an "open issue," but I’m really concerned about time synchronization. This is also one of my concerns about BizTalk. The BizTalk spec claims that time synchronization is not an issue because of the latencies involved in net access. Well, it is possible to engineer those latencies down to nearly whatever is required, so I don’t buy the BizTalk answer. Time synchronization is a really, really tough issue.

11) BizTalk has the notion of a "processing deadline," which is the time by which the recipient application must complete processing the message. The message must carry its processing deadline. BizTalk also provides a "delivery deadline," which appears to be the same as the ebXML "Message Expiration Timestamp." Basically, the sender heeds the delivery deadline, taking action on failure to meet the deadline, while the receiver ignores it. The receiver heeds the processing deadline, refusing to ack or forward messages that fail to meet the deadline, while the sender ignores it. I think the idea is that the only real deadline is an application-level deadline, and the intervening middleware and even the applications should be given freedom to apply heuristics in order to try to meet the deadline. Does this approach have any merit?

12) Are acks to be sent for messages that arrive at the recipient after the message expires? The spec says that such messages are dropped, suggesting that no ack should be sent, but we should make an explicit statement. Otherwise some implementations may ack and others won’t. It seems problematic to provide an ack for a message that gets dropped.

13) When a message is retransmitted, must the entire packaged retransmitted message be identical to the originally transmitted message? Some implementations may be inclined to insert a new timestamp somewhere, or perhaps reorder elements, or maybe there is some custom header that the protocol is using that it is inclined to update. Should there be an explicit requirement that a retransmitted message be identical to the original? BizTalk goes out of its way to assert this requirement.

14) We should consider identifying all the kinds of information that must be passed to an application by a TR&P engine that fronts an application. I’ll call this the "TR&P processor." Section 2.5 "Error Detection" line 169 seems to be the only place that provides such information. It says that when a message is lost the loss must be reported to the sending application. What if the recipient receives an invalid message? What about when retry counts are exceeded, say by the sender sending normal messages or the receiver failing to get retransmits after attempting error recoveries?

What should I tackle next? I'm not even sure where to find the specs. I've been pulling these out of this discussion list. I need one that covers synchronous vs. asynchronous processing, along with one that describes error messages.

Hope you're having fun at the F2F. Sorry I can't be there. Too many fish to fry in too short a time.

- Joe