I increased the packet size to 5000 bytes, and the test takes forever (meaning I ran out of patience!). However, I got no sequence errors in the first few tens of thousands of packets!
So I reduced the packet size to 1000, and the number of packets to 10,000.
Here is the picture:
The generated packets go through two separate sockets in sequence, and I am using non-blocking sends and receives, which means, I assume, that the sockets are doing the buffering for me. Joe has convinced me that the TCP layer looks after sequencing, and my tests seem to bear this out, so I probably don't need to add my own sequencing... but it probably doesn't hurt!
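To convince myself about the sequencing claim, here is a minimal loopback sanity check (my own sketch, not the actual test harness - class and method names are made up): it sends numbered packets through one TCP socket and verifies they arrive in the order they were sent.

```java
import java.io.*;
import java.net.*;

// Sketch: send numbered "packets" over a loopback TCP connection and
// confirm the TCP layer delivers them in order.
public class SequenceCheck {
    static final int PACKETS = 10_000; // matches the reduced test size

    public static boolean run() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort());
                     DataOutputStream out = new DataOutputStream(
                             new BufferedOutputStream(s.getOutputStream()))) {
                    for (int i = 0; i < PACKETS; i++) out.writeInt(i); // numbered packets
                    out.flush();
                } catch (IOException e) { throw new UncheckedIOException(e); }
            });
            sender.start();

            boolean inOrder = true;
            try (Socket s = server.accept();
                 DataInputStream in = new DataInputStream(
                         new BufferedInputStream(s.getInputStream()))) {
                for (int i = 0; i < PACKETS; i++) {
                    if (in.readInt() != i) { inOrder = false; break; } // sequence error?
                }
            }
            sender.join();
            return inOrder;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("in order: " + run());
    }
}
```

Of course a loopback connection is the friendliest possible case, but TCP guarantees in-order delivery within one connection regardless, which is why adding my own sequence numbers is belt-and-braces.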
Elapsed time: 16 secs.
Changing WriteToConsole to Discard: elapsed time 12 secs.
I then changed the socket components to blocking sends and receives, which only increased the elapsed time to 14 secs - not bad.
Just for fun, I tried sending (and waiting for) the acknowledgement every 20 packets, and the elapsed time is back down to 12 secs again - which I think is actually a fairly good simulation of a real bounded buffer! So I think I am non-blocking for 19 packets out of every 20, and blocking for 1 in 20. If I am right, this might give you a reasonable simulation of bounded buffers while still getting decent performance.
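The ack-every-20 idea can be sketched without any real sockets (this is my own illustration with invented names, not the test code): the sender fires packets freely, but every 20th packet it blocks until the receiver acknowledges the batch, which caps how far ahead the sender can run - exactly the bounded-buffer effect.

```java
import java.util.concurrent.*;

// Sketch of "acknowledge every 20 packets": the sender only blocks on
// every ACK_INTERVAL-th packet, so at most ACK_INTERVAL packets can be
// in flight unacknowledged - a poor man's bounded buffer.
public class AckEveryN {
    static final int ACK_INTERVAL = 20;

    public static int run(int totalPackets) throws InterruptedException {
        BlockingQueue<Integer> wire = new LinkedBlockingQueue<>();  // data channel
        SynchronousQueue<Boolean> acks = new SynchronousQueue<>();  // ack channel

        Thread receiver = new Thread(() -> {
            try {
                for (int i = 0; i < totalPackets; i++) {
                    wire.take();                        // consume one packet
                    if ((i + 1) % ACK_INTERVAL == 0)
                        acks.put(Boolean.TRUE);         // acknowledge the batch
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        receiver.start();

        int blockedWaits = 0;
        for (int i = 0; i < totalPackets; i++) {
            wire.put(i);                                // effectively non-blocking
            if ((i + 1) % ACK_INTERVAL == 0) {
                acks.take();                            // block once per 20 packets
                blockedWaits++;
            }
        }
        receiver.join();
        return blockedWaits;  // how many times the sender actually blocked
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocked " + run(10_000) + " times for 10,000 packets");
    }
}
```

For 10,000 packets the sender blocks only 500 times, which lines up with my observation that this runs at nearly the fully non-blocking speed.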
PS Similar results for C# - sorry to say it's faster than Java (with Discard instead of WriteToConsole)! 7 secs fully blocked; 5 secs unblocked, or blocking every 20 messages.
Does this make sense to you all? I really hope so :-)