Timing issue with datagrams

bigjosh · August 2, 2020, 3:12pm

This is the documented and expected behavior. I promise.

// Each datagram sent is received at most 1 time.

github.com

bigjosh/Move38-Arduino-Platform/blob/main/cores/blinklib/blinklib.h#LC72:~:text=Datagram processing


      
          
          /* --- Datagram processing */
          
          // A datagram is a set of 1-IR_DATAGRAM_MAX_LEN bytes that are atomically sent over the IR link
          // The datagram is sent immediately on a best efforts basis. If it is not received by the other side then
          // it is lost forever. Each datagram sent is received at most 1 time. Once you have processed a received datagram
          // then you must mark it as read before you can receive the next one on that face. 
          
          // Must be smaller than IR_RX_PACKET_SIZE
          
          #define IR_DATAGRAM_LEN 16
          
          // Returns the number of bytes waiting in the data buffer, or 0 if no packet ready.
          byte getDatagramLengthOnFace( uint8_t face );
          
          // Returns true if a packet is available in the buffer
          boolean isDatagramReadyOnFace( uint8_t face );
          
           // Returns a pointer to the actual received datagram data
           // This should really be a (void *) so it can be assigned to any pointer type,
           // but in C++ you can not cast a (void *) into something else so it doesn't really work there

This is also expected behavior. Let’s say that for your blinks on your desk with your battery levels, you would expect statistically that 1 in 1,000 datagrams will get dropped. If you are sending 100 datagrams per second (10ms between sends), I would expect to wait an average of 10 seconds between dropped datagrams. If you increase the delay to 100ms between datagrams, now I would expect to wait an average of 100 seconds between dropped datagrams.

The real effect is probably not so linear, but could explain what you are seeing qualitatively.

I would not consider this a “bug” - it is the explicitly documented behavior.

I can think of many possible physical and logical reasons that the chance of a datagram getting dropped would increase with the number of concurrent datagrams happening on other faces. I have spent many dozens of hours of my life staring at oscilloscope traces and logic analyzers looking at these cases!

But fundamentally according to Claude, there is no such thing as a ~~free lunch~~ reliable communication channel. The best we can ever do is pick how we’d like to trade-off speed, fidelity, latency, and complexity.

The logical communications channel between two touching blinks is a surprisingly noisy one. We are using LEDs meant only for transmitting also as receivers. There are no less than 6 air-to-polycarbonate interfaces that each extract their dB toll on every passing photon. There is a sub-$1 MCU running at a pitiful 8Mhz that is solely responsible for managing the constant concurrent bidirectional communications on all these LEDs while also blinking the 18 visible LEDs fast enough to look like there is a range of brightnesses, and managing the charge pump, and keeping track of the button states, and monitoring the battery voltage… and running the game!

Like almost all modern stacks, the blinks’ network layer explicitly does not guarantee delivery. That is left to higher level protocols because doing it at the network layer would add complexity, latency, and non-determinism. Higher level protocols are a better place to make these decisions to suit their use cases. The blinklib transport layer uses redundancy rather than acknowledgement to ensure delivery of the data because this is simple and has good latency and low jitter- which is a good fit for many games.

If you need an ACK based transport layer then the best place to do this is directly on top of the network layer. The game downloading mechanism works like this - it uses sequence number-based request scheme with timeouts to ensure that the blocks are delivered and delivered in order. Alternately you could also do a sliding window based system like TCP to deliver an in-order byte stream across the link. It all depends on what is the best fit for the problem you are trying to solve. Let’s figure out how to best get you the communication services you want rather than figuring out why the one that you’ve got is such a bad fit!