This is the documented and expected behavior. I promise.
// Each datagram sent is received at most 1 time.
This is also expected behavior. Let’s say that for the blinks on your desk, with your battery levels, you would statistically expect 1 in 1,000 datagrams to get dropped. If you are sending 100 datagrams per second (10 ms between sends), you would expect to wait an average of 10 seconds between dropped datagrams. If you increase the delay to 100 ms between datagrams, you would now expect to wait an average of 100 seconds between dropped datagrams.
The real effect is probably not this linear, but it could explain what you are seeing qualitatively.
I would not consider this a “bug” - it is the explicitly documented behavior.
I can think of many possible physical and logical reasons why the chance of a datagram getting dropped would increase with the amount of concurrent datagram traffic on the other faces. I have spent many dozens of hours of my life staring at oscilloscope traces and logic analyzers looking at these cases!
But fundamentally, as Claude Shannon taught us, there is no such thing as a free lunch when it comes to reliable communication channels. The best we can ever do is pick how we’d like to trade off speed, fidelity, latency, and complexity.
The logical communications channel between two touching blinks is a surprisingly noisy one. We are using LEDs, meant only for transmitting, as receivers as well. There are no fewer than 6 air-to-polycarbonate interfaces that each extract their dB toll on every passing photon. There is a sub-$1 MCU running at a pitiful 8 MHz that is solely responsible for managing the constant concurrent bidirectional communications on all these LEDs while also blinking the 18 visible LEDs fast enough to look like there is a range of brightnesses, and managing the charge pump, and keeping track of the button states, and monitoring the battery voltage… and running the game!
Like almost all modern stacks, the blinks’ network layer explicitly does not guarantee delivery. That is left to higher-level protocols, because doing it at the network layer would add complexity, latency, and non-determinism. Higher-level protocols are a better place to make these decisions to suit their use cases. The blinklib transport layer uses redundancy rather than acknowledgement to ensure delivery of the data, because this is simple and has good latency and low jitter, which is a good fit for many games.
If you need an ACK-based transport layer, then the best place to build it is directly on top of the network layer. The game-downloading mechanism works like this: it uses a sequence-number-based request scheme with timeouts to ensure that the blocks are delivered, and delivered in order. Alternatively, you could do a sliding-window system like TCP’s to deliver an in-order byte stream across the link. It all depends on what is the best fit for the problem you are trying to solve. Let’s figure out how to best get you the communication services you want, rather than figuring out why the one you’ve got is such a bad fit!