sendDatagramOnFace could be improved

BGA · July 2, 2020, 3:55pm

So, I am playing with network flooding code to be able to both get state form all connected blinks and also synchronize data on them. The basic building block of all this is the ability to send datagrams using the sendDatagramOnFace() function.

This function badly violates the principle of least surprise as you might call it and it might not do what you expect, failing silently. More specifically, if you try to send 2 datagrams in sequence, it will most likely fail and only the second datagram will be delivered (as the second one will overwrite the first one).

Although it is documented on the comments for the function, it seems this did not have to be like that. Currently the function is implemented like this:

void sendDatagramOnFace( const void *data, byte len , byte face ) {

if ( len > IR_DATAGRAM_LEN ) {

    // Ignore request to send oversized packet

    return;

}

face_t *f = &faces[face];

f->outDatagramLen = len;
memcpy( f->outDatagramData , data , len );

}

But it could potentially be implemented like this:

bool sendDatagramOnFace( const void *data, byte len , byte face ) {

if ( len > IR_DATAGRAM_LEN ) {

    // Ignore request to send oversized packet

    return false;

}

face_t *f = &faces[face];

if ( f->outDatagramLen != 0) {

    return false;

}

f->outDatagramLen = len;
memcpy( f->outDatagramData , data , len ); 

return true;

}

I understand that storage space is at a premium, but this seem like a small enough change for a big benefit. Note that this is not even a breaking change as one can ignore the return value and the net effect would be what is there today.

Note that it could even potentially return a byte with an error code so one can differentiate between the packet too big failure and the pending datagram failure.

Is there a reason this change could not be implemented?

bigjosh · July 3, 2020, 7:37pm

Yea, the datagram stuff is admittedly awkward, but the above behaviors are by design. The (again, admittedly inelegant) thinking goes like this…

There are many reasons why a datagram might fail to reach the recipient - a full local buffer is only one of them. On this resource limited platform, better to encourage callers to generally assume a packet could be lost than to give back a specific “success” flag that really means “the packet was successfully copied into a local buffer- but that does not mean it will actually be transmitted, it does not mean that it will make it across the link, it does not mean that the recipient has a free receive buffer, and it does not mean that the recipient user code will read it.” Idiomatic datagram usage typically has some block of state and the most recent version of that state get shared repeatedly without trying to sequence the updates.
Passing in an oversize buffer is a coding failure not a runtime failure. Ideally we’d have a debug compile option that includes parameter checks on all calls and drops into a debugger whenever an assertion fails… but just not possible on this platform. Another strategy could be to instead pass in a fixed sized buffer (in, say, a struct) but this could end up wasting memory for common use cases. We could use some very fancy C++ template meta code to do compile time bounds checks, but fancy C++ stuff does not sit well on this platform and is also very unideomatic Arduino. So we split the baby and do the sanity check to prevent runtime corruption, but then fail silently when things do go wrong and trust the programmer to figure it out. Note that we take this tack repeatedly in this API.

All that said, if you are trying to serialize data or use a sequential program architecture, then this API will likely feel frustrating. While I have some other API models that I’d love to implement some day (like an Actor model where blinks pass guaranteed delivery messages around the mesh), for now this is what we have.

Can you share more about what you are using the datagrams for? We might be able to find an alternate approach that might get along better with this limited API.

Thanks!

bigjosh · July 3, 2020, 7:48pm

BTW, you mentioned network flooding. Here is an example of an idiomatic blinks program that counts the total nodes in the network by flooding…

Any similarities to what you are trying to do?

Note that the first time I wrote this, I used datagrams and was frustrated. But that frustration ended up leading me to what I think is a better overall solution to the problem (at least on this platform).

BGA · July 3, 2020, 10:11pm

Kinda. This is a subset of what I want to do, yes.

So, just to clarify: I am learning the platform and ready-made solutions are great but goes against what I am trying to do here. That being said, I really like what you did with the tools at hand.

So, what I want to do is a more general get information from nodes in the network. One practical example I have for a game that I am working on is that at some point in time you select a “piece” that will be moved to a different position. The “piece” can only move to an adjacent blink (easy to do), which will duplicate it, or it can move one further (then it just jumps there, no duplication). Note that in this case there might be a 1 blink hole between the current position and the one I want to move to. This hole can actually result in an arbitrary long path to be able to locate that spot (for the player, it is just one jump away.

So, counting blinks is one scenario that the code I a, trying to write should achieve. The other is to actually traverse the network finding all blinks that are a possible target from the current position and letting them know they are a possible target (so I also need some payload on the messages being sent). This way I can implement this simply as clicking the origin blink and the target one (if the one I click is not a target, it simply flashes red or something like that to indicate an invalid move).

Concerning the datagram API, I generally agree with what you said but I still think the local buffer full one not being possible to detect is a bit overkill. My main worry with this is not even the case of one blink generating multiple messages by itself, but a blink in the middle of the network somewhere simply routing messages (it could potentially receive 2 messages consecutively and process them one after the other resulting in one getting lost.

bigjosh · July 5, 2020, 1:32am

So it seems that the target location for a jump will always be on the perimeter of the current grouping, correct?

If so, would it be possible to maybe use an edge following algorithm to find valid targets? You would basically send out a probe from the source location that would follow the edge and keep track of the geometry counts as it moved from face to face - kind of like card counting. I have not fully thought this out, but seems possible to then detect any edge location that is 1 away from a location that is 1 away from the source. Make sense? It is also naturally self terminating since the edge is guaranteed to be a closed loop.

LMK if I am missing something or if I just totally misunderstood the problem!

bigjosh · July 5, 2020, 1:52am

Ah yes, believe me I understand your frustration. The programming model for this API sees the world as “state based” and so makes it very frustrating to try to do anything “sequenced based”. This is the first (and so far only publicly available) model because it happens to work well with many games (or maybe we are implicitly selecting which kind of games get built!).

I personally enjoy the challenge of writing code to this “state based” model - mostly because it is so different than all the other code I have written in my life. It sometimes forces me to find a deeper solution to the problem than I would have if I could have just written the code the “normal way”. That counting program above is a good example of this, and this is part of how Jonathan (move38 founder) roped me in on this project! He showed me some games he had written that had so little code… and I was hooked.*

All that said, there are some problems that are just sequential in nature and trying to shoehorn them state based will always be awkward. For example, the code that downloads a new game across the network is best expressed as a series of sequential packets and state changes. To do it state based would be like emulating UDP on top of TCP on top of UDP.

If you have a problem that really only makes sense as sequential code, you are not locked into the blinklib API. you can talk directly to the BlinkBIOS which naively speaks in atomic, sequential packets. The blinklib API runs on top of this lower-level API. It is defined in the BlinkBIOS.h file. So far no one but me as ever used this lower level API so it is not well documented, but making it easier to use is certainly on my to do list. Eventually I’d love to make a complete alternate “Actor model” sequential style API that ran at the same level as blinklib API but provided a completely different world view. instead of the current main event loop, user programs would provide asyc call back functions that would get called whenever anything happened (timer tick, button press, IR message recieved). LMK if this is a (bumpy) road you’d like to try going down!

Interestingly, the original blinks hardware itself was state-based and each blink continuously shared a single value with neighbors (yes - only a single value was shared on ALL faces at any moment!) which is why this was the programming model we ended up with today. The current blinks use a much more normal communications protocol based on a sequential series of bits, and now the state model is emulated on top of that by repeatedly sending the “state” value inside a stream of packets.

BGA · July 6, 2020, 3:44pm

Correct. If you think it as a circle, it is a circle with radius 2.

Well, that is exactly what I am doing. The issue here is just that there are some failure cases for sendDatagram that could be catch but are not so it might be that I never find this position.

So, assuming 2 perfectly aligned Blinks, is it guaranteed that a datagram will be sent eventually or might it never be sent? I am asking because the code suggests there is validation and some handshaking going on so the answer would be yes. But in practice I have been seeing cases of datagrams being lost.

I was thinking this could be solved with the concept of a datagram ack where sending a datagram (using the existing API would look something like this.

On the send side:

1 - Send a datagram and start a timer to wait for an ACK (what would be a reasonable value for this timer?).
2 - If an ack is received, datagram has been done so we do not need to worry about anything.
3 - Otherwise, if the face value got stale, give up
3 - if timer expires without ack, go to 1 (maybe do this 3 times or so).
4 - If we still did not send, give up.

On the receive side.

1 - Receive a datagram.
2 - If face value got stale. don’t do anything.
3 - Send ack (now, who acks the ack? ).

The problem with doing things like this is that it(mostly) eats storage space. At times like this I really wished a Blink had, maybe, 10k of storage instead of ~5k.

ScarPixel · July 7, 2020, 2:32am

Don’t know if it helps, but I implemented an alternate communications program for my project. It uses the “value on face” protocol instead of datagrams and has handshaking. It has proven to be very reliable in use. The only gotcha is ensuring you don’t overload the interface, but as long as you’re not sending multiple packets every frame it should hold up pretty well.

BGA · July 7, 2020, 5:20am

Hey @ScarPixel.

I actually saw your code already and it was a pretty neat idea. In fact, I tried to combine something like you did with the comparatively simpler approach of using datagrams to send data. I have some work in progress code here:

https://github.com/brunoga/blinks/tree/master/datagram

The code is currently not working and I am puzzled. Basically it seems that when I set the face value to DATAGRAM_RECEIVED, the other end never sees it and none of the debug I added shows up. I will keep diging but in case you have any suggestions, that would be great.

ScarPixel · July 7, 2020, 5:32pm

Ah so using value on face for the handshaking, but datagrams for the payload? That makes a lot of sense and would let you more easily send larger payloads.

I’ll take a look if I get a chance some time this week.

BGA · July 7, 2020, 11:58pm

Got it to work. The only thing missing is a timeout for sending the datagram. There are probably ways to make it smaller too (I am doing at least one check I am pretty sure is redundant). Let me know what you think.

BGA · July 8, 2020, 4:02am

I guess I finished it now. There is no space optimization I can see (extra eyes would help) and I did not even need to use timers (I am indirectly using the built-in timers in the face value API.

I also renamed the directory. This is the new one.

I will update the previous post to point to the right directory.

BGA · July 15, 2020, 5:47am

I updated the code and now it is a lot more useful as it does automatic retries. The main issue is that for the retries I need to keep datagrams cached per face and if you consider the maximum datagram size 16 bytes, that is 96 bytes wasted which is almost 1/3 of all the available memory for programs.

@bigjosh any chance something like this (not necessarily with this implementation, of course, I just worked with the tools I had but if it was done at the blinklib level it would be easier and it already caches datagrams so there would be no extra memory wasted.

Anyway, with this code as is, I improved the reliability of my game (which uses a lot of datagrams) a lot, The only thing not solved is that it does not correctly handle 2 blinks trying to send messages to each other at the same time due to the use of face values for handshake (that would also not be a problem for blinklib as it does not need to resort to this).

Anyway, hope this might be useful to other people. Code at the same repository above.

bigjosh · July 15, 2020, 4:04pm

Yes, the most recent datagram for each face is already cached by the blinklib code. If a new datagram comes in and you have not already called markDatagramRead() then the new datagram is discarded, otherwise it is copied into the buffer. This happens each time loop() returns so you do not have to worry about the buffer changing while inside loop().

So you could mark a datagram on a face as read but continue to look at the buffer and get the old datagram until a new one comes in. At most you would need to remember the length of the datagram in each face’s buffer (assuming the datagrams are variable length).

The relevant code is here…

github.com

bigjosh/Move38-Arduino-Platform/blob/main/cores/blinklib/blinklib.cpp#L543


      
                  
                  viralButtonPressSendOnFaceBitflags = IR_FACE_BITMASK;
          
                  // Prevent warm sleep
                  reset_warm_sleep_timer();
                              
              }    
                  
          }
          
          static void RX_IRFaces() {
          
              //  Use these pointers to step though the arrays
              face_t *face = faces;
              volatile ir_rx_state_t *ir_rx_state = blinkbios_irdata_block.ir_rx_states;
          
              for( uint8_t f=0; f < FACE_COUNT ; f++ ) {
          
                      // Check for anything new coming in...
          
                  if ( ir_rx_state->packetBufferReady ) {

LMK if any questions!

BGA · July 16, 2020, 3:30pm

Thanks for the heads up. I was able to infer this behavior form the code.

I am actually rewriting the code to not use face values at all and, instead, reserve a byte from the maximum datagram payload and using a brute force approach with a three-way handshake to guaranteed delivery:

On the send side:

1 - Send datagram and wait for an SENDACK.
2 - If the SENDACK did not arrive, go to 1 (note there is no timer involved. It does this at every loop iteration. Same for all steps below).
3 - If the SENDACK arrived, send ACK.
4 - Wait for all data coming as datagrams to settle (datagram length on face is 0).
5 - Mark datagram as sent.

On the receiving end:

1 - Try to receive a datagram.
2 - If one is available, send SENDACK.
3 - Wait for ACK. If it does not show up, go to 2 (again, no timer involved).
4 - If ACK was received, marke datagram as received.

There are lots of other things at play that I did not list here (collison control, handling the ack flood, etc) but that is the general idea.

With this in place, I can now guarantee that datagrams are sent by one blink and received by the other. Extending it to all connected blinks, I can now guarantee that a broadcast datagram will be received by all connected blinks (current code can even recover from a blink being disconnected and reconnected.

I assume this might have an impact on battery life (the flood of datagrams while data is being sent, but I guess it is worth the price. Also, the flood only happens when a datagram is in transit and stops completely when there is nothing to do.

BGA · July 16, 2020, 5:31pm

The code rewrite to use datagrams instead of face values for ACK is now live in the same repository. It is not fully working yet but it is mostly finding the corner cases and fixing them. I am confident this will eventually work as expected and provide guaranteed datagram delivery.

It is also using a considerable amount of space. Any suggestions on how to improve that are welcome. OTOH, memory usage is ok (I do not have to keep copies of datagrams anymore).

bigjosh · August 2, 2020, 2:08am

Make a fork of blinklib and hack out all of the shared values stuff. Hack out the code that runs around loop() to build the fiction of a “frozen” world inside of loop().

Now you can write your own normal, async driven code directly the way you want - react to an incoming packet with an outgoing packet immediately. And you will also not be lugging around all that extra code that is using up space and fighting against you.

I always meant to publish a library exactly like this, I just have not had time and there has not been much motivation since the current crop of games seem to fit well into the blinklib worldview (or maybe we are currently selecting for those types of games and suppressing others?). Additionally, I imagined that this library would mesh with a real compiler rather than the ugly Arduino IDE stuff. I actually think it could be a great fit for a golang SDK, although significantly more work since you’d have to rewrite some of the blinklib stuff.

Let me know if you want to pursue this! I’ll do as much as I can to support your efforts!