Here’s some strategies I used to reduce storage usage. I’ll try to generalize my learnings, but will include examples as I remember them while coding Terrarium. Sorry this is a bit rambling!
Avoid calls to expensive functions
The compiler is pretty smart. If you don’t call a function anywhere then it won’t be included in the compiled output. So think hard when you call a potentially expensive function, especially when an alternative exists.
In my case, I was using makeColorHSB
as I rendered face colors. HSB is a nice color space to work in, but makeColorHSB
is an expensive function both computationally and code size. I eventually switched back to makeColorRGB
, which is much more storage efficient.
Reduce corner cases
If you can design/structure some system to avoid a corner case, you will have saved an if/else statement, which translates into storage savings. Even if you can’t avoid the corner case entirely, as long as it’s a “don’t care” then you can avoid writing special case code to handle it.
In the original Terrarium, each tile relied on knowing the amount of water contained in each neighbor’s face. As water flows, it is passed from one tile to the next through communication messages. The source tile needs to know how much water the destination tile can accept so that it doesn’t send more than it can handle. Every time a face gained or lost water, it would transmit its current quantity to its neighbor. The neighbor would remember this and only send water if it could fit.
In the new code, each face instead tracks its own sense of being “full” or “not full”. It’s this full flag that gets transmitted to the neighbor, not the actual quantity.
Also, the max water a face can hold is not a strict limit. It is able to go above this amount if it is given extra water due to communications delay.
These modifications allowed me to remove corner case and range checking, reducing the code footprint in the new version.
This also brings up a related topic…
All comm packets are expendable
This is specific to my project, but is interesting. In the original code, packets sent between tiles were guaranteed to be received. Each face has a fix sized outgoing queue. If circumstances caused more packets to be queued than it could handle then an error code would be put on the face and the only way to clear it was to disconnect the two tiles and reconnect. As such, I was careful not to send too many packets at once.
The new code treats all packets as expendable. If a command is sent to the queue, but the queue is full, the command is simply dropped. The simulation is structured in such a way that the sender doesn’t care if it never gets there. It might result in odd behavior, but it won’t break things.
For instance, when passing water between tiles, if water is sent from one tile, but never received then so what? Those units of water would cease to simply exist, but it won’t break the simulation.
Another example. If a bug is flying within one tile and tries to transfer to a neighbor, a request packet is sent to that neighbor. If it’s okay then that neighbor sends an acknowledgement back and turns on its bug code. The original tile receives the ack, knows the bug was taken, and turns off its bug code.
Either one of those packets could fail to arrive. For the original transfer attempt, if it gets lost then the originating tile just keeps the bug as if the transfer was rejected. The more interesting case is if the acknowledgement packet never arrives. Then the source tile would never turn off its bug code, resulting in bug duplication ahhhh. I’ve never seen this in practice, but if it ever happens then no worries. It doesn’t break anything.
This eliminated all the error checking code around the transmit queues, as well as the rendering of the error conditions.
Trading code for data
If you have an abundance of data, but are running out of code space, then perhaps there is a way you could trade one for the other.
I first did this during an optimization pass on the water system in the original version. Before, the algorithm to flow water between faces and tiles was done purely in code. I knew there was a lot of repetitive code, though. Water flowing from face 0 to face 3 wasn’t that different than from face 3 to the neighbor tile below, or even sideways from face 0 to face 1. I created a data format that let me parameterize these differences, then made a simple loop that iterated through the table to do its magic. The table is waterFlowSequence[]
if you’re curious.
I applied this same optimization when I iterated on the plant code in the current version. I created a table that defines a plant’s state at any given point during its growth. While this did save code space, I found myself quickly running out of data storage! Each entry in the table took four bytes, which consumed far too much data as I tried to add new plant types. I iterated a ton on the data format as I shaved bits out of it and eventually got it down to two bytes per entry. It’s still pretty tight, but I have room for one more plant type!
Butcher blinklib
Getting desperate? Dive into the blinklib source and see what you can excise. For instance:
- The
dim
function is expensive. Even if your own code doesn’t use it, the blinklib itself does. Rather than remove it entirely, I changed it to be essentially a nop. This saves > 100 bytes. It does sacrifice the nice fade-out/fade-in as tiles go to sleep and wake up, but these are desperate times!
Color dim( Color color, byte brightness) {
#if 1
return color;
#else
return MAKECOLOR_5BIT_RGB(
(GET_5BIT_R(color)*brightness)/255,
(GET_5BIT_G(color)*brightness)/255,
(GET_5BIT_B(color)*brightness)/255
);
#endif
}
- The
setColorOnFace
function passes in aColor
struct by value. Every time it is called, the compiler has to pack that struct into the function call. Instead, I createdsetColorOnFace2
that accepts theColor
as a pointer. This only saved ~30-40 bytes for me, but every bit helps.
Sometimes better, sometimes worse
I have macros that give me a face relative to another face, rotated CW or CCW. Also for giving the opposite face. I have tried several different algorithms. Some perform better in certain situations, but worse in others. I haven’t come across a perfect solution.
#define CCW_FROM_FACE(f, amt) (((f) - (amt)) + (((f) >= (amt)) ? 0 : 6))
#define CW_FROM_FACE(f, amt) (((f) + (amt)) % FACE_COUNT)
#define OPPOSITE_FACE(f) (((f) < 3) ? ((f) + 3) : ((f) - 3))
It is these three that I would be tempted to dive into inline assembly to try to get something super compact.