I have not investigated this in depth yet, but it was a surprising find. Imagine you have a program that never calls setColor() (not that unlikely, since one could be using setColorOnFace() exclusively, although I think that is uncommon). As an example I will use a completely empty sketch (the effect seems to be the same in real programs):
Sketch uses 1888 bytes (32%) of program storage space. Maximum is 5888 bytes.
Global variables use 690 bytes (67%) of dynamic memory, leaving 334 bytes for local variables. Maximum is 1024 bytes.
Now let's make a small adjustment with no noticeable change in behavior:
Sketch uses 1706 bytes (28%) of program storage space. Maximum is 5888 bytes.
Global variables use 690 bytes (67%) of dynamic memory, leaving 334 bytes for local variables. Maximum is 1024 bytes.
Somehow, ADDING a call to setColor() results in a 4-point IMPROVEMENT in storage usage (182 bytes)! That is what I call counter-intuitive. I wonder what the compiler is doing.
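For reference, here is a sketch of what the second variant presumably looks like, going by the reply below that mentions a setColor() call in setup(). The color OFF is an assumption (the original does not say which color was passed), and the block under #ifndef ARDUINO is a hypothetical host-side stand-in so the snippet compiles without blinklib. It is not the real library code.

```cpp
#ifndef ARDUINO
// Hypothetical stand-ins for building outside the Arduino IDE (not blinklib itself).
struct Color { unsigned short as_uint16; };
static const Color OFF = {0};
static int setColorCalls = 0;
void setColor(Color newColor) { (void)newColor; ++setColorCalls; }
#endif

// The "empty" sketch is the same thing with the body of setup() removed.
void setup() {
  setColor(OFF);  // the one added call (OFF is an assumed argument)
}

void loop() {
}
```

Since every face starts out dark anyway, a call like this should not change what the Blink shows, which matches the "no noticeable change" description.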
And, BTW, this is what I currently get with my upcoming custom blinklib.
Without setColor():
Sketch uses 1776 bytes (30%) of program storage space. Maximum is 5888 bytes.
Global variables use 690 bytes (67%) of dynamic memory, leaving 334 bytes for local variables. Maximum is 1024 bytes.
With setColor():
Sketch uses 1584 bytes (26%) of program storage space. Maximum is 5888 bytes.
Global variables use 690 bytes (67%) of dynamic memory, leaving 334 bytes for local variables. Maximum is 1024 bytes.
Ok, this one was interesting enough to tempt me to break my general “stop worrying and trust the compiler” rule. Luckily it is an easy one.
TL;DR
The compiled setColor() function is called 5 times internally in the blinklib code, and each time it is inlined. Adding the 6th call in the setup() function causes the compiler to stop inlining it and instead keep only one copy of the function plus 6 calls. Since the function is 30 bytes long, this ends up being smaller in total size even with the overhead of the calls. You can get the same result without the extra call by marking the function noinline:
void __attribute__ ((noinline)) setColor( Color newColor) {
…and the size of an empty sketch drops from 1888 to 1696 bytes.
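To make the mechanism concrete, here is a minimal self-contained sketch of the same attribute on a made-up function. The names setAllFaces, showError, clearDisplay and the 6-entry pixelBuffer are all hypothetical and only illustrate the pattern; the attribute itself is GCC/Clang-specific.

```cpp
#include <cstdint>

// Hypothetical stand-in for a display buffer; not the real blinklib layout.
static uint16_t pixelBuffer[6];

// noinline asks the compiler to keep a single out-of-line copy of the body
// instead of pasting it into every caller.
void __attribute__((noinline)) setAllFaces(uint16_t packedColor) {
    for (uint8_t face = 0; face < 6; ++face) {
        pixelBuffer[face] = packedColor;
    }
}

// Two hypothetical internal callers. Without the attribute the optimizer is
// free to duplicate the loop into each of them.
void showError()    { setAllFaces(0x001F); }
void clearDisplay() { setAllFaces(0x0000); }
```

Whether this ends up smaller depends on how the body size compares with the per-call overhead and on how many call sites there are, so it is worth checking the size report with and without the attribute.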
So should we add the noinline to the next version of blinklib?
Probably not. I’ve spent countless hours trying to manipulate the compiler, with mixed results at best. You need to look no further than this very function to see my past failures…
// This at least gets the semantics right of copying a snapshot of the actual value.
blinkbios_pixel_block.pixelBuffer[face].as_uint16 = newColor.as_uint16; // Size = 1940 bytes
// This BTW compiles much worse
// *( const_cast<Color *> (&blinkbios_pixel_block.pixelBuffer[face])) = newColor; // Size = 1948 bytes
I also specifically remember trying a memcpy() here, and it was both bigger and had the wrong semantics.
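Here is a small sketch of why the copy goes through as_uint16, assuming the pixel buffer is declared volatile (the const_cast in the commented-out line suggests a qualifier is being cast away, but that is a guess). The Color union below is a simplified hypothetical; the real type carries more than this one member.

```cpp
#include <cstdint>

// Simplified, hypothetical Color: only the 16-bit view is modeled here.
union Color {
    uint16_t as_uint16;
};

// Assumed volatile because the buffer is shared with display code that runs
// outside the normal program flow.
volatile Color pixelBuffer[6];

void setColorOnFace(Color newColor, uint8_t face) {
    // pixelBuffer[face] = newColor;
    // The line above does not compile: the implicitly generated copy
    // assignment operator cannot be invoked on a volatile object.

    // Copying the 16-bit member is a plain volatile store of the current value.
    pixelBuffer[face].as_uint16 = newColor.as_uint16;
}
```

Casting the volatile away makes the whole-object assignment compile, which is what the commented-out variant does, at the size cost noted in its comment.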
Also, realistically adding the noinline is unlikely to make any practical program smaller since it only matters when there are no other calls to setColor(). It is possible that it could also unintentionally make real programs longer. It is also ugly since it is compiler specific and ad-hoc.
If I had it all to do over again, maybe I would use some templates for the color conversion stuff so that the computations would compile away in the cases of compile-time known colors. But who knows, maybe that would be worse.
I fully agree with this, BUT tweaking compiler/linker flags to control inlining (and other things) is a perfectly valid way to achieve the same benefits (which might even extend to other functions, not only setColor()). Unfortunately, it appears that unless you want to go the makefile way, there is no way to pass extra flags using the Arduino IDE (there might be with VSCode, but I have not tried it yet).
Check out the platform.txt file. You should be able to easily customize compiler and linker settings in there. gcc also has many “modifiers” that you can use to change the way individual functions are compiled. LMK if you find any impactful win-win changes so we can propagate them upstream!
So, there are only 2 flags I think might be considered:
-fno-inline-small-functions: As the name implies, this trades speed for size by not inlining small functions, but on a platform like Blinks speed is usually not the most pressing concern. I enabled this and will be trying different programs to see if I notice anything. This will have the same effect as what you did manually (setColor() will not be inlined…
-fno-tree-scev-cprop: This has small benefits whenever you have nested loops. It just disables GCC's final value replacement for loop variables (the -ftree-scev-cprop pass). There are really no drawbacks to this one (and it saves 2 extra bytes on blinklib).
Both flags will also help with more complex games that would have a considerable amount of functions and all kinds of face-iterating loops.
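For anyone who wants to reproduce this, here is roughly how the two flags could be added through the platform files. This assumes the Blinks core keeps the compiler.c.extra_flags and compiler.cpp.extra_flags properties from the stock Arduino AVR platform.txt, and that a platform.local.txt next to it is picked up; neither has been verified for this core.

```text
# platform.local.txt, placed in the same folder as the core's platform.txt
# (assumes the core's build recipes reference these properties)
compiler.c.extra_flags=-fno-inline-small-functions -fno-tree-scev-cprop
compiler.cpp.extra_flags=-fno-inline-small-functions -fno-tree-scev-cprop
```

Note that this applies to every sketch built with that core, not to a single project.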
Ok, after more testing, forget it. These flags actually make the situation worse on bigger programs (where space is actually a bigger issue). As we cannot set flags per project in an easy way, I guess pursuing this is a wild goose chase: most optimizations only really work well for one type of program and not for others, and the ones that help in general are usually already enabled with -Os.