Mecrisp Stellaris: Code Size Optimization for Constants
I picked up a few more interesting bits of insight in working on this I2C driver project. Well, interesting to me at least.
Loading 32-bit numbers into registers
With the ARMv6-M architecture you can't simply drop a 32-bit value into a register, even though the registers are 32 bits wide. What I mean is, you can't drop the value in as an IMMEDIATE value, that is to say, as part of the processor instruction itself. You can load a 32-bit value from a memory address, say with the LDR instruction. But if you want an immediate value dropped in, you are limited to 8 bits, which is the size of the immediate field in encoding T2 of the ADDS instruction in ARM's Thumb instruction set.
What is the difference? Memory accesses are (I'm told) much more expensive than operations that only manipulate the register directly. If you want to load a 32-bit value without touching memory, the trick is to drop in the first byte, shift the register left by one byte, add the second byte, shift again, and so forth. So, for example, here is a simple word that drops DEADBEEFh on the stack, along with the disassembled code:
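As a rough sketch of the idea: the word names below are made up for illustration, and the instruction sequence in the comments is written by hand from the description here, not copied from real disassembler output. The register assignments in it are assumptions.

```forth
\ Hypothetical example word: pushes the 32-bit constant on the stack.
\ The $ prefix marks a hexadecimal literal.
: deadbeef ( -- u ) $DEADBEEF ;

\ Illustrative instruction sequence
\ (assumes r6 = top of stack, r7 = data stack pointer):
\   subs r7, #4          make room on the data stack
\   str  r6, [r7, #0]    save the previous TOS
\   movs r6, #0xDE       first byte as an 8-bit immediate
\   lsls r6, r6, #8      shift over by one byte
\   adds r6, #0xAD       add the second byte
\   lsls r6, r6, #8
\   adds r6, #0xBE
\   lsls r6, r6, #8
\   adds r6, #0xEF
\   bx   lr              return

\ The same byte-by-byte construction written out in Forth:
: deadbeef-by-bytes ( -- u )
  $DE              \ first byte
  8 lshift $AD +   \ shift one byte, add the second
  8 lshift $BE +   \ shift again, add the third
  8 lshift $EF + ; \ shift again, add the fourth

deadbeef deadbeef-by-bytes = .   \ prints -1 (true): both give the same value
```

The second word only models the arithmetic; the compiler does this with register instructions rather than stack operations.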
The first two lines of assembly store the previous TOS (top of stack) value, the last line is the return jump, and everything in between is the adding and shifting I described.
It is not so bad, though, if your constant is smaller, because then the Mecrisp compiler doesn't have to do as many add/shifts:
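For instance, a value that fits in a single byte needs no shifting at all. The word name here is hypothetical, and the instruction named in the comment is an assumption about what the compiler would emit:

```forth
\ Hypothetical example: $2A fits in one 8-bit immediate,
\ so a single load (something like movs r6, #0x2A) would be enough.
: answer ( -- u ) $2A ;

answer .   \ prints 42 (with the number base set to decimal)
```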
That is really only one instruction, not counting adjusting the stack and returning from the subroutine.
Constants in FORTH simply do what is described above:
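A small sketch of what that means in practice, with a made-up constant name: a word that uses the constant leaves the same value on the stack as a word that uses the literal directly.

```forth
\ Hypothetical constant, for illustration only.
$DEADBEEF constant MAGIC

\ Using MAGIC inside a definition has the same effect as writing the literal.
: magic-a ( -- u ) MAGIC ;
: magic-b ( -- u ) $DEADBEEF ;

magic-a magic-b = .   \ prints -1 (true)
```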
I was surprised to find out that, in Stellaris, this constant code is automatically inlined. So, take this little word I coded, which uses a register constant and a helper word I2C_TAR' (which does not interest us here)...
Here is the disassembled code:
From a performance perspective this is good, but it is not good for code size: half of the instructions in the word (fourteen instructions, or 28 bytes at two bytes per instruction, to be precise) are devoted to dropping two values on the stack!
The helpful folks at #email@example.com suggested, as one option, overriding CONSTANT with the following, which is not inlined:
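A minimal sketch of an override of this kind, assuming Mecrisp's `<builds` and `does>` behave as in classic Forth. Treat it as an illustration of the technique rather than the exact definition that was suggested; the constant name is hypothetical.

```forth
\ Sketch only: redefine CONSTANT in terms of <builds ... does>.
\ The value is stored once, in the body of the new word, and
\ (assumption) uses of the word compile a call rather than inline code.
: constant ( x "name" -- )
  <builds ,      \ create the word and store x in its body
  does> @ ;      \ at run time: fetch the stored value

$DEADBEEF constant MAGIC   \ hypothetical name, defined with the new CONSTANT
```

The trade-off is the reverse of the inlined version: each use costs a call and a memory fetch, but takes far less code space at the point of use.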
Now, after recompiling i2c_tar, the constants are only four-byte calls to other code: