Mecrisp Stellaris: Code Size Optimization for Constants
I picked up a few more interesting bits of insight while working on this I2C driver project. Well, interesting to me at least.
Loading 32-bit numbers into registers
With the ARMv6-M architecture you can't simply drop a 32-bit value into a register as an IMMEDIATE value, that is to say, as part of the processor instruction itself, even though the registers are 32 bits wide. You can load a 32-bit value from a memory address, say with the LDR instruction. But if you want an immediate value dropped in, you are limited to 8 bits, which is the size of the immediate field in MOVS and in encoding T2 of the ADDS instruction in ARM's Thumb instruction set.
What is the difference? Memory accesses are (I'm told) much more expensive than manipulating the register directly. If you want to load a 32-bit value without the memory access, the trick is to drop in the first byte, shift the register left by one byte, add the second byte, shift again, and so forth. So, for example, here is a simple word that pushes DEADBEEFh onto the stack, along with the disassembled code:
: foo $deadbeef ; ok.
see foo
200203BE: 3F04 subs r7 #4
200203C0: 603E str r6 [ r7 #0 ]
200203C2: 26DE movs r6 #DE
200203C4: 0236 lsls r6 r6 #8
200203C6: 36AD adds r6 #AD
200203C8: 0236 lsls r6 r6 #8
200203CA: 36BE adds r6 #BE
200203CC: 0236 lsls r6 r6 #8
200203CE: 36EF adds r6 #EF
200203D0: 4770 bx lr
ok.
The first two lines of assembly push the previous TOS (top of stack) value, which lives in r6, onto the data stack addressed by r7. The last line is the return jump, and the rest is the adding and shifting which I described.
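The arithmetic can be checked at the prompt. What follows is only a minimal sketch: it repeats the movs/lsls/adds steps with ordinary Forth words, one per instruction, and the name build-deadbeef is made up for this illustration.

```forth
\ Mirror the disassembly above, one line per instruction.
\ build-deadbeef is a hypothetical name used only for this illustration.
: build-deadbeef ( -- u )
  $de            \ movs r6 #DE
  8 lshift       \ lsls r6 r6 #8
  $ad +          \ adds r6 #AD
  8 lshift       \ lsls r6 r6 #8
  $be +          \ adds r6 #BE
  8 lshift       \ lsls r6 r6 #8
  $ef +          \ adds r6 #EF
;

build-deadbeef $deadbeef = .   \ prints -1 (true): the steps rebuild the literal
```

This only demonstrates the byte-at-a-time arithmetic; it says nothing about what code the compiler would generate for build-deadbeef itself.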
It is not so bad, though, if your constant is smaller, because then the Mecrisp compiler doesn't have to do as many adds and shifts:
: foo $aa ; Redefine foo. ok.
see foo
2002040A: 3F04 subs r7 #4
2002040C: 603E str r6 [ r7 #0 ]
2002040E: 26AA movs r6 #AA
20020410: 4770 bx lr
ok.
That is really only one instruction, not counting adjusting the stack and returning from the subroutine.
Constants
Constants in Mecrisp Stellaris compile to exactly the code described above:
$deadbeef constant myconst ok.
see myconst
200203EA: 3F04 subs r7 #4
200203EC: 603E str r6 [ r7 #0 ]
200203EE: 26DE movs r6 #DE
200203F0: 0236 lsls r6 r6 #8
200203F2: 36AD adds r6 #AD
200203F4: 0236 lsls r6 r6 #8
200203F6: 36BE adds r6 #BE
200203F8: 0236 lsls r6 r6 #8
200203FA: 36EF adds r6 #EF
200203FC: 4770 bx lr
ok.
I was surprised to find out that, in Mecrisp Stellaris, this constant code is automatically inlined. So, take this little word I coded, which uses two register-address constants and a helper word i2c_tar' (which does not interest us here)...
$40044004 constant I2C0_IC_TAR
$40048004 constant I2C1_IC_TAR
: i2c_tar ( -- ) cr
." I2C0 " I2C0_IC_TAR i2c_tar'
." I2C1 " I2C1_IC_TAR i2c_tar' ;
Here is the disassembled code:
see i2c_tar
2000B2CE: B500 push { lr }
2000B2D0: F7F7 bl 200027AA --> cr
2000B2D2: FA6B
2000B2D4: F7F7 bl 20002878 --> .' I2C0 '
2000B2D6: FAD0
2000B2D8: 4905
2000B2DA: 4332
2000B2DC: 2030
2000B2DE: 3F04 subs r7 #4
2000B2E0: 603E str r6 [ r7 #0 ]
2000B2E2: 2680 movs r6 #80
2000B2E4: 0336 lsls r6 r6 #C
2000B2E6: 3688 adds r6 #88
2000B2E8: 02F6 lsls r6 r6 #B
2000B2EA: 3604 adds r6 #4
2000B2EC: F7FF bl 2000B238 --> i2c_tar'
2000B2EE: FFA4
2000B2F0: F7F7 bl 20002878 --> .' I2C1 '
2000B2F2: FAC2
2000B2F4: 4905
2000B2F6: 4332
2000B2F8: 2031
2000B2FA: 3F04 subs r7 #4
2000B2FC: 603E str r6 [ r7 #0 ]
2000B2FE: 2680 movs r6 #80
2000B300: 0336 lsls r6 r6 #C
2000B302: 3690 adds r6 #90
2000B304: 02F6 lsls r6 r6 #B
2000B306: 3604 adds r6 #4
2000B308: F7FF bl 2000B238 --> i2c_tar'
2000B30A: FF96
2000B30C: BD00 pop { pc }
ok.
From a performance perspective this is good, but it is not good for code size: fourteen instructions, 28 of the word's 64 bytes, are devoted to pushing two values onto the stack!
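One detail in this listing: the two addresses are not built byte by byte. The compiler shifts by 12 and then 11 bits (lsls #C, lsls #B), so each address takes five instructions to build rather than the seven needed for DEADBEEFh. A quick check at the prompt, assuming 32-bit cells and reading the operands straight off the disassembly:

```forth
\ I2C0_IC_TAR: movs #80, lsls #C, adds #88, lsls #B, adds #4
$80 12 lshift $88 + 11 lshift 4 + $40044004 = .   \ prints -1 (true)

\ I2C1_IC_TAR: same pattern, but adds #90 in the middle
$80 12 lshift $90 + 11 lshift 4 + $40048004 = .   \ prints -1 (true)
```

Together with the subs/str pair that saves the old TOS, that makes seven instructions (14 bytes) per constant.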
The helpful folks at #mecrisp@irc.hackint.org suggested, as one option, overriding CONSTANT with the following, which is not inlined:
: constant <builds , does> @ ;
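For anyone unfamiliar with the pattern, here is the same definition spread over several lines with comments. The stack comments and the explanation are my own annotation of the usual <builds ... does> idiom, not something taken from the Mecrisp documentation:

```forth
\ Redefine CONSTANT so each constant becomes an ordinary, callable word.
: constant ( x "name" -- )
  <builds ,      \ defining time: create the new word and store x in its data field
  does> @        \ run time: the data field address is on the stack; fetch x from it
;

$40044004 constant I2C0_IC_TAR   \ now uses the definition above
I2C0_IC_TAR $40044004 = .        \ prints -1 (true)
```

The value now lives in memory next to the word, so using the constant costs a call and a fetch instead of an inlined literal.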
Now, after recompiling i2c_tar, the constants are only four-byte calls to other code:
see i2c_tar
2000ABBE: B500 push { lr }
2000ABC0: F7F7 bl 200027AA --> cr
2000ABC2: FDF3
2000ABC4: F7F7 bl 20002878 --> .' I2C0 '
2000ABC6: FE58
2000ABC8: 4905
2000ABCA: 4332
2000ABCC: 2030
2000ABCE: F7FF bl 2000AB72 --> I2C0_IC_TAR
2000ABD0: FFD0
2000ABD2: F7FF bl 2000AB20 --> i2c_tar'
2000ABD4: FFA5
2000ABD6: F7F7 bl 20002878 --> .' I2C1 '
2000ABD8: FE4F
2000ABDA: 4905
2000ABDC: 4332
2000ABDE: 2031
2000ABE0: F7FF bl 2000AB9A --> I2C1_IC_TAR
2000ABE2: FFDB
2000ABE4: F7FF bl 2000AB20 --> i2c_tar'
2000ABE6: FF9C
2000ABE8: BD00 pop { pc }
ok.
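To put numbers on the difference, the size of each version of i2c_tar can be worked out from the addresses in the two listings. This is just address arithmetic at the prompt, assuming the number base is decimal; the last instruction (pop { pc }) is 2 bytes long, so each word ends 2 bytes past the address shown for it.

```forth
\ Size in bytes = (address of last instruction + 2) - (address of first instruction)
$2000B30C 2 + $2000B2CE - .   \ inlined constants: prints 64
$2000ABE8 2 + $2000ABBE - .   \ constants as calls: prints 44
```

So this word shrinks by 20 bytes. The listings do not show how much space each <builds ... does> constant itself occupies, so the net saving depends on how many times each constant is used.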