Separate documentation tends to get
- separated, as in not always readily accessible
(Murphy: when direly needed)
- out of sync - well, that's a problem even with in-line comments
→ document, in the code (for every part created for a separate reason)
- what it is good for
- what inspired it (there may be easily accessible explanations that enlighten someone unfamiliar with the problem at hand or the approach used; even name-dropping may help find them)
- what has been the incentive to write it
Adopting good practices that run counter to adverse customs is neither easy nor fast.
Much material about assembly programming is pre-1980s, when there was some reason to have short mnemonics for instructions and operands. (No matter pointing pen or finger at a (printed…if you were lucky) program listing: no pop-up. So better keep things all in one line…)
Please use telling names. Coding in assembly is no licence not to.
In a division implementation, I'd not imagine problems with R for remainder or Q for quotient. Resist any impulse to outsmart everyone with the likes of DVsor. N for numerator wouldn't be bad if talking about fractions, but as if N2 and N1 in addition (all three in H and L flavours) weren't bad enough, along comes
;NOTE: N2 <=255 so NxH = 0, also P < 2^16 so we can discard upper byte of DH * NxL
P is mentioned in the ALGORITHM OVERVIEW.
In one comment, you switched from
sum = sum + term2
sum = sum + term2 + term3
to
sum = sum + term2
sum = term3 + sum + term2
(I'd prefer
sum += term2
sum += term3
Even then, anyway.)
I am looking for ways to reduce either
- the code size,
- lookup table size,
- or number of clock cycles
One source of inspiration on how to code integer arithmetic is libgcc.
A "non-performing" division would be slightly faster than a non-restoring one, but hardly faster than about 120 cycles.
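To make "non-performing" concrete, here is a minimal sketch in C rather than AVR assembly, so it says nothing about cycle counts. All names (udiv16_result, udiv16_nonperforming) are mine and hypothetical, not taken from your code or from libgcc. The point is only the control flow: the trial comparison decides first, and the subtraction is performed only when it will not go negative, so there is never anything to restore.

```c
#include <stdint.h>

/* Hypothetical illustration, not the reviewed routine:
   unsigned 16-bit shift-and-subtract division, "non-performing" variant. */
typedef struct {
    uint16_t quotient;
    uint16_t remainder;
} udiv16_result;

/* Precondition (assumed): divisor != 0. */
static udiv16_result udiv16_nonperforming(uint16_t dividend, uint16_t divisor)
{
    uint16_t quotient = 0;
    uint32_t remainder = 0;  /* wider than 16 bits so the shift cannot overflow */

    for (int bit = 15; bit >= 0; bit--) {
        /* bring down the next dividend bit, most significant first */
        remainder = (remainder << 1) | ((uint32_t)(dividend >> bit) & 1u);
        quotient = (uint16_t)(quotient << 1);

        if (remainder >= divisor) {  /* trial: would the difference be non-negative? */
            remainder -= divisor;    /* only then perform the subtraction */
            quotient |= 1u;
        }
        /* else: nothing was subtracted, so nothing needs restoring */
    }

    udiv16_result result = { quotient, (uint16_t)remainder };
    return result;
}
```

For example, udiv16_nonperforming(1000, 7) yields quotient 142 and remainder 6. Note the names: dividend, divisor, quotient, remainder need no legend.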
Rather than trying to understand the algorithm you sketch in OVERVIEW and thinking up shortcuts myself, I scrutinised the code presented. Did you write it from scratch, or did you take some compiler output for inspiration?
Catching my eye:
- "register order" differs from the one implied by mul or the GCC calling convention, preventing the advantage of 1 cycle and 1 word that each movw offers. As this is not included in "The constraints", change either one.
- The critical ("normal"?) path turns out to be taken branches mostly. With AVR, non-taken branches are faster.
As it turns out, table access is on the critical path. While it would seem possible to save at least 127 bytes of R1H_TBL, it would cost speed.