Chapter 14.1: Rocks -- Bit Multiply
Julia caught my eye with a puzzled look.
"Gotta question?"
She nodded. "The 6805 doesn't have a multiply instruction."
"True."
"Neither does the 6800."
"Right."
"Tiny BASIC and Forth don't multiply 12 by 10 by adding twelve up ten times, do they?"
"No, ..."
"Does it have something to do with that bit multiply you were talking about?"
I looked around at the group of engineers, managers, legal staff, and students, and asked, "Can we take a detour?"
Bill leaned back with an expectant smile, his hands behind his head, and nodded.
Motorola's Bob said, "Go ahead."
"Can I borrow that Forth listing back?"
Bill picked it up and handed it to me without comment.
I opened it up and leafed through it until I found the USTARS routine on page 17 (SCR 23). I read through the routine, thought for a moment, then set it back down, reaching in my shirt pocket for a pencil that wasn't there.
Julia turned her notepad to a blank page and handed the notepad and her pencil to me.
"Thanks. Say we want to multiply two eight-bit numbers. I'm going to arbitrarily pick eleven and five." I wrote the two numbers down in binary and decimal, vertically, for multiplying by hand, then proceeded to work the product out. "I'll use the method we usually use for decimal, multiplying the multiplicand on top by each column of the multiplier on bottom, and adding them up:
00001011 => 11 (8+2+1)
x 00000101 => 5 (4+1)
-- -------- -------- ---
c
00001011 => 11
0 00000000 => 0
00 00101100 => 44 (32+8+4)
(Abbreviating the zeroes.)
-- -------- -------- ---
0 00000000 00110111 => 55 (32+16+4+2+1)
Julia held her hand out and I gave her back her notepad and pencil.
She proceeded to write out a decimal product:
5678
x 4321
---------
5678
113560
1703400
22712000
---------
24534638
Mike grumbled, "Grade school."
Julia gave him a glare. "Looking at the fundamentals so I can understand what the computer has to do, Mike!"
He shrugged.
She turned back to me. "Okay, I think I can see how you're doing basically the same thing both ways. So multiplying each column in binary is what you are calling a bit multiply?"
"Sort-of. Maybe. Perhaps more a bit multiply-and-accumulate instruction."
She shook her head with a blank look and handed me back her notepad and pencil.
"Hmm. Let's look at how the Forth multiply routine works. It says it multiplies the top two 16-bit words on the stack, and puts the low 16 bits of the result back on the stack, keeping the high 16 bits in A and B." I picked the Forth Listing back up and copied the routine out, modifying the comments:
USTARS
LDA A #16 ; bits in a word (two bytes)
PSH A ; counter in temporary on stack
CLR A ; ready to accumulate the product
CLR B ; clears carry
TSX ; point X to the parameter stack
USTAR2 ; top of loop
ROR 5,X ; shift multiplier, pull last carry in from result
ROR 6,X ; leaves low bit of multiplier in carry
DEC 0,X ; count down -- leaves carry alone
BMI USTAR4 ; counted out?
BCC USTAR3 ; skip add if low bit is 0
ADD B 4,X ; low bit is 1, add
ADC A 3,X ; now carry is carry from add
USTAR3
ROR A ; shifts carry from add into result
ROR B ; shifts low bit of accumulator into carry
BRA USTAR2 ; next bit
USTAR4 ; counted out
INS ; remove counter from stack
RTS
Julia shook her head.
I grinned. "Yeah, it looks like it's out of phase with itself, but that's because it's reusing the multiplier variable to pick up the low bits of the result. Saves space on stack and saves some shifting instructions."
"Out of phase?" Now she gave me a moue.
"Like the loop starts part-way through, because it kind of does. Hmm. Let's write a loop that would look more like what we do on paper." I stopped to think, then started to write and erase and rewrite.
"To avoid confusion, let's not use any tricks. In fact, let's not use the S stack for parameters, either. But there will be one sort-of-trick, shifting the value down in the accumulator is the same as shifting the calculation window up."
"Oh-kay ..." But she looked even more perplexed.
Here's what I ended up with:
* Multiplicand at 2,X:3,X
* Multiplier at 0,X:1,X
USTAR
LDX PSP ; parameter stack
DEX ; allocate work space
DEX
DEX
DEX
DEX
STX PSP ; just in case
* Multiplicand at 7,X:8,X
* Multiplier at 5,X:6,X
LDAA #15
STAA 4,X ; bit count
CLR 3,X
CLR 2,X ; result low word
CLR 1,X
CLR 0,X ; accumulator, result high word
USTARL
CLC ; known state
LDAA #1
BITA 6,X ; low bit of multiplier
BEQ USTARN
LDAB 1,X
ADDB 8,X ; multiplicand low byte
STAB 1,X
LDAB 0,X
ADCB 7,X ; multiplicand high byte
STAB 0,X
USTARN
DEC 4,X
BMI USTARD
* Relativity --
* shifting the contents right is same as
* shifting the calculation window left.
ROR 0,X ; moves carry into accumulator
ROR 1,X
ROR 2,X ; shift into low word
ROR 3,X
LSR 5,X ; shift multiplier down
ROR 6,X ; for next bit test (relative shift)
BRA USTARL
USTARD
LDAA #3 ; 4 bytes to copy
USTARC ; copy result back into stack
LDAB 0,X
STAB 5,X
INX
DECA
BPL USTARC
INX ; drop count from stack
STX PSP ; restore parameter stack pointer
RTS
She looked the code over. "Do you have to save and load accumulator B every time through? Nothing else seems to be happening to it and it would save instructions and time."
"Nope. True. And yep. It would."
"And," she paused, "could you use an ANDA instead of a BITA to test the low bit of the multiplier, since you reload it each time through?"
"Sure. Or, if we had another 16-bit accumulator to hold it, we could use the bit test instruction and rotate the bit to test instead of rotating the multiplier down."
"Hmmm. In the Forth code, shifting the multiplier right and testing the carry is another way of testing the lowest bit, isn't it?"
"That's right, and it does save a few more instructions."
"On the 6805, you could use a branch if set instruction, couldn't you?"
"I was afraid you were going to ask that."
"It wouldn't work?"
"Well, the multiplier has to be addressed as a direct page variable if you use the BRSET or BRCLR instruction. I assume you would use BRCLR. Other than that, it should work."
She thought for a minute. "Okay, so using global parameters instead of passing them on the stack, ...," and started writing.
ORG $80
MCAND RMB 2
MPLIER RMB 2
BITCT RMB 1
ACCM RMB 4
*
ORG $200
USTAR
LDA #15
STA BITCT
CLR ACCM
CLR ACCM+1
CLR ACCM+2
CLR ACCM+3
USTARL
CLC ; known state
BRCLR #0,MPLIER,USTARN
LDA ACCM+1
ADD MCAND+1
STA ACCM+1
LDA ACCM
ADC MCAND
STA ACCM
USTARN DEC BITCT
BMI USTARD
ROR MCAND
ROR MCAND+1
ROR MCAND+2
ROR MCAND+3
LSR MPLIER
ROR MPLIER+1
BRA USTARLUSTARD
RTS
"And an 8-bit multiply probably wouldn't have to save and load the accumulator?"
"I think it wouldn't. This code looks good, let's test it later."
She thought some more and started writing again.
USTAR LDX PSP ; parameter stack
DEX ; allocate work space
DEX DEX
* Multiplicand at 5,X:6,X* Multiplier at 3,X:4,X
LDAA #15
STAA 2,X ; bit count
LDD #0
STD 2,X ; result low word
STD 0,X ; accumulator, result high word
USTARL
LSR MPLIER
ROR MPLIER+1
BCC USTARN
BEQ USTARN
ADDD 7,X
USTARN DEC 4,X
BMI USTARD
RORA ; moves carry into accumulator RORB
ROR 0,X ; shift into low word
ROR 1,X
BRA USTARLUSTARD
STD 3,X
LDD 0,X
STD 5,X
INX
INX
INX ; drop count from stack
RTS
"Did I get that right for the 6801?"
"I think so. Pretty close if not."
"So would an instruction that adds the source to the accumulator if the carry is set be your bit multiply?"
"Yes, but. Part of the reason I want the bit multiply is to save instructions extending the multiply to 32 bits or more. But if the bit multiply is add if carry set, I'm pretty sure you'll have conflicting uses of the carry."
"Oh. Adding the multiplicand is going to give us the carry from the addition, and overwrite the state of bit 0 that we put in the carry flag." She paused to think. So it would only work once."
"Myep. So it looks like the bit multiply would need to be based on the branch if bit 0 clear or bit test immediate 1 instruction, instead of testing the carry. And we'd want to be able to specify both the multiplier and multiplicand independently, too, if it's going to be extendable."
"And all it'd replace is just the branch and the add, so it would really only speed things up a little."
"Maybe so. If it's going to speed things up significantly, it also has to shift the multiplier and the accumulator. And the carry out of the bottom bit of the accumulator has to go somewhere, so there's a third memory argument. And shifting into low-order fields after the low-order fields have already been added and shifted is going to double-shift the low-order fields. The bits have to be fed forward the full width of the multiply. Maybe trying to make a single bit multiply that can be extended as far as we want is just not going to work."
"What if you don't shift the accumulator fields in memory?"
"I think that's going to make it hard to do one bit at a time, because the whole reason multiplying a bit at a time works is that we're moving that window across the inner products."
Ms. Philips cleared her throat. "Bob, have these two just shared things that should have been trade secrets?"
I looked up. "I'm sure your engineers have been down this road before."
Greg nodded. "We have. I must admit, it took me longer, but I was working by myself and making sure I had enough notes to explain to my manager why not, if I couldn't get a circuit that would work."
"Gate counts, power dissipation, that kind of thing?"
"Right."
Jesse leaned forward with a grin. "Dang. And you two do this kind of thing for fun."
I grinned back.
Julia sighed. "He does." And she smirked. "Oh, it's kind of fun watching him go down the rabbit holes, and sometimes going down there with him."
That got more whistles and some "Hey, hey, hey!" comments.
[Backed up at https://joel-rees-economics.blogspot.com/2020/09/bk-33209-rocks-bit-multiply.html.]
No comments:
Post a Comment