Scanning large tables
Scanning tables larger than 256 bytes is more involved than scanning small tables, because there are no 16-bit equivalents to X and Y. Consider clearing a large block of RAM quickly. A straightforward approach is to keep a 16-bit index in X and Y, and increment it until it reaches the byte count:
begin = ... ; address of first byte to clear
count = ... ; number of bytes
.zeropage
addr: .res 2
.code
lda #<begin
sta addr
lda #>begin
sta addr+1
lda #0
; X and Y form 16-bit index, with Y holding low byte
ldx #0
ldy #0
loop: ; Clear one byte
sta (addr),y
; Go to next byte
iny
bne nohigh
inx
inc addr+1
nohigh:
; See if low and high bytes of index match count
cpy #<count
bne loop
cpx #>count
bne loop
Using the approach from Efficient Forwards Scanning, we can make this more efficient.
First, consider the case of clearing all 64K of RAM. X and Y start out at 0 and increment up to $FF $FF, then the loop ends. If we wanted to clear just the last $103 bytes of this 64K block, we'd start with the values the index would have when only $103 bytes remain, that is, $10000-$103 = $FEFD. So we'd start with X=$FE and Y=$FD, and add $FE to addr+1. This would first clear a byte at $FEFD, then $FEFE, $FEFF, and the 256 bytes beginning at $FF00, as desired:
lda #<begin
sta addr
lda #>(begin + $FE00)
sta addr+1
lda #0
ldx #$FE
ldy #$FD
loop: sta (addr),y
iny
bne nohigh
inx
inc addr+1
nohigh:
cpy #0
bne loop
cpx #0
bne loop
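As a quick check of the arithmetic, here is a small Python sketch (a model for illustration, not 6502 code; the names REMAINING, start, and first_addresses are made up for this example). It derives the starting values of X and Y and lists the first addresses touched, assuming the 64K block begins at address 0.

```python
# Model of the starting values for clearing the last $103 bytes of a 64K block.
REMAINING = 0x103             # bytes left to clear
start = 0x10000 - REMAINING   # index at which the full 64K loop has $103 bytes left

x = start >> 8                # high byte of the index, loaded into X
y = start & 0xFF              # low byte of the index, loaded into Y

# addr has X's value added to its high byte; Y supplies the low part.
begin = 0x0000                # assumption: the block begins at address 0
addr = (begin + (x << 8)) & 0xFFFF
first_addresses = [(addr + y + i) & 0xFFFF for i in range(4)]

print(f"X=${x:02X} Y=${y:02X}")                          # X=$FE Y=$FD
print(" ".join(f"${a:04X}" for a in first_addresses))    # $FEFD $FEFE $FEFF $FF00
```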
Now, if we can have the above loop clear the last $103 bytes of the 64K block "beginning" at address 0, we can have it clear the last $103 bytes of a 64K block that "begins" at any address in memory (it would wrap around to the beginning). Thus, we can use it to clear a block of $103 bytes anywhere in memory; it doesn't matter whether they are the last $103 bytes of a 64K-byte block, since the loop only accesses those $103 bytes. But it does add $FD to the address, since Y starts out with that value, so we must subtract that. The following efficiently clears $103 bytes anywhere in memory:
lda #<(begin - $FD)
sta addr
lda #>(begin - $FD)
sta addr+1
lda #0
ldx #$FE
ldy #$FD
loop: sta (addr),y
iny
bne loop
inc addr+1
inx
bne loop
Addr starts out as begin-$FD. The first time through the loop, Y=$FD, so it accesses the byte at begin-$FD+$FD, that is, at begin, as desired. Then Y=$FE, so it accesses begin+1. Then Y=$FF and it accesses begin+2. Y wraps around to 0 and the inner loop falls through to the outer loop, which increments the high byte of addr, increments X to $FF, and loops back. Now Y=0 and addr=begin-$FD+$100, or more simply begin+3, which is the desired address for this pass. Y keeps incrementing through $FF and wraps to 0 again, X wraps from $FF to 0, and both loops end, having cleared exactly $103 bytes of memory.
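The same walk-through can be replayed mechanically. The following Python sketch models the inner and outer loops one instruction at a time and records every address stored to. It is an illustrative model only, and the function name clear_trace and the sample address $0300 are assumptions, not part of the original routine.

```python
def clear_trace(begin, x, y):
    """Model the two-level loop; return the addresses written by sta (addr),y."""
    addr = (begin - y) & 0xFFFF               # addr is pre-adjusted by Y's initial value
    touched = []
    while True:
        touched.append((addr + y) & 0xFFFF)   # sta (addr),y
        y = (y + 1) & 0xFF                    # iny
        if y != 0:                            # bne loop (inner)
            continue
        addr = (addr + 0x100) & 0xFFFF        # inc addr+1
        x = (x + 1) & 0xFF                    # inx
        if x == 0:                            # bne loop (outer) falls through
            break
    return touched

begin = 0x0300                                # hypothetical block address
touched = clear_trace(begin, 0xFE, 0xFD)
print(len(touched), hex(touched[0]), hex(touched[-1]))   # 259 0x300 0x402
```

The trace contains $103 (259) consecutive addresses starting at begin, matching the description above.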
The general pattern is to load X with the high byte of $10000-count, load Y with the low byte, and subtract Y's initial value from begin. $10000-count can more simply be written as -count, since the two are equal when truncated to 16 bits:
.code
lda #<(begin - <-count)
sta addr
lda #>(begin - <-count)
sta addr+1
lda #0
ldx #>-count
ldy #<-count
loop: sta (addr),y
iny
bne loop
inc addr+1
inx
bne loop
If the address and/or size isn't known until the program is running, the above must be done at run-time with instructions, rather than relying on the assembler:
.zeropage
addr: .res 2
size: .res 2
.code
; Clears memory from addr through addr+size-1.
; Clears all memory if size=0.
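; Adjust addr by subtracting the low byte of the negated size.
; When the low byte of size is nonzero, this is the same as adding
; the low byte of size and then a high byte of $FF. When it is zero,
; the negated low byte is also zero and addr needs no adjustment
; (adding $FF to the high byte would move addr back a whole page).
lda size
beq noadjust
clc
adc addr
sta addr
lda addr+1
adc #$FF
sta addr+1
noadjust: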
; Subtract size from 0 to negate it, and put that into X and Y
lda #0
sec
sbc size
tay
lda #0
sbc size+1
tax
lda #0
loop: sta (addr),y
iny
bne loop
inc addr+1
inx
bne loop
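The run-time negation relies on the borrow from the low-byte subtraction carrying into the high byte. The following Python sketch models just that step, byte by byte. It is an illustration of the SEC/SBC sequence in binary mode, and the helper names sbc and negate16 are assumptions made for this example.

```python
def sbc(a, m, carry):
    """Model 6502 SBC in binary mode: A - M - (1 - C). Returns (result, carry_out)."""
    diff = a - m - (1 - carry)
    return diff & 0xFF, 1 if diff >= 0 else 0   # carry clear means a borrow occurred

def negate16(size):
    """Model: lda #0 / sec / sbc size / tay / lda #0 / sbc size+1 / tax."""
    lo, carry = sbc(0, size & 0xFF, 1)   # low byte, with carry set beforehand
    hi, _ = sbc(0, size >> 8, carry)     # high byte, using the borrow from the low byte
    return hi, lo                        # values that end up in X and Y

x, y = negate16(0x0103)
print(f"X=${x:02X} Y=${y:02X}")          # X=$FE Y=$FD
```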
An alternate approach of similar efficiency is a dual loop, with the first handling full 256-byte pages efficiently and the second handling the final partial page:
; Clears memory from addr through addr+size-1.
; Clears nothing if size=0.
lda #0
ldy #0
; Load page count and handle case where it's zero
ldx size+1
beq final
; Do full pages first
pages: sta (addr),y
iny
bne pages
inc addr+1
dex
bne pages
final: ldx size
beq done
ldy #0
loop: sta (addr),y
iny
dex
bne loop
done:
This dual-loop approach works well when the loop operation is short (here just the STA (addr),Y). If a lot is done to each table entry, the previous approach is preferable, because the code is smaller and the code that operates on the table does not have to be duplicated.
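The dual-loop routine can be checked the same way. This Python sketch models both loops and records the addresses written. It is only a model of the listing above, and dual_loop_clear and the sample address are names chosen for this example.

```python
def dual_loop_clear(addr, size):
    """Model the pages loop followed by the final partial-page loop."""
    touched = []
    y = 0                                          # ldy #0
    x = size >> 8                                  # ldx size+1 (full page count)
    if x != 0:                                     # beq final
        while True:
            touched.append((addr + y) & 0xFFFF)    # sta (addr),y
            y = (y + 1) & 0xFF                     # iny
            if y != 0:                             # bne pages
                continue
            addr = (addr + 0x100) & 0xFFFF         # inc addr+1
            x -= 1                                 # dex
            if x == 0:                             # bne pages falls through
                break
    x = size & 0xFF                                # final: ldx size (bytes in last page)
    if x != 0:                                     # beq done
        y = 0                                      # ldy #0
        while True:
            touched.append((addr + y) & 0xFFFF)    # sta (addr),y
            y += 1                                 # iny
            x -= 1                                 # dex
            if x == 0:                             # bne loop falls through
                break
    return touched

print(len(dual_loop_clear(0x0300, 0x103)))         # 259
```

Note that a size of 0 produces an empty list here, matching the comment in the listing, whereas the single-loop version treats 0 as the full 64K.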