After doing a little reading about the way the memory banks are
mapped, it looks like this code is going to grow. Separate it into
its own file.
While we're at it, make gb_mem_read() a proper function instead of a
callback. Because these functions are called so frequently, this
yields a ~10-20% performance benefit (due to LTO, which can inline a
direct call).
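
For illustration, a minimal sketch of the difference between the two call
styles. The flat memory array and every name ending in _sketch are
hypothetical; the real gb_mem_read() signature and bank mapping are not
shown here.

```c
#include <stdint.h>

/* Hypothetical flat 64 KiB memory; the real code maps banks. */
static uint8_t mem_sketch[0x10000];

/* Direct function: the callee is known at link time, so the
 * compiler (or LTO across files) is free to inline it. */
static uint8_t mem_read_sketch(uint16_t addr)
{
    return mem_sketch[addr];
}

/* Callback style: the read goes through a pointer held in the CPU
 * state, which the optimizer usually cannot see through. */
struct cpu_cb_sketch {
    uint8_t (*mem_read)(uint16_t addr);
};

static uint8_t fetch_via_callback_sketch(const struct cpu_cb_sketch *cpu,
                                         uint16_t addr)
{
    return cpu->mem_read(addr);
}

static uint8_t fetch_direct_sketch(uint16_t addr)
{
    return mem_read_sketch(addr);
}
```

Both fetch helpers return the same byte; only the call mechanism differs.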
Instead of calling cpu_cycle() and video_cycle() once per emulated
cycle, call cpu_cycle() once per emulated instruction. This should not
have any obvious effects on the emulation (as currently written),
because all of the memory reads and writes are done in the first
"cycle" of the instruction.
This patch results in a substantial performance gain (>100%, if I
recall correctly).
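
A rough sketch of a per-instruction loop, under stated assumptions: the
CPU step reports how many cycles the instruction took, and the video
side is advanced by that amount in one call. The real cpu_cycle() and
video_cycle() signatures are not shown here, so all names and fields
below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical emulator state used only for this sketch. */
struct emu_sketch {
    unsigned cycles_per_instruction; /* assumed fixed cost, for simplicity */
    uint64_t cpu_calls;              /* number of CPU steps executed */
    uint64_t video_cycles;           /* cycles handed to the video side */
};

/* Assumed: runs one whole instruction and returns its cycle count. */
static unsigned cpu_step_sketch(struct emu_sketch *emu)
{
    emu->cpu_calls++;
    return emu->cycles_per_instruction;
}

/* Assumed: advances video emulation by a number of cycles at once. */
static void video_advance_sketch(struct emu_sketch *emu, unsigned cycles)
{
    emu->video_cycles += cycles;
}

/* One CPU call per instruction instead of one per emulated cycle. */
static void run_instructions_sketch(struct emu_sketch *emu, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        unsigned cycles = cpu_step_sketch(emu);
        video_advance_sketch(emu, cycles);
    }
}
```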
Previously, gbdb had to check several signals, including the current
PC, to determine when do_run() should stop. This code was on the hot
path and unnecessarily slow.
Instead, use an undefined instruction (0xd3) as a breakpoint
instruction. When the CPU emulation encounters this instruction, it
invokes a callback implemented by gbdb. The callback can set a simple
flag, which is less expensive to query.
Finally, both the signal handler and the breakpoint callback set a
specific "paused" flag and a generic "paused" flag. Now, do_run() can
simply check the generic "paused" flag, then use the specific flags to
determine the stop reason.
This change increased performance by ~10% on Raspberry Pi.
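
A minimal sketch of the flag scheme, for illustration only. The flag
names, the callback field, and the toy CPU step (which treats every
opcode other than 0xd3 as a one-byte no-op) are assumptions, not the
actual gbdb code.

```c
#include <signal.h>
#include <stdint.h>

#define OP_BREAKPOINT_SKETCH 0xd3 /* undefined opcode reused as a breakpoint */

/* Hypothetical flags: one generic, one per stop reason. */
static volatile sig_atomic_t paused;
static volatile sig_atomic_t paused_signal;
static volatile sig_atomic_t paused_breakpoint;

/* Signal handler (e.g. for SIGINT): sets its own flag and the generic one. */
static void on_signal_sketch(int signo)
{
    (void)signo;
    paused_signal = 1;
    paused = 1;
}

/* Breakpoint callback supplied by the debugger. */
static void on_breakpoint_sketch(void)
{
    paused_breakpoint = 1;
    paused = 1;
}

struct dbg_cpu_sketch {
    const uint8_t *code;
    uint16_t pc;
    void (*breakpoint_cb)(void);
};

static void dbg_cpu_step_sketch(struct dbg_cpu_sketch *cpu)
{
    if (cpu->code[cpu->pc] == OP_BREAKPOINT_SKETCH) {
        cpu->breakpoint_cb(); /* leave pc pointing at the breakpoint */
        return;
    }
    cpu->pc++; /* toy behaviour: everything else is a 1-byte no-op */
}

enum stop_reason_sketch { STOP_SIGNAL_SKETCH, STOP_BREAKPOINT_SKETCH };

/* The hot loop tests a single flag; the reason is decoded afterwards. */
static enum stop_reason_sketch do_run_sketch(struct dbg_cpu_sketch *cpu)
{
    paused = 0;
    paused_signal = 0;
    paused_breakpoint = 0;
    while (!paused)
        dbg_cpu_step_sketch(cpu);
    return paused_breakpoint ? STOP_BREAKPOINT_SKETCH : STOP_SIGNAL_SKETCH;
}
```

In a real debugger, on_signal_sketch() would be installed with
sigaction() before entering the loop.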