There are many ROMs available that test an emulator for inaccuracies.
A substantial archive of test ROMs is available at https://github.com/christopherpow/nes-test-roms
- NEStress partially tests PPU, CPU, and controller operation (old; some tests seem to always fail).
- Blargg's test ROMs partially test APU, misc PPU behavior, sprite 0 hit, and MMC3 operation. Refer to PPU frame timing for new information that the PPU ROMs test.
- nestest fairly thoroughly tests CPU operation. This is the best test to start with when getting a CPU emulator working for the first time. Start execution at $C000 and compare execution with a log from Nintendulator, whose CPU works (apart from some details of the power-up state).
- instr_test tests official and unofficial CPU instructions and lists which ones failed. It will work even if the emulator has no PPU and only supports NROM, because it writes a copy of its output to $6000 (see readme). It tests instructions more thoroughly than nestest, but it can't help you figure out what's wrong beyond which instruction(s) are failing, so it's better suited to testing mature CPU emulators.
- instr_misc tests some miscellaneous aspects of instructions, including behavior when 16-bit address wraps around, and dummy reads.
- instr_timing tests timing of all instructions, including unofficial ones, page-crossing, etc.
- cpu_interrupts_v2 tests the behavior and timing of CPU in the presence of interrupts, both IRQ and NMI; see CPU interrupts.
- cpu_reset tests the CPU registers just after power-up and how they change during reset, and checks that RAM isn't changed during reset.
- Sprite 0 Hit test ROMs.
- Misc PPU Tests.
- ppu_vbl_nmi tests the behavior and timing of the NTSC PPU's VBL flag, NMI enable, and NMI interrupt. Timing is tested to an accuracy of one PPU clock.
- PPU sprite overflow flag timing tests ($2002 bit 5), covering general operation, timing, and obscure pathological behavior.
- tvpassfail: NTSC color and NTSC/PAL pixel aspect ratio test ROM.
- apu_test tests many aspects of the APU that are visible to the CPU. Really obscure things are not tested here.
- apu_mixer verifies proper operation of the APU's sound channel mixer, including relative volumes of channels and non-linear mixing. Recordings of the tests running on an NES are available for comparison, though the tests are designed so that you don't really need them.
- apu_reset tests the initial APU state at power-up and the effect of reset.
- volume_tests plays tones on all the APU's channels to show their relative volumes at various settings of $4011. Package includes a recording from an NES's audio output for comparison.
- apu_sweep tests the sweep unit's add, subtract, overflow cutoff, and minimum period behaviors.
- mmc3_test tests the MMC3 scanline counter and IRQ generation, not much else currently.
- BNTest tests how many PRG banks are reachable in BxROM and AxROM.
- test28 tests the Action 53 mapper exhaustively.
- Holy Diver Batman by Tepples detects over a dozen mappers and verifies that all PRG ROM and CHR ROM banks are reachable, that PRG RAM and CHR RAM can be written and read back without error, and that nametable mirroring, IRQ, and WRAM protection work.
- FME-7 IRQ acknowledge test by Tepples checks some IRQ acknowledgment behaviors of the Sunsoft FME-7 that emulators were getting wrong in 2015.
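
The log comparison suggested for nestest can be scripted so that the first mismatching instruction is reported automatically. The following is a minimal sketch, not a definitive tool. It assumes each trace line starts with a four-digit hexadecimal program counter and carries register fields of the form `A:00 X:00 Y:00 P:24 SP:FD`; adjust the parsing if your reference log is laid out differently. The function names `cpu_state` and `first_divergence` are hypothetical.

```python
import re
from itertools import zip_longest

# Assumed trace-line layout: a 4-digit hex PC at the start of the line and
# register fields such as "A:00 X:00 Y:00 P:24 SP:FD" somewhere after it.
REGISTER_FIELD = re.compile(r"\b(A|X|Y|P|SP):([0-9A-Fa-f]{2})")


def cpu_state(line):
    """Extract (PC, registers) from one trace line, ignoring all other columns."""
    pc = line[:4].upper()
    regs = {name: value.upper() for name, value in REGISTER_FIELD.findall(line)}
    return pc, regs


def first_divergence(reference_lines, emulator_lines):
    """Return (line_number, reference, emulator) at the first mismatch, or None.

    If one log ends before the other, the missing side is reported as None.
    """
    pairs = zip_longest(reference_lines, emulator_lines)
    for number, (ref, emu) in enumerate(pairs, start=1):
        if ref is None or emu is None:
            return number, ref, emu
        if cpu_state(ref) != cpu_state(emu):
            return number, ref.rstrip(), emu.rstrip()
    return None
```

Only the program counter and registers are compared here, so disassembly formatting and cycle counters do not cause false mismatches. Once instruction timing is implemented, the comparison could be extended to cover the cycle column as well.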
It's best if your emulator can automatically run a suite of tests at the press of a button. This allows you to re-run them every time you make a change, without any effort. Automation can be difficult, because the emulator must be able to determine success/failure without your help.
The first part of automated testing is support for a "movie" or "demo", that is, a list of which buttons were pressed when. An emulator makes a movie by recording button presses while the user is playing, and then plays the movie by feeding the recorded presses back through the input system. This not only helps automated testing but also makes your emulator attractive to speedrunners.
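
As an illustration, a movie can be as simple as one button bitmask per frame. The sketch below assumes a single controller and a made-up text format of one hexadecimal byte per line; the `Movie` class, its methods, and the file format are hypothetical and do not correspond to any existing emulator's movie format.

```python
# Bit positions follow the order in which a standard controller reports buttons.
BUTTONS = ("A", "B", "Select", "Start", "Up", "Down", "Left", "Right")


class Movie:
    """A list of per-frame button bitmasks for one controller (hypothetical format)."""

    def __init__(self, frames=None):
        self.frames = list(frames) if frames is not None else []

    def record(self, pressed):
        """Append one frame; `pressed` is a set of button names held this frame."""
        mask = 0
        for bit, name in enumerate(BUTTONS):
            if name in pressed:
                mask |= 1 << bit
        self.frames.append(mask)

    def play(self):
        """Yield the set of pressed button names for each recorded frame."""
        for mask in self.frames:
            yield {name for bit, name in enumerate(BUTTONS) if mask & (1 << bit)}

    def dumps(self):
        """Serialize as text: one two-digit hex byte per frame."""
        return "\n".join(f"{mask:02X}" for mask in self.frames)

    @classmethod
    def loads(cls, text):
        """Parse the text produced by dumps(), skipping blank lines."""
        return cls(int(line, 16) for line in text.splitlines() if line.strip())
```

During recording, the emulator would call `record` once per frame with the buttons it read from the host input device. During playback, it would take its input from `play` instead of the device. Playback only reproduces the original session if emulation is deterministic given the same inputs.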
To create a test case, record a movie of the player activating all tests in a ROM, take a screenshot of each result screen, and log the time and a hash of each screenshot. The simplest test ROMs won't require any button presses. ROMs that test more than one thing are more likely to require them, and an actual game will require a playthrough. Then to run a test case, play the movie in fast-forward (no delay between frames) and take screenshots at the same times. If a screenshot's hash differs from that of the corresponding screenshot from when the test case was created, make a note of this difference in the log. Then you can compare the emulator's output frame-by-frame to that of the previous release of your emulator running the same test case.
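
The create-and-run procedure above can be sketched as follows. This is an illustrative outline under stated assumptions: `run_frame` is a hypothetical callable that advances your emulator by one frame with the given buttons and returns the framebuffer as bytes, `movie` is any iterable of per-frame button sets, and "time" is taken to mean the frame index. SHA-256 is used as the hash only because it is readily available; any stable hash would do.

```python
import hashlib


def frame_hash(pixels):
    """Hash one framebuffer (a bytes object) into a short comparable string."""
    return hashlib.sha256(pixels).hexdigest()


def capture(run_frame, movie, checkpoint_frames):
    """Play the movie with no delay and return {frame_index: hash} at checkpoints."""
    wanted = set(checkpoint_frames)
    hashes = {}
    for index, buttons in enumerate(movie):
        pixels = run_frame(buttons)  # hypothetical emulator hook
        if index in wanted:
            hashes[index] = frame_hash(pixels)
    return hashes


def run_test_case(run_frame, movie, expected):
    """Replay the movie and return a list of (frame, expected_hash, actual_hash)
    for every checkpoint whose screenshot hash differs from the stored one."""
    actual = capture(run_frame, movie, expected.keys())
    return [
        (frame, expected[frame], actual.get(frame))
        for frame in sorted(expected)
        if actual.get(frame) != expected[frame]
    ]
```

Creating a test case amounts to calling `capture` once with a known-good build and storing the result alongside the movie. Running it later calls `run_test_case` with a freshly reset emulator; an empty list means every checkpoint matched.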