Modular Monitor: Interactive Debugging With The Non Maskable Interrupt

Convincing The 6502 To Debug Itself At The Push Of A Button

When you’re doing low-level debugging, there’s often a need to “look inside” the computer. Is this routine setting the wrong processor flags? Did that register load correctly? Has the stack overflowed? Your main debugging options are simulators or in-circuit emulators (ICE), neither of which is necessarily useful (or cheap).

Instead, we must turn to using the system to debug itself. Modular Monitor has the facilities to read arbitrary memory locations. I can use those features to type test programs and verify results afterwards. Very handy, and it has helped me track down tricky bugs in Modular Monitor itself. Yet there isn’t necessarily a good way of debugging a running program. If your program is stuck in a loop, it’d be real handy to know where it’s stuck. You can’t do that with the basic editing routines. Nor can you check the stack, flags, or values in particular registers.

The solution is to wire the 6502’s NMI signal to a “panic button”. Upon activation, the currently running program is forcefully paused. All CPU state is preserved, printed, and available for editing. When you’re done tweaking the system, you just hit continue to return to your regularly scheduled program.

Sounds easy, but it never is easy, now is it? Adding the “debugger” button requires a fairly detailed knowledge of how the 6502 handles interrupts. In particular, you need to know how the stack works, and how to build atomic semaphores from the 6502’s limited instruction set. It’s not too difficult, but it requires attention to detail.

Part of the Modular Monitor Series:

Modular Monitor: Serial Terminals, Hardware Bugs, and You
Modular Monitor: a Flexible ROM Monitor for the 6502
Modular Monitor: Dynamic Dispatch and Code Cleanup
Modular Monitor: Interactive Debugging With The Non Maskable Interrupt
Modular Monitor: Porting To The Single Board Development System
Modular Monitor: Operating System Call Table

Interrupts on The 6502

Debugging this way makes heavy use of the 6502’s interrupts, so it’s a good idea to explain the details before diving in. There are a few traps laid for the unwary.

Interrupts come in three flavours on the 6502: IRQ, NMI, and BRK. All of them have some common behavior:

Push PCH to the stack
Push PCL to the stack
Push P to the stack
Jump to the interrupt handler

RTI does the reverse:

Pull P off the stack
Pull PCL off the stack
Pull PCH off the stack
Resume execution

Regardless of which interrupt was triggered, they all do these steps. RTI is universal, and works with all interrupts.

Note that the 6502 stack lives in page $01 and grows downward. Pushing decrements SP; pulling increments SP. This is critical knowledge if you want to access parameters on the stack. SP always points to the next free location, one below the most recently pushed byte: a push writes to $0100+SP and then decrements SP (post-decrement), while a pull increments SP and then reads (pre-increment).
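To make the push/pull ordering concrete, here is a minimal Python model of page $01 and of the interrupt entry and RTI sequences listed above. It is only an illustration: the names `Stack6502`, `interrupt_entry`, and `rti` are hypothetical, the P value is arbitrary, and the model ignores the I flag that real hardware sets on interrupt entry.

```python
class Stack6502:
    """Model of the 6502 stack page: push writes then decrements SP, pull increments then reads."""

    def __init__(self, sp=0xFF):
        self.mem = bytearray(256)   # page $01 only; index = low byte of the address
        self.sp = sp

    def push(self, value):
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF   # SP wraps within page $01

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[self.sp]


def interrupt_entry(stack, pc, p):
    """Push PCH, PCL, then P, as every 6502 interrupt does (I flag not modeled)."""
    stack.push(pc >> 8)
    stack.push(pc & 0xFF)
    stack.push(p)


def rti(stack):
    """Pull P, PCL, then PCH, and return (pc, p)."""
    p = stack.pull()
    pcl = stack.pull()
    pch = stack.pull()
    return (pch << 8) | pcl, p


s = Stack6502()
interrupt_entry(s, 0xE047, 0x24)
print(hex(s.sp))                                             # 0xfc: next free slot
print(hex(s.mem[0xFF]), hex(s.mem[0xFE]), hex(s.mem[0xFD]))  # 0xe0 0x47 0x24 (PCH, PCL, P)
pc, p = rti(s)
print(hex(pc), hex(p), hex(s.sp))                            # 0xe047 0x24 0xff
```

Note how SP ends up back at $FF after RTI: the three pulls exactly undo the three pushes.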

Diagram of 6502 stack responding to interrupts and the RTI instruction.

Interrupts occur “between” instructions. You cannot stop an instruction from finishing by using an interrupt. This is important because it implies the 6502 will not safely recover from, say, a memory fault. The 65816 does have an interrupt (ABORT) that does this; I will not be covering that here.

Note that none of the interrupts save A, X, or Y. If you want to preserve them, you must do that manually. You will almost certainly have to, since the 6502 is a strict accumulator architecture where damn near every instruction clobbers A.

IRQ is simple enough that I can gloss over the basics of interrupt handling. I trust you either know that already or know how to find that information yourself. BRK has its own troubles, but for the time being, I have decided not to go down the BRK rabbit hole. NMI is critical for this module, and has the highest potential for chaos. Let’s talk about why.

NMI, And Reentrance Safety

Normally when you trigger an interrupt, some kind of automatic masking is performed. CPUs vary, but they typically mask out at least the current interrupt. This ensures that interrupts do not interrupt themselves. Interrupts that interrupt themselves usually get into bad trouble. Ergo, the mandatory automatic masking.

NMI is literally named after the fact that it isn’t maskable: the I flag can’t block it, and the CPU doesn’t automatically mask it on entry. A valid NMI will always trigger, preempting both IRQ/BRK and NMI itself. If you aren’t careful, you can get into a lot of trouble here: all those assumptions you might have had about not being overruled in an interrupt are no longer valid.

Well, technically you can use an external circuit to mask the non-maskable.

I’ll assume your system doesn’t do that, because most don’t.

Prevention is clearly the best option. NMI is edge triggered, so sharing the line is difficult. That means NMI is typically only connected to one device. As long as that device is well-behaved, NMI should pose little trouble. If IRQ/BRK are being used alongside NMI, they should use their own unique variables, etc. NMI will interrupt IRQ/BRK, which can be used to implement two level interrupt priority.

When NMI is driven by a less well-behaved device (e.g. a DEBUG pushbutton) that can cause multiple NMIs in a short period, you need to take greater care. The best option is to use a semaphore to track NMI frames, and ensure only one proceeds.

Semaphores on The 6502

All 6502s have atomic read-modify-write instructions that can change the value of a memory location without being interrupted. That holds as long as we can guarantee the CPU holds the bus throughout the entire instruction, of course; some 6502s (e.g. the W65C02S) provide a Memory Lock (ML) output so other bus masters can be kept off the bus mid-instruction. In a single-CPU system, we only need to worry about interrupts.

INC/DEC execute a read-modify-write addition which can implement “fetch and add” type semaphores. Unfortunately they are not true “fetch and add” instructions since they do not return the original value that was modified. Both the N and Z flags are updated, so we can sneak a bit or two out. That’s all we need for a single lock.

“Atomic” in this sense is closely in line with the original meaning of the word: indivisible.

Atomic instructions cannot be interrupted part way. They are guaranteed to run to completion, or “fail safe” in a way that doesn’t change the overall system state.

;INC/DEC based semaphore for NMI
;-1 means double blocked, 0 means blocked, +1 means clear
NMI:
    DEC SEM_NMI  ;Set semaphore
    BMI EXIT     ;If semaphore is now -1 we tried to grab it twice
    ;NMI stuff here
EXIT:
    INC SEM_NMI ;If the semaphore was 0, it goes back to +1. If it was -1, it goes back to 0.
    RTI
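Here is the same INC/DEC semaphore modeled in Python, assuming an 8-bit memory cell where BMI tests bit 7 of the DEC result. The class and method names are hypothetical; the point is to walk through a bounced NMI arriving mid-handler, bailing out, and leaving the semaphore in the right state.

```python
def signed8(value):
    """Interpret an 8-bit value as two's complement, the way BMI/BPL see it."""
    return value - 0x100 if value & 0x80 else value


class IncDecSemaphore:
    """Hypothetical model of SEM_NMI: +1 clear, 0 held, -1 ($FF) double blocked."""

    def __init__(self, initial=1):
        self.cell = initial & 0xFF

    def try_claim(self):
        """DEC SEM_NMI / BMI EXIT: True means the handler may proceed."""
        self.cell = (self.cell - 1) & 0xFF
        return not (self.cell & 0x80)   # N flag clear -> we now hold it

    def release(self):
        """INC SEM_NMI, executed on both the success path and the bail-out path."""
        self.cell = (self.cell + 1) & 0xFF


sem = IncDecSemaphore()
outer = sem.try_claim()            # first NMI: 1 -> 0
print(outer, signed8(sem.cell))    # True 0
inner = sem.try_claim()            # bounced NMI arrives mid-handler: 0 -> -1
print(inner, signed8(sem.cell))    # False -1
sem.release()                      # inner NMI bails out: -1 -> 0 (still held)
sem.release()                      # outer handler finishes: 0 -> 1 (clear)
print(signed8(sem.cell))           # 1
```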

An alternative is the “test and set/reset” instructions, TSB and TRB. These behave similarly, only they use a bitwise AND instead of addition/subtraction. They set the Z flag based on A AND the memory byte, then set (TSB) or clear (TRB) the bits given in A, all in one uninterruptible instruction. Z=1 means none of the mask bits were set in the byte before it was updated; Z=0 means at least one of them already was. Multiple bits can be tested simultaneously this way.

Only some 6502s have these bit testing instructions (e.g. the W65C02), so it is unfortunately not a universal solution. I’m also not entirely convinced it’s better than the simpler “fetch-and-add” semaphore: more instructions are needed to get the same result. On the other hand, you can group multiple semaphores in one byte, so that’s something.

;TSB/TRB based semaphore for NMI. Not all 6502s support these instructions!
NMI:
    PHA         ;A gets clobbered no matter what you do
    LDA #MASK   ;Load up the appropriate bitmask for this semaphore
    TSB SEM_NMI ;Atomically test the semaphore bit(s), then set them
    BNE EXIT    ;Z=0 means the bit was already set (semaphore held), Z=1 means we just claimed it
;NMI stuff here
    STZ SEM_NMI ;Clear the semaphore. STZ clears every bit in the byte and doesn't need A. TRB SEM_NMI (with the mask reloaded into A) could clear just this semaphore's bit
EXIT:
    PLA
    RTI
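A small Python model of TSB/TRB may make the Z flag easier to keep straight. Both instructions compute Z from A AND the *original* memory byte, then modify memory. The function names and the MASK value are hypothetical.

```python
def tsb(a, m):
    """Test and Set Bits: Z from (A AND M), then M = M OR A. Returns (new_m, z)."""
    z = (a & m) == 0
    return (m | a) & 0xFF, z


def trb(a, m):
    """Test and Reset Bits: Z from (A AND M), then M = M AND NOT A. Returns (new_m, z)."""
    z = (a & m) == 0
    return m & ~a & 0xFF, z


MASK = 0b00000001   # hypothetical bit assigned to the NMI semaphore
sem = 0

sem, z = tsb(MASK, sem)
print(z, bin(sem))   # True 0b1  -> bit was clear, we claimed it (BNE not taken)
sem, z = tsb(MASK, sem)
print(z, bin(sem))   # False 0b1 -> bit already set, so BNE EXIT is taken
sem, z = trb(MASK, sem)
print(z, bin(sem))   # False 0b0 -> only this semaphore's bit is released
```

Because only the bits in A are touched, other semaphores packed into the same byte are left alone.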

Note that I haven’t covered the possibility of double-clearing the INC/DEC semaphore. Usually this isn’t an issue. Releasing a semaphore does not have to be atomic, so adding a test is permissible. Besides, binary semaphores should be claimed by one, and only one, process. TSB/TRB doesn’t suffer from this problem.

There is a possibility of the INC/DEC based semaphore overflowing if one half of the routine is missed or duplicated. I don’t know if there’s an easy way to detect that; once again prevention is the easiest solution.

Using only a small tweak, the INC/DEC binary semaphore can be used as a counting semaphore. Simply initialize the semaphore with a value greater than 1. Now you can have multiple processes claim the semaphore simultaneously. Useful for tracking things like queues where multiple reader/writer processes are permissible. Up to 128 “slots” are available; more than most systems will likely need.
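A quick Python model of the counting variant, assuming the same DEC/BMI check with the bail-out path INCing its DEC back out. It also shows where the 128 comes from: an initial value of $80 (which reads as -128 when treated as signed) decrements to $7F, which is positive, so claims keep succeeding all the way down to $00. The class name is hypothetical.

```python
class CountingSemaphore:
    """Hypothetical model: INC/DEC semaphore initialized with more than one slot."""

    def __init__(self, slots):
        if not 1 <= slots <= 128:
            raise ValueError("a DEC/BMI check supports 1 to 128 slots")
        self.cell = slots & 0xFF        # 128 is stored as $80

    def try_claim(self):
        """DEC, then BMI: N set means no slot was free."""
        self.cell = (self.cell - 1) & 0xFF
        if self.cell & 0x80:
            self.cell = (self.cell + 1) & 0xFF   # bail-out path INCs it back
            return False
        return True

    def release(self):
        """INC when a claimer is done with its slot."""
        self.cell = (self.cell + 1) & 0xFF


sem = CountingSemaphore(128)
claimed = 0
while sem.try_claim():
    claimed += 1
print(claimed, hex(sem.cell))   # 128 0x0
```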

I’m pretty sure the 6502 has no “grace period” after an interrupt, so only the first instruction in the interrupt routine is guaranteed to execute (perhaps not even that). This complicates handling NMI in particular since it’s possible (if highly unlikely) to get multiple NMIs end-to-end. Both types of semaphore have the potential to get into an inconsistent state because of this. As is typical of engineering, we can tolerate an unlikely failure state so long as it fails in a way that is acceptable. Considering this is little more than a vanity project with little depending on it, I can accept the system occasionally crashing due to NMI troubles.

NMI Debugger Modules

With all that heavy background out of the way, we can proceed to the actual modules. There are three of them: REGSAVE, REGDUMP, and CONTINUE. The NMI handler requires some additions too. Oh, don’t forget about the debugging button itself!

All in all, it’s a few hours of work for a very useful addition to Modular Monitor. Well worth the effort.

The Debug Button

This module hinges on the idea that there is a literal, physical button connected to the NMI line. Buttons suffer from bounce, which is troublesome at the best of times. When connected to an edge-sensitive line in particular, bounce can cause all kinds of havoc.
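To see why bounce matters on an edge-triggered input, here is a tiny Python sketch that counts falling edges, since the 6502's /NMI input triggers on a high-to-low transition. The sample sequences are made-up illustrations, not measurements from my board.

```python
def count_nmi_edges(samples):
    """Count high-to-low transitions in a sampled /NMI line (1 = high, 0 = low)."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 1 and cur == 0)


bouncy = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1]      # hypothetical bouncing press
debounced = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # same press, cleaned up

print(count_nmi_edges(bouncy), count_nmi_edges(debounced))   # 3 1
```

One press, three NMIs: exactly the reentrancy scenario the semaphore has to survive.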

I have a couple of spare ‘HC132 NAND gates on the minimal development board, so I quickly put together a debouncer:

Schematic of a very basic debouncer for the NMI debugging button, using a 74HC132 Schmitt trigger NAND gate.

Because this is a software-focused article, I’m going to leave things there. For more information on the art of debouncing, check this article out. Best one I’ve found by far.

NMI Handler

Previously the NMI handler merely pointed directly to an RTI instruction. Incidentally, you should always do this with unused interrupts. All it takes is a little electrical noise to give you some mysterious crashes. A single RTI is all it takes to fix them.

Obviously we now need to exploit NMI, so a new stub is written to implement the semaphore I went into so much detail describing. I chose the binary semaphore using INC/DEC because it preserves A, which simplifies the REGSAVE/CONTINUE routines. Not by a lot, mind you, but an easy optimization is always worth taking.

;NMI handler with reentrancy guard semaphore
NMI:
    DEC SEM_NMI  ;Try to set the semaphore
    BMI NMI_EXIT ;-1 means it was 0 before, so it's already held
    JMP REGSAVE  ;Enter the debug routine (REGSAVE falls through to REGDUMP)
NMI_EXIT:        ;Absolute label so we can also return from CONTINUE
    INC SEM_NMI  ;Clear the semaphore.
    RTI

While NMI_EXIT could theoretically go anywhere, from a maintenance perspective it should be very close to the actual NMI handler. Assembly programming is hard, particularly for the 6502, so even minor conveniences help a lot.

REGSAVE

Having confirmed we’re in an NMI state, we proceed directly to REGSAVE. Do not pass go, do not collect $0200. REGSAVE doesn’t actually do much, but what it does is critically important. This is where we store the processor state for later retrieval. I chose to stash the registers in dedicated memory locations, starting at $02C0. This address is not a guarantee, merely the result of my current setup.

Saving A/X/Y is trivial. Visible registers can be directly copied to their backup location. Getting the hidden registers is a bit trickier. You can’t just arbitrarily send them off to the memory location of your choosing; you have to take them off the stack first.

P/PCL/PCH are retrieved with PLA, which pulls bytes off the stack in the reverse of the order the interrupt pushed them. After the three pulls, SP is back to what it was just before the interrupt triggered. Copy SP into X with TSX, and put it away.

;Save state after an NMI or BRK
REGSAVE:
    STA REGBACK    ;A goes first, in lowest slot...
    STX REGBACK+1  ;...followed by X...
    STY REGBACK+2  ;...then Y.
    PLA            ;Top of stack is P, the last byte the interrupt pushed
    STA REGBACK+3  ;So we store P...
    PLA            
    STA REGBACK+4  ;...then PCL...
    PLA
    STA REGBACK+5  ;...then PCH.
    TSX            ;At this point, SP points to the stack before we interrupted
    STX REGBACK+6  ;So we can save that information for later

;No JMP MAIN because this needs to fall through to REGDUMP

All six registers have now been dumped to memory, even the ones you can’t normally get access to, in a known location that shouldn’t cause trouble.
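The stack bookkeeping in REGSAVE can be checked with a small Python model. This is a sketch, not the real routine: `Stack6502`, `nmi_entry`, and `regsave` are hypothetical names, the register values are arbitrary, and the returned list mirrors the REGBACK layout in the listing (A, X, Y, P, PCL, PCH, SP).

```python
class Stack6502:
    """Page $01 model: push writes then decrements SP, pull increments then reads."""

    def __init__(self, sp):
        self.mem = bytearray(256)
        self.sp = sp

    def push(self, value):
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[self.sp]


def nmi_entry(stack, pc, p):
    """What the CPU does before the handler runs: push PCH, PCL, P."""
    stack.push(pc >> 8)
    stack.push(pc & 0xFF)
    stack.push(p)


def regsave(stack, a, x, y):
    """Mirror of REGSAVE: store A/X/Y, pull P, PCL, PCH, then record SP (TSX)."""
    regback = [a, x, y]
    regback.append(stack.pull())   # P
    regback.append(stack.pull())   # PCL
    regback.append(stack.pull())   # PCH
    regback.append(stack.sp)       # SP as it was before the NMI
    return regback


stack = Stack6502(sp=0xF0)                 # arbitrary SP of the interrupted program
nmi_entry(stack, pc=0x0300, p=0xA1)
regback = regsave(stack, a=0x11, x=0x22, y=0x33)
print([hex(b) for b in regback])   # ['0x11', '0x22', '0x33', '0xa1', '0x0', '0x3', '0xf0']
```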

Note that REGSAVE does not jump back to MAIN like all the other MM modules. That’s because it falls through directly into the next module: REGDUMP. REGSAVE should never get called except from an NMI. I mean, you could call it from somewhere else, but you won’t accomplish much beyond fucking up the stack.

REGDUMP

Simply having the registers saved somewhere is not enough. We want to read them, after all. Yes, you could use READ to do exactly that- but this hinges on a lot of assumptions. You would need to know exactly where the saved registers are (not strictly guaranteed), what order they are in, and you’ll have to do some hexadecimal-to-binary in your head to get the flag bits.

Yeah, I’d much rather the 6502 did all the heavy lifting too. That’s why REGDUMP prints these values to the console, labeled and in human readable form.

I did some layout drawings, which led me to conclude I could get away with displaying three short lines. The first line has the PC and SP, then A/X/Y, then P. Displaying each value isn’t hard, it just involves a lot of JSR BIN2HEX calls.

;Display the registers saved by REGSAVE
REGDUMP:
	;An opening line break is important for readability
	LDA #LF
	STA SER_DAT
	
	;Start with the PC label
	LDA #'P'
	STA SER_DAT
	LDA #'C'
	STA SER_DAT
	LDA #':'
	STA SER_DAT

	;Actually get the PC value and print it
	LDA REGBACK+5 ;PCH goes first
	JSR BIN2HEX
	STX SER_DAT
	STA SER_DAT
	
	LDA REGBACK+4 ;Then PCL
	JSR BIN2HEX  
	STX SER_DAT
	STA SER_DAT

	;Add space for readability
	LDA #' '
	STA SER_DAT

;Repeat this same basic process for SP, A, X, Y. Add spaces and linebreaks as needed.
;Continue onto the special routine to handle P values

I quickly discovered my clever string building routines didn’t work due to addressing mode troubles, so I ended up simply dumping all my data direct to the output. It won’t work forever, but I can see how to make a more general solution when I need it.
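For reference, the hex output boils down to something like the Python sketch below. It assumes the BIN2HEX convention described in the testing notes further down (ASCII high digit returned in X, low digit in A); `bin2hex` and `format_pc` are hypothetical stand-ins, not Modular Monitor routines.

```python
HEX_DIGITS = b"0123456789ABCDEF"


def bin2hex(value):
    """Return (x, a): ASCII codes for the high and low hex digits of one byte.
    Assumed convention: X holds the most significant digit, A the least."""
    return HEX_DIGITS[(value >> 4) & 0x0F], HEX_DIGITS[value & 0x0F]


def format_pc(pch, pcl):
    """Build the 'PC:' field the way REGDUMP writes it to SER_DAT."""
    out = bytearray(b"PC:")
    for byte in (pch, pcl):        # PCH goes first, then PCL
        x, a = bin2hex(byte)
        out.append(x)              # STX SER_DAT: high digit
        out.append(a)              # STA SER_DAT: low digit
    out.append(ord(" "))           # spacer for readability
    return out.decode("ascii")


print(repr(format_pc(0xE0, 0x47)))   # 'PC:E047 '
```

Swapping the two `append` calls reproduces the "backwards hex" bug from the testing section.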

P is handled in a special manner compared to the rest of the registers. Flags are binary values, so it’d be nice if they were displayed as such. Rather than use a letter-colon-number format like everything else, I chose to print the letters of each flag. Uppercase means bit set, lowercase means bit clear. A clever solution to the problem, which does not require too much work.

;Used to convert a bit in a memory location into a letter case
;Macro vastly simplifies printing P values!

.MACRO BIT2CASE addr
	.local SKIP             ;Label is strictly local to macro!
		ROL addr	;Shift left into carry
		BCS SKIP	;Bit set means do nothing
		ORA #%00100000  ;bit 5 ($20) sets ASCII lowercase
		SKIP:
.ENDMACRO
------------------------------------------------------------------------------------
;Continue from REGDUMP, after all "simple" registers have been printed.
;P is handled a little differently. We use capital letter as bit set, lower as clear
	LDA #'N'
	BIT2CASE REGBACK+3
	STA SER_DAT

	LDA #'V'
	BIT2CASE REGBACK+3
	STA SER_DAT

	ROL REGBACK+3   ;Bit 5 is unimplemented, so we just skip it

	LDA #'B'
	BIT2CASE REGBACK+3
	STA SER_DAT

	LDA #'D'
	BIT2CASE REGBACK+3
	STA SER_DAT

	LDA #'I'
	BIT2CASE REGBACK+3
	STA SER_DAT

	LDA #'Z'
	BIT2CASE REGBACK+3
	STA SER_DAT

	LDA #'C'
	BIT2CASE REGBACK+3
	STA SER_DAT

	ROL REGBACK+3  ;One last rotate to ensure stored P is aligned properly

	
	LDA #LF        ;Add line break for readability
	STA SER_DAT

	CLI            ;Re-enable interrupts in case we were called from IRQ/BRK
	JMP MAIN       ;Return to the MM main prompt

Note the use of ROL instructions. ROL rotates through the carry, so nine of them (one per displayed flag, one to skip bit 5, and the final alignment rotate) bring REGBACK+3 back to its original value, which means we don’t clobber the saved P during the display cycle. Thinking about it though, a simple BIT would have worked just as well, with fewer things to go wrong. The main issue, of course, is where to store partial results, since BIT would require A. This technique saves a byte, which may or may not be worth it.
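Here is a Python model of the flag display, assuming the BIT2CASE macro and the rotate sequence in the listing. `rol`, `bit2case`, and `flag_string` are hypothetical names. It prints each flag with its case showing the bit, and checks that after nine rotates both the saved P byte and the carry are back where they started.

```python
def rol(value, carry):
    """8-bit ROL through carry: returns (new_value, new_carry)."""
    new_carry = (value >> 7) & 1
    return ((value << 1) | carry) & 0xFF, new_carry


def bit2case(letter, value, carry):
    """BIT2CASE model: rotate the top bit into carry, lowercase the letter if it was 0."""
    value, carry = rol(value, carry)
    code = ord(letter)
    if not carry:
        code |= 0x20                 # ORA #%00100000: bit 5 selects lowercase
    return chr(code), value, carry


def flag_string(p, carry=0):
    """Print order N V (skip bit 5) B D I Z C, then one last ROL to realign."""
    out = ""
    for letter in "NV":
        ch, p, carry = bit2case(letter, p, carry)
        out += ch
    p, carry = rol(p, carry)         # bit 5 is unused, just rotate past it
    for letter in "BDIZC":
        ch, p, carry = bit2case(letter, p, carry)
        out += ch
    p, carry = rol(p, carry)         # ninth rotate: 9-bit rotation is back to the start
    return out, p, carry


text, p, carry = flag_string(0x24)
print(text, hex(p), carry)   # nvbdIzc 0x24 0
```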

REGDUMP is a massive module, partly because I had to hack it together using inefficient code, but partly because it just has to do a lot of stuff. Even using some macros doesn’t help clean it up much. It’s a prime target for a future clean-up pass.

CONTINUE

The entire point of this exercise is to allow the user to break into a running program, edit/analyze it, then return to said program. Breaking into the program was relatively easy. Continuing it is almost as easy.

CONTINUE is very simple: it is literally REGSAVE in reverse. Just restore the registers as they were, then jump directly to NMI_EXIT. The INC there doesn’t touch A, X, or Y, and the flags it changes are overwritten when RTI pulls P, so I don’t even need to account for updating the semaphore!

;Resume execution of an interrupted program
CONTINUE:
    LDA SEM_NMI   ;Check that the semaphore is held (0) by an NMI frame
    BEQ @SKIP     ;Any nonzero value means trouble
    LDA #<NONMI   ;Load up the "can't continue" error message...
    LDX #>NONMI
    JSR SENDSTR   ;...print it...
    JMP MAIN      ;...and return to MAIN
@SKIP:            ;This is where the magic happens
    SEI           ;Just to make absolutely sure an IRQ doesn't break something. NMI should be blocked by the semaphore
    LDX REGBACK+6 ;Restore SP first, because we may have touched the stack during our journey
    TXS
    LDA REGBACK+5 ;Push PCH...
    PHA
    LDA REGBACK+4 ;...PCL...
    PHA
    LDA REGBACK+3 ;...ending with P
    PHA
    LDY REGBACK+2 ;Restore Y...
    LDX REGBACK+1 ;...X...
    LDA REGBACK   ;...and A
    JMP NMI_EXIT  ;Exit the NMI frame

Note the semaphore must be zero for CONTINUE to proceed. Remember, +1 means the semaphore is clear, while -1 means it’s been double blocked. Only zero indicates it’s safe to proceed.
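CONTINUE's bookkeeping can be modeled the same way. This Python sketch assumes the REGBACK layout from REGSAVE and the rule above that only a semaphore value of zero allows continuing. `continue_` and `Stack6502` are hypothetical names, and the monitor's SP of $80 is an arbitrary stand-in for whatever MM left on the stack.

```python
class Stack6502:
    """Page $01 model: push writes then decrements SP, pull increments then reads."""

    def __init__(self, sp):
        self.mem = bytearray(256)
        self.sp = sp

    def push(self, value):
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[self.sp]


def continue_(stack, regback, sem):
    """Model of CONTINUE plus NMI_EXIT. Returns (pc, p, a, x, y, sem), or None if refused."""
    if sem != 0:                  # only a held (zero) semaphore means there is a frame to resume
        return None
    a, x, y, p, pcl, pch, sp = regback
    stack.sp = sp                 # LDX REGBACK+6 / TXS
    stack.push(pch)               # rebuild the interrupt frame: PCH, PCL, P
    stack.push(pcl)
    stack.push(p)
    sem = (sem + 1) & 0xFF        # NMI_EXIT: INC SEM_NMI releases the semaphore
    p = stack.pull()              # RTI pulls P...
    pcl = stack.pull()            # ...then PCL...
    pch = stack.pull()            # ...then PCH
    return (pch << 8) | pcl, p, a, x, y, sem


stack = Stack6502(sp=0x80)
regback = [0x11, 0x22, 0x33, 0xA1, 0x00, 0x03, 0xF0]   # A, X, Y, P, PCL, PCH, SP
pc, p, a, x, y, sem = continue_(stack, regback, sem=0)
print(hex(pc), hex(p), hex(stack.sp), sem)            # 0x300 0xa1 0xf0 1
print(continue_(Stack6502(sp=0x80), regback, sem=1))  # None: nothing to continue
```

The program resumes with exactly the SP it had when the NMI hit, regardless of how much stack the monitor used in the meantime.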

Testing the NMI Debugging Modules

This is by far the most complex 6502 software I have yet written. Lots to go wrong here. Doesn’t help that I rewrote lots of the underlying code to better exploit ca65’s various features. In a very out-of-character (yet still not unheard of) move, I did not do any testing along the way. EEPROMs can only be erased so many times after all.

You can imagine my surprise when things worked pretty well! That never seems to happen. Well, okay there were still some serious issues. Just not “everything is completely broken” kind of issues.

At first, I was having trouble printing the registers. Eventually I realized I had forgotten some ‘#’ operators, which meant I was accidentally loading a zero page location, not an immediate character. I also forgot to specify an address for one of the ROL instructions, causing an off-by-one error in the flag readout.

Having solved the REGDUMP bugs, I moved onto why the NMI semaphore just didn’t work. Like, at all. While investigating the previous problem, I quickly ran across the issue: I was initializing the semaphore before clearing the zero page. Which would also clear the semaphore. So it would never actually allow NMI to proceed! Egg meet face. I also missed another ‘#’ in CONTINUE which would explain a few things.

Next, my hexadecimal output was backwards. I had simply forgotten the return order of BIN2HEX. This ended up requiring a total re-write due to addressing mode problems. I had to scrap my complex string building system for a hard-coded output to SER_DAT. You can’t store the index registers using absolute indexed, indirect, or indirect indexed addressing. Because BIN2HEX puts the least significant digit in A, I would have to use a temp variable to hold it while I stored the most significant digit. Too much faff for an already long module.

Demonstrating the Debugging Modules

This article has almost exclusively been text-based, so let’s get some images in to demonstrate how this system works.

First, hitting NMI will break into any program, including Modular Monitor itself. MM runs out of a very specific address range, so I can confirm the data in this screenshot is accurate:

Demonstration of NMI debugger displaying register state

$E047 is the address where Modular Monitor’s main loop begins. SP is still initialized at $FF, A is idling on the NEW_LIN flag, X holds half a string pointer, and Y is a string index. Everything is exactly as it should be.

By hitting ‘D’ on the MM prompt, you can display the stored registers at any time. Useful for when you want to edit them manually. There are no specific instructions for that; you have to WRITE to their location. I might add a specific interface for this down the line.

‘C’ runs CONTINUE, which returns to the previously running program. I feel this requires a more complex setup to demonstrate properly:

Demonstration of NMI debugger interrupting running program, then continuing running program

$80 $FE is 65C02 for “branch to self”: $80 is BRA (not available on the original NMOS 6502) and $FE is an offset of -2. A tight, infinite loop that will never end, not unlike a misbehaving program. Hitting the debug button shows that the registers are broadly as expected. Continuing returns to the loop, with no interruption to the underlying program.
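Relative branch offsets are measured from the address just after the 2-byte branch instruction, which is why $FE (-2) lands back on the branch itself. A minimal Python sketch of that arithmetic; `branch_target` is a hypothetical helper and $0300 an arbitrary address.

```python
def branch_target(pc, offset):
    """Target of a 2-byte relative branch at address pc; offset is the raw operand byte."""
    signed = offset - 0x100 if offset & 0x80 else offset
    return (pc + 2 + signed) & 0xFFFF


print(hex(branch_target(0x0300, 0xFE)))   # 0x300: the branch targets itself
```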

Finishing Up

Being able to modify a program while it’s running is a tried and true debugging method. Dangerous, primitive, and crude, but it’s better than nothing at all. It’s particularly useful for checking programs that get stuck in loops. Unless you have a detailed simulator, there is no other way to get access to the flags. I know I’ve already had trouble keeping the various branches, flags, and the instructions that set them untangled.

I always try to plan projects in a way that lets me test something in a low-stakes situation first. Semaphores are crucial to a lot of advanced programming, so understanding how they can be implemented on the 6502 is a big step forward. Saving the state of a program, in a way that allows returning to it later, is the fundamental core of a multitasking operating system. What I’m building towards should be pretty obvious.

Occasionally CONTINUE seems to print the “Nothing to continue!” warning when it shouldn’t. I assume this is an artifact of Modular Monitor breaking into itself, since it doesn’t seem to happen when breaking into another program. The NMI semaphore is properly set, and properly cleared, so I’m willing to live with it.

Originally I was going to add support for using the BRK software interrupt to trigger the debugging routines. That’s why BRK is called out, despite not being used here. While not terribly complicated to add, this article is already quite long. Better to put BRK in its own article where I can explain its… let’s call them “quirks”.

Working around the 6502’s awkward instruction set remains the most significant challenge. A recurring problem is that A is the only register that can use all the fancy-ass addressing modes. This becomes a problem when you’re trying to do things like build a string using arguments passed in A and X. I now have the unenviable task of rewriting all my little helper modules to put the first argument in A so I can then store X without backing up A first. Assuming that it’s even possible to re-write them like that.

Speaking of building strings, REGDUMP would be a lot smaller and simpler if I had used a generic string preloaded with placeholder values. This would be copied to a buffer, then overwritten with the calculated values. That requires a lot of extra work to get going, which is why I opted for hard-coding the constants. It would be less work in the long run, so I’m going to have to switch over eventually.
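A rough sketch of that template approach, written in Python rather than 6502 assembly, so treat it as an outline of the idea and not as the planned implementation. The template layout, the `SLOTS` offsets, and `regdump` are all hypothetical; the REGBACK order is the one REGSAVE uses. The flags line is left out for brevity.

```python
HEX_DIGITS = b"0123456789ABCDEF"
TEMPLATE = b"PC:???? SP:??\nA:?? X:?? Y:??\n"   # hypothetical layout with placeholders

# (offset into TEMPLATE, index into REGBACK) for every byte to fill in.
# REGBACK order follows REGSAVE: A, X, Y, P, PCL, PCH, SP.
SLOTS = [(3, 5), (5, 4), (11, 6), (16, 0), (21, 1), (26, 2)]


def regdump(regback):
    """Copy the template to a buffer, then overwrite each placeholder pair with hex digits."""
    buf = bytearray(TEMPLATE)
    for offset, index in SLOTS:
        value = regback[index]
        buf[offset] = HEX_DIGITS[value >> 4]         # most significant digit
        buf[offset + 1] = HEX_DIGITS[value & 0x0F]   # least significant digit
    return buf.decode("ascii")


print(regdump([0x11, 0x22, 0x33, 0xA1, 0x47, 0xE0, 0xFF]), end="")
# PC:E047 SP:FF
# A:11 X:22 Y:33
```

In assembly, the table of offsets plays the same role: one short loop replaces the long run of hard-coded label writes.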

For reasons I cannot adequately explain, my serial connection hangs occasionally. I cannot really confirm what is causing this problem. It usually goes away after I try to type something. I suspect it’s the UM245R going to sleep, or perhaps Tera Term doing the same. Nothing else I’ve plugged into Tera Term does that though, so I’m currently leaning towards the UM245R. Using it was a mistake. I have a better development board design that should hopefully be ready soon. PRE-PUBLISH UPDATE: Completely uninstalling FTDI’s “virtual com port” driver fixed this. The basic Windows driver is strictly superior to the dedicated one. Your guess is as good as mine.

August is here, heralding the end of summer. As expected, many of the projects I was hoping to complete were only partially finished. That’s always the way it goes, unfortunately. 2023 being a particularly hot, smoky summer didn’t help.

I still plan to try getting Scribble functionally complete by the end of August, but certain issues with its mechanics mean I won’t be finished for at least another month. In the meantime, I’ve found the 6502 stuff to be oddly relaxing. I have plenty more to publish over the next month or two.

Some Disassembly Required