Debugging the NXT startup: a binary printf()

Debugging an NXT that crashes during the bootup sequence is hard. Before the main AVR link comes up, there is no way to even get any sound. I’ve already done debugging by sound: during the early stages of NxOS a couple of years back, I would debug by playing bytes I wanted to check as morse-code-like dits and dahs, one bit at a time, over the brick’s speaker. It’s extremely basic, but it’s how I got the display driver to work.
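For the curious, here is a minimal sketch of what such a byte “printer” could look like. It is not the NxOS code: the bit order (most significant bit first), the mapping (0 is a dit, 1 is a dah), the durations, and every name in it (beep_fn, beep_byte, log_beep) are assumptions made for illustration. The speaker is passed in as a callback so the sketch also runs on a desktop machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tone callback. On a real brick this would drive the
 * speaker; here the caller supplies it so the sketch runs anywhere. */
typedef void (*beep_fn)(unsigned duration_ms, void *ctx);

/* Assumed durations for a short and a long beep. */
enum { DIT_MS = 100, DAH_MS = 300 };

/* Play one byte, most significant bit first: 0 -> dit, 1 -> dah. */
static void beep_byte(uint8_t value, beep_fn beep, void *ctx)
{
    assert(beep != NULL);
    for (int bit = 7; bit >= 0; bit--) {
        unsigned is_one = (value >> bit) & 1u;
        beep(is_one ? DAH_MS : DIT_MS, ctx);
    }
}

/* Desktop stand-in for the speaker: record each beep as '.' or '-'. */
struct beep_log {
    char   text[9];   /* 8 symbols plus the terminating NUL */
    size_t len;
};

static void log_beep(unsigned duration_ms, void *ctx)
{
    struct beep_log *log = ctx;
    if (log->len < 8) {
        log->text[log->len++] = (duration_ms == DAH_MS) ? '-' : '.';
        log->text[log->len] = '\0';
    }
}
```

With these assumptions, calling beep_byte(0xA5, log_beep, &log) on a zeroed log leaves "-.-..-.-" in log.text, which is what you would have to decode by ear on the brick.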

But debugging a crash before the sound driver is in a working state is hard. You have a large binary black box. Either it boots and the sound driver works, in which case you don’t have a problem, or it doesn’t and you only get The Beep Of Death, the sound of the coprocessor periodically blipping the speaker to say “Your OS is screwed, I’m not playing any more”.

Just now, attempting to debug one such crash, I discovered something interesting. If I initialize the sound controller and start an infinite loop of playing a tone, for some reason the pitch of the Beep Of Death changes by a few kHz for two beeps, then returns to its regular pitch.

This gives me a more basic equivalent of the morse code byte “printer”: if the tone changes, I know that the brick booted at least up to the point of my infinite loop. If it doesn’t, I know it crashed before that point. It’s an audio diagnostic LED that tells me either “I managed to initialize the kernel up until this point”, or “Nope, the crash occurs before execution gets to the bruteforce sound loop”.

Therefore, by moving the sound loop around in the init code, I should be able to zero in on the exact crash site. The initialization black box is no longer completely black. A little information leaks out. Instead of “Everything works/doesn’t work”, I now have “Everything works/doesn’t work up to the following intermediate point of my choosing”.
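Moving the loop around is really a search over the init sequence, and it can be done as a binary search rather than one step at a time. The sketch below simulates that on a desktop machine, under loud assumptions: the init code is modelled as an array of stage functions, a stage "crashes" by returning false, and all names (init_step_fn, beacon_reached, find_crash_site, step_ok, step_crash) are hypothetical. On the real brick each probe is a rebuild and reflash, and "reached" means the beep changed pitch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One init stage. Returning false simulates a crash inside that stage. */
typedef bool (*init_step_fn)(void);

/* Run stages [0, beacon_at) and report whether the beacon was reached.
 * On hardware the "report" is the change in pitch of the beep. */
static bool beacon_reached(const init_step_fn *steps, size_t beacon_at)
{
    for (size_t i = 0; i < beacon_at; i++) {
        if (!steps[i]())
            return false;   /* crashed before getting to the beacon */
    }
    return true;
}

/* Binary search for the first crashing stage.
 * Returns its index, or count if every stage completes. */
static size_t find_crash_site(const init_step_fn *steps, size_t count)
{
    assert(steps != NULL);
    size_t lo = 0;          /* a beacon placed here is always reached */
    size_t hi = count + 1;  /* virtual position that is never reached */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (beacon_reached(steps, mid))
            lo = mid;       /* the crash is at or after stage mid */
        else
            hi = mid;       /* the crash is before position mid */
    }
    return lo;
}

/* Two trivial stages used to simulate a boot sequence. */
static bool step_ok(void)    { return true; }
static bool step_crash(void) { return false; }
```

For a simulated sequence of five stages where the fourth one crashes, find_crash_site returns 3 after only two probes that matter, which is the whole appeal when every probe costs a reflash.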

And, sometimes, when debugging embedded systems without proper debugging hardware, that tiny, insignificant diagnostic LED is the difference between hope and despair.

It’s alive!

Following the discovery of the “diagnostic LED” of my black box, it took mere minutes to home in on the bug and eradicate it.

What was the bug? Let’s just say that when you check, in the code of a driver, whether you properly told the power management driver to power up the chip you’re driving, it would be wise to also check the code of the power management driver to make sure the power-up code is right. Because a chip with no power ain’t gonna be driven nowhere.

In other news, powering up random peripherals unrelated to what you want to drive doesn’t work either. No, really.