You see, the NI card is not nearly as interrupt driven as I had believed it to be.
CIO cards are driven by two interrupts that the 3B2 can trigger by reading specific memory addresses: INT0 and INT1. These map directly to the 80186's INT0 and INT1 lines.
The normal startup procedure for any CIO card after it has been pumped is for the 3B2 to send it an INT0 and then an INT1, in that order. At startup, the NI pump code disables all interrupts except INT0; then, in the INT0 handler, it disables all interrupts except INT1. When it then receives the INT1, it performs a SYSGEN and immediately starts polling the two Packet Receive Queues, looking for available jobs.
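To make the ordering concrete, here is a minimal sketch of that startup sequence as a tiny state machine. The class name, method names, and state strings are all hypothetical and are not taken from the NI firmware; it only models which interrupt is unmasked at each step.

```python
class NIStartup:
    """Toy model of the NI card's startup interrupt masking (names are hypothetical)."""

    def __init__(self):
        # The pump code leaves only INT0 enabled at startup.
        self.enabled = {"INT0"}
        self.state = "waiting_for_int0"
        self.sysgen_done = False

    def interrupt(self, line):
        """Deliver an interrupt; return False if it is currently masked."""
        if line not in self.enabled:
            return False
        if line == "INT0":
            # The INT0 handler masks everything except INT1.
            self.enabled = {"INT1"}
            self.state = "waiting_for_int1"
        elif line == "INT1":
            # The INT1 handler performs SYSGEN, then starts polling.
            self.sysgen_done = True
            self.state = "polling"
        return True
```

In this model, an INT1 sent before INT0 is simply ignored, which is why the 3B2 has to send the two interrupts in order.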
Polling? Yes, polling! The main INT1 handler never returns; it just loops back around if there was nothing for it to do. (Well, in truth it has a few terminal conditions that will send it into an infinite loop on a fatal failure, but that's neither here nor there.) When it does see a job available in either of the two Packet Receive Queues, it immediately grabs it and caches it for later use, if and when it receives a packet. It will cache up to three slots like this and, as always, when it grabs a job off the queue, it increments the Unload Pointer.
[And, as an aside, how the heck do you tell it about work in the General Request Queue, then? This is actually a nifty trick. In the main loop, the card installs a different INT1 handler that does nothing but set a flag and then return. The main loop looks for this flag, and if it's set, it knows it has a job pending in the General Request Queue. Neat.]
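The flag trick can be sketched roughly like this. All names here are hypothetical, and clearing the flag once the main loop has noticed it is my assumption about how such a loop would avoid handling the same request twice; it is not something read out of the firmware.

```python
class MainLoop:
    """Toy model of the NI main loop's replacement INT1 handler (names are hypothetical)."""

    def __init__(self):
        self.request_flag = False
        self.handled = []

    def int1_handler(self):
        # The replacement handler does nothing but set a flag and return.
        self.request_flag = True

    def poll_once(self):
        """One pass of the main loop; returns True if a general request was serviced."""
        if self.request_flag:
            # Assumption: the flag is cleared once the loop has seen it.
            self.request_flag = False
            self.handled.append("general_request")
            return True
        return False
```

The handler stays tiny and the real work happens in the loop, so the card is never interrupted in the middle of servicing a queue.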
The NI card is polling the Receive Queues from the very moment that it is initialized. The 3B2 kernel driver doesn't actually populate those queues with jobs for many milliseconds, so the NI card just loops, watching them.
This, then, was my breakthrough. I realized, hey, during initialization of the queue structures, the 3B2 has to do a lot of work for each slot. It does a lot of system calls. It allocates buffers, frees buffers, talks to the STREAMS subsystem, and so on. Building a queue entry is not instant; it takes a few milliseconds.
What must be happening is that the card is polling the queue quite rapidly. It takes much less time for the card to loop than it does for the 3B2 to actually fill a slot. So, as soon as the card sees a job available, it takes it and increments its Unload Pointer.
I hope you can see where this is going.
By the time the kernel driver has reached the last slot in the queue, the card has already taken and cached three jobs for later use. It has incremented its Unload Pointer to 0x24. So, on the last slot, when the driver asks itself the question "If I were to increment my Load Pointer, would it overwrite the Unload Pointer?", the answer is now NO, and it can do so. It fills the slot and the load pointer wraps around to slot 0. All 8 slots get populated, and everyone is happy.
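Here is a small simulation of that race, as a sketch under stated assumptions. The 12-byte entry size is inferred from the Unload Pointer reaching 0x24 after three jobs; the function name is hypothetical; and the timing is deliberately crude, with the card getting a chance to take a job immediately after each slot the driver fills.

```python
ENTRY_SIZE = 12                       # assumed: 0x24 bytes / 3 cached jobs
NUM_SLOTS = 8
QUEUE_BYTES = ENTRY_SIZE * NUM_SLOTS  # pointers wrap at this offset
MAX_CACHED = 3                        # the card caches up to three jobs


def fill_queue(card_is_polling):
    """Return (slots_filled, load_pointer, unload_pointer) after driver init."""
    load = unload = cached = filled = 0
    for _ in range(NUM_SLOTS):
        next_load = (load + ENTRY_SIZE) % QUEUE_BYTES
        # The driver's check: would advancing the Load Pointer
        # land on the Unload Pointer?
        if next_load == unload:
            break
        load = next_load
        filled += 1
        # The card loops far faster than the driver builds an entry,
        # so it grabs any available job before the next one is ready.
        if card_is_polling and cached < MAX_CACHED and unload != load:
            unload = (unload + ENTRY_SIZE) % QUEUE_BYTES
            cached += 1
    return filled, load, unload
```

With `fill_queue(False)` the driver stops after 7 slots, because the next Load Pointer would wrap onto an Unload Pointer still sitting at 0. With `fill_queue(True)` the Unload Pointer has already moved to 0x24, so all 8 slots are filled and the Load Pointer wraps to 0.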
Talk about a head smacker. I would never have guessed that the card races the queue initialization like this. It took me forever to prove it to myself.