From what I can tell almost no card except the IBM original is actually a 'Standard' VGA, so just because a mode works on one card which isn't a 'super' VGA doesn't mean it's possible on any non-'super' card. From what I can tell 'super' doesn't actually mean anything unless it supports the VESA standard.
FWIW, this wasn't a problem that started with VGA. Many third-party EGA cards also support oddball modes that a true-blue IBM card won't. Run the setup program for an old version of "PC Paintbrush" sometime and marvel at just how much diversity there was. And of course none of them were mutually compatible in their extensions. Something like VESA standardization was needed for years before it came along. (And even longer before it really worked well.)
Nonetheless, I will try out CompuShow, but I suspect it's using a special driver which is aware of the extra registers.
It's right in the documentation that it knows about OAK VGA cards.
Like Resman says, the deal killer is that a regular VGA card can't fetch 8-bit pixels at the same rate it can 4-bit ones. The RAM on a standard VGA card is 32 bits wide. (Look at the card and you'll see its 256K of RAM is arranged as four 64Kx8 banks, not a single 256Kx8 bank.) In planar 16-color mode this means the hardware only has to fetch a byte from each bank every eight pixels. (You can have four 1-bit shift registers in parallel, each latching from its own bank and collectively clocking four bits into the output DAC on each tick of the pixel clock.) This means that even at the higher of the two pixel clocks the memory only needs to run at an effective ~3.5 MHz to supply the ~14 Mbyte/second worst-case bandwidth needed for pixel painting. 256-color modes shuffle the deck so each memory fetch gives you four pixels instead of eight, but that's fine as long as there are half as many pixels. But, yeah, your pixel clock is effectively halved, which means you'll be limited to modes you can render with clocks of ~12.5 or ~14 MHz.
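For anyone who wants to check the ~3.5 MHz and ~14 Mbyte/second figures, here's a rough back-of-envelope sketch in Python. The function and constant names are made up for illustration, and it assumes the two standard VGA dot clocks (25.175 and 28.322 MHz) and one 32-bit fetch (a byte from each of the four banks) per eight pixels, as described above.

```python
# Back-of-envelope numbers for planar 16-color mode on a 32-bit-wide VGA.
# Assumptions (not from any datasheet quoted here): the two standard VGA
# dot clocks, and one byte fetched from each of four banks per 8 pixels.

VGA_PIXEL_CLOCKS_HZ = (25_175_000, 28_322_000)
BANKS = 4              # four 64Kx8 banks read in parallel
PIXELS_PER_FETCH = 8   # one bit per pixel per plane, eight pixels per byte


def planar16_memory_load(pixel_clock_hz):
    """Return (fetches per second, bytes per second) for 16-color planar mode."""
    fetch_rate_hz = pixel_clock_hz / PIXELS_PER_FETCH
    bytes_per_sec = fetch_rate_hz * BANKS
    return fetch_rate_hz, bytes_per_sec


if __name__ == "__main__":
    for clk in VGA_PIXEL_CLOCKS_HZ:
        fetch, bw = planar16_memory_load(clk)
        print(f"{clk / 1e6:.3f} MHz dot clock: "
              f"{fetch / 1e6:.2f} MHz fetch rate, {bw / 1e6:.2f} Mbyte/s")
```

At the 28.322 MHz clock this works out to a fetch rate of about 3.54 MHz and about 14.16 Mbyte/s, which lines up with the rough numbers in the post.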
What you're trying to do is get 256-color pixels at the same pixel clock as 16-color pixels, which means you're doubling the bandwidth you're asking from memory. By modern standards, expecting sustained 25 or 28 Mbyte/s performance from a 32-bit-wide memory array doesn't sound like too much to ask, but in 1987 that was actually a little bit ambitious for consumer-grade hardware. (By IBM's standards, anyway. Something to remember is that IBM was never ambitious when it came to the PC line; the hardware engineering was always conservative for the time.) But more to the point, the original IBM VGA wasn't built to do that so far as I know. I assume clones that could do 256-color modes at the same resolutions as 16-color could?
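To put numbers on the "doubling" claim, here's a small sketch that compares the memory bandwidth needed at 4 and 8 bits per pixel for the same dot clock. It's only a model under stated assumptions: a 32-bit-wide array, every fetch fully used, and hypothetical helper names that don't come from any real library.

```python
# Compare memory bandwidth for 16-color (4 bpp) vs 256-color (8 bpp)
# at the same dot clock. Assumes a 32-bit-wide memory array where every
# fetched bit ends up on screen; helper names are illustrative only.

BUS_BITS = 32


def fetch_rate_hz(pixel_clock_hz, bits_per_pixel):
    """32-bit fetches per second needed to keep up with the dot clock."""
    pixels_per_fetch = BUS_BITS // bits_per_pixel  # 8 at 4 bpp, 4 at 8 bpp
    return pixel_clock_hz / pixels_per_fetch


def bandwidth_bytes_per_sec(pixel_clock_hz, bits_per_pixel):
    """Sustained bytes per second the memory array has to deliver."""
    return fetch_rate_hz(pixel_clock_hz, bits_per_pixel) * (BUS_BITS // 8)


if __name__ == "__main__":
    for clk in (25_175_000, 28_322_000):
        bw16 = bandwidth_bytes_per_sec(clk, 4)
        bw256 = bandwidth_bytes_per_sec(clk, 8)
        print(f"{clk / 1e6:.3f} MHz: 16-color {bw16 / 1e6:.2f} Mbyte/s, "
              f"256-color {bw256 / 1e6:.2f} Mbyte/s "
              f"({bw256 / bw16:.0f}x)")
```

Under those assumptions the 256-color case needs roughly 25.2 and 28.3 Mbyte/s at the two clocks, exactly twice the 16-color figures.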
Trixter's workaround, as he explains, was to reduce the frames per second so a 14 MHz pixel clock paints a 640x400 screen thirty-something times a second instead of 60. This would melt an IBM 8513 into slag. (Or, hopefully, just make it not sync.)
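Here's a rough sketch of where "thirty-something" comes from and why a fixed-frequency monitor would object. It assumes the frame totals usually quoted for VGA 400-line modes (800 pixel clocks per line, 449 lines per frame, blanking included); that's my assumption, not necessarily the timing Trixter actually programmed, and the function name is hypothetical.

```python
# Refresh and line rates when the effective pixel clock is halved.
# Assumption: 800x449 total frame (visible area plus blanking), the
# totals usually quoted for VGA 400-line modes. The real timing used
# in the workaround may differ.

H_TOTAL = 800   # pixel clocks per scanline, including blanking
V_TOTAL = 449   # scanlines per frame, including blanking


def scan_rates(pixel_clock_hz, h_total=H_TOTAL, v_total=V_TOTAL):
    """Return (horizontal rate in Hz, vertical refresh in Hz)."""
    h_rate_hz = pixel_clock_hz / h_total
    v_rate_hz = h_rate_hz / v_total
    return h_rate_hz, v_rate_hz


if __name__ == "__main__":
    for clk in (12_587_500, 14_161_000):  # halved 25.175 and 28.322 MHz
        h_rate, v_rate = scan_rates(clk)
        print(f"{clk / 1e6:.2f} MHz: {h_rate / 1e3:.1f} kHz line rate, "
              f"{v_rate:.1f} Hz refresh")
```

With those totals the halved clocks give refresh rates of roughly 35 and 39 Hz, and line rates in the 16 to 18 kHz range. A stock VGA monitor expects a line rate of about 31.5 kHz, which is why it wouldn't be happy.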