That seems to have turned out to be an urban legend. I got into this in a couple of questions and answers on the Retrocomputing SE where we discovered that the particular layout of the frame buffers just happens to accomplish your DRAM refresh within the necessary timings, whereas in certain modes a linear frame buffer wouldn't. So it seems to be another case of an ingenious Woz design resulting in fairly large savings, rather than being penny-pinching.
Do you mean this? Because this does not support what you're saying here. In fact one of the posters seems to be pulling his hair out trying to debunk the notion that the strange memory map is a necessary consequence of the DRAM refresh. In fact, if we just stop and think about it for a minute it's painfully obvious that's not why it's all messed up.
In "Understanding the Apple II" the author discusses DRAM refresh, how it's vital that every ROW address be accessed at least once every two milliseconds, and how the Apple II does some weird cut-and-shuffling of how the address lines are set up on the address multiplexors to achieve this using the video timing as the source of entropy. So let's think about the reality of the situation here.
Let's start with what the Apple II actually has: the weird memory map, in both text and graphics modes, exists because Woz wanted to use the *absolute minimum* of counter chips and latches for address generation of a 40 column screen. 40 is not a power of two. This is awkward if your display system has a text mode, because a text mode needs to run through the same 40 bytes of memory for 8 consecutive scanlines to generate the characters. So you can't just use a braindead linear ripple counter that gets 40 pulses per line to walk through the display (like you technically could if you were only doing graphics); you need to be able to latch and repeat that same set of addresses 8 times before you let it increment.
Here is a fragment of the video address generation schematic for a Commodore PET showing a standard, pedantic way to do that. Woz apparently didn't want to have to implement the 13 bits (*) of counter and latch he'd need to implement this for the Apple II, so he went all hawg wild with his clever scheme that packs three non-contiguous lines into each 128 byte address block, "wasting" the last 8 locations, etc, etc, and, yeah, that scheme saved a few chips, at the cost of creating a weird doubly-interlaced memory map. Yay, Woz is clever genius, saved some chips...
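The interleave is easy to see from the text page 1 address formula (a sketch in Python; the formula is the standard, well-documented Apple II one, the rest is just illustration):

```python
# Sketch of the Apple II text page 1 address layout. Text page 1 lives
# at $0400; each 128-byte block holds three non-contiguous screen rows.
def text_addr(row, col):
    """Byte address of character (row 0-23, col 0-39) on text page 1."""
    return 0x400 + 0x80 * (row % 8) + 0x28 * (row // 8) + col

# Rows 0, 8 and 16 share one 128-byte block, at offsets 0, 40 and 80:
print(hex(text_addr(0, 0)))    # 0x400
print(hex(text_addr(8, 0)))    # 0x428
print(hex(text_addr(16, 0)))   # 0x450
# ...and the last 8 bytes of each block (here 0x478-0x47f) go unused:
print(hex(text_addr(16, 39)))  # 0x477
```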
... but how exactly was this a *good* thing for memory refresh? It's actually farging terrible, and that stackexchange thread explains why. Apple's scheme divides the screen vertically into three sectors in which the same 40 addresses (6 bits' worth out of each 7-bit chunk) get replayed for 8 text mode lines, or 64 scanlines. Remember, we have to touch all 7 bits' worth of row addresses at least once every 2 milliseconds; 64 scanlines is about twice that, repeating the same 40 low-order values the whole time. So clearly we're badly screwed by this; we can't just use the low order address lines as our DRAM ROW inputs and expect this to work, because this system is going to take essentially a whole frame to generate every bit combination. Thus the need to do the weird things described in Understanding the Apple II to give us some rolling values derived from somewhere else to provide a full set of refresh addresses.
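You can see the problem directly by asking which low-order 7-bit values the beam touches during the top third of the screen (again using the standard text page 1 formula, the rest being illustration):

```python
# Which 7-bit low-address values appear while the beam scans the top
# third of the text screen (rows 0-7, i.e. 64 scanlines)?
def text_addr(row, col):
    return 0x400 + 0x80 * (row % 8) + 0x28 * (row // 8) + col

low7 = {text_addr(r, c) & 0x7F for r in range(8) for c in range(40)}
print(len(low7))           # 40
print(min(low7), max(low7))  # 0 39
# Only offsets 0-39 of each 128-byte block -- 40 of the 128 possible
# values -- for roughly 4 milliseconds straight.
```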
By contrast, let's just pretend that instead of this bizarre setup it had a pedantic counter/latch setup like the Commodore PET, with the low address lines used as ROW addresses on the DRAM chips. Sticking with character mode, with 8 line tall characters and no dedicated refresh cycles, you're guaranteed to walk all the way through 7 bits of addressing every 25 scanlines. (You'll be hitting 120 of the addresses 8 times, and the last 8 once.) 25 scanlines is about 1.6 milliseconds. So, BOOM, you're guaranteed acceptable DRAM refresh rates *just* from the address generation circuitry cycling through valid active-area video addresses(*),
no weird shuffling of address lines needed, and with a sane linear memory map.
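Here's a sketch of that hypothetical linear layout bearing out the 25-scanline figure (the 40-consecutive-bytes-per-row layout, the base address, and the ~63.5µs NTSC scanline are my assumptions for the illustration):

```python
# Hypothetical PET-style linear layout: each text row is 40 consecutive
# bytes, and the low 7 address bits feed the DRAM ROW inputs.
def linear_addr(row, col):
    return 0x400 + 40 * row + col   # base address arbitrary

# Walk the beam through text rows and count scanlines until every
# 7-bit low-address value has been seen at least once.
seen, scanlines = set(), 0
for row in range(24):
    for _ in range(8):              # each text row repeats for 8 scanlines
        scanlines += 1
        seen.update(linear_addr(row, c) & 0x7F for c in range(40))
        if len(seen) == 128:
            break
    if len(seen) == 128:
        break

print(scanlines)                    # 25
print(scanlines * 63.5e-6 * 1000)   # ~1.59 ms, comfortably under 2 ms
```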
(* We will need to keep running dummy reads during the vertical blanking period, of course, since that's like a quarter of the 16.7ms frame time.)
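(Rough arithmetic behind that "quarter", assuming NTSC-ish timing of 262 scanlines per frame with 192 of them in the active display area:)

```python
# Fraction of the frame spent in blanking, assuming 262 total scanlines
# per frame and 192 active (visible) ones.
blank = 262 - 192
print(blank)        # 70
print(blank / 262)  # ~0.267 -- roughly a quarter of the frame
```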
I don't know how much the chips needed to do it cleanly and linearly would have cost in 1976, I was... a little young to be into that sort of thing, so I'm not going to second guess whether the tradeoff here was worth it, but please don't spread false information.
(*Edit: We could get by with only the same number of bits the Commodore PET uses for its counter/latch system if we're willing to have interlaced graphics memory; we could just use the character generator row lines as the top three address bits in graphics mode. I'm deeply puzzled by the claim that in "some modes" a linear framebuffer would somehow skip iterating through all possible 7-bit addresses in under 2ms; the text/interlaced-graphics version with 10-bit addressing does it every 1.6ms, always, while a fully linear 13-bit address mode does it every 4 scanlines, or every quarter of a millisecond.)
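The fully linear graphics case is even easier to check (same assumptions as before: 40 bytes per scanline, arbitrary base, ~63.5µs scanlines):

```python
# Fully linear 13-bit graphics addressing: 40 consecutive bytes per
# scanline, addresses just incrementing. How many scanlines until all
# 128 low-address values have appeared?
seen, scanlines, addr = set(), 0, 0x2000   # base address arbitrary
while len(seen) < 128:
    scanlines += 1
    for _ in range(40):
        seen.add(addr & 0x7F)
        addr += 1

print(scanlines)                    # 4
print(scanlines * 63.5e-6 * 1000)   # ~0.25 ms
```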