
DDA Texture Mapping [Part 1]

neilobremski

I am learning and working on a quadrilateral DDA texture mapper for CGA that uses the 8x8 monochrome BIOS characters for texels, for use in an optimized version of Magenta's Maze. Reading Mats's and Abrash's tutorials [SUP][1][/SUP] has given me new insights, and I have fought the urge to scrap my triangle/line routines until I make this next step.

Stepping is really the name of the game: adding the deltas found by linear interpolation. It has been difficult to integrate the term interpolate into my brain's common math glossary. I understood it at a basic level but never played with it enough for it to become an inherent capability. This is one of the things I have had to remedy while learning texture mapping. Interpolation is the key to quickly determining what texel to read at a given pixel location.

Today my goal is simply to draft an inner loop which reads texels and writes pixels across a single scanline. This assumes that interpolation has been done and the registers are used like so:

  • ES:DI = vidptr
  • DS = texture (8 bytes) starting at offset 0
  • AX/AH = T (texel)
  • AX/AL = P (pixel)
  • BX/BH = U (3:5 fixed point)
  • BX/BL = V (3:5 fixed point)
  • CX/CH = scratch
  • CX/CL = scratch
  • DX/DH = O (previous V)
  • DX/DL = C (4:4 foreground:background)
  • BP = Ustep,Vstep (3:5,3:5 fixed point)
  • SI = X1,X2

My target machine is a Tandy 1000 HX which runs on an 8088-based core with a puny 8-bit bus and 4 byte prefetch queue. My intention is therefore to limit memory access and instruction size as much as possible. As a sad side effect there is a lot of shifting and masking that is slower (for clocks) on newer processors. Just to give you an idea, a simple PUSH takes up to 15 cycles on the 8088 and only three on the 286.

Code:
MOV	DH, FF	; 800 force load of texel on first pixel
MOV	CX, SI	; 802
AND	CH, 3	; 804
JZ	1812	; 807 >TEX_PIX_LOOP
MOV	AL, ES:[DI]	; 809
SHL	CH, 1	; 80C
MOV	CL, CH	; 80E
ROL	AL, CL	; 810 P = *(ES:DI) <<ROL ((X1 % 4) * 2), leading pixels end up in the low bits
		;
MOV	CX, BX	; 812						:TEX_PIX_LOOP
ROL	CX, 1	; 814
ROL	CX, 1	; 816
ROL	CX, 1	; 818
AND	CH, 7	; 81A CH = V % 8
CMP	CH, DH	; 81D
JNE	185A	; 81F if (V != O) >TEX_PIX_LOAD
NOP		; 821
		;
AND	CL, 7	; 822 CL = U % 8				:TEX_PIX_TEXEL
INC	CL	; 825
MOV	CH, AH	; 827
ROL	CH, CL	; 829
MOV	CL, DL	; 82B
AND	CH, 1	; 82D
JZ	1836	; 830 if (!texel) >TEX_PIX_DRAW
ROL	CL, 1	; 832
ROL	CL, 1	; 834
		;
AND	CL, 03	; 836						:TEX_PIX_DRAW
SHL	AL, 1	; 839
SHL	AL, 1	; 83B P <<= 2
OR	AL, CL	; 83D P |= ((C <<ROL (((T <<ROL (U+1)) & 1) * 2)) & 3)
		;
MOV	CX, SI	; 83F						:TEX_PIX_MOVE
INC	CH	; 841
MOV	SI, CX	; 843
CMP	CH, CL	; 845
JAE	186A	; 847 >TEX_PIX_END
		;
MOV	CX, BP	; 849						:TEX_PIX_STEP
ADD	BH, CH	; 84B U += Ustep
ADD	BL, CL	; 84D V += Vstep
MOV	CX, SI	; 84F CX = X1,X2 (BX must keep the updated U,V)
		;
TEST	CH, 3	; 851						:TEX_PIX_BYTE
JNZ	1812	; 854 >TEX_PIX_LOOP
STOSB		; 856 *(ES:DI++) = P
JMP	1812	; 857 >TEX_PIX_LOOP
NOP		; 859
		;
MOV	DH, CL	; 85A						:TEX_PIX_LOAD
MOV	CL, CH	; 85C
XOR	CH, CH	; 85E
XCHG	BX, CX	; 860
MOV	AH, [BX]	; 862 T = *(DS:V)
XCHG	CX, BX	; 864
XCHG	DH, CL	; 866 O = V
JMP	1822	; 868 >TEX_PIX_TEXEL
		;
AND	CH, 3	; 86A						:TEX_PIX_END
JZ	1886	; 86D if (X1 % 4 == 0) >TEX_PIX_LAST
MOV	AH, ES:[DI]	; 86F
MOV	CL, CH	; 872
SHL	CL, 1	; 874
MOV	CH, FF	; 876
SHR	CH, CL	; 878
AND	AH, CH	; 87A
NEG	CL	; 87C
ADD	CL, 08	; 87E CL = 8 - (X1 % 4) * 2
SHL	AL, CL	; 881 P <<= ((4 - (X1 % 4)) * 2)
OR	AL, AH	; 883 P |= *(ES:DI) & (0xFF >> (X1 % 4) * 2)
NOP		; 885
		;
STOSB		; 886						:TEX_PIX_LAST

The U and V texture coordinates are stored in register BX and their deltas (Ustep and Vstep) in BP. These are 3:5 fixed point numbers [SUP][2][/SUP] that when added together result in a step along the scanline within the texture map.
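To make the 3:5 stepping concrete, here is a rough Python model of a single coordinate being stepped across a span. It is only a sketch: the helper names (`to_fixed`, `whole`, `step_scanline`) are made up for illustration and do not appear in the routine above.

```python
FRAC_BITS = 5  # 3:5 fixed point: 3 whole bits, 5 fractional bits


def to_fixed(value):
    """Convert a texel coordinate (0 <= value < 8) to a 3:5 byte."""
    return int(value * (1 << FRAC_BITS)) & 0xFF


def whole(fixed):
    """Whole part (0-7) of a 3:5 fixed-point byte."""
    return fixed >> FRAC_BITS


def step_scanline(u, ustep, count):
    """Return the texel column read at each of `count` pixels."""
    columns = []
    for _ in range(count):
        columns.append(whole(u))
        u = (u + ustep) & 0xFF  # 8-bit add wraps around, like ADD BH, CH
    return columns


# Stretch 8 texels over 16 pixels: step = 8/16 = 0.5 texels per pixel.
print(step_scanline(to_fixed(0), to_fixed(0.5), 16))
```

Because the add is only 8 bits wide, stepping past texel 7 wraps back to texel 0 without any extra masking.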

I'm hoping over time that I can optimize this with more tricks but first let me explain the pieces ...

First, the whole part of V (0 - 7) is put in CH and tested against a copy of the previous V (notated as O) in DH. If these are different then the texture byte containing the current row of texels must be loaded from memory. That load should be rare: with such a tiny texture, this routine will mostly be scaling up rather than down, so many consecutive pixels share the same row.
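The rotate-and-mask step and the row check can be modelled in a few lines of Python. This is a sketch under my own naming (`rol16`, `whole_parts`, `count_row_loads` are hypothetical helpers), assuming BX holds U in the high byte and V in the low byte as in the register list.

```python
def rol16(value, count):
    """Rotate a 16-bit value left, like ROL CX, 1 repeated `count` times."""
    count %= 16
    return ((value << count) | (value >> (16 - count))) & 0xFFFF


def whole_parts(bx):
    """Model MOV CX,BX / ROL CX,1 (x3) / AND CH,7 / AND CL,7.

    Returns (whole U, whole V), each in the range 0-7.
    """
    cx = rol16(bx, 3)
    v_whole = (cx >> 8) & 7  # CH: top 3 bits of V rotated up out of BL
    u_whole = cx & 7         # CL: top 3 bits of U wrapped around from BH
    return u_whole, v_whole


def count_row_loads(texture, v, vstep, count):
    """Count how often the row byte must be fetched along a span."""
    prev = 0xFF  # impossible row, forces a load on the first pixel
    loads = 0
    for _ in range(count):
        _, v_whole = whole_parts(v)  # U byte is zero here
        if v_whole != prev:
            _row = texture[v_whole]  # the only memory read in the loop
            prev = v_whole
            loads += 1
        v = (v + vstep) & 0xFF
    return loads


# A Vstep of 8 (0.25 texels) over 32 pixels touches 8 rows: 8 loads, not 32.
print(count_row_loads(bytes(8), 0, 8, 32))
```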

Given that V is merely used to determine the texture byte containing the current row, it is discarded afterwards to free up CX. The whole part of U (0 - 7) is put in CL, incremented, and used to rotate a copy of the texture byte in CH left so that the current texel lands in bit 0, i.e. the lowest bit.
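A small Python model of that rotate, assuming the usual font layout where bit 7 of each row byte is the leftmost texel. The function names are mine, for illustration only.

```python
def rol8(value, count):
    """Rotate an 8-bit value left, like ROL CH, CL."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def texel_bit(row_byte, u_whole):
    """Rotate left by U+1 so that texel U ends up in bit 0, then mask it."""
    return rol8(row_byte, u_whole + 1) & 1


row = 0b11001010
print([texel_bit(row, u) for u in range(8)])  # texels read left to right
```

For U = 7 the rotate count is 8, which brings the byte back to where it started, so bit 0 is read directly.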

Now here I played around a lot with jumps and no-jumps, various ways to twiddle the bits. I ended up using a jump because from everything I can tell, it is less egregious on cycles on the fall through than my other methods. This means that textures will draw faster the more sparse they are; I believe it's the right thing to do for BIOS characters.

The texture colors are stored split in DL where the upper nibble is the foreground color and the lower is the background color. I'm only using the upper 2 and the lower 2 bits depending on whether a texel is lit, but this gives me some breathing room for switching to 16 colors later. Of course, by then I'll probably rewrite this dang thing from scratch.
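In Python terms the colour pick looks roughly like this. It is a sketch with a made-up function name; note that with a rotate by 2 the foreground comes from bits 7-6 of the byte and the background from bits 1-0.

```python
def pixel_colour(c, texel):
    """Pick the 2-bit CGA colour for one texel.

    c: colour byte, foreground in the upper nibble, background in the lower.
    texel: 1 if the font bit is lit, 0 otherwise.
    """
    if texel:
        c = ((c << 2) | (c >> 6)) & 0xFF  # ROL CL,1 twice: bits 7-6 drop into bits 1-0
    return c & 3


colours = 0b10001101  # foreground bits 7-6 = 2, background bits 1-0 = 1
print(pixel_colour(colours, 1), pixel_colour(colours, 0))
```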

Placing the new color bits is always done in the lower 2 bits of the pixel byte P after it has been shifted twice to the left. This lets the loop write in as many pixels as necessary and also necessitated using SI to store the X1,X2 pair where the former is incremented on each iteration. Whenever the last two bits of X1 are zero (X1 % 4 == 0), the current pixel byte P is written and the video pointer is incremented using STOSB.
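The packing can be sketched in Python as follows. This model assumes the span starts on a byte boundary, and `pack_span` is a hypothetical name, not part of the routine.

```python
def pack_span(colours, x1=0):
    """Pack 2-bit colours into CGA bytes, four pixels per byte.

    Returns (completed bytes, leftover partial P).
    """
    out = []
    p = 0
    x = x1
    for colour in colours:
        p = ((p << 2) | (colour & 3)) & 0xFF  # SHL AL,1 twice, then OR AL,CL
        x += 1
        if x % 4 == 0:  # low two bits of X1 are zero: byte is full
            out.append(p)  # STOSB
    return out, p


done, partial = pack_span([0, 1, 2, 3, 3, 2, 1, 0])
print([hex(b) for b in done])
```

The first pixel of each group ends up in bits 7-6, which matches the CGA layout where the leftmost pixel sits in the highest bits.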

Finally, the first and last CGA memory bytes of the scanline add some complication because they may only partially overlap existing pixels. So there's code in there to initialize AL for the first byte if not starting on a byte boundary. And if X1 ends within a pixel byte, then there's a lot of fun and funky shifting and masking to merge P with the existing pixels of that last byte.
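Here is the intent of those two edge cases as a Python sketch. It is written from the description rather than transcribed from the listing, and `seed_first` and `merge_last` are names invented for this example.

```python
def seed_first(existing, x1):
    """Start P with the pixels that sit left of the span in the first byte."""
    n = x1 % 4  # pixels to preserve
    if n == 0:
        return 0
    return existing >> (8 - 2 * n)  # their bits move to the bottom of P


def merge_last(p, existing, x_end):
    """Combine a partly filled P with the pixels right of the span."""
    n = x_end % 4  # pixels of this byte already held in P
    if n == 0:
        return p & 0xFF  # byte is complete, nothing to merge
    shifted = (p << ((4 - n) * 2)) & 0xFF
    return shifted | (existing & (0xFF >> (n * 2)))


# Draw colour 3 into pixels 1 and 2 of a byte holding pixels 0,1,2,3.
existing = 0b00011011
p = seed_first(existing, 1)
for colour in (3, 3):
    p = ((p << 2) | colour) & 0xFF
print(bin(merge_last(p, existing, 3)))
```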

Footnotes:

[SUP][1][/SUP]. Mats Byggmastar's FATMAP.TXT and Michael Abrash's "Pooh and the Space Station".

[SUP][2][/SUP]. Five fractional bits give a resolution of 1/32, which is very coarse. Textures can thus not be enlarged more than 32x or shrunk smaller than 1/32 of their original size. The fractional bits also make steps a bit uneven which results in "hairiness" but this is okay given the context of the mapping (ASCII characters printed on 3D squares).
 
Before you get too far down the path of implementation, you may want to test your assumptions about fixed point precision. Error accumulates quite fast while running a fixed point DDA. I'm not sure whether you are allowing arbitrary texture mapping or only trying to scale a bitmap, but either way it could affect the quality of the output significantly. I'd try it in an easy-to-test environment before spending a lot of time optimizing assembly code.
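One cheap way to run such a test is a few lines of Python that compare the fixed-point walk against the exact position. This sketch assumes the step is computed with a truncating integer divide, which may not be how the setup code ends up doing it; `max_drift` is a made-up name.

```python
def max_drift(texels, pixels, frac_bits=5):
    """Worst error, in texels, between fixed-point stepping and exact position."""
    scale = 1 << frac_bits
    step = (texels * scale) // pixels  # assumed: truncated, never rounded
    worst = 0.0
    pos = 0
    for x in range(pixels):
        exact = x * texels / pixels
        worst = max(worst, abs(exact - pos / scale))
        pos += step
    return worst


for width in (16, 50, 100):
    print(width, max_drift(8, width))
```

A 16 pixel span divides evenly and shows no drift, while a 100 pixel span ends up more than a texel and a half short with only 5 fractional bits.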