DDA Lines with 1 Byte Error Accumulators

neilobremski · Mar 23, 2017

For the past couple of weeks I've had my head metaphorically underwater as I drown myself in writing hardcore assembly for the first-time in two CGA routines: line and filled-triangle drawing. Both of these are based on Bresenham's algorithms which are fundamentally referred to as DDA (digital differential analyzers) and use the concept of integer error accumulators in order to approximate a line via pixels. Consider these two lines:

View attachment 37253 <-- (0,0) to (7,7) . . . . . (0,2) to (7,5) --> View attachment 37254

The line on the left is a simple diagonal where each axis is stepped by one whole pixel from source to destination. However, while the X (horizontal) axis of the line on the right is stepped by a whole pixel each time, the Y (vertical) axis is half-stepped. If you could draw a smooth line over top of this you'd see that the pixel position is approximated and there is no such thing as a "half pixel".

How do we represent this "half-step" in code without real floating point values? You may have read this algorithm elsewhere but there's a simple explanation that I think is often overlooked: the decimal "partial" value is derived from a fraction (a division expression is a fraction) so you can store the numerator and denominator in place of the decimal and it works just as well.

The fraction itself is either X distance over Y distance or Y distance over X distance depending on which axis you decide to move along. That is, the denominator is the axis you're stepping evenly at each iteration and the numerator is the amount of the step. Many implementations will step along the axis which has the greatest distance.

X Distance = ABS(X2 - X1) + 1
Y Distance = ABS(Y2 - Y1) + 1

Let's look at our two example lines again. The one on the left has a horizontal distance of 8 and a vertical distance of 8 also. This means that the fraction for stepping in either direction is 8/8 which is why we step a whole pixel along both axes! Alright, but what of the one on the right? The horizontal distance is still 8 but the vertical distance is only 4. This means that if we move along the Y axis the fraction is 8/4 or two horizontal pixel steps per one whole vertical pixel step. But if we move along the X axis then the fraction becomes 4/8 or a half vertical pixel step per one whole horizontal pixel step. Converted to decimal this would be either 2.0 or 0.5 but it means the same thing and consequently using fractions can be more accurate. Ever tried representing 1/3 in decimal faithfully?

This brings me to my recent assembler forays wherein I'm implementing this algorithm using only registers and no memory access. I imagine the X86's registers as a very small deck of cards that I can put things in and manipulate but there are a severely limited amount of them. There are technically eight general purpose 16-bit registers (AX, BX, CX, DX, SI, DI, BP, SP) but: you really don't want to touch SP, BP can only be used if you're not planning on making local variables, and DI must point to the video memory for writing pixels. If we want to store a fraction, a color, and a few coordinates you're looking at six or seven registers, not to mention scratch registers for performing arithmetic!

To resolve the register shortage I limited my lines to single byte coordinates (0 - 255). This works for Magenta's Maze and my next game because the main rendering area is already only 256x190. But while this limitation frees up registers it also introduces a bit of complication, especially in the area of error accumulation.

At every axis step in the Bresenham algorithm, the same fraction is added to an error accumulator. Since the denominator is equal between the increment and the accumulator, there are three values: numerator, accumulated numerator, and denominator. For example, the fraction 4/8 in our example line is added at each whole step along the X axis. For the first iteration then, the error accumulator is 4/8 and then the second iteration it is 8/8 which causes a whole step and the denominator (8) is then subtracted leaving the accumulator at 0/8. If you follow this on paper, you'll see that that method of using a fraction will perfectly adjust the pixels for partial steps along a particular axis.

Okay, so what's the problem with using a single byte for an error accumulator? Imagine we have a line that goes from 0,0 to 249,99. The X Distance (horizontal axis difference) is 250 and the Y Distance is 100. This means that our fraction is 250/100 [SUP][1][/SUP] when moving along the vertical axis in whole steps, which is incidentally what I'm doing since video memory is organized into vertical scan-lines. While 250 and 100 are easily represented in a single byte each, what happens when you add this same fraction to the error accumulator?

1. 0/100 + 250/100 = 250/100
2. 250/100 - 100/100 = 150/100 (write pixel)
3. 150/100 - 100/100 = 50/100 (write pixel)
4. 50/100 + 250/100 = 300/100

You can certainly not fit 300 into a single byte.[SUP][2][/SUP] Thankfully this sort of overflow can be detected in X86 via the CF or Carry Flag. If CF is set then I know that the value would have been 1XX in hexadecimal or rather, whatever value is left in my byte is 256 less than what would have been if there was enough room.[SUP][3][/SUP].

What I do in this case is use a JC (Jump on Carry) or JNC (Jump on No Carry) to detect the issue. Then I start a temporary error accumulator at 0 [SUP][4][/SUP] and "drain" it by subtracting the denominator over and over, increasing my whole step along that (horizontal) axis each time, until the denominator is greater. Finally, I add the left-over plus one to the error accumulator. This leaves me with an error accumulator that fits within a byte!

Let's follow along with the 250/100 example in DEBUG. Assume AH/AL is the fractional error accumulator, BH is the numerator increment, and BL is the whole number of steps to take:

Code:

0100 B86432	MOV     AX,3264	; Err = 50/100
0103 BB00FA	MOV     BX,FA00	; Inc = 250, Pix = 0
0106 00FC	ADD     AH,BH	; 50/100 + 250/100
0108 730C	JAE     0116	; JAE same as JNC (Jump if No Carry)
010A B500	MOV     CH,00	; TempErr = 0
010C FEC3	INC     BL	; Pix += 1
010E 28C5	SUB     CH,AL	; TempErr -= Err.Denominator
0110 38C5	CMP     CH,AL	;
0112 73F8	JAE     010C	; IF (TempErr >= Err.Denominator) GOTO 010C
0114 00EC	ADD     AH,CH	; Err.Numerator += TempErr
0116 38C4	CMP     AH,AL	;
0118 7206	JB      0120	; IF (Err.Numerator < Err.Denominator) GOTO 0120
011A FEC3	INC     BL	; Pix += 1
011C 28C4	SUB     AH,AL	; Err.Numerator -= Err.Denominator
011E EBF6	JMP     0116	; GOTO 0116
0120 CC  	INT     3	; DONE!

If you run this as is you'll note that BL ends up 3 which is correct: 300/100 = 3 pixel steps! Voila! This works with rather large and variable differences too and all within a single 8-bit error accumulator, assuming you have a scratch register to use (CH in this case).

I'm thinking I could get away without a scratch register if I subtract from the error accumulator until it carries but I have yet to try that.

Update!

I couldn't stop thinking about getting rid of that scratch register and I came up with a shorter solution that makes better use of CF. Check it out:

Code:

0100 B86432	MOV     AX,3264	; Err = 50/100
0103 BB00FA	MOV     BX,FA00	; Inc = 250, Pix = 0
0106 00FC	ADD     AH,BH	; 50/100 + 250/100
0108 730C	JAE     0110	; JAE same as JNC (Jump if No Carry)
010A FEC3	INC     BL	; Pix += 1
010C 28C4	SUB     AH,AL	; Err.Numerator -= Err.Denominator
010E 73FA	JAE     010A	; IF (Err.Numerator > 0xFF) GOTO 010A
0110 FEC3	INC	BL	; Pix += 1
0112 28C4	SUB	AH, AL	; Err.Numerator -= Err.Denominator
0114 73FA	JAE	0110	; IF (Err.Numerator > 0) GOTO 0110
0116 00C4	ADD	AH, AL	; Err.Numerator += Err.Denominator
0118 FECB	DEC	BL	; Pix -= 1
011A CC		INT     3	; DONE!

Footnotes:

[SUP][1][/SUP] It's not easy nor fast to simplify fractions in code but that would certainly save me from some headache if I could.

[SUP][2][/SUP] You could use a 16-bit value for the error accumulator but this is also complicated since X86 assembler doesn't allow the mixing of 8-bit and 16-bit operands. Thus you'd either have to store everything in 16-bit or do some clever masking tricks ... which might be faster than what I'm doing!

[SUP][3][/SUP] The third digit in a hexadecimal number represents a multiple of decimal 256, just as this represents a multiple of 100 if it were a decimal number. That is, the "hundredths" place in hexadecimal is the "256ths place". And since the carry can only ever be 1, we know it's 256 x 1 or just 256.

[SUP][4][/SUP] Starting at zero is the same as starting at 256 which is, as you'll recall, how much overflow we're dealing with. This works only as long as you subtract first and compare later!

DDA Lines with 1 Byte Error Accumulators

neilobremski

Experienced Member