
Calling REP MOVSW in C

neilobremski

Experienced Member
Joined
Oct 9, 2016
Messages
55
Location
Seattle, USA
What I'm about to show you will only work in 16-bit real-mode x86 compilers. The only built-in CRT functions you have for copying data from one place to another are things like memcpy(). The problem with memcpy() is that it doesn't work on far pointers [SUP]1[/SUP]. Even if you have a version that does, it copies bytes rather than words, so it will be slower on 8086-based CPUs that have a 16-bit bus.

At any rate, the best way to copy data into a video buffer is through REP MOVSW, but this requires configuring the segment, index, and count registers. You point DS:SI to the source memory, point ES:DI to the destination, and set the count of words (2 bytes each) to copy in CX. One other important setup instruction is CLD, which CLears the Direction flag so that SI and DI move upward after each word.
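To make the register roles concrete, here is a rough plain-C model of what CLD followed by REP MOVSW does. It is only an illustration: the function name rep_movsw_model is made up for this sketch, and ordinary pointers stand in for DS:SI and ES:DI.

```c
/* Plain-C model of CLD + REP MOVSW (illustration only, not the real
   instruction). dst plays the role of ES:DI, src plays DS:SI and cx
   plays CX. The name rep_movsw_model is hypothetical. */
void rep_movsw_model(unsigned short *dst, const unsigned short *src,
                     unsigned short cx)
{
	while (cx != 0) {
		*dst++ = *src++;	/* MOVSW: copy one word, advance both pointers */
		cx--;			/* REP: decrement CX, stop when it reaches zero */
	}
}
```

Note that the count is in words, not bytes, so copying 6 bytes means passing 3.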

So how to call REP MOVSW from C?

There is a way to set both segment and general-purpose registers, but it also means calling an interrupt: int86x(). This of course means that you need an interrupt handler, but we're in luck because there are several interrupt vectors reserved for user programs; the first is 0x60.

Alright, so we need to put our code in an interrupt handler [SUP]2[/SUP], but we still don't know how to call the code! This is both easy and weird: put the machine code into an array and use the array's far address as the interrupt vector:

Code:
unsigned char far CLD_REP_MOVSW_IRET[4] = {
    0xFC, 0xF3, 0xA5, 0xCF };

None of these instructions embeds an absolute memory address (MOVSW reaches memory only through DS:SI and ES:DI), which is why they can be located anywhere in the program and still work correctly [SUP]3[/SUP]. Another cool fact is that each of them is a single byte (strictly speaking REP is a prefix rather than an instruction), so we have four bytes for four mnemonics; it's lovely symmetry!
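If you want to double-check the four bytes, a tiny lookup like the one below spells out which byte is which. The function name stub_mnemonic is invented for this sketch, and it only knows the four opcodes used in the array.

```c
/* Map each byte of the stub to its 8086 mnemonic. Only the four bytes
   used in CLD_REP_MOVSW_IRET are handled; anything else returns "?".
   stub_mnemonic is a hypothetical helper name. */
const char *stub_mnemonic(unsigned char opcode)
{
	switch (opcode) {
	case 0xFC: return "CLD";	/* clear direction flag */
	case 0xF3: return "REP";	/* repeat prefix for the next string op */
	case 0xA5: return "MOVSW";	/* move word from DS:SI to ES:DI */
	case 0xCF: return "IRET";	/* return from interrupt */
	default:   return "?";
	}
}
```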

Let's put all of this together:

Code:
#include <conio.h>
#include <dos.h>
#include <stdio.h>
#include <string.h>

#if defined(__POWERC)					/* Power C			*/
#include <malloc.h>
#define _dos_getvect getvect
#define _dos_setvect setvect
typedef void interrupt (far *p_interrupt)();

#elif defined(__TURBOC__)				/* Borland Turbo C	*/
#include <alloc.h>
#define _dos_getvect getvect
#define _dos_setvect setvect
#define _fmalloc farmalloc
#define _ffree farfree
typedef void interrupt (*p_interrupt) ();

#elif defined(MSDOS) && defined(M_I86)	/* Microsoft C		*/
#include <malloc.h>
typedef void (_CDECL interrupt far * _CDECL p_interrupt)();

#else									/* Unknown Compiler	*/
#error Unsupported compiler for FWMEMCPY

#endif

#ifndef MK_FP
#define MK_FP(seg,ofs)	((void far *) \
	(((unsigned long)(seg) << 16) | (unsigned short)(ofs)))
#endif

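/* Copy `count` 16-bit words from src to dst using CLD / REP MOVSW. */
void fwmemcpy(void far *dst, void far *src, unsigned int count)
{
	/* Machine code: CLD, REP MOVSW, IRET. Static so it lives in the
	   data segment (this assumes a near-data memory model). */
	static unsigned char CLD_REP_MOVSW_IRET[4] = {
		0xFC, 0xF3, 0xA5, 0xCF };
	p_interrupt oldvect = _dos_getvect(0x60);
	p_interrupt newvect;
	union REGS regs;
	struct SREGS segregs;

	/* Build a far pointer to the machine code from DS and its offset. */
	segread(&segregs);
	newvect = (p_interrupt)MK_FP(segregs.ds,
		(unsigned short)CLD_REP_MOVSW_IRET);

	/* DS:SI = source, ES:DI = destination, CX = number of words. */
	segregs.ds = FP_SEG(src); regs.x.si = FP_OFF(src);
	segregs.es = FP_SEG(dst); regs.x.di = FP_OFF(dst);
	regs.x.cx = count;

	/* Install the stub, invoke it, then restore the old vector. */
	_dos_setvect(0x60, newvect);
	int86x(0x60, &regs, &regs, &segregs);
	_dos_setvect(0x60, oldvect);
}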

void main(void)
{
	unsigned short far *dbuf = (unsigned short far *)_fmalloc(6);
	if (dbuf == NULL) return;	/* allocation failed, nothing to show */
	dbuf[0] = (short)'H' | (0x7 << 8);
	dbuf[1] = (short)'i' | (0x7 << 8);
	dbuf[2] = (short)'!' | (0x7 << 8);

	fwmemcpy((void far*)0xB8000000, dbuf, 3);

	_ffree(dbuf);
}

The results of this are "Hi!" displayed in the upper-left of the screen [SUP]4[/SUP].
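For anyone wondering about the numbers in main(): each text-mode cell is a 16-bit word with the character in the low byte and the attribute in the high byte, and a real-mode segment:offset pair maps to a linear address of segment * 16 + offset. A small sketch of both calculations follows; the helper names text_cell and linear_address are made up for illustration.

```c
/* Build one text-mode cell: character in the low byte, attribute in
   the high byte. text_cell is a hypothetical helper name. */
unsigned short text_cell(char ch, unsigned char attr)
{
	return (unsigned short)((unsigned char)ch | ((unsigned)attr << 8));
}

/* Real-mode linear address of seg:off, i.e. seg * 16 + off.
   linear_address is a hypothetical helper name. */
unsigned long linear_address(unsigned short seg, unsigned short off)
{
	return ((unsigned long)seg << 4) + off;
}
```

With attribute 0x07 (light grey on black), 'H' becomes 0x0748, and segment 0xB800 with offset 0 is linear address 0xB8000.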


Note that I tried to do the same thing by putting a call to memcpy() in the interrupt handler and passing it the offsets of global far pointers:

Code:
void interrupt far copier(void)
{
	memcpy((void*)FP_OFF(dst), (void*)FP_OFF(src), count);
}

However, this didn't work because often enough the offset is zero and C will burp up a null-pointer run-time error!
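To see why the offset so easily comes out as zero, split a far pointer value into its two halves. The sketch below models a far pointer as a 32-bit number with the segment in the high word and the offset in the low word; far_seg and far_off are hypothetical stand-ins for what FP_SEG and FP_OFF return, written so they compile anywhere.

```c
/* Model a far pointer as a 32-bit value: segment in the high 16 bits,
   offset in the low 16 bits. far_seg and far_off are hypothetical
   stand-ins for FP_SEG and FP_OFF. */
unsigned short far_seg(unsigned long farptr)
{
	return (unsigned short)((farptr >> 16) & 0xFFFFUL);
}

unsigned short far_off(unsigned long farptr)
{
	return (unsigned short)(farptr & 0xFFFFUL);
}
```

For the video buffer pointer 0xB8000000 the segment is 0xB800 and the offset is 0, so casting that offset to a near pointer gives exactly what the runtime treats as a null pointer.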

Footnotes:
[SUP]1[/SUP] Many compilers do have fmemcpy() or farmemcpy() but not Microsoft C 5.10 which is what I'm using here.
[SUP]2[/SUP] Calling an interrupt wastes 50+ clock cycles by itself.
[SUP]3[/SUP] This is not true in protected mode where segments must be declared as either code or data.
[SUP]4[/SUP] The example program assumes you are in a color text mode (CGA, EGA, or VGA), where the video refresh buffer is in segment 0xB800.
 