mattst88's code snippets

All code below is in the public domain.

Sine

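A fast x86 (x87 FPU) assembly implementation of the sine function using the first five terms of its Taylor series, evaluated in nested (Horner) form so that each extra term costs one multiply and one add or subtract. Adding more terms gives higher precision in exchange for execution time. The truncated series is only accurate near zero, so reduce the argument to a small range around zero before calling it.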

The Taylor series of sine about zero is:

$\sin (x) = x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \frac{x^{9}}{9!} - \cdots$

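; Input - [x], in radians
; Output - [x]
; Comments show the FPU stack after each instruction: st0 first, then st1
sin: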
	fld	[x]	; x
	fmul	st0,st0	; x^2
	fld	[rf9]	; 1/9! x^2
	fmul	st0,st1	; x^2/9! x^2
	fsub	[rf7]	; (-1/7! + x^2/9!) x^2
	fmul	st0,st1	; (-x^2/7! + x^4/9!) x^2
	fadd	[rf5]	; (1/5! - x^2/7! + x^4/9!) x^2
	fmul	st0,st1	; (x^2/5! - x^4/7! + x^6/9!) x^2
	fsub	[rf3]	; (-1/3! + x^2/5! - x^4/7! + x^6/9!) x^2
	fmulp	st1,st0	; (-x^2/3! + x^4/5! - x^6/7! + x^8/9!)
	fmul	[x]	; (-x^3/3! + x^5/5! - x^7/7! + x^9/9!)
	fadd	[x]	; (x - x^3/3! + x^5/5! - x^7/7! + x^9/9!)
	fstp	[x]	; store the result in x and pop the stack
	ret

; reciprocals of factorials
rf9	dq	2.7557319223985890651862166557528e-6
rf7	dq	0.0001984126984126984126984126984127
rf5	dq	0.0083333333333333333333333333333333
rf3	dq	0.16666666666666666666666666666667

x	dq	?
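
For readers who do not follow x87 assembly, here is a minimal C sketch of the same nested evaluation. The function name sin_taylor9 and the constant names are made up for this illustration and are not part of the original snippet. Each C statement corresponds to one multiply and add/subtract pair in the assembly.

```c
/* Reciprocals of factorials, matching rf3..rf9 in the assembly. */
static const double RF3 = 1.0 / 6.0;
static const double RF5 = 1.0 / 120.0;
static const double RF7 = 1.0 / 5040.0;
static const double RF9 = 1.0 / 362880.0;

/* Hypothetical helper: degree-9 Taylor polynomial of sine, nested form. */
double sin_taylor9(double x)
{
    double x2 = x * x;      /* x^2, reused at every step */
    double p  = RF9;        /* 1/9! */
    p = p * x2 - RF7;       /* -1/7! + x^2/9! */
    p = p * x2 + RF5;       /* 1/5! - x^2/7! + x^4/9! */
    p = p * x2 - RF3;       /* -1/3! + x^2/5! - x^4/7! + x^6/9! */
    p = p * x2;             /* -x^2/3! + x^4/5! - x^6/7! + x^8/9! */
    return p * x + x;       /* x - x^3/3! + x^5/5! - x^7/7! + x^9/9! */
}
```

Calling sin_taylor9(0.5) should agree with the true sine to roughly ten decimal places; the agreement gets worse as |x| grows.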

Cosine

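A fast x86 (x87 FPU) assembly implementation of the cosine function using the first five terms of its Taylor series, evaluated in nested (Horner) form. Adding more terms gives higher precision in exchange for execution time. As with any truncated series, the result is only accurate near zero, so reduce the argument to a small range around zero before calling it.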

The Taylor series of cosine about zero is:

$\cos (x) = 1 - \frac{x^{2}}{2!} + \frac{x^{4}}{4!} - \frac{x^{6}}{6!} + \frac{x^{8}}{8!} - \cdots$

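; Input - [x], in radians
; Output - [x]
; Comments show the FPU stack after each instruction: st0 first, then st1
cos: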
	fld	[x]	; x
	fmul	st0,st0	; x^2
	fld	[rf8]	; (1/8!) x^2
	fmul	st0,st1	; (x^2/8!) x^2
	fsub	[rf6]	; (-1/6! + x^2/8!) x^2
	fmul	st0,st1	; (-x^2/6! + x^4/8!) x^2
	fadd	[rf4]	; (1/4! - x^2/6! + x^4/8!) x^2
	fmul	st0,st1	; (x^2/4! - x^4/6! + x^6/8!) x^2
	fsub	[rf2]	; (-1/2 + x^2/4! - x^4/6! + x^6/8!) x^2
	fmulp	st1,st0	; (-x^2/2 + x^4/4! - x^6/6! + x^8/8!)
	fadd	[one]	; (1 - x^2/2 + x^4/4! - x^6/6! + x^8/8!)
	fstp	[x]	; store the result in x and pop the stack
	ret
	
; reciprocals of factorials
rf8	dq	2.4801587301587301587301587301587e-5
rf6	dq	0.0013888888888888888888888888888889
rf4	dq	0.041666666666666666666666666666667
rf2	dq	0.5

one	dq	1.0

x	dq	?
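
How much precision the truncation costs can be estimated with the standard Lagrange remainder bound. This is not from the original snippets; $S_{9}$ and $C_{8}$ are names introduced here for the truncated sine (through $x^{9}$) and cosine (through $x^{8}$) polynomials, and the bound ignores floating-point rounding.

$|\sin (x) - S_{9}(x)| \le \frac{|x|^{11}}{11!} \qquad |\cos (x) - C_{8}(x)| \le \frac{|x|^{10}}{10!}$

At $x = 1$ the bounds are about $2.5 \times 10^{-8}$ for sine and $2.8 \times 10^{-7}$ for cosine. At $x = \pi$ they grow to about $0.0074$ and $0.026$, which is why the argument should be kept close to zero.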

Absolute Value

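My favorite way to take the absolute value of a general-purpose register without branching. x86-32 and x86-64 versions are included. It takes 2 clock cycles on my Sempron 64. cdq (cqo in 64-bit mode) copies the sign bit of eax (rax) into every bit of edx (rdx), giving a mask of 0 or -1; xor-ing with the mask and then subtracting it negates a negative input and leaves any other input unchanged. The most negative value has no positive counterpart and comes back unchanged.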

x86-32:

; FASM syntax
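; Input - eax
; Output - eax
; Clobbers - edx
abs:
	cdq		; edx = 0 if eax >= 0, -1 if eax < 0
	xor eax,edx	; flip every bit if eax was negative
	sub eax,edx	; then add 1, completing the negation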
	ret

x86-64:

; FASM syntax
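; Input - rax
; Output - rax
; Clobbers - rdx
abs:
	cqo		; rdx = 0 if rax >= 0, -1 if rax < 0
	xor rax,rdx	; flip every bit if rax was negative
	sub rax,rdx	; then add 1, completing the negation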
	ret
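
The same mask trick can be written in C. The sketch below is only an illustration; the name abs_branchless is made up here. It does the arithmetic on unsigned values so that the most negative input is well defined in C, which also lets it return the true magnitude 2147483648 for that input instead of a negative number.

```c
#include <stdint.h>

/* Hypothetical C counterpart of the cdq / xor / sub sequence. */
uint32_t abs_branchless(int32_t x)
{
    /* 0 if x >= 0, 0xFFFFFFFF if x < 0: plays the role of edx after cdq. */
    uint32_t mask = (uint32_t)-(x < 0);
    uint32_t ux = (uint32_t)x;

    /* xor flips the bits of a negative value, subtracting the mask adds 1. */
    return (ux ^ mask) - mask;
}
```

For example, abs_branchless(-5) and abs_branchless(5) both give 5.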

Next Power of Two

Programming with SDL and OpenGL will sometimes require you to find the next power of two, since OpenGL textures work most efficiently when their widths and heights are powers of two.

Here's possibly the most efficient function I've seen to find the next power of two. It takes fewer than 10 clock cycles on my Sempron 64.

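; FASM syntax
; Input - eax
; Output - eax
; Clobbers - ecx
nextpow2:
	; add 'dec eax' here if you want the next power
	; of two of a power of two to be itself
	bsr ecx,eax	; ecx = index of the highest set bit, ZF = 1 if eax = 0
	mov eax,2	; mov leaves the flags alone; replace this with
			; 'mov eax,1' and the function returns the
			; previous power of two instead
	jz .end		; eax was 0: return the constant as it is
	shl eax,cl	; eax = 2 << index
	.end:
	ret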
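
Here is a rough C sketch of what the routine computes, for anyone who wants to check its results. The names next_pow2 and next_pow2_inclusive are invented for this example. A plain loop stands in for bsr, so it is nowhere near as fast as the assembly. The inclusive variant treats 0 and 1 specially and returns 1 for them, which differs from the assembly with 'dec eax' added (that returns 2 for an input of 1).

```c
#include <stdint.h>

/* Hypothetical C model of nextpow2: smallest power of two greater than v. */
uint32_t next_pow2(uint32_t v)
{
    unsigned idx = 0;

    if (v == 0)
        return 2;               /* mirrors the 'jz .end' path */
    while (v >>= 1)
        idx++;                  /* idx = index of the highest set bit */
    return (uint32_t)2 << idx;  /* wraps to 0 when bit 31 was set */
}

/* Variant where a power of two maps to itself (the 'dec eax' idea). */
uint32_t next_pow2_inclusive(uint32_t v)
{
    if (v <= 1)
        return 1;               /* assumption: 0 and 1 both map to 1 */
    return next_pow2(v - 1);
}
```

For example, next_pow2(5) is 8 and next_pow2(8) is 16, while next_pow2_inclusive(8) stays at 8.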

SDL Collision Detection

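Here's a pretty nice function to detect if two SDL_Rects collide. It returns 1 on a collision, and 0 on no collision. It works by ruling out the four ways the rectangles can miss each other. Rectangles that only touch along an edge count as colliding; change < to <= and > to >= if you want them to count as separate.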

int Collide(const SDL_Rect * a, const SDL_Rect * b) {
	if ( b->x + b->w < a->x ) return 0;	/* b is to the left of a */
	if ( b->x > a->x + a->w ) return 0;	/* b is to the right of a */
	if ( b->y + b->h < a->y ) return 0;	/* b is above a */
	if ( b->y > a->y + a->h ) return 0;	/* b is below a */
	return 1;
}
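
To see how the four tests behave, here is a small self-contained sketch. It uses a stand-in struct called Rect and a function called collide, both invented for this example so it compiles without the SDL headers; the logic is copied from Collide above.

```c
/* Stand-in for SDL_Rect (assumption: plain int fields are close enough here). */
typedef struct {
    int x, y, w, h;
} Rect;

/* Same four rejection tests as Collide, applied to the stand-in type. */
int collide(const Rect *a, const Rect *b)
{
    if (b->x + b->w < a->x) return 0;   /* b is to the left of a */
    if (b->x > a->x + a->w) return 0;   /* b is to the right of a */
    if (b->y + b->h < a->y) return 0;   /* b is above a */
    if (b->y > a->y + a->h) return 0;   /* b is below a */
    return 1;                           /* no separating gap was found */
}
```

With a 10x10 rectangle at the origin, a second rectangle at (5, 5) collides, one at (20, 0) does not, and one at (10, 0) that only shares an edge is still reported as a collision.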

SDL_Surface to OpenGL Texture

I have a feeling this function is fairly crude, but I don't know how crude it actually is. I'm using it in glPong and it seems to work fine. It could probably use a little cleanup and some optimization, though. It only supports surfaces whose width and height are powers of two, but with the next power of two function above, a simple wrapper function should take care of converting non-power-of-two (NPOT) surfaces. It returns the name of the new texture, which is left bound to GL_TEXTURE_2D.

GLuint SDL_GL_SurfaceToTexture(SDL_Surface * surface) {
	Uint32 rmask, gmask, bmask, amask;
	GLuint texture;
	GLenum format = GL_RGB;

#if SDL_BYTEORDER == SDL_BIG_ENDIAN
	rmask = 0xff000000;
	gmask = 0x00ff0000;
	bmask = 0x0000ff00;
	amask = 0x000000ff;
#else
	rmask = 0x000000ff;
	gmask = 0x0000ff00;
	bmask = 0x00ff0000;
	amask = 0xff000000;
#endif
	
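	/* Pick the GL pixel format from where the red channel sits in the
	   surface. Any other layout falls back to the GL_RGB default. */
	if (surface->format->Rmask & rmask) {
		format = GL_RGB;
		if (surface->format->BitsPerPixel == 32) {
			format = GL_RGBA;
		}
	} else if (surface->format->Rmask & bmask) {
		format = GL_BGR;
		if (surface->format->BitsPerPixel == 32) {
			format = GL_BGRA;
		}
	}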
	
	glGenTextures(1, &texture);
	glBindTexture(GL_TEXTURE_2D, texture);
	
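	SDL_LockSurface(surface);
	/* BytesPerPixel (3 or 4) is passed as the internal format; legacy
	   OpenGL accepts a component count there. */
	glTexImage2D(GL_TEXTURE_2D, 0, surface->format->BytesPerPixel, surface->w,
	             surface->h, 0, format, GL_UNSIGNED_BYTE, surface->pixels);
	SDL_UnlockSurface(surface);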
	
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
	
	return texture;
}
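
The NPOT wrapper mentioned above would need two pieces of arithmetic: the padded power-of-two size, and the texture coordinates where the original image ends inside the padded texture. Here is a sketch of just that part, in plain C. Every name in it (pow2_ceil, PaddedSize, padded_texture_size) is hypothetical. Copying the pixels into a padded surface, for instance with SDL_CreateRGBSurface and SDL_BlitSurface, is left out.

```c
/* Smallest power of two >= v (hypothetical helper; v is assumed to be
   between 1 and 2^30 so the shift cannot overflow). */
static int pow2_ceil(int v)
{
    int p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Hypothetical description of a padded texture. */
typedef struct {
    int tex_w, tex_h;       /* power-of-two texture size */
    float u_max, v_max;     /* texture coordinates of the image's far corner */
} PaddedSize;

PaddedSize padded_texture_size(int w, int h)
{
    PaddedSize s;

    s.tex_w = pow2_ceil(w);
    s.tex_h = pow2_ceil(h);
    /* The image occupies only part of the texture, so drawing it means
       sampling from 0 to u_max and 0 to v_max rather than 0 to 1. */
    s.u_max = (float)w / (float)s.tex_w;
    s.v_max = (float)h / (float)s.tex_h;
    return s;
}
```

A 640x480 surface would be padded to 1024x512, with the image ending at texture coordinates (0.625, 0.9375).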