All code below is in the public domain.
Calculate Pi with Leibniz' Formula.
Sine
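A very fast x86 (x87 FPU) assembly implementation of the sine function using the Taylor series, truncated after the x^9/9! term. The result is most accurate for x near zero. The function can be easily modified for higher precision by adding more terms, in exchange for execution time.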
According to the Taylor series, sine is defined as:
sin(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - ...
; comments show the FPU stack as: st0 | st1
sin:
    fld   [x]       ; x
    fmul  st0,st0   ; x^2
    fld   [rf9]     ; 1/9! | x^2
    fmul  st0,st1   ; x^2/9! | x^2
    fsub  [rf7]     ; (-1/7! + x^2/9!) | x^2
    fmul  st0,st1   ; (-x^2/7! + x^4/9!) | x^2
    fadd  [rf5]     ; (1/5! - x^2/7! + x^4/9!) | x^2
    fmul  st0,st1   ; (x^2/5! - x^4/7! + x^6/9!) | x^2
    fsub  [rf3]     ; (-1/3! + x^2/5! - x^4/7! + x^6/9!) | x^2
    fmulp st1,st0   ; (-x^2/3! + x^4/5! - x^6/7! + x^8/9!)
    fmul  [x]       ; (-x^3/3! + x^5/5! - x^7/7! + x^9/9!)
    fadd  [x]       ; (x - x^3/3! + x^5/5! - x^7/7! + x^9/9!)
    fstp  [x]       ; result is stored back into x
    ret

; reciprocals of factorials
rf9 dq 2.7557319223985890651862166557528e-6
rf7 dq 0.0001984126984126984126984126984127
rf5 dq 0.0083333333333333333333333333333333
rf3 dq 0.16666666666666666666666666666667
x   dq ?
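For readers who want to check the arithmetic without an assembler, here is a rough C sketch of the same nested evaluation. The function name taylor_sin is made up for this example and is not part of the original code. It follows the assembly step by step, so it has the same accuracy limits: the error is tiny near zero and grows to a few thousandths around x = 3.

```c
/* Degree-9 Taylor polynomial for sine, evaluated in the same nested
   order as the assembly routine. taylor_sin is a hypothetical name. */
double taylor_sin(double x)
{
    const double rf3 = 1.0 / 6.0;       /* 1/3! */
    const double rf5 = 1.0 / 120.0;     /* 1/5! */
    const double rf7 = 1.0 / 5040.0;    /* 1/7! */
    const double rf9 = 1.0 / 362880.0;  /* 1/9! */
    double x2 = x * x;                  /* x^2, kept in st1 in the assembly */
    double p = rf9;                     /* 1/9! */

    p = p * x2 - rf7;                   /* -1/7! + x^2/9! */
    p = p * x2 + rf5;                   /* 1/5! - x^2/7! + x^4/9! */
    p = p * x2 - rf3;                   /* -1/3! + x^2/5! - x^4/7! + x^6/9! */
    p = p * x2;                         /* -x^2/3! + x^4/5! - x^6/7! + x^8/9! */
    return p * x + x;                   /* x - x^3/3! + ... + x^9/9! */
}
```

No range reduction is done here, just as in the assembly, so arguments should be kept small for useful results.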
Cosine
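A very fast x86 (x87 FPU) assembly implementation of the cosine function using the Taylor series, truncated after the x^8/8! term. The result is most accurate for x near zero. The function can be easily modified for higher precision by adding more terms, in exchange for execution time.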
According to the Taylor series, cosine is defined as:
cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8! - ...
; comments show the FPU stack as: st0 | st1
cos:
    fld   [x]       ; x
    fmul  st0,st0   ; x^2
    fld   [rf8]     ; (1/8!) | x^2
    fmul  st0,st1   ; (x^2/8!) | x^2
    fsub  [rf6]     ; (-1/6! + x^2/8!) | x^2
    fmul  st0,st1   ; (-x^2/6! + x^4/8!) | x^2
    fadd  [rf4]     ; (1/4! - x^2/6! + x^4/8!) | x^2
    fmul  st0,st1   ; (x^2/4! - x^4/6! + x^6/8!) | x^2
    fsub  [rf2]     ; (-1/2 + x^2/4! - x^4/6! + x^6/8!) | x^2
    fmulp st1,st0   ; (-x^2/2 + x^4/4! - x^6/6! + x^8/8!)
    fadd  [one]     ; (1 - x^2/2 + x^4/4! - x^6/6! + x^8/8!)
    fstp  [x]       ;
    ret

; reciprocals of factorials
rf8 dq 2.4801587301587301587301587301587e-5
rf6 dq 0.0013888888888888888888888888888889
rf4 dq 0.041666666666666666666666666666667
rf2 dq 0.5
one dq 1.0
x   dq ?
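To give a feel for the precision versus time trade-off, here is a small C sketch with a configurable number of terms. The name taylor_cos and the terms parameter are invented for this example. Unlike the assembly, it builds each term from the previous one in a loop instead of using hard-coded reciprocals, so it is an illustration of the idea and not a translation of the routine.

```c
/* Taylor polynomial for cosine with a configurable number of terms.
   terms = 5 gives the same polynomial as the assembly routine
   (up to x^8/8!); larger values cost more multiplications but are
   more precise. taylor_cos is a hypothetical helper. */
double taylor_cos(double x, int terms)
{
    double x2 = x * x;
    double term = 1.0;   /* current term, starting at x^0/0! */
    double sum = 1.0;
    int k;

    for (k = 1; k < terms; k++) {
        /* next term = previous term * (-x^2) / ((2k-1) * 2k) */
        term *= -x2 / (double)((2 * k - 1) * (2 * k));
        sum += term;
    }
    return sum;
}
```

At x = 1, five terms are off by roughly 3e-7, while eight terms agree with the true value to better than 1e-12.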
Absolute Value
My favorite way to take the absolute value of a general-purpose register; it needs no branch. x86-32 and x86-64 versions are included. It takes 2 clock cycles on my Sempron 64.
x86-32:
; FASM syntax
; Input - eax
; Output - eax
abs:
    cdq
    xor eax,edx
    sub eax,edx
    ret
x86-64:
; FASM syntax
; Input - rax
; Output - rax
abs:
    cqo
    xor rax,rdx
    sub rax,rdx
    ret
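The same trick can be written in C, which may make it easier to see why it works: the sign is spread across a whole word (what cdq leaves in edx), then xor and subtract perform a conditional two's complement negation. This is only a sketch. The name abs_branchless is made up, and it assumes that >> on a negative int32_t is an arithmetic shift, which C leaves implementation-defined but which holds on common x86 compilers.

```c
#include <stdint.h>

/* Branchless absolute value, mirroring cdq / xor / sub.
   Assumption: v >> 31 is an arithmetic shift (implementation-defined in C). */
int32_t abs_branchless(int32_t v)
{
    /* 0 for non-negative v, 0xFFFFFFFF for negative v (edx after cdq) */
    uint32_t mask = (uint32_t)(v >> 31);

    /* xor flips all bits when v is negative; subtracting the mask
       (that is, -1) then adds 1, completing the negation */
    return (int32_t)(((uint32_t)v ^ mask) - mask);
}
```

As with the assembly, the most negative value has no positive counterpart, so on typical compilers INT32_MIN comes back unchanged.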
Next Power of Two
Programming with SDL and OpenGL will sometimes require you to find the next power of two, since OpenGL textures work most efficiently when their widths and heights are powers of two.
Here's possibly the most efficient function I've seen to find the next power of two. It takes fewer than 10 clock cycles on my Sempron 64.
; FASM syntax
; Input - eax
; Output - eax
nextpow2:
    ; dec eax if you want the next power
    ; of two of a power of two to be itself
    bsr ecx,eax     ; ecx = index of the highest set bit, ZF set if eax is 0
    mov eax,2       ; mov does not change the flags
    ; by replacing this with 'mov eax,1'
    ; this function will return the previous
    ; power of two
    jz  .end        ; input was 0: return the constant as is
    shl eax,cl
.end:
    ret
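For use from C code, such as sizing a texture, the routine can be approximated portably. The sketch below replaces bsr with a plain shift loop, so it is slower than the assembly but returns the same values. The names next_pow2 and next_pow2_inclusive are invented here, and the second function corresponds to the "dec eax" variant mentioned in the comments.

```c
#include <stdint.h>

/* Smallest power of two strictly greater than v, matching the assembly
   routine (an input of 0 returns 2). Hypothetical name. */
uint32_t next_pow2(uint32_t v)
{
    uint32_t result = 2;

    /* One shift per bit position of the highest set bit,
       which is the count that bsr produces in one instruction */
    while ((v >>= 1) != 0)
        result <<= 1;
    return result;
}

/* "dec eax" variant: a power of two maps to itself.
   Only meaningful for v >= 2, since v - 1 must have a set bit. */
uint32_t next_pow2_inclusive(uint32_t v)
{
    return next_pow2(v - 1);
}
```

With the inclusive variant, a 640 pixel wide surface would get a 1024 pixel wide texture, while a 512 pixel wide one stays at 512.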
SDL Collision Detection
Here's a pretty nice function to detect if two SDL_Rects collide. It returns 1 on a collision, and 0 on no collision.
int Collide(const SDL_Rect * a, const SDL_Rect * b)
{
    if ( b->x + b->w < a->x ) return 0;
    if ( b->x > a->x + a->w ) return 0;
    if ( b->y + b->h < a->y ) return 0;
    if ( b->y > a->y + a->h ) return 0;
    return 1;
}
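One detail worth knowing is how the comparisons treat rectangles that merely touch. The sketch below repeats the same four tests on a stand-in Rect type, so it compiles without SDL. Both the Rect struct and the name collide_rect are assumptions made for this example; the field types follow SDL 1.2's SDL_Rect.

```c
/* Stand-in for SDL_Rect so the example compiles without SDL headers
   (SDL 1.2 uses 16-bit signed x, y and 16-bit unsigned w, h). */
typedef struct {
    short x, y;
    unsigned short w, h;
} Rect;

/* Same four tests as Collide, applied to the stand-in type. */
int collide_rect(const Rect *a, const Rect *b)
{
    if (b->x + b->w < a->x) return 0;   /* b is entirely left of a  */
    if (b->x > a->x + a->w) return 0;   /* b is entirely right of a */
    if (b->y + b->h < a->y) return 0;   /* b is entirely above a    */
    if (b->y > a->y + a->h) return 0;   /* b is entirely below a    */
    return 1;
}
```

Because the tests use < and >, two rectangles that share only an edge, for example {0, 0, 10, 10} and {10, 0, 10, 10}, are reported as colliding. Changing the comparisons to <= and >= would require an actual overlap.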
SDL_Surface to OpenGL Texture
I have a feeling this function is fairly crude, but I don't know how crude it actually is. I'm using it in glPong and it seems to work fine. It could probably use a slight cleanup and some optimization, though. It only supports power-of-two surfaces, but with the next power of two function above, a simple wrapper function should take care of converting non-power-of-two (NPOT) surfaces.
GLuint SDL_GL_SurfaceToTexture(SDL_Surface * surface)
{
    Uint32 rmask, gmask, bmask, amask;
    GLuint texture;
    GLenum format = GL_RGB;

#if SDL_BYTEORDER == SDL_BIG_ENDIAN
    rmask = 0xff000000;
    gmask = 0x00ff0000;
    bmask = 0x0000ff00;
    amask = 0x000000ff;
#else
    rmask = 0x000000ff;
    gmask = 0x0000ff00;
    bmask = 0x00ff0000;
    amask = 0xff000000;
#endif

    if (surface->format->Rmask & rmask) {
        format = GL_RGB;
        if (surface->format->BitsPerPixel == 32) {
            format = GL_RGBA;
        }
    } else if (surface->format->Rmask & bmask) {
        format = GL_BGR;
        if (surface->format->BitsPerPixel == 32) {
            format = GL_BGRA;
        }
    }

    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_2D, texture);

    SDL_LockSurface(surface);
    glTexImage2D(GL_TEXTURE_2D, 0, surface->format->BytesPerPixel,
                 surface->w, surface->h, 0, format,
                 GL_UNSIGNED_BYTE, surface->pixels);
    SDL_UnlockSurface(surface);

    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    return texture;
}