Calculating Pi using Leibniz' Formula has been great fun for me to practice writing in various languages. I first wrote it in Python since I needed something nontrivial to do in my Visual Basic class in high school. It was only able to do about 400,000 iterations in an hour and a half. I'm pretty sure my x86 assembly version is a little faster. In between then I've written it in C, D, FORTRAN, Python, Lua, and Ruby. As I've learned more, I've since programmed it using SSE3 instructions and later in DEC Alpha assembly.

## C

```#include <stdio.h>

int main() {
double sum = 1.0;
double denominator = 3.0;
double numerator = 1.0;
double temp;
unsigned long counter = 0;

while (1) {
numerator = -numerator;
temp = numerator;
temp /= denominator;
sum += temp;
denominator += 2;
counter++;
if ( counter > 10000000 ) {
printf("1/%.0f - %19.17f\n", denominator, sum);
counter = 0;
}
}
return 0;
}```

Here's a version that uses threading and SDL for output.

## x86 Assembly using the FPU (FASM syntax)

```start:
finit
fild [two]
fild [one]
fild [one]
fild [one]

; align top of the loop to cache line size
top:
fxch st(1)
fchs
fld st0
fdiv st0, st(2)
fstp st0
fxch st(1)

; unroll loop as needed

jmp top

ret

one dd 1
two dd 2```

## x86-64 Assembly using SSE3 (FASM Syntax)

The heading says x86-64, but it could easily be modified to run on x86. It computes two terms in parallel using double precision.

```format ELF64

extrn	printf
public	main

section '.text' executable align 64

main:
push	rbp
mov	rbp,rsp
movdqa	xmm6,dqword[denom]
movdqa	xmm2,dqword[numer]
movdqa	xmm4,xmm2
movdqa	xmm5,dqword[zero]
mov	r12,10000000

align 64
top:
divpd	xmm2,xmm6
movdqa	xmm2,xmm4

; unroll loop as needed

sub	r12,1
jnz	top

mov	r12,10000000
movdqa	xmm1,xmm5
movdqa	xmm0,xmm6

mov	eax,2
mov	edi,msg
call	printf

jmp	top

leave

ret

section '.data' writable align 16

denom:	dq	1.0,3.0
numer:	dq	4.0,-4.0
zero:	dq	0.0,0.0

msg	db	"1/%.0f - %50.48f",0xA,0```

## DEC Alpha Assembly (GAS Syntax)

The following code was written and tested in order to study RISC architectures on my Alphaserver DS20L.

```.data
F_ONE:	.t_floating	1.0
PRINT:	.ascii		"1/%.0f - %19.17f\n"

.text
.align	4
.set	noreorder
.arch	ev6
.globl	main
.ent	main

main:	ldgp	\$gp,0(\$27)
stq	\$26,0(\$sp)

ldt	\$f2,F_ONE		# \$f2 num = 1.0
ldq	\$10,COUNT		# \$10 = counter
cpyse	\$f2,\$f2,\$f3		# \$f3 sum = 1.0
addt	\$f2,\$f2,\$f4		# \$f4 incrementor = 2.0
ldq	\$9,ONE			# \$9 = 1
mov	\$10,\$2			# load counter into \$2
addt	\$f4,\$f3,\$f5		# \$f5 denom = 3.0

.align	4
loop:	cpysn	\$f2,\$f2,\$f2		# num = -num
subq	\$2,\$9,\$2		# counter -= 1
divt	\$f2,\$f5,\$f10		# temp = num / denom
addt	\$f5,\$f4,\$f5		# denom += 2.0
addt	\$f3,\$f10,\$f3		# sum += temp

bne	\$2,loop			# if counter == 0, loop

lda	\$16,PRINT		# load format string
cpyse	\$f5,\$f5,\$f17		# \$f17 = denom
cpyse	\$f3,\$f3,\$f18		# \$f18 = sum
jsr	\$26,printf		# call printf
ldgp	\$gp,0(\$26)		# reload global pointer

mov	\$10,\$2			# reset counter