11 January 2009 - The Case for Open Sourcing Alpha-Optimized Libraries

As long as I've been interested in Alpha hardware, I've been intrigued by Compaq's Alpha-optimized compilers and libraries. In some cases, the compilers produce code multiple times faster than the code gcc produces. The math library, libcpml, contains functions that execute in half the time of their glibc equivalents.

Since the abandonment of the Alpha platform, this code has languished. In some cases, the performance gap between Compaq's tools and their open source counterparts has shrunk. In others, the benefits of hand-tuned assembly still shine.

This prompted me to contact HP and request the release of the code. Unfortunately, they concluded that an old MIPS license prevented them from releasing the compilers. I've recently contacted HP again to persuade them to release libcpml and libots as free software, since libraries containing nothing but hand-tuned Alpha assembly could not be encumbered by that license. I also attached the following benchmarks as evidence of why this code is still valuable so many years after it was written.

Using a test suite I wrote, I benchmarked the implementations of the math functions in glibc against those in libcpml. Lower numbers mean faster execution.
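The test suite itself is not reproduced here, so the following is only a minimal sketch of how one such measurement might be taken, not the code that produced these numbers. The names `unary_fn`, `time_unary` and `square` are hypothetical, and reporting milliseconds is an assumption.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <stddef.h>
#include <time.h>

/* Hypothetical type: any one-argument math function such as sin or exp. */
typedef double (*unary_fn)(double);

/* Stand-in so the sketch builds without libm or libcpml; a real run
   would pass sin, cos, exp and so on instead. */
double square(double x)
{
    return x * x;
}

/* Time `iters` calls of `f` and return the elapsed time in milliseconds
   (the unit is an assumption). */
double time_unary(unary_fn f, size_t iters)
{
    struct timespec start, end;
    volatile double sink = 0.0;   /* keeps the calls from being optimized away */
    double x = 0.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < iters; i++) {
        sink += f(x);
        x += 1e-6;                /* vary the argument between calls */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    (void)sink;
    return (double)(end.tv_sec - start.tv_sec) * 1e3
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Under these assumptions, a caller would pass `sin` in place of `square` and build the same file twice, once linked with -lm and once with -lcpml, to get a pair of timings for one function.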

Function Library  Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9 Run 10   Average  Speed Up (cpml over glibc)
sin      glibc      311    292    312    310    297    304    307    300    312    312    305.11
sin      cpml       156    155    148    150    152    154    152    151    151    163    152.89    99.56%
cos      glibc      251    246    245    243    392    251    240    240    252    240       261
cos      cpml       156    151    153    160    159    160    146    149    151    157       154    69.48%
tan      glibc     9384   9345   9351   9252   9195   9273   9237   9272   9213   9239   9264.11
tan      cpml       172    169    168    166    168    173    170    176    183    175       172  5286.11%
sinh     glibc      305    296    296    302    291    295    437    295    294    298    311.56
sinh     cpml       141    136    140    139    135    139    137    140    136    139    137.89   125.95%
cosh     glibc      352    327    316    363    351    338    352    358    329    362       344
cosh     cpml       138    139    137    141    140    142    138    138    144    138    139.67   146.29%
tanh     glibc      260    258    270    260    266    260    263    257    265    256    261.67
tanh     cpml       212    203    199    203    203    198    210    195    212    198    202.33    29.33%
asin     glibc     1434   1197   1227   1300   1346   1323   1390   1227   1274   1244   1280.89
asin     cpml       627    611    581    641    612    596    660    586    620    692    622.11   105.89%
acos     glibc     1034   1207   1054   1015   1015   1031   1068    994   1051    964   1044.33
acos     cpml       621    625    657    635    610    638    587    623    617    614    622.89    67.66%
atan     glibc      932    860    904    904    866    902    948    879    908    880    894.56
atan     cpml       566    536    538    519    538    513    521    533    526    497    524.56    70.54%
asinh    glibc      784    743    749    741    742    773    751    726    784    743    750.22
asinh    cpml       519    513    506    494    527    494    494    529    495    510    506.89    48.00%
acosh    glibc      912    866    855    785    990    823    865    820    845    836    853.89
acosh    cpml       954    912    898    946    905    904    898    920    928    885    910.67    -6.23%
atanh    glibc     1125   1939   1071   1053   1079   1085   1068    996   1062   1143   1166.22
atanh    cpml       875    898    851    828    912    864    843    871   1355    851    919.22    26.87%
floor    glibc       88     82     82     79     76     91     87     84     84     86     83.44
floor    cpml       121    123    128    120    121    119    117    120    126    112    120.67   -30.85%
ceil     glibc       89     90     85     85     86     79     81     80     81     80        83
ceil     cpml       123    117    115    114    131    114    119    114    122    118    118.22   -29.79%
round    glibc      102     77     78     90     85     87     77     83     85     84     82.89
round    cpml       366    111    115    102    118    112    108    107    109    111    110.33   -24.87%
trunc    glibc       84     83     87     89     77     84     82     85     85     84        84
trunc    cpml       118    120    118    117    116    119    112    109    114    110       115   -26.96%
log      glibc      790    764    767    763    768    768    732    739    744    747    754.67
log      cpml       502    456    476    476    465    460    922    484    481    456    519.56    45.25%
log10    glibc      840    803    808    802    784    856    785    782    774    826    802.22
log10    cpml       527    534    549    551    536    578    535    530    540    537    543.33    47.65%
log2     glibc      493    499    522    504    478    504    519    495    499    539    506.56
log2     cpml       520    493    514    509    481    520    493    493    493    728    524.89    -3.49%
log1p    glibc      233    240    235    224    234    230    227    228    229    233    231.11
log1p    cpml       304    279    276    297    269    299    291    291    305    286    288.11   -19.78%
exp      glibc      401    357    357    403    344    407    413    475    372    401    392.11
exp      cpml       130    138    132    128    133    139    124    130    137    126    131.89   197.30%
expm1    glibc      225    205    218    208    214    213    216    211    221    208    212.67
expm1    cpml       160    165    169    157    166    162    157    164    165    157    162.44    30.92%
exp2     glibc     1339   1314   1327   1305   1339   1310   1284   1284   1334   1309   1311.78
exp2     cpml       151    149    136    140    148    137    139    149    138    138    141.56   826.66%

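As the table shows, many math functions, especially the trigonometric ones, are 50-200% faster in libcpml, and tan and exp2 are far beyond even that range. In other cases, such as the rounding functions and log1p, and to a lesser degree log2 and acosh, glibc is faster.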
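For anyone checking the arithmetic, the Average and Speed Up columns are consistent with taking the mean of runs 2 through 10 (leaving out run 1) and then computing glibc average / cpml average - 1. That reading is inferred from the numbers rather than stated anywhere, and the function names below are hypothetical. A minimal sketch:

```c
#include <stddef.h>

/* Mean of runs[1] .. runs[n-1], i.e. every run except the first.
   Skipping run 1 is an inference from the table, not a documented rule. */
double mean_skip_first(const double *runs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 1; i < n; i++)
        sum += runs[i];
    return sum / (double)(n - 1);
}

/* Speed-up of cpml over glibc in percent: (glibc / cpml - 1) * 100.
   Positive means cpml is faster, negative means glibc is faster. */
double speedup_percent(double glibc_avg, double cpml_avg)
{
    return (glibc_avg / cpml_avg - 1.0) * 100.0;
}
```

Fed the ten sin runs for each library, this gives averages of about 305.11 and 152.89 and a speed-up of about 99.56%, matching the sin rows.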

A few notes:

As further evidence of libcpml's superiority, simply linking nbench with -lcpml instead of -lm speeds up the fourier benchmark by 2.5x to 3.0x.

If you'd like to run this test yourself, here's how. (I assume you run Gentoo on your Alpha.)

If you like, email the results to me. I'd like to see what these benchmarks look like on an EV5 machine.
