mattst88's blog

The Case for Open Sourcing Alpha-Optimized Libraries

As long as I've been interested in Alpha hardware, I've been intrigued by Compaq's Alpha-optimized compilers and libraries. In some cases, the compilers produce code several times faster than gcc's. The math library, libcpml, contains functions that execute in half the time of their glibc equivalents. Since the abandonment of the Alpha platform, this code has languished. In some cases, the performance gap between Compaq's tools and their open source counterparts has shrunk. In others, the benefits of hand-tuned assembly still shine. This prompted me to contact HP and request the release of the code. They unfortunately concluded that an old MIPS license prevented them from releasing the compilers. I've recently contacted HP once again to persuade them to release libcpml and libots as free software, as libraries containing nothing but hand-tuned Alpha assembly could not be encumbered by this license. I also attached the following benchmarks as evidence of why this code is still valuable so many years after it was written.

Using a test suite I wrote, I benchmarked the implementations of math functions found in glibc against those in libcpml.

Function  Library  Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9  Run 10  Average  Speed Up (cpml over glibc)
sin       glibc    311    292    312    310    297    304    307    300    312    312     305.11
sin       cpml     156    155    148    150    152    154    152    151    151    163     152.89   99.56%
cos       glibc    251    246    245    243    392    251    240    240    252    240     261
cos       cpml     156    151    153    160    159    160    146    149    151    157     154      69.48%
tan       glibc    9384   9345   9351   9252   9195   9273   9237   9272   9213   9239    9264.11
tan       cpml     172    169    168    166    168    173    170    176    183    175     172      5286.11%
sinh      glibc    305    296    296    302    291    295    437    295    294    298     311.56
sinh      cpml     141    136    140    139    135    139    137    140    136    139     137.89   125.95%
cosh      glibc    352    327    316    363    351    338    352    358    329    362     344
cosh      cpml     138    139    137    141    140    142    138    138    144    138     139.67   146.29%
tanh      glibc    260    258    270    260    266    260    263    257    265    256     261.67
tanh      cpml     212    203    199    203    203    198    210    195    212    198     202.33   29.33%
asin      glibc    1434   1197   1227   1300   1346   1323   1390   1227   1274   1244    1280.89
asin      cpml     627    611    581    641    612    596    660    586    620    692     622.11   105.89%
acos      glibc    1034   1207   1054   1015   1015   1031   1068   994    1051   964     1044.33
acos      cpml     621    625    657    635    610    638    587    623    617    614     622.89   67.66%
atan      glibc    932    860    904    904    866    902    948    879    908    880     894.56
atan      cpml     566    536    538    519    538    513    521    533    526    497     524.56   70.54%
asinh     glibc    784    743    749    741    742    773    751    726    784    743     750.22
asinh     cpml     519    513    506    494    527    494    494    529    495    510     506.89   48.00%
acosh     glibc    912    866    855    785    990    823    865    820    845    836     853.89
acosh     cpml     954    912    898    946    905    904    898    920    928    885     910.67   -6.23%
atanh     glibc    1125   1939   1071   1053   1079   1085   1068   996    1062   1143    1166.22
atanh     cpml     875    898    851    828    912    864    843    871    1355   851     919.22   26.87%
floor     glibc    88     82     82     79     76     91     87     84     84     86      83.44
floor     cpml     121    123    128    120    121    119    117    120    126    112     120.67   -30.85%
ceil      glibc    89     90     85     85     86     79     81     80     81     80      83
ceil      cpml     123    117    115    114    131    114    119    114    122    118     118.22   -29.79%
round     glibc    102    77     78     90     85     87     77     83     85     84      82.89
round     cpml     366    111    115    102    118    112    108    107    109    111     110.33   -24.87%
trunc     glibc    84     83     87     89     77     84     82     85     85     84      84
trunc     cpml     118    120    118    117    116    119    112    109    114    110     115      -26.96%
log       glibc    790    764    767    763    768    768    732    739    744    747     754.67
log       cpml     502    456    476    476    465    460    922    484    481    456     519.56   45.25%
log10     glibc    840    803    808    802    784    856    785    782    774    826     802.22
log10     cpml     527    534    549    551    536    578    535    530    540    537     543.33   47.65%
log2      glibc    493    499    522    504    478    504    519    495    499    539     506.56
log2      cpml     520    493    514    509    481    520    493    493    493    728     524.89   -3.49%
log1p     glibc    233    240    235    224    234    230    227    228    229    233     231.11
log1p     cpml     304    279    276    297    269    299    291    291    305    286     288.11   -19.78%
exp       glibc    401    357    357    403    344    407    413    475    372    401     392.11
exp       cpml     130    138    132    128    133    139    124    130    137    126     131.89   197.30%
expm1     glibc    225    205    218    208    214    213    216    211    221    208     212.67
expm1     cpml     160    165    169    157    166    162    157    164    165    157     162.44   30.92%
exp2      glibc    1339   1314   1327   1305   1339   1310   1284   1284   1334   1309    1311.78
exp2      cpml     151    149    136    140    148    137    139    149    138    138     141.56   826.66%

As can be seen, many math (especially trigonometric) functions are 50-200% faster in libcpml. In other cases, such as the rounding functions, glibc is faster.

A few notes:

As more evidence of libcpml's superiority, by simply linking nbench with -lcpml instead of -lm, the fourier benchmark gets a speed up of 2.5x to 3.0x.

If you'd like to run this test yourself, here's how. (I assume you run Gentoo on your Alpha.)

If you like, email the results to me. I'd like to see what these benchmarks look like on an EV5 machine.

– Tags: alpha linux

UP1500s and 833 MHz Alpha CPUs For Sale

One of the hardest things about using an alternative architecture like the Alpha is the small userbase. Since very few people have Alpha hardware relative to other architectures, if one encounters a problem there are exceedingly few users able and willing to help. Even worse, if the problem is specific to your model, the chances of getting help are slimmer still. Another issue is the difficulty in finding replacement parts. Want replacement Slot B CPUs? How about the impossible-to-find UP1500? In most cases, you'd have a terrible time even finding the parts, and when you do, watch out for the price tag. Fortunately for you, I've got both of these areas covered. I've got brand new, sealed, in the box, latest revision UP1500 motherboards and unused, in the box 833 MHz 4MB Slot B CPUs for sale! Edit: Sold out.

The Samsung UP1500 is the quintessential Alpha motherboard. It sports

Unbelievably, these boards are brand new and still sealed in the box. The factory date is listed as 01/12/28. Someone packed these away in a warehouse seven years ago and forgot about them. Back then, they could have sold them at prices in excess of 2500 dollars. Bad for them. Good for you. Their loss is your gain.

This is the only Alpha to support DDR RAM, and outside of the outrageously expensive EV7 Marvel systems, the only Alpha to support AGP 4x!

At the time of this writing, I've got mine set up with 4 GB of CL2 DDR RAM, a 4-port USB 2.0 PCI card, and a Radeon X1550 PCI card. I can't pass up the AGP though, so I'm waiting to grab a Radeon 9800 Pro.

Maybe you've already got a nice Samsung Alpha motherboard, such as the UP2000[+]. Unfortunately, your really nice and expensive processors wore out after years of service, and you can't find replacements. Don't worry about replacements. I've got upgrades!

These processors are the fastest available for the UP2000! Upgrade from your old 667 MHz 2MB CPUs to a pair of 833 MHz 4MB EV68AL Slot B processors.

Disclaimer: All the parts are working to the best of my knowledge. All UP1500s are sealed in their original boxes. I've opened one for myself, and it operates beautifully. The Slot B CPUs are opened but unused.

All these parts are guaranteed not to be dead-on-arrival.

If you're an Alpha fan and would like to get your hands on the perfect Samsung motherboard or a pair of the fastest Slot B CPUs, contact me. Quantities are extremely limited. Customers are served on a first-come, first-served basis.

I sincerely hope that by putting some UP1500s and fast CPUs in the hands of Alpha users, we can band together to fix the problems we face.

– Tags: alpha linux

The State of Alpha Linux

Software is never finished; it's forgotten. There is always one more enhancement to be made or one little quirk to work out. Sometimes there are even big problems. It happens from time to time. It's expected, and it's expected that the problems will be fixed. After spending quite a bit of time recently working with Linux on the Alpha platform, I've come to realize we face some very serious problems. And unfortunately, these may not ever be fixed, putting in jeopardy the future (hah!) of Alpha/Linux. I decided to articulate these problems in an email to the Linux on Alpha Processors mailing list in order to inform and ultimately find solutions and breathe a bit of life back into Alpha/Linux. I'd like to think that Alpha/Linux isn't a piece of forgotten software, not yet.

The State of Alpha Linux

We're all subscribed to this list because we use a dying platform. We do what we can to keep it going, but in recent months the State of Alpha Linux has been deteriorating at an accelerated rate.

Let me outline some issues facing us today:

  1. We have no glibc/Alpha maintainer [1]
  2. Kernel development for Alpha is comatose
  3. We can't run modern X.Org [2]

To make things worse, for such a small group of users, we're much too segregated and disorganized. For instance, how many (of the only four) Gentoo/Alpha maintainers are subscribed to this list? Debian/Alpha? How many realized we were without a glibc maintainer? That we can't use X.Org 7.4?

If this trend continues, we will first lose X.Org support completely. I even had an X.Org developer tell me he didn't care [about Alpha support] when I pinged him about an Alpha bug he had originally filed [3]!

We'll later lose glibc support. As it stands now, Alpha isn't even in the main tree [4]. I'm not sure what version Debian ships, but Gentoo is 3 versions behind at 2.6.1. Newer than that and the test suite causes a hard lock [5]. How much longer is it going to be before 2.6 is incompatible with the latest version and we begin to lose the ability to use other modern software?

While we may never lose kernel support, it will certainly begin to lag behind other platforms more and more. Bugs begin to take longer and longer to be fixed [6]. Release candidate kernels as late in the cycle as rc-8 of the 2.6.28 series fail to compile on Alpha [7]. This is definitely a worrying sign.

It is certainly expected that as a platform ages, it slowly loses its users and developers. In 1999, many average users knew or were interested in learning Alpha assembly language, were interested in support for Alpha among Free Software, and were interested in programming for the platform. Obviously this cannot be the case today. We don't expect that it should.

We, the ones who do wish to see our platform live on, even if only a little longer, should focus on fixing what we can and maintaining what we already have.

Whether Fedora adds Alpha as a Second Tier Architecture is trivial in comparison to these issues. We should focus on making sure we have working software for Fedora/Alpha before we consider how to properly market it.

We, the small band of Alpha users, need to work together. We have the same problems, why should we work separately on them?

In order to facilitate better communication among Alpha users and developers, please use the Alpha IRC channel on Freenode, #alpha, and the Wiki [8]. If you have unused hardware that may be useful to developers, consider donating it.

From here, it's up to us to find solutions to these problems.

Ideas and Suggestions requested.

Matt Turner

  1. http://sourceware.org/ml/libc-alpha/2008-12/msg00009.html
  2. http://bugs.freedesktop.org/show_bug.cgi?id=17801
    http://bugzilla.kernel.org/show_bug.cgi?id=10893
  3. http://bugs.freedesktop.org/show_bug.cgi?id=19026
  4. http://sources.redhat.com/bugzilla/show_bug.cgi?id=6896
  5. Actually a kernel problem,
    http://bugs.gentoo.org/show_bug.cgi?id=205099
  6. http://bugzilla.kernel.org/show_bug.cgi?id=10893
  7. http://lkml.org/lkml/2008/10/29/69
  8. http://alphalinux.org/wiki/index.php/Main_Page

What can we do? I think there are a couple of things we need to do, namely:

If we can do these things, we will be on the road to recovery.

– Tags: alpha gentoo glibc linux xorg

Does Anyone Care About Fixing Bugs?

As time goes on, alternative architectures like Alpha and PA-RISC slowly lose their userbase. Experienced developers move on to things that interest them more. Emphasis isn't put on fixing bugs for these aging platforms, and the level of support slowly erodes. Eventually a small hardcore userbase is all that is left. The Gentoo Bugzilla showed this effect on the Alpha platform. All nontrivial bugs were left to rot. What's worse, many bugs were so old that the software containing them wasn't even in Portage anymore, yet no one closed the bug report or asked if it was fixed. One, a two-and-a-half-year-old bug about a failing cipher algorithm in libmcrypt, caught my eye. I decided I'd give fixing it a shot.

The project's KNOWN-BUGS file stated

- cast-256 and rc6 do not work properly on Alpha (64 bit) machines

Fittingly, the bug was filed by a developer who has since retired. An automated test suite included with libmcrypt reported a failing cipher, CAST-256. Maybe it's a bug with gcc. Months pass. If it is, it's a bug across both 3.x and 4.x series. Years pass. Maybe we'll just mask the failure.

No one seemed to want to fire up vi and check the code.

I decided I'd compile the same version side by side on my AMD64 desktop and my UP1500 Alpha. Both compile cleanly, and I can reproduce the failing case quickly. The first thing I check is the test suite itself by adding print statements and comparing the output between the AMD64 and Alpha systems. All the start-up code looks fine. The problem has to be in the library itself, which is what I expect.

Finally, I find that the results begin to vary during a function call to _mcrypt_set_key. Continuing, I slowly isolate the failing code to the k_rnd macro, then the f1 macro, and finally to the rotl32 macro.

The rotl32 macro rotates bits left in a 32-bit memory cell. The macro and its siblings look like

#define rotl32(x,n)   (((x) << ((word32)(n))) | ((x) >> (32 - (word32)(n))))
#define rotr32(x,n)   (((x) >> ((word32)(n))) | ((x) << (32 - (word32)(n))))
#define rotl16(x,n)   (((x) << ((word16)(n))) | ((x) >> (16 - (word16)(n))))
#define rotr16(x,n)   (((x) >> ((word16)(n))) | ((x) << (16 - (word16)(n))))

I confirmed that this macro did yield different results on AMD64 and Alpha by writing a small test program. Guessing that this implementation simply wasn't compatible with Alpha, I figured I could easily find another working implementation. The Linux kernel's include/linux/bitops.h file, however, had virtually the same implementation. No luck there.

After a few hours of scouring the internet for quick-fix solutions, I turned to the Alpha Architecture Handbook and looked up Alpha's shift instructions, sll and srl.

SxL   Ra.rq,Rb.rq,Rc.wq
Rc <- LEFT_SHIFT (Rav, Rbv<5:0>)      !SLL
Rc <- RIGHT_SHIFT(Rav, Rbv<5:0>)      !SRL

Beyond the terse syntax, this means that only six bits of the shift argument matter. The designers did this because with the Alpha's 64-bit wide registers, it doesn't make sense to implement instructions (and circuitry) to shift more than 63 times. Just the same, the rotl32 macro is only supposed to operate on 32-bit numbers, so it doesn't make sense to rotate more than 31 times.

The result of rotating 32 times should be the same as the input, since it would rotate the bits the entire width of the field. On Alpha, though, there are more than 32 bits in each register, so shifting 32 times doesn't leave the bits in place. It moves them into the upper part of the register.

By masking the shift argument and ignoring all but the low five bits (four for the 16-bit variants), I fixed the problem.

#define rotl32(x,n)   (((x) << ((word32)(n & 31))) | ((x) >> (32 - (word32)(n & 31))))
#define rotr32(x,n)   (((x) >> ((word32)(n & 31))) | ((x) << (32 - (word32)(n & 31))))
#define rotl16(x,n)   (((x) << ((word16)(n & 15))) | ((x) >> (16 - (word16)(n & 15))))
#define rotr16(x,n)   (((x) >> ((word16)(n & 15))) | ((x) << (16 - (word16)(n & 15))))

This bug didn't affect AMD64, since it has 32-bit shift instructions as well as 64-bit. Undoubtedly though, had this been a problem on AMD64 instead of an obscure and aging architecture such as Alpha, it would have been fixed in a heartbeat.

It's amazing that such a simple fix was needed to squash a bug that (1) was reported by a Gentoo/Alpha developer, and (2) had been in the tracker for two-and-a-half years.

Now, I need to check on that Kernel code. Who knows how long it's contained this bug!

– Tags: alpha gentoo linux

Status of X11 on Alpha

As mentioned yesterday, X.Org 7.4 (xserver-1.5 and newer) cannot operate on Alpha due to the way it accesses PCI resources such as ROM information and video memory. Kernel Bug 10893 was filed 6 months ago, but nothing has been fixed. A work-around is to implement a fallback in libpciaccess that would access /dev/mem directly, as previous Xservers do. Unfortunately, no one appears to care enough about X support on Alpha to implement it.

Julien Cristau (jcristau), an X.Org developer, originally reported the implications of providing no fallback in the Debian bug tracking system. After failing to find it reported anywhere in FreeDesktop.org's Bugzilla, I reported it there. On the #xorg-devel IRC channel, I asked jcristau if he could add anything to the bug report.

<mattst88> jcristau, if you could add anything to bug 19026, I'd really appreciate it.
<jcristau> mattst88: honestly i don't really care..

Not a good sign. Discouraged, I worked on something else for a half hour. I came back to IRC to see that developers had been discussing the fallback. No one was particularly enthusiastic. I asked Adam Jackson (ajax) of Red Hat why he opposed the fallback.

<mattst88> do we not want this fallback just on principle or because no one really cares to write it?
<ajax> mattst88: can't it be both?
<mattst88> sure, but is a temporary fallback really unacceptable?
<ajax> it's distasteful. i'm not going to write it. if someone else did, i probably wouldn't stop them.

I figured at this point I'd bother Ian Romanick, the libpciaccess maintainer, a bit more to see if I could get anything done. Before I had the chance though, David Airlie, responsible for all sorts of X development, responded.

<airlied> mattst88: also kms doesn't use the sysfs files
<airlied> or pciaccess.

The obvious implication of this statement is that once KMS (kernel modesetting) is implemented, lacking PCI resource files won't matter!

Unfortunately, it's not as quick and easy as we'd hope.

<airlied> but I need to revisit the whole mapping VRAM into unpriv userspace on those bonghits platforms.
<mattst88> right, so it should allow people to use radeons without fbdev, but isn't the radeon driver going to use sysfs/libpciaccess?
<airlied> mattst88: not with kms.
<airlied> userspace drivers in kms don't get access to all of VRAM
<airlied> or to registers.
<mattst88> so with kms, all this business about fallbacks and sysfs won't matter?
<airlied> no, however we have a whole new set of worries.
<airlied> mattst88: things like alpha sparsemem means we can't map VRAM into userspace on those platforms nicely.
<airlied> I need to read up more on the drug induced haze that is alpha mmio
<mattst88> is it doable? that is, are you at all interested in doing it? :)
<airlied> mattst88: I'm probably having to figure out how it might all work for IA64.
<mattst88> is that a similar situation to alpha?
<airlied> well its bad in that you can crap out certain machines if you allow users to access the mmio space.
<airlied> so its a DoS.

As always, there's work to be done, but this time it looks like there's someone who is actually going to do the work.

If anyone is interested in testing kernel modesetting with an R300, R400, or R500, check out David Airlie's drm-rawhide branch of his drm-2.6 kernel tree on Kernel.org.

I'll attempt to test with my Radeon X1550 and UP1500 motherboard soon and will report what I find.

– Tags: alpha linux radeon xorg