As long as I've been interested in Alpha hardware, I've been intrigued by Compaq's Alpha-optimized compilers and libraries. In some cases, the compilers produce code multiple times faster than the code gcc generates. The math library, libcpml, contains functions that execute in half the time of their glibc equivalents. Since the abandonment of the Alpha platform, this code has languished. In some cases, the performance gap between Compaq's tools and their open source counterparts has shrunk. In others, the benefits of hand-tuned assembly still shine. This prompted me to contact HP and request the release of the code. They unfortunately concluded that an old MIPS license prevented them from releasing the compilers. I've recently contacted HP once again to persuade them to release libcpml and libots as free software, as libraries containing nothing but hand-tuned Alpha assembly could not be encumbered by this license. I also attached the following benchmarks as evidence of why this code is still valuable so many years after it was written.
Using a test suite I wrote, I benchmarked the implementations of the math functions found in glibc against those in libcpml.
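| Function | Library | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Average | Speed Up (cpml over glibc) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sin | glibc | 311 | 292 | 312 | 310 | 297 | 304 | 307 | 300 | 312 | 312 | 305.11 | |
| sin | cpml | 156 | 155 | 148 | 150 | 152 | 154 | 152 | 151 | 151 | 163 | 152.89 | 99.56% |
| cos | glibc | 251 | 246 | 245 | 243 | 392 | 251 | 240 | 240 | 252 | 240 | 261 | |
| cos | cpml | 156 | 151 | 153 | 160 | 159 | 160 | 146 | 149 | 151 | 157 | 154 | 69.48% |
| tan | glibc | 9384 | 9345 | 9351 | 9252 | 9195 | 9273 | 9237 | 9272 | 9213 | 9239 | 9264.11 | |
| tan | cpml | 172 | 169 | 168 | 166 | 168 | 173 | 170 | 176 | 183 | 175 | 172 | 5286.11% |
| sinh | glibc | 305 | 296 | 296 | 302 | 291 | 295 | 437 | 295 | 294 | 298 | 311.56 | |
| sinh | cpml | 141 | 136 | 140 | 139 | 135 | 139 | 137 | 140 | 136 | 139 | 137.89 | 125.95% |
| cosh | glibc | 352 | 327 | 316 | 363 | 351 | 338 | 352 | 358 | 329 | 362 | 344 | |
| cosh | cpml | 138 | 139 | 137 | 141 | 140 | 142 | 138 | 138 | 144 | 138 | 139.67 | 146.29% |
| tanh | glibc | 260 | 258 | 270 | 260 | 266 | 260 | 263 | 257 | 265 | 256 | 261.67 | |
| tanh | cpml | 212 | 203 | 199 | 203 | 203 | 198 | 210 | 195 | 212 | 198 | 202.33 | 29.33% |
| asin | glibc | 1434 | 1197 | 1227 | 1300 | 1346 | 1323 | 1390 | 1227 | 1274 | 1244 | 1280.89 | |
| asin | cpml | 627 | 611 | 581 | 641 | 612 | 596 | 660 | 586 | 620 | 692 | 622.11 | 105.89% |
| acos | glibc | 1034 | 1207 | 1054 | 1015 | 1015 | 1031 | 1068 | 994 | 1051 | 964 | 1044.33 | |
| acos | cpml | 621 | 625 | 657 | 635 | 610 | 638 | 587 | 623 | 617 | 614 | 622.89 | 67.66% |
| atan | glibc | 932 | 860 | 904 | 904 | 866 | 902 | 948 | 879 | 908 | 880 | 894.56 | |
| atan | cpml | 566 | 536 | 538 | 519 | 538 | 513 | 521 | 533 | 526 | 497 | 524.56 | 70.54% |
| asinh | glibc | 784 | 743 | 749 | 741 | 742 | 773 | 751 | 726 | 784 | 743 | 750.22 | |
| asinh | cpml | 519 | 513 | 506 | 494 | 527 | 494 | 494 | 529 | 495 | 510 | 506.89 | 48.00% |
| acosh | glibc | 912 | 866 | 855 | 785 | 990 | 823 | 865 | 820 | 845 | 836 | 853.89 | |
| acosh | cpml | 954 | 912 | 898 | 946 | 905 | 904 | 898 | 920 | 928 | 885 | 910.67 | -6.23% |
| atanh | glibc | 1125 | 1939 | 1071 | 1053 | 1079 | 1085 | 1068 | 996 | 1062 | 1143 | 1166.22 | |
| atanh | cpml | 875 | 898 | 851 | 828 | 912 | 864 | 843 | 871 | 1355 | 851 | 919.22 | 26.87% |
| floor | glibc | 88 | 82 | 82 | 79 | 76 | 91 | 87 | 84 | 84 | 86 | 83.44 | |
| floor | cpml | 121 | 123 | 128 | 120 | 121 | 119 | 117 | 120 | 126 | 112 | 120.67 | -30.85% |
| ceil | glibc | 89 | 90 | 85 | 85 | 86 | 79 | 81 | 80 | 81 | 80 | 83 | |
| ceil | cpml | 123 | 117 | 115 | 114 | 131 | 114 | 119 | 114 | 122 | 118 | 118.22 | -29.79% |
| round | glibc | 102 | 77 | 78 | 90 | 85 | 87 | 77 | 83 | 85 | 84 | 82.89 | |
| round | cpml | 366 | 111 | 115 | 102 | 118 | 112 | 108 | 107 | 109 | 111 | 110.33 | -24.87% |
| trunc | glibc | 84 | 83 | 87 | 89 | 77 | 84 | 82 | 85 | 85 | 84 | 84 | |
| trunc | cpml | 118 | 120 | 118 | 117 | 116 | 119 | 112 | 109 | 114 | 110 | 115 | -26.96% |
| log | glibc | 790 | 764 | 767 | 763 | 768 | 768 | 732 | 739 | 744 | 747 | 754.67 | |
| log | cpml | 502 | 456 | 476 | 476 | 465 | 460 | 922 | 484 | 481 | 456 | 519.56 | 45.25% |
| log10 | glibc | 840 | 803 | 808 | 802 | 784 | 856 | 785 | 782 | 774 | 826 | 802.22 | |
| log10 | cpml | 527 | 534 | 549 | 551 | 536 | 578 | 535 | 530 | 540 | 537 | 543.33 | 47.65% |
| log2 | glibc | 493 | 499 | 522 | 504 | 478 | 504 | 519 | 495 | 499 | 539 | 506.56 | |
| log2 | cpml | 520 | 493 | 514 | 509 | 481 | 520 | 493 | 493 | 493 | 728 | 524.89 | -3.49% |
| log1p | glibc | 233 | 240 | 235 | 224 | 234 | 230 | 227 | 228 | 229 | 233 | 231.11 | |
| log1p | cpml | 304 | 279 | 276 | 297 | 269 | 299 | 291 | 291 | 305 | 286 | 288.11 | -19.78% |
| exp | glibc | 401 | 357 | 357 | 403 | 344 | 407 | 413 | 475 | 372 | 401 | 392.11 | |
| exp | cpml | 130 | 138 | 132 | 128 | 133 | 139 | 124 | 130 | 137 | 126 | 131.89 | 197.30% |
| expm1 | glibc | 225 | 205 | 218 | 208 | 214 | 213 | 216 | 211 | 221 | 208 | 212.67 | |
| expm1 | cpml | 160 | 165 | 169 | 157 | 166 | 162 | 157 | 164 | 165 | 157 | 162.44 | 30.92% |
| exp2 | glibc | 1339 | 1314 | 1327 | 1305 | 1339 | 1310 | 1284 | 1284 | 1334 | 1309 | 1311.78 | |
| exp2 | cpml | 151 | 149 | 136 | 140 | 148 | 137 | 139 | 149 | 138 | 138 | 141.56 | 826.66% |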
As can be seen, many math (especially trigonometric) functions are 50-200% faster in libcpml. In other cases, such as the rounding functions, glibc is faster.
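For anyone checking the arithmetic: the Speed Up column appears to be the glibc average divided by the cpml average, minus one, expressed as a percentage, and the Average column appears to be the mean of runs 2 through 10 (the first run left out). Both are inferences from the numbers in the table, not something stated in the post. A minimal sketch in C, using the sin rows as input; the function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Mean of runs[first..n-1]; the caller decides how many leading runs to skip. */
double mean_from(const double *runs, size_t n, size_t first)
{
    assert(first < n);
    double sum = 0.0;
    for (size_t i = first; i < n; i++)
        sum += runs[i];
    return sum / (double)(n - first);
}

/* Speed-up of cpml over glibc in percent: 100% means cpml takes half the time. */
double speedup_percent(double glibc_avg, double cpml_avg)
{
    assert(cpml_avg > 0.0);
    return (glibc_avg / cpml_avg - 1.0) * 100.0;
}

/* The sin rows from the table above. */
static const double sin_glibc[10] = {311, 292, 312, 310, 297, 304, 307, 300, 312, 312};
static const double sin_cpml[10]  = {156, 155, 148, 150, 152, 154, 152, 151, 151, 163};
```

Calling `mean_from(sin_glibc, 10, 1)` and `mean_from(sin_cpml, 10, 1)` and passing the two results to `speedup_percent` reproduces the sin row's averages and speed-up to within rounding. A negative result means glibc was the faster of the two.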
A few notes:
Testing was done on my UP1500 with an 800 MHz EV68AL, 8MB L2 cache, and 4 GB RAM
It may not be fair to benchmark ceil/floor as their implementations in glibc are not correct
I don't entirely trust the glibc tan results, as they appear to be 50x slower than libcpml
As more evidence of libcpml's superiority, by simply linking nbench with -lcpml instead of -lm, the fourier benchmark gets a speed up of 2.5x to 3.0x.
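To give an idea of what such a comparison involves, here is a minimal sketch of a timing loop in C. It is not the test suite used for the table above; `time_calls` and `dummy_workload` are hypothetical names, and the stand-in workload exists only so the sketch builds without any math library.

```c
#include <time.h>

/* Hypothetical stand-in workload so the sketch builds without a math library. */
double dummy_workload(double x)
{
    return x * 0.5 + 1.0;
}

/* Call fn `calls` times on slowly varying input and return elapsed CPU seconds. */
double time_calls(double (*fn)(double), long calls)
{
    volatile double sink = 0.0;   /* keeps the compiler from discarding the calls */
    double x = 0.001;
    clock_t start = clock();
    for (long i = 0; i < calls; i++) {
        sink += fn(x);
        x += 0.001;
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To compare the two libraries, one would pass a function such as `sin` from `<math.h>` in place of `dummy_workload` and build the same program twice, once linked with `-lm` and once with `-lcpml`, on the assumption that libcpml exports the standard names (which the nbench result suggests).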
If you'd like to run this test yourself, here's how. (I assume you run Gentoo on your Alpha.)
One of the hardest things about using an Alternative Architecture like the Alpha is the small userbase. Since very few people have Alpha hardware, relative to other architectures, if one encounters a problem there are exceedingly few users able and willing to help. Even worse, if the problem is specific to your model, the chances of getting help are slimmer still. Another issue is the difficulty in finding replacement parts. Want replacement Slot B CPUs? How about the impossible-to-find UP1500? In most cases, you'd have a terrible time even finding the parts, and when you do, watch out for the price tag. Fortunately for you, I've got both of these areas covered. I've got brand new, sealed, in the box, latest revision UP1500 motherboards and unused, in the box 833 MHz 4MB Slot B CPUs for sale! Edit: Sold out.
The Samsung UP1500 is the quintessential Alpha motherboard. It sports
an 800 MHz EV68AL Alpha processor, with 8 MB of L2 cache
the latest revision (Rev B4) AMD-761 (Irongate-2) chipset
up to 4 GB of ECC Registered DDR PC2100 RAM
an AGP 4x slot
other niceties such as ATA-100, on-board Ethernet, Sound, USB, and 3 PCI slots
Unbelievably, these boards are brand new and still sealed in the box. The factory date is listed as 01/12/28. Someone packed these away in a warehouse seven years ago and forgot about them. Back then, they could have sold them at prices in excess of 2500 dollars. Bad for them. Good for you. Their loss is your gain.
This is the only Alpha to support DDR RAM, and outside of the outrageously expensive EV7 Marvel systems, the only Alpha to support AGP 4x!
At the time of this writing, I've got mine set up with 4 GB of CL2 DDR RAM, a 4 port USB 2.0 PCI card, and a Radeon X1550 PCI card. I can't pass up the AGP though, so I'm waiting to grab a Radeon 9800 Pro.
Maybe you've already got a nice Samsung Alpha motherboard, such as the UP2000[+]. Unfortunately, your really nice and expensive processors wore out after years of service, and you can't find replacements. Don't worry about replacements. I've got upgrades!
These processors are the fastest available for the UP2000! Upgrade from your old 667 MHz 2MB CPUs to a pair of 833 MHz 4MB EV68AL Slot B processors.
Disclaimer: All the parts are working to the best of my knowledge. All UP1500s are sealed in their original boxes. I've opened one for myself, and it operates beautifully. The Slot B CPUs are opened but unused.
All these parts are guaranteed not to be dead-on-arrival.
If you're an Alpha fan and would like to get your hands on the perfect Samsung motherboard or a pair of the fastest Slot B CPUs, contact me. Quantities are extremely limited. Customers are served on a first-come, first-served basis.
I sincerely hope that by putting some UP1500s and fast CPUs in the hands of Alpha users, we can band together to fix the problems we face.
Software is never finished; it's forgotten. There is always one more enhancement to be made or one little quirk to work out. Sometimes there are even big problems. It happens from time to time. It's expected, and it's expected that the problems will be fixed. After spending quite a bit of time recently working with Linux on the Alpha platform, I've come to realize we face some very serious problems. And unfortunately, these may not ever be fixed, putting in jeopardy the future (hah!) of Alpha/Linux. I decided to articulate these problems in an email to the Linux on Alpha Processors mailing list in order to inform and ultimately find solutions and breathe a bit of life back into Alpha/Linux. I'd like to think that Alpha/Linux isn't a piece of forgotten software, not yet.
The State of Alpha Linux
We're all subscribed to this list because we use a dying platform. We do what we can to keep it going, but in recent months the State of Alpha Linux has been deteriorating at an accelerated rate.
Let me outline some issues facing us today:
We have no glibc/Alpha maintainer [1]
Kernel development for Alpha is comatose
We can't run modern X.Org [2]
To make things worse, for such a small group of users, we're much too segregated and disorganized. For instance, how many (of the only four) Gentoo/Alpha maintainers are subscribed to this list? Debian/Alpha? How many realized we were without a glibc maintainer? That we can't use X.Org 7.4?
If this trend continues, we will first lose X.Org support completely. I even had an X.Org developer tell me he didn't care [about Alpha support] when I pinged him about an Alpha bug he had originally filed [3]!
We'll later lose glibc support. As it stands now, Alpha isn't even in the main tree [4]. I'm not sure what version Debian ships, but Gentoo is 3 versions behind at 2.6.1. Newer than that and the test suite causes a hard lock [5]. How much longer is it going to be before 2.6 is incompatible with the latest version and we begin to lose the ability to use other modern software?
While we may never lose kernel support, it will certainly begin to lag behind other platforms more and more. Bugs begin to take longer and longer to be fixed [6]. Release candidate kernels as late in the cycle as rc-8 of the 2.6.28 series fail to compile on Alpha [7]. This is definitely a worrying sign.
It is certainly expected that as a platform ages, it slowly loses its users and developers. In 1999, many average users knew or were interested in learning Alpha assembly language, were interested in support for Alpha among Free Software, and were interested in programming for the platform. Obviously this cannot be the case today. We don't expect that it should.
We, the ones who do wish to see our platform live on, even if only a little longer, should focus on fixing what we can and maintaining what we already have.
Whether Fedora adds Alpha as a Second Tier Architecture is trivial in comparison to these issues. We should focus on making sure we have working software for Fedora/Alpha before we consider how to properly market it.
We, the small band of Alpha users, need to work together. We have the same problems, so why should we work separately on them?
In order to facilitate better communication among Alpha users and developers, please use the Alpha IRC channel on Freenode, #alpha, and the Wiki [8]. If you have unused hardware that may be useful to developers, consider donating it.
From here, it's up to us to find solutions to these problems.
What can we do? I think there are a couple things we need to do, namely:
Consolidate our efforts by consolidating distributions. With as few users as we have, we have fewer developers. There's no use in testing packages on Debian or Fedora when they're already tested in Gentoo.
Demand that Alpha remain supported. Projects, including projects integral to the Linux desktop such as X.Org, need to know that we do still use Alpha hardware and that we want to be supported. Make yourself heard in #xorg-devel and appropriate mailing lists.
Experienced developers need to take the lead. We understand that it's hard to justify time spent working on Alpha-related issues. We do not ask much. We just ask that you not abandon us.
If we can do these things, we will be on the road to recovery.
As time goes on, alternative architectures like Alpha and PA-RISC slowly lose their userbase. Experienced developers move on to things that interest them more. Emphasis isn't put on fixing bugs for these aging platforms, and the level of support slowly erodes. Eventually a small hardcore userbase is all that is left. The Gentoo Bugzilla showed this effect on the Alpha platform. All nontrivial bugs were left to rot. What's worse, many bugs were so old that the software containing them wasn't even in Portage anymore, yet no one closed the bug report or asked if it was fixed. One, a two-and-a-half-year-old bug about a failing cipher algorithm in libmcrypt, caught my eye. I decided I'd give fixing it a shot.
The project's KNOWN-BUGS file stated
- cast-256 and rc6 do not work properly on Alpha (64 bit) machines
Fittingly, the bug was filed by a developer who has since retired. An automated test suite included with libmcrypt reported a failing cipher, CAST-256. Maybe it's a bug with gcc. Months pass. If it is, it's a bug across both 3.x and 4.x series. Years pass. Maybe we'll just mask the failure.
No one seemed to want to fire up vi and check the code.
I decided I'd compile the same version side by side on my AMD64 desktop and my UP1500 Alpha. Both compile cleanly, and I can reproduce the failing case quickly. The first thing I check is the test suite itself by adding print statements and comparing the output between the AMD64 and Alpha systems. All the start-up code looks fine. The problem has to be in the library itself, which is what I expect.
Finally, I find that the results begin to vary during a function call to _mcrypt_set_key. Continuing, I slowly isolate the failing code to the k_rnd macro, then the f1 macro, and finally to the rotl32 macro.
The rotl32 macro rotates bits left in a 32-bit memory cell. The macro and its siblings look like
I confirmed that this function did yield different results on AMD64 and Alpha by writing a small test program. Guessing, I figured that this implementation wasn't compatible with Alphas and that I could easily find another working implementation. In the Linux Kernel's include/linux/bitops.h file, they had virtually the same implementation. No luck there.
After a few hours of scouring the internet for quick-fix solutions, I turned to the Alpha Architecture Handbook and looked up Alpha's shift instructions, sll and srl.
Beyond the terse syntax, this means that only six bits of the shift argument matter. The designers did this because with the Alpha's 64-bit wide registers, it doesn't make sense to implement instructions (and circuitry) to shift more than 63 times. Just the same, the rotl32 macro is only supposed to operate on 32-bit numbers, so it doesn't make sense to rotate more than 31 times.
The result of rotating 32 times should be the same as the number input, since it would rotate the bits the entire width of the field. On Alpha, though, there are more than 32 bits in each register, so shifting 32 times doesn't leave the bits in place. It moves them into the upper part of the register.
By masking the shift argument and ignoring all but the first five bits, I fixed the problem.
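The effect can be modeled on any machine. The sketch below is not the libmcrypt source or the actual patch: it assumes the macro had the common `(x << n) | (x >> (32 - n))` shape, emulates 64-bit shifts that honor only the low six bits of the count, and compares that with a version that masks the count to five bits first. The function names and the rotate count of 36 are chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 64-bit shift that uses only the low six bits of the count. */
static uint64_t alpha_sll(uint64_t v, uint64_t count) { return v << (count & 63); }
static uint64_t alpha_srl(uint64_t v, uint64_t count) { return v >> (count & 63); }

/* Assumed shape of the unmasked rotate, evaluated with 64-bit shifts. */
uint32_t rotl32_unmasked_on_alpha(uint32_t x, uint32_t n)
{
    /* 32u - n wraps around as an unsigned 32-bit value when n > 32. */
    uint64_t wide = alpha_sll(x, n) | alpha_srl(x, (uint64_t)(32u - n));
    return (uint32_t)wide;   /* keep only the low 32 bits */
}

/* Rotate with the count reduced to five bits, so it always stays in 0..31. */
uint32_t rotl32_masked(uint32_t x, uint32_t n)
{
    n &= 31;
    return (x << n) | (x >> ((32u - n) & 31));
}

/* Small demonstration of the difference. */
void rotl32_demo(void)
{
    uint32_t x = 0x12345678u;
    assert(rotl32_unmasked_on_alpha(x, 36) == 0);   /* every bit is shifted out */
    assert(rotl32_masked(x, 36) == 0x23456781u);    /* same as rotating by 4 */
    assert(rotl32_masked(x, 32) == x);              /* full turn leaves x unchanged */
}
```

With a count of 36, the left shift pushes every bit above bit 31 and the right shift (count 60 after wrap-around) clears the value, so the unmasked version returns zero. A 32-bit shift that honors only five bits of the count would have rotated by 4 instead, which is what the masked version does explicitly.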
This bug didn't affect AMD64, since it has 32-bit shift instructions as well as 64-bit. Undoubtedly though, had this been a problem on AMD64 instead of an obscure and aging architecture such as Alpha, it would have been fixed in a heartbeat.
It's amazing that such a simple fix was needed to squash a bug that (1) was reported by a Gentoo/Alpha developer, and (2) had been in the tracker for two-and-a-half years.
Now, I need to check on that Kernel code. Who knows how long it's contained this bug!
As mentioned yesterday, X.Org 7.4 (xserver-1.5 and newer) cannot operate on Alpha due to the way it accesses PCI resources such as ROM information and video memory. Kernel Bug 10893 was filed 6 months ago, but nothing has been fixed. A work-around is to implement a fallback in libpciaccess that would access /dev/mem directly, as previous Xservers do. Unfortunately, no one appears to care enough about X support on Alpha to implement it.
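For illustration only, the general shape of such a fallback might look like the following C sketch. It is not libpciaccess code, and `map_bar` and its parameters are hypothetical. It assumes the preferred route is mapping a per-device sysfs resource file, and that the fallback maps the same region through /dev/mem at the region's physical base address.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path and map len bytes starting at offset; returns NULL on any failure. */
static void *try_map(const char *path, off_t offset, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: try the sysfs resource file first (the region starts
 * at offset 0 of that file), then fall back to the raw memory device at the
 * region's physical base address.
 */
void *map_bar(const char *sysfs_resource, const char *devmem,
              off_t phys_base, size_t len)
{
    void *p = try_map(sysfs_resource, 0, len);
    if (p == NULL)
        p = try_map(devmem, phys_base, len);
    return p;
}
```

A caller would pass something like a device's `resource0` path under /sys/bus/pci/devices and `/dev/mem`, then release the mapping with `munmap` when done. Mapping /dev/mem requires root, and whether this kind of fallback is acceptable at all is exactly what is debated below.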
Julien Cristau (jcristau), an X.Org developer, originally reported the implications of providing no fallback to the Debian bug tracking system. After failing to find it reported anywhere in FreeDesktop.org's Bugzilla, I reported it. On the #xorg-devel IRC channel, I asked jcristau if he could add anything to the bug report.
<mattst88> jcristau, if you could add anything to bug 19026, I'd really appreciate it.
<jcristau> mattst88: honestly i don't really care..
Not a good sign. Discouraged, I worked on something else for a half hour. I came back to IRC to see that developers had been discussing the fallback. No one was particularly enthusiastic. I asked Adam Jackson (ajax) of Red Hat why he opposed the fallback.
<mattst88> do we not want this fallback just on principle or because no one really cares to write it?
<ajax> mattst88: can't it be both?
<mattst88> sure, but is a temporary fallback really unacceptable?
<ajax> it's distasteful. i'm not going to write it. if someone else did, i probably wouldn't stop them.
I figured at this point I'd bother Ian Romanick, the libpciaccess maintainer, a bit more to see if I could get anything done. Before I had the chance though, David Airlie, responsible for all sorts of X development, responded.
<airlied> mattst88: also kms doesn't use the sysfs files
<airlied> or pciaccess.
The obvious implication of this statement is that once KMS (kernel modesetting) is implemented, lacking PCI resource files won't matter!
Unfortunately, it's not as quick and easy as we'd hope.
<airlied> but I need to revisit the whole mapping VRAM into unpriv userspace on those bonghits platforms.
<mattst88> right, so it should allow people to use radeons without fbdev, but isn't the radeon driver going to use sysfs/libpciaccess?
<airlied> mattst88: not with kms.
<airlied> userspace drivers in kms don't get access to all of VRAM
<airlied> or to registers.
<mattst88> so with kms, all this business about fallbacks and sysfs won't matter?
<airlied> no, however we have a whole new set of worries.
<airlied> mattst88: things like alpha sparsemem means we can't map VRAM into userspace on those platforms nicely.
<airlied> I need to read up more on the drug induced haze that is alpha mmio
<mattst88> is it doable? that is, are you at all interested in doing it? :)
<airlied> mattst88: I'm probably having to figure out how it might all work for IA64.
<mattst88> is that a similar situation to alpha?
<airlied> well its bad in that you can crap out certain machines if you allow users to access the mmio space.
<airlied> so its a DoS.
As always, there's work to be done, but this time it looks like there's someone who is actually going to do the work.
If anyone is interested in testing kernel modesetting with an R300, R400, or R500, check out David Airlie's drm-rawhide branch of his drm-2.6 kernel tree on Kernel.org.
I'll attempt to test with my Radeon X1550 and UP1500 motherboard soon and will report what I find.