mattst88's blog

https://mattst88.com/ mattst88@gmail.com (Matt Turner) Thu, 13 May 2021 00:00:00 -0400 2024-05-02T22:49:18-04:00 GNOME 40 available in Gentoo <a href="https://forty.gnome.org/">GNOME 40</a> was released at the end of March, and yesterday I added the last bits of it to Gentoo. You may not think that's fast, and you'd be right, but it's a lot faster than any GNOME release has been added to Gentoo that I can recall. I wasn't looking to become Gentoo's GNOME maintainer when I joined the team 18 months ago. I only wanted to use a GNOME release that was a little less stale. So how did I get here? <blockquote class="twitter-tweet"> I asked about the GNOME 3.26 status when 3.28 and 3.30 were already out. Repeat that story until I got tired of waiting and added myself to the Gentoo/GNOME team and started updating glib... then I started updating mutter and gnome-shell... then I started updating everything... — Matt Turner (@mattst88) <a href="https://twitter.com/mattst88/status/1388347267099279360?ref_src=twsrc%5Etfw">May 1, 2021</a> </blockquote> GNOME has two major releases per year (in March and September), so to be more than two major releases behind is significant. At least two of my coworkers on the Mesa team at Intel switched to Gentoo for one reason or another, but ultimately switched back to their old distribution because Gentoo's GNOME packages were so out of date. That was pretty disappointing to hear, but I sympathized with them. I maintain the X11/Wayland stack in Gentoo, and I think I do a good job of keeping on top of it. I make upstream releases of X packages and contribute to Mesa professionally so I'm often able to make the upstream and downstream changes at the same time. 
But for GNOME I was just a user who happened to be a Gentoo Developer, so I started by just poking and asking if there was anything I could do to help. Unfortunately the answer was "no" nearly every time. So I just watched and occasionally asked how things were going. And occasionally GNOME updates happened, but the gap between Gentoo and upstream never really closed. GNOME 3.26 was added to Gentoo, and before significant progress was made on adding 3.28 or 3.30 a new major version 3.32 was released upstream. It looked like we were just treading water. What's worse, there were multiple unofficial overlays often providing newer versions of GNOME than what the ::gentoo repository contained. For reasons that were never clear to me, it seemed that none of the external overlay contributors (one of whom was a full Gentoo Developer!) were willing or able to collaborate with the Gentoo GNOME team. I started small by adding new versions of GNOME packages and making <a href="https://github.com/gentoo/gentoo/pull/10996">pull requests</a> on GitHub for more experienced GNOME team members to review. Unfortunately by this time, the GNOME team had only one active member. I joined the GNOME team in October 2019 and worked around the edges, doing small version bumps of non-critical packages. Since most of the GNOME packages were behind, I began adding the next major GNOME's <a href="https://en.wikipedia.org/wiki/GLib">glib</a> to the tree to get extra testing. I figured that if that additional testing caught issues before they could block the rest of GNOME from being updated, I could save us some time. That worked out pretty well, and I felt a little more confident so I added the next major GNOME's mutter and gnome-shell. Kind of scary. But that worked out well too. Users tested, filed bugs, and I fixed them. 
And since the most critical GNOME packages entered the ::gentoo repo long before the ancillary applications we didn't have any big surprises when it was time to ask for stabilization. Initially I had no idea which packages were related or if there were particular problems to look out for. This knowledge existed only in the head of one Gentoo Developer, so as I squeezed it out of him (as I made mistakes and he let me know!) I began <a href="https://wiki.gentoo.org/wiki/Project:GNOME/GNOME_Bumping_Guide">documenting it on the Wiki</a>. As I updated packages, I encountered various build system bugs. Gentoo naturally uncovers problems binary distributions don't notice. Whenever possible, I made a merge request upstream so that the next time we added a new version we wouldn't have to carry a patch. So far I've had <a href="https://gitlab.gnome.org/dashboard/merge_requests?scope=all&state=merged&author_username=mattst88">13 merge requests accepted</a>! Starting on March 20 I added the first bits of GNOME 40 to the tree (glib and some other packages are often released before the official release date). I added glib first, and then I figured I couldn't break anything too badly if I just bumped the GNOME games. I added gnome-shell (behind package.mask), and then sort of forgot that's where I normally stopped. Less than 8 weeks later, all of GNOME is entirely up to date in Gentoo! The bookends of adding GNOME 40 are commits <a href="https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=71e9245b05e62da290db311ea9f5d6cf7bab288c">71e9245b05e6</a> and <a href="https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=b93e3e581161111eba0a64d68376aea1fad7dfbd">b93e3e581161</a>. In that time I made 610 commits. The vast majority are GNOME-related (511 of them by my count). 
Categorized, they are: <ul> <li>2 reverted commits (both mine)</li> <li>229 commits adding new package versions</li> <li>152 commits dropping old package versions</li> <li>3 commits adding new packages</li> <li>7 commits adding support for Python 3.9</li> <li>118 miscellaneous commits fixing, cleaning, masking, unmasking</li> </ul> Those commits closed 120 bugs (and referenced 21 more), which made a nice dent in the Gentoo GNOME team's bug backlog. At the time of this writing, there are 514 bugs assigned to the GNOME team or with the GNOME team in the Cc list. By default, Bugzilla only shows 500 bugs on a single page, so the GNOME bug list doesn't even fit. That was a bit of a psychological hurdle for me to get started. It'll be a nice moment when we get to the other side of 500. I hope that with the gap to upstream now closed that some other Gentoo Developers and users will be more willing to help contribute. GNOME fell behind in Gentoo because it was too much work for a single person to maintain sustainably. I've remedied the most glaring symptom of the situation but not the underlying problem. Reach out to me if you'd like to help! Because it's fun to look at, here's the <a href="/../blog/2021/05/13/GNOME_40_available_in_Gentoo/gnome-40.0.html">output</a> of our <a href="https://gitweb.gentoo.org/proj/gentoo-bumpchecker.git/tree/gnome-bumpchecker.py">gnome-bumpchecker.py</a> tool, showing that we're indeed up-to-date on everything. <iframe src="/../blog/2021/05/13/GNOME_40_available_in_Gentoo/gnome-40.0.html" height="300"></iframe> https://mattst88.com/blog/2021/05/13/GNOME_40_available_in_Gentoo/ mattst88@gmail.com (Matt Turner) Thu, 17 May 2012 00:00:00 -0400 2024-05-02T22:49:18-04:00 Optimizing pixman for Loongson: Process and Results The <a href="http://www.lemote.com/en/products/Notebook/2010/0310/112.html">Lemote Yeeloong</a> is a small notebook that is often the computer of choice for Free Software advocates, including Richard Stallman. 
It's powered by an 800 MHz STMicroelectronics <a href="https://en.wikipedia.org/wiki/Loongson#Loongson_2F">Loongson 2F</a> processor and has an antiquated Silicon Motion 712 graphics chip. The SM712's acceleration features are pretty subpar for today's standards, and performance of the old XFree86 Acceleration Architecture (<a href="https://en.wikipedia.org/wiki/XFree86_Acceleration_Architecture">XAA</a>) that supports the SM712 has slowly decayed as developers move to support newer hardware and newer acceleration architectures. In short, graphics performance of the SM712 isn't very good with new X servers, so how can we improve it? If you don't care about how pixman was optimized and just want to see the results, you can <a href="#loongson-pixman-results">skip ahead</a>. <a href="http://cgit.freedesktop.org/pixman/">pixman</a>, the pixel-manipulation library used by <a href="http://cairographics.org/">cairo</a> and X has MMX-accelerated compositing functions, written using MMX via C-level intrinsic functions, which allow the programmer to write C but still have fine-grained control over performance sensitive MMX code. Last summer I began optimizing graphics performance of the <a href="http://wiki.laptop.org/go/XO-1.75">OLPC XO-1.75</a> laptop. The Marvell processor it uses supports iwMMXt2, a 64-bit SIMD instruction set designed by Intel for their XScale ARM CPUs. The instruction set is predictably very similar to Intel's original MMX instruction set. By design, Intel's MMX intrinsics also support generating iwMMXt instructions, so that the same optimized C code will be easily portable to processors supporting iwMMXt. With a relatively small amount of work (as compared to writing compositing functions in ARM/iwMMXt assembly) I had pixman's MMX optimized code working on the XO-1.75 for some nice performance gains. The Loongson 2F processor also includes a 64-bit SIMD instruction set, very similar to Intel's MMX. 
Its SIMD instructions use the 32 floating-point registers, and like iwMMXt it provides some useful instructions not found in x86 processors until AMD's Enhanced 3DNow! or Intel's SSE instruction sets. So just like I did with the XO-1.75, I planned to use pixman's existing MMX code to optimize performance on my Yeeloong. While Intel's MMX intrinsic functions are well designed, well tested, well supported, and widely used, the <a href="https://gcc.gnu.org/onlinedocs/gcc/MIPS-Loongson-Built-in-Functions.html">Loongson intrinsics</a> are none of these. In fact, they're incomplete, badly designed, and used nowhere that I can find (indeed, all of the instances of Loongson-optimized SIMD code I have found are written in inline assembly, which is no surprise given the state of the intrinsics). Of course, the gcc manual doesn't tell me this, so I learned it only after trying to use them with pixman. <aside>Aside: let me pretend that I'm designing and implementing Loongson's vector intrinsics, covering an instruction set very similar to MMX, which already has an excellent set of intrinsic functions. Why would I create my own incompatible set, instead of implementing the same interface that lots of software already uses?!</aside> Using the Loongson vector intrinsics, pixman passed the test suite, and objdump verified that gcc was successfully generating vector instructions, but the performance was terrible. gcc apparently was not privy to the knowledge that the integer data types returned by the intrinsics were actually stored in floating-point registers, so in between any two vector instructions you might find three or four instructions that simply copied the same data back and forth between integer and floating-point registers. 
<pre>punpcklwd $f9,$f9,$f5
dmtc1 v0,$f8
punpcklwd $f19,$f19,$f5
dmfc1 t9,$f9
dmtc1 v0,$f9
dmtc1 t9,$f20
dmfc1 s0,$f19
punpcklbh $f20,$f20,$f2</pre> This path led nowhere, so I decided to take the hint from previous programmers and forget that the Loongson intrinsics exist. I still wanted to use pixman's MMX code, so I implemented Intel's MMX intrinsics myself using Loongson inline assembly. Object code size was significantly smaller and performance was better, in fact much better in some select functions, but overall was still a net loss. There must have been optimization opportunities that I was missing. On the XO-1.75 the MMX code is faster than the generic code, so I didn't recognize inefficiencies in the MMX code the first time I worked with it, but with the Loongson it was necessary that I find and fix them. The great thing is that optimizations to this code benefit x86/MMX, ARM/iwMMXt, and Loongson. I took a look at the book Dirty Pixels at the suggestion of pixman's maintainer, Søren Sandmann. In it, I discovered that the original MMX instruction set lacked an unsigned packed-multiply high instruction which would be useful for the over compositing operation. To work around the lack of this instruction, an extra two shifts and an add had to be used. AMD recognized this inefficiency and added the instruction in Enhanced 3DNow! and later Intel did the same with SSE. I <a href="http://cgit.freedesktop.org/pixman/commit/?id=14208344964f341a7b4a704b05cf4804c23792e9">modified</a> the <code>pix_multiply</code> function to use the new instruction, and the resulting object code size shrank by 5%. I realized that the <code>expand_alpha</code>, <code>expand_alpha_rev</code>, and <code>invert_colors</code> functions that mix and duplicate pixel components could be reduced from a combined total of around 30 instructions to a single instruction each. 
<a href="http://cgit.freedesktop.org/pixman/commit/?id=84221f4c1687b8ea14e9cbdc78b2ba7258e62c9e">This change</a> further reduced object code size by another 9%. After that, I focused on eliminating unnecessary copies to and from the vector registers. Consider this code: <pre>__m64 vsrc = load8888 (*src);</pre> The code loads <code>*src</code> into an integer register, and then <code>load8888</code> loads and expands the value into a vector register. Instead, it's simpler and faster to load from memory into a vector register directly. By counting the number of <code>dmfc1</code> (doubleword move from floating-point) and <code>dmtc1</code> (doubleword move to floating-point) instructions I could determine which functions had room for improvement. After reducing the number of unnecessary copies and adding a number of other optimizations (list available <a href="http://lists.freedesktop.org/archives/pixman/2012-May/001946.html">here</a>) I was ready to see if the Yeeloong was more usable. <a id="loongson-pixman-results"></a> Results gathered from cairo's perf-trace tool confirm the real-world performance improvements given by the pixman optimizations. The image columns show the time in seconds to complete a cairo-perf-trace workload when using 32 bits per pixel and likewise image16 for 16 bits per pixel. The first column in both image and image16 groupings is the time to complete the workload without using Loongson MMI code. The second column is time to complete the workload after pixman commit <a href="http://cgit.freedesktop.org/pixman/commit/?id=c78e986085">c78e986085</a>, the commit that turns on the Loongson MMI code. The third column is the time to complete the workload with pixman-0.25.6 which has many more optimizations. 
<table class="ralign"> <tr><td> </td><th colspan="4">image</th><th colspan="4">image16</th></tr> <tr><td>evolution</td><td>32.985</td><td>29.667</td><td>28.752</td><th>14.7% faster</th><td>27.314</td><td>23.870</td><td>22.960</td><th>19.0% faster</th></tr> <tr><td>firefox-planet-gnome</td><td>197.982</td><td>180.437</td><td>169.532</td><th>16.8% faster</th><td>220.986</td><td>205.057</td><td>199.077</td><th>11.0% faster</th></tr> <tr><td>gnome-terminal-vim</td><td>60.799</td><td>50.528</td><td>50.792</td><th>19.7% faster</th><td>51.655</td><td>44.131</td><td>43.561</td><th>18.6% faster</th></tr> <tr><td>gvim</td><td>38.646</td><td>32.552</td><td>33.570</td><th>15.1% faster</th><td>38.126</td><td>34.453</td><td>35.457</td><th>7.5% faster</th></tr> <tr><td>ocitysmap</td><td>23.065</td><td>18.057</td><td>17.516</td><th>31.7% faster</th><td>23.046</td><td>18.055</td><td>17.543</td><th>31.4% faster</th></tr> <tr><td>poppler</td><td>43.676</td><td>36.077</td><td>35.498</td><th>23.0% faster</th><td>43.065</td><td>36.090</td><td>35.534</td><th>21.2% faster</th></tr> <tr><td>swfdec-giant-steps</td><td>20.166</td><td>20.365</td><td>20.469</td><td>no change</td><td>22.354</td><td>16.578</td><td>14.473</td><th>54.4% faster</th></tr> <tr><td>swfdec-youtube</td><td>31.502</td><td>28.118</td><td>24.168</td><th>30.3% faster</th><td>44.052</td><td>41.771</td><td>38.577</td><th>14.2% faster</th></tr> <tr><td>xfce4-terminal-a1</td><td>69.517</td><td>51.288</td><td>50.838</td><th>36.7% faster</th><td>62.225</td><td>53.309</td><td>44.297</td><th>40.5% faster</th></tr> </table> May 29th edit: the % faster numbers were previously calculated as a percent difference between the initial workload times and the final workload times. I realized that this calculation's result is not strictly a metric of how much faster the code is. To calculate that, the new formula is (1/final - 1/initial) / (1/initial), which calculates the percent difference in terms of operations/second. 
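As a quick check on the table, here is a minimal C sketch of the throughput-based calculation. The function name <code>percent_faster</code> is made up for illustration; it assumes both arguments are workload times in seconds, and it uses the fact that (1/final - 1/initial) / (1/initial) simplifies to initial/final - 1.

```c
/* Hypothetical helper, not part of pixman or cairo-perf-trace.
 * Returns how much faster the final run is than the initial run, as a
 * percentage of throughput (workloads per second):
 *   (1/final - 1/initial) / (1/initial)  ==  initial/final - 1
 */
static double percent_faster(double initial_seconds, double final_seconds)
{
    return (initial_seconds / final_seconds - 1.0) * 100.0;
}
```

Calling <code>percent_faster(32.985, 28.752)</code> with the evolution image times gives roughly 14.7, matching the table.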
This number is % faster. The table has been updated accordingly. As the results show, real-world performance is improved by the Loongson MMI code. I can tell a difference when using GNOME 3 (in fallback mode) on my Yeeloong. So far this has been very successful. I've optimized pixman on an interesting platform, learned a new instruction set, and in the process found many opportunities to optimize the MMX code on x86 and ARM. I still see a bunch of things to work on with just these compositing operations alone. Beyond that, there are many other things to do like bilinear and nearest scaling functions (which are extremely important for Firefox performance). And beyond that, I've improved my understanding of pixman's code and have a few ideas for improvements in general. Thanks to <ul> <li>Danny Clark, who runs <a href="http://freedomincluded.com/">freedomincluded.com</a>, for providing me with a Lemote Yeeloong laptop for my work on Gentoo's MIPS port.</li> <li>Søren Sandmann and Siarhei Siamashka for reviewing and helping me improve my code.</li> </ul> https://mattst88.com/blog/2012/05/17/Optimizing_pixman_for_Loongson:_Process_and_Results/ mattst88@gmail.com (Matt Turner) Tue, 02 Aug 2011 00:00:00 -0400 2024-05-02T22:49:18-04:00 New multilib N32 Gentoo MIPS Stages Gentoo/MIPS has been in, well, <a href="https://bugs.gentoo.org/show_bug.cgi?id=348653">not great shape</a> for quite some time. When I was going through Gentoo recruitment, there were no stages (used for installing Gentoo) newer than 2008, so this was one of the main things I wanted to improve, specifically by creating new N32 ABI stages. Even though the N32 (meaning New 32-bit) ABI was introduced in IRIX in 1996 to replace SGI's o32 (Old 32-bit) ABI, Linux support for N32 has lagged behind until the last few years. Now, I'm pleased to unofficially announce new multilib N32 stages and that we'll be supporting N32 as the preferred ABI. 
MIPS has three main ABIs: o32 (32-bit integer and pointer), N32 (64-bit integer, 32-bit pointer), N64 (64-bit integer and pointer). Compared with N32 and N64, o32 is very restrictive. Very few function arguments are passed in registers; only half of the floating-point registers are usable; there is no native 64-bit integer datatype; and there is no long double type (see <a href="http://techpubs.sgi.com/library/manuals/2000/007-2816-005/pdf/007-2816-005.pdf">SGI's MIPSpro N32 ABI Handbook</a> for details). Offering N32 as the default ABI means better performance, sometimes 30% better, just by removing the unnecessary restrictions a 32-bit ABI imposes on 64-bit CPUs. Providing multilib stages (i.e., stages with glibc and gcc built for all three ABIs) gives the user flexibility to switch to another ABI relatively easily if desired, while also allowing him to reduce build times by switching to an N32-only profile. The process of creating N32 (and especially multilib) stages wasn't straightforward. Our profiles were long unmaintained and in many cases totally broken. There were lots of keywording bugs open for mips, many where MIPS was the last team to complete the request by years. There were actually some real bugs discovered too, like <a href="https://bugs.gentoo.org/show_bug.cgi?id=354877">354877</a> and <a href="https://bugs.gentoo.org/show_bug.cgi?id=358149">358149</a>, usually caused by the incorrect assumption that the lib directory is always a symlink to lib32. All in all, I've reduced the number of open bugs for MIPS down to ~20. Work needed to be done to <a href="https://wiki.gentoo.org/wiki/Catalyst">catalyst</a>, Gentoo's release building tool. Since the end of June, I've made 15 commits cleaning, fixing, and adding to the mips support code in catalyst. Other developers like <a href="http://blog.hartwork.org/">Sebastian Pipping</a> have also resumed work on a project that had otherwise been minimally maintained since the beginning of the year. 
The last major component in reviving Gentoo's MIPS support is to create installation media, preferably in an automated manner. I've acquired two <a href="/computers/bcm91250a/">Broadcom BCM91250A</a> MIPS development boards (and should be receiving a third soon), but they need disks, controllers, RAM, and cases. For that, I wrote a funding <a href="https://bugs.gentoo.org/attachment.cgi?id=281843">Proposal to build three MIPS development computers (pdf)</a> and had it approved by the Gentoo Foundation. Things seem to be going well in acquisitions (<a href="https://bugs.gentoo.org/show_bug.cgi?id=373241">track progress</a>) so I hope to have the project completed in the next few months with the systems automatically building stages for a wide variety of MIPS systems. Initially, I used a big-endian 2006.1 N32 stage and had to bootstrap my system with a series of at least 20 hacks (not a fun experience) until it was usable enough that I was able to build a clean N32 stage. From there, using crossdev I built a multilib toolchain, and with a few more hacks I was able to build a multilib stage. With that in the past, I've been building stages that can be used to seed the automated stage creation system to come. 
At this point, my TODO list looks like this: <ul> <li>Big Endian <ul> <li>multilib <ul> <li>(done) mips3 -mfix-r4000 -mfix-r4400 (for SGI Indy and Indigo2)</li> <li>(done) mips4 (for SGI Indy and O2)</li> <li>(done) r10k (for SGI Indigo2, Octane)</li> <li>(done) mips64 (for Broadcom Sibyte systems)</li> </ul> </li> <li>o32 <ul> <li>  mips32 (for embedded mips systems)</li> <li>  mips1 (for everything else)</li> </ul> </li> </ul> </li> <li>Little Endian <ul> <li>multilib <ul> <li>  mips3 -Wa,-mfix-loongson2f-nop (for Loongson 2 systems)</li> <li>  mips4 (for Cobalt systems)</li> <li>  mips64 (for Loongson 3, Broadcom Sibyte systems)</li> </ul> </li> <li>o32 <ul> <li>  mips32 (for embedded mips systems)</li> <li>  mips1 (for everything else)</li> </ul> </li> </ul> </li> </ul> The final touches will be to create bootable media like CD, USB, and netboot images. All stages are available in the experimental/mips/stages/ directory (as soon as the files propagate) of a <a href="http://www.gentoo.org/main/en/mirrors2.xml">Gentoo Mirror</a>. Hopefully by the time I'm able to convince Lemote (or, who?) to send me a <a href="https://en.wikipedia.org/wiki/Loongson#Loongson_3A_laptop">Loongson 3A laptop</a>, installing and using Gentoo/MIPS will be a fun and pleasant experience. https://mattst88.com/blog/2011/08/02/New_multilib_N32_Gentoo_MIPS_Stages/ mattst88@gmail.com (Matt Turner) Sun, 14 Dec 2008 00:00:00 -0500 2024-05-02T22:49:18-04:00 The State of Alpha Linux Software is never finished; it's forgotten. There is always one more enhancement to be made or one little quirk to work out. Sometimes there are even big problems. It happens from time to time. It's expected, and it's expected that the problems will be fixed. After spending quite a bit of time recently working with Linux on the Alpha platform, I've come to realize we face some very serious problems. And unfortunately, these may not ever be fixed, putting in jeopardy the future (hah!) of Alpha/Linux. 
I decided to articulate these problems in an email to the <a href="http://www.redhat.com/mailman/listinfo/axp-list">Linux on Alpha Processors</a> mailing list in order to inform and ultimately find solutions and breathe a bit of life back into Alpha/Linux. I'd like to think that Alpha/Linux isn't a piece of forgotten software, not yet. <blockquote> <h3>The State of Alpha Linux</h3> We're all subscribed to this list because we use a dying platform. We do what we can to keep it going, but in recent months the State of Alpha Linux has been deteriorating at an accelerated rate. Let me outline some issues facing us today: <ol> <li>We have no glibc/Alpha maintainer [1]</li> <li>Kernel development for Alpha is comatose</li> <li>We can't run modern X.Org [2]</li> </ol> To make things worse, for such a small group of users, we're much too segregated and disorganized. For instance, how many (of the only four) Gentoo/Alpha maintainers are subscribed to this list? Debian/Alpha? How many realized we were without a glibc maintainer? That we can't use X.Org 7.4? If this trend continues, we will first completely lose X.Org support. I even had an X.Org developer tell me he didn't care [about Alpha support] when I pinged him about an Alpha bug he had originally filed [3]! We'll later lose glibc support. As it stands now, Alpha isn't even in the main tree [4]. I'm not sure what version Debian ships, but Gentoo is 3 versions behind at 2.6.1. Newer than that and the test suite causes a hard lock [5]. How much longer is it going to be before 2.6 is incompatible with the latest version and we begin to lose the ability to use other modern software? While we may never lose kernel support, it will certainly begin to lag behind other platforms more and more. Bugs begin to take longer and longer to be fixed [6]. Release candidate kernels as late in the cycle as rc-8 of the 2.6.28 series fail to compile on Alpha [7]. This is definitely a worrying sign. 
It is certainly expected that as a platform ages, it slowly loses its users and developers. In 1999, many average users knew or were interested in learning Alpha assembly language, were interested in support for Alpha among Free Software, and were interested in programming for the platform. Obviously this cannot be the case today. We don't expect that it should. We, the ones who do wish to see our platform live on, even if only a little longer, should focus on fixing what we can and maintaining what we already have. Whether Fedora adds Alpha as a Second Tier Architecture is trivial in comparison to these issues. We should focus on making sure we have working software for Fedora/Alpha before we consider how to properly market it. We, the small band of Alpha users, need to work together. We have the same problems, why should we work separately on them? In order to facilitate better communication among Alpha users and developers, please use the Alpha IRC channel on Freenode, <code>#alpha</code>, and the Wiki [8]. If you have unused hardware that may be useful to developers, consider donating it. From here, it's up to us to find solutions to these problems. Ideas and Suggestions requested. 
Matt Turner <ol> <li><a href="http://sourceware.org/ml/libc-alpha/2008-12/msg00009.html">http://sourceware.org/ml/libc-alpha/2008-12/msg00009.html</a></li> <li><a href="http://bugs.freedesktop.org/show_bug.cgi?id=17801">http://bugs.freedesktop.org/show_bug.cgi?id=17801</a> <a href="http://bugzilla.kernel.org/show_bug.cgi?id=10893">http://bugzilla.kernel.org/show_bug.cgi?id=10893</a></li> <li><a href="http://bugs.freedesktop.org/show_bug.cgi?id=19026">http://bugs.freedesktop.org/show_bug.cgi?id=19026</a></li> <li><a href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=6896">http://sources.redhat.com/bugzilla/show_bug.cgi?id=6896</a></li> <li>Actually a kernel problem, <a href="http://bugs.gentoo.org/show_bug.cgi?id=205099">http://bugs.gentoo.org/show_bug.cgi?id=205099</a></li> <li><a href="http://bugzilla.kernel.org/show_bug.cgi?id=10893">http://bugzilla.kernel.org/show_bug.cgi?id=10893</a></li> <li><a href="http://lkml.org/lkml/2008/10/29/69">http://lkml.org/lkml/2008/10/29/69</a></li> <li><a href="http://alphalinux.org/wiki/index.php/Main_Page">http://alphalinux.org/wiki/index.php/Main_Page</a></li> </ol> </blockquote> What can we do? I think there are a couple things we need to do, namely: <ul> <li>Consolidate our efforts by consolidating distributions. With as few users as we have, we have fewer developers. There's no use in testing packages on Debian or Fedora when they're already tested in Gentoo.</li> <li>Demand that Alpha remain supported. Projects, including projects integral to the Linux desktop such as X.Org, need to know that we do still use Alpha hardware and that we want to be supported. Make yourself heard in <code>#xorg-devel</code> and appropriate mailing lists.</li> <li>Experienced developers need to take the lead. We understand that it's hard to justify time spent working on Alpha-related issues. We do not ask much. We just ask that you not abandon us.</li> </ul> If we can do these things, we will be on the road to recovery. 
https://mattst88.com/blog/2008/12/14/The_State_of_Alpha_Linux/ mattst88@gmail.com (Matt Turner) Sat, 13 Dec 2008 00:00:00 -0500 2024-05-02T22:49:18-04:00 Does Anyone Care About Fixing Bugs? As time goes on, alternative architectures like Alpha and PA-RISC slowly lose their userbase. Experienced developers move on to things that interest them more. Emphasis isn't put on fixing bugs for these aging platforms, and the level of support slowly erodes. Eventually a small hardcore userbase is all that is left. The <a href="http://bugs.gentoo.org/">Gentoo Bugzilla</a> showed this effect on the Alpha platform. All nontrivial bugs were left to rot. What's worse, many bugs were so old that the software containing them wasn't even in Portage anymore, yet no one closed the bug report or asked if it was fixed. One, a two-and-a-half-year-old bug about a failing cipher algorithm in <a href="http://mcrypt.sourceforge.net/">libmcrypt</a> caught my eye. I decided I'd give fixing it a shot. The project's KNOWN-BUGS file stated <blockquote>- cast-256 and rc6 do not work properly on Alpha (64 bit) machines</blockquote> Fittingly, <a href="https://bugs.gentoo.org/132356">the bug</a> was filed by a developer who has since retired. An automated test suite included with libmcrypt reported a failing cipher, <a href="https://en.wikipedia.org/wiki/CAST-256">CAST-256</a>. Maybe it's a bug with gcc. Months pass. If it is, it's a bug across both 3.x and 4.x series. Years pass. Maybe we'll just mask the failure. No one seemed to want to fire up vi and check the code. I decided I'd compile the same version side by side on my AMD64 desktop and my UP1500 Alpha. Both compile cleanly, and I can reproduce the failing case quickly. The first thing I check is the test suite itself by adding print statements and comparing the output between the AMD64 and Alpha systems. All the start-up code looks fine. The problem has to be in the library itself, which is what I expect. 
Finally, I find that the results begin to vary during a function call to _mcrypt_set_key. Continuing, I slowly isolate the failing code to the k_rnd macro, then the f1 macro, and finally to the rotl32 macro. The rotl32 macro rotates bits left in a 32-bit memory cell. The macro and its siblings look like <pre>#define rotl32(x,n) (((x) << ((word32)(n))) | ((x) >> (32 - (word32)(n))))
#define rotr32(x,n) (((x) >> ((word32)(n))) | ((x) << (32 - (word32)(n))))
#define rotl16(x,n) (((x) << ((word16)(n))) | ((x) >> (16 - (word16)(n))))
#define rotr16(x,n) (((x) >> ((word16)(n))) | ((x) << (16 - (word16)(n))))</pre> I confirmed that this function did yield different results on AMD64 and Alpha by writing a small test program. Guessing, I figured that this implementation wasn't compatible with Alphas and that I could easily find another working implementation. In the Linux Kernel's include/linux/bitops.h file, they had virtually the same implementation. No luck there. After a few hours of scouring the internet for quick-fix solutions, I turned to the Alpha Architecture Handbook and looked up Alpha's shift instructions, sll and srl. <pre>SxL Ra.rq,Rb.rq,Rc.wq
Rc <- LEFT_SHIFT (Rav, Rbv<5:0>) !SLL
Rc <- RIGHT_SHIFT(Rav, Rbv<5:0>) !SRL</pre> Beyond the terse syntax, this means that only six bits of the shift argument matter. The designers did this because with the Alpha's 64-bit wide registers, it doesn't make sense to implement instructions (and circuitry) to shift more than 63 times. Just the same, the rotl32 macro is only supposed to operate on 32-bit numbers, so it doesn't make sense to rotate more than 31 times. The result of rotating 32 times should be the same as the number input, since it would rotate the bits the entire width of the field. On Alpha though there are more than 32 bits in each register, so shifting 32 times doesn't leave the bits in place. It moves them into the upper part of the register. 
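To make the difference concrete, here is a rough C model of the two kinds of shifter. This is only a sketch under stated assumptions, not libmcrypt code and not what gcc actually emits: all function names are made up, the "wide" model assumes a 32-bit value is zero-extended into a 64-bit register, shifted using the low six bits of the count, and truncated back to 32 bits, and the "narrow" model assumes a 32-bit shifter that uses only the low five bits of the count.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit shifter that honors only the low six bits
 * of the count (as the handbook describes for SLL/SRL). The 32-bit input is
 * zero-extended and the result is truncated back to 32 bits. */
static uint32_t shl_wide(uint32_t x, uint32_t n)
{
    return (uint32_t)((uint64_t)x << (n & 63));
}

static uint32_t shr_wide(uint32_t x, uint32_t n)
{
    return (uint32_t)((uint64_t)x >> (n & 63));
}

/* Hypothetical model of a 32-bit shifter that honors only the low five bits
 * of the count. */
static uint32_t shl_narrow(uint32_t x, uint32_t n)
{
    return x << (n & 31);
}

static uint32_t shr_narrow(uint32_t x, uint32_t n)
{
    return x >> (n & 31);
}

/* The unmasked rotate-left expression from the macro, run on each model.
 * 32u - n wraps around for n > 32, just as the unsigned macro argument does. */
static uint32_t rotl32_wide(uint32_t x, uint32_t n)
{
    return shl_wide(x, n) | shr_wide(x, 32u - n);
}

static uint32_t rotl32_narrow(uint32_t x, uint32_t n)
{
    return shl_narrow(x, n) | shr_narrow(x, 32u - n);
}
```

With x = 0x80000001, both models return 0x00000018 for a count of 4. For a count of 33, the narrow model returns 0x00000003 (a rotate by one), while the wide model returns 0 because every bit has been shifted out of the low 32 bits.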
By masking the shift argument and ignoring all but the low five bits (four for the 16-bit variants), I fixed the problem. <pre>#define rotl32(x,n) (((x) << ((word32)((n) & 31))) | ((x) >> (32 - (word32)((n) & 31))))
#define rotr32(x,n) (((x) >> ((word32)((n) & 31))) | ((x) << (32 - (word32)((n) & 31))))
#define rotl16(x,n) (((x) << ((word16)((n) & 15))) | ((x) >> (16 - (word16)((n) & 15))))
#define rotr16(x,n) (((x) >> ((word16)((n) & 15))) | ((x) << (16 - (word16)((n) & 15))))</pre> This bug didn't affect AMD64, since it has 32-bit shift instructions as well as 64-bit. Undoubtedly though, had this been a problem on AMD64 instead of an obscure and aging architecture such as Alpha, it would have been fixed in a heartbeat. It's amazing that such a simple fix was needed to squash a bug that (1) was reported by a Gentoo/Alpha developer, and (2) had been in the tracker for two-and-a-half years. Now, I need to check on that Kernel code. Who knows how long it's contained this bug! https://mattst88.com/blog/2008/12/13/Does_Anyone_Care_About_Fixing_Bugs?/ mattst88@gmail.com (Matt Turner)