<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:georss="http://www.georss.org/georss" xmlns:thr="http://purl.org/syndication/thread/1.0" version="2.0">
 <channel>
  <title>mattst88's blog</title>
  <link>http://mattst88.com/</link>
  <managingEditor>mattst88@gmail.com (Matt Turner)</managingEditor>
  <item>
   <pubDate>Thu, 17 May 2012 00:00:00 -0400</pubDate>
   <atom:updated>2012-05-29T23:06:16-04:00 </atom:updated>
   <title>Optimizing pixman for Loongson: Process and Results</title>
   <description>   &lt;p&gt;The &lt;a href="http://www.lemote.com/en/products/Notebook/2010/0310/112.html"&gt;Lemote Yeeloong&lt;/a&gt; is a small notebook that is often the computer of choice for Free Software advocates, including Richard Stallman. It's powered by an 800 MHz STMicroelectronics &lt;a href="http://en.wikipedia.org/wiki/Loongson#Loongson_2F"&gt;Loongson 2F&lt;/a&gt; processor and has an antiquated Silicon Motion 712 graphics chip. The SM712's acceleration features are pretty subpar by today's standards, and performance of the old XFree86 Acceleration Architecture (&lt;a href="http://en.wikipedia.org/wiki/XFree86_Acceleration_Architecture"&gt;XAA&lt;/a&gt;) that supports the SM712 has slowly decayed as developers move to support newer hardware and newer acceleration architectures. In short, graphics performance of the SM712 isn't very good with new X servers, so how can we improve it?&lt;/p&gt;
   &lt;p&gt;If you don't care about how pixman was optimized and just want to see the results, you can &lt;a href="#loongson-pixman-results"&gt;skip ahead&lt;/a&gt;.&lt;/p&gt;
   &lt;p&gt;&lt;a href="http://cgit.freedesktop.org/pixman/"&gt;pixman&lt;/a&gt;, the pixel-manipulation library used by &lt;a href="http://cairographics.org/"&gt;cairo&lt;/a&gt; and X, has MMX-accelerated compositing functions. They are written using C-level intrinsic functions, which allow the programmer to write C but still have fine-grained control over performance-sensitive MMX code.&lt;/p&gt;
   &lt;p&gt;Last summer I began optimizing graphics performance of the &lt;a href="http://wiki.laptop.org/go/XO-1.75"&gt;OLPC XO-1.75&lt;/a&gt; laptop. The Marvell processor it uses supports iwMMXt2, a 64-bit SIMD instruction set designed by Intel for their XScale ARM CPUs. The instruction set is predictably very similar to Intel's original MMX instruction set. By design, Intel's MMX intrinsics also support generating iwMMXt instructions, so that the same optimized C code will be easily portable to processors supporting iwMMXt. With a relatively small amount of work (as compared to writing compositing functions in ARM/iwMMXt assembly) I had pixman's MMX optimized code working on the XO-1.75 for some nice performance gains.&lt;/p&gt;
   &lt;p&gt;The Loongson 2F processor also includes a 64-bit SIMD instruction set, very similar to Intel's MMX. Its SIMD instructions use the 32 floating-point registers, and like iwMMXt it provides some useful instructions not found in x86 processors until AMD's Enhanced 3DNow! or Intel's SSE instruction sets.&lt;/p&gt;
   &lt;p&gt;So just like I did with the XO-1.75, I planned to use pixman's existing MMX code to optimize performance on my Yeeloong.&lt;/p&gt;
   &lt;p&gt;While Intel's MMX intrinsic functions are well designed, well tested, well supported, and widely used, the &lt;a href="http://gcc.gnu.org/onlinedocs/gcc/MIPS-Loongson-Built_002din-Functions.html"&gt;Loongson intrinsics&lt;/a&gt; are none of these. In fact, they're incomplete, badly designed, and used nowhere that I can find (indeed, all of the instances of Loongson-optimized SIMD code I have found are written in inline assembly, which is no surprise given the state of the intrinsics). Of course, the gcc manual doesn't tell me this, so I learned it only after trying to use them with pixman.&lt;/p&gt;
   &lt;p&gt;Aside: let me pretend that I'm designing and implementing Loongson's vector intrinsics, covering an instruction set very similar to MMX, which already has an excellent set of intrinsic functions. Why would I create my own incompatible set, instead of implementing the same interface that lots of software already use?!&lt;/p&gt;
   &lt;p&gt;Using the Loongson vector intrinsics, pixman passed the test suite, and objdump verified that gcc was successfully generating vector instructions, but the performance was terrible. gcc apparently was not privy to the knowledge that the integer data types returned by the intrinsics were actually stored in floating point registers, so in between any two vector instructions you might find three or four instructions that simply copied the same data back and forth between integer and floating-point registers.&lt;/p&gt;
   &lt;pre&gt;punpcklwd	$f9,$f9,$f5
    dmtc1	v0,$f8
punpcklwd	$f19,$f19,$f5
    dmfc1	t9,$f9
    dmtc1	v0,$f9
    dmtc1	t9,$f20
    dmfc1	s0,$f19
punpcklbh	$f20,$f20,$f2&lt;/pre&gt;
   &lt;p&gt;This path led nowhere, so I decided to take the hint from previous programmers and forget that the Loongson intrinsics exist. I still wanted to use pixman's MMX code, so I implemented Intel's MMX intrinsics myself using Loongson inline assembly. Object code size was significantly smaller and performance was better, in fact &lt;strong&gt;much better&lt;/strong&gt; in some select functions, but overall was still a net loss. There must have been optimization opportunities that I was missing.&lt;/p&gt;
   &lt;p&gt;On the XO-1.75 the MMX code is faster than the generic code, so I didn't recognize inefficiencies in the MMX code the first time I worked with it, but with the Loongson it was necessary that I find and fix them. The great thing is that optimizations to this code benefit x86/MMX, ARM/iwMMXt, and Loongson.&lt;/p&gt;
   &lt;p&gt;I took a look at the book &lt;em&gt;Dirty Pixels&lt;/em&gt; at the suggestion of pixman's maintainer, Søren Sandmann. In it, I discovered that the original MMX instruction set lacked an unsigned packed multiply-high instruction, which would be useful for the &lt;em&gt;over&lt;/em&gt; compositing operation. To work around the lack of this instruction, an extra two shifts and an add had to be used. AMD recognized this inefficiency and added the instruction in Enhanced 3DNow! and later Intel did the same with SSE. I &lt;a href="http://cgit.freedesktop.org/pixman/commit/?id=14208344964f341a7b4a704b05cf4804c23792e9"&gt;modified&lt;/a&gt; the &lt;span class="inline-code"&gt;pix_multiply&lt;/span&gt; function to use the new instruction, and the resulting object code size shrank by 5%.&lt;/p&gt;
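   &lt;p&gt;To show where the saving comes from, here is a scalar C sketch of the per-component arithmetic. It is only an illustration, not pixman's code: the function names are made up, and the rounding constant &lt;span class="inline-code"&gt;0x80&lt;/span&gt; and multiplier &lt;span class="inline-code"&gt;0x0101&lt;/span&gt; are the usual choices for a rounded divide by 255, assumed here rather than quoted from the commit. The first function uses the two shifts and an add; the second takes the high 16 bits of an unsigned multiply, which is what an unsigned multiply-high instruction produces for each 16-bit lane.&lt;/p&gt;
&lt;pre&gt;<![CDATA[
```c
#include <stdint.h>

/* Rounded (x * a) / 255 using two shifts and an add after the multiply. */
uint8_t mul_un8_shifts(uint8_t x, uint8_t a)
{
    uint16_t t = (uint16_t)(x * a + 0x80);   /* at most 65153, fits in 16 bits */
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Same value from the high half of an unsigned 16x16-bit multiply by 0x0101. */
uint8_t mul_un8_mulhi(uint8_t x, uint8_t a)
{
    uint16_t t = (uint16_t)(x * a + 0x80);
    return (uint8_t)(((uint32_t)t * 0x0101u) >> 16);
}
```
]]>&lt;/pre&gt;
   &lt;p&gt;Both functions agree for every pair of 8-bit inputs; for example, both map (255, 255) to 255 and (128, 128) to 64.&lt;/p&gt;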
   &lt;p&gt;I realized that the &lt;span class="inline-code"&gt;expand_alpha&lt;/span&gt;, &lt;span class="inline-code"&gt;expand_alpha_rev&lt;/span&gt;, and &lt;span class="inline-code"&gt;invert_colors&lt;/span&gt; functions that mix and duplicate pixel components could be reduced from a combined total of around 30 instructions to a single instruction each. &lt;a href="http://cgit.freedesktop.org/pixman/commit/?id=84221f4c1687b8ea14e9cbdc78b2ba7258e62c9e"&gt;This change&lt;/a&gt; further reduced object code size by another 9%.&lt;/p&gt;
   &lt;p&gt;After that, I focused on eliminating unnecessary copies to and from the vector registers. Consider this code:&lt;/p&gt;
   &lt;pre&gt;__m64 vsrc = load8888 (*src);&lt;/pre&gt;
   &lt;p&gt;The code loads &lt;span class="inline-code"&gt;*src&lt;/span&gt; into an integer register, and then &lt;span class="inline-code"&gt;load8888&lt;/span&gt; loads and expands the value into a vector register. Instead, it's simpler and faster to load from memory into a vector register directly. By counting the number of &lt;span class="inline-code"&gt;dmfc1&lt;/span&gt; (doubleword move from floating-point) and &lt;span class="inline-code"&gt;dmtc1&lt;/span&gt; (doubleword move to floating-point) instructions I could determine which functions were still making unnecessary copies.&lt;/p&gt;
   &lt;p&gt;After reducing the number of unnecessary copies and adding a number of other optimizations (list available &lt;a href="http://lists.freedesktop.org/archives/pixman/2012-May/001946.html"&gt;here&lt;/a&gt;) I was ready to see if the Yeeloong was more usable.&lt;/p&gt;
   &lt;a id="loongson-pixman-results"&gt;&lt;/a&gt;
   &lt;p&gt;Results gathered from cairo's perf-trace tool confirm the real-world performance improvements given by the pixman optimizations. The &lt;em&gt;image&lt;/em&gt; columns show the time in seconds to complete a cairo-perf-trace workload when using 32 bits per pixel and likewise &lt;em&gt;image16&lt;/em&gt; for 16 bits per pixel. The first column in both &lt;em&gt;image&lt;/em&gt; and &lt;em&gt;image16&lt;/em&gt; groupings is the time to complete the workload without using Loongson MMI code. The second column is time to complete the workload after pixman commit &lt;a href="http://cgit.freedesktop.org/pixman/commit/?id=c78e986085"&gt;c78e986085&lt;/a&gt;, the commit that turns on the Loongson MMI code. The third column is the time to complete the workload with pixman-0.25.6 which has many more optimizations.&lt;/p&gt;
   &lt;table style="text-align:right;"&gt;
    &lt;tr&gt;&lt;td&gt;&#160;&lt;/td&gt;&lt;th style="text-align:center;" colspan="4"&gt;image&lt;/th&gt;&lt;th style="text-align:center;" colspan="4"&gt;image16&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;evolution&lt;/td&gt;&lt;td&gt;32.985&lt;/td&gt;&lt;td&gt;29.667&lt;/td&gt;&lt;td&gt;28.752&lt;/td&gt;&lt;th&gt;14.7% faster&lt;/th&gt;&lt;td&gt;27.314&lt;/td&gt;&lt;td&gt;23.870&lt;/td&gt;&lt;td&gt;22.960&lt;/td&gt;&lt;th&gt;19.0% faster&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;firefox-planet-gnome&lt;/td&gt;&lt;td&gt;197.982&lt;/td&gt;&lt;td&gt;180.437&lt;/td&gt;&lt;td&gt;169.532&lt;/td&gt;&lt;th&gt;16.8% faster&lt;/th&gt;&lt;td&gt;220.986&lt;/td&gt;&lt;td&gt;205.057&lt;/td&gt;&lt;td&gt;199.077&lt;/td&gt;&lt;th&gt;11.0% faster&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;gnome-terminal-vim&lt;/td&gt;&lt;td&gt;60.799&lt;/td&gt;&lt;td&gt;50.528&lt;/td&gt;&lt;td&gt;50.792&lt;/td&gt;&lt;th&gt;19.7% faster&lt;/th&gt;&lt;td&gt;51.655&lt;/td&gt;&lt;td&gt;44.131&lt;/td&gt;&lt;td&gt;43.561&lt;/td&gt;&lt;th&gt;18.6% faster&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;gvim&lt;/td&gt;&lt;td&gt;38.646&lt;/td&gt;&lt;td&gt;32.552&lt;/td&gt;&lt;td&gt;33.570&lt;/td&gt;&lt;th&gt;15.1% faster&lt;/th&gt;&lt;td&gt;38.126&lt;/td&gt;&lt;td&gt;34.453&lt;/td&gt;&lt;td&gt;35.457&lt;/td&gt;&lt;th&gt;7.5% faster&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;ocitysmap&lt;/td&gt;&lt;td&gt;23.065&lt;/td&gt;&lt;td&gt;18.057&lt;/td&gt;&lt;td&gt;17.516&lt;/td&gt;&lt;th&gt;31.7% faster&lt;/th&gt;&lt;td&gt;23.046&lt;/td&gt;&lt;td&gt;18.055&lt;/td&gt;&lt;td&gt;17.543&lt;/td&gt;&lt;th&gt;31.4% faster&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;poppler&lt;/td&gt;&lt;td&gt;43.676&lt;/td&gt;&lt;td&gt;36.077&lt;/td&gt;&lt;td&gt;35.498&lt;/td&gt;&lt;th&gt;23.0% faster&lt;/th&gt;&lt;td&gt;43.065&lt;/td&gt;&lt;td&gt;36.090&lt;/td&gt;&lt;td&gt;35.534&lt;/td&gt;&lt;th&gt;21.2% faster&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;swfdec-giant-steps&lt;/td&gt;&lt;td&gt;20.166&lt;/td&gt;&lt;td&gt;20.365&lt;/td&gt;&lt;td&gt;20.469&lt;/td&gt;&lt;td&gt;no change&lt;/td&gt;&lt;td&gt;22.354&lt;/td&gt;&lt;td&gt;16.578&lt;/td&gt;&lt;td&gt;14.473&lt;/td&gt;&lt;th&gt;54.4% faster&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;swfdec-youtube&lt;/td&gt;&lt;td&gt;31.502&lt;/td&gt;&lt;td&gt;28.118&lt;/td&gt;&lt;td&gt;24.168&lt;/td&gt;&lt;th&gt;30.3% faster&lt;/th&gt;&lt;td&gt;44.052&lt;/td&gt;&lt;td&gt;41.771&lt;/td&gt;&lt;td&gt;38.577&lt;/td&gt;&lt;th&gt;14.2% faster&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;xfce4-terminal-a1&lt;/td&gt;&lt;td&gt;69.517&lt;/td&gt;&lt;td&gt;51.288&lt;/td&gt;&lt;td&gt;50.838&lt;/td&gt;&lt;th&gt;36.7% faster&lt;/th&gt;&lt;td&gt;62.225&lt;/td&gt;&lt;td&gt;53.309&lt;/td&gt;&lt;td&gt;44.297&lt;/td&gt;&lt;th&gt;40.5% faster&lt;/th&gt;&lt;/tr&gt;
   &lt;/table&gt;
   &lt;p&gt;May 29th edit: the &lt;em&gt;% faster&lt;/em&gt; numbers were previously calculated as a percent difference between the initial workload times and the final workload times. I realized that this calculation's result is not strictly a metric of how much faster the code is. To calculate that, the new formula is ((1/final - 1/initial) / (1/initial)), which simplifies to initial/final - 1 and gives the percent difference in terms of operations/second. This number is &lt;em&gt;% faster&lt;/em&gt;. The table has been updated accordingly.&lt;/p&gt;
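   &lt;p&gt;As a small sketch of that operations-per-second calculation, written so that a shorter final time gives a positive result (the function and parameter names are made up for illustration):&lt;/p&gt;
&lt;pre&gt;<![CDATA[
```c
/* Percent faster in operations per second:
   (1/final - 1/initial) / (1/initial), expressed as a percentage. */
double percent_faster(double initial_seconds, double final_seconds)
{
    double initial_rate = 1.0 / initial_seconds;
    double final_rate = 1.0 / final_seconds;
    return (final_rate - initial_rate) / initial_rate * 100.0;
}
```
]]>&lt;/pre&gt;
   &lt;p&gt;For the evolution trace at 32 bits per pixel, &lt;span class="inline-code"&gt;percent_faster(32.985, 28.752)&lt;/span&gt; comes out at roughly 14.7, matching the table.&lt;/p&gt;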
   &lt;p&gt;As the results show, real-world performance is improved by the Loongson MMI code. I can tell a difference when using GNOME 3 (in fallback mode) on my Yeeloong.&lt;/p&gt;
   &lt;p&gt;So far this has been very successful. I've optimized pixman on an interesting platform, learned a new instruction set, and in the process found many opportunities to optimize the MMX code on x86 and ARM. I still see a bunch of things to work on with just these compositing operations alone. Beyond that, there are many other things to do like bilinear and nearest scaling functions (which are extremely important for Firefox performance). And beyond &lt;em&gt;that&lt;/em&gt;, I've improved my understanding of pixman's code and have a few ideas for improvements in general.&lt;/p&gt;
   &lt;p&gt;Thanks to
    &lt;ul&gt;
     &lt;li&gt;Danny Clark, who runs &lt;a href="http://freedomincluded.com/"&gt;freedomincluded.com&lt;/a&gt;, for providing me with a Lemote Yeeloong laptop for my work on Gentoo's MIPS port.&lt;/li&gt;
     &lt;li&gt;Søren Sandmann and Siarhei Siamashka for reviewing and helping me improve my code.&lt;/li&gt;
    &lt;/ul&gt;
   &lt;/p&gt;
</description>
   <link>http://mattst88.com/blog/2012/05/17/Optimizing pixman for Loongson: Process and Results/</link>
   <author>mattst88@gmail.com (Matt Turner)</author>
  </item>
  <item>
   <pubDate>Tue, 02 Aug 2011 00:00:00 -0400</pubDate>
   <atom:updated>2011-08-16T13:04:06-04:00 </atom:updated>
   <title>New multilib N32 Gentoo MIPS Stages</title>
   <description>   &lt;p&gt;Gentoo/MIPS has been in, well, &lt;a href="https://bugs.gentoo.org/show_bug.cgi?id=348653"&gt;not great shape&lt;/a&gt; for quite some time. When I was going through Gentoo recruitment, there were no stages (used for installing Gentoo) newer than 2008, so this was one of the main things I wanted to improve, specifically by creating new N32 ABI stages. Even though the N32 (meaning New 32-bit) ABI was introduced in IRIX in 1996 to replace SGI's o32 (Old 32-bit) ABI, Linux support for N32 has lagged behind until the last few years. Now, I'm pleased to unofficially announce new multilib N32 stages and that we'll be supporting N32 as the preferred ABI.&lt;/p&gt;
   &lt;p&gt;MIPS has three main ABIs: o32 (32-bit integer and pointer), N32 (64-bit integer, 32-bit pointer), and N64 (64-bit integer and pointer). Compared with N32 and N64, o32 is very restrictive: very few function arguments are passed in registers, only half of the floating-point registers are usable, and there is no native 64-bit integer datatype or long double type (see &lt;a href="http://techpubs.sgi.com/library/manuals/2000/007-2816-005/pdf/007-2816-005.pdf"&gt;SGI's MIPSpro N32 ABI Handbook&lt;/a&gt; for details). Offering N32 as the default ABI means better performance, sometimes 30% more, just by removing the unnecessary restrictions a 32-bit ABI imposes on 64-bit CPUs. Providing multilib stages (i.e., stages with glibc and gcc built for all three ABIs) gives the user flexibility to switch to another ABI relatively easily if desired, while also allowing him to reduce build times by switching to an N32-only profile.&lt;/p&gt;
   &lt;p&gt;The process of creating N32 (and especially multilib) stages wasn't straightforward. Our profiles were long unmaintained and in many cases totally broken. There were lots of keywording bugs open for mips, many where MIPS was the last team to complete the request, by years. There were actually some real bugs discovered too, like &lt;a href="https://bugs.gentoo.org/show_bug.cgi?id=354877"&gt;354877&lt;/a&gt; and &lt;a href="https://bugs.gentoo.org/show_bug.cgi?id=358149"&gt;358149&lt;/a&gt;, usually caused by the incorrect assumption that the &lt;em&gt;lib&lt;/em&gt; directory is always a symlink to &lt;em&gt;lib32&lt;/em&gt;. All in all, I've reduced the number of open bugs for MIPS down to ~20.&lt;/p&gt;
   &lt;p&gt;Work needed to be done to &lt;a href="http://www.gentoo.org/proj/en/releng/catalyst/"&gt;catalyst&lt;/a&gt;, Gentoo's release building tool. Since the end of June, I've made &lt;a href="http://git.overlays.gentoo.org/gitweb/?p=proj/catalyst.git;a=summary"&gt;15 commits&lt;/a&gt; cleaning, fixing, and adding to the mips support code in catalyst. Other developers like &lt;a href="http://blog.hartwork.org/"&gt;Sebastian Pipping&lt;/a&gt; have also resumed work on a project that had otherwise been minimally maintained since the beginning of the year.&lt;/p&gt;
   &lt;p&gt;The last major component in reviving Gentoo's MIPS support is to create installation media, preferably in an automated manner. I've acquired two &lt;a href="http://mattst88.com/computers/bcm91250a/"&gt;Broadcom BCM91250A&lt;/a&gt; MIPS development boards (and should be receiving a third soon), but they need disks, controllers, RAM, and cases. For that, I wrote a funding &lt;a href="https://bugs.gentoo.org/attachment.cgi?id=281843"&gt;Proposal to build three MIPS development computers (pdf)&lt;/a&gt; and had it approved by the Gentoo Foundation. Things seem to be going well in acquisitions (&lt;a href="https://bugs.gentoo.org/show_bug.cgi?id=373241"&gt;track progress&lt;/a&gt;) so I hope to have the project completed in the next few months with the systems automatically building stages for a wide variety of MIPS systems.&lt;/p&gt;
   &lt;p&gt;Initially, I used a big-endian 2006.1 N32 stage and had to bootstrap my system with a series of at least 20 hacks (not a fun experience) until it was usable enough that I was able to build a clean N32 stage. From there, using crossdev I built a multilib toolchain, and with a few more hacks I was able to build a multilib stage.&lt;/p&gt;
   &lt;p&gt;With that in the past, I've been building stages that can be used to seed the automated stage creation system to come. At this point, my TODO list looks like this:&lt;/p&gt;
   &lt;ul&gt;
    &lt;li&gt;Big Endian
     &lt;ul&gt;
      &lt;li&gt;multilib
       &lt;ul&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;font-weight:bold;"&gt;(done)&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips3 -mfix-r4000 -mfix-r4400&lt;/span&gt; (for SGI Indy and Indigo2)&lt;/li&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;font-weight:bold;"&gt;(done)&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips4&lt;/span&gt; (for SGI Indy and O2)&lt;/li&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;font-weight:bold;"&gt;(done)&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;r10k&lt;/span&gt; (for SGI Indigo2, Octane)&lt;/li&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;font-weight:bold;"&gt;(done)&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips64&lt;/span&gt; (for Broadcom Sibyte systems)&lt;/li&gt;
       &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;o32
       &lt;ul&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;"&gt;&#160;&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips32&lt;/span&gt; (for embedded mips systems)&lt;/li&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;"&gt;&#160;&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips1&lt;/span&gt; (for everything else)&lt;/li&gt;
       &lt;/ul&gt;
      &lt;/li&gt;
     &lt;/ul&gt;
    &lt;/li&gt;
    &lt;li&gt;Little Endian
     &lt;ul&gt;
      &lt;li&gt;multilib
       &lt;ul&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;"&gt;&#160;&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips3 -Wa,-mfix-loongson2f-nop&lt;/span&gt; (for Loongson 2 systems)&lt;/li&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;"&gt;&#160;&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips4&lt;/span&gt; (for Cobalt systems)&lt;/li&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;"&gt;&#160;&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips64&lt;/span&gt; (for Loongson 3, Broadcom Sibyte systems)&lt;/li&gt;
       &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;o32
       &lt;ul&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;"&gt;&#160;&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips32&lt;/span&gt; (for embedded mips systems)&lt;/li&gt;
        &lt;li&gt;&lt;span style="width:3.5em;display:inline-block;"&gt;&#160;&lt;/span&gt; &lt;span style="width:15em;display:inline-block;"&gt;mips1&lt;/span&gt; (for everything else)&lt;/li&gt;
       &lt;/ul&gt;
      &lt;/li&gt;
     &lt;/ul&gt;
    &lt;/li&gt;
   &lt;/ul&gt;
   &lt;p&gt;The final touches will be to create bootable media like CD, USB, and netboot images.&lt;/p&gt;
   &lt;p&gt;All stages are available in the experimental/mips/stages/ directory (as soon as the files propagate) of a &lt;a href="http://www.gentoo.org/main/en/mirrors2.xml"&gt;Gentoo Mirror&lt;/a&gt;.&lt;/p&gt;
   &lt;p&gt;Hopefully by the time I'm able to convince Lemote (or, who?) to send me a &lt;a href="http://en.wikipedia.org/wiki/Loongson#Loongson_3A_laptop"&gt;Loongson 3A laptop&lt;/a&gt;, installing and using Gentoo/MIPS will be a fun and pleasant experience.&lt;/p&gt;
</description>
   <link>http://mattst88.com/blog/2011/08/02/New multilib N32 Gentoo MIPS Stages/</link>
   <author>mattst88@gmail.com (Matt Turner)</author>
  </item>
  <item>
   <pubDate>Sun, 14 Dec 2008 00:00:00 -0500</pubDate>
   <atom:updated>2011-08-16T13:04:06-04:00 </atom:updated>
   <title>The State of Alpha Linux</title>
   <description>   &lt;p&gt;Software is never finished; it's forgotten. There is always one more enhancement to be made or one little quirk to work out. Sometimes there are even big problems. It happens from time to time. It's expected, and it's expected that the problems will be fixed. After spending quite a bit of time recently working with Linux on the Alpha platform, I've come to realize we face some very serious problems. And unfortunately, these may not ever be fixed, putting in jeopardy the future (&lt;em&gt;hah!&lt;/em&gt;) of Alpha/Linux. I decided to articulate these problems in an email to the &lt;a href="http://www.redhat.com/mailman/listinfo/axp-list"&gt;Linux on Alpha Processors&lt;/a&gt; mailing list in order to inform and ultimately find solutions and breathe a bit of life back into Alpha/Linux. I'd like to think that Alpha/Linux isn't a piece of forgotten software, not yet.&lt;/p&gt;
   &lt;div class="quote"&gt;
     &lt;h3&gt;The State of Alpha Linux&lt;/h3&gt;
     &lt;p&gt;We're all subscribed to this list because we use a dying platform. We do what we can to keep it going, but in recent months the State of Alpha Linux has been deteriorating at an accelerated rate.&lt;/p&gt;
     &lt;p&gt;Let me outline some issues facing us today:&lt;/p&gt;
     &lt;ol&gt;
      &lt;li&gt;We have no glibc/Alpha maintainer [1]&lt;/li&gt;
      &lt;li&gt;Kernel development for Alpha is comatose&lt;/li&gt;
      &lt;li&gt;We can't run modern X.Org [2]&lt;/li&gt;
     &lt;/ol&gt;
     &lt;p&gt;To make things worse, for such a small group of users, we're much too segregated and disorganized. For instance, how many (of the only four) Gentoo/Alpha maintainers are subscribed to this list? Debian/Alpha? How many realized we were without a glibc maintainer? That we can't use X.Org 7.4?&lt;/p&gt;
     &lt;p&gt;If this trend continues, we will first lose X.Org support completely. I even had an X.Org developer tell me he didn't care [about Alpha support] when I pinged him about an Alpha bug he had originally filed [3]!&lt;/p&gt;
     &lt;p&gt;We'll later lose glibc support. As it stands now, Alpha isn't even in the main tree [4]. I'm not sure what version Debian ships, but Gentoo is 3 versions behind at 2.6.1. Newer than that and the test suite causes a hard lock [5]. How much longer is it going to be before 2.6 is incompatible with the latest version and we begin to lose the ability to use other modern software?&lt;/p&gt;
     &lt;p&gt;While we may never lose kernel support, it will certainly begin to lag behind other platforms more and more. Bugs begin to take longer and longer to be fixed [6]. Release candidate kernels as late in the cycle as rc-8 of the 2.6.28 series fail to compile on Alpha [7]. This is definitely a worrying sign.&lt;/p&gt;
     &lt;p&gt;It is certainly expected that as a platform ages, it slowly loses its users and developers. In 1999, many average users knew or were interested in learning Alpha assembly language, were interested in support for Alpha among Free Software, and were interested in programming for the platform. Obviously this cannot be the case today. We don't expect that it should.&lt;/p&gt;
     &lt;p&gt;We, the ones who do wish to see our platform live on, even if only a little longer, should focus on fixing what we can and maintaining what we already have.&lt;/p&gt;
     &lt;p&gt;Whether Fedora adds Alpha as a Second Tier Architecture is trivial in comparison to these issues. We should focus on making sure we have working software for Fedora/Alpha before we consider how to properly market it.&lt;/p&gt;
     &lt;p&gt;We, the small band of Alpha users, need to work together. We have the same problems, why should we work separately on them?&lt;/p&gt;
     &lt;p&gt;In order to facilitate better communication among Alpha users and developers, please use the Alpha IRC channel on Freenode, #alpha, and the Wiki [8]. If you have unused hardware that may be useful to developers, consider donating it.&lt;/p&gt;
     &lt;p&gt;From here, it's up to us to find solutions to these problems.&lt;/p&gt;
     &lt;p&gt;Ideas and Suggestions requested.&lt;/p&gt;
     &lt;p&gt;Matt Turner&lt;/p&gt;
     &lt;ol&gt;
      &lt;li&gt;&lt;a href="http://sourceware.org/ml/libc-alpha/2008-12/msg00009.html"&gt;http://sourceware.org/ml/libc-alpha/2008-12/msg00009.html&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href="http://bugs.freedesktop.org/show_bug.cgi?id=17801"&gt;http://bugs.freedesktop.org/show_bug.cgi?id=17801&lt;/a&gt;&lt;br /&gt;
          &lt;a href="http://bugzilla.kernel.org/show_bug.cgi?id=10893"&gt;http://bugzilla.kernel.org/show_bug.cgi?id=10893&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href="http://bugs.freedesktop.org/show_bug.cgi?id=19026"&gt;http://bugs.freedesktop.org/show_bug.cgi?id=19026&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=6896"&gt;http://sources.redhat.com/bugzilla/show_bug.cgi?id=6896&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;Actually a kernel problem,&lt;br /&gt;
          &lt;a href="http://bugs.gentoo.org/show_bug.cgi?id=205099"&gt;http://bugs.gentoo.org/show_bug.cgi?id=205099&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href="http://bugzilla.kernel.org/show_bug.cgi?id=10893"&gt;http://bugzilla.kernel.org/show_bug.cgi?id=10893&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href="http://lkml.org/lkml/2008/10/29/69"&gt;http://lkml.org/lkml/2008/10/29/69&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href="http://alphalinux.org/wiki/index.php/Main_Page"&gt;http://alphalinux.org/wiki/index.php/Main_Page&lt;/a&gt;&lt;/li&gt;
     &lt;/ol&gt;
    &lt;/div&gt;
    &lt;p&gt;What can we do? I think there are a couple things we &lt;em&gt;need&lt;/em&gt; to do, namely:&lt;/p&gt;
    &lt;ul&gt;
     &lt;li&gt;&lt;strong&gt;Consolidate our efforts by consolidating distributions.&lt;/strong&gt; With as few users as we have, we have even fewer developers. There's no use in testing packages on Debian or Fedora when they're already tested in Gentoo.&lt;/li&gt;
     &lt;li&gt;&lt;strong&gt;Demand that Alpha remain supported.&lt;/strong&gt; Projects, including projects integral to the Linux desktop such as X.Org, need to know that we do still use Alpha hardware and that we want to be supported. Make yourself heard in #xorg-devel and appropriate mailing lists.&lt;/li&gt;
     &lt;li&gt;&lt;strong&gt;Experienced developers need to take the lead.&lt;/strong&gt; We understand that it's hard to justify time spent working on Alpha-related issues. We do not ask much. We just ask that you not abandon us.&lt;/li&gt;
    &lt;/ul&gt;
    &lt;p&gt;If we can do these things, we will be on the road to recovery.&lt;/p&gt;
</description>
   <link>http://mattst88.com/blog/2008/12/14/The State of Alpha Linux/</link>
   <author>mattst88@gmail.com (Matt Turner)</author>
  </item>
  <item>
   <pubDate>Sat, 13 Dec 2008 00:00:00 -0500</pubDate>
   <atom:updated>2012-11-20T23:39:23-05:00 </atom:updated>
   <title>Does Anyone Care About Fixing Bugs?</title>
   <description>   &lt;p&gt;As time goes on, alternative architectures like Alpha and PA-RISC slowly lose their userbase. Experienced developers move on to things that interest them more. Emphasis isn't put on fixing bugs for these aging platforms, and the level of support slowly erodes. Eventually a small hardcore userbase is all that is left. The &lt;a href="http://bugs.gentoo.org/"&gt;Gentoo Bugzilla&lt;/a&gt; showed this effect on the Alpha platform. All nontrivial bugs were left to rot. What's worse, many bugs were so old that the software containing them wasn't even in Portage anymore, yet no one closed the bug report or asked if it was fixed. One, a two-and-a-half-year-old bug about a failing cipher algorithm in &lt;a href="http://mcrypt.sourceforge.net/"&gt;libmcrypt&lt;/a&gt; caught my eye. I decided I'd give fixing it a shot.&lt;/p&gt;
   &lt;p&gt;The project's &lt;em&gt;KNOWN-BUGS&lt;/em&gt; file stated&lt;/p&gt;
   &lt;div class="quote"&gt;- cast-256 and rc6 do not work properly on Alpha (64 bit) machines&lt;/div&gt;
   &lt;p&gt;Fittingly, &lt;a href="https://bugs.gentoo.org/132356"&gt;the bug&lt;/a&gt; was filed by a developer who has since retired. An automated test suite included with libmcrypt reported a failing cipher, &lt;a href="http://en.wikipedia.org/wiki/CAST-256"&gt;CAST-256&lt;/a&gt;. Maybe it's a bug with gcc. Months pass. If it is, it's a bug across both 3.x and 4.x series. Years pass. Maybe we'll just mask the failure.&lt;/p&gt;
   &lt;p&gt;No one seemed to want to fire up &lt;em&gt;vi&lt;/em&gt; and check the code.&lt;/p&gt;
   &lt;p&gt;I decided I'd compile the same version side by side on my AMD64 desktop and my UP1500 Alpha. Both compile cleanly, and I can reproduce the failing case quickly. The first thing I check is the test suite itself by adding print statements and comparing the output between the AMD64 and Alpha systems. All the start-up code looks fine. The problem has to be in the library itself, which is what I expect.&lt;/p&gt;
   &lt;p&gt;Finally, I find that the results begin to vary during a function call to &lt;em&gt;_mcrypt_set_key&lt;/em&gt;. Continuing, I slowly isolate the failing code to the &lt;em&gt;k_rnd&lt;/em&gt; macro, then the &lt;em&gt;f1&lt;/em&gt; macro, and finally to the &lt;em&gt;rotl32&lt;/em&gt; macro.&lt;/p&gt;
   &lt;p&gt;The &lt;em&gt;rotl32&lt;/em&gt; macro rotates bits left in a 32-bit memory cell. The macro and its siblings look like&lt;/p&gt;
   &lt;pre&gt;#define rotl32(x,n)   (((x) &lt;&lt; ((word32)(n))) | ((x) &gt;&gt; (32 - (word32)(n))))
#define rotr32(x,n)   (((x) &gt;&gt; ((word32)(n))) | ((x) &lt;&lt; (32 - (word32)(n))))
#define rotl16(x,n)   (((x) &lt;&lt; ((word16)(n))) | ((x) &gt;&gt; (16 - (word16)(n))))
#define rotr16(x,n)   (((x) &gt;&gt; ((word16)(n))) | ((x) &lt;&lt; (16 - (word16)(n))))&lt;/pre&gt;
   &lt;p&gt;I confirmed that this function did yield different results on AMD64 and Alpha by writing a small test program. Guessing, I figured that this implementation wasn't compatible with Alphas and that I could easily find another working implementation. In the Linux Kernel's &lt;em&gt;include/linux/bitops.h&lt;/em&gt; file, they had virtually the same implementation. No luck there.&lt;/p&gt;
   &lt;p&gt;After a few hours of scouring the internet for quick-fix solutions, I turned to the Alpha Architecture Handbook and looked up Alpha's shift instructions, &lt;em&gt;sll&lt;/em&gt; and &lt;em&gt;srl&lt;/em&gt;.&lt;/p&gt;
   &lt;pre&gt;SxL   Ra.rq,Rb.rq,Rc.wq
Rc &lt;- LEFT_SHIFT (Rav, Rbv&lt;5:0&gt;)      !SLL
Rc &lt;- RIGHT_SHIFT(Rav, Rbv&lt;5:0&gt;)      !SRL&lt;/pre&gt;
   &lt;p&gt;Beyond the terse syntax, this means that only the low six bits of the shift argument matter. The designers did this because with the Alpha's 64-bit wide registers, it doesn't make sense to implement instructions (and circuitry) to shift by more than 63 bits. By the same reasoning, the &lt;em&gt;rotl32&lt;/em&gt; macro is only supposed to operate on 32-bit numbers, so it doesn't make sense to rotate by more than 31 bits.&lt;/p&gt;
   &lt;p&gt;The result of rotating by 32 bits should be the same as the input, since the bits travel the entire width of the field. On Alpha, though, each register holds more than 32 bits, so shifting by 32 doesn't leave the bits in place. It moves them into the upper part of the register.&lt;/p&gt;
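   &lt;p&gt;A minimal sketch of that difference, written as portable C rather than taken from libmcrypt or from compiler output. All function names here are hypothetical. It assumes a simplified model in which one shifter keeps the low five bits of the count (as 32-bit shifts on x86 do) and the other holds the value zero-extended in a 64-bit register, keeps the low six bits of the count, and stores back only the low 32 bits of the result:&lt;/p&gt;
<![CDATA[
```c
#include <stdint.h>

/* Hypothetical model of a 32-bit shifter that keeps only the low five
   bits of the count. */
static inline uint32_t shl_count5(uint32_t x, uint32_t n) { return x << (n & 31); }
static inline uint32_t shr_count5(uint32_t x, uint32_t n) { return x >> (n & 31); }

/* Hypothetical model of a 64-bit shifter: the value is zero-extended,
   the low six bits of the count are used, and the result is truncated
   back to 32 bits. This is an assumption, not actual Alpha codegen. */
static inline uint32_t shl_count6(uint32_t x, uint32_t n)
{
    return (uint32_t)((uint64_t)x << (n & 63));
}

static inline uint32_t shr_count6(uint32_t x, uint32_t n)
{
    return (uint32_t)((uint64_t)x >> (n & 63));
}

/* The unmasked rotate-left expression, evaluated under each model.
   32 - n wraps around for n > 32, which is well defined for unsigned. */
static inline uint32_t rotl32_count5(uint32_t x, uint32_t n)
{
    return shl_count5(x, n) | shr_count5(x, 32 - n);
}

static inline uint32_t rotl32_count6(uint32_t x, uint32_t n)
{
    return shl_count6(x, n) | shr_count6(x, 32 - n);
}
```
]]>
   &lt;p&gt;Under these assumptions, &lt;em&gt;rotl32_count5(0x80000001, 33)&lt;/em&gt; yields 0x00000003, the expected rotation by one, while &lt;em&gt;rotl32_count6(0x80000001, 33)&lt;/em&gt; yields 0. For counts from 1 to 31 the two models agree.&lt;/p&gt;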
   &lt;p&gt;By masking the shift argument and ignoring all but its low five bits (four for the 16-bit variants), I fixed the problem.&lt;/p&gt;
   &lt;pre&gt;#define rotl32(x,n)   (((x) &lt;&lt; ((word32)((n) &amp; 31))) | ((x) &gt;&gt; (32 - (word32)((n) &amp; 31))))
#define rotr32(x,n)   (((x) &gt;&gt; ((word32)((n) &amp; 31))) | ((x) &lt;&lt; (32 - (word32)((n) &amp; 31))))
#define rotl16(x,n)   (((x) &lt;&lt; ((word16)((n) &amp; 15))) | ((x) &gt;&gt; (16 - (word16)((n) &amp; 15))))
#define rotr16(x,n)   (((x) &gt;&gt; ((word16)((n) &amp; 15))) | ((x) &lt;&lt; (16 - (word16)((n) &amp; 15))))&lt;/pre&gt;
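   &lt;p&gt;The same masking idea can also be sketched as inline functions. This is an illustration, not libmcrypt code, and the names &lt;em&gt;rotl32_masked&lt;/em&gt; and &lt;em&gt;rotr32_masked&lt;/em&gt; are hypothetical. The sketch additionally masks the complementary count, so a rotation count of zero shifts by 0 rather than by 32, which C leaves undefined for a 32-bit operand:&lt;/p&gt;
<![CDATA[
```c
#include <stdint.h>

/* Hypothetical helper, not part of libmcrypt: rotate left with the
   count reduced modulo 32. */
static inline uint32_t rotl32_masked(uint32_t x, unsigned int n)
{
    n &= 31;                                   /* keep the low five bits */
    return (x << n) | (x >> ((32 - n) & 31));  /* n == 0 shifts by 0, not 32 */
}

/* Hypothetical helper: rotate right, same masking. */
static inline uint32_t rotr32_masked(uint32_t x, unsigned int n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}
```
]]>
   &lt;p&gt;Because every shift count stays between 0 and 31, the result should not depend on how many bits of the count a given machine's shifter honors.&lt;/p&gt;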
   &lt;p&gt;This bug didn't affect AMD64, since it has 32-bit shift instructions as well as 64-bit ones, and the 32-bit forms use only the low five bits of the count. Undoubtedly though, had this been a problem on AMD64 instead of an obscure and aging architecture such as Alpha, it would have been fixed in a heartbeat.&lt;/p&gt;
   &lt;p&gt;It's amazing that such a simple fix was needed to squash a bug that (1) was reported by a Gentoo/Alpha developer, and (2) had been in the tracker for two-and-a-half years.&lt;/p&gt;
   &lt;p&gt;Now, I need to check on that kernel code. Who knows how long it's contained this bug!&lt;/p&gt;
</description>
   <link>http://mattst88.com/blog/2008/12/13/Does Anyone Care About Fixing Bugs?/</link>
   <author>mattst88@gmail.com (Matt Turner)</author>
  </item>
  <item>
   <pubDate>Wed, 16 Jul 2008 00:00:00 -0400</pubDate>
   <atom:updated>2011-08-16T13:04:06-04:00 </atom:updated>
   <title>Open Sourcing Compaq's Alpha Tools for Linux</title>
   <description>   &lt;p&gt;I'm in the (long and arduous) process of becoming a Gentoo/Alpha developer. This involves, firstly, becoming an &lt;em&gt;Architecture Tester&lt;/em&gt;. This, in turn, requires that I complete a quiz, mail it to the head of the Alpha team, and wait. The developer who manages the Alpha arch testers, of which there are currently none, has been missing in action for 18 days.&lt;/p&gt;
   &lt;p&gt;Once I prove myself worthy, or whatnot, I'll then complete a longer version of the arch tester quiz. Once my quiz has been reviewed, I'll be mentored for 30 days. After this 30-day period, I'll officially be a Gentoo/Alpha developer.&lt;/p&gt;
   &lt;p&gt;At this point, I'll try to get Compaq's Alpha compilers and libraries back into portage in one fashion or another. libots is already in portage. The math library, libcpml, &lt;a href="/computers/ds20l/benchmarks.php"&gt;whose value I showed&lt;/a&gt;, will be next on my agenda. Compaq's C, C++, and FORTRAN compilers, having been unmaintained for years now, have quite a few incompatibilities with modern Linux distributions. I think the best course of action for these is to get them into a portage overlay.&lt;/p&gt;
   &lt;p&gt;Compaq's Alpha-optimized compilers, unmaintained and bit-rotting, still hold a wealth of knowledge. Compaq's cc still produces code &lt;em&gt;much&lt;/em&gt; better than gcc in certain cases. libcpml and libots have incredibly optimized routines for all sorts of common functions. It's such a shame for them to rot as they're doing now.&lt;/p&gt;
   &lt;p&gt;With this attitude, I emailed Linda Knippers of HP, who in some capacity orchestrated the &lt;a href="http://www.hp.com/hpinfo/newsroom/press/2008/080623a.html?mtxs=rss-corp-news"&gt;AdvFS code release&lt;/a&gt; under the GPL.&lt;/p&gt;
   &lt;div class="quote"&gt;
    &lt;p&gt;Hi Linda,&lt;/p&gt;
    &lt;p&gt;There are a few tools and libraries Compaq/HP has provided for Alpha Linux in the past that have been totally unmaintained in the last few years but still hold great value and knowledge to the open source community. I'm emailing to (1) see if there's any possibility that they may be open sourced in the future, or (2) be directed to contact someone more appropriate.&lt;/p&gt;
    &lt;p&gt;The tools and libraries in question are Compaq's Alpha-Optimized C, C++, and FORTRAN compilers, libots, libcpml, libcxml, and ladebug. Other Tru64 tools that we Alpha Linux users would love to use include spike and pixie.&lt;/p&gt;
    &lt;p&gt;Do you know of any plans to release any code for any (hopefully all) of these products? It seems a shame to have them bit rot when GNU Compiler Developers could learn a thing or two from ccc, and when the GNU libc Developers could incorporate libcpml routines into their math library and so forth.&lt;/p&gt;
    &lt;p&gt;Thank you for your time,&lt;/p&gt;
    &lt;p&gt;Matt Turner&lt;/p&gt;
   &lt;/div&gt;
   &lt;p&gt;She kindly responded (9 minutes later!) and told me that although she didn't know of any plans to open source these products, it may be something HP would consider. She forwarded my mail to a former Tru64 developer to get his thoughts.&lt;/p&gt;
   &lt;p&gt;While I don't have code in hand (yet), it appears that they are at minimum receptive to this idea. Hopefully in the near future we'll be reading through Digital/Compaq's code, learning more about alpha optimization, and implementing what we learn in gcc and glibc.&lt;/p&gt;
   &lt;p&gt;I'll definitely post back any new information I find out.&lt;/p&gt;
</description>
   <link>http://mattst88.com/blog/2008/07/16/Open Sourcing Compaq's Alpha Tools for Linux/</link>
   <author>mattst88@gmail.com (Matt Turner)</author>
  </item>
 </channel>
</rss>