WQUXGA a.k.a. OMGWTF – IBM T221 3840×2400 204dpi Monitor – Part 6: Regressing Drivers and Xen

I recently built a new machine, primarily because I got fed up of having to stop what I’m working on and reboot from Linux into Windows whenever my friends and/or family invited me to join them in a Borderlands 2 session. Unfortunately, my old machine was just a tiny bit too old (Intel X38 based) to have full, bug-free VT-d/IOMMU support required for VGA passthrough to work, so after 5 years, I finally decided it was time to rectify this. More on this in another article, but the important point I am getting to is that VGA passthrough requires a recent version of Xen. And there this part of the story really begins.

Some of you may have figured out that RHEL derivatives are my Linux distribution of choice (RedSleeve was a big hint). Unfortunately, Red Hat have dropped support for Xen Dom0 kernels in EL6, but thankfully, other people have picked up the torch and provide a set of up to date, supported Xen Dom0 kernels and packages for EL6. So far so good. But it was never going to be so simple, at a time when drivers are getting dumber, more feature sparse and more bloated all at once. That is really what this story is about.

For a start, a few details about the system setup that I am using, and have been using for years.

  • I am a KDE, rather than Gnome user. EL6 comes with KDE 4, which uses X RandR rather than Xinerama extensions to establish the geometry of the screen layout. This isn’t a problem in itself, but there is no way to override whatever RandR reports, so on a T221 you end up with a regular desktop on half of the T221, and an empty desktop on the other, which looks messy and unnatural.
  1. EL6 has had an Xorg package update that bumped the ABI version from 10 to 11
  2. Nvidia drivers have changed the way TwinView works after version 295.x (TwinView option in xorg.conf is no longer recognized)
  3. Nvidia drivers 295.x do not support Xorg ABI v11.
  4. Nvidia kernel drivers 295.x do not build against kernels 3.8.x.

And therein lies the complication.

Nvidia drivers v295 when used with options TwinView and NoTwinViewXineramaInfo also seem to override RandR geometry to show that there is a single, large screen available, rather than two screens. This is exactly what we want when using the T221. Drivers after 295.x (304.x seems to be the next version) don’t recognize the TwinView configuration option, and while they provide Xinerama geometry override when using the NoTwinViewXineramaInfo option, they do not override RandR information any more. This means that you end up with a desktop that looks as you would expect it to if you used two separate monitors (e.g. status bar is only on the first screen, no wallpaper stretch, etc.), rather than a single, seamless desktop.

As you can see, there is a large compound issue in play here. We cannot use the 295.x drivers, because

  1. They don’t support Xorg ABI 11 – this can be solved by downgrading the xorg-x11-server-* and xorg-x11-drv-* packages to an older version (1.10 from EL 6.3). Easily enough done – just make sure you add xorg-x11-* to your exclude line in /etc/yum.conf after downgrading to avoid accidentally updating them in the future.
  2. They don’t build against 3.8.x kernels (which is what the Xen kernel I am using is – this is regardless of the long standing semi-allergy of Nvidia binary drivers to Xen). This is more of an issue – but with a bit of manual source editing I was able to solve it.

Here is how to get the latest 295.x driver (295.75) to build against Xen kernel 3.8.6. You may need to do this as root.

Kernel source acquisition and preparation:

wget http://uk1.mirror.crc.id.au/repo/el6/SRPMS/kernel-xen-3.8.6-1.el6xen.src.rpm
rpm -ivh kernel-xen-3.8.6-1.el6xen.src.rpm
cd ~/rpmbuild/SPECS
rpmbuild -bp kernel-xen.spec
cd ~/rpmbuild/BUILD/linux-3.8.6
cp /boot/config-3.8.6-1.el6xen.x86_64 .config
make prepare
make all

Now that you have the kernel sources ready, get the Nvidia driver 295.75, the patch, patch it and build it.

wget http://uk.download.nvidia.com/XFree86/Linux-x86_64/295.75/NVIDIA-Linux-x86_64-295.75.run
wget https://dl.dropboxusercontent.com/u/61491808/NVIDIA-Linux-x86_64-295.75.patch
bash ./NVIDIA-Linux-x86_64-295.75.run --extract-only
patch < NVIDIA-Linux-x86_64-295.75.patch
cd NVIDIA-Linux-x86_64-295.75
export IGNORE_XEN_PRESENCE=y
export SYSSRC=~/rpmbuild/BUILD/linux-3.8.6
cp /usr/include/linux/version.h $SYSSRC/include/linux/
./nvidia-installer -s

And there you have it: Nvidia driver 295.75 that builds cleanly and works against 3.8.6 kernels. The same xorg.conf given in part 3 of this series will continue to work.

It is really quite disappointing that all this is necessary. What is more concerning is that the ability to use a monitor like the T221 is diminishing by the day. Without the ability to override what RandR returns, it may well be gone completely soon. It seems the only remaining option is to write a fakerandr library (similar to fakexinerama). Any volunteers?

It seems that Nvidia drivers are both losing features and becoming more bloated at the same time. 295.75 is 56MB. 304.88 is 65MB. That is 16% bloat for a driver that is regressively missing a feature, in this case an important one. Can there really be any doubt that the quality of software is deteriorating at an alarming rate?

Clevo M860TU / Sager NP8662 / mySN XMG5 GPU (GTX260M / FX 3700M) Replacement / Upgrade and Temperature Management Modifications

Recently, my wife’s Clevo M860TU laptop suffered a GPU failure. Over our last few Borderlands 2 sessions, it would randomly crash more and more frequently, until any sort of activity requiring 3D acceleration refused to work for more than a few seconds. The temperatures as measured by GPU-Z looked fine (all our computers get their heatsinks and fans cleaned regularly), so it looked very much like the GPU itself was starting to fail. A few days later, it failed completely, with the screen staying permanently blank.

The original GPU in it was an Nvidia GTX260M. These proved near impossible to come by in MXM III-HE form factor. Every once in a while a suitable GTX280M would turn up on eBay, but the prices were quite ludicrous (and consequently they would never sell, either). Interestingly, Nvidia Quadro FX 3700M MXM III-HE modules seem to be fairly abundant and reasonably priced. This is interesting considering that they cost several times more than the GTX280M new. Their spec (128 shaders, 75W TDP) is identical.

MXM-III HE Nvidia Quadro FX 3700M

The GTX260M has 112 shaders and a lower TDP of 65W, so the cooling was going to be put under increased strain (especially since I decided to upgrade it from a dual core to a quad core CPU at the same time – more on that later). Having fitted it all (it is a straight drop-in replacement, but make sure you use shims and fresh thermal pads for the RAM if required to ensure proper thermal contact with the heatsink plate), I ran some stress tests.

Within 10 minutes of OCCT GPU test, it hit 97C, and started throttling and producing errors. I don’t remember what temperatures the GTX260M was reaching before, but I am quite certain it was not this high. I had to find a way to reduce the heat production of the GPU. Given the cooling constraints in a laptop, even a well designed one like the Clevo M860TU, the only way to reduce the heat was by reducing either the clock speed or the voltage – or both. Since the heat produced by a circuit is proportional to the product of the clock speed and the square of the voltage, reducing the voltage has a much bigger effect than reducing the clock speeds. Of course, reducing the voltage necessitates a reduction in clock speed to maintain stability. The only way to do this on an Nvidia GPU is by modifying the BIOS. Thankfully, the tools for doing so are readily available.

After some experimentation, it wasn’t difficult to find the optimal setting given the cooling constraints. The original settings were:

  • Core: 550MHz
  • Shaders: 1375MHz
  • Memory: 799MHz (1598MHz DDR)
  • Voltage: 1.03V (Extra)
  • Temperature: Throttles at 97C and gets unstable (OCCT GPU test)
  • FPS: ~17

The settings I found that provided 100% stability and reduced the temperatures down to a reasonable level are as follows:

  • Core: 475MHz
  • Shaders: 1250MHz
  • Memory: 799MHz (1598MHz DDR)
  • Voltage: 0.95V (Extra)
  • Temperature: 82C peak (OCCT GPU test)
  • FPS: ~16

The temperature drop is very significant, but the performance reduction is relatively minimal. It is worth noting that OCCT is specifically designed to produce maximum heat load. Playing Borderlands 2 and Crysis with all the settings set to maximum at 1920×1200 resulted in peak temperatures around 10C lower than the OCCT test.
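
As a rough sanity check on those numbers, the proportionality mentioned earlier (heat scales with clock speed times voltage squared) can be applied to the two sets of settings. This is only a back-of-the-envelope sketch: it uses the core clock alone (the shader clock dropped by a smaller fraction), ignores leakage, and the function name is made up for illustration.

```python
def relative_dynamic_power(clock_old_mhz, volt_old, clock_new_mhz, volt_new):
    """Ratio of new to old dynamic power, assuming P is proportional to f * V^2."""
    return (clock_new_mhz / clock_old_mhz) * (volt_new / volt_old) ** 2


# Core clock and voltage taken from the original and modified settings lists.
ratio = relative_dynamic_power(550, 1.03, 475, 0.95)

# Frame rates from the same lists (approximate, as reported by OCCT).
fps_ratio = 16 / 17

print(f"Estimated dynamic power: {ratio:.0%} of original")
print(f"Frame rate: {fps_ratio:.0%} of original")
```

Under these assumptions the estimated heat output falls by roughly a quarter while the frame rate falls by about 6%, which is in line with the large temperature drop for a small performance cost.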

While I had the laptop open I figured this would be a good time to upgrade the CPU as well. Not that I think that the 2.67GHz P9600 Core2 was underpowered, but with the 2.26GHz Q9100 quad core Core2s being quite cheap these days, it seemed like a good idea. And considering that when overclocking the M860TU from 1066 to 1333FSB I had to reduce the multiplier on the P9600 (not that there was often any need for this), the Q9100’s lower multiplier seemed like a promising overall upgrade. The downside, of course, was that the Q9100 is rated to a TDP of 45W compared to P9600’s 25W. Given the heatsink on the Clevo M860TU is shared between the CPU and the GPU, this no doubt didn’t help the temperatures observed under OCCT stress testing. Something could be done about this, too, though.

Enter RMClock – a fantastic utility for tweaking VIDs to achieve undervolting on x86 CPUs at above minimum clock speed. Intel Enhanced SpeedStep reduces both the clock speed and the voltage when applying power management. The voltage VID and clock multipliers are overridable (within the minimum and maximum for both hard-set in the CPU), which means that in theory, with a very good CPU, we could run the maximum multiplier and minimum VID to reduce power consumption. In most cases, of course, this would result in instability. But, it turns out, my Q9100 was stable under several hours of OCCT testing at minimum VID (1.05V) at top multiplier (nominal VID 1.275V). This resulted in a 10C drop in peak OCCT CPU load tests, and a 6C drop in peak OCCT GPU load tests (down to 76C from 82C peak).
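
To put the shared heatsink's workload in perspective, here is a small sketch that adds up the rated TDPs quoted in this article before and after the upgrade, and then estimates what the undervolt might do to the CPU's share. The estimate assumes power scales with the square of the voltage at a fixed clock and treats the whole TDP as if it followed that rule, which is a simplification; TDP is a rating rather than a measurement, so the result is indicative at best.

```python
# Rated TDPs in watts, as quoted in this article.
GPU_TDP = {"GTX260M": 65, "FX 3700M": 75}
CPU_TDP = {"P9600": 25, "Q9100": 45}

# Combined rated load on the shared heatsink before and after the upgrade.
before = GPU_TDP["GTX260M"] + CPU_TDP["P9600"]
after = GPU_TDP["FX 3700M"] + CPU_TDP["Q9100"]

# Assumption: at a fixed multiplier, power scales with V^2.
# 1.275V is the nominal VID at top multiplier, 1.05V the minimum VID.
cpu_scale = (1.05 / 1.275) ** 2
q9100_undervolted = CPU_TDP["Q9100"] * cpu_scale

print(f"Rated load before upgrade: {before}W, after: {after}W")
print(f"Q9100 at minimum VID: roughly {q9100_undervolted:.1f}W "
      f"({cpu_scale:.0%} of rated)")
```

On paper the upgrade raises the rated load on the heatsink by a third, and the undervolt claws back a good part of the CPU's increase, which is consistent with the GPU temperatures improving as well.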

WQUXGA a.k.a. OMGWTF – IBM T221 3840×2400 204dpi Monitor – Part 5: When You Are Really Stuck With a SL-DVI

I recently had to make one of these beasts work bearably well with only a single SL-DVI cable. This was dictated by the fact that I needed to get it working on a graphics card with only a single DVI output, and my 2xDL-DVI -> 2xLFH-60 adapter was already in use. As I mentioned previously, I found the standard 1xSL-DVI’s worth 13Hz to be just too slow when it comes to a refresh rate (I could see the mouse pointer skipping along the screen), but the default 20Hz from 2xSL-DVI was just fine for practically any purpose.

So, faced with the need to run with just a single SL-DVI port, it was time to see if a bit of tweaking could be applied to reduce the blanking periods and squeeze a few more FPS out of the monitor. In the end, 17.1Hz turned out to be the limit of what could be achieved. And it turns out, this is sufficient for the mouse skipping to go away and make the monitor reasonably pleasant to use.

(Note: My wife disagrees – she claims she can see the mouse skipping at 17.1Hz. OTOH, she is unable to read my normal font size (MiscFixed 8-point) on this monitor at full resolution. So how you get along with this setup will largely depend on whether your eyes’ sensitivity is skewed toward high pixel density or high frame rates.)

The xorg.conf I used is here:

Section "Monitor"
  Identifier    "DVI-0"
  HorizSync    31.00 - 105.00
  VertRefresh    12.00 - 60.00
  Modeline "3840x2400@17.1"  165.00  3840 3848 3880 4008  2400 2402 2404 2406 +hsync +vsync
EndSection

Section "Device"
  Identifier    "ATI"
  Driver        "radeon"
EndSection

Section "Screen"
  Identifier    "Default Screen"
  Device        "ATI"
  Monitor        "DVI-0"
  DefaultDepth    24
  SubSection "Display"
    Modes    "3840x2400@17.1"
  EndSubSection
EndSection
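
The 17.1Hz figure follows directly from the Modeline: the refresh rate is the pixel clock divided by the total number of pixels per frame, blanking included. The sketch below does the arithmetic for the numbers above (the function name is just illustrative). The 165MHz pixel clock is the single-link DVI ceiling, so the only way to gain refresh rate is to shrink the blanking.

```python
def modeline_rates(pixel_clock_mhz, htotal, vtotal):
    """Return (horizontal scan rate in kHz, vertical refresh in Hz)."""
    hsync_khz = pixel_clock_mhz * 1000.0 / htotal
    vrefresh_hz = hsync_khz * 1000.0 / vtotal
    return hsync_khz, vrefresh_hz


# Values from the Modeline above: 165.00 MHz, htotal 4008, vtotal 2406.
hsync, vrefresh = modeline_rates(165.00, 4008, 2406)

# Share of each frame spent on blanking rather than visible pixels.
blanking = 1 - (3840 * 2400) / (4008 * 2406)

print(f"Horizontal: {hsync:.2f} kHz, vertical: {vrefresh:.2f} Hz")
print(f"Blanking overhead: {blanking:.1%}")
```

Both rates fall inside the HorizSync and VertRefresh ranges declared in the Monitor section, and only a few percent of the pixel clock is left for blanking.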

The Modeline could easily be used to create an equivalent setting in Windows using PowerStrip or a similar tool, or you could hand-craft a custom monitor .inf file.

In the process of this, however, I have discovered a major limitation of some of the Xorg drivers. Generic frame buffer (fbdev) and VESA (vesa) drivers do not support Modelines, and will in fact ignore them. ATI’s binary driver (fglrx) also doesn’t support modelines. Linux CCC application mentions a section for custom resolutions, but there is no such section in the program. So if you want to use a monitor in any mode other than what its EDID reports, you cannot use any of these drivers. This is a hugely frustrating limitation. In the case of fbdev driver, it is reasonably forgivable because it relies on whatever modes the kernel frame buffer exposes. In the case of the VESA driver it is understandable that it only supports standard VESA modes. But ATI’s official binary driver lacking this feature is quite difficult to forgive – it has clearly been dumbed down too far.

Getting the Best out of the MacBook Pro Retina 15 Screen in VMware Fusion

I make no secret of the fact that I am neither a fan of Apple nor a fan of virtualization. But sometimes they make for the best available option. I have recently found myself in such a situation. My current employer, mercifully, allows employees a choice of something other than vanilla Windows machines to work on, and there was an option of getting a MacBook Pro. As you can probably guess from some of the previous articles here, I find the single most important productivity feature of a computer to be the screen resolution, an opinion I appear to share with Linus Torvalds. So I opted for the 15″ MacBook Pro Retina.

Unfortunately, the native Linux support on that machine still isn’t quite perfect. Since speed is not a concern in this particular case, I opted to run Linux using VMware Fusion on OSX. Unfortunately, VMware Fusion cannot handle full 2880×1800 resolution of the display and with lower resolutions running in full screen mode the quality is badly degraded by blurring and aliasing. The solution is to create a custom 2880×1800 mode in /etc/X11/xorg.conf that fits within VMware virtual graphic driver’s capabilities. This took a bit of working out since the mode had to fit within horizontal and vertical refresh rates of the driver and the total pixel clock the driver allows. The following are the settings that work for me:

Section "Monitor"
        Identifier "MacBookPro"
        HorizSync 30.0 - 90.0
        VertRefresh 30.0 - 60.0
        ModeLine "2880x1800C" 358.21 2880 2912 4272 4304 1800 1839 1852 1891
EndSection

Section "Screen"
        Identifier "Default Screen"
        Monitor "MacBookPro"
        DefaultDepth 24
        SubSection "Display"
                Modes "2880x1800C"
        EndSubSection
EndSection
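
If you want to experiment with different timings, it helps to check a candidate mode against the declared sync ranges before restarting X. The following is a minimal sketch of that check using the values from the config above; the function name is made up, and it does not model whatever pixel clock ceiling the VMware driver imposes, since I don't have a documented figure for it.

```python
def mode_fits(pixel_clock_mhz, htotal, vtotal, hsync_range_khz, vrefresh_range_hz):
    """Check a mode against HorizSync / VertRefresh ranges.

    Returns (fits, horizontal rate in kHz, vertical refresh in Hz).
    """
    hsync_khz = pixel_clock_mhz * 1000.0 / htotal
    vrefresh_hz = hsync_khz * 1000.0 / vtotal
    fits = (hsync_range_khz[0] <= hsync_khz <= hsync_range_khz[1]
            and vrefresh_range_hz[0] <= vrefresh_hz <= vrefresh_range_hz[1])
    return fits, hsync_khz, vrefresh_hz


# "2880x1800C": 358.21 MHz, htotal 4304, vtotal 1891,
# checked against HorizSync 30-90 kHz and VertRefresh 30-60 Hz.
fits, hsync, vrefresh = mode_fits(358.21, 4304, 1891, (30.0, 90.0), (30.0, 60.0))

print(f"Horizontal: {hsync:.2f} kHz, vertical: {vrefresh:.2f} Hz, fits: {fits}")
```

For the mode above this works out to roughly 83kHz horizontal and 44Hz vertical, both inside the declared ranges.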

The result is being able to run a full screen 2880×1800 mode, and it looks absolutely superb.

Virtual Performance – Or Lack Thereof

Updated: Results for VMware ESXi 5.0.0 and the newly released VMware Player 5.0.0 have been added.

People always seem very shocked when I suggest that virtualization comes with a very substantial performance penalty even when virtualization hardware extensions are used. Concerningly, this surprise often comes from people who have already either committed their organization’s IT infrastructure to virtualization, or have made firm plans to do so. The only thing I can conclude in these cases, unbelievable as it may appear, is that they haven’t done any performance testing of their own to assess the solution they are planning to adopt.

So I decided to document some basic performance tests that show just how substantial the performance hit of virtualization is.

Test Setup

Hardware:
Core2 Quad 3.2GHz
8GB of RAM
2x500GB 7200rpm SATA DM RAID1 for the main system
1x250GB 7200rpm SATA for testing

Virtual Test Configuration (VMware Player 4.0.4, Xen 4.1.2 (PV and HVM), KVM (RHEL6), VirtualBox 4.1.18):
CPU Cores: 4 (all)
RAM: 6GB
Disk: System booting off the 2×500 RAID1. Raw 250GB SATA disk passed to the VM.

Disk write caching was enabled in the VMware configuration. You may think that this unfairly gives the VM configuration an advantage, but as you will see from the results, even with this “cheat”, the performance is still very disappointing compared to bare metal. In any case, the amount of disk I/O is negligible – the caches and the working set always fit into memory.

Physical Test Configuration:
CPU Cores: 4 (all)
RAM: 6GB (limited using mem=6G boot parameter)
Disk: Booting directly off the same 250GB SATA disk used for VM testing, with the same kernel and configuration.

The Test

The test performed is the compile of the vanilla 2.6.32.59 Linux kernel. This is the script used for testing:

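#!/bin/bash
# Run as root from the top of the kernel source tree
# (writing to /proc/sys/vm/drop_caches requires root).

echo Cleaning...
make clean > /dev/null 2>&1
make mrproper > /dev/null 2>&1
# Flush dirty pages, then drop the page cache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches
echo Configuring...
make allmodconfig > /dev/null 2>&1
echo Syncing...
sync
# Prime the page cache by reading every file in the tree
find . -type f -print0 | xargs --null cat > /dev/null
echo "Timing build..."
time (make -j16 all > /dev/null 2>&1)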

The source tree is cleaned and all caches dropped. The allmodconfig configuration is used to get some degree of testing of disk I/O by creating the maximum number of files. Caches are then primed by pre-loading all the source files. This is done in order to more accurately measure the CPU and RAM subsystems without bottlenecking on disk I/O. The CPU in the system has 4 cores, and 16 build threads are used to ensure the CPU and memory I/O are saturated, but without causing enough memory pressure to cause swapping.

On the host and in the guest, all unnecessary services and processes were stopped (especially crond which could theoretically cause additional load on the system that would distort the results).

All tests were carried out 3 times in a row, and the best result for each is considered here (the differences between the runs were minimal).

This is very much a redneck, brute-force test. There isn’t much finesse to it. But I like tests like this because they cannot be cheated with the sort of smoke and mirrors illusions that virtualization software is very good at applying.

Results

Bare metal: 1,042.523s (100%)
Xen 4.1.2 (PV): 1,316.984s (79.16%)
VMware ESXi 5.0.0: 1,361.321s (76.58%)
VMware Player 5.0.0: 1,478.732s (70.50%)
VMware Player 4.0.4: 1,520.023s (68.59%)
KVM (RHEL6): 1,691.849s (61.62%)
Xen 4.1.2 (HVM): 2,839.442s (36.72%)
VirtualBox 4.1.18: 8,876.945s (11.74%)

Note: No, this is not a typo – VirtualBox really is that bad.

To make this difference easier to visualise, here it is on graphs:

Virtualization Performance - Time in Seconds

To give a better idea of relative performance, here it is in % points, with bare metal being 100%.

Virtualization Performance - Relative Difference

The difference is substantial even with the least poorly performing hypervisor. Virtualization performance is over a fifth (21%) down with paravirtualized Xen compared to bare metal, nearly a quarter (24%) down with VMware ESXi, and even worse with KVM. Or if you prefer to look at it the other way around, bare metal is more than a quarter as fast again (26.32%) as the best performing hypervisor on the same hardware.
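
For anyone who wants to reproduce the percentages from their own timings, the two ways of looking at the numbers are sketched below. Relative performance is the bare metal time divided by the virtualized time, and the extra time (or extra hardware) needed is the inverse ratio minus one. Only a few of the results are included, and the function names are illustrative.

```python
# Build times in seconds, copied from the results above.
build_times = {
    "Bare metal": 1042.523,
    "Xen 4.1.2 (PV)": 1316.984,
    "VMware ESXi 5.0.0": 1361.321,
    "KVM (RHEL6)": 1691.849,
}
baseline = build_times["Bare metal"]


def relative_performance(seconds):
    """Performance as a percentage of bare metal (100 = same speed)."""
    return baseline / seconds * 100.0


def extra_time_needed(seconds):
    """How much longer the build takes than on bare metal, in percent."""
    return (seconds / baseline - 1.0) * 100.0


for name, seconds in build_times.items():
    print(f"{name}: {relative_performance(seconds):.2f}% of bare metal, "
          f"{extra_time_needed(seconds):.1f}% more time")
```

The two figures are not symmetric: a hypervisor running at about 79% of bare metal speed needs about 26% more time (or hardware) to do the same work, not 21%.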

Don’t get me wrong – virtualization is handy for all sorts of low-performance tasks. In cases where it is used to consolidate a number of mostly idle systems into one mostly idle system, it brings clear benefits. (Except maybe in the case of VirtualBox – the performance there is just too appalling for anything, and HVM Xen is pretty poor, too.) But for uses where performance is important, thoughts of virtualizing need to undergo a serious reality check. Even if your system is designed to scale completely horizontally, requiring 26%+ of extra hardware (best case scenario, it could be a lot worse depending on which hypervisor you use) is likely to put a significant strain on your budget and running costs.

Note: It is worth stressing that these tests are carried out on hardware with VT-x, and support for this is enabled and used for all the tested hypervisors. So the results here are based on optimal hardware support.

The Best £5 Headset Ever: PCLine PCL-MH36

I have been using one of these PCL-MH36 headsets for ages, and I’ve been so happy with it that it never occurred to me to mention just how good it is. The sound quality is quite superb and easily measures up against headsets several times the price. It has never let me down, and it sounds better than most desk speakers I have heard. The sound quality is excellent across the entire frequency range and it regularly surprises me just how many of the frequencies get lost when you listen to sound through average computer speakers – but this dirt cheap headset makes sure you hear it all.

If you are in need of a new headset these are seriously worth a look, even if you are not on a budget. This seems to be a rare case of getting considerably more than the price tag might indicate.

Layman’s Pipedreamt Musings on the State of Present Day Bionics

Not an article along the usual lines this time, but I was so surprised by just how far bionic technology has come that I felt I had to write some musings on it.

While doing research into the state of present day bionics technology I started thinking about the implied limits of the current technology. Just how far are we from the Matrix-like brain-in-a-jar philosophical thought exercise being a realistic possibility? From what I can make out, most of the technology, albeit crude and unwieldy in a lot of cases, appears to actually be available today.

Bionic limbs controlled by processed signals picked up from the brain and peripheral nerves have been around for a long time. There are still issues with implanting sensors directly into the brain: they cause an inflammatory response in the tissue, known as gliosis, which insulates the electrodes and prevents them from picking up the signal. Even so, the technology has been tested pretty successfully. If that can be done, then arguably the same technology could be used for all muscle control.

But what about other vital organs?

Heart: Permanent heart replacement devices have already been used.

Kidneys: Dialysis machines have been in use for decades, and although bulky they are capable of performing the function of kidneys indefinitely. Small, implantable devices are also becoming available.

Spleen: Considering that people who have had their spleens removed for various reasons (injury, cancer or other illness) live for decades without significant health consequences, the spleen doesn’t appear to be a vital organ.

Liver: Since the liver is one of the few vital organs that regenerates, most of the approaches to artificial replacements have been based on biotech and regenerative technologies. Still, it would appear that Extracorporeal Liver Assistance Devices (ELADs) do already exist.

Digestive system including pancreas: Intravenous provision of nutrients for coma patients appears to have been in use for a long time. Thus, in the context of what might be considered to be a full bionic chassis, the digestive system could be omitted and the appropriate nutrients introduced into the blood stream directly.

Lungs: Artificial lungs appear to be the most problematic part. The devices available today, both the bulky external ones and the more modern implantable ones, all suffer from the same problem – blood clotting. The life expectancy on the large external devices is typically around a day, while the smaller implantable ones have been able to sustain animals in trials for around 5 days before they started to introduce blood clots into the blood stream that led to problems like stroke.

In light of all that, it would appear that the ghost-in-the-shell possibility is rapidly making a transition from science fiction to science fact. Sure – we aren’t quite there yet. But it is quite amazing just how close to plausibly implementable the very concept actually is today, at least from the point of view of a layman who only has limited information available.

If you are involved in this field of research and can confirm, deny, correct, extend or elucidate any of the above points, please, do post a comment.

RedSleeve Linux Public Alpha

Here is something that I have been working on of late.

RedSleeve Linux is a 3rd party ARM port of a Linux distribution of a Prominent North American Enterprise Linux Vendor (PNAELV). They object to being referred to by name in the context of clones and ports of their distribution, but if you are aware of CentOS and Scientific Linux, you can probably guess what RedSleeve is based on.

RedSleeve is different from CentOS and Scientific Linux in that it isn’t a mere clone of the upstream distribution it is based on – it is a port to a new platform, since the upstream distribution does not include a version for ARM.

RedSleeve was created because ARM is making inroads into mainstream computing, and although Fedora has supported ARM for a while, it is a bleeding edge distribution that puts the emphasis on keeping up with the latest developments, rather than long term support and stability. This was not an acceptable solution for the people behind this project, so we set out to instead port a distribution that puts more emphasis on long term stability and support.

More/Better Internal Storage on the Toshiba AC100 – Part 2

Following my research for the previous article about the performance of SD/CF/USB flash modules, the only conclusion I could reach is that most of them are pretty dire. The only notable exception among the SD cards seems to be the latest generation of the SanDisk Extreme Pro (95MB/s) cards that just about managed to squeeze out enough performance on random writes to match a 7200rpm disk. Still, this is pretty dire compared to any reasonable SSD, so I wanted to see what else could be done about installing extra storage with good performance into a Toshiba AC100.

What I came across is this: SuperTalent RC8 USB stick. It may look like a USB stick, but it is actually a full-on SSD, featuring a SandForce 1200 flash controller. I figured this was worth a shot, even though the specifications indicate it is rather large (far too large to fit inside an AC100 in its standard form). Stripped out of the casing, however, it looks like the RC8 might just fit inside the Toshiba AC100.

This is what I ended up with. There appears to be only one place inside an AC100 where a bare RC8 circuit board could be fitted. You will need the following:

1) P3MU mini-PCIe USB break-out module

2) SuperTalent RC8 USB stick

3) Custom made USB cable (male and female type A USB connectors, some single core wire, and some skill with a soldering iron)

Measure out exactly how long you need the cable to be – there is no room to tuck away excess cable inside an AC100. Here is what my cable layout ended up looking like.

AC100 motherboard with P3MU and custom USB cable fitted

This is what it looks like with the top panel fitted. Note the large cut-out that has been made below the mini-PCIe slot access hole.

AC100 modified to receive RC8 USB SSD

And again with the screws fitted. Note that one of the screw holes is in the area that had to be cut out. This shouldn’t affect the structural integrity of the AC100, though. Also note that the right speaker cable has been re-routed slightly to now go over the LED ribbon cable.

AC100 modified to receive RC8 SSD

This is what it looks like with the RC8 attached. Now you can see why the cut-out in the top panel was exactly the shape it was – I specifically cut out the minimum possible amount to allow the RC8 to fit.

Toshiba AC100 with the SuperTalent RC8 USB SSD installed

I also put a piece of thin transparent sticky tape over it to hold it in place, just to make sure nothing can short out against the underside of the keyboard.

Toshiba AC100 with the SuperTalent RC8 SSD

And that is pretty much it. Put the keyboard back in and bolt it all together. The metal part of the USB connector will sit a tiny bit above the line of the panel, but the only way you’ll notice it once you put the keyboard back on is by knowing that there is a tiny bulge there.

Your AC100 should now be able to handle ~ 2000 IOPS on both random reads and random writes, along with much better life expectancy that having proper flash management brings.

At this point I would like to point out just how impressed I am with the SuperTalent RC8 USB SSD. Not only is the performance phenomenal (especially for a USB stick), but it really behaves like a SATA SSD – to the point where you can use tools like hdparm and smartctl on it (yes, it even supports SMART).

Flash Module Benchmark Collection: SD Cards, CF Cards, USB Sticks

Having spent a considerable amount of time, effort, and ultimately money trying to find decently performing SD, CF and USB flash modules, I feel I really need to ensure that I make the lives of other people with the same requirements easier by publishing my findings – especially since I have been unable to find a reasonable comprehensive data source with similar information.

Unfortunately, virtually all SD/microSD (referred to as uSD from now on), CF and USB flash modules have truly atrocious performance for use as normal disks (e.g. when running the OS from them on a small, low power or embedded device), regardless of what their advertised performance may be. The performance problem is specifically related to their appalling random-write performance, so this is the figure that you should be specifically paying attention to in the tables below.

As you will see, the sequential read and write performance of flash modules is generally quite good, as is random-read performance. But on their own these are largely irrelevant to overall performance you will observe when using the card to run the operating system from, if the random-write performance is below a certain level. And yes, your system will do several MB of writing to the disk just by booting up, before you even log in, so don’t think that it’s all about reads and that writes are irrelevant.

For comparison, a typical cheap laptop disk spinning at 5400rpm can achieve around 90 IOPS on both random reads and random writes with a typical (4KB) block size. This is an important figure to bear in mind purely to be able to see just how appalling the random write performance of most removable flash media is.

All media was primed with two passes of:

 dd if=/dev/urandom of=/dev/$device bs=1M oflag=direct

in order to simulate long term use and ensure that the performance figures reasonably accurately reflect what you might expect after the device has been in use for some time.

There are two sets of results:

1) Linear read/write test performed using:

dd if=/dev/$device of=/dev/null    iflag=direct
dd if=/dev/zero    of=/dev/$device oflag=direct

The linear read-write test script I use can be downloaded here.

2) Random read/write test performed using:

iozone -i 0 -i 2 -I -r 4K -s 512m -o -O -+r -+D -f /path/to/file

In all cases, the test size was 512MB. Partitions are aligned to 2MB boundaries. File system is ext4 with 4KB block size (-b 4096) and 16-block (64KB) stripe-width (-E stride=1,stripe-width=16), no journal (-O ^has_journal), and mounted without access time logging (-o noatime). The partition used for the tests starts at half of the card’s capacity, e.g. on a 16GB card, the test partition spans the space from 8GB up to the end. This is done in order to nullify the effect of some cards having faster flash at the front of the card.
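
If you want to lay out a card the same way, the starting sector for the test partition can be worked out as below. This is a sketch under a few assumptions that are not stated above: 512-byte sectors, "2MB" meaning 2MiB, the start being rounded up to the next boundary, and card capacities being quoted in decimal gigabytes. The function name is made up for illustration.

```python
MIB = 1024 * 1024


def half_capacity_start_sector(capacity_bytes, align_bytes=2 * MIB, sector_bytes=512):
    """First sector at or after half the card's capacity that sits on an
    alignment boundary (assumed: 2MiB alignment, 512-byte sectors)."""
    half = capacity_bytes // 2
    # Round up to the next multiple of the alignment.
    aligned = -(-half // align_bytes) * align_bytes
    return aligned // sector_bytes


# A nominal 16GB card, assuming the capacity is quoted in decimal gigabytes.
start = half_capacity_start_sector(16 * 1000**3)
print(f"Test partition starts at sector {start}")
```

The resulting sector number can be given to whichever partitioning tool you prefer as the start of the test partition.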

The data here covers only the first modules I have tested and will be extensively updated as and when I test additional modules. Unfortunately, a single module can take over 24 hours to complete testing if its performance is poor (e.g. 1 IOPS) – and unfortunately, most of them are that bad, even those made by reputable manufacturers.
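
To see why a slow card takes that long, consider how many 4KB operations a single pass over the 512MB test file involves. The sketch below estimates the duration of one random-write pass at a given IOPS figure. It assumes the 512MB is 512 × 1024 × 1024 bytes and ignores everything else the benchmark does, so treat it as a lower bound; the function name is illustrative.

```python
def random_write_pass_seconds(test_size_bytes, record_bytes, iops):
    """Seconds needed to write the whole test file once, one record at a time."""
    operations = test_size_bytes // record_bytes
    return operations / iops


TEST_SIZE = 512 * 1024 * 1024   # 512MB test file (assumed binary megabytes)
RECORD = 4 * 1024               # 4KB records

for iops in (1, 90, 2000):
    seconds = random_write_pass_seconds(TEST_SIZE, RECORD, iops)
    print(f"{iops:>5} IOPS: {seconds / 3600:.2f} hours")
```

At 1 IOPS a single pass of 131,072 writes takes well over a day, whereas a 5400rpm laptop disk at around 90 IOPS gets through it in under half an hour.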

The dd linear test is probably more meaningful if you intend to use the flash card in a device that only ever performs large, sequential writes (e.g. a digital camera). For everything else, however, the dd figures are meaningless and you should instead be paying attention to the iozone results, particularly the random-write (r-w). Good random write performance also usually indicates a better flash controller, which means better wear leveling and better longevity of the card, so all other things being similar, the card with faster random-write performance is the one to get.

Due to WordPress being a little too rigid in its templates to allow for wide tables, you can see the SD / CF / USB benchmark data here. This table will be updated a lot so check back often.