More/Better Internal Storage on the Toshiba AC100 – Part 2

Following my research for the previous article about the performance of SD/CF/USB flash modules, the only conclusion I could reach is that most of them are pretty dire. The only notable exception among the SD cards seems to be the latest generation of the SanDisk Extreme Pro (95MB/s) cards that just about managed to squeeze out enough performance on random writes to match a 7200rpm disk. Still, this is pretty dire compared to any reasonable SSD, so I wanted to see what else could be done about installing extra storage with good performance into a Toshiba AC100.

What I came across is this: SuperTalent RC8 USB stick. It may look like a USB stick, but it is actually a full-on SSD, featuring a SandForce 1200 flash controller. I figured this was worth a shot, even though the specifications indicate it is rather large (far too large to fit inside an AC100 in it’s standard form). Stripped out of the casing, however, it looks like RC8 might just be fittable inside the Toshiba AC100.

This is what I ended up with. There appears to be only one place inside an AC100 where a bare RC8 circuit board could be fitted. You will need the following:

1) P3MU mini-PCIe USB break-out module

2) SuperTalent RC8 USB stick

3) Custom made USB cable (male and female type A USB connectors, some single core wire, and some skill with a soldering iron)

Measure out exactly how long you need the cable to be – there is no room to tuck away excess able inside an AC100. Here is what my cable layout ended up looking like.

AC100 motherboard with P3MU and custom USB cable fitted

AC100 motherboard with P3MU and custom USB cable fitted

This is what it looks like with the top panel fitted. Note the large cut-out that has been made below the mini-PCIe slot access hole.

AC100 modified to receive RC8 USB SSD

AC100 modified to receive RC8 USB SSD

And again with the screws fitted. Note that one of the screw holes is in the area that had to be cut out. This shouldn’t affect the structural integrity of the AC100, though. Also note that the right speaker cable has been re-routed slightly to now go over the LED ribbon cable.

AC100 modified to receive RC8 SSD

AC100 modified to receive RC8 SSD

This is what it looks like with the RC8 attached. Now you can see why the cut-out in the top panel was exactly the shape it was – I specifically cut out the minimum possible amount to allow the RC8 to fit.

Toshiba AC100 with the SuperTalent RC8 USB SSD installed

Toshiba AC100 with the SuperTalent RC8 USB SSD installed

I also put a piece of thin transparent sticky tape over it to hold in in place, just to make sure nothing can short out against the underside of the keyboard.

Toshiba AC100 with the SuperTalent RC8 SSD

Toshiba AC100 with the SuperTalent RC8 SSD

And that is pretty much it. Put the keyboard back in and bolt it all together. The metal part of the USB connector will sit a tiny bit above the line of the panel, but the only way you’ll notice it once you put the keyboard back on is by knowing that there is a tiny bulge there.

Your AC100 should now be able to handle ~ 2000 IOPS on both random reads and random writes, along with much better life expectancy that having proper flash management brings.

At this point I would like to point out just how impressed I am with the SuperTalent RC8 USB SSD. Not only is the performance fenomenal (especially for a USB stick), but it really behaves like a SATA SSD – to the point where you can use tools like hdparm and smartctl on it (yes, it even supports SMART).

Flash Module Benchmark Collection: SD Cards, CF Cards, USB Sticks

Having spent a considerable amount of time, effort, and ultimately money trying to find decently performing SD, CF and USB flash modules, I feel I really need to ensure that I make the lives of other people with the same requirements easier by publishing my findings – especially since I have been unable to find a reasonable comprehensive data source with similar information.

Unfortunately, virtually all SD/microSD (referred to as uSD from now on), CF and USB flash modules have truly atrocious performance for use as normal disks (e.g. when running the OS from them on a small, low power or embedded device), regardless of what their advertised performance may be. The performance problem is specifically related to their appalling random-write performance, so this is the figure that you should be specifically paying attention to in the tables below.

As you will see, the sequential read and write performance of flash modules is generally quite good, as is random-read performance. But on their own these are largely irrelevant to overall performance you will observe when using the card to run the operating system from, if the random-write performance is below a certain level. And yes, your system will do several MB of writing to the disk just by booting up, before you even log in, so don’t think that it’s all about reads and that writes are irrelevant.

For comparison, a typical cheap laptop disk spinning at 5400rpm disk can typically achieve 90 IOPS on both random reads and random writes with typical (4KB) block size. This is an important figure to bear in mind purely to be able to see just how appalling the random write performance of most removable flash media is.

All media was primed with two passes of:

 dd if=/dev/urandom of=/dev/$device bs=1M oflag=direct

in order to simulate long term use and ensure that the performance figures reasonably accurately reflect what you might expect after the device has been in use for some time.

There are two sets of results:

1) Linear read/write test performed using:

dd if=/dev/$device of=/dev/null    iflag=direct
dd if=/dev/zero    of=/dev/$device oflag=direct

The linear read-write test script I use can be downloaded here.

2) Random read/write test performed using:

iozone -i 0 -i 2 -I -r 4K -s 512m -o -O +r +D -f /path/to/file

In all cases, the test size was 512MB. Partitions are aligned to 2MB boundaries. File system is ext4 with 4KB block size (-b 4096) and 16-block (64KB) stripe-width (-E stride=1,stripe-width=16), no journal (-O ^has_journal), and mounted without access time logging (-o noatime). The partition used for the tests starts at half of the card’s capacity, e.g. on a 16GB card, the test partition spans the space from 8GB up to the end. This is in done in order to nullify the effect of some cards having faster flash at the front of the card.

The data here is only the first modules I have tested and will be extensively updated as and when I test additional modules. Unfortunately, a single module can take over 24 hours to complete testing if their performance is poor (e.g. 1 IOPS) – and unfortunately, most of them are that bad, even those made by reputable manufacturers.

The dd linear test is probably more meaningful if you intend to use the flash card in a device that only ever performs large, sequential writes (e.g. a digital camera). For everything else, however, the dd figures are meaningless and you should instead be paying attention to the iozone results, particularly the random-write (r-w). Good random write performance also usually indicates a better flash controller, which means better wear leveling and better longevity of the card, so all other things being similar, the card with faster random-write performance is the one to get.

Due to WordPress being a little too rigid in it’s templates to allow for wide tables, you can see the SD / CF / USB benchmark data here. This table will be updated a lot so check back often.

 

More/Better Internal Storage on the Toshiba AC100

One of the unfortunate things about the AC100 is that the internal storage isn’t removable, and thus isn’t easily upgradable or replaceable. The latter could be an issue in the longer term because it is flash memory, so it will eventually wear out, and I since it is relatively basic eMMC, I don’t expect the flash controller to be particularly advanced when it comes to wear leveling and minimizing write amplification. Using the SD slot is an option, but if we are running the operating system from it, we cannot use it for removable media, which could be handy. We could use a USB stick instead, but then we lose the only USB port on the machine. There is no SATA controller inside the AC100.

What can be done about this? Well, models that have a 3G modem have it on a mini-PCIe USB card. Even though Tegra 2 has a PCIe controller built into it, the mini-PCIe slot isn’t fully wired up – only USB lines are connected. Since most of us can tether a data connection via our phones, and since this is more cost effective than paying for two separate mobile connections, the 3G module isn’t particularly vital. The main issue that the slot only has USB wired up. So what we would need is a USB mini-PCIe SSD. Is there such a thing? It turns out that there is. I have been able to find two:

  1. EMPhase Mini PCIe USB S1 SSD
  2. InnoDisk miniDOM-U SSD

The specification of the two modules is virtually identical (both use SLC flash among other similarities), so I decided to investigate both of them. Unfortunately, having contacted an EMPhase re-seller, they called me back having spoken to the manufacturer and talked me out of buying one, citing unspecified issues.

My local InnoDisk re-seller was more interested in selling me a product, but there were two reasons why despite very good pre-sales service I ultimately decided against buying one of these. The first and foremost was the performance specification. According to the manufacturer’s own figures, the random access performance with 4KB blocks is 1440 random read IOPS and 30 random write IOPS. Considering the price per GB of these modules is approximately 4x that of similarly performing SLC SD cards, this module was discarded on the basis of cost effectiveness.

Having discarded the above modules, there are still a few alternative options available. The low risk, tidy options include an SD mini-PCIe USB adapter and a micro-SD mini-PCIe USB adapter. They are very reasonably priced so I got one of each for testing, and I am pleased to say that they work absolutely fine in the AC100. Here is what they look like fitted.

Dual micro-SD mini-PCIe USB Adapter

Dual micro-SD mini-PCIe USB Adapter

 

SD mini-PCIe USB Adapter

SD mini-PCIe USB Adapter

The SD cards will appear as USB disks. If you use the dual micro-SD adapter you can RAID the two cards together.

Unfortunately, I have found that the best results are achieved using a single SD card, purely because I haven’t found any micro-SD cards that have reasonable performance when it comes to random-write IOPS. SD cards fare a little better, but the best SD card I have found in terms of random write IOPS still tops out at 133 random write IOPS using 4KB blocks. Still, it is over 4x of the marketed figures for the InnoDisk SSD at 4x lower price per GB, and the performance does scrape past what I would consider minimal requirements for pleasant use.

I am currently putting together a list of SD, micro SD and USB flash devices and consistent benchmark performance figures for them, which should hopefully help you to choose the ones most suitable for your application. I hope to have the article up reasonably soon, but don’t expect it too soon – benchmarking SD cards takes a long time to do properly.

Alleviating Memory Pressure on Toshiba AC100

After all the upgrades and tweaks to the AC100 (screen upgrade to 1280×720, cooling improvements and boosting the clock speed by over 40%), only one significant issue remains: it only has 512MB of RAM. Unfortunately, the memory controller initialization is done by the closed-source boot loader, so even if we were to solder in bigger chips (Tegra2 can handle up to 1GB of RAM), it is unlikely in the extreme that it would just work.

So, other than increasing the physical amount of memory, can we actually do anything to improve the situation? Well, as a matter of fact, there are a few things.

Clawing Back Some Memory

By default, the GPU gets allocated a hefty 64MB of RAM out of 512MB that we have. This is quite a substantial fraction of our memory, and it would be nice to claw some of it back if we are not using it. I find the Nvidia’s Tegra binary accelerated driver to be too buggy to use under normal circumstances, so I use the basic unaccelerated frame buffer driver instead. There are two frame buffer allocations on the AC100: the internal display and the HDMI port. The latter is only intended for use with TVs which means we shouldn’t need a resulition of more than 1920×1080 on that port. The highest resolution display we can have on the internal port is 1280×720. That means that the maximum amount of memory used by those two frame buffers is 8100KB + 3600KB 11700KB. To be on the safe side, let’s call that 16MB. That still leaves us 48MB that we should be able to safely reclaim. We can do that by telling the kernel that there is extra memory at certain addresses using the following boot parameters:

mem=448M@0M mem=48M@464M

Make sure the accelerated binary Tegra driver is disabled in your xorg.conf, reboot and you should now have 496MB of usable RAM instead of 448MB. It’s just over an extra 10%, which should make a noticeable difference given how tight the memory is to begin with.

If you aren’t using the HDMI interface, my tests show that it is in fact possible to reduce the GPU memory to just 2MB with no ill effects, when using the 1280×720 display panel, because the frame buffer seems to operate in 16-bit mode by default:

mem=448M@0M mem=62M@450M

That leaves a total of 510MB of for applications.

Memory Compression

In the recent kernels, there are two modules that are very useful when we have plenty of CPU resources but very little memory – just the case on the AC100. They are zcache and zram. On the 3.0 kernels instead of zram we can use frontcache which is similar but has the advantage that it is aware and cooperates with zcache. Since at the time of writing this 3.0 isn’t quite as polished and stable for the AC100 as 2.6.38, let’s focus on zram instead.

Assuming you have compiled zcache support into your kernel, all you need to do to enable it is add the kernel boot paramter “zcache”. From there on, your caches should be compressed, thus increasing the amount they can store.

zram provides a virtual block device backed by RAM, but the contents are compressed, so it should always end up using less than the amount of memory it presents as a block device (unless all of the data is uncompressible, which is very unlikely). To err on the side of caution we shouldn’t set this to more than half of the total memory across all the zram devices. To ensure optimal performance, we should also set the number of zram devices to be the same as the number of CPUs cores in the system to make sure that all CPUs end up being used (each zram device handler is a single thread).

To set the number of zram devices to 2 (Tegra2 has 2 CPU cores), we need to create the file /etc/modprobe.d/zram.conf containing the following line:

options zram num_devices=2

Then once we load the zram module (modprobe zram), we should see device nodes called /dev/zram*. We can configure the devices:

echo <memory_size_in_bytes> > /sys/block/<zram_device>/disksize

The amount of memory assigned to each zram device should be such that their total combined size doesn’t exceed half of the total physical memory in the system.

Then we can create swap headers on those zram devices using mkswap (e.g. mkswap /dev/zram0) and enable swapping to them (swapon -p100 /dev/zram0).

We should now have some compressed RAM for swapping to instead of swapping to a slow SD card.

Tweaks

It turns out that some of the default settings on Linux distributions aren’t as sensible as they could be. By default the amount of stack space each thread is allocated is 8MB. This is unnecessarily large and results in more memory consumption than is necessary. Instead we can set the soft limit to 256KB using “ulimit -s 256″. Ideally we should make this happen automatically at startup by creating a file /etc/security/limits.d/90-stack.conf containing the following:

* soft stack 256

Some users have reported that this can increase the amount of available memory after booting by a a rather substantial amount. Since this is a soft limit, programs that require more stack space can still allocate it by asking for it.

Choice of Software

One of the most commonly used types of software nowdays is a web browser, and unfortunately, most web browsers have become unreasonably bloated in recent years. This is a problem when the amount of memory is as limited as in it is on most ARM machines. Firefox and to a somewhat lesser extent Chrome require a substantial amount of memory. However, there is another reasonably fully featured alternative that works on ARM – Midori. Midori is based on the Webkit rendering engine, the same one that is used by Chrome and Safari. However, it’s memory footprint is approximately half of the other browsers. Unfortunately, it’s JavaScript support isn’t quite as good as on Firefox and Chrome yet, but it is sufficiently good for most things, and if memory pressure is a serious issue, you might want to try it out.

Toshiba AC100 Smartbook – Underdog to Winner in 5 Months

I have had the Toshiba AC100 for as long as I have had the Genesi Efika MX, but on top of suffering some of the same frustrating limitations, the stability of the drivers for it in the Linux kernel just wasn’t quite up to the task until recently, and few sane people would consider using Android on a device without a touchscreen, as Toshiba appear to have envisaged.

Since Efika MX‘s major problems – the painfully low screen resolution (a problem it shares with the AC100) and the completely unusable touchpad – turned out to be easily fixable, the AC100 went on the shelf for a while. This was a shame since the AC100 had about three times the CPU power of the Efika MX (dual core Cortex A9 1GHz vs Efika’s single core Cortex A8 800MHz), but having a fully working, usable system was more important.

All that changed recently. ChromeOS 2.6.38.3 kernel brought with it more support for Tegra2 based devices, including the development board that the AC100 is based on, and shortly afterwards thanks to the awesome people in the AC100 community, a lot of things fell into place, including stable NVEC support (what keyboard/mouse/LEDs are connected to on the AC100) and sound support. But most importantly for me, the recent kernel also came with the kernel level display panel setup. No longer entirely at the mercy of what the boot loader configures, it became possible to use a higher resolution screen!

So, I carried out the upgrade to 1280×720 using the same TFT panel that I used to upgrade the Efika. All that was required was a small kernel patch. Without a change to the boot loader data the machine still starts up in 1024×600, and the kernel boot-up output is corrupted until the console font is re-set, but since setfont is called very early during the Fedora boot-up sequence, it is a problem that isn’t hard to live with. From there on, everything works absolutely fine in 1280×720. An article on the details of the upgrade procedure and the required kernel patch will follow shortly.

The nvidia binary driver works OK-ish – most of the time, but it isn’t particularly stable – every once in a while you will do something that causes the screen output to get partially corrupted. This is not related to the higher resolution, all the problems occur at 1024×600, too. Unfortunately, it doesn’t look like nvidia are showing that much interest in improving this quickly, and since their approachability and interest in helping their user community is as non-existent as ever, I wouldn’t expect driver improvements any time soon. Still, for normal use the standard unaccelerated frame buffer driver is rock solid, and the acceleration in the nvidia driver does work well enough if you want to play videos at full screen resolution, and there is even a Tegra accelerated version of flash available so YouTube works, too. All of this is a considerable improvement on the Efika MX.

Power management is also completely functional on the AC100 now, so it’s battery life is virtually identical to the Efika MX.

It is amazing how much difference 5 months can make. AC100 went from being the underdog to being an outright winner. It now matches the Efika on battery life and screen resolution, and soundly beats it by a large margin on the touchpad (the touchpad on the AC100 is actually extremely good), performance and features (no YouTube without Flash).

Best of all, it is no more expensive than the Efika MX. Here in UK – new ones can be had for as little as £170, which is less than you’ll pay for an Efika with a non-US keyboard.

In fact, the price/performance in terms of CPU power is actually better on the AC100 than it is on the SheevaPlug, and that’s before the added convenience of also having a screen/keyboard attached for troubleshooting. Because of this, AC100s are now used for extending my ARM compile farm.

WQUXGA a.k.a. OMGWTF – IBM T221 3840×2400 204dpi Monitor – Part 3: ATI vs. Nvidia

It is at times like this that I get to fully consider just how bad a decision it was to jump ship from ATI to Nvidia when it came to graphics cards. But now that sense has been forced back upon me, I will hopefully not consider such madness again for the best part of the next decade.

Due to the ATI drivers being fundamentally unable to handle the T221 reliably, I bit the bullet and decided to put my old 8800GT card back in. The first WTF came about when it transpired that ATI drivers cannot be uninstalled from Windows XP using their bundled uninstaller in Safe or VGA modes. This is quite bad when you consider that it could be the ATI drivers that are making the machine not boot into normal mode. Credit to it, though, the ATI uninstaller was not too bad once I ran it in normal mode, and after using it to remove all ATI software and uninstalling the ATI devices in Device Manager, there wasn’t enough left to cause problem on the next reboot, during which the machine contained an Nvidia card. Everything booted up fine, and after a quick run of the Auslogics Registry Cleaner (just to make sure – easily the best registry cleaner I have used to date), everything was ready for the installation of Nvidia drivers. Everything went quite painlessly, and a reboot later I had the T221 configured for 2x1920x2400@20Hz mode. The only thing that didn’t come up perfectly by default is that I had to add the 1920×2400@20Hz mode in Nvidia Control Panel (click the Customize button).

By this point, the superior features of Nvidia were already becoming apparent:

  • Text and low-resolution mode anti-aliasing in firmware – such modes look vastly better than on ATI hardware).
  • Until the driver enables the secondary port, it remains disabled. This is really nice on the T221 because it means you don’t get the same thing on the screen twice during the BIOS POST and early stages of the boot process. I can imagine this also being annoying on a multi-head setup.
  • The primary port wasn’t switching for no apparent reason in Windows with multiple screens plugged in.
  • Best of all – Windows XP drivers Just Work ™. They don’t forget their settings between reboots.
  • No tearing down the middle of the screen where the two halves meet. With the ATI card, the mouse couldn’t be drawn on both halves at the same time. In the middle, you could make it virtually disappear, Not a big deal, but yet another example of general bugginess. Also, in games the tearing along the same line disappeared (I always run with vsync forced to on, and it was still visible from time to time with the ATI card).

Just the properly working drivers would have easily convinced me of the error of my recent ways, but all the other niceties really make for a positive difference to the experience.

After I got Windows working (it only took 20 minutes, after giving up on ATI after wasting a whole day on getting it to work properly and remember the settings between reboots), it was time to get things working in Linux. The first thing that jumped out at me about this part of the exercise is just how much better ATI’s Linux drivers are compared to their Windows drivers. It is obvious that they are actually being developed by somebody competent. Unlike the Windows drivers, the Linux drivers worked out of the box, and the only unusual thing that I needed to do was to make sure Fake Xinerama was configured and preloaded. Removing them was a simple case of:

# rpm -e fglrx

Simple, efficient, reliable. Seems ATI‘s Windows driver team have a lot to learn from their Linux driver team.

The machine came up fine with the nouveau drivers loaded, but I wanted to get Nvidia’s binary drivers working. The experience here was a little more problematic than it had been with the ATI drivers. The nvidia-xconfig and nvidia-settings utilities weren’t as intuitive as the ATI configuration utility, and the setup suffered from a particularly annoying problem where GPU scaling would default to on. This resulted in the screen mode being left stretched and unusable, but sometimes just starting the nvidia-settings program would fix some of it. In the end I just gave up on it and wrote my own xorg.conf according to the documentation – and that worked perfectly. You may want to set the following environment variable to force vsync in GL modes (e.g. for mplayer’s GL output)

# export __GL_SYNC_TO_VBLANK=1

This ensured there was no tearing visible during video playback.

One thing worth noting is that Nvidia drivers bring their own Xinerama layer with them, so the Xorg Xinerama should be disabled. There is also an option for faking the Xinerama information (NoTwinViewXineramaInfo), so no need for Fake Xinerama, either.

In conclusion, it is quite clear that Nvidia win hands down in terms of features and user experience, especially on Windows, due to their more stable drivers are more intuitive configuration utilities. The story is different on Linux – I would put ATI slightly ahead on that platform, at least in terms of configuration utilities. Having to use Fake Xinerama isn’t a big deal for the technically minded. Even on Linux, however, in terms of the overall outcome and the end experience, I feel Nvidia still come out ahead, since ATI drivers still occasionally produce visible tearing when playing back high definition video.

All this made me think about what is the most important thing about a product such as graphics cards. In the end it is not just about performance. Performance is only a part of the overall package. What I find is that the most important thing about a product is the whole experience of configuring it and using it. How easy is it to get to working under edge case conditions? How reliable is it – once it is working does it stay working? Are there any experience ruining artifacts such as tearing visible in applications, even with vsync enabled? These sorts of things along with the crowning touches such as anti-aliasing of low resolution modes and only having one active video output until the drivers specifically enable the others are what really impacts the experience. And based on my experience of Nvidia and ATI cards over the past few years, I hope somebody talks some sense to me if I consider an ATI product again – except perhaps if their FireGL team starts writing their Windows drivers.

WQUXGA a.k.a. OMGWTF – IBM T221 3840×2400 204dpi Monitor – Part 2: Windows

When I set out to do this, I thought getting everything working under Windows would be easier than it was under Linux. After all, the drivers should be more mature and AMD would have likely put more effort into making sure things “just work” with their drivers. The experience has shown this expectation was unfounded. Getting the T221 working in SL-DVI 3840×2400@13Hz mode was trivial enough, but getting the 2xSL-DVI 2x1920x2400@20Hz mode working reliably has proven to be quite impossible.

The first problem has been the utter lack of intuitiveness in the Catalyst Control Center. It took a significant amount of research to finally find that the option for desktop stretching across two monitors lies behind a right click menu on an otherwise unmarked object:

CCC Desktop Stretch Option

CCC Desktop Stretch Option

Results, however, were intermittent. Sometimes the resolution for the second half of the screen would randomly get mis-set, sometimes it would work. Sometimes the desktop stretching would fail. Eventually, when it all worked (and it would usually require a lot of unplugging of the secondary port to get a usable screed back), it would be fine for that Windows session, but it would all go wrong again after a reboot. The screen would just go to sleep at the point where the login screen should come up, and the only way to wake it up is to unplug the secondary DVI link, log in, and then plug in the second cable, usually a few times, before it would come up in a usable mode. Then the same resolution and desktop stretching configuration process would have to be repeated – with a non-deterministic number of attempts required, using both the Windows display settings configuration and the Catalyst Control Center.

At first I thought it could be due to the fact that I am using a 4870X2 card, so I disabled one of the GPUs. That didn’t help. Then I tried using a different monitor driver, rather than the “Default Monitor” which is purely based on the EDID settings the monitor provides. I tried a ViewSonic VP2290b driver (this was a rebranded T221), and a custom driver created using PowerStrip based on the EDID settings, and neither helped. Since I only use Windows for occasional gaming and not for any serious work, this isn’t a show stopping issue for me, but I am stunned that AMD‘s Linux drivers are more stable and usable than the Windows ones when using even slightly unusual configurations.

To add a final insult to injury, 4870X2 card doesn’t end up running the monitor with one GPU running each 1920×2400 section. Instead, one GPU ends up running both, and the 2nd GPU remains idle. At first I attributed the tearing between the two halves of the screen to be due to each half being rendered by a different GPU. Unfortunately, considering that all tests show that one GPU remains cold and idle while the other one is shown to be under heavy load, I have to conclude that this is not the case. This is particularly disappointing because the experience is both visually bad (tearing between the two 1920×2400 sections) and poorly performing (one GPU always remains idle and the frame rates suffer quite badly – 7-9fps in the Crysis Demo Benchmark). I clearly recall that my Nvidia 9800GX2 card I had before had a configuration option to enable dual-screen dual-GPU mode.

I am just about ready to give up on AMD GPUs, purely because the drivers are of such poor quality and lacking important features (e.g. requirement of fakexinerama under Linux, something that Nvidia drivers have a built in option for). I’m going to dig out my trusty old 8800GT card and see how that compares.