Virtually Gaming, Part 2: Evolution – Consolidation and Move to KVM

In the previous article in this series, I detailed the journey to my original configuration: a single host providing multiple gaming-capable virtual machines as a multi-seat workstation. But things have changed since then – many game distribution platforms such as Steam, GOG and Desura have native Linux versions, and many games have been ported to run natively on Linux. The vast majority of the ones that haven’t now work perfectly under WINE.

Consequently, the ideal solution has changed as well. In the original configuration, there were 3 seats on the system – two Windows VMs for gaming and one Linux VM for more serious use. At least one of the Windows VMs could now be removed, and its use replaced with WINE and native ports.

At the same time, KVM has advanced greatly in features and stability, and is now much better aligned with the requirements of this multi-seat workstation project. Perhaps most importantly, the latest QEMU offers a much better workaround for the issue I had to patch Xen’s hvmloader for: max-ram-below-4g (an option to the -machine parameter). Setting this to 1GB comprehensively works around the IOMMU compatibility bug of the Nvidia NF200 PCIe bridges on the EVGA SR-2, without any negative side effects.

Even better, KVM also includes patches that neuter the Nvidia driver’s ability to detect that it is running in a VM (add kvm=off to the list of options passed to the -cpu parameter). That means that modifying the GPU firmware or hardware to make it appear as a Quadro or Tesla card is no longer required for using it in a virtual machine. This is a massive advantage over the original Xen solution for most people.

Summary of the most significant changes:

  • Host system updated to EL7 (CentOS)
    Required to facilitate easier running of more recent kernels and Steam (no more need to build and update an additional package set to support Steam as on EL6, including glibc). On the downside – this necessitates putting up with systemd.
  • Xen replaced by KVM
  • Windows 7 VM now uses UEFI instead of legacy BIOS
    This does away with all of the legacy VGA complications such as VGA arbitration. The UEFI OVMF firmware even downloads and executes the PCI devices’ BIOS during the VM’s POST, which results in the full splash screen and even UEFI BIOS configuration menus being available during the VM boot on the external console.
  • XP x64 VM removed
    Superseded by using native Linux game ports and WINE for the rest (so far every XP compatible game I have tried works).

Some of the extra repositories I used for this are:

OVMF UEFI and SeaBIOS Firmware repository from here:

Mainline kernel from elrepo repository:

Bleeding edge QEMU (needed for the max-ram-below-4g option):

The full libvirt xml configuration file I use for QEMU is here:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <sysinfo type='smbios'>
    <bios>
      <entry name='vendor'>GENERIC</entry>
      <entry name='version'>GENERIC</entry>
      <entry name='date'>01/01/2014</entry>
      <entry name='release'>0.91</entry>
    </bios>
    <system>
      <entry name='manufacturer'>GENERIC</entry>
      <entry name='product'>GENERIC</entry>
      <entry name='version'>GENERIC</entry>
      <entry name='serial'>1</entry>
      <entry name='uuid'>11111111-1111-1111-1111-111111111111</entry>
      <entry name='sku'>GENERIC</entry>
      <entry name='family'>GENERIC</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.2'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <cpu>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'/>
  <devices>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' io='native'/>
      <source dev='/dev/zvol/normandy/edi'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:11:22:33'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='if=pflash,format=raw,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF-pure-efi.fd'/>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,kvm=off'/>
    <qemu:arg value='-machine'/>
    <qemu:arg value='pc-i440fx-2.2,max-ram-below-4g=1G,accel=kvm,usb=off'/>
  </qemu:commandline>
</domain>

The reason for the qemu:commandline section is that libvirt and especially virt-manager do not actually understand all possible QEMU parameters. The ones that they don’t support directly are in this section to avoid errors and complaints from virsh and virt-manager in normal use.

You may also notice that there are some unusual sections and values in there, so let me touch upon them in groups.

Windows Activation and Associated Checks

When you first activate Windows with a key, it keeps track of several important details of the hardware in order to detect whether the same installation has been moved into another machine. Most licenses (e.g. OEM ones) are not transferable to another machine. So in order to ensure that our installation is portable (e.g. if we upgrade to a different hypervisor at a later date), we set the various values to something static, easily memorable and predictable, so that if we ever need to migrate the VM to another host, it will not cause deactivation issues. The important settings are here (these are not in all cases complete sections, only the fragments required for this purpose, see above for the full configuration):

  <sysinfo type='smbios'>
    <bios>
      <entry name='vendor'>GENERIC</entry>
      <entry name='version'>GENERIC</entry>
      <entry name='date'>01/01/2014</entry>
      <entry name='release'>0.91</entry>
    </bios>
    <system>
      <entry name='manufacturer'>GENERIC</entry>
      <entry name='product'>GENERIC</entry>
      <entry name='version'>GENERIC</entry>
      <entry name='serial'>1</entry>
      <entry name='uuid'>11111111-1111-1111-1111-111111111111</entry>
      <entry name='sku'>GENERIC</entry>
      <entry name='family'>GENERIC</entry>
    </system>
  </sysinfo>
  <os>
    <smbios mode='sysinfo'/>
  </os>
    <disk type='block' device='disk'>

Nvidia Bugs/Features Workarounds

The following sections are required in order to work around the NF200 PCIe bridge bugs (max-ram-below-4g=1G) and the Nvidia driver feature that disables GeForce GPUs in virtual machines (kvm=off):

  <qemu:commandline>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,kvm=off'/>
    <qemu:arg value='-machine'/>
    <qemu:arg value='pc-i440fx-2.2,max-ram-below-4g=1G,accel=kvm,usb=off'/>
  </qemu:commandline>

CPU Configuration

    <topology sockets='1' cores='4' threads='1'/>

This is important because most non-server editions of Windows only allow up to two CPU sockets. By default QEMU presents each CPU core as being on a separate socket. That means that no matter how many CPUs you pass to your Windows VM, while they will all show up in Device Manager, only a maximum of two will be used (you can verify this using Task Manager). The above configuration block instructs libvirt to tell QEMU to present four cores in a single CPU socket, so that all are usable in the Windows VM.
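
The arithmetic behind this can be sketched in a few lines of shell. This is only an illustration: the function name usable_vcpus is made up for this example, and the socket limit of 2 is passed in as a parameter rather than detected from the guest.

```shell
#!/bin/bash
# Hypothetical helper: how many vCPUs a guest can use when the OS caps the socket count.
# usage: usable_vcpus SOCKETS CORES THREADS SOCKET_LIMIT
usable_vcpus() {
    local sockets=$1 cores=$2 threads=$3 limit=$4
    # Only sockets up to the limit are used by the guest.
    local used=$(( sockets < limit ? sockets : limit ))
    echo $(( used * cores * threads ))
}

# Four vCPUs presented as four single-core sockets: prints 2
usable_vcpus 4 1 1 2
# Four vCPUs presented as one quad-core socket: prints 4
usable_vcpus 1 4 1 2
```

The same four vCPUs give the guest twice as much usable CPU once they are presented as cores of a single socket.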

VFIO and Kernel Drivers

In my system I have two identical Nvidia GPUs. Numerically, the second one is primary (host), and the first one is the one I am passing to a virtual machine. I am also passing the NEC USB 3.0 controller to the VM. This is the script I wrote (in /etc/sysconfig/modules/) to bind the devices intended for the VM to the VFIO driver:


#!/bin/bash

# First GTX 780 Ti (passed to the VM) and its HDMI audio function (.1)
nvidia1=$(lspci | grep "GTX 780 Ti" | head -1 | awk '{print $1;}')
hda1=$(echo "$nvidia1" | sed -e 's/\.0$/.1/')

# Second GTX 780 Ti (kept by the host) and its HDMI audio function
nvidia2=$(lspci | grep "GTX 780 Ti" | tail -1 | awk '{print $1;}')
hda2=$(echo "$nvidia2" | sed -e 's/\.0$/.1/')

# NEC USB 3.0 controller (passed to the VM)
nec=$(lspci | grep "NEC" | awk '{print $1;}')

# Pin the host devices to their normal drivers
echo nvidia        > /sys/bus/pci/devices/0000:$nvidia2/driver_override
echo snd-hda-intel > /sys/bus/pci/devices/0000:$hda2/driver_override

# Pin the VM devices to vfio-pci
echo vfio-pci      > /sys/bus/pci/devices/0000:$nvidia1/driver_override
echo vfio-pci      > /sys/bus/pci/devices/0000:$hda1/driver_override
echo vfio-pci      > /sys/bus/pci/devices/0000:$nec/driver_override

modprobe vfio-pci

# Register the vendor and device IDs with vfio-pci
echo 10de 1284     > /sys/bus/pci/drivers/vfio-pci/new_id
echo 10de 0e0f     > /sys/bus/pci/drivers/vfio-pci/new_id
echo 1033 0194     > /sys/bus/pci/drivers/vfio-pci/new_id

# Detach the VM devices from whatever driver currently holds them
echo 0000:$nvidia1 > /sys/bus/pci/devices/0000:$nvidia1/driver/unbind
echo 0000:$hda1    > /sys/bus/pci/devices/0000:$hda1/driver/unbind
echo 0000:$nec     > /sys/bus/pci/devices/0000:$nec/driver/unbind

# Attach the VM devices to vfio-pci
echo 0000:$nvidia1 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:$hda1    > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:$nec     > /sys/bus/pci/drivers/vfio-pci/bind

# Load the Nvidia driver last, so that it only claims the host GPU
modprobe nvidia

Note that the PCI device addresses will change if you add more hardware to the machine – that is why I wrote this script, rather than assigning the devices statically by address. The above script works for me on my hardware – you will almost certainly need to modify it for your configuration, but it should at least give you a reasonable idea of the approach that works.
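
To confirm that the binding actually took effect, the driver symlink in sysfs can be inspected. The helper below is a minimal sketch; the name bound_driver and the optional second argument (an alternative sysfs root, useful only for trying the function out without real hardware) are assumptions of this example and not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: print the driver currently bound to a PCI device.
# usage: bound_driver ADDRESS [SYSFS_ROOT]   (ADDRESS as printed by lspci, e.g. 07:00.0)
bound_driver() {
    local addr=$1 root=${2:-/sys}
    local link="$root/bus/pci/devices/0000:$addr/driver"
    if [ -L "$link" ]; then
        # The driver entry is a symlink; the last component of its target is the driver name.
        basename "$(readlink "$link")"
    else
        # No symlink means no driver is bound to the device.
        echo "none"
    fi
}
```

After the binding script has run, calling bound_driver with the address of the passed-through GPU should report vfio-pci, and the host GPU should report nvidia.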

Important: The devices this identifies have to match what is in your libvirt XML config file in the relevant hostdev sections. You will have to adjust that manually for your configuration, either using virsh edit or virt-manager.
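
As a rough aid for that manual step, the address printed by lspci can be converted into the form libvirt expects inside a hostdev source section. This is a sketch only: hostdev_source is a made-up name, and it assumes PCI domain 0000, as the binding script does.

```shell
#!/bin/bash
# Hypothetical helper: turn an lspci address (bus:slot.function, in hex) into the
# <address> element used inside a libvirt <hostdev><source> section.
hostdev_source() {
    local addr=$1
    local bus=${addr%%:*}     # "07" from "07:00.0"
    local rest=${addr#*:}     # "00.0"
    local slot=${rest%%.*}    # "00"
    local func=${rest#*.}     # "0"
    printf "<address domain='0x0000' bus='0x%s' slot='0x%s' function='0x%s'/>\n" \
        "$bus" "$slot" "$func"
}

# GPU and USB controller addresses as used in the configuration above
hostdev_source 07:00.0
hostdev_source 0d:00.0
```

The two lines printed should match the source addresses of the corresponding hostdev sections in the XML.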

Also depending on your hardware, you may need to do the initial Windows installation on the emulated GPU rather than the real one (e.g. if you are using a USB controller for the VM that requires additional drivers, as is the case with the USB 3.0 controller I am using for my VM). Otherwise you will get display output but be unable to use your keyboard/mouse during the installation.

Gaming on Linux: Steam

A pre-packaged Steam binary used to be available from the rpmfusion repository, but it no longer appears to be there. Thankfully, negativo17 maintains a Steam repository for Fedora 20+, which installs and runs fine on EL7. You may also need to grab a few RPMs from Fedora 19 because EL7 doesn’t ship with a full complement of 32-bit libraries. The ones I found I needed are these:


These are from Fedora 19 because F19 is virtually identical to EL7 in terms of package versions.

Typically, the Steam RPM installation is a one-off, mostly to bootstrap the initial run, and install the dependencies. After that, a local version of Steam will be installed in the user’s home directory in ~/.local/share/Steam/. In light of the recent Steam bug resulting in deletion of the user’s entire home directory, I implemented a solution that runs Steam as a separate steam user, from that user’s own home directory. That way should anything similar to this ever happen, the only thing that would be deleted is the steam user’s home directory rather than any important files not related to running Steam games.

To do this, you will need to add a steam user, and give it necessary permissions:

$ sudo adduser steam
$ sudo usermod -a -G audio,games,pulse-access,video steam

Add the following to /etc/sudoers.d/steam:

%games ALL = (steam) NOPASSWD: /bin/steam

Create the following script (e.g. /usr/local/bin/

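#!/bin/bash
# Allow the steam user to draw on the logged-in user's X display
xhost +SI:localuser:steam
# Let the audio group (which includes steam) reach this session's PulseAudio socket
chgrp audio /run/user/$UID /run/user/$UID/pulse
chmod 750 /run/user/$UID /run/user/$UID/pulse
# Run Steam as the steam user, then kill any dbus-launch left running as that user
sudo -u steam /usr/bin/steam
sudo -u steam pkill dbus-launch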

From there on, when you invoke this script, it will launch Steam as the steam user, and pass the graphical output to the Xorg session of the logged in user. The net result is that any potentially damaging bug in Steam or associated games can only do damage to the files owned by the steam user. This security model is not dissimilar to the Android security model where every application runs under its own user, for similar security reasons.

Gaming on Linux: WINE

There are two obvious options for this:

1) PlayOnLinux

2) More traditional WINE (I use the one from DarkPlayer’s repository)

I only had to make one configuration change to WINE, and that is to disable the dwrite.dll library in WINE (to disable it, run winecfg, go to Libraries -> add dwrite.dll, edit dwrite.dll entry and set it to disabled). I am using XP version emulation, which isn’t even supposed to include dwrite.dll, and the problem it causes is that fonts are invisible in Steam and some other applications.

End Result

The end result is a much cleaner virtual machine configuration: no missing RAM as before with Xen due to the NF200 bug workaround, and no need for hardware modification of my GeForce cards. The performance seems very smooth, and so far the entire setup has been completely trouble free.

There is also one fewer virtual machine and one fewer GPU in the system, without any loss of functionality. Should I require an additional seat in the future, it will most likely be a Linux one, and implemented using a Xorg multi-seat configuration.

Virtually Gaming, Part 1: In the Beginning – Hardware and Xen

For about two years now I have managed to stick to the “No Windows on bare metal.” policy. This was instated for many reasons, including security and ease of backups (it is difficult to beat ZFS snapshots and send/receive functionality). The key reason for using Windows at all has been gaming, and both my wife and I play various games, mostly of the co-op FPS genre. While native Linux support has increased dramatically in that time, the availability of native Linux games still hasn’t quite reached parity with availability on the Windows platform.

Combining the “No Windows on bare metal.” policy with the requirement for high performance gaming capability meant that the only solution that fits is PCI passthrough of a high end GPU to the virtual machine. In this article I will describe the journey to the solution over the past two years, including (often unfortunate) choices of hardware, software, working around hardware, firmware, driver and software bugs, crippling and limitations, and other bumps on the road to virtualized gaming.


When I first embarked on this project, it was an off-shoot of the project to upgrade my workstation. While there was nothing wrong with my Core 2 Quad in terms of performance, I needed to get a second machine up and running for my wife. So, somewhat optimistically, I thought this would be an ideal opportunity to solve three problems at the same time:

  1. Get a gaming grade workstation up and running for my wife
  2. Virtualize the Windows part of my dual-boot setup so I never have to reboot for the sake of joining a game when my friends invite me
  3. Implement the “No Windows on bare metal.” policy

The motherboard that caught my eye was the EVGA SR-2. It seemed to fit all of the necessary requirements:

  1. Plenty of CPU power (dual socket, capable of taking up to two 6-core Xeons)
  2. Full support for plenty of ECC memory (after the last build I have vowed to never build another machine without ECC RAM, having spent days troubleshooting a stock-setting stability issue that turned out to be marginal memory)
  3. Plenty of PCIe slots (7 x16 slots, with 64 usable PCIe lanes between them)
  4. VT-d support (originally listed in the spec, and confirmed with EVGA tech support prior to purchase – a claim that turned out to be rather stretching the truth)

A sizable investment into the motherboard, a pair of 6-core X5650 Xeons, and 48GB (6x8GB) of registered ECC RAM later, problems began.

Hardware Problems

The first motherboard I got turned out to have a faulty PCIe slot #1. The retailer I bought the motherboard from went bust a few weeks after my purchase, but EVGA generally have excellent RMA service, and I registered the motherboard as soon as I had received it to qualify for the full 10 year manufacturer’s warranty that was offered on this motherboard.

In order to not put my build on hold, before I RMA-ed the faulty SR-2, I bought another, second hand SR-2 on eBay. I thoroughly tested it, and to this day, this is the SR-2 that has been completely fault free in the main workstation that was the product of this project. It turns out I was quite lucky to have bought a second motherboard – because the replacement that was sent was also faulty, and failed to reliably finish POST-ing with either of my CPUs in either socket. That got RMA-ed as well, and the replacement is currently in use as a prototyping rig for the next incarnation of this workstation, but that motherboard also has problems which cause it to fail to boot on a hot reboot (I am putting off RMA-ing it until the prototyping stage of the project is completed and being without a working prototyping machine for a week won’t be a problem).

In conclusion: Beware EVGA warranty replacement motherboards – they are all refurbished items that were sent back as faulty, and either repaired or the fault was never reproducible by their testing team so they got recycled as is. Always test any refurbished replacements extremely thoroughly (all slots, sockets, ports and features) when you receive them – if you get a faulty replacement, EVGA will pay for the shipping costs back to them for another replacement, but only within the first month after you receive the replacement, so acting quickly and thoroughly is of vital importance to avoid courier costs that can quickly add up to a lot.

More RAM

At this time I looked into using 96GB of RAM on the SR-2. This turned out to be very difficult as the machine would generally refuse to POST, except after a fresh CMOS reset. This was particularly annoying because the CPUs themselves (which contain the MCH) officially support 192GB of RAM each. After a lot of trial and error, I found a way to make the machine reliably POST with 96GB of RAM:

  1. Use dual-ranked (this is important, single ranked won’t work for 96GB!) x4 registered 1600MHz 1.35V DIMMs
  2. Boot the machine with only 6 DIMMs. Go to the memory settings, and manually set all of the memory timings to what they defaulted to. Make sure you set the command rate to 2T (defaults to 1T).
  3. If you are overclocking, make sure you set the MCH strap to 1600MHz.

Do this and your SR-2 should POST with 96GB. It may require a few attempts where the motherboard resets itself and re-attempts the POST, but both of mine successfully POST within 30 seconds.

All of the symptoms indicate that there is a BIOS bug in timeouts at various stages of the POST that cause some initializations to fail and time out when more than 48GB of RAM is used. Officially, EVGA only claim the SR-2 supports up to 48GB of RAM, and it is unlikely they will be fixing this BIOS bug.


Back when I began this project (late 2012), the only hypervisor with notable reports of GPU passthrough success without requiring a lot of manually applied experimental patches was Xen, so this was what I chose for the project. Additionally, my previous tests indicated that the performance overheads of using Xen were among the lowest of all the available hypervisors, so it seemed like a win-win situation.

The primary GPU in the machine was an ageing but perfectly adequate GeForce 8800GT that came from my previous workstation. Then I had to select a suitable GPU for passthrough to a virtual machine. Nvidia passthrough only worked on expensive Quadro (and not all Quadros, only the expensive ones), Tesla and Grid cards which they refer to as “MultiOS compatible”. The cost of most of those made them not an option worth considering. That meant trying an ATI card, so I got a cheap passively cooled single-slot Radeon HD6450. This is where a whole array of real problems began:

  1. The EVGA SR-2 motherboard uses a pair of NF200 PCIe bridges to multiplex 32 PCIe lanes available on the upstream Intel 5520 PCIe hub into 64 PCIe lanes available for GPUs. NF200 bridges have severe bugs and limitations when it comes to compatibility with VT-d. They bypass IOMMU for DMA transfers, so when the VM tries to access RAM within its virtual address space that overlaps the physical address of a PCI BAR (aperture) that belongs to a hardware device, the memory writes will hit the BAR, which will crash the machine (and maybe corrupt your disks, if the BAR being trampled belongs to a disk controller). The solution to this was to write a hvmloader patch that marked all of the IOMEM areas from the host as reserved. This was an ugly bodge that resulted in a fair amount of memory in the domU (what Xen calls guest VMs) becoming unusable, but it worked (and with enough RAM it wasn’t a major problem).
  2. More than likely related to point 1, this motherboard appears to have broken (or non-existent) support for interrupt remapping, which means that any devices passed to a VM have to have dedicated, unshared interrupts. If you pass a device sharing an interrupt to the VM, the VM will most likely crash the entire host. Problems 1 and 2 are very similar in symptoms (host crash), which made them quite difficult to troubleshoot and get to the bottom of because no one change to the configuration made the problem go away. It took some help from the Xen developers and a fair amount of guesswork to figure it all out. The only solution is to move cards around to different slots until all of the hardware you intend to pass through to virtual machines has dedicated interrupts that aren’t shared with other hardware. This can be fiddly, but it is generally achievable – in my final configuration, I am successfully passing two GPUs and three USB controllers to VMs.
  3. ATI cards suffer from terrible drivers that fail to re-initialize the card without full BIOS level re-POST-ing (and said re-POST-ing doesn’t happen when the VM is rebooted, only when the entire physical machine is rebooted). The consequence is that they work OK when the VM is first booted up after a host reboot, but subsequent VM reboots result in massive performance degradation, glitches, and sometimes complete host crashes. While some of this is being worked on (e.g. functionality to reset the GPU via a bus reset from Xen dom0), it is still not available in the current released version. This particular problem turned out to not be easily solvable (having already written a patch for Xen’s hvmloader, I was very keen to avoid having to write any more to implement PCI bus resetting functionality for the Xen pci-stub driver). To at least be able to prove the concept, I bought the cheapest Nvidia Quadro that is supported for GPU passthrough (Quadro 2000), and this worked absolutely fine. Having finally found a solution that works perfectly, I went on to find ways of making GeForce cards work with PCI passthrough through fooling the Nvidia driver into initializing them even though they weren’t expensive enough, by modifying the cards’ ID number into an equivalent Quadro card. As discussed in previous articles, Nvidia cards up to and including the Fermi generation can be modified into equivalent Quadro cards by changing the appropriate ID strap bits in the cards’ BIOS using nvflash. Kepler cards require a small hardware modification. The easiest modifications are GTX 680 to Tesla K10 (remove one resistor) and GTX 780 Ti to Quadro K6000 (add one large, easy to solder resistor across appropriate pins on the EEPROM). I am currently running a pair of GTX 780 Ti cards.
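
For the interrupt sharing described in point 2, a quick way to see which interrupt lines are currently shared is to look for entries in /proc/interrupts that list more than one handler. The function below is a minimal sketch with a made-up name (shared_irqs). It only sees devices that have a driver loaded in the host at the time, so treat its output as a starting point rather than proof that a slot is safe.

```shell
#!/bin/bash
# Hypothetical helper: list IRQ lines that have more than one handler attached.
# usage: shared_irqs [FILE]   (FILE defaults to /proc/interrupts)
shared_irqs() {
    local file=${1:-/proc/interrupts}
    # Numbered IRQ lines end with their handler names, separated by ", " when shared.
    awk '/^ *[0-9]+:/ && /, / { sub(/^ +/, ""); print }' "$file"
}
```

Any device that shows up on a shared line is a candidate for moving to a different slot before it is passed through.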

Issues 1 and 2 listed above are why I said that claiming the SR-2 supports VT-d was seriously stretching the truth. On a well designed workstation motherboard, the above problems should never have arisen. After all that, and many, many man-days invested in working around the various bugs mentioned above, I have Xen working on the system, with EL6 (CentOS) dom0, and two domUs, one running XP x64 and one running Windows 7 x64. The hardware passed through on PCIe level is:

XP x64:

  • Intel ICH10 HD Audio
  • 2x ICH10 USB
  • GeForce GTX 780Ti

Windows 7 x64:

  • NEC USB 3 controller
  • GeForce GTX 780Ti

GRUB options:

 kernel /xen.gz noreboot unrestricted_guest=1 msi=1
 module <kernel and options> intel_iommu=on pcie_ports=compat

Note that unrestricted_guest=1 and pcie_ports=compat are required on the SR-2, but may not be required if your hardware behaves better. If your IOMMU implementation is good and includes ACS functionality, you shouldn’t need unrestricted_guest=1.

pcie_ports=compat is required because without it the SR-2 makes the PCI hotplug driver flap very quickly on one of the PCI devices built into the south bridge chipset, which causes an interrupt flood that makes the machine grind to a halt. (Have I mentioned enough times yet that the SR-2 is extremely buggy?)

Xen domU config:


disk=[ '/dev/zvol/ssd/mydomu,raw,hda,rw', '/dev/sr0,raw,hdc:cdrom,rw' ]
vif=[ 'mac=00:11:22:33:44:55,bridge=br0,model=e1000', ]
pci = [ '07:00.0', '07:00.1', '00:1b.0', '00:1a.1' ]

Obviously you will need to change things like PCI addresses, MAC addresses, block device paths, and suchlike to suit your own system.


options xen-pciback permissive=1 hide=(07:00.0)(07:00.1)(00:1b.0)(00:1a.1)

Note the PCI IDs in the xen-pciback module options correspond to the PCI IDs in the Xen domU configuration. You may not need permissive=1 if you have better hardware than I do.
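
Since the two lists have to stay in sync, one option is to generate both lines from a single list of addresses. The following is only a sketch: the function names pciback_hide and domu_pci are made up for this example, and permissive=1 is included simply because my configuration uses it.

```shell
#!/bin/bash
# Hypothetical helpers: build both configuration lines from one list of PCI addresses.

# Print the xen-pciback module options line, with each address wrapped in parentheses.
pciback_hide() {
    local out="" dev
    for dev in "$@"; do
        out="$out($dev)"
    done
    echo "options xen-pciback permissive=1 hide=$out"
}

# Print the matching pci line for the domU configuration file.
domu_pci() {
    local out="" dev
    for dev in "$@"; do
        # Separate entries with ", " but do not put a separator before the first one.
        out="${out:+$out, }'$dev'"
    done
    echo "pci = [ $out ]"
}

pciback_hide 07:00.0 07:00.1 00:1b.0 00:1a.1
domu_pci     07:00.0 07:00.1 00:1b.0 00:1a.1
```

Run with the addresses above, this prints the same two lines shown in the module options and the domU configuration.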

And Another Thing

One thing I feel I have to mention is that I have had extremely bad experience with every SAS card I have tried to use in the SR-2 with virtualization. This includes two different LSI cards, an Adaptec card and a 3ware card. They all work fine in a normal bare metal setup, but cause all kinds of crash-inducing problems, some more difficult to debug than others, when the IOMMU is enabled and VMs are running with PCI devices passed through to them. SATA cards (I tried Silicon Image and Marvell), OTOH, seem to always work just fine, with no problems whatsoever, including when using 1:5 SATA port multipliers. In some cases this is caused by the SAS controller being a native PCI-X chip and using a phantom PCI-X to PCIe bridge. In other cases it seems to be caused by the SAS card’s driver trying to do some interesting DMA accesses that crash the entire host when virtual machines are running with PCI devices passed through to them. In short – avoid using SAS cards and stick with SATA – but then again I find that to be good advice to follow regardless.

This setup has worked without any significant problems for the past two years. But things have changed in that time. There is now a native Linux version of Steam, and many games have native Linux ports. It is time that this long term reliable system is updated accordingly. More on that in the next article.

Chromebook Pixel – Long Term Review

I have been using my Chromebook Pixel for nearly a year now, so I feel it has been long enough to form a reasonably objective view, which may be useful to others who are considering buying one.

When I bought it, I was looking for a worthy successor to my venerable ThinkPad T60. The ThinkPad had been upgraded as far as it would go, with a 2.33GHz Core 2 Duo, 3GB of RAM, and most importantly, a 2048×1536 screen. It is still quite a usable machine, but the main reasons why I was looking for a replacement were battery life (90 minutes with the extended capacity battery on a good day), and weight (I haven’t weighed it, but when carrying it around for any length of time it feels like it weighs a tonne). All in all, barely livable with for the commute to work.

The Pixel was promising to address all the issues I had with the ThinkPad – it weighs a fraction as much, the battery life is about 6 hours, depending on the load, the other features are no worse, with the screen being a significant improvement on the ThinkPad. Since I use Linux (EL6), I needed to make sure all of the hardware is fully supported, which was the main reason why I didn’t choose a Macbook Pro Retina – the only other contender at the time.

Needless to say, ChromeOS only lasted for long enough to enable developer mode to facilitate installing a proper Linux distribution.

How did this very promising spec on paper work out in reality? Well, my experience is very mixed. The performance is more than sufficient, even for light gaming loads (e.g. Left 4 Dead 2 with maxed out settings at 1280×800, quarter of native resolution). The screen is nothing short of amazing. The touchpad is reliable. The battery life is good. But that is where the good things I can say about it end.

There are two things that let it down quite badly. The keyboard is less than perfect – it lacks a number of keys: PgUp, PgDn, Home, End, Delete, Insert, F11 and F12. While inconvenient, this is reasonably workable around using a custom keyboard map and key combinations using AltGr.

The fundamental thing that makes the Chromebook Pixel nearly unusable is the amount of heat it produces. Under any load above idle, the aluminium casing gets too hot to touch for any length of time. Under a gaming load, even the plastic keys on the keyboard get so hot they are painful to touch. The CPU itself doesn’t overheat (it tops out at about 85C), but the outer casing gets past 47C within a few minutes of Left 4 Dead 2.

Chromebook Pixel Temperature

47C may not sound like a lot, but given the high thermal conductivity of aluminium, 47C is actually very uncomfortable to touch. This problem isn’t unique to the Chromebook, either – I had a similar problem with the Macbook Pro Retina I was using at work previously. Consequently, I can only strongly recommend against getting the Chromebook Pixel.

Due to these issues, I am still using my old ThinkPad more frequently than the Pixel. My commute to work machine is now an ARM based Chromebook (XE303C12), which stays stone cold even under a heavy load, the battery lasts 6-8 hours, and is even lighter than the Pixel. Its touchpad is quite terrible, but I can live with that in return for it not burning me as soon as I ask it to compile something for me.

All I can say is – beware the marketing hype and sexy looks. A laptop that looks fantastic on paper can easily turn out to be nearly useless due to how hot it gets.

Microsoft Security Essentials on 64-bit XP

Yet another Windows related article – this detour from more typical content is expected to be short lived.

Microsoft Security Essentials was never officially supported on 64-bit Windows XP, but version 2 nevertheless installed on it and worked fine. Version 4 (version 3 never existed) refuses to install directly, saying that the version of Windows is unsupported. However, if you install version 2, the version 4 installer will happily run and install version 4 as an upgrade. It will pop up a message every time you log in warning that XP64 is EOL, but otherwise it will work just fine. So the trick is to install version 2 and then upgrade to version 4.

You may be wondering why this is relevant. My findings are that most realtime anti-malware programs thoroughly cripple performance. I used to run ClamWin+ClamSentinel as one of the least bad options, but even this was quite crippling. MSSE, on the other hand, is much more lightweight, and has thus far proved itself to be as effective in tests as most of the alternatives. The overall performance of the system is now much more acceptable.

Chrome Installer Error 0xc0000005 on Windows XP

I don’t tend to write much about Windows because its usefulness to me is limited to functioning as a Steam boot loader, and even that usefulness is somewhat diminished with Steam and an increasing number of games being available for Linux. Unfortunately, I recently had to do some testing that needed to be carried out using a Windows application, and I noticed that Chrome reported the above error when attempting to update itself.

The Chrome installer crashes with the opaque 0xc0000005 error code on XP64 (Chrome is still supported on XP, even though MS is treating XP as EOL). Googling the problem suggested disabling the sandbox might help, but this isn’t really applicable since the problem occurs with the installer, not once Chrome is running (it runs just fine, it’s updating it that triggers the error).

A quick look at the crash dump revealed that one of the libraries dynamically linked at crash time was the MS Application Verifier, used for debugging programs and sending them fake information on what version of Windows they are running on. Uninstalling the MS Application Verifier cured the problem.

Steam on EL6 (RHEL6 / Scientific Linux 6 / CentOS 6)

The fact that Steam have decided to only officially support .deb based distributions, and only relatively recent ones at that, has been a pet peeve of mine for quite some time. While there are ways around the .deb only official package availability (e.g. alien), the library requirements are somewhat more difficult to reconcile. I have finally managed to get Steam working on EL6 and I figure I’m probably not the only one interested in this, so I thought I’d document it.

Different packages required to do this have been sourced from different locations (e.g. glibc from fuduntu project, steam src.rpm from (not really a source rpm, it just packages the steam binary in a rpm), most of the rest from more recent Fedoras, etc.). I have rebuilt them all and made them available in one place:

You won’t need all of them, but you will need at least the following:


First install some of the dependencies from the standard distribution packages:

yum install gtk2-engines.i686 \
            openal-soft.i686 \
            alsa-plugins-pulseaudio.i686

Then install the updated packages:

rpm -Uvh glibc-2.15-60.el6.i686.rpm \
         glibc-2.15-60.el6.x86_64.rpm \
         glibc-common-2.15-60.el6.x86_64.rpm \
         glibc-devel-2.15-60.el6.x86_64.rpm \
         glibc-headers-2.15-60.el6.x86_64.rpm \
         libtxc_dxtn-1.0.0-2.1.i686.rpm \
         SDL2-2.0.3-2.el6.i686.rpm \
         steam- \
         xz-5.0.5-1.el6.x86_64.rpm \
         xz-compat-libs-5.0.5-1.el6.x86_64.rpm \
         xz-libs-5.0.5-1.el6.x86_64.rpm

If you have pyliblzma from EPEL installed (required by, e.g. mock), the updated xz-lzma-compat package will trigger a Python bug that causes a segfault. This will incapacitate some Python programs (yum being an important one). If you encounter this issue and you must have pyliblzma for other dependencies, reinstall the original xz package versions after you run steam for the first time. The updated xz only seems to be required when the steam executable downloads updates for itself.

Finally, run steam, log in, and let it update itself.

One of the popular games that is available on Linux is Left 4 Dead 2. I found that on ATI and Nvidia cards it doesn’t work properly in full screen mode (blank screen, impossible to Alt-Tab out), but it does work on Intel GPUs. It works on all GPU types in windowed mode. Unfortunately, it runs in full screen mode by default, so if you run it without adjusting its startup parameters you may have to ssh into the machine and forcefully kill the hl2_linux process. To work around the problem, right click on the game in your library, and go to properties:

Steam Game Properties

Click on the “SET LAUNCH OPTIONS…” button:

Steam Game Properties 2

You will probably want to specify the default resolution as well as the windowed mode to ensure the game comes up in a sensible mode when you launch it.
Add “-windowed -w 1280 -h 720” to the options, which will tell L4D2 to start in windowed mode with 1280×720 resolution. The resolution you select should be lower than your monitor’s resolution.

Steam Game Launch Options

If you did all that, you should be able to hit the play button and be greeted with something resembling this:

Left4Dead 2 with Steam on Linux

ATI cards using the open source Radeon driver (at least with the version 7.1.0 that ships with EL6) seem to exhibit some rendering corruption, specifically some textures are intermittently invisible. This leads to invisible party members, enemies, and doors, and while it is entertaining for the first few seconds it renders the game completely unplayable. I have not tested the ATI binary driver (ATI themselves recommend the open source driver on Linux for older cards and I am using a HD6450).

Nvidia cards work fine with the closed source binary driver in windowed mode, and performance with a GT630 constantly saturates 1080p resolutions with everything turned up to maximum. I have not tested with the nouveau open source driver.

With Intel GPUs using the open source driver, everything works correctly in both windowed and full screen mode, but the performance is nowhere near as good as with the Nvidia card. With all the settings set to maximum, the performance with the Intel HD 4000 graphics (Chromebook Pixel) is roughly the same at 1920×1200 resolution as with the Radeon HD6450, producing approximately 30fps. The only problem with playing it on the Chromebook Pixel is that the whole laptop gets too hot to touch, even with the fan going at full speed. Not only does the aluminium casing get too hot to touch, the plastic keys on the keyboard themselves get painfully hot. But that story is for another article.

QNAP TS-421 – Review, Modification and RedSleeve Linux


With the RedSleeve Linux release rapidly approaching, I needed a new server. The current one is a DreamPlug with an SSD and although it has so far worked valiantly with perfect reliability, it doesn’t have enough space to contain all of the newly built RPM packages (over 10,000 of them, including multiple versions the upstream distribution contains), and is a little lower on CPU (1.2GHz single core) and RAM (512MB) than ideal to handle the load spike that will inevitably happen once the new release becomes available. I also wanted a self contained system that doesn’t require special handling with many cables hanging off of it (like SATA or USB external disks). I briefly considered the Tonido2 Plug, but between the slower CPU (800MHz) and the US plug, it seemed like a step backward just for the added tidiness of having an internal disk.


The requirements I had in mind needed to cover at least the following:
3) At least a 1.2GHz CPU
4) At least 512MB of RAM
5) Everything should be self contained (no externally attached components)


Very quickly the choice started to focus on various NAS appliances, but most of them had little to no community support for running custom Linux based firmware. The one exception to this is QNAP NAS devices which have rather good support from the Debian community; and where there is a procedure to get one Linux distribution to run, getting another to run is usually very straightforward. After a quick look through the specifications, I settled on the QNAP TS-421, which seems to be the highest spec ARM based model:

CPU: 2GHz ARMv5 Marvell Kirkwood (same as in the DreamPlug but 66% higher clock speed)
RAM: 1GB (twice as much as DreamPlug)
SATA: 4x 3.5″ SATA disk trays, based on the excellent Marvell 88SX7042 PCIe SATA controller
eSATA: 2x
Ethernet: 2x Gigabit (same as DreamPlug)
USB: 2x 2.0, 2x 3.0


At the time when I ordered the QNAP TS-421, it was listed as supporting 4TB drives – the largest air filled drives that were available at the time. I ordered 4x 4TB HGST drives because they are known to be more reliable than other brands. In the 10 days since then Toshiba announced 5TB drives, but these are not yet commercially available. I briefly considered the 6TB Helium filled Hitachi drives, but these are based on a new technology that has not been around for long enough for long term reliability trends to emerge – and besides, they were prohibitively expensive (£87/TB vs £29/TB for the 4TB model), and to top it all off, they are not available to buy.


Once the machine arrived, it was immediately obvious that the build quality is superb. One thing, however, bothered me immediately – it uses an external power brick, which seems like a hugely inconvenient oversight on an otherwise extremely well designed machine.

In order to make playing with alternative Linux installations possible, I needed to get serial console access. To do this you will need a 3.3V TTL serial cable, same as what is used on the Raspberry Pi. These are cheaply available from many sources. One thing I discovered the hard way after some trial and error is that you need to invert the RX and TX lines between the cable and the QNAP motherboard, i.e. RX on the cable needs to connect to TX on the motherboard, and vice versa. There is also no need to connect the VCC line (red) – leave it disconnected. My final goal was to get RedSleeve Linux running on this machine, the process for which is documented on the RedSleeve wiki so I will not go into it here.


One thing that becomes very obvious upon opening the QNAP TS-421 is that there is ample space inside it for a PSU, which made the design decision to use an external power brick all the more ill considered. So much so that I felt I had to do something about it. It turns out the standard power brick it ships with fits just fine inside the case. Here is what it looks like fitted.

QNAP TS-421 with internalized PSU

It is very securely attached using double sided foam tape. Make sure you make some kind of a gasket to fit between the PSU and the back of the case – this is in order to prevent upsetting the carefully designed airflow through the case. I used some 3mm thick expanded polyurethane which works very well for this purpose. The cable tie is there just for extra security and to tidy up the coiled up DC cable that goes back out of the case and into the motherboard’s power input port. This necessitated punching two 1 inch holes in the back of the case – one for the input power cable and one for the 12V DC output cable. I used a Q.Max 1 inch sheet metal hole punch to do this. There is an iris type grommet for the DC cable to prevent any potential damage arising from it rubbing on the metal casing.

QNAP TS-421 with cable holes punched through the back of the case

The finished modification looks reasonably tidy and is a vast improvement on a trailing power brick.

QNAP TS-421 running RedSleeve Linux

One other thing worth mentioning is that internalizing the PSU makes no measurable difference to internal temperatures with the case closed. In fact, if anything the PSU itself runs cooler than it does on the outside due to the cooling fan inside the case. The airflow inside the case is incredibly well designed, hence the reason why it is vital you use a gasket to seal the gap between the power input port on the PSU and the back of the case. To give you the idea of just how well the airflow is designed, with the case off, the HGST drives run at about 50-55C idle and 60-65C under load. With the case on they run at about 30C idle and 35C under full load (e.g. ZFS scrub or SMART self tests).

Virtualized Gaming: Nvidia Cards, Part 3: How to Modify 2xx – 4xx series GeForce into a Quadro

There has been a large amount of interest in the previous two articles in this series and many calls for a modifying guide. In this article I will explain the details of how to modify your Fermi based GeForce card into the corresponding Quadro card. Specifically, this covers the following:

GeForce Model     GPU      Quadro Model
GeForce GTS450    GF106    Quadro 2000
GeForce GTX470    GF100    Quadro 5000
GeForce GTX480    GF100    Quadro 6000

The Tesla (2xx/3xx) and Fermi (4xx) series of GPUs can be modified by modifying the BIOS. Earlier cards can also be modified, but the modification is slightly different to what is described in this article. There is no hardware modification required on any of these cards. The modification is performed by modifying what is known as the “straps” that configure the GPU at initialization time. The nouveau project (free open source nvidia driver implementation for Xorg) has reverse engineered and documented some of the straps, including the device ID locations. We can use this to change the device ID the card reports. This causes the driver to enable a different set of features that it wouldn’t normally expose on a gaming grade card, even though the hardware is perfectly capable of it (you are only supposed to have those features if you paid 4-8x more for what is essentially the same (and sometimes even inferior) card by buying a Quadro).

The main benefit of doing this modification is enabling the card to work in a virtual machine (e.g. Xen). If the driver recognizes a GeForce card, it will refuse to initialize the card from a guest domain. Change the card’s device ID into a corresponding Quadro, and it will work just fine. On the GF100 models, it will even enable the bidirectional asynchronous DMA engine which it wouldn’t normally expose on a GeForce card even though it is there (on GF100 based GeForce cards only a unidirectional DMA engine is exposed). This can potentially significantly improve the bandwidth between the main memory and GPU memory (although you probably won’t notice any difference in gaming – it has been proven time and again that the bandwidth between the host machine and the GPU is not a bottleneck for gaming workloads).

Another thing that this modification will enable is TCC mode. This is particularly of interest to users of Windows Vista and later because it avoids some of the graphics driver overheads by putting the card in a mode only used for number-crunching. Note: Although most Quadros have TCC mode available, you may want to look into modifying the card into a corresponding Tesla model if you are planning to use it purely for number crunching. You can use the same method described below, just find a Tesla based on the same GPU with equal or lower number of enabled shader processors, find its device ID in the list linked at the bottom of the article, and change the device IDs using the strap.

Before you even contemplate this, make sure you know what you are doing, and be aware that the instructions here come with no warranty. If you are not confident you know what you are doing, buy a pre-modified card from someone instead or get somebody who does know what they are doing to do it for you.

To do this, you will require the following:

  • NVFlash for Windows and/or NVFlash for DOS
    Note: You may need to use the DOS version – for some reason the Windows version didn’t work on some of my Fermi cards. If you use the DOS version, make sure you have a USB stick or other media set up to boot into DOS.
  • Hex editor. There are many available. I prefer to use various Linux utilities, but if you want to use Windows, HxD is a pretty good hex editor for that OS. It is free, but please consider making a small donation to the author if you use it regularly.
  • Spare Graphics card, in case you get it wrong. If you are new to this, your boot graphics card (the spare one, not the one you are planning to modify) should preferably not be an Nvidia one (to avoid potential embarrassment of flashing the wrong card). Skip this part at your peril.

On Fermi BIOS-es the strap area is 16 bytes long and it starts at file offset 0x58. Here is an example based on my PNY GTX480 card:
0000050: e972 2a00 de10 5f07 ff3f fc7f 0040 0000 .r*..._..?...@..
0000060: ffff f17f 0000 0280 7338 a5c7 e92d 44e9 ........s8...-D.

The very important thing to note here is that the byte order is little-endian. That means that in order to decode this easily, you should re-write the 16 strap bytes (from ff3f at offset 0x58 in the first row to 0280 in the second row) as:
7FFC 3FFF 0000 4000 7FF1 FFFF 8002 0000
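
If you would rather not do the byte swapping by hand, the same decoding can be sketched in a few lines of Python. This is only an illustration using the 16 bytes from my GTX480 dump above; the variable names are my own, and reading the bytes out of an actual ROM file is left out.

```python
import struct

# The 16 strap bytes at file offset 0x58, copied from the GTX480 dump above.
strap_bytes = bytes.fromhex("ff3ffc7f" "00400000" "fffff17f" "00000280")

# "<4I" means four little-endian unsigned 32-bit integers.
and0, or0, and1, or1 = struct.unpack("<4I", strap_bytes)

for name, value in (("AND-0", and0), ("OR-0", or0), ("AND-1", and1), ("OR-1", or1)):
    print(f"{name}: 0x{value:08X}")
```

The four values printed correspond to the four 32-bit words written out above.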

This represents two sets of straps, each containing an AND mask and an OR mask. The hardware level straps are AND-ed with the AND mask, and then OR-ed with the OR mask.
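
As a minimal sketch of that AND-then-OR rule: the hardware strap value used here is made up purely for illustration (I have not read the real one from the card), and the function name is my own.

```python
def apply_strap(hw_strap: int, and_mask: int, or_mask: int) -> int:
    """Return the effective strap: the hardware value AND-ed, then OR-ed."""
    return (hw_strap & and_mask) | or_mask

# Hypothetical hardware strap with every bit set, just to show what the masks do.
hw_strap = 0xFFFFFFFF
effective = apply_strap(hw_strap, and_mask=0x7FFC3FFF, or_mask=0x00004000)
print(f"0x{effective:08X}")
```

Any bit that is 0 in the AND mask is forced to 0, and any bit that is 1 in the OR mask is forced to 1, regardless of what the hardware strap says.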

The bits that control the device ID are 10-13 (ID bits 0-3) and 28 (bit 4). We can ignore the last 8 bytes of the strap since all the bits controlling the device ID are in the first 8 bytes.

This makes the layout of the strap bits we need to change a little more obvious:

Fxx4xxxx xxxxxxxx xx3210xx xxxxxxxx
   ^                ^^^^
   |                ||||-pci dev id[0]
   |                |||--pci dev id[1]
   |                ||---pci dev id[2]
   |                |----pci dev id[3]
   |---------------------pci dev id[4]
F - cannot be set, always fixed to 0

The device ID of the GTX480 is 0x06C0. In binary, that is:
0000 0110 1100 0000
We want to modify it into a Quadro 6000, which has the device ID 0x06D8. In binary that is:
0000 0110 1101 1000

The device ID differs only in the low 5 bits, which is good because we only have the low 5 bits available in the soft strap.

So we need to modify as follows
From:   0000 0110 1100 0000
To:     0000 0110 1101 1000
Change: xxxx xxxx xxx1 1xxx

We only need to change two of the strap bits from 0 to 1. We can do this by only adjusting the OR part of the strap.
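
The same comparison can be done with an XOR, which is handy when checking other GeForce/Quadro pairs. This is just a sketch of the arithmetic; the constant and variable names are my own.

```python
GTX480_ID = 0x06C0
QUADRO_6000_ID = 0x06D8
SOFT_STRAP_ID_MASK = 0x1F  # only the low 5 device ID bits are in the soft strap

diff = GTX480_ID ^ QUADRO_6000_ID      # bits that differ between the two IDs
bits_to_set = diff & QUADRO_6000_ID    # bits that go from 0 to 1
bits_to_clear = diff & GTX480_ID       # bits that go from 1 to 0
reachable = (diff & ~SOFT_STRAP_ID_MASK) == 0

print(f"differ: {diff:#06x} set: {bits_to_set:#06x} "
      f"clear: {bits_to_clear:#06x} reachable: {reachable}")
```

If `bits_to_clear` were non-zero, adjusting the OR mask alone would not be enough and the AND mask would have to change as well.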

It is easier to see what is going on if we represent this as follows:

ID Bit:   4                  32 10
Strap: -xxA xxxx xxxx xxxx xxAx xxxx xxxx xxxx
Old Strap:
AND-0: 7F        FC        3F        FF
       0111 1111 1111 1100 0011 1111 1111 1111
OR-0:  00        00        40        00
       0000 0000 0000 0000 0100 0000 0000 0000
New Strap:
AND-0: 7F        FC        3F        FF
       0111 1111 1111 1100 0011 1111 1111 1111
OR-0:  10        00        60        00
       0001 0000 0000 0000 0110 0000 0000 0000

Note that in the edit mask above, bit 31 is marked as “-”. Bit 31 is always 0 in both AND and OR strap masks.
Bits we must keep the same are marked with “x”. Bits we need to amend are marked with “A”.
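
To avoid mistakes in the binary to hex conversion, the new OR mask can also be computed. The sketch below assumes the bit layout described above (ID bits 0-3 in strap bits 10-13, ID bit 4 in strap bit 28) and only handles the case where ID bits need to be turned on; the function name is my own.

```python
def id_bits_to_strap_bits(id_bits: int) -> int:
    """Map the low 5 device ID bits onto their strap bit positions."""
    low = (id_bits & 0x0F) << 10          # ID bits 0-3 -> strap bits 10-13
    high = ((id_bits >> 4) & 0x01) << 28  # ID bit 4    -> strap bit 28
    return low | high

OLD_OR0 = 0x00004000
ID_BITS_TO_SET = 0x06C0 ^ 0x06D8  # GTX480 vs Quadro 6000: bits 3 and 4

new_or0 = OLD_OR0 | id_bits_to_strap_bits(ID_BITS_TO_SET)
print(f"0x{new_or0:08X}")
```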

So what we need to do is flash the edited strap to the card. We could do this directly in the BIOS, but this would require calculating the strap checksum, which is tedious. Instead we can use nvflash to take care of the strap rewrite for us, and it will take care of the checksum transparently.
The new strap is:
0x7FFC3FFF 0x10006000 0x7FF1FFFF 0x80020000
The second pair is unchanged from what we read from the BIOS above. Make sure you have ONLY changed the device ID bits and that your binary to hex conversion is correct – otherwise you stand a very good chance of bricking the card.

We flash this onto the card using:
nvflash --index=X --straps 0x7FFC3FFF 0x10006000 0x7FF1FFFF 0x00020000
1) The last OR strap is 0x00020000 even though the data in the BIOS reads as if it should be 0x80020000. You cannot set the high bit (the left-most one) to 1 in the OR strap (just like you cannot set it to 0 in the AND strap). Upon flashing nvflash will turn the high bit to 1 for you and what will end up in the BIOS will be 0x80020000 even though you set it to 0x00020000. This is rather unintuitive and poorly documented.
2) You will need to check what the index of the card you plan to flash is using nvflash -a, and replace X with the appropriate value.
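
A small helper can build the command line from the values as they appear in the BIOS and clear the high bit of the OR straps as per note 1. This is only a sketch that prints the command rather than running it; the function name is my own, and the index of 0 is a placeholder that you must replace with whatever nvflash -a reports for your card.

```python
def nvflash_strap_args(and0: int, or0: int, and1: int, or1: int, index: int) -> list:
    """Build the nvflash arguments, clearing bit 31 of both OR straps."""
    values = (and0, or0 & 0x7FFFFFFF, and1, or1 & 0x7FFFFFFF)
    return ["nvflash", f"--index={index}", "--straps"] + [f"0x{v:08X}" for v in values]

# Strap values as read from the BIOS, with the edited OR-0; index=0 is a placeholder.
cmd = nvflash_strap_args(0x7FFC3FFF, 0x10006000, 0x7FF1FFFF, 0x80020000, index=0)
print(" ".join(cmd))
```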

Here is an example (from my GTX480, directly corresponding to the pre-modification fragment above) of how the ROM differs after changing the strap:

0000050: e972 2a00 de10 5f07 ff3f fc7f 0060 0010 .r*..._..?...`..
0000060: ffff f17f 0000 0280 7338 a597 e92d 44e9 ........s8...-D.

The difference at byte 0x6B (c7 became 97) is the strap checksum that nvflash calculated for us.
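
A quick way to see exactly which bytes changed is to compare the two fragments byte by byte. The sketch below uses only the 32 bytes shown in the two dumps (file offsets 0x50 to 0x6F); comparing complete ROM files would work the same way.

```python
before = bytes.fromhex(
    "e9722a00de105f07ff3ffc7f00400000"
    "fffff17f000002807338a5c7e92d44e9"
)
after = bytes.fromhex(
    "e9722a00de105f07ff3ffc7f00600010"
    "fffff17f000002807338a597e92d44e9"
)
BASE = 0x50  # file offset of the first byte in the fragments

changes = [(BASE + i, old, new)
           for i, (old, new) in enumerate(zip(before, after)) if old != new]
for offset, old, new in changes:
    print(f"0x{offset:02X}: {old:02x} -> {new:02x}")

# True if the byte sum of the fragment is unchanged modulo 256.
same_sum = sum(before) % 256 == sum(after) % 256
print(same_sum)
```

The byte sum of the fragment stays the same modulo 256, which is consistent with a simple additive checksum, although that is my inference from these two dumps rather than documented behaviour.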

Reboot and your card should now get detected as a Quadro 6000, and you should be able to pass it through to your virtual machine without problems. I have used this extensively to enable me to pass my GeForce 4xx series cards to my Xen VMs for gaming. I will cover the details of virtualization use with Xen in a separate article. Note that I have had reports of cards modified using this method also working virtualized using VMware vDGA, so if this is your preferred hypervisor, you are in luck. Quadro 5000 and 6000 are also listed as supported for VMware vSGA virtualization, so that should work, too – if you have tried vSGA with a modified GeForce card, please post a comment with the details.

The same modification method described here should work for modifying any Fermi card into the equivalent Quadro card. Simply follow the same process. You may find this list of Nvidia GPU device IDs useful to establish what device ID you want to modify the card to. The GPU should match between the GeForce card and the Quadro/Tesla/Grid you are modifying to – so check which Nvidia card uses which GPU.

Many thanks to the nouveau project for reverse engineering and documenting the initialization straps, and all the people who have contributed to the effort.

In the next article I will cover modifying Kepler GPU based cards. They are quite different and require a different approach. There are also a number of pitfalls that can leave you chasing your tail for days trying to figure out why everything checks out but the modification doesn’t work (i.e. the card doesn’t function in a VM).

Virtualized Gaming: Nvidia Cards, Part 2: GeForce, Quadro, and GeForce Modified into a Quadro – Higher End Fermi Models

Following the success with QuadForce 2450 modification (GeForce GTS450 -> Quadro 2000), I went on to investigate whether the same modification will work on the GTX470 to turn it into a Quadro 5000 and on a GTX480 to turn it into a Quadro 6000. Modifying a GTX580 into a somewhat obscure Quadro 7000 was also undertaken.

Model             Core Configuration    Memory Channels    Memory
GeForce GTX470    448:56:40             5x                 1.25GB
GeForce GTX480    480:60:48             6x                 1.50GB
Quadro 5000       352:44:40             5x                 2.50GB
Quadro 6000       448:56:48             6x                 6.00GB

In all three cases, the modifications were successful, and they all worked as expected – features like VGA passthrough work on the 5000 and 6000 models and gaming performance is excellent, as you would expect – I can play Crysis at 3840×2400 in a virtual machine. Again, the extra GL functions aren’t there (if you compare the output of glxinfo between a real Quadro and a QuadForce, you will find a number of GL primitives missing), so some aspects of OpenGL performance are still crippled. PhysX support is also a little hit-and-miss. In a VM, on Windows 7 it seems to work on Quadro cards; on XP it appears to not be working. On bare metal on Windows XP it works. This appears to be due to the Quadro driver itself, rather than due to the cards not being genuine Quadros.

Finally, the GF100 based cards (GTX470/480) also get an extra feature enabled by the modification – second DMA channel. Normally there is a unidirectional DMA channel between the host and the card. Following the modification, the second DMA channel in the other direction is activated. This has a relatively moderate impact on gaming performance, but it can have a very large impact on performance of I/O bound number crunching applications since it increases the memory bandwidth between the card and the system memory (you can read and write to/from the GPU memory at the same time). Compare the CUDA-Z Memory report for the GTX470 before and after modifying it into a Quadro 5000 – GTX470 only has a unidirectional async memory engine, but after modifying it the engine becomes bidirectional:

GTX470 CUDA-Z Memory
QuadForce 5000 CUDA-Z Memory

The same happens on the GTX480 – its async engine also becomes bidirectional after modification.

Quadro 7000 is a little different from the other two. It doesn’t have dual DMA channels, and Nvidia don’t list it as MultiOS capable. The drivers do not do the necessary adjustments to make it work with VGA passthrough. That means that, unfortunately, there is little to gain from modifying a GTX580. Note, however, that the Quadro 7000 was never aimed at the virtualization market; it was only available as a part of the QuadroPlex 7000 product – an external GPU enclosure designed for driving multiple monitors for various visualisation work. Hence the lack of MultiOS support on it.

Here is how the QuadForce 5470 does in SPECviewperf (GTX470 = 100%):

QuadForce 5470 SPECviewperf

Compared to the QuadForce 2450, the performance improvements are more modest – the only real difference is observable in the lightwave benchmark.

Unfortunately, my QuadForce 6480 is currently in use, so I cannot get measurements from it, but since they are both based on the GF100 GPU, the results are expected to be very similar.

On the QuadForce 7580 there was no observed SPEC performance improvement.

I have since acquired a Kepler Based 4GB GTX680 and successfully modified it into Quadro K5000. Modifying it into a Grid K2 also works, but there don’t appear to be any obvious advantages from doing so at the moment (K5000 works fine for virtualization passthrough, even though it wasn’t listed as MultiOS last time I checked). This QuadForce K5680 is why my GTX470 became free for testing again. More on Quadrifying Keplers in the next article. I also have a GTX690 now (essentially two 680s on the same PCB), which will be replacing the QuadForce 6480, so this will also be written up in due time. Unfortunately, however, quadrifying Keplers in most cases requires some hardware as well as BIOS modifications. I will post more on all this soon, along with a tutorial on soft-modding.

Virtualized Gaming: Nvidia Cards, Part 1: GeForce, Quadro, and GeForce Modified into a Quadro

Recently I built a new system with the primary intention of running Linux the vast majority of the time and never having to stop what I am doing to reboot into Windows every time I wanted to play a game. That meant gaming in a VM, which in turn meant VGA passthrough. I am an Enterprise Linux 6 user, and Fedora is too bleeding edge for me. What I really wanted to run is KVM virtualization, but the support for VGA passthrough didn’t seem to work for me with EL6 packages, even after a selective update to much newer kernel, qemu and libvirt related packages. VMware ESX won’t work with PCI passthrough on my EVGA SR-2 motherboard because EVGA, in their infinite wisdom, decided to put all the PCIe slots behind Nvidia NF200 routers/bridges which don’t support PCIe ACS functionality, which ESX requires for PCI passthrough. That left me with Xen as the only remaining option. I now mostly have Xen working the way I want – not without issues, but I will cover virtualized gaming and Xen details in another article. For now, what matters is that Xen VGA passthrough currently only works with ATI cards and Nvidia Quadro (but not GeForce) cards.

ATI cards are not an option for me due to various driver bugs (e.g. handling monitors on which refresh rate is dependent on resolution due to bandwidth limitations), lack of features (no option to use anything but EDID modes, to the extent of completely ignoring monitor driver .inf files; the custom mode feature used to exist in the drivers (the documentation for it can still be found on the AMD website) but has been removed at some point) and most importantly, lack of multiple DL-DVI outputs on cards more recent than the Radeon HD4xxx series (Radeon HD5xxx and later cards only come with a single DL-DVI port – on those that come with a second DVI port, even though it physically looks like a DL, it only provides a single link).

Nvidia GeForce cards don’t work in a virtual machine, at least not without unmaintained patches that don’t work with all cards and guest operating systems.

That leaves Nvidia Quadro cards. Unfortunately, those are eyewateringly expensive. But, on paper, the spec lists the same GPUs used on GeForce and Quadro cards. This got me looking into what makes a Quadro a Quadro and a few days of research and a weekend of experimentation yielded some interesting and very useful results. While it looks like some features such as certain GL functions are disabled in the chips (probably by laser cutting), some features are purely down to the driver deciding whether to enable them or not. It turns out, making cards work in a VM is one of the driver-dependent features.

Phase 1: Verify That Quadro Cards Work in a VM When GeForce Don’t

Looking at the specification and feature list of Quadro cards, Quadro 2000, 4000, 5000 and 6000 models support the “MultiOS” feature, which is what Nvidia calls VGA passthrough. So, the first thing I did was acquire a “cheap” second hand Quadro 2000 on eBay. Cheap here being a relative term because a second hand Quadro costs between 3 and 8 times the amount the equivalent (and usually higher specification) GeForce card costs. The Quadro card proved to work flawlessly, but the Quadro 2000 is based on a GF106 chip with only 192 shaders, so gaming performance was unusable at 3840×2400 (I will let go of my T221 monitors when they are pried out of my cold, dead fingers). Gaming at 1920×1200 was just about bearable with some detail level reductions, but even so it was borderline.

Here is how the genuine Quadro 2000 shows up in GPU-Z and CUDA-Z:

Quadro 2000 GPU-Z
Quadro 2000 CUDA-Z Core
Quadro 2000 CUDA-Z Memory
Quadro 2000 CUDA-Z Performance

And here are the genuine Quadro 2000 SPECviewperf11 results:

Viewset Composite
catia-03 23.86
ensight-04 16.63
lightwave-01 43.12
maya-03 36.25
proe-05 7.07
sw-02 32.21
tcvis-02 18.82
snx-01 17.50

Phase 2: Get an Equivalent GeForce Card and Investigate What Makes a Quadro a Quadro

The next item on the acquisition list was a GeForce GTS450 card. On paper the spec for a GTS450 is identical to a Quadro 2000:
192 shaders
1GB of GDDR5
Note: There are some models that are different despite also being called GTS450. Specifically, there is an OEM model that only has 144 shaders, and there is a model with 192 shaders but with GDDR3 memory rather than GDDR5. The GDDR3 model may be more difficult to modify due to various differences, and the 144 shader model may not work properly as a Quadro 2000.

Armed with the information I dug out, I set out to modify the GTS450 into a QuadForce (a splice between a Quadro and a GeForce – and Gedro just doesn’t sound right). This was successful, and the card now detected as a Quadro 2000, and everything seemed to work accordingly. The VGA passthrough worked, and since the GTS450 is clocked significantly higher than the Quadro 2000, the gaming performance was improved to the point where 1920×1200 performance was quite livable with. What didn’t improve to Quadro levels is OpenGL performance of certain functions that appear to have been disabled on the GeForce GPUs. Consequently, SPECviewperf11 results are much lower than on a real Quadro 2000 card, but the GeForce GTS450 scores higher on every gaming test since games don’t use the missing functionality, and the GeForce card is clocked higher. It is unclear at the moment whether the extra GL functionality was disabled on the GPU die by laser cutting or whether it is disabled externally to the GPU, e.g. by different hardware strapping or pin shorting via the PCB components – more research into this will need to be done by someone more interested in those features than me. Since the stamped-on GPU markings are different between the GTS450 (GF106-250, checked against 3 completely different GDDR5 GTS450 cards) and the Quadro 2000 (GF106-875 on the one I have), it seems likely the extra GL functionality is laser cut out of the GPU.

Here is how the GTS450 modified to Quadro 2000 shows up in GPU-Z and CUDA-Z:
QuadForce 2000 GPU-Z
QuadForce 2000 CUDA-Z Core
QuadForce 2000 CUDA-Z Memory
QuadForce 2000 CUDA-Z Performance

CUDA-Z performance seems to scale with the clock speeds, so the faux-Quadro card wins.

Here are the SPECviewperf11 results for a GTS450 before and after modifying it into a Quadro 2000. As you can see, in this test those missing GL functions make a huge difference, but in some tests there is still a substantial improvement:

GeForce GTS450:

Viewset Composite
catia-03 3.33
ensight-04 20.67
lightwave-01 10.80
maya-03 5.38
proe-05 0.36
sw-02 6.75
tcvis-02 0.35
snx-01 2.37

QuadForce 2450:

Viewset Composite
catia-03 3.24
ensight-04 17.83
lightwave-01 10.72
maya-03 7.75
proe-05 0.37
sw-02 6.87
tcvis-02 0.35
snx-01 2.35

Here is the data in chart form (relative performance, real Quadro 2000 = 100%).

GTS450 vs. Quadro 2000

As you can see the real Quadro dominates in all tests except ensight-04 where it gets soundly beaten by the GeForce card. Modification does seem to improve some aspects of performance. In particular, Maya results seem to improve by a whopping 44% following the modification.
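
For anyone who wants to reproduce the relative figures, here is a short sketch that recomputes them from the composite scores listed above (stock GTS450, QuadForce 2450 and genuine Quadro 2000). The dictionary and variable names are my own.

```python
quadro_2000 = {"catia-03": 23.86, "ensight-04": 16.63, "lightwave-01": 43.12,
               "maya-03": 36.25, "proe-05": 7.07, "sw-02": 32.21,
               "tcvis-02": 18.82, "snx-01": 17.50}
gts450 = {"catia-03": 3.33, "ensight-04": 20.67, "lightwave-01": 10.80,
          "maya-03": 5.38, "proe-05": 0.36, "sw-02": 6.75,
          "tcvis-02": 0.35, "snx-01": 2.37}
quadforce_2450 = {"catia-03": 3.24, "ensight-04": 17.83, "lightwave-01": 10.72,
                  "maya-03": 7.75, "proe-05": 0.37, "sw-02": 6.87,
                  "tcvis-02": 0.35, "snx-01": 2.35}

# Relative performance with the real Quadro 2000 as 100%.
for viewset, reference in quadro_2000.items():
    stock = 100 * gts450[viewset] / reference
    modified = 100 * quadforce_2450[viewset] / reference
    print(f"{viewset:<13} GTS450 {stock:6.1f}%   QuadForce {modified:6.1f}%")

# Improvement of the modified card over the stock card in the Maya viewset.
maya_gain = 100 * (quadforce_2450["maya-03"] / gts450["maya-03"] - 1)
print(f"maya-03 gain after modification: {maya_gain:.0f}%")
```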

If you are only interested in support and VGA passthrough for virtual machines, modifying a GeForce card to a Quadro can be an extremely cost effective solution (especially if your budget wouldn’t stretch to a real Quadro card anyway). If you are only interested in performance of the kind measured by SPECviewperf, then depending on the applications you use, a real Quadro is still a better option in most cases.

Note: I am selling one of my Quadrified GTS450 cards. I bought several fully expecting to brick a few in the process of attempting to modify them, but the success rate was 100% so I now have more of them than I need.