Virtualized Gaming: Nvidia Cards, Part 3: How to Modify 2xx – 4xx series GeForce into a Quadro

There has been a large amount of interest in the previous two articles in this series and many calls for a modifying guide. In this article I will explain the details of how to modify your Fermi based GeForce card into the equivalent Quadro card. Specifically, the following:

GEFORCE MODEL     GPU      QUADRO MODEL
GeForce GTS450    GF106    Quadro 2000
GeForce GTX470    GF100    Quadro 5000
GeForce GTX480    GF100    Quadro 6000

The Tesla (2xx/3xx) and Fermi (4xx) series of GPUs can be modified by editing the BIOS. Earlier cards can also be modified, but the modification is slightly different from what is described in this article. There is no hardware modification required on any of these cards. The modification is performed by changing what are known as the “straps” that configure the GPU at initialization time. The nouveau project (a free, open source Nvidia driver implementation for Xorg) has reverse engineered and documented some of the straps, including the device ID locations. We can use this to change the device ID the card reports. This causes the driver to enable a set of features that it wouldn’t normally expose on a gaming grade card, even though the hardware is perfectly capable of them. You are only supposed to get those features if you paid 4-8x more for a Quadro, which is essentially the same (and sometimes even inferior) card.

The main benefit of doing this modification is enabling the card to work in a virtual machine (e.g. Xen). If the driver recognizes a GeForce card, it will refuse to initialize the card from a guest domain. Change the card’s device ID into a corresponding Quadro, and it will work just fine. On the GF100 models, it will even enable the bidirectional asynchronous DMA engine which it wouldn’t normally expose on a GeForce card even though it is there (on GF100 based GeForce cards only a unidirectional DMA engine is exposed). This can potentially significantly improve the bandwidth between the main memory and GPU memory (although you probably won’t notice any difference in gaming – it has been proven time and again that the bandwidth between the host machine and the GPU is not a bottleneck for gaming workloads).

Another thing that this modification will enable is TCC mode. This is particularly of interest to users of Windows Vista and later because it avoids some of the graphics driver overheads by putting the card in a mode only used for number crunching. Note: Although most Quadros have TCC mode available, you may want to look into modifying the card into a corresponding Tesla model if you are planning to use it purely for number crunching. You can use the same method described below: find a Tesla based on the same GPU with an equal or lower number of enabled shader processors, find its device ID in the list linked at the bottom of the article, and change the device ID using the strap.

Before you even contemplate this, make sure you know what you are doing, and understand that the instructions here come with no warranty. If you are not confident you know what you are doing, buy a pre-modified card from someone instead, or get somebody who does know what they are doing to do it for you.

To do this, you will require the following:

  • NVFlash for Windows and/or NVFlash for DOS
    Note: You may need to use the DOS version – for some reason the Windows version didn’t work on some of my Fermi cards. If you use the DOS version, make sure you have a USB stick or other media set up to boot into DOS.
  • Hex editor. There are many available. I prefer to use various Linux utilities, but if you want to use Windows, HxD is a pretty good hex editor for that OS. It is free, but please consider making a small donation to the author if you use it regularly.
  • Spare Graphics card, in case you get it wrong. If you are new to this, your boot graphics card (the spare one, not the one you are planning to modify) should preferably not be an Nvidia one (to avoid potential embarrassment of flashing the wrong card). Skip this part at your peril.

On Fermi BIOS-es the strap area is 16 bytes long and it starts at file offset 0x58. Here is an example based on my PNY GTX480 card:
0000050: e972 2a00 de10 5f07 ff3f fc7f 0040 0000 .r*..._..?...@..
0000060: ffff f17f 0000 0280 7338 a5c7 e92d 44e9 ........s8...-D.

The very important thing to note here is that the byte order is little-endian. That means that in order to decode this easily, you should rewrite the 16 strap bytes (file offsets 0x58 to 0x67 in the dump above) as four 32-bit words:
7FFC 3FFF 0000 4000 7FF1 FFFF 8002 0000
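
The same byte swap can be done programmatically. The following is a minimal Python sketch, assuming the two dump lines above are pasted in as a hex string; the variable names are made up for illustration and a real script would read the bytes from a saved ROM file instead.

```python
import struct

# Bytes 0x50-0x6F of the ROM, copied from the hex dump above.
rom_fragment = bytes.fromhex(
    "e9722a00de105f07ff3ffc7f00400000"
    "fffff17f000002807338a5c7e92d44e9"
)
FRAGMENT_BASE = 0x50  # file offset of the first byte in rom_fragment
STRAP_OFFSET = 0x58   # start of the 16-byte strap area on Fermi BIOSes

start = STRAP_OFFSET - FRAGMENT_BASE
# "<4I" means four little-endian unsigned 32-bit integers.
and0, or0, and1, or1 = struct.unpack("<4I", rom_fragment[start:start + 16])

for name, value in (("AND-0", and0), ("OR-0", or0), ("AND-1", and1), ("OR-1", or1)):
    print(f"{name}: 0x{value:08X}")
```

The four printed words should match the rewritten values above, in the same order.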

This represents two sets of straps, each containing an AND mask and an OR mask. The hardware level straps are AND-ed with the AND mask, and then OR-ed with the OR mask.
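
As a small illustration of the AND-then-OR rule, here is a Python sketch using the first mask pair from my card. The two hardware strap values are hypothetical, picked only to show the extremes; I have not read the real hardware straps.

```python
def apply_soft_strap(hw_strap: int, and_mask: int, or_mask: int) -> int:
    """Effective strap value: (hardware strap AND and_mask) OR or_mask."""
    return (hw_strap & and_mask) | or_mask

AND0, OR0 = 0x7FFC3FFF, 0x00004000  # first mask pair from the dump above

# Hypothetical hardware strap values (all zeros and all ones).
for hw in (0x00000000, 0xFFFFFFFF):
    print(f"hw=0x{hw:08X} -> 0x{apply_soft_strap(hw, AND0, OR0):08X}")
```

A bit that is 0 in the AND mask and 1 in the OR mask (bit 14 here) comes out as 1 whatever the hardware says, while a bit that is 1 in the AND mask and 0 in the OR mask simply passes the hardware value through.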

The bits that control the device ID are 10-13 (ID bits 0-3) and 28 (ID bit 4). We can ignore the last 8 bytes of the strap since all the bits controlling the device ID are in the first 8 bytes.

This makes the layout of the strap bits we need to change a little more obvious:

Fxx4xxxx xxxxxxxx xx3210xx xxxxxxxx
   ^                ^^^^
   |                ||||-pci dev id[0]
   |                |||--pci dev id[1]
   |                ||---pci dev id[2]
   |                |----pci dev id[3]
   |---------------------pci dev id[4]
F - cannot be set, always fixed to 0
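
The bit layout can also be written down as a lookup table. This is a Python sketch of that mapping only; the function names are made up for illustration and nothing here touches the card.

```python
# Strap bit position for each of the low five device ID bits (from the diagram).
ID_BIT_TO_STRAP_BIT = {0: 10, 1: 11, 2: 12, 3: 13, 4: 28}

def id_bits_to_strap(device_id: int) -> int:
    """Place the low 5 bits of a PCI device ID at their strap bit positions."""
    value = 0
    for id_bit, strap_bit in ID_BIT_TO_STRAP_BIT.items():
        if device_id & (1 << id_bit):
            value |= 1 << strap_bit
    return value

def strap_to_id_bits(strap: int) -> int:
    """Inverse mapping: collect the five device ID bits out of a strap word."""
    value = 0
    for id_bit, strap_bit in ID_BIT_TO_STRAP_BIT.items():
        if strap & (1 << strap_bit):
            value |= 1 << id_bit
    return value

print(f"0x{id_bits_to_strap(0x1F):08X}")  # all five ID bits set
```

With all five ID bits set, the result has strap bits 10-13 and 28 set and nothing else, which is a quick way to check the table against the diagram.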

The device ID of the GTX480 is 0x06C0. In binary, that is:
0000 0110 1100 0000
We want to modify it into a Quadro 6000, which has the device ID 0x06D8. In binary that is:
0000 0110 1101 1000

The device ID differs only in the low 5 bits, which is good because we only have the low 5 bits available in the soft strap.

So we need to modify the device ID as follows:
From:   0000 0110 1100 0000
To:     0000 0110 1101 1000
Change: xxxx xxxx xxx1 1xxx

We only need to change two of the strap bits from 0 to 1. We can do this by only adjusting the OR part of the strap.
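
The comparison above is just an XOR of the two device IDs. A short Python sketch, using the two IDs quoted in this article:

```python
GTX480_ID = 0x06C0
QUADRO_6000_ID = 0x06D8

diff = GTX480_ID ^ QUADRO_6000_ID  # bits that differ between the two IDs
changed_bits = [bit for bit in range(16) if diff & (1 << bit)]

# The soft strap only reaches ID bits 0-4, so nothing above bit 4 may differ.
fits_in_strap = (diff & ~0x1F) == 0
# True if every differing bit goes from 0 to 1 (only the OR mask is needed).
only_sets_bits = (diff & GTX480_ID) == 0

print(f"diff=0x{diff:04X} changed bits={changed_bits}")
print(f"fits in strap: {fits_in_strap}, OR mask alone is enough: {only_sets_bits}")
```

If you are converting a different pair of cards, the two boolean checks tell you up front whether the target ID is reachable and whether the AND mask would also need touching.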

It is easier to see what is going on if we represent this as follows:

ID Bit:   4                  32 10
Strap: -xxA xxxx xxxx xxxx xxAx xxxx xxxx xxxx
Old Strap:
AND-0: 7F        FC        3F        FF
       0111 1111 1111 1100 0011 1111 1111 1111
OR-0:  00        00        40        00
       0000 0000 0000 0000 0100 0000 0000 0000
New Strap:
AND-0: 7F        FC        3F        FF
       0111 1111 1111 1100 0011 1111 1111 1111
OR-0:  10        00        60        00
       0001 0000 0000 0000 0110 0000 0000 0000

Note that in the edit mask above, bit 31 is marked as “-”. Bit 31 is always 0 in both AND and OR strap masks.
Bits we must keep the same are marked with “x”. Bits we need to amend are marked with “A”.

So what we need to do is flash the edited strap to the card. We could do this directly in the BIOS, but this would require calculating the strap checksum, which is tedious. Instead we can use nvflash to take care of the strap rewrite for us, and it will take care of the checksum transparently.
The new strap is:
0x7FFC3FFF 0x10006000 0x7FF1FFFF 0x80020000
The second pair is unchanged from what we read from the BIOS above. Make sure you have ONLY changed the device ID bits and that your binary to hex conversion is correct, otherwise you stand a very good chance of bricking the card.
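
To reduce the risk of a conversion slip, the new first mask pair can be computed rather than worked out by hand. This Python sketch is only an illustration: the function name is made up, and the handling of ID bits that go from 1 to 0 (clearing them in the AND mask) is my assumption based on how the AND/OR masks are described above. The GTX480 example only needs the 0 to 1 case.

```python
ID_BIT_TO_STRAP_BIT = {0: 10, 1: 11, 2: 12, 3: 13, 4: 28}

def retarget_strap0(and0: int, or0: int, old_id: int, new_id: int):
    """Return (and0, or0) adjusted so that old_id's low 5 bits become new_id's."""
    if (old_id ^ new_id) & ~0x1F:
        raise ValueError("device IDs differ outside the low 5 bits")
    for id_bit, strap_bit in ID_BIT_TO_STRAP_BIT.items():
        mask = 1 << strap_bit
        old_bit = bool(old_id & (1 << id_bit))
        new_bit = bool(new_id & (1 << id_bit))
        if new_bit and not old_bit:
            or0 |= mask        # force the bit to 1 via the OR mask
        elif old_bit and not new_bit:
            and0 &= ~mask      # assumption: force the bit to 0 via the AND mask
            or0 &= ~mask
    return and0, or0

new_and0, new_or0 = retarget_strap0(0x7FFC3FFF, 0x00004000, 0x06C0, 0x06D8)
print(f"0x{new_and0:08X} 0x{new_or0:08X}")
```

Bits that are the same in both IDs are left exactly as they were, which matches the rule of changing only the device ID bits.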

We flash this onto the card using:
nvflash --index=X --straps 0x7FFC3FFF 0x10006000 0x7FF1FFFF 0x00020000
Note:
1) The last OR strap is 0x00020000 even though the data in the BIOS reads as if it should be 0x80020000. You cannot set the high bit (the left-most one) to 1 in the OR strap (just like you cannot set it to 0 in the AND strap). Upon flashing, nvflash will turn the high bit to 1 for you, and what ends up in the BIOS will be 0x80020000 even though you passed 0x00020000. This is rather unintuitive and poorly documented.
2) You will need to check the index of the card you plan to flash using nvflash -a, and replace X with the appropriate value.
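
The two notes can be folded into a small helper that formats the command line for you. This is a Python sketch that only builds and prints the string; it does not run nvflash. The helper name is made up, and the index 0 in the example is a placeholder for whatever nvflash -a reports for your card.

```python
HIGH_BIT = 0x80000000

def build_nvflash_command(index: int, and0: int, or0: int, and1: int, or1: int) -> str:
    """Format the strap command, clearing bit 31 of both OR masks (see note 1)."""
    or0 &= ~HIGH_BIT
    or1 &= ~HIGH_BIT
    words = " ".join(f"0x{w:08X}" for w in (and0, or0, and1, or1))
    return f"nvflash --index={index} --straps {words}"

# Values as they should appear in the BIOS; index 0 is a placeholder.
print(build_nvflash_command(0, 0x7FFC3FFF, 0x10006000, 0x7FF1FFFF, 0x80020000))
```

Passing the values as they appear in the BIOS and letting the helper strip the high bit avoids having to remember the quirk each time.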

Here is an example (from my GTX480, directly corresponding to the pre-modification fragment above) of how the ROM differs after changing the strap:

0000050: e972 2a00 de10 5f07 ff3f fc7f 0060 0010 .r*..._..?...`..
0000060: ffff f17f 0000 0280 7338 a597 e92d 44e9 ........s8...-D.

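Apart from the two OR strap bytes we changed (offsets 0x5D and 0x5F), the only difference is at offset 0x6B (c7 became 97). That is the strap checksum that nvflash calculated for us.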
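
If you save the ROM before and after flashing, the two dumps can be compared mechanically. The Python sketch below does this for just the two fragments quoted in this article. The final check, that the byte sum of the fragment is unchanged modulo 256, is only an observation about this one example and not a statement of how nvflash computes the checksum.

```python
BASE = 0x50  # file offset of the first byte in each fragment

before = bytes.fromhex(
    "e9722a00de105f07ff3ffc7f00400000"
    "fffff17f000002807338a5c7e92d44e9"
)
after = bytes.fromhex(
    "e9722a00de105f07ff3ffc7f00600010"
    "fffff17f000002807338a597e92d44e9"
)

# Collect (offset, old byte, new byte) for every position that differs.
changes = [
    (BASE + i, old, new)
    for i, (old, new) in enumerate(zip(before, after))
    if old != new
]
for offset, old, new in changes:
    print(f"0x{offset:02X}: {old:02x} -> {new:02x}")

# Observation only: the byte sum of the fragment is preserved modulo 256.
same_sum = sum(before) % 256 == sum(after) % 256
print("byte sum preserved mod 256:", same_sum)
```

Any differing offset outside the strap area and its checksum would be a sign that something other than the strap was changed.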

Reboot and your card should now get detected as a Quadro 6000, and you should be able to pass it through to your virtual machine without problems. I have used this extensively to enable me to pass my GeForce 4xx series cards to my Xen VMs for gaming. I will cover the details of virtualization use with Xen in a separate article. Note that I have had reports of cards modified using this method also working virtualized using VMware vDGA, so if this is your preferred hypervisor, you are in luck. Quadro 5000 and 6000 are also listed as supported for VMware vSGA virtualization, so that should work, too – if you have tried vSGA with a modified GeForce card, please post a comment with the details.

The same modification method described here should work for modifying any Fermi card into the equivalent Quadro card. Simply follow the same process. You may find this list of Nvidia GPU device IDs useful to establish what device ID you want to modify the card to. The GPU should match between the GeForce card and the Quadro/Tesla/Grid you are modifying to, so check which Nvidia card uses which GPU.

Many thanks to the nouveau project for reverse engineering and documenting the initialization straps, and all the people who have contributed to the effort.

In the next article I will cover modifying Kepler GPU based cards. They are quite different and require a different approach. There are also a number of pitfalls that can leave you chasing your tail for days trying to figure out why everything checks out but the modification doesn’t work (i.e. the card doesn’t function in a VM).