Flash Module Benchmark Collection: SD Cards, CF Cards, USB Sticks

Having spent a considerable amount of time, effort and ultimately money trying to find decently performing SD, CF and USB flash modules, I feel I should make life easier for other people with the same requirements by publishing my findings – especially since I have been unable to find a reasonably comprehensive data source with similar information.

Unfortunately, virtually all SD/microSD (referred to as uSD from now on), CF and USB flash modules have truly atrocious performance for use as normal disks (e.g. when running the OS from them on a small, low power or embedded device), regardless of what their advertised performance may be. The performance problem is specifically related to their appalling random-write performance, so this is the figure that you should be specifically paying attention to in the tables below.

As you will see, the sequential read and write performance of flash modules is generally quite good, as is random-read performance. But on their own these are largely irrelevant to overall performance you will observe when using the card to run the operating system from, if the random-write performance is below a certain level. And yes, your system will do several MB of writing to the disk just by booting up, before you even log in, so don’t think that it’s all about reads and that writes are irrelevant.

For comparison, a typical cheap 5400rpm laptop disk can achieve around 90 IOPS on both random reads and random writes with a typical (4KB) block size – that is only about 360KB/s of throughput. This is an important figure to bear in mind purely to be able to see just how appalling the random write performance of most removable flash media is.

All media was primed with two passes of:

 dd if=/dev/urandom of=/dev/$device bs=1M oflag=direct

in order to simulate long term use and ensure that the performance figures reasonably accurately reflect what you might expect after the device has been in use for some time.

There are two sets of results:

1) Linear read/write test performed using:

dd if=/dev/$device of=/dev/null    iflag=direct
dd if=/dev/zero    of=/dev/$device oflag=direct

The linear read-write test script I use can be downloaded here.
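In case the download link fails, the script amounts to a loop like the following – a simplified sketch rather than the exact script ($device is a placeholder, and the write pass is destructive):

for bs in 4096 8192 16384 32768 65536 131072 262144 524288 1048576; do
    count=$((67108864 / bs))   # 64MB of I/O at each block size
    echo "=== block size: $bs bytes ==="
    dd if=/dev/$device of=/dev/null    bs=$bs count=$count iflag=direct 2>&1 | tail -n1
    dd if=/dev/zero    of=/dev/$device bs=$bs count=$count oflag=direct 2>&1 | tail -n1
done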

2) Random read/write test performed using:

iozone -i 0 -i 2 -I -r 4K -s 512m -o -O -+r -+D -f /path/to/file

(-i 0 selects the write/rewrite test and -i 2 the random read/write test; -I uses O_DIRECT; -r 4K sets the record size; -s 512m the file size; -o makes writes synchronous; -O reports results in operations per second; -+r and -+D enable O_RSYNC and O_DSYNC respectively.)

In all cases, the test size was 512MB. Partitions are aligned to 2MB boundaries. The file system is ext4 with a 4KB block size (-b 4096) and 16-block (64KB) stripe-width (-E stride=1,stripe-width=16), no journal (-O ^has_journal), mounted without access time logging (-o noatime). The partition used for the tests starts at half of the card’s capacity, e.g. on a 16GB card the test partition spans the space from 8GB up to the end. This is done in order to nullify the effect of some cards having faster flash at the front of the card. The sketch below shows the complete setup.
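For reproducibility, here is a minimal sketch of the full procedure for a hypothetical 16GB card (the device name and the 7630MiB half-way offset are placeholders – adjust them to your card; every step is destructive):

DEV=/dev/sdX                                  # placeholder – substitute your device

# Prime with two passes of random data to simulate long-term use.
dd if=/dev/urandom of=$DEV bs=1M oflag=direct
dd if=/dev/urandom of=$DEV bs=1M oflag=direct

# Dummy partition over the front half, test partition over the second half;
# MiB units keep the boundaries on 2MB multiples.
parted -s $DEV mklabel msdos
parted -s $DEV unit MiB mkpart primary 2 7630
parted -s $DEV unit MiB mkpart primary 7630 100%

# ext4 exactly as described above, mounted without access time logging.
mkfs.ext4 -b 4096 -E stride=1,stripe-width=16 -O ^has_journal ${DEV}2
mount -o noatime ${DEV}2 /mnt

# Random read/write test: 512MB file, 4KB records, O_DIRECT + O_SYNC, OPS output.
iozone -i 0 -i 2 -I -r 4K -s 512m -o -O -+r -+D -f /mnt/testfile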

The data here covers only the first modules I have tested and will be extensively updated as and when I test additional modules. Unfortunately, a single module can take over 24 hours to complete testing if its performance is poor (e.g. 1 IOPS) – and most of them are that bad, even those made by reputable manufacturers.

The dd linear test is probably more meaningful if you intend to use the flash card in a device that only ever performs large, sequential writes (e.g. a digital camera). For everything else, however, the dd figures are meaningless and you should instead be paying attention to the iozone results, particularly the random-write (r-w). Good random write performance also usually indicates a better flash controller, which means better wear leveling and better longevity of the card, so all other things being similar, the card with faster random-write performance is the one to get.

Due to WordPress being a little too rigid in its templates to allow for wide tables, you can see the SD / CF / USB benchmark data here. This table will be updated a lot, so check back often.

 

39 thoughts on “Flash Module Benchmark Collection: SD Cards, CF Cards, USB Sticks”

    • JFFS2 is ancient – the whole block device must be scanned at mount time (slow) – and it is intended for use on raw NAND rather than on a normal block device that (in theory) does its own wear leveling.

      LogFS is also designed for raw NAND.

      For raw NAND devices, UBIFS is probably the best choice at the moment, but either way, it is not relevant to normal flash modules such as SD, CF or USB media.
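      For completeness, a rough sketch of what that looks like with mtd-utils (the MTD device number and volume name here are illustrative):

      ubiattach /dev/ubi_ctrl -m 0      # attach UBI to raw NAND device mtd0
      ubimkvol /dev/ubi0 -N rootfs -m   # create a volume spanning all free space
      mount -t ubifs ubi0:rootfs /mnt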

      NilFS2 could be used, and it does make a great improvement to random-write performance by virtue of all of its writes being sequential, but it is not without its problems. It is impossible to tell how much free space you really have on the device, and its garbage collection method can actually cause significantly increased flash wear (unless the underlying flash media does no wear leveling of its own at all, which is unlikely).

  1. Some googling reveals excellent results on Sandisk Class 4 cards. People are supposedly getting on the order of ~250 random write IOPS! Interestingly, “faster” class 10 models are actually slower in random writes.

      • I think the results there are misleading because all the tests were done with relatively small test sizes. A lot of cards, particularly SanDisk and Pretec, seem to have the ability to cheat the benchmarks (including iozone) with smaller test sizes. What “smaller” means in this case varies – sometimes < 16MB, sometimes as much as 128MB. This is one of the reasons why I am running my own tests with 512MB test sizes, and if I find reason to suspect that a card is still managing to cheat its way to inflated figures I will up the test size.

        I suspect that some cards might also be specifically optimized for the Crystal Disk Mark test pattern since that is what most people seem to use, but this article seems to have a more comprehensive and accurate set of results: http://www.tomshardware.com/charts/2011-sd-cards/CrystalDiskMark-3.0-x64,2719.html

        Unfortunately, it also shows the SanDisk Class 10 Extreme SD card to have about 20 IOPS, which undermines the credibility of the results – I have one of those and the performance is more like 3 IOPS. It is possible that they didn't overwrite the cards a couple of times with random data to ensure that the performance figures reflect the cards' performance after longer term use.

        Bottom line – if the data I felt I could trust already existed elsewhere I wouldn't be bothering with this bit of research of my own.

  2. Good news! :D

    Some guy benchmarked this beauty in his phone. We’re talking about at least 120 random write IOPS on a 64GB microSDXC for ~£80. Sounds like a reasonable price (though it comes dangerously close to the price of an AC100 ;-))

    • The test appears to be only 50MB in size, which allows the card to cheat. I’d be interested to see what his results are with 512MB test size. I suspect it’ll be in low single figures.

      And regarding the cost – yes, it is getting silly. Proper SATA SSDs are actually cheaper per GB nowadays than the (supposedly) more advanced SD cards.

    • Yes, if I can find some uSD cards that achieve a level of performance that wouldn’t be painful to use.

      There are also issues surrounding 2-disk RAID0 optimization WRT making sure that block size, chunk size and block group size all align optimally. You can read through the article on disk and file system optimization here to get the gist of what I’m talking about.

      For example, to ensure that you don’t write more data than you have to on flash, you want to make sure that chunk size = block size. But block group size is only adjustable in increments of 8 blocks, and since you only have 2 disks, that means that the only way this will align is if you have chunk size = 8 blocks. The downside is that for a 1-block write you will still have to write 8 blocks, so performance and longevity with writes smaller than 8 blocks will suffer, but on the plus side you can make sure that the superblocks are spread across both disks rather than just one. The 8-block chunks can be mitigated to some extent by using 1KB file system blocks instead of 4KB ones, but that means more metadata, which means more bookkeeping-related overheads. (See the sketch below.)

      There is unfortunately no ideal solution to the problem other than adding a 3rd disk, which in this case isn’t possible.
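      To make the alignment concrete, a minimal sketch assuming 4KB file system blocks, so that an 8-block chunk is 32KB (device names are placeholders):

      # chunk of 32KB = 8 x 4KB blocks; stride = blocks per chunk, stripe-width = stride x 2 disks
      mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=32 /dev/sda1 /dev/sdb1
      mkfs.ext4 -b 4096 -E stride=8,stripe-width=16 -O ^has_journal /dev/md0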

  3. Gordan,

    Here are the results of the SanDisk Extreme Pro (8GiB, rated 95MB/s) on my AC100. I used Ubuntu 12.04 installed on the internal drive to test this. I do not know why the sequential speed is so slow on the AC100, because using the same test on a different PC it seems limited by the USB card reader (~20MB/s).

    Sequential results:
    Block: 4K 8K 16K 32K 64K 128K 256K 512K 1M
    Write: 1.7 MB/s 3.3 MB/s 5.6 MB/s 7.0 MB/s 8.1 MB/s 8.7 MB/s 8.9 MB/s 9.2 MB/s 9.6 MB/s
    Read: 3.3 MB/s 4.8 MB/s 6.0 MB/s 7.0 MB/s 7.6 MB/s 8.0 MB/s 8.1 MB/s 8.3 MB/s 8.5 MB/s

    Random results, however, are very good:
    KB      reclen  write  rewrite  random-read  random-write
    524288  4       162    606      1172         133

    • Thanks, I added the results to the tables. Seems like there is a new winner among the SD cards.

      The read speed does indeed seem slow, especially compared to the writes. No idea why that might be the case. I haven’t observed that artifact before. The only thing that comes to mind that is different is that this is a UHS-I card (all the cards I tested are lower spec).

      Remember to re-format the card with -E discard option before you install the OS onto it. :)
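      Something along these lines (device name is a placeholder):

      mkfs.ext4 -b 4096 -E stride=1,stripe-width=16,discard -O ^has_journal /dev/sdX2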

  4. I have been reading your articles – that cooling solution along with the clock speed would probably amaze Toshiba itself :)

    I have to ask you though about the screen upgrade – does it make watching movies enjoyable? I feel that the regular screen has far too low quality (not just resolution) to do anything on it other than web browsing.

    PS: I wish you made the AC200, if it’s ever made, not “who should not be named”.

    • Thank you. :)
      The cooling solution is nothing special, it’s just a matter of doing what you can given the minimal space that is available.
      I have to say I was quite shocked by the overclockability of the Tegra2 – my experience with other Nvidia chips showed them to come pre-overclocked past stable limits from the factory.
      Screen-wise, I don’t think there is anything wrong with the quality of the standard display panel. I don’t really use my AC100 for watching videos, but what matters to me is screen resolution. I find 1280×720 to be just about usable. 1024×600 isn’t enough for anything.

  5. I suggest using a faster filesystem first of all.

    I made a benchmark of all the Linux-supported filesystems at http://girlyngeek.blogspot.com.es/2011/04/ultimate-linux-filesystems-benchmark.html and you can see that while extX filesystems are good on sequential accesses, on random accesses they’re badly behind every other filesystem (Macintosh’s, Windows’, Irix’s, Reiser’s, etc).

    I suggest you use XFS for your tests (using NTFS or HFS on Linux is unrealistic, because they’re not fully supported), as btrfs is still evolving and unfinished, and ReiserFS is abandoned.

    • Indeed, I was actually planning to do some file system benchmarking on random-write constrained devices such as the flash media referred to in this article, but haven’t gotten around to it yet. The benchmark in this article is purely intended to assess the raw performance of the cards. For the iozone test applied, the performance is not file system dependent (the performance figures are indistinguishable between using the file system and the raw device).

      When I do the file system benchmarks I will probably focus on the likes of zfs and nilfs, as they show the most promise for slow-random-write media.

  6. Is this the correct way to get the 2 MB boundaries:
    sudo fdisk -S 32 -H 64 /dev/sda
    Units = cylinders of 4096 * 512 = 2097152 bytes
    Making the filesystem is clearer:
    sudo mkfs.ext4 -b 4096 -E stride=1,stripe-width=16 -O ^has_journal /dev/sdX2

    You should make people aware that the dd commands will erase the device.
    I’ve got a USB stick (http://www.sandisk.com/products/usb/drives/extreme/ ) which seems to give good results on random access. Even better on a USB3-PCIe card. Older (>1y) sticks I have to abort after 2 hours.
    What is the formatting here? HTML?

      • I meant code or URL formatting …

        I mounted like this btw:
        sudo mount -o noatime /dev/sda2 /mnt
        Which resulted in:
        rw,noatime,user_xattr,barrier=1,stripe=16
        I asked before I had tested, and posted wrongly. Here are the results for the SanDisk Extreme USB 3.0 (16G, up to 190 MB/s read, 55 MB/s write) on the USB2 port:
        Block: 4K 8K 16K 32K 64K 128K 256K 512K 1M
        Write: 7.2 MB/s 12.1 MB/s 17.5 MB/s 20.7 MB/s 23.9 MB/s 10.6 MB/s 24.0 MB/s 23.9 MB/s 24.0 MB/s
        Read: 7.5 MB/s 11.9 MB/s 17.4 MB/s 20.8 MB/s 23.3 MB/s 23.4 MB/s 24.4 MB/s 25.0 MB/s 25.5 MB/s

        KB      reclen  write  rewrite  random-read  random-write
        524288  4       699    1496     1447         558

        On the USB3 port (expansion card PCI-e 1.0a – 705 MB/s) speed and random reads are better:
        Block: 4K 8K 16K 32K 64K 128K 256K 512K 1M
        Write: 8.9 MB/s 41.4 MB/s 52.5 MB/s 57.0 MB/s 57.1 MB/s 11.8 MB/s 26.1 MB/s 50.1 MB/s 56.2 MB/s
        Read: 20.1 MB/s 33.5 MB/s 54.2 MB/s 79.3 MB/s 109 MB/s 106 MB/s 127 MB/s 140 MB/s 141 MB/s

        KB      reclen  write  rewrite  random-read  random-write
        524288  4       943    2451     2612         566

        Both have the drop at writing 128K. Do you know what that means?

        Will test some µSD cards with a USB adapter. But on the USB2 port, right?

        • I usually do the dd tests on the raw device, to avoid any chance of the FS itself skewing the figures (e.g. nilfs2 will convert all writes into linear writes, which utterly cheats the test).

          No iozone results for random read and random write?

          I have noticed the same thing regarding a throughput dip at 128KB on some devices. If I had to guess, it could be that the device has 64KB of cache or buffers of some description. I don’t really know for sure. The dip you are seeing is much bigger than I’d seen before, though.

      • “No iozone results for random read and random write?”
        They are right there.
        I aborted iozone with two µSDs (unlabeled class 4, Transcend class 10) after two hours. The Transcend seemed to be broken (sfdisk: ERROR: sector 0 does not have an msdos signature) and I could only revive it with Windows.

        • It is not unheard of for the slow flash media to take 1-2 days to complete that test. As is evident from the test results, most flash modules have atrocious random write performance.

  7. Great tests – I’ve been comparing many approaches to flash as well, and you are right on the mark.
    FTL devices are not very good, and if you think about it for a minute you’ll understand why they never can be.
    The approach of putting an opaque layer between the user and the drive is not going to be helped by extra instructions such as TRIM. They solve only one problem and require modification of FS drivers!
    Why not just make all FTL firmware offer a standard DIRECT access?
    Then we could all use proper solutions, like flash filesystems.

    • Not all of the devices tested are bad – the SuperTalent USB stick is very good, but it is expensive, and all modern SATA SSDs offer excellent performance. It is the generic cheap USB sticks, CF and SD cards that perform extremely poorly, which is disappointing. Proper SSDs show that good performance is easily achievable, yet even expensive, high-end SD and CF cards perform extremely poorly.

      • I was talking from a more theoretical standpoint. When hiding layers from software (filesystems) it’s easy to get into the situation where the right hand doesn’t know what the left hand is doing.
        (reduced control, performance, functionality, and strategies for bad blocks that can result in unexpected behaviour)

        For a real-life performance example, you could compare the internal memories of the Nokia N900. The memory without an FTL is noticeably faster. So much faster that it is recommended to reorganise the files you use most often onto the ubifs system.
        (The difference in performance is way beyond trivial)

        I know SSDs *can* perform well with huge internal RAM and processing integrated on the device.
        But it’s also possible to create a device that stores data onto cheese at 60Mb/s.
        Does that mean it’s a good solution?

        IMO a good solution for removable flash devices would be to have single/multiple addressable spaces on the device with read, write and erase instructions, DMA, and a very high quality EEPROM for storage of bad-block and FS metadata. This would ultimately result in cheaper devices, better performance and higher reliability.

  8. Isn’t this pretty decent?

    SanDisk Extreme USB 3.0, 16GB
    ~$20 on Ebay

    I just noticed this thumbdrive has been benchmarked by another user. But here goes:

    O_DIRECT feature enabled
    Record Size 4 KB
    File size set to 524288 KB
    SYNC Mode.
    OPS Mode. Output is in operations per second.

    File stride size set to 17 * record size.
    KB      reclen  write  rewrite  random-read  random-write
    524288  4       462    1186     1181         570

    Stripped, but with connector on, it looks like this:
    http://www3.picturepush.com/photo/a/13207936/img/13207936.jpg

    • Repost for clarity…
      KB      reclen  write  rewrite  random-read  random-write
      524288  4       462    1186     1181         570

      • Those figures look very good if they are in fact real. I cannot comment further because I haven’t tested it myself.

  9. “The partition used for the tests starts at half of the card’s capacity, e.g. on a 16GB card, the test partition spans the space from 8GB up to the end. This is in done in order to nullify the effect of some cards having faster flash at the front of the card.”

    That doesn’t work – the FTL maps which physical blocks are used, and we have no way to alter that unless someone hacks the firmware. Your 512MB file size will help though; the SLC cache on some flash drives is never that big. I’m not that happy about running your iozone test, so I do it only once per card. A 4kB write can mean a huge write at chip level. At a guess, the max:4k write speed ratio is proportional to the total written at chip level vs what is sent in 4k chunks, and this is the main reason it’s so slow. It does make a nice burn-in check on new cards though…

    A 4GB SanDisk Ultra I was testing (to death!) yesterday had 4MB erase blocks. So if you don’t know the erase block size, I’d suggest starting partitions at 4MB boundaries; bigger would be safer. Look at this page and run flashbench to determine erase block size: http://blogofterje.wordpress.com/2012/01/14/optimizing-fs-on-sd-card/
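    The typical invocation is something like this (device name is a placeholder) – look for the offset at which the access times jump:

    flashbench -a /dev/sdX --blocksize=1024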

    Here are the results from my Lexar 8GB Professional 300x, known for being a pretty good boot card. Pages appear to be 16kB and the erase segment appears to be 2MB from the flashbench test. Block size seems to be 128k, but I’ve not worked out how to make use of that fact yet. Random 4k write is not good – it started out twice as fast according to atop, then dropped off at some point. I’m still wondering about the discrepancy between atop’s figures and iozone’s results: atop seemed to think it was managing around 50 OPS during random write and around 185 during write/rewrite. Random read tallied.

    CompactFlash ATA device
    Model Number: LEXAR ATA FLASH CARD
    Serial Number: 41803914600000500A50
    Firmware Revision: 20090202

    Block: 4K 8K 16K 32K 64K 128K 256K 512K 1M
    Write: 3.8 MB/s 8.0 MB/s 19.5 MB/s 28.3 MB/s 37.0 MB/s 43.4 MB/s 44.5 MB/s 44.7 MB/s 45.0 MB/s
    Read: 11.7 MB/s 20.3 MB/s 31.0 MB/s 40.7 MB/s 48.9 MB/s 54.3 MB/s 55.3 MB/s 55.9 MB/s 56.2 MB/s

    KB      reclen  write  rewrite  random-read  random-write
    524288  4       63     62       2594         17.5

    • Regarding doing the test from half way up the device – it neuters one thing that some cards have been observed to do, which is to have the front of the card on fast SLC and the rest on slow MLC. Such modules are optimized for FAT, and the FAT is at the beginning of the disk. As for the FTL, its effect is largely nullified by the fact that the card is primed with a full overwrite from /dev/urandom.

  10. Next one: SanDisk Ultra 4GB. The replacement for the one that died showed up today; it’s pretty good for 200x, and the tests are finished already! I’ve had a Kingston 32GB 266x grinding away since yesterday…

    I tried the beginning and the middle on this one; there was a 10% random-write difference at 32MB test size, and I doubt it would be noticeable at 512MB. It was actually somewhat faster in the 4k and 8k dd tests after I overwrote it with urandom. It has occurred to me that writing with zeros will force an erase of all blocks – zeros have to be erased to set bits to 1, while writing can only change bits to 0. I haven’t tried this yet.

    CompactFlash ATA device
    Model Number: SanDisk SDCFH-004G
    Serial Number: AGZ091113180632
    Firmware Revision: HDX 7.07

    factory partition (sectors): /dev/sdb1 * 300 7813119 3906410 b W95 FAT32

    flashbench pointed at a 4MB erase segment, so for testing I partitioned with 4096 block size, stripe 1024, stride 1, and an 8192-sector start to the partitions. The factory partition is a bit odd; I may play with that.

    Block: 4K 8K 16K 32K 64K 128K 256K 512K 1M
    Write: 2.8 MB/s 5.0 MB/s 12.6 MB/s 24.4 MB/s 26.3 MB/s 28.1 MB/s 27.7 MB/s 28.2 MB/s 27.3 MB/s
    Read: 6.4 MB/s 10.6 MB/s 15.8 MB/s 21.0 MB/s 25.0 MB/s 27.6 MB/s 27.7 MB/s 27.7 MB/s 27.8 MB/s

    KB      reclen  write  rewrite  random-read  random-write
    524288  4       73     66       1753         49

    • On further thought, /dev/zero should wipe drives fine for the iozone tests, but for the dd test the FTL could well find that blocks already contain suitable data and skip writing them.

      • It depends on how clever the FTL is. In modern SSDs the controller performs compression and block-level deduplication, so if you cat /dev/zero to the disk, it’ll saturate the SATA link while writing nothing at all to the flash. While most SD card controllers may not have advanced to that point yet, it is only a matter of time before they do.
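        A quick way to check for that sort of cheating (a sketch – the device name is a placeholder and both commands are destructive): if zeros appear to write much faster than random data, the controller is compressing or skipping them. Bear in mind that /dev/urandom itself can be the bottleneck on slower machines.

        dd if=/dev/zero    of=/dev/sdX bs=1M count=256 oflag=direct
        dd if=/dev/urandom of=/dev/sdX bs=1M count=256 oflag=direct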

  11. Kingston 32GB 266x Ultimate. Definitely not quite the same one as Tom’s Hardware tested a few years back – this one has a triangular hologram but is otherwise identical, including the part number.

    CompactFlash ATA device
    Model Number: ULTIMATE CF CARD
    Serial Number: 5B3802690003DC9A
    Firmware Revision: Ver7.02K

    Factory format:
    /dev/sdb1 * 8128 62455679 31223776 c W95 FAT32 (LBA)

    Next time I’m testing a snail card, I’ll do it in KB/sec and convert to OPS; the decimal precision would be handy.

    KB      reclen  write  rewrite  random-read  random-write
    524288  4       71     62       1335         1

    Write: 2.1 MB/s 4.1 MB/s 7.5 MB/s 15.6 MB/s 35.7 MB/s 36.9 MB/s 38.1 MB/s 38.3 MB/s 38.2 MB/s
    Read: 12.4 MB/s 18.5 MB/s 24.5 MB/s 29.5 MB/s 55.7 MB/s 64.9 MB/s 69.7 MB/s 73.1 MB/s 73.6 MB/s
