43 comments

  • matharmin 3 days ago ago

    > Overall, I haven’t seen many issues with the drives, and when I did, it was a Linux kernel issue.

    Reading the linked post, it's not a Linux kernel issue. Rather, the Linux kernel was forced to disable queued TRIM and maybe even NCQ for these drives, due to issues in the drives.

    • happyPersonR 2 days ago ago

      Hopefully there are drives that don’t have that issue?

  • Prunkton 3 days ago ago

    Since it’s kind of related, here’s my anecdote/data point on the bit rot topic: I did a 'btrfs scrub' (checksum verification) on my two 8 TB Samsung 870 QVO drives. One of them has been always on (10k hours), while the other hasn’t been powered on at all in the last 9 months and only once in the last 16 months.

    No issues were found on either of them.
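
    For anyone who wants to repeat the check, it's just a couple of commands (the mount point here is a placeholder):

        # start a checksum verification pass on the mounted filesystem
        sudo btrfs scrub start /mnt/qvo

        # check progress and the error counters once it's done
        sudo btrfs scrub status /mnt/qvo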

    • diggan 3 days ago ago

      How much has been written to each of them across their lifetimes?

      • Prunkton 3 days ago ago

        Very little: about 25 TB written on the always-on one. The offline one just receives diffs, so probably <12 TB. Both are kind of data dumps, which is outside their designed use case. That's why I included data integrity checks in my backup script before the actual rsync backup runs (sketched below). But again, no issues so far.
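
        The relevant part of the script looks roughly like this - a minimal sketch with placeholder paths, assuming scrub's foreground mode exits non-zero when it finds errors:

            #!/bin/sh
            set -e
            # -B runs the scrub in the foreground so the script waits for the result
            btrfs scrub start -B /mnt/backup
            # only sync once the existing data has been verified
            rsync -a --delete /home/ /mnt/backup/home/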

    • modzu 2 days ago ago

      if only i could trust btrfs scrub

      • kassner 2 days ago ago

        Why can’t you? I’m curious, as I somewhat rely on it.

  • vardump 4 days ago ago

    I wonder how long those drives can be powered off before they lose data, and how long until they lose all functionality once the critical bookkeeping data disappears.

    • magicalhippo 4 days ago ago

      This would depend on how worn they are. Here's an article describing a test a YouTuber did[1] that I watched some time ago. The worn drives did not fare that well, while the fresh ones did OK. Those were TLC drives though; for QLC I expect the results would be much worse overall.

      [1]: https://www.tomshardware.com/pc-components/storage/unpowered...

      • 0manrho 4 days ago ago

        I remember that post. Typical Tom's quality (or lack thereof).

        The only insight you can glean from that is that bad flash is bad, and worn bad flash is even worse, and even that is frankly a stretch given the small sample size and the lack of a control group.

        The reality is that it's non-trivial to determine data retention/resilience in a powered-off state, at least as it pertains to arriving at a useful and reasonably accurate generalization of "X characteristics/features result in poor data retention/endurance when powered off in Y types of devices," and being able to provide the receipts to back that up. There are far more variables than most people realize going on under the hood with flash and in how different controllers and drives are architected (hardware) and programmed (firmware). Thermal management is a huge factor that is often overlooked or misunderstood, and it has a substantial impact on flash endurance (and performance). I could go into more specifics if there's interest (storage at scale/speed is my bread and butter), but this post is long enough.

        All that said, the general mantra remains true: more bits per cell generally means the data in each cell is more fragile/sensitive, but that's mostly in the context of write-cycle endurance.

        • ffsm8 3 days ago ago

          This is the first time I've heard such negativity about Tom's Hardware, but then the only time I actually looked at one of their tests in detail was their series testing burn-in on consumer OLED TVs and displays. The other reviews I glanced at in that context looked pretty solid, at least casually.

          Can you elaborate on the reason for your critique, considering they're pretty much just testing from the perspective of the consumer? I thought their explicit goal was not to provide highly technical analysis for niche preferences, but instead to look at things for a John Doe who's thinking about buying X and what it would mean for his use cases. From that perspective, their reporting seemed pretty spot on and not shoddy, but I'm not an expert on the topic.

          • AdrianB1 3 days ago ago

            As someone who has read Tom's since it was run by Thomas, I find the quality of the articles a lot lower than it was almost 30 years ago. I don't remember when I stopped checking it daily, but I'd guess it was over 15 years ago.

            Maybe the quality looks good to you, but perhaps you don't know what it was like 25 years ago to compare against. It may be a problem of the wrong baseline.

          • magicalhippo 3 days ago ago

            The article I linked to is just a basic retelling of the video by a YouTuber. I decided to link to it because I prefer linking to text sources rather than videos.

            The video isn't perfect, but I thought it had some interesting data points regardless.

          • 0manrho 3 hours ago ago

            > they're pretty much just testing from the perspective of the consumer

            Yes, that's their schtick: do just enough so that the average non-tech-literate user doesn't know any better. And if you're just a casual consumer/reader, it's fine. Not great, not even necessarily accurate, but most of their readership don't know enough to know any better (and that's on purpose). I don't believe they're intentionally misleading people. Rather - simply put - it's evident that the amount of fucks they give regarding accuracy, veracity, depth, and journalism in general is decidedly less than their competition.

            If you're trying to gain actual technical insight with any real depth or technical merit, Tom's is absolutely not the place to go. Compare it with ServeTheHome (servers, networking, storage, and other homelab and enterprise stuff), GN (gaming focused), or RTings.com (displays and peripherals), to name a few, and you'll see the night-and-day difference between people who know what they're talking about, strive to be accurate, and frame things in the right context, and what Tom's does.

            Again, it depends on what the user is looking for, but Tom's is catering to the casual crowd, aka people who don't know any better and aren't gonna look any deeper. Which is totally fine, but it's absolutely not a source for nuance, insight, depth, rigor, or anything like that.

            The article in question[0] is actually a great example of this. They found a YouTube video of someone buying white-label drives, with no control group to compare against, nor further analysis to confirm that the 4 drives in question actually all had the same design, firmware, controller, and/or NAND flash underneath (absolutely not a given with bargain-bin white-label flash, which these were, and it can make a big difference). I'm not trying to hate on the YouTuber; there's nothing wrong with their content. My issue is with how Tom's presents it as an investigation into unpowered SSD endurance while in the same article admitting: "We also want to say this is a very small test sample, highlighted out of our interest in the topic rather than for its hard empirical data." This is also why I say I don't believe they're trying to be disingenuous. Hell, I give them credit for admitting that. But it is not a quality or reliable source that informs us of anything at all about the nature of flash at large, or even about the specific flash in question, because we don't know what the specific flash in question is. Again, just because drives are the same brand, model, and capacity does not mean they're all the same, even for name-brand devices. Crucial's MX500 SSDs, for example, have been around for nearly a decade now, and the drives you buy today are very much different from the same-capacity ones you could buy in 2017.

            Don't even get me started on their comments/forums.

            0: https://www.tomshardware.com/pc-components/storage/unpowered...

        • Eisenstein 2 days ago ago

          > I could go into more specifics if interested (storage at scale/speed is my bread and butter), but this post is long enough.

          I would love this.

        • tart-lemonade 2 days ago ago

          > I could go into more specifics if interested (storage at scale/speed is my bread and butter), but this post is long enough.

          I would read an entire series of blog posts about this.

          • 0manrho 3 hours ago ago

            Not my work, but ask and ye shall receive: https://www.storagereview.com/

            They primarily focus on storage (SSDs and HDDs) but also evaluate storage controllers, storage-focused servers/NAS/DAS/SAN, and other storage-adjacent stuff. For an example of the various factors that differentiate different kinds of SSDs, I'd recommend their article reviewing Micron's 7500 line of SSDs[0]. It's from 2023, but still relevant, and you don't have to read the whole thing. Heck, just scroll through the graphs and it's easy to see this shit is far from simple, even when you're accounting for using the same storage controllers, systems, testing methodologies, and whatnot.

            If you want to know about the NAND (or NOR) flash itself, and what the differences/use cases are at a very technical level, there's stuff like Micron's "NAND Flash 101: NAND vs. NOR Comparison"[1].

            If that's too heavy for you (it is a technical whitepaper, after all) and you want a lighter read on some of the major differences between enterprise and consumer flash, SuperSSD has a good article on that[2], as well as many other excellent articles.

            Wanna see some cool use cases for SSDs that aren't so much about the low-level technicals of the storage device itself, but rather about how they can be assembled into arrays and networked storage fabrics in new and interesting ways? ServeTheHome has some interesting articles, such as their "ZFS without a Server Using the NVIDIA BlueField-2 DPU"[3].

            Apologies for responding 2 days late. I would be happy to answer any specific questions, or recommend other resources to learn more.

            Personally, my biggest gripe is that I've not really seen anyone do a proper analysis of the thermal dynamics of storage devices and the impact they have (especially on lifespans). We know this absolutely has an effect just from deploying SSDs at scale and seeing in practice how otherwise identical drives, within the same arrays and the same systems, have differing lifespans, with the number one differentiating factor being peak temperatures and temperature deltas (a high delta T can be just as bad as or worse than a high temperature on its own, although that comes with a big "it depends"). I haven't seen a proper testing methodology really take a crack at it, because that's a time-consuming, expensive, and very difficult task; it's far harder to control for the relevant variables than with GPUs, IMO, due in part to the many different kinds of SSDs with different NAND flash chips, different heatsinks/form factors, and the wide variety of places they sit within systems.

            Take note that many SSDs, save for those explicitly built for "extreme/rugged environments," have thermal limits much lower than other components in a typical server. Often the operating range spec is something like -10C to 50C for SSDs (give or take 10C on either end depending on the exact device), whereas GPUs and CPUs can operate at over 80C, which, while not a preferred temperature, isn't out of spec, especially under load. Then consider that the physical packaging of SSDs, as well as where they are located in a system, often means they don't get adequate cooling. M.2 form factor SSDs are especially prone to issues in this regard, even in many enterprise servers, both because of where they sit relative to airflow and because of nearby hot components (they often have some NIC/GPU/DPU/FPGA sitting right above them, or a nearby onboard chip(set) dumping heat into the board, which raises the thermal floor/ambient temps). There's a reason the new EDSFF form factor has so many different specs to account for larger heatsinks and cooling on SSDs.[4][5][6]
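
            If you just want to watch this on your own hardware, even a crude sampling loop is enough to see the peaks and the deltas for yourself (device name and log path are placeholders):

                # run as root; sample the drive temperature once a minute
                while true; do
                    printf '%s ' "$(date -Is)" >> /var/log/nvme0-temp.log
                    # "Temperature:" is the composite temperature the drive reports
                    smartctl -A /dev/nvme0 | grep '^Temperature:' >> /var/log/nvme0-temp.log
                    sleep 60
                done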

            I've barely even touched on things like networked arrays, the explosion in various accelerators and controllers for storage, NVMe-oF/RoCE/storage fabrics, storage class memory, HA storage, transition flash, DRAM and controllers within SSDs, wear leveling and error correction, PLP, ONFI, SLC/MLC/TLC/QLC, and the really fun stuff like PCIe root topologies, NVMe zoned namespaces, computational storage, CXL, GPUDirect Storage/BaM, cache coherency, etc. etc.

            0: https://www.storagereview.com/review/micron-7500-pro-7500-ma...

            1: https://user.eng.umd.edu/~blj/CS-590.26/micron-tn2919.pdf (Direct PDF link)

            2: https://www.superssd.com/kb/consumer-vs-enterprise-ssds/

            3: https://www.servethehome.com/zfs-without-a-server-using-the-...

            4: https://americas.kioxia.com/en-us/business/ssd/solution/edsf...

            5: https://americas.kioxia.com/en-us/business/ssd/solution/edsf...

            6: https://members.snia.org/document/dl/27231 (Direct PDF link, Technical Whitepaper on the Standard if you really want to dive deep into what EDSFF is)

  • tracker1 2 days ago ago

    Three of my first SATA SSDs, from over a decade ago, are still in use... I first had them in my home server as OS and cache drives, respectively; they later went into desktop use for a couple of years when the server crashed and I replaced it with a mini PC that was smaller, faster, and quieter. Then they eventually wound up in a few DeskPi cases with RPi 4 8GB units, and I handed them off to a friend a couple of months ago. They're all still working, with only one error between the three of them. IIRC, they're all 240 GB Crucial drives from the early 2010s.

    I've never had any spinning drives come close to that level of reliability. I've only actually had one SSD or NVMe drive fail, and that was the first-gen Intel drive in my desktop that had a firmware bug and one day showed up as an 8 MB empty drive. It was a 64 GB unit and I was so impressed by the speed, but, tired of symlinking directories to the HDD for storage, I just bumped up to 240 GB+ models and never looked back.

    Currently using a Corsair MP700 Pro drive (gen 5 nvme) in my desktop. Couldn't be happier... Rust and JS projects build crazy fast.

  • Havoc 3 days ago ago

    Have had enough consumer SSDs fail on me that I ended up building a NAS with mirrored enterprise ones...but 2nd hand ones. Figured between mirrored and enterprise that's an OK gamble.

    Still to be seen how that works out in the long run, but so far so good.

    • Yokolos 3 days ago ago

      For data storage, I just avoid SSDs outright. I only use them for games and my OS. I've seen too many SSDs fail without warning into a state where no data is recoverable, which is extremely rare for HDDs unless they're physically damaged.

      • yabones 2 days ago ago

        SSDs are worth it to me because the restore and rebuild times are so much faster. Larger HDDs can take several days to rebuild a damaged array, and the other drives have a higher risk of failure while they're being thrashed by IO and running hot. And if subsequent drives do fail during the rebuild, it takes even longer to restore from backup. I'm much happier to just run lots of SSDs in a configuration where they can be quickly and easily replaced.

      • Havoc 3 days ago ago

        I just don't have the patience for HDDs anymore. Mirrored arrays and backups are going to have to do for protection against data loss.

        That said, I only have a couple of TBs... a bit more and HDDs do become unavoidable.

        • shim__ 3 days ago ago

          I'm using an HDD with an SSD cache for /home; everything that isn't stale gets cached by the SSD.

          • Havoc 2 days ago ago

            What mechanism are you using to manage the cache?

      • vardump 2 days ago ago

        I’m worried about my drives that contain helium. So far so good, all show 100% helium level, but I wonder for how long.

        • Havoc a day ago ago

          Worst case they end up with normal atmosphere...which is what all the other drives are running anyway, no?

          Can't say I've heard of people worrying about this angle before tbh

          • vardump 15 hours ago ago

            I understand normal atmosphere would destroy the heads pretty quickly.

    • PaulKeeble 2 days ago ago

      You can't fully trust SSDs or HDDs; fundamentally, both still have high failure rates. Modern filesystems with checksums, scrub cycles, etc. are going to be necessary for a long time yet.
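
      Scheduling the scrubs is the easy part - for example, a cron entry along these lines, assuming btrfs and a filesystem mounted at /data (swap in zpool scrub for ZFS):

          # /etc/cron.d/scrub-data: monthly checksum scrub, 03:00 on the 1st
          0 3 1 * * root /usr/bin/btrfs scrub start /data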

  • justsomehnguy 3 days ago ago

    > The reported SSD lifetime is reported to be around 94%, with over 170+ TB of data written

    Glad for the guy, but here's a somewhat different view of the same QVO series:

        Device Model:     Samsung SSD 870 QVO 1TB
        User Capacity:    1,000,204,886,016 bytes [1.00 TB]
       
        == /dev/sda
          9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40779
        177 Wear_Leveling_Count     0x0013   059   059   000    Pre-fail  Always       -       406
        241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       354606366027
        == /dev/sdb
          9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40779
        177 Wear_Leveling_Count     0x0013   060   060   000    Pre-fail  Always       -       402
        241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       354366033251
        == /dev/sdc
          9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40779
        177 Wear_Leveling_Count     0x0013   059   059   000    Pre-fail  Always       -       409
        241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       352861545042
        == /dev/sdd
          9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40778
        177 Wear_Leveling_Count     0x0013   060   060   000    Pre-fail  Always       -       403
        241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       354937764042
        == /dev/sde
          9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       40779
        177 Wear_Leveling_Count     0x0013   059   059   000    Pre-fail  Always       -       408
        241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       353743891717
    
    NB: you need to look at the first decimal number (the normalized value) of attribute 177 Wear_Leveling_Count to get the 'remaining endurance percent' value, i.e. 59 and 60 here.

    While overall it's not that bad - losing only 40% after 4.5 years - it means that in another 3-4 years it would be down to 20%, assuming the usage pattern doesn't change and the system doesn't hit write amplification. Sure, someone had the "brilliant" idea ~5 years ago to use desktop-grade QLC flash as ZFS storage for PVE...
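
    For reference, a loop along these lines reproduces that kind of per-drive summary (device names and grep pattern are just what this system uses):

        # loop over the five SATA drives (adjust the glob for your system)
        for d in /dev/sd[a-e]; do
            echo "== $d"
            smartctl -A "$d" | grep -E 'Power_On_Hours|Wear_Leveling_Count|Total_LBAs_Written'
        done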

    • yrro 3 days ago ago

      Have a look at the SSD Statistics page of the device statistics log (smartctl -l devstat). This has one "Percentage Used Endurance Indicator" value, which is 5 for three of these disks and 6 for one of them. So based on that, the drives still have ~95% of their useful life left.
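
      Concretely, that's something like the following (device name is just an example; page 7 is the SSD statistics page):

          smartctl -l devstat,7 /dev/sda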

      As I understand it, the values in the device statistics log have standardized meanings that apply to any drive model, whereas the details of SMART attributes (the meaning of a particular attribute, or any interpretation of its value beyond comparing the current value with the threshold) do not. So absent a data sheet for this particular drive documenting how to interpret attribute 177, I would not feel confident interpreting the normalized value as a percentage; all you can say is that the current value is above the threshold, so the drive is healthy.

      • justsomehnguy 2 days ago ago

            == /dev/sda
            177 Wear_Leveling_Count     PO--C-   059   059   000    -    406
            0x07  0x008  1              42  N--  Percentage Used Endurance Indicator
            == /dev/sdb
            177 Wear_Leveling_Count     PO--C-   060   060   000    -    402
            0x07  0x008  1              42  N--  Percentage Used Endurance Indicator
            == /dev/sdc
            177 Wear_Leveling_Count     PO--C-   059   059   000    -    409
            0x07  0x008  1              43  N--  Percentage Used Endurance Indicator
            == /dev/sdd
            177 Wear_Leveling_Count     PO--C-   060   060   000    -    403
            0x07  0x008  1              42  N--  Percentage Used Endurance Indicator
            == /dev/sde
            177 Wear_Leveling_Count     PO--C-   059   059   000    -    408
            0x07  0x008  1              42  N--  Percentage Used Endurance Indicator
        
        Yeah, it's better for sure. I did 'smartctl -x | grep Percent', it's easier.

        I mentioned 177 because it's the same number that PVE shows in the web UI, and I didn't find the obvious 'wearout/life left' value I'm accustomed to seeing in the SMART attributes.

    • wtallis 2 days ago ago

      Building an array of five 1TB QLC drives seems like a really odd decision, like somebody started with the constraint that they must use exactly five SSDs and then tried to optimize for cost.

      The 4TB models obviously will hold up better under 170+ TB of writes than the 1TB drives will, and it wouldn't be surprising to see less write amplification on the larger drives.

      • 2 days ago ago
        [deleted]
  • bullen 2 days ago ago

    I have used a Samsung PM897 1.92TB for a year and it's OK-ish... it is slow on indexing when folders become saturated, and it gets slower as the drive fills up.

    The solution is to remove some files... and pray it lasts half as long as a 64GB Intel X25-E!

    Should it last 30x less because it is 30x larger?

    Or is this game only about saturation rate?

  • 8cvor6j844qw_d6 3 days ago ago

    I wonder what's the best SATA SSD (M.2 2280) one could get now?

    I have an old Asus with an M.2 2280 slot that only takes SATA III.

    I recall the 840 EVO M.2 (if my memory serves me right) is the current drive, but looking for a new replacement doesn't seem straightforward: most SATA drives are 2.5 in., and if a drive is the correct M.2 2280 form factor, it's usually NVMe.

    • Marsymars 2 days ago ago

      I don’t know about best but you can filter pcpartpicker.com to M.2 SATA interface drives.

    • Hendrikto 2 days ago ago

      Most companies stopped making and selling SATA M.2 drives years ago.

      • jonbiggums22 2 days ago ago

        Most companies seem to be well on their way to dropping SATA drives in general. There aren't many non-garbage-tier options available anymore. I keep ending up buying 870 EVOs even though I don't really love them.

    • more_corn 2 days ago ago

      Samsung and Intel have come out on top on all my tests.

  • more_corn 2 days ago ago

    The critical failure profile is when you fill them up almost full and then hit them with a bunch of writes. If you can avoid that, you’re good for years.

  • spaceport 2 days ago ago

    At what size of apartment does HDD noise become an issue?

    • kassner 2 days ago ago

      It depends on the background noise; I'm not sure how objectively it can be measured.

      My apartment is super quiet - you hardly hear anything from outside - so I can hear an HDD in the living room during the silent parts of movies. At a relative’s house, however, you only notice how loud the background noise is when the power goes off. No wonder I always have a headache when I go there.