> Last year, we rolled out a new service that changed how data is placed across Magic Pocket. The change reduced write amplification for background writes, so each write triggered fewer backend storage operations. But it also had an unintended side effect: fragmentation increased, pushing storage overhead higher. Most of that growth came from a small number of severely under-filled volumes that consumed a disproportionate share of raw capacity
I assumed big corps with huge infrastructure bills meticulously model changes like that using the production data they have, so that the exact effect on every metric they care about is known upfront. Turns out they're like me: deploy and see what breaks.
Author here :) We did have high-level metrics and expectations for how this change would behave, but a couple of factors, happening in parallel, made it much harder to reason about in practice.
Data in these systems moves slowly and with a lot of inertia, so the effects show up gradually and can lag behind the change itself. On top of that, the impact wasn’t uniform. Most of the overhead came from a small subset of volumes, so it took time to isolate what was actually driving the increase. These systems are hard to test at scale!
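The "small subset of volumes driving most of the overhead" effect is easy to illustrate with a toy calculation (the fill fractions below are made up for illustration, not Dropbox's numbers):

```python
# Hypothetical fill levels: 96 well-packed volumes and 4 badly
# under-filled ones, each provisioned with 1 unit of raw capacity.
volumes = [0.95] * 96 + [0.10] * 4

raw = len(volumes)       # total raw capacity provisioned
live = sum(volumes)      # logical data actually stored
overhead = raw - live    # capacity consumed but not holding data

# Overhead attributable to the under-filled (<50% full) volumes.
waste_from_underfilled = sum(1 - f for f in volumes if f < 0.5)

print(f"total overhead: {overhead:.1f} units")
print(f"share from the 4 under-filled volumes: {waste_from_underfilled / overhead:.0%}")
```

In this sketch, 4% of the volumes account for over 40% of the total overhead, which is why aggregate fill metrics can look fine while a small tail quietly eats raw capacity.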
Google recently increased storage from 2 TB to 5 TB on their $20 AI plan, while Dropbox is still stuck at 2 or 3 TB for their $12/$20 plans.
They moved from 1 TB to 2 TB in mid-2019, and I wonder if they ever plan to pass on any of the gains from the past seven years of technological advancements, or if those gains are simply being captured on their side while we keep paying the same.
Aside from bad pricing and us wanting to move our data to servers owned by a European company, the thing that bothered me the most as a (former) paying customer was the constant upsell pushes. Every time I'd log in to the web interface, they would show ads (including pop-up dialogs) trying to move me to another plan.
I’m already paying 20 Euro per month. Leave me alone.
Good riddance.
are these "technological advancements" in storage in the room with us right now? because I'm looking at today's price per TB and it's higher than it was in 2020
Did you calculate that with a real inflation-adjusted price, not the BS numbers from the financial media, the Fed, etc.? With the money printer running nonstop since 2020, inflation is not just a few percent.
What authoritative number did you have in mind, oh economic sage?
The correct number would still be somewhat negative (deflationary), as you'd expect. BLS says -8% https://data.bls.gov/timeseries/CUUR0000SEEE01?output_view=d...
It’s a shame that such fantastic engineering work is buried behind a product with so many annoyances dictated by the marketing/revenue teams.
I wish Dropbox would make some kind of “classic edition” that removed annoyances from their desktop client.
Until then, I’m using Filen. It’s fine, I have some qualms with it but it runs on every platform including Linux, it’s affordable, and end to end encrypted.
Does Amazon ever publish similar articles about S3?
I don't think there's much for Amazon to gain from publishing these sorts of internal details. Amazon's services are used by developers who are looking to tightly optimize their usage. If Amazon were to publish detailed internal information, it's likely that folks would start optimizing applications based on internal details that have the potential to change over time.
Secondly, I think that a lot of companies publish these "tech blogs" as a way to boost recruiting (look at the cool stuff that we're doing, don't you want to join us?). Amazon, of course, doesn't have a recruiting problem. If you want to work on the largest-scale systems, it's already a top destination for you.
The immutability of extents is dictated by their SMR hardware, I believe.
Author here. With SMR, you do have large zones that are essentially immutable. However, in this case our extents and volumes are immutable because we do volume-level striping for erasure coding. This means that if any extent changes, the parities have to be rewritten as well. Others do block-level striping, so they can just move data around within a disk. There are lots of trade-offs with both approaches. Also, keeping volumes/extents immutable makes reasoning through correctness much simpler.
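The parity-rewrite coupling described above can be sketched with a toy example, using simple XOR parity as a stand-in for a real erasure code (this is illustrative only, not Dropbox's actual scheme):

```python
from functools import reduce

# Toy volume: k data extents plus one XOR parity extent,
# striped at the volume level.
def xor_parity(extents):
    """Byte-wise XOR of equal-length extents."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), extents)

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_parity(data)

# Mutating any single extent invalidates the parity for the whole volume,
# so the parity extent would have to be rewritten too.
mutated = [b"\xff\x02", data[1], data[2]]
assert xor_parity(mutated) != parity
```

With volume-level striping, parity is a function of every extent in the volume, which is why treating extents as immutable (and rewriting whole volumes) sidesteps the in-place-update problem entirely.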
I don't know the full picture behind their decision-making but immutability is much easier to reason about in a distributed system, in general.
That's true. Every system has some quantum of storage that must be handled as a unit, whether that is a logical block that can only be discarded entirely or whatever. But I think the relatively gigantic immutable extents discussed here are somewhat unusual.
All this talk about a tool that isn’t open source?
You've never seen Google engineering talks?