2 comments

  • mtmail 34 minutes ago

    The solution back then makes sense. It was a couple of years too early for CouchDB, Cassandra, or MySQL Cluster (https://en.wikipedia.org/wiki/MySQL_Cluster), which are better suited to write-heavy applications and clustering across servers.

    Later, spinning hard drives were replaced by SSDs, then NVMe. More open-source NoSQL and columnar storage solutions appeared. Cloud services started offering hosted databases with effectively unlimited scale. 500 GB would all be 'hot', ready to be queried in real time.

    Today I'd see three options:

    * multiple cloud servers which receive the data and put it into a managed cloud database, like Google BigQuery. They'll handle all the scaling, including region replication and backups. You might overpay, but likely still less than an Oracle software licence.

    * a specialist SaaS for IoT-style workloads, for example ClickHouse. It can handle 10,000 incoming rows per second. The data store later compacts the data in the background, storing it by date and applying other optimizations which make recent data faster to query than older data.

    * place it into JSON or CSV files, one per hour or per day, and query with DuckDB (rough sketch below).
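
    As a rough illustration of the third option, a DuckDB sketch that assumes hypothetical hourly CSV files named readings_2024-06-01-00.csv and so on, with meter_id, ts and value columns (the file layout and names are made up here, not from the post):

        # Sketch: query hourly CSV dumps directly with DuckDB.
        # File naming and columns are assumptions.
        import duckdb

        con = duckdb.connect()  # in-memory; no server to run

        # Total per meter for one day, read straight from the hourly files.
        rows = con.execute("""
            SELECT meter_id, SUM(value) AS total_value
            FROM read_csv_auto('readings_2024-06-01-*.csv')
            WHERE ts >= TIMESTAMP '2024-06-01 00:00:00'
              AND ts <  TIMESTAMP '2024-06-02 00:00:00'
            GROUP BY meter_id
            ORDER BY meter_id
        """).fetchall()

        print(rows[:5])

    The same glob trick covers the whole 15-month archive; converting the files to Parquet first would make those wider scans noticeably faster.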

  • theamk 9 hours ago

    Input data rate: 1e6 * 48 / (24*60*60) = ~560 TPS

    Working data size (1 day): (8B meter id + 8B timestamp + 8B value) * 48 * 1e6 = 1.1 gigabyte

    Archival: 1.1 GB/day * 15 months = 482 GB .. but looks like mostly write-only data?
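
    As a quick sanity check, the same arithmetic in Python (15 months taken as 15 * 30 days, sizes in GiB, which is how the 482 GB figure falls out):

        # Back-of-envelope check of the numbers above.
        METERS = 1_000_000
        READINGS_PER_DAY = 48            # one reading every 30 minutes
        ROW_BYTES = 8 + 8 + 8            # meter id + timestamp + value
        GIB = 2**30

        tps = METERS * READINGS_PER_DAY / (24 * 60 * 60)
        day_bytes = ROW_BYTES * READINGS_PER_DAY * METERS
        archive_bytes = day_bytes * 15 * 30

        print(f"input rate: {tps:,.0f} rows/s")              # ~556
        print(f"one day:    {day_bytes / GIB:.2f} GiB")       # ~1.07
        print(f"15 months:  {archive_bytes / GIB:.0f} GiB")   # ~483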

    That is pretty small as far as modern servers go. My usual approach for IoT things is an event-sourcing-like architecture: you have logging servers (for disaster recovery) and processing server(s), which keep things in RAM. If a processing server crashes, it restarts and the logging servers re-send the data from the last checkpoint.
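
    A minimal sketch of that shape, with an append-only log file standing in for the logging server and an in-memory aggregator that rebuilds its state by replaying the log from the last checkpointed offset (file names, record format and checkpointing scheme are all made up for illustration):

        # Sketch: durable append-only log + in-memory processor that replays on restart.
        import json, os

        LOG_PATH = "meter_readings.log"
        CKPT_PATH = "processor.ckpt"     # byte offset of the log already applied

        def append_reading(meter_id, ts, value):
            """Logging-server role: persist the event before processing it."""
            with open(LOG_PATH, "a") as f:
                f.write(json.dumps({"meter_id": meter_id, "ts": ts, "value": value}) + "\n")
                f.flush()
                os.fsync(f.fileno())

        class Processor:
            """Processing-server role: keeps per-meter running totals in RAM only."""
            def __init__(self):
                self.totals = {}         # meter_id -> running sum
                self.offset = int(open(CKPT_PATH).read()) if os.path.exists(CKPT_PATH) else 0
                self.replay()            # recover state after a crash or restart

            def replay(self):
                if not os.path.exists(LOG_PATH):
                    return
                with open(LOG_PATH) as f:
                    f.seek(self.offset)
                    for line in f:
                        event = json.loads(line)
                        self.totals[event["meter_id"]] = self.totals.get(event["meter_id"], 0.0) + event["value"]
                    self.offset = f.tell()
                with open(CKPT_PATH, "w") as f:
                    f.write(str(self.offset))

    If the processor dies, restarting it re-applies everything after the saved offset - the "re-send from last checkpoint" step, done here from a local file instead of over the network.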

    But I am sure that Postgres could be used as well, or some sort of time-series database. Based on your description, you really don't need much as far as databases go - basically range queries, and that's it.
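
    For the Postgres route, a minimal sketch of the schema and the one query shape that matters, using psycopg2 (table name, column names and connection string are placeholders):

        # Sketch: plain Postgres table laid out for range queries by meter and time.
        import psycopg2

        conn = psycopg2.connect("dbname=meters")     # placeholder DSN
        cur = conn.cursor()

        cur.execute("""
            CREATE TABLE IF NOT EXISTS readings (
                meter_id bigint           NOT NULL,
                ts       timestamptz      NOT NULL,
                value    double precision NOT NULL
            )
        """)
        # The composite index is what makes "one meter, one time range" cheap.
        cur.execute("CREATE INDEX IF NOT EXISTS readings_meter_ts ON readings (meter_id, ts)")
        conn.commit()

        # The range query - basically the only read pattern described.
        cur.execute(
            "SELECT ts, value FROM readings WHERE meter_id = %s AND ts >= %s AND ts < %s ORDER BY ts",
            (42, "2024-06-01", "2024-06-02"),
        )
        print(cur.fetchmany(5))

    Partitioning the table by month, or reaching for a time-series extension like TimescaleDB, would help with the 15-month archive, but at this size plain Postgres is fine.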