I guess the key questions are: what are the tradeoffs Go makes, and what kind of work is it well suited for?
I think Go's key strengths are syntax simplicity, niceties for large-scale codebases (good monorepo support, quick builds, quick startup, easy CLIs, testing and language server built in), and good concurrency constructs.
There are controversial aspects too, like an aversion to annotations and exceptions and anything that hides plain control flow - as well as a reluctance to evolve the syntax and basic aspects of the language. I personally like those things, but that's because I find myself well aligned with Go's philosophy.
I would say that the worst part of the language is actually performance: I would not use Go if single-threaded performance is critical (and the author notes this). The GC is not the most efficient either. But given that hardware evolution is leaning towards more cores, and scaling increasingly means scaling up containers, Go's choices put it in good stead here.
Go is my favourite language at the moment, has been for a while.
Nailed it.
Every tool/language designer makes trade-off choices. I like the ones Golang makes. To me it reeked of experienced folks being in charge. I have grey hairs from all the Bad Things I've had to overcome in software engineering over the decades, in production, with big money at stake, oh and late at night on weekends as a bonus. As I learned about Golang I could tell its designer(s) had too and were trying to address them.
Is it perfect in all ways? Perhaps not. But nothing ever is, so that's a non-interesting criticism in my mind. I know a Good Thing when I see it. One of the few bright spots to getting older.
Regarding number crunching, Go indeed made little effort to optimise this use case (especially since many good alternatives already existed at the time). Having said that, I'm really hopeful that the SIMD proposals do eventually make it into the language, e.g. this one: https://github.com/golang/go/issues/73787#issuecomment-32081...
Hopefully, as that is another area that other managed languages have finally started taking seriously.
Julia, .NET, Java (even if preview), Swift.
While writing Assembly isn't that bad, having to deal with the Go Plan 9 Assembler syntax isn't something I am keen on putting up with.
Yes, this is the reason why there are no currently-maintained data-science/dataframe libraries in Go either.
Every DataFrame library with a significant user base uses function chaining because that's the best workflow for such stuff. Also notebook support / magic cell comments for iterative EDA.
Python: polars-py, pandas, pySpark
JVM: Spark
R: R
Go can't compete with this even with SIMD support.
My suggestion to the OP talking about compression.
First up, consider just PNG compressing the image for simplicity. It's a mostly black image with color dots. That would generally compress really well with PNG.
But also, knowing the nature of the image, you could pretty easily compress the image by doing offsets to the next pixel. The format could look roughly something like this:
[offset byte, color byte, offset byte, color byte].
It will fail in cases where each pixel has a color and will excel when there is a run of black pixels (of which there will be a lot). It's a dead simple format to implement and read as well.
You can even keep the frame around for writing. You'd just count black pixels left to right, top to bottom, emitting that number (or 255 then 0 if the run is longer than a byte), resetting, and then counting some more.
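A minimal sketch of what that encoder could look like in Go, assuming one byte per pixel with 0 meaning black (the name and layout here are just illustrative):

    // encodeRuns turns a frame of palette-indexed pixels (0 = black) into
    // [offset, color] pairs: offset is how many black pixels were skipped
    // since the last emitted pair. Runs longer than 255 emit a 255/0 pair
    // (a black "continue" marker) and keep counting.
    func encodeRuns(frame []byte) []byte {
        out := make([]byte, 0, len(frame)/4)
        run := 0
        for _, px := range frame {
            if px == 0 {
                run++
                if run == 255 {
                    out = append(out, 255, 0)
                    run = 0
                }
                continue
            }
            out = append(out, byte(run), px)
            run = 0
        }
        return out
    }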
Author here. Surprised to see this show up.
I did try a large variety of encodings and the best was delta encoding frames followed by RLE encoding similar to what you describe. It did pretty well. However, when things start to move, it only shaved ~10-20% off the size at a significant complexity and compute cost. This was before I allowed each client to have its own frame and it was more common for there to be significant areas of black.
> and compute cost
It'd definitely be tricky to get the computational performance that you'd want out of it. I'd imagine it'd be pretty easy to accidentally bust caches.
To solve for that, you could double your frame size and store the prev/next pixels in an alternating fashion, i.e. [n, p, n, p, n, p]. That way, when you XOR, you are always working with highly local memory. You'd want to keep the frame basically global to avoid allocating.
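A rough illustration of that interleaved XOR pass (not the author's actual code; the buffer layout is assumed):

    // xorInterleaved assumes the frame buffer interleaves new/previous pixels
    // as [n0, p0, n1, p1, ...], so each XOR reads adjacent bytes. It writes
    // the delta into dst (len(buf)/2 bytes) and copies new over previous so
    // the buffer is ready for the next frame.
    func xorInterleaved(buf, dst []byte) {
        for i := 0; i+1 < len(buf); i += 2 {
            dst[i/2] = buf[i] ^ buf[i+1] // delta between new and previous pixel
            buf[i+1] = buf[i]            // previous <- new for the next frame
        }
    }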
If you wanted to be super clever then you could probably SIMD this up doing something like [n, n, n, n, p, p, p, p]. I'm not sure how you'd turn that into RLE in SIMD. I'm not clever enough for that :D (but I'm sure someone has done it).
But as you said, complexity would definitely increase rather than decrease even though you could get much better compute time.
It also looks like half of the screen doesn't change (the part that is far away from the cursor), so compressing the XOR of the new and old image would improve compression a lot.
Yeah, that would also help with when the image is static with no interactions.
This is what I come to Hacker News for. Thank you!
What a beautiful beast: https://howfastisgo.dev/
Is this on a smart TV or a server feeding the results to a smart TV? Seems like quite an important difference.
you should probably read the article then, it's pretty clearly explained
I did and don’t understand why the title is “simulating particles on a smart TV” if that is not what it is doing.
Ok, maybe that was a little bit clickbaity, but the first sentence should clarify it:
> The challenge, simulate millions of particles in golang, multi-player enabled, cpu only, smart tv compatible.
Usually you wouldn't do that on the server, but if you want performance metrics, it's probably easier to measure on the server than on X clients?
"No client simulation allowed only server."
Seriously
Gaffer (Glenn Fiedler, mentioned in the article) would also say, and I quote, "if you use Euler, then you're a bloody idiot" :) This simulation is using Euler integration.
I suppose I am a bloody idiot.
This uses a simple delta time to smooth updates across frames rather than attempting something more formal. Based on the sister comment I think this is actually semi-implicit Euler, which still makes me an idiot.
E.g., velocity += acceleration * dt; position += velocity * dt;
Although, I add friction in a bad spot so maybe that is what you mean.
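For anyone following along, the whole difference between explicit and semi-implicit Euler is the order of those two updates; a minimal sketch (illustrative, not the post's actual code):

    // Particle holds 2D state. Step advances it with semi-implicit Euler:
    // velocity is updated first, and the position update then uses the *new*
    // velocity. Explicit Euler would use the old velocity instead.
    type Particle struct {
        X, Y, VX, VY float64
    }

    func (p *Particle) Step(ax, ay, dt float64) {
        p.VX += ax * dt
        p.VY += ay * dt
        p.X += p.VX * dt // updated velocity feeds the position: symplectic
        p.Y += p.VY * dt
    }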
If you're doing semi-implicit Euler that's pretty good. Per Gaffer's article, semi-implicit Euler has the advantage of being symplectic, i.e. it's an integration method that (approximately) conserves the total energy of the system you are simulating (energy conservation! physicists love it!).
the particle motion in your videos looks reasonably natural, there are no obvious signs of particles gaining energy from nothing (apart from when they are perturbed by the mouse cursor), so what you are doing is fine and there's no actual problem you need to solve.
if instead of simulating independent particles you were trying to accurately simulate cloth or a deformable body (e.g. a metallic object crashing into something), where each node in your cloth / body was a particle tethered to its neighbours, that might be a whole different can of worms and justify looking into RK4 or implicit Euler, where you need to solve a big linear system of equations each timestep. but you're not doing that, so no need to overcomplicate things!
Let's bear in mind that Australians call their best friends Good Cunts, and try to take it the best possible way :D I don't even disagree with him, it's just too easy to do better.
The friction/damping term you've added is absolutely necessary to counteract the systematic energy gain from Euler integration, and with better integration, you need less / no unphysical damping, leading to more of that delicious chaotic behaviour we're all looking for.
You can even cheese this with an infinite amount of computation if you wanted to do it braindead style (which is still super instructive!), by just repeating the step function with a scaled-down dt until the position components agree to within some tolerance.
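A braindead-style sketch of that refinement loop, assuming a step(state, dt) function and a tiny State struct (both made up for illustration):

    import "math"

    // State is an illustrative 1D particle state.
    type State struct{ X, V float64 }

    // refine re-runs the step with more and more substeps (doubling each time)
    // until two successive results land within tol of each other. Wasteful,
    // but a simple way to see how much error a single coarse step carries.
    func refine(s State, dt, tol float64, step func(State, float64) State) State {
        prev := step(s, dt)
        for n := 2; ; n *= 2 {
            next := s
            sub := dt / float64(n)
            for i := 0; i < n; i++ {
                next = step(next, sub)
            }
            if math.Abs(next.X-prev.X) < tol {
                return next
            }
            prev = next
        }
    }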
The rest is an incredibly deep rabbithole, which I've been enjoying for decades :D
https://gafferongames.com/post/integration_basics/
I realise this isn't the most thoughtful comment but I hope the intended spirit comes across when I say, sincerely: ha ha yay (clapping hands)
> On the topic of memory, with millions of particles the server barely breaks over 100mb
Although it's still experimental, the arena package would be a natural fit here.
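For reference, a minimal sketch of what the experimental API looks like (build with GOEXPERIMENT=arenas; the API may well change):

    // Experimental: arena.NewArena, arena.MakeSlice and Arena.Free as of the
    // current GOEXPERIMENT=arenas package; subject to change or removal.
    import "arena"

    func frame(n int) {
        a := arena.NewArena()
        defer a.Free() // releases every allocation from this arena at once

        xs := arena.MakeSlice[float64](a, n, n)
        vs := arena.MakeSlice[float64](a, n, n)
        for i := range xs {
            xs[i] += vs[i] // per-frame work without adding GC pressure
        }
    }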
The arena experiment is on indefinite hold:
> Note, 2023-01-17. This proposal is on hold indefinitely due to serious API concerns.
https://github.com/golang/go/issues/51317
Potential successor: https://github.com/golang/go/discussions/70257
>Note that the best-case scenario is the elimination of the overheads above to 0, which is at most ~10% in these particular benchmarks. Thus, it's helpful to consider the proportion of GC overhead eliminated relative to that 10% (so, 7% reduction means 70% GC overhead reduction).
Wow, amazing to see that off-heap allocation can be that good.
https://go.googlesource.com/proposal/+/refs/heads/master/des...
Meanwhile Java and .NET have had off-heap and arenas for a while now.
Which goes to show how Go could be much better if it had been designed with the learnings of others taken into account.
The adoption of runtime.KeepAlive() [0], and the related runtime.AddCleanup() as replacement for finalizers are also learnings from other languages [1].
[0] - https://learn.microsoft.com/en-us/dotnet/api/system.gc.keepa...
[1] - https://openjdk.org/jeps/421
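For a concrete feel of how the pair is meant to be used, a Unix-only sketch (Go 1.24+; the wrapper type and names are made up for illustration):

    import (
        "runtime"
        "syscall"
    )

    // rawFile wraps a file descriptor; runtime.AddCleanup (the finalizer
    // replacement) closes the fd once the wrapper becomes unreachable.
    type rawFile struct{ fd int }

    func openRaw(path string) (*rawFile, error) {
        fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
        if err != nil {
            return nil, err
        }
        f := &rawFile{fd: fd}
        // Unlike a finalizer, the cleanup func must not capture f itself,
        // only the argument passed alongside it.
        runtime.AddCleanup(f, func(fd int) { syscall.Close(fd) }, fd)
        return f, nil
    }

    func (f *rawFile) read(buf []byte) (int, error) {
        n, err := syscall.Read(f.fd, buf)
        // KeepAlive stops f (and thus its cleanup) from being collected
        // before the raw fd above has been used.
        runtime.KeepAlive(f)
        return n, err
    }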
What a coincidence! :)
Recently used MemorySegment in Java, it is extremely good. Just yesterday I implemented the Map and List interfaces using MemorySegment as the backing store for batch operations instead of using the OpenHFT stuff.
Tried -XX:TLABSize before but wasn't getting the desired performance.
Not sure about .NET though, haven't used it since last decade.