Any Fortran folks around?
Since old Fortran does not have function pointers, I used to wonder how one writes a gradient descent routine for a user-defined function in Fortran.
In other languages, one typically passes the user-defined function and its gradient to the routine as callbacks, so that the routine can call them at will.
It was a moment of great amusement and joy (grinning like a silly kid) when I learned how Fortran did it: through a pared-down form of a coroutine. Inversion of control.
The gradient descent routine would effectively suspend and 'return' with a flag indicating that it needs the user-defined function computed, or the gradient computed.
These are computed outside, and then the routine is called again with the freshly computed values, at which point it resumes from where it had yielded. Pretty neat.
The routine's return value indicated whether it was done, or whether it was suspended and expected to be called again to resume.
Now I don't remember whether the suspended state was stored transparently by the runtime or explicitly in the routine's arguments (passed by reference, I think), or whether the routine-local 'program counter' was managed explicitly or transparently by the compiler and runtime.
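This pattern is usually called reverse communication. Below is a minimal C++ sketch of it, assuming a 1-D objective; the names (Request, ReverseCommGD, resume, minimize_shifted_square) are hypothetical and not taken from any particular Fortran library. The routine's 'program counter' (pc) and the locals that must survive between calls are kept as explicit fields, which is the explicit option the comment wonders about; the caller evaluates f or f' at x whenever asked, then calls resume() again.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reverse-communication gradient descent for a 1-D function.
// The routine never calls f or f' itself; it asks the caller for them.
enum class Request { NeedValue, NeedGradient, Done };

struct ReverseCommGD {
    double x;          // current point: caller evaluates f(x) or f'(x) here
    double f = 0.0;    // caller stores f(x) here after NeedValue
    double g = 0.0;    // caller stores f'(x) here after NeedGradient
    double step = 0.1;
    double tol = 1e-8;
    int max_iter = 1000;

    // State that must survive between calls: an explicit "program counter"
    // plus the locals an ordinary loop would have kept on the stack.
    int pc = 0;
    int iter = 0;
    double x_old = 0.0, f_old = 0.0;

    explicit ReverseCommGD(double x0) : x(x0) {}

    Request resume() {
        switch (pc) {
        case 0:                          // first call: need f(x0)
            pc = 1;
            return Request::NeedValue;
        case 1:                          // f(x0) arrived: remember it, need f'(x0)
            f_old = f;
            pc = 2;
            return Request::NeedGradient;
        case 2:                          // f'(x) arrived: test convergence, then try a step
            if (std::fabs(g) < tol || iter >= max_iter) break;
            x_old = x;
            x = x_old - step * g;
            ++iter;
            pc = 3;
            return Request::NeedValue;
        case 3:                          // f at the trial point arrived
            if (f < f_old) {             // accept the step, ask for the new gradient
                f_old = f;
                pc = 2;
                return Request::NeedGradient;
            }
            step *= 0.5;                 // reject: halve the step and retry from x_old
            if (step < 1e-12) { x = x_old; break; }
            x = x_old - step * g;
            return Request::NeedValue;   // pc stays 3
        }
        pc = 4;                          // finished; further calls keep reporting Done
        return Request::Done;
    }
};

// The driver plays the role of the caller's loop around the Fortran routine.
double minimize_shifted_square(double x0) {
    ReverseCommGD opt(x0);
    for (Request r = opt.resume(); r != Request::Done; r = opt.resume()) {
        if (r == Request::NeedValue) opt.f = (opt.x - 3.0) * (opt.x - 3.0);
        else                         opt.g = 2.0 * (opt.x - 3.0);
    }
    return opt.x;
}
```

Because the whole state lives in an object the caller owns, the routine can be re-entered without any runtime support. Fortran codes of that era could get a similar effect with work arrays passed by reference or SAVE variables.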
That is a fascinating comparison! History really does rhyme.
What you described (suspending with a flag and resuming with new values) is surprisingly similar to how the C++20 compiler transforms coroutines into a state machine under the hood.
In tiny_coro, the promise_type essentially manages that "inversion of control" you mentioned, but hides the manual state management behind the co_await syntax. It's really cool to see that these fundamental patterns for "stopping and resuming" computation have existed for so long, just in different forms.
The coroutine idea is very old. I am not 100% sure, but they were heavily used in telecom applications.
It's easier in C and old C++ if one can drop down to assembly to switch between different call stacks.
Among modern languages there is Lua. Take a look at Felix too.
https://felix-tutorial.readthedocs.io/en/latest/tut110.html
https://felix.readthedocs.io/en/latest/fibres.html
You are absolutely right about the history (Conway coined the term in 1958 and published it in '63!).
The assembly stack-switching you mentioned is the classic "stackful" approach (like Fibers or Boost.Context). C++20 went the "stackless" route (compiler-generated state machines) to avoid the per-stack memory overhead, though it certainly makes the implementation mechanics trickier.
Lua is definitely a gold standard for asymmetric coroutines.
As for Felix, I actually wasn't familiar with it until you mentioned it. Thanks for the pointer! I'll definitely dig into their docs to see how they handle scheduling compared to the C++20 approach.
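For contrast with the stackless C++20 model, here is a hedged sketch of the stackful approach using the ucontext functions (getcontext, makecontext, swapcontext) instead of hand-written assembly. These were marked obsolescent and later dropped from POSIX, but glibc on Linux still provides them; the sketch assumes such a platform. The names fiber_body and run_fiber are hypothetical. The fiber gets its own 64 KiB stack, and each switch saves and restores a full register set including the stack pointer, so the fiber could yield from any call depth.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

namespace {
ucontext_t main_ctx, fiber_ctx;
std::vector<int> trace;          // records the interleaving of main and fiber
int shared_counter = 0;

// Runs on its own stack; each swapcontext suspends it mid-loop.
void fiber_body() {
    for (int i = 0; i < 3; ++i) {
        trace.push_back(++shared_counter);
        swapcontext(&fiber_ctx, &main_ctx);   // yield back to run_fiber
    }
}
}  // namespace

std::vector<int> run_fiber() {
    alignas(16) static char stack[64 * 1024];   // separate call stack for the fiber
    trace.clear();
    shared_counter = 0;
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = sizeof stack;
    fiber_ctx.uc_link = &main_ctx;              // where control goes if fiber_body returns
    makecontext(&fiber_ctx, fiber_body, 0);
    for (int i = 0; i < 3; ++i) {
        swapcontext(&main_ctx, &fiber_ctx);     // resume the fiber until it yields
        trace.push_back(100 + i);               // back on the original stack
    }
    return trace;                               // fiber is left suspended; its stack is static
}
```

The price is the separate stack per fiber, which is the per-stack memory overhead the stackless design avoids.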
BTW, what happened to your baby Wireshark?
Also, do you use ChatGPT to frame your responses? Nothing wrong if you do, but people might notice the tone and wonder.
Haha, you have a really sharp eye!
1. The Baby Wireshark: Unfortunately, my previous GitHub account got flagged/shadowbanned (still not sure why, and the appeal didn't work), so I lost access to that repo. I had to start fresh with this new account.
2. The Tone: You caught me! English is not my first language. I do use LLMs to help polish my grammar because I want to ensure my technical points are clear and I don't accidentally sound rude due to language barriers.
Also, I learned most of my English from reading technical documentation and watching TED talks, so my "default" style tends to be a bit formal, just like an AI! :)
> Unfortunately, my previous GitHub account got flagged/shadowbanned
Oh! That sucks big time. Hope you had a local repo of everything.
BTW I don't think your tone is formal, just suspiciously congratulatory and polite :)
Do check out Felix when you get time. It has done a lot with coroutines. You were asking about scheduling; it lets you create additional schedulers within the language.
Haha, point taken! I will try to dial down the "customer service mode" :)
Re: GitHub: Yeah, thankfully I had the local git history, so the code was safe. Just lost the community interactions (stars/issues), which was a bummer.
Re: Felix: That sounds exactly like what I need. Being able to define custom schedulers within the language is pretty much the holy grail I'm chasing here. I'll put it on my reading list for tonight.
Also look at Protothreads :)
Oh absolutely! Adam Dunkels' work is legendary. It's kind of poetic that Protothreads achieved stackless concurrency via C preprocessor macros and Duff's device tricks, and now years later, C++20 finally baked that exact same "state machine via switch-case" logic directly into the compiler. Same spirit, just less macro magic! :)
Simon Tatham and Tom Duff too.
https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
The legendary "Coroutines in C" article!
Thanks for the reference—I learned a lot from his work. His use of macros to hide the switch statement is basically the spiritual ancestor of co_await. It is wild to think that C++20 essentially standardized that exact pattern, just moving the "state machine generation" from the preprocessor hacks to the compiler itself.
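The switch trick discussed here fits in a few lines. This is a hedged C++ rendering in the spirit of Tatham's article and Protothreads, not their actual macros: CO_BEGIN, CO_YIELD, CO_END, Counter and next_value are hypothetical names, and the state lives in a struct rather than a static variable so several instances can run independently. CO_YIELD records __LINE__ and returns; on the next call the switch jumps straight back to the matching case label inside the loop, which is the Duff's-device part. Nothing survives on the C stack across a yield, so every variable that must persist (here i) has to be a struct member, which is the bookkeeping C++20 now does automatically in the coroutine frame.

```cpp
#include <cassert>
#include <vector>

// Switch-based stackless coroutine macros (hypothetical names).
#define CO_BEGIN(st)    switch ((st)->line) { case 0:
#define CO_YIELD(st, v) do { (st)->line = __LINE__; return (v); case __LINE__:; } while (0)
#define CO_END(st)      } (st)->line = -1; return -1

// All "locals" that must survive a yield live here, not on the C stack.
struct Counter { int line = 0; int i = 0; };

// Yields 0, 1, 2 on successive calls, then -1 forever once finished.
int next_value(Counter* c) {
    CO_BEGIN(c);
    for (c->i = 0; c->i < 3; ++c->i) {
        CO_YIELD(c, c->i);
    }
    CO_END(c);
}
```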
Fortran has always had function pointers, though they were called something different and could only be used in limited contexts. But look at any numerical integration or root-finding library: you just pass the function name in as a parameter to another function.
Fortran has had dummy procedures (subprogram arguments that can be associated with subroutines or functions) since Fortran II.
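Dummy procedures give Fortran the callback style the first comment describes for other languages. A hedged C++ equivalent, with a plain function pointer standing in for the dummy procedure (gradient_descent and shifted_square_grad are hypothetical names): control stays inside the minimizer, which calls the user's gradient whenever it needs it.

```cpp
#include <cassert>
#include <cmath>

// The callback parameter plays the role of a Fortran dummy procedure.
using GradFn = double (*)(double);

double gradient_descent(GradFn grad, double x, double step, double tol, int max_iter) {
    for (int i = 0; i < max_iter; ++i) {
        double g = grad(x);            // the routine calls back into user code
        if (std::fabs(g) < tol) break;
        x -= step * g;
    }
    return x;
}

// Gradient of the user's objective (x - 3)^2.
double shifted_square_grad(double x) { return 2.0 * (x - 3.0); }
```

Here the minimizer drives and the user function is called; in reverse communication the roles flip and the caller's loop drives.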
I see. I stand corrected.
This looks cool, but it says it's simpler than Seastar. Last time I looked, Seastar was fairly simple, but it didn't use coroutines at all. Instead it used a futures/promises scheme. It was written in C++14, back when C++20 coroutines didn't exist yet.
C++20 coroutines have seemed very complex to me and I haven't sat down to try to understand them. I guess tiny_coro could help for that. I do remember that they were stackless.
I'm not that keen on coroutine and async approaches to concurrency these days anyway. I'd rather use something with actual multitasking, either POSIX threads or lightweight processes like Erlang's or Go's.
Also, C++ itself is apparently in the process of sinking into a bog. Rust is taking over, which is not entirely good news. I had wanted to spend more time on Ada.
You hit the nail on the head. C++20 coroutines are indeed complex and the barrier to entry is high.
However, that complexity actually forced me to start from first principles. It drove me to tackle the essential problems from the ground up, which gave me a much deeper understanding of how coroutines truly work.
That is exactly why I built this project: I wanted to create a minimal "laboratory" to dissect stackless coroutines without the overhead of a massive framework like Seastar.
Regarding your point on Erlang/Go: that's actually the goal of this scheduler! It implements the M:N threading model (work-stealing) to simulate that kind of "lightweight process" concurrency, while giving you manual control over the mechanics.
Hope this helps you finally wrap your head around co_await!
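For readers who want the work-stealing idea on its own, here is a hedged sketch. It is not tiny_coro's scheduler: StealingPool is a hypothetical class, jobs are plain std::function<void()> rather than coroutine handles, and each worker's deque is guarded by a mutex, whereas production schedulers usually use lock-free deques (for example the Chase-Lev design). Each of the N OS threads pops from its own end of its own deque; when that is empty it steals from the opposite end of another worker's deque, which is what spreads many small tasks across a few threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical mutex-based work-stealing pool (N worker threads, one deque each).
class StealingPool {
public:
    explicit StealingPool(std::size_t n) : queues_(n) {}

    void submit(std::size_t worker, std::function<void()> job) {
        std::lock_guard<std::mutex> lk(queues_[worker].m);
        queues_[worker].jobs.push_back(std::move(job));
        pending_.fetch_add(1);
    }

    // Runs every submitted job to completion, then joins the workers.
    void run() {
        std::vector<std::thread> threads;
        for (std::size_t i = 0; i < queues_.size(); ++i)
            threads.emplace_back([this, i] { worker_loop(i); });
        for (auto& t : threads) t.join();
    }

private:
    struct Queue { std::mutex m; std::deque<std::function<void()>> jobs; };
    std::vector<Queue> queues_;
    std::atomic<int> pending_{0};   // jobs submitted but not yet finished

    std::optional<std::function<void()>> take(std::size_t i, bool steal) {
        std::lock_guard<std::mutex> lk(queues_[i].m);
        auto& q = queues_[i].jobs;
        if (q.empty()) return std::nullopt;
        std::function<void()> job;
        if (steal) { job = std::move(q.front()); q.pop_front(); }  // thieves take the oldest job
        else       { job = std::move(q.back());  q.pop_back();  }  // owner works LIFO on its own end
        return job;
    }

    void worker_loop(std::size_t self) {
        while (pending_.load() > 0) {
            auto job = take(self, false);
            for (std::size_t k = 1; !job && k < queues_.size(); ++k)
                job = take((self + k) % queues_.size(), true);     // try to steal
            if (job) { (*job)(); pending_.fetch_sub(1); }
            else std::this_thread::yield();
        }
    }
};
```

In a coroutine scheduler the job would be a std::coroutine_handle<> to resume, and a suspended coroutine would be re-queued by whatever event wakes it.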
I think there is some hope for a sane wrapper around C++20 coroutines that will make them easier to use. I saw a tutorial a while back mentioning that such a wrapper might eventually become part of the C++ standard. I once tried to use Boost coroutines, but it was too much of a headache and I switched to a different approach.
Erlang's processes and Goroutines are stackful unlike C++ coroutines. Erlang also forbids observable data sharing between processes which avoids a lot of pitfalls. I don't think that can be enforced in C++ or Go.
GHC lightweight threads and its STM library (software transactional memory) could be another thing to look at. I wonder if a useful STM feature is feasible for tiny_coro.
You are spot on about the distinction. C++ chose the stackless path for "zero-overhead" efficiency, but it definitely shifts the burden of safety and usability onto the library developers.
Regarding safety and data sharing: you're right, C++ won't enforce isolation like Erlang. That's the trade-off we make for performance.
However, to address your point about "sane wrappers" and concurrency models: I actually implemented Go-style Channels (CSP) on top of this scheduler to bridge that gap.
It uses co_await to mimic Go's channel behavior, supporting Direct Handoff (skipping the buffer/queue if a receiver is waiting) and an optimization in which await_suspend returns false to avoid context switches on the fast path.
While a full-blown STM might be too heavy, I found that combining these Channels with Epoch-Based Reclamation (EBR) gives a pretty robust safety net without the overhead of heavy locking.
If you're interested, the Channel implementation is here: https://github.com/lixiasky-back/tiny_coro-build_your_own_MN...
Does anyone know where to find the mailing list archives of Stackless Python?
I want to track how its design evolved over time.
Sincere apologies for the hijack. Hopefully it too will be of interest to those who are interested in coroutines.