What they're describing isn't time travel debugging - https://en.wikipedia.org/wiki/Time_travel_debugging.
I'd call this logging calls to your business logic layer, then replaying the logged calls against the business logic layer in a development environment to debug the problem.
Your business logic layer should be separate from your UI/presentation layer.
If they're not tightly coupled, it's easy to test them separately. It also makes it easier to reuse your business logic layer in a different UI environment, or to switch to another UI entirely.
Interesting approach. I often reach the same goal with the replicated state machine pattern, where all inputs to a system are recorded. Both methods seem to rely on designing your application in a very specific way so that replaying the inputs deterministically produces the same outputs.
In some senses, that is true of every scheme; you need to ensure you capture all non-determinism and that can be done with either more capturing or less non-determinism.
However, the restrictions for generic replay-based time-travel debugging are mostly just not using shared memory and, as a corollary, not using multiple threads in a process (multiple processes are okay). Deliberately architecting your system in the way described in the article is largely unnecessary: the overhead of these generic schemes is low, they are much less work, they apply to most codebases that could even attempt a deliberate re-architecture, and they integrate well with existing tooling and visualizers.
You can even relax these restrictions further to allow explicit shared memory if you record those accesses, and you can handle everything if you record all memory accesses. The overhead of each of these schemes increases as the amount of recording needed to capture these forms of non-determinism grows.
Another approach is to record at a lower level and then reconstruct the series of events, e.g. https://engineering.fb.com/2021/04/27/developer-tools/revers...
How do you structure your program to do this?
I had huge success writing a trading system where everything went through the same `on_event(Inputs) -> Outputs` function of the core, and a thin shell translated everything into inputs and the outputs into actions. I actually had a handful of these components communicating via message passing.
This worked rather well as most of the input is async messages anyway, but building anything else this way feels very tiresome.
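A minimal sketch of this core-and-shell split, with hypothetical names (the actual `Inputs`/`Outputs` types in a real trading system would be far richer):

```java
import java.util.List;

// Pure, deterministic core: all state transitions happen here.
// Replaying the same sequence of inputs reproduces the same outputs.
final class Core {
    private long seq = 0; // example internal state

    // The single entry point: input in, outputs out, no side effects.
    List<String> onEvent(String input) {
        seq++;
        return List.of("ack:" + input + ":" + seq);
    }
}

// Thin shell: translates the outside world into inputs, and the
// core's outputs back into actions (sends, writes, timers, ...).
final class Shell {
    public static void main(String[] args) {
        Core core = new Core();
        for (String msg : new String[] {"order", "cancel"}) {
            for (String action : core.onEvent(msg)) {
                System.out.println(action); // would dispatch to a real sink in production
            }
        }
    }
}
```

Because the core never touches I/O or clocks directly, feeding it the same input sequence twice yields identical outputs, which is what makes replay debugging possible.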
I also worked on trading systems.
Usually two methods `onMessage(long timeNow, byte[] buf)` and `onTimer(long timeNow, int timerId)`.
All output sinks and a scheduler need to be passed in on construction of the application.
Then you can record all inputs to a file. Outputs don't need to be recorded, because they can be regenerated by replaying the inputs, but recording them anyway makes analysis easier when bugs happen.
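One way to sketch this input-journaling idea (all names here are illustrative, not from the original system; a real journal would be an append-only file with a binary format, not an in-memory list of strings):

```java
import java.util.ArrayList;
import java.util.List;

// The application sees the world only through these two callbacks,
// with time passed in explicitly so replay is deterministic.
interface App {
    void onMessage(long timeNow, byte[] buf);
    void onTimer(long timeNow, int timerId);
}

// Wrapper that journals every input before forwarding it to the app.
final class RecordingApp implements App {
    final App inner;
    final List<String> journal = new ArrayList<>();

    RecordingApp(App inner) { this.inner = inner; }

    public void onMessage(long timeNow, byte[] buf) {
        journal.add("msg," + timeNow + "," + new String(buf));
        inner.onMessage(timeNow, buf);
    }

    public void onTimer(long timeNow, int timerId) {
        journal.add("timer," + timeNow + "," + timerId);
        inner.onTimer(timeNow, timerId);
    }

    // Replay: feed a recorded journal back into a fresh instance of the app.
    static void replay(List<String> journal, App app) {
        for (String line : journal) {
            String[] f = line.split(",", 3);
            long t = Long.parseLong(f[1]);
            if (f[0].equals("msg")) app.onMessage(t, f[2].getBytes());
            else app.onTimer(t, Integer.parseInt(f[2]));
        }
    }
}
```

Since the scheduler and all output sinks are injected at construction, replaying the journal against a fresh app in a development environment reproduces the exact production run.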
I have even worked on systems with tools that you could paste recorded inputs and outputs into, and they would generate the source code for a unit test. Super useful for reproducing issues quickly.
But you are spot on in that there is an overhead. For example, if you want to open a TCP socket and then read and write to it, you need to create a separate service and serialise all the inputs and outputs in a way that can be recorded and replayed.
Thanks a lot.
How easily one can debug such a system far outweighs the costs, if you have a low number of separate services.
OP is basically describing functional programming
Interesting. I built a sort-of-similar system for executing a series of linked, serially-dependent system commands from the TUI used to manage some of our secure appliances. It made writing and debugging such sequences much easier: each command was represented by a struct containing optional pre, post-success, and post-fail log and status line messages, where status could simply default to log.
It meant that the user received meaningful short updates as things progressed, with detailed information in system logs.
This made it much easier for testers and users to report bugs and for developers to understand what to look for in logs.
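A hedged sketch of what such a command step might look like (field and class names are my guesses, not the actual appliance code; the key idea from the comment is the optional pre, post-success, and post-fail messages, with the status line defaulting to the log message):

```java
import java.util.function.Supplier;

// One step in a serially dependent command sequence. The short status-line
// text is optional; a null status falls back to the detailed log message.
final class Step {
    final String preLog, okLog, failLog;   // detailed messages for the system log
    final String preStatus;                // short status-line text (null -> use preLog)
    final Supplier<Boolean> run;           // the actual system command

    Step(String preLog, String preStatus, String okLog, String failLog,
         Supplier<Boolean> run) {
        this.preLog = preLog;
        this.preStatus = preStatus;
        this.okLog = okLog;
        this.failLog = failLog;
        this.run = run;
    }

    String statusLine() { return preStatus != null ? preStatus : preLog; }
}

final class Runner {
    // Runs steps in order, stopping at the first failure.
    static boolean runAll(Step... steps) {
        for (Step s : steps) {
            System.out.println("[status] " + s.statusLine()); // short user-facing update
            System.out.println("[log] " + s.preLog);          // detailed system log
            boolean ok = s.run.get();
            System.out.println("[log] " + (ok ? s.okLog : s.failLog));
            if (!ok) return false;
        }
        return true;
    }
}
```

The split between the status line and the log messages is what gives users meaningful short updates while testers and developers get the detail they need in the logs.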