> This server integrates with desplega.ai
This is cool! No shade at all to desplega.ai, but I would love a version of this that runs locally and does stuff like verifying no tests are flaky. I do this with a few extra steps via Claude Code + Playwright tests. E2E tests are the best way I know of for catching UI regressions, but they're expensive and annoying to run, so something that looked at a PR and healed / wrote tests in the background as I work on features would be pretty cool.
Why local? Basically I'm just cost sensitive for my own projects and already have this nasty MacBook that only gets like 20% utilization.
One of the things we used is this algorithm with retries from Meta: https://engineering.fb.com/2020/12/10/developer-tools/probab...
If your challenge is flakiness, this should help initially. Unfortunately, there's a lot of work in our engine and a custom system for handling operations that goes beyond vanilla Playwright, so running it locally would be quite challenging.
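(For the flakiness angle, the basic retry intuition behind that kind of algorithm can be sketched in a few lines; this is a simplified illustration, not Meta's actual probabilistic model, and running a single spec via `npx playwright test` is just one possible invocation.)

```typescript
// Simplified sketch: re-run one Playwright spec N times and flag it as
// flaky if it both passes and fails across runs. Meta's approach models a
// per-test failure probability; this only captures the retry intuition.
import { execFileSync } from "node:child_process";

function isFlaky(specFile: string, runs = 10): boolean {
  const outcomes = new Set<boolean>();
  for (let i = 0; i < runs; i++) {
    try {
      execFileSync("npx", ["playwright", "test", specFile], { stdio: "ignore" });
      outcomes.add(true); // run passed
    } catch {
      outcomes.add(false); // non-zero exit => run failed
    }
    if (outcomes.size === 2) return true; // mixed results => flaky
  }
  return false; // consistent outcome across all runs
}
```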
At least in my industry (highly regulated), I think it would be better if these agentic E2E tools output Playwright code instead of keeping it all under the hood, as no risk-averse regulated company will use a QA agent that could be nondeterministic when re-running the same test.
As I mentioned above, Playwright alone won't make the cut for many of the serious test cases we've seen; you need a whole system that ensures your tests are run and improved immediately. We created this project in a way that supports on-premise deployments, but you'll need to run the whole engine and eventually use some SLMs/LLMs at different stages.
At the end of the day, is the LLM not just calling Playwright APIs? I'd rather have access to the final set of Playwright API steps that the LLM executed to accomplish a goal than just hope the LLM will choose the same actions again the second time I run it.
We use PW for the interaction with the browser, but what to do is really represented in a custom format (which could be executed in other frameworks too). So the PW we could generate would be a subset, where the more interesting parts (custom functions) aren't really implemented in PW.
Also, part of our format is specifically about finding deterministic ways of running steps, with automatic healing when they fail. And we built the whole system to be self-hostable, so in the cases you mention you would have control over what is run and where.
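(Purely as an illustration of the idea, not desplega.ai's actual format: a step representation with fallback selectors, executed through Playwright, could look roughly like this.)

```typescript
// Hypothetical step format: each interaction carries a primary selector
// plus fallbacks, so a run can "heal" by trying the next candidate when
// the first one no longer matches. All names here are made up.
import { chromium, type Page } from "playwright";

type Step =
  | { action: "goto"; url: string }
  | { action: "click" | "fill"; selectors: string[]; value?: string };

async function runStep(page: Page, step: Step): Promise<void> {
  if (step.action === "goto") {
    await page.goto(step.url);
    return;
  }
  for (const selector of step.selectors) {
    try {
      const locator = page.locator(selector);
      if (step.action === "click") await locator.click({ timeout: 5_000 });
      else await locator.fill(step.value ?? "", { timeout: 5_000 });
      return; // this selector worked; step is done
    } catch {
      // selector failed; fall through to the next candidate ("healing")
    }
  }
  throw new Error(`All selectors failed for ${step.action}`);
}

async function runSteps(steps: Step[]): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  try {
    for (const step of steps) await runStep(page, step);
  } finally {
    await browser.close();
  }
}
```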
Hey there, I've been building an MCP to help discover, automate and run E2Es automatically, connected to your Cursor / Claude / Codex / etc.
Funnily enough, one of the most challenging things while building it was being able to remotely control the browser that runs locally (I've been using https://localtunnel.me/ for it) while making sure it doesn't impact the user too much.
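(A simplified sketch of that kind of setup, assuming the tunnel forwards WebSocket traffic; the port and path here are made up.)

```typescript
// Expose a locally launched Playwright browser server through localtunnel
// so a remote agent can drive the local browser. Port/path are illustrative.
import { chromium } from "playwright";
import localtunnel from "localtunnel";

async function shareLocalBrowser() {
  // Browser server listening on a fixed local port and WebSocket path.
  const server = await chromium.launchServer({ port: 9223, wsPath: "agent" });
  console.log("local endpoint:", server.wsEndpoint()); // ws://127.0.0.1:9223/agent

  // Tunnel the port to a public URL (e.g. https://<subdomain>.loca.lt).
  const tunnel = await localtunnel({ port: 9223 });
  const remoteWs = tunnel.url.replace(/^https/, "wss") + "/agent";
  console.log("remote endpoint:", remoteWs);

  // The remote side can now attach with: await chromium.connect(remoteWs)
  return { server, tunnel };
}
```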
Also, I wondered whether anyone is shipping CLIs with an "mcp" command, as it seems that having a normal CLI for the functionality of the MCP would make a lot of sense, with the option of running it as an MCP server if the user wants to.
Have you seen this pattern?
Also, since there's a lot of buzzwording around MCPs, have any of you been using an MCP as a daily driver? For me it was the GitHub one, especially for code search and stuff like that.
Totally! Actually, self-hosting the localtunnel was key to improving latency, and it was easy enough to do.
Also, about the CLI thing you mention: we had cases where people didn't use MCP clients, so I actually wanted to expose a way to interact with the tools directly without needing the server running!
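(The pattern in question could look roughly like this, using the official TypeScript MCP SDK: `e2e run <id>` calls the tool directly, while `e2e mcp` serves the same tool over stdio. The names and `runTest` are placeholders, and the SDK surface may differ between versions.)

```typescript
// "CLI with an `mcp` subcommand" sketch: the same functionality is reachable
// as a plain command or as an MCP tool over stdio. All names are made up.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

async function runTest(testId: string): Promise<string> {
  return `ran ${testId}`; // placeholder for the real test runner
}

async function main() {
  const [command, arg] = process.argv.slice(2);
  if (command === "mcp") {
    // MCP mode: expose the runner as a tool for Cursor / Claude / Codex.
    const server = new McpServer({ name: "e2e-cli", version: "0.1.0" });
    server.tool("run_test", { testId: z.string() }, async ({ testId }) => ({
      content: [{ type: "text", text: await runTest(testId) }],
    }));
    await server.connect(new StdioServerTransport());
  } else if (command === "run" && arg) {
    // Plain CLI mode: no MCP client needed.
    console.log(await runTest(arg));
  } else {
    console.error("usage: e2e run <testId> | e2e mcp");
  }
}

main();
```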
Claude Code comes to mind: `claude mcp serve`