I like everything about this idea but "one-shot". I work _with_ an LLM, but as Wozniak said, "Never trust a computer you can't throw out a window".
here's my [similar take](https://github.com/testeranto-dev/testeranto)
It's interesting that you're using Linear tickets as the primary context source. In my experience so far, one of the biggest issues with coding agents is context drift: the ticket says one thing, but the codebase has changed since it was written. How did you solve that? A fresh RAG pass, something like ctags to map the repo before implementation starts, or does it rely entirely on the LLM's provided context window?
We don't have this kind of tooling built up, but we've been using Linear as a source of truth alongside a human-written prompt that adds context. It's worked really well for both feature development and bug hunting, and it helps keep us honest with tickets (good LLM context is still good human context) as well as maintain some level of consistency between LLM passes on the same issue.
100% on using Linear as the source of truth! We went a bit further and just use Linear as the prompt, so there isn't a separate human-written prompt living elsewhere.
This has been great for me too. Every feature starts and completes with /gh-issue <issue number>.
Every issue is created with /spec and a conversation with a human. Once the spec is materialized as an issue it’s sufficient for an agent to implement.
Everything is documented. It’s amazing.
We don’t believe PM or eng can write the best prompt or spec, so we don’t ask them to.
One real Linear ticket from a few months back that we assigned to broccoli:
Store post-processing run outcomes in a versioned, append-only audit trail so re-running the same processor on the same audio file produces a complete history (who/when/what changed), while keeping an easy “latest result” view. Add an admin-only UI.
That’s it. As a part of the sketch step, broccoli does its own repo discovery and online research before planning the execution.
Built similar for internal use at our work. Slack+JIRA though, not Linear. Otherwise GCP-native like this.
I didn't want to be on the hook for supporting an open source version though, so never made it public. Good on you for putting it out there.
A few differences I can quickly spot, fwiw...
I went with Firestore over Postgres for the lower cost, and use Cloud Tasks for "free" deduping of webhooks. Each webhook is validated, translated, and created as an instant Cloud Task; they get deduped by ID.
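The "free" dedup described above hinges on Cloud Tasks rejecting a second task created with the same name. A minimal sketch of the idea, assuming the webhook carries a unique delivery ID (e.g. GitHub's `X-GitHub-Delivery` header); the in-memory queue below just mimics the real service's duplicate-name rejection:

```python
import hashlib

def task_name_for(delivery_id: str) -> str:
    # Task names must be stable and within Cloud Tasks' character rules,
    # so hash the delivery ID rather than embedding it raw.
    return "webhook-" + hashlib.sha256(delivery_id.encode()).hexdigest()[:32]

class FakeTaskQueue:
    """Stand-in for Cloud Tasks: create_task fails on duplicate names."""
    def __init__(self):
        self._seen = set()

    def create_task(self, name: str, payload: dict) -> bool:
        if name in self._seen:
            return False  # the real service would raise ALREADY_EXISTS here
        self._seen.add(name)
        return True

queue = FakeTaskQueue()
first = queue.create_task(task_name_for("delivery-123"), {"event": "push"})
dup = queue.create_task(task_name_for("delivery-123"), {"event": "push"})
print(first, dup)  # True False
```

A retried webhook delivery hashes to the same task name, so the second `create_task` is a no-op and the handler stays idempotent.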
We see a lot of value in a scheduler. So running a prompt on a schedule - good for things like status reports, or auto log reading/debug.
I prefer to put my PEMs into KMS instead of Secret Manager. You can still sign things, but without exposing the actual private key where it can be snooped on.
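For context on why this works: KMS's asymmetric-sign RPC accepts only a digest, so the PEM never leaves the key service. A rough sketch of signing a GitHub App JWT this way, with `kms_asymmetric_sign` as a stub standing in for the real RPC (the real call is roughly `client.asymmetric_sign(request={"name": key_version, "digest": {"sha256": digest}})`):

```python
import base64
import hashlib
import json

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def kms_asymmetric_sign(digest: bytes) -> bytes:
    # Stubbed so the sketch is runnable; in production this is the KMS
    # AsymmetricSign RPC against an RSA signing key version.
    return b"signature-over-" + digest[:8]

def sign_jwt(claims: dict) -> bytes:
    header = b64url(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    digest = hashlib.sha256(signing_input).digest()  # only this leaves the box
    return signing_input + b"." + b64url(kms_asymmetric_sign(digest))

token = sign_jwt({"iss": "app-id", "iat": 0, "exp": 600})
print(token.count(b".") == 2)  # True
```

Compared to pulling the PEM out of Secret Manager at runtime, a compromised job can at worst request signatures while it runs; it can never exfiltrate the key itself.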
I run the actual jobs on spot VMs using an image baked by Packer with all the tooling needed. Don't you run into time/resource limits running them as Cloud Run jobs?
Haha we are definitely like-minded because our internal Broccoli is actually on Firestore. That being said, Firestore is an acquired taste so we rewrote the OSS backend to Postgres so that everything can be deployed in one go with the infra that people are most familiar with.
Re: spot VMs. Great idea! There are two features we have not finished porting to OSS. Internally, we can specify the instance type and timeout, and we also send about 50% of jobs to Blaxel; we find it has a much better cold start compared to Cloud Run. We probably will port the multi-vendor support logic over to OSS soon but wanted to keep v1 simple (and a one-provider magic experience!).
Scheduler is a wish-list item for us. Curious how you implemented it? Currently, we just have a scheduled Cloud Function that runs during the night to automatically address open PR comments (via the Broccoli GitHub feedback automation), so the engineer wakes up to a mostly clean PR without needing to do anything. We haven't ported this to the OSS version yet because 1) Firebase Cloud Functions, 2) we're not sure what the best ergonomics would be. Any suggestions here?
Ours currently runs with Cloud Tasks, which involves some cleanup handling if one run fails to enqueue the next.
Originally I had Cloud Scheduler running a heartbeat task every X minutes, and one of the heartbeat's duties was to look for any overdue scheduled tasks and fire them off. The timing wasn't very precise, but it was a very simple setup.
I made the move to Cloud Tasks so I could heartbeat less often. Now the cleanup happens in the heartbeat: ensure all scheduled tasks have a matching Cloud Task pending.
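The original heartbeat approach is simple enough to sketch in a few lines. All names here (`Schedule`, `heartbeat`) are illustrative, not from either codebase; the catch-up loop shows why timing is imprecise but the setup stays trivial:

```python
class Schedule:
    """One recurring job: a name, its next due time, and its interval."""
    def __init__(self, name, next_run_at, interval_s):
        self.name = name
        self.next_run_at = next_run_at
        self.interval_s = interval_s

def heartbeat(schedules, now, fire):
    """Fire any overdue schedules and advance their next-run time."""
    for s in schedules:
        if s.next_run_at <= now:
            fire(s.name)
            # Advance past 'now' so a long outage doesn't cause a burst
            # of catch-up runs on the next heartbeat.
            while s.next_run_at <= now:
                s.next_run_at += s.interval_s

fired = []
schedules = [Schedule("status-report", next_run_at=100, interval_s=3600),
             Schedule("log-triage", next_run_at=9999, interval_s=3600)]
heartbeat(schedules, now=150, fire=fired.append)
print(fired)  # ['status-report']
```

Jobs fire within one heartbeat period of their due time, which is fine for status reports or nightly log triage; the Cloud Tasks variant trades this simplicity for precision.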
Feedback on PRs was an interesting challenge, since it can come from Slack replies, GitHub comments, or CI failures, and we want to be fairly reactive. I ended up leaning on Firestore realtime queries: the harness on the agent VM is subscribed and can interrupt the agentic loop to feed in new feedback as it comes in. It all gets very complicated to open source, but it has helped get quicker feedback loops going.
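The interrupt pattern described above can be sketched without Firestore: the subscription just feeds a queue, and the agent loop drains it between steps so new feedback lands in the next iteration. A minimal illustration (all names are mine, not the project's):

```python
import queue

# Feedback events (Slack replies, GitHub comments, CI failures) arrive
# here; in the real setup a Firestore realtime listener does the putting.
feedback = queue.Queue()

def agent_loop(steps):
    transcript = []
    for step in steps:
        # Drain any feedback that arrived while the last step ran
        while True:
            try:
                note = feedback.get_nowait()
            except queue.Empty:
                break
            transcript.append(f"feedback: {note}")
        transcript.append(f"step: {step}")
    return transcript

# Simulate a CI failure landing before the run starts
feedback.put("CI failed on lint")
out = agent_loop(["plan", "implement"])
print(out)  # ['feedback: CI failed on lint', 'step: plan', 'step: implement']
```

The key property is that feedback never preempts a step mid-flight; it's injected at the next safe boundary, which keeps the loop deterministic while still being reactive.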
> laptops left open just so tasks could keep running
Too real. We’re currently still sticking to local agent workflows which feel more powerful than cloud native ones. Moving that to your own cloud with no third-party control plane feels like the right middle ground. Nice work
EDIT: the adversarial two-agent review loop is really clever!
Thank you! We found this approach for both plan critique and code review to be extremely effective
One persistent issue I keep having is preview environments for this kind of stuff. I have the full setup, migrations, database seeding, etc. But having it run off a PR is still kind of a mess with spinning up 2 services, databases, redis etc. Do you guys run into this problem?
We use firebase which supports the preview environments. It's mainly for front end changes though. Are you looking for a solution for backend changes as well?
Fair play for launching this, it looks like a neat project.
However, I feel it will be an uphill battle competing with OpenAI and Anthropic; I doubt your harness can be better, since they see so much traffic through theirs.
So this is for those who care about the harness running on their own infra? Not sure why anyone would, since the LLM call means you're sending your code to the lab anyway.
Sorry I don’t want to sound negative, I am just trying to understand the market for this.
Good luck!
We are not trying to compete with OpenAI and Anthropic! We open-sourced it because there's interest from other startups.
Teams would use Anthropic and OpenAI, but they shouldn't just use Anthropic or OpenAI. We see much better results from calling the models independently and doing adversarial review and response.
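A rough sketch of what an adversarial review-and-response loop can look like, with stub functions standing in for independent model invocations. The convergence rule (reviewer returns no findings) and the round cap are my illustrative choices, not the project's actual logic:

```python
def adversarial_loop(author, reviewer, task, max_rounds=3):
    """Alternate between an authoring model and an independent reviewer
    until the reviewer has no findings or the round budget runs out."""
    draft = author(task, feedback=None)
    for _ in range(max_rounds):
        findings = reviewer(task, draft)
        if not findings:
            break  # reviewer signed off
        draft = author(task, feedback=findings)
    return draft

# Stub models: the reviewer objects once, the author fixes on feedback.
def author(task, feedback):
    return "v2-fixed" if feedback else "v1-draft"

def reviewer(task, draft):
    return [] if draft == "v2-fixed" else ["missing error handling"]

result = adversarial_loop(author, reviewer, "add audit trail")
print(result)  # v2-fixed
```

The point of using two independently prompted models is that each catches failure modes the other is blind to, which a single model reviewing its own output tends not to.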
This doesn't replace your need for the models, but you certainly don't need to rely on any of the cloud agent solutions out there that call these models under the hood.
Cool! We have a similar setup, connected to JIRA, but it stops at analysis and an approach to the solution. I'm taking inspiration from this now to take it to the next level!
I'd pay special attention to the harness that goes from plan to execute. We spent a lot of time ensuring it can produce high-quality code that we feel good about shipping to production instead of AI slop.
As for Jira, would love it if you contribute that integration to us! Someone asked for it in this thread :D
Yeah. We also use GitLab instead of GitHub; I'll check this out later. We've also set it up to work with multiple repos to truly understand context (we have frontend, backend, some tooling, an MCP server, etc., all in different repos).
We also have a multi-repo setup, to trigger it you can just tag two repos in the Linear label!
How does this compare to using Claude Web with connectors to build the same feature?
On a separate note, READMEs written by AI are unpleasant to read. It would be great if they were written by a human for humans.
The main difference is that you have full control over this!
Like the detailed setup instructions in the readme!
Also agree that teams should invest in their own harness (or, maybe pedantically, build a system on top of harnesses like Claude Code, Codex, Pi, or OpenCode).
Yes! Broccoli is triggering Codex CLI and Claude Code CLI.
Does that mean you're using API pricing rather than a subscription? Seems like it'd get expensive very quickly for a small team.
It's a bit of a trade-off. When we spin up a new container every time (which we did when we were using Google Cloud Run), we had to pay API pricing. However, with Blaxel, we can set containers to hibernate, which also gives us the ability to use subscription pricing.
Nice work! I built a similar system at my previous company. It was built on top of GitHub: the agent was triggered by the created issue, ran in Actions, and saved state in the PR as hidden markdown.
It worked great, but time to first token was slow and multi-repo PRs took very long to create (30+ minutes).
Now I'm working on a standalone implementation for cloud-native agents.
Why was the time to first token slow? Was it because of the spin up time for containers? That was an issue for us when we were running on Google's Cloud Run. We switched to Blaxel and it's much faster now. The hibernate feature has been great for comment iteration.
I use the Codex integration in Linear, can you tell me more about the differences please?
Tell me more about your workflow! For us, the workflow is: we assign the ticket to a bot user we create (broccoli in this case), and broccoli spins up a sandbox and does the execution. Do you trigger the task execution from Codex by giving it a Linear ID? That was Broccoli v0, but of course it still requires you to set up Codex with all the right keys.
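The assign-to-bot trigger described above amounts to a webhook handler that reacts only when the issue lands on the bot user, then kicks off a sandboxed run. A minimal sketch, where `BOT_USER_ID`, the event shape, and `run_in_sandbox` are all hypothetical names, not Broccoli's actual API:

```python
BOT_USER_ID = "broccoli-bot"

def handle_issue_update(event: dict, run_in_sandbox) -> str:
    """React to a Linear issue-update webhook: dispatch a run only when
    the issue is assigned to the bot user; ignore everything else."""
    assignee = event.get("data", {}).get("assigneeId")
    if assignee != BOT_USER_ID:
        return "ignored"
    issue_id = event["data"]["id"]
    run_in_sandbox(issue_id)  # spin up sandbox, clone repos, execute
    return f"dispatched:{issue_id}"

runs = []
status = handle_issue_update(
    {"data": {"id": "LIN-42", "assigneeId": "broccoli-bot"}}, runs.append)
print(status, runs)  # dispatched:LIN-42 ['LIN-42']
```

Keying off assignment (rather than a slash command) means the ticket itself is the whole interface: no extra keys or CLI setup on the engineer's side.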
They say it better than me: https://linear.app/integrations/codex
Oh got it! In that case, the main difference is that we go through a flow from design to implementation using our own prompts, and use both Codex and Claude Code so they can improve off each other.
Thanks for making it open source. Jira support would be good.
Good point! Adding that to our list of to-dos. We don't use Jira, but I guess it's still very popular!
this is exactly what I was looking for! can't wait to try it out
let us know if you have any feedback!