Putting Gemini to Work in Chrome

(blog.google)

51 points | by diwank 2 days ago

62 comments

  • ed_mercer 2 days ago

    > We’re also bringing the creative power of Nano Banana directly into Chrome, allowing you to transform images on the fly without needing to download and re-upload images or open another tab.

    Are there really people who are like "Man, if only I could do this straight in Chrome"? Is this something worth bloating a browser (further) with?

    • xnx a day ago

      Yes. AI is the next abstraction layer. By analogy, instead of wielding the wrench (application) yourself, you have a plumber (AI) fix your pipes. The browser is just a rich interface to that AI.

    • sssilver 2 days ago

      I think Google PMs would respond by saying that nobody in 2007 was like "Man, if only I could get rid of the physical keyboard on my smartphone".

      You know, the whole "faster horse" argument.

      • coffeefirst a day ago

        Yeah but I asked for a faster horse and they keep showing up with a two-headed walrus.

    • pjmlp 15 hours ago

      Yes, Google's shareholders.

    • xboxnolifes 2 days ago

      Honestly? Yeah. Not really Gemini or AI chat bot stuff, but I definitely appreciate when the software I use that interacts with images includes at least some minor drawing/highlighting/text overlay capabilities. So, while I don't like this feature in particular, I'm not against the general idea.

  • dz0ny 2 days ago

    Only PMs at Google need this. They still don't get how AI is used...

    • pjmlp 15 hours ago

      Still better than Microsoft, though. Android is still not a Windows 11-style mess of AI everywhere without a roadmap.

  • ukuina 2 days ago

    This is a big deal, the highlight is Chrome autobrowse. Goes head-to-head with OpenAI Atlas.

    • ares623 2 days ago

      “Head to head” doing a lot of heavy lifting. How much market share does Atlas have I wonder.

      • tokioyoyo 2 days ago

        I actually think it has already been abandoned after very little adoption.

        • slfreference 2 days ago

          Don't spread yourself too thin, or else the river will dry; pick your battles.

  • hackyhacky 2 days ago

    Serious question, not snark: Does anyone actually want this? I honestly can't imagine a use for these features even among people tech savvy enough to understand them.

    • terhechte 2 days ago

      I like doing side projects; I don't like wasting a day of work potential on any of these web apps: Google Cloud, AWS, Azure, App Store Connect, Google's Android App Store, RevenueCat, Stripe, etc.

      I dread having to log in to these systems and waste hours achieving the simplest tasks.

      This is what I'm using Claude for. E.g. I log in to App Store Connect, tell it what I need (3 subscription tiers), it will do all the clicking and editing in Apple's stupid UI, then I will ask it to create a summary for RevenueCat, and use another Claude session in there to click all the buttons to configure based on what just happened in App Store Connect.

      Or configuring S3 buckets or whatnot.

      • gergo_b 2 days ago

        Could you detail this a bit, please?

    • walletdrainer 2 days ago

      I use AI-controlled browsers for everyday tasks like “Find me a Michelin starred restaurant in Paris with availability for 4 people at 8PM today”.

      Also for things like locating a product available to pick up in a nearby store, it’s crazy how often Google fails at this particular task.

      • cguess a day ago

        I mean, if you don't care at all about cuisine, style, location etc. I guess? Searching is half the fun of something like that.

        Anyways I will (continue to) not touch Chrome with a 1000ft pole after this. AI has been awful at almost every task I've ever given it.

        • walletdrainer a day ago

          It’ll give me a list of what’s available, the searching process isn’t made any more fun by including restaurants which will be a pain in the ass to book for a given date.

        • xnx a day ago

          > I mean, if you don't care at all about cuisine, style, location etc. I guess?

          You can always ask for a list and still make the final decision yourself.

    • lmm 2 days ago

      Having a browser that works for me would be useful yeah. Stuff like skip the story and give me the recipe, or click through the pointless extra steps, or reformat my address into the bizarre format the website wants.

      Of course the single biggest thing my browser can do to help me is blocking ads, which means it's curious to see this just after Google killed adblock in Chrome.

      • LiamPowell 2 days ago

        Adblock continues to be just as effective as it ever was in Chrome.

        Even before the removal of MV2, the claims that it would kill adblock were ridiculous as many adblockers had already switched to MV3 but it was at least understandable that people could be ignorant of that fact. Now that everything is on MV3 how can people still be claiming that Google killed adblock when Chrome users still have working adblockers?

    • TheCapeGreek 2 days ago

      I've seen at least one decent use case from "normies" around me: Bypassing stupid company processes to achieve actual automated productivity in your rote processes instead of the theatre of it.

      Sounds like a contrived situation, but there's a surprising number of "thought leader" CEOs out there who make completely nonsensical decisions under the banner of "saving costs and automating things".

      (Real-world example I know of) A company pays for the cheapest tier of Gemini they can find and tells everyone to use it. But it won't pay for Asana seats, so every user in the 100-person startup is a guest and can't use the connector in any AI app to TRULY do useful task management with AI.

      Having some better access to AI in the browser would pave over that pain for someone who currently doesn't want to spend their own money on something like Claude for Cowork and the Chrome extension to drive the browser, or open a terminal to have Claude Code do it.

    • stingraycharles 2 days ago

      I like having AI in my browser, I use Claude quite a bit.

      Examples: using my budgeting app directly to figure out why some forecasting event went wrong, or helping me correlate SOC2 tickets with GitHub pull requests and flagging all that are older than $date.

      It’s surprisingly convenient for a narrow set of tasks.

    • saidinesh5 2 days ago

      In one of the previous companies I worked at, we were automating a very valid use case: a bunch of people crawling through a set of URLs daily/weekly, finding the PDFs, and summarising the changes from the previous week. I'm guessing these features are geared towards them.

    • raincole 2 days ago

      If it skips ads many people would want it. But it's made by Google so I suspect its goal is to skip non-Google ads only.

    • mogili1 2 days ago

      I use Claude's Chrome plugin all the time, as well as ChatGPT's agent mode. I prefer agent mode when I don't need to log in but want it to do a search.

      However, Gemini in Chrome requires you to allow them to use your data to improve their model, which I won't consent to. Google Workspace accounts seem exempt, so I plan to try it out there.

    • captain_coffee 2 days ago

      Me personally: absolutely not - and I fundamentally do not understand the need for something like this. I would never use such a tool under any possible circumstance knowing what I know about the current technology underpinning these clankers.

      This feels on par with Microsoft's push to shove Copilot down everyone's throat at every step possible, whether we like/need it or not.

    • drusepth 2 days ago

      Right now I paste screenshots of AWS/Azure/GCP into Claude and ask it questions on how to navigate around / what to do / how to set things up. This seems like a much better experience solely to not have to deal with the weird mac screenshot UX.

    • Havoc 2 days ago

      Google shareholders want it. If search collapses they need a new stupidly profitable golden goose else Google has a giant hole in their finances

  • MrAlex94 2 days ago

    Am I being too cynical, or does anyone else envision a future where you ask Chrome to buy you something, anything, online and instead of it actually buying you the “best” item, you end up with items it “prefers” where Google make money from suggestions and/or completion of sale?

    I know it calls out that there’ll need to be user confirmation before the final purchase, but if you’re already not expending the effort to find the product or service yourself, are you really going to sit and research what it’s given you? If you are, then what’s the point of using the agent?

    Just seems like the next evolution in Google’s ad revenue generation.

  • testycool 2 days ago

    I do hope we get vertical tabs before that, though.

    • samhh 2 days ago

      It’s already available behind a flag IIRC.

  • bob1029 2 days ago

    https://blog.google/products/ads-commerce/agentic-commerce-a...

    I think the vision here is for your browser agent to spend money with google's partners on pointless consumer slop while you sleep.

  • tmaly a day ago

    I was trying to use Gemini connected apps with YouTube to get details about my channel's performance. It could not do it; I instead had to settle for what it could see publicly on my channel.

  • fpauser 2 days ago

    > [..] the "creative power" of Nano Banana [..]

    This “creative power” is used everywhere to produce fake news and spread lies...

  • omnifischer 2 days ago

    I wish this executive (author of that post) https://xcancel.com/laparisa?lang=en would show their browser in REAL LIFE everyday use. Do they really use it?

    • stingraycharles 2 days ago

      Looks like they may be using it to write their tweets?

      • throwaway2056 2 days ago

        In other words... general people don't need it.

  • fpauser 2 days ago

    One more reason not to use Chrome. I don't want this AI bloat. And for sure I don't want to redirect my personal website content through Google's privacy-sucking data pipelines.

    What I just discovered: the good old Google search without AI bloat but with privacy via https://www.startpage.com. Highly recommended!

  • firefoxd 2 days ago

    I must be using web browsers completely wrong. Like browsing a page isn't a problem for me. I can do it at the speed of my needs.

    I'm having a hard time understanding why I would tell Gemini to create an account on some website for me or send an email. Those are usually just a tab away. That's why I feel like I'm missing something here.

    • jsnell 2 days ago

      Basically none of their examples are just "browse a page"? They're multi-step tasks combining data from multiple pages.

      Like the first example in the demo carousel (the Y2K party) starts from a photo and a prompt of roughly "buy the props needed for replicating this photo from Etsy". It first analyzes the image in the current tab, identifies a bunch of things to buy, searches for them on Etsy, customizes the orders, adds them to the shopping basket, and then asks for a confirmation to actually send an order.

      The second one auto-fills a form with a couple of dozen fields from the data that's in a PDF in another tab. (And in the fiction of a demo, presumably a PDF that you already had around, not one that you made just for the purposes of using it to auto-fill the form.)

      I'm not the target market for this: automating a browser with my credentials is just too scary, but I can certainly see the utility. There's a huge number of tasks taking a minute or two that are not worth creating bespoke automation for but that are also pretty mechanical processes.

      • coffeefirst a day ago

        Maybe I’m a curmudgeon who can’t imagine throwing an elaborate Y2K party because all my friends were alive and threw parties at the real Y2K, but… these all feel extremely contrived.

        It’s as if they used AI to generate use cases for their AI tool because they weren’t really sure what it’s for…

        • xnx a day ago

          Do you ever have a project that requires research and comparison? This can automate that.

          • coffeefirst a day ago

            Yeah but that's what I'm already using regular AI powered search for.

            I suppose by being in the browser it can access private and paywalled data, so maybe that's something.

            • xnx a day ago

              Exactly. I think I'd use it for hotel price search where you usually don't get the real price until deep in the checkout process.

    • bandrami 2 days ago

      I feel that way about IDEs too, though. My text editor has snippets, my file manager shows me what files are where, and my terminal lets me run programs. Why it's important to people that these functions be grafted into a single window escapes me.

      • newdee 2 days ago

        This is satire, right?

        • bandrami 2 days ago

          No. Why would you think that's satire?

    • lmm 2 days ago

      Maybe you're only using well-designed sites? Try making a booking with a Chinese airline and you'll quickly wish for an assistant to delegate it all to.

      • dzjkb 2 days ago

        Funny you say that, I was literally just booking a flight with Air China yesterday and the UX was 10x better than the average Wizz Air/Ryanair experience: a clear, readable UI (with a great table comparison of prices ±3 days from the selected dates), no ads, no random services getting pushed in your face, no booking tabs automatically opening in the background.

        • lmm 20 hours ago

          Huh. Last time I tried with them (about a year ago), and more recently trying with China Eastern, I couldn't even get it to show me a flight that I knew was flying on a given day (just at a slightly higher price than the one it would show me).

      • shakna 2 days ago

        If you struggle, then an agent will probably fail.

        • lmm a day ago

          I know exactly what to do, it's just very tedious to actually do it. Which seems like the perfect use case for an agent.

          • shakna 19 hours ago

            Tedium often means a large context window. Lots of personal information to be entered, in different formats, that must be exactly right.

            That's exactly what an agent regularly fails at.

        • ares623 2 days ago

          Will it matter if you can’t tell?

          • samrus 2 days ago

            Yeah. Because you'll think you have a flight to Beijing when you don't.

            • ares623 2 days ago

              Oh yeah that bit lol

    • wolvoleo 2 days ago

      Yes. I like it for deep research, that kind of thing where I'd be wading through clickbait search results for hours.

      But for regular browsing? I don't see the point.

  • sluongng 2 days ago

    Not yet in Linux?

  • 7bit 2 days ago

    All I ever wanted my entire life. I feel whole again. Thank you Googool

  • rs_rs_rs_rs_rs 2 days ago

    Gee thanks, now I have a big Ask Google button in the URL bar, but only on YouTube for some reason. How can I disable it? I could not figure out how to disable it like the others.

  • __loam 2 days ago

    Allowing anyone to edit someone else's images from the browser with an AI model is deeply evil stuff.