Endless AI-Generated Wikipedia

(seangoedecke.com)

26 points | by Twixes 3 days ago

23 comments

  • bawolff a day ago

    > I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70. I don’t really understand why anybody would do that.

    Guess it wasn't so endless after all.

    The author is assuming malice, but honestly bots clicking links is just what happens to every public site on the internet. Not to mention, going down the link-clicking rabbit hole is common among Wikipedia readers.

    All that said, I don't really see the point. Wikipedia's human controls are what make it exciting.

    • haileys a day ago

      It’s a poetic end, considering that the very same scraping activity, with no regard for the cost to site operators, is how these models are trained to begin with.

    • kristianp a day ago

      New page generation has been re-enabled, with a rate limit and "using openai/gpt-oss-120b instead of Kimi-K2".

    • dpark a day ago

      > but honestly bots clicking links is just what happens to every public site on the internet.

      As a CS student ~20 years ago, I wrote a small website to manage my todo list and hosted it on my desktop in the department. One day I found my items disappearing before my eyes. At first I assumed someone was intentionally messing with my app, but logs indicated it was just a scraping bot someone was running.

      It was a low-stakes lesson in why GET should not mutate meaningful state. I knew when I built it that anyone could click the links, and I wasn’t bothered with auth since it was only accessible from within the department network. But I didn’t plan for the bots.
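
      A minimal sketch of the fix, using Flask purely for illustration (the original app's stack is unknown, and these routes are hypothetical): anything destructive goes behind POST, so a link-following bot's GETs can't trigger it.

        # Minimal sketch: mutations require POST, so a crawler's GETs are harmless.
        from flask import Flask, redirect

        app = Flask(__name__)
        todos = {1: "write report", 2: "buy milk"}  # toy in-memory todo list

        @app.get("/")
        def index():
            # Render deletes as POST forms, not plain <a href> links,
            # so a bot that follows every link can't remove items.
            rows = "".join(
                f'<li>{text} <form method="post" action="/delete/{i}">'
                "<button>delete</button></form></li>"
                for i, text in todos.items()
            )
            return f"<ul>{rows}</ul>"

        @app.post("/delete/<int:item_id>")  # GET /delete/1 now returns 405
        def delete(item_id: int):
            todos.pop(item_id, None)
            return redirect("/")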

      • vunderba a day ago

        Reminds me of the Spider of Doom, a similar issue where GET-based delete links were hidden by simple JavaScript that checked whether the user was logged in. All of a sudden, pages and content on the website began to mysteriously vanish.

        You know what doesn’t care about JavaScript and tries to click every link on your page? A search engine’s web crawler.

        https://thedailywtf.com/articles/The_Spider_of_Doom
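
        A toy sketch of why the JavaScript check didn't help: a naive crawler pulls hrefs straight out of the raw HTML and GETs each one, never executing any script (the start URL and limit here are hypothetical).

          # Toy crawler: fetch a page, extract every href, GET them all.
          # No JavaScript runs, so JS-"hidden" delete links get followed anyway.
          import re
          from urllib.parse import urljoin

          import requests

          def crawl(start: str, limit: int = 100) -> None:
              queue, seen = [start], set()
              while queue and len(seen) < limit:
                  url = queue.pop()
                  if url in seen:
                      continue
                  seen.add(url)
                  html = requests.get(url, timeout=10).text  # destructive GETs fire here
                  for href in re.findall(r'href="([^"#]+)"', html):
                      queue.append(urljoin(url, href))

          crawl("https://example.com/")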

    • userbinator a day ago

      Google and all the other search engines will crawl any public site too.

    • leobg 13 hours ago

      Would have been ironic if it had been the crawler from OpenAI… :)

    • blourvim a day ago

      More clicks mean a bigger wiki, which I guess is the point, unless the generated articles lead to nonsensical strings, which would suck but should be reasonable to prevent.

    • UltraSane a day ago

      You should always have per-IP rate limiting.
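
      A minimal in-memory sketch of that, assuming a single server process (a real deployment would push this into nginx's limit_req or a shared Redis counter); the window and budget numbers are made up:

        # Sliding-window rate limiter keyed by client IP (single-process sketch).
        import time
        from collections import defaultdict, deque

        WINDOW_SECONDS = 60
        MAX_REQUESTS = 30  # hypothetical per-IP budget per window
        _hits: defaultdict[str, deque] = defaultdict(deque)

        def allow(ip: str) -> bool:
            now = time.monotonic()
            q = _hits[ip]
            while q and now - q[0] > WINDOW_SECONDS:  # expire hits outside the window
                q.popleft()
            if len(q) >= MAX_REQUESTS:
                return False  # over budget: reject the request (e.g., HTTP 429)
            q.append(now)
            return True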

  • 000ooo000 a day ago

    >I’m not worried about one power user costing me a lot of money in inference

    >edit: I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70.

  • kristianp a day ago

    I noticed it isn't that eager to generate links; for example, the game names "Virtua Fighter" and "Daytona USA" are italicized, but not links, in https://www.endlesswiki.com/wiki/sega_studio_tokyo

  • AaronAPU 17 hours ago

    This was literally the first idea I had when GPT was initially released. I prototyped it in about 30 minutes, then thought “bots will obviously just destroy this” and discarded it.

  • kiriberty a day ago

    This is a slippery slope to hallucinated hell

    • visarga a day ago

      I would use Deep Research mode outputs. Sometimes I run several of these in parallel on different models, then compare them to catch hallucinations. If I wanted to publish that, I would also double-check each citation link.

      I think the idea is sound; the potential is to have a much larger AI Wikipedia than the human one. Can it cover all known entities, events, concepts, and places? All scientific publications? It could get 1000x larger than Wikipedia and be a good pre-training source of text.

      When covering a topic, I would not make the AI agent try to find the "Truth" but just analyze the distribution of information out there. What are the opinions, and who holds them? I would also test a host of models in closed-book mode and include an analysis of how AI covers the topic on its own; that is useful information to have.

      This method has the potential to create much higher-quality text than the usual internet scrape, in large quantities. It would be comparative-analysis text connecting many sources, which would be better for the model than training on separate pieces of text. Information needs to circulate to be understood better.
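
      A sketch of that fan-out-and-compare step, assuming an OpenAI-compatible client; the model names and the question are hypothetical:

        # Ask several models the same question in parallel; disagreement
        # between the answers is a cheap signal of likely hallucination.
        import concurrent.futures as cf

        from openai import OpenAI

        client = OpenAI()  # assumes an OpenAI-compatible endpoint
        MODELS = ["gpt-4o", "gpt-4o-mini"]  # hypothetical choice of models

        def ask(model: str, question: str) -> str:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": question}],
            )
            return resp.choices[0].message.content

        question = "When was Sega Studio Tokyo founded? Cite sources."
        with cf.ThreadPoolExecutor() as pool:
            answers = list(pool.map(lambda m: ask(m, question), MODELS))
        for model, answer in zip(MODELS, answers):
            print(f"--- {model} ---\n{answer}\n")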

  • dcreater a day ago

    So will this end up being part of the training dataset for future LLMs?

  • hliyan a day ago

    Wouldn't this be better as a browser extension, where the user can highlight some text and have it explained, like these: https://chromewebstore.google.com/search/ai%20explain?filter...

  • j_juggernaut a day ago

    Solved the Neon Genesis Evangelion challenge using ChatGPT Agents; take a look.

  • blourvim a day ago

    I wonder if the first-link chain here would also lead to "Philosophy".
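
    A quick sketch for checking that, assuming pages link internally via /wiki/<slug> hrefs (the regex is a guess at the site's markup): follow the first in-article link until reaching "philosophy", a loop, or a dead end.

      # Follow the first internal link on each page; stop at "philosophy",
      # a repeated page (loop), or a page with no links (dead end).
      import re

      import requests

      BASE = "https://www.endlesswiki.com"

      def first_link(slug: str) -> str | None:
          html = requests.get(f"{BASE}/wiki/{slug}", timeout=10).text
          m = re.search(r'href="/wiki/([^"#?]+)"', html)  # first wiki href (a guess)
          return m.group(1) if m else None

      def chain(start: str, limit: int = 50) -> list[str]:
          path, seen = [start], {start}
          while path[-1] != "philosophy" and len(path) < limit:
              nxt = first_link(path[-1])
              if nxt is None or nxt in seen:
                  break
              path.append(nxt)
              seen.add(nxt)
          return path

      print(chain("minimalism"))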

  • indigodaddy a day ago

    I'm trying to link to Philip Glass. This could take a while. Kinda fun and a bit reminiscent of Googlewhacking, or maybe the LLM equivalent of Six Degrees of Kevin Bacon, but it's gonna be way more than six to get to Philip Glass.

    Edit: well, shit, looks like there is a Minimalism page, but it didn't make any names clickable. Sean, looks like you need to tweak the code a bit?

    https://www.endlesswiki.com/wiki/minimalism

  • _def a day ago

    Huh, I found a dead end: a 404.

  • tehjoker a day ago

    Interesting idea, but while it is sold as a way to interact with the knowledge in a model, I suspect the rabbit-hole effect means the most tantalizing information in it will be subtly hallucinated: an efficient delivery vehicle for “computer madness”.

  • oidar a day ago

    hugged