Endless AI-Generated Wikipedia

(seangoedecke.com)

26 points | by Twixes 3 days ago

23 comments

  • bawolff a day ago

    > I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70. I don’t really understand why anybody would do that.

    Guess it wasn't so endless after all.

    The author is assuming malice, but honestly bots clicking links is just what happens to every public site on the internet. Not to mention, going down the link-clicking rabbit hole is common among Wikipedia readers.

    All that said, I don't really see the point. Wikipedia's human controls are what make it exciting.

    • haileys a day ago

      It’s a poetic end, considering that the very same scraping activity, with no regard for the cost to site operators, is how these models are trained to begin with.

    • kristianp a day ago

      New page generation has been re-enabled, with a rate limit and "using openai/gpt-oss-120b instead of Kimi-K2".

    • dpark a day ago

      > but honestly bots clicking links is just what happens to every public site on the internet.

      As a CS student ~20 years ago, I wrote a small website to manage my todo list and hosted it on my desktop in the department. One day I found my items disappearing before my eyes. At first I assumed someone was intentionally messing with my app, but logs indicated it was just a scraping bot someone was running.

      It was a low-stakes lesson in why GET should not mutate meaningful state. I knew when I built it that anyone could click the links, and I wasn’t bothered with auth since it was only accessible from within the department network. But I didn’t plan for the bots.
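
      A minimal sketch of the fix, using Flask purely for illustration (the original app's stack is unknown, and these routes are hypothetical): anything destructive goes behind POST, so a link-following bot's GETs can't trigger it.

        # Minimal sketch: mutations require POST, so a crawler's GETs are harmless.
        from flask import Flask, redirect

        app = Flask(__name__)
        todos = {1: "write report", 2: "buy milk"}  # toy in-memory todo list

        @app.get("/")
        def index():
            # Render deletes as POST forms, not plain <a href> links,
            # so a bot that follows every link can't remove items.
            rows = "".join(
                f'<li>{text} <form method="post" action="/delete/{i}">'
                "<button>delete</button></form></li>"
                for i, text in todos.items()
            )
            return f"<ul>{rows}</ul>"

        @app.post("/delete/<int:item_id>")  # GET /delete/1 now returns 405
        def delete(item_id: int):
            todos.pop(item_id, None)
            return redirect("/")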

      • vunderba a day ago

        Reminds me of the Spider of Doom, a similar issue where GET-based delete links were hidden by simple JavaScript that checked whether the user was logged in. All of a sudden, pages and content on the website began to mysteriously vanish.

        You know what doesn’t care about JavaScript and tries to click every link on your page? A search engine’s web crawler.

        https://thedailywtf.com/articles/The_Spider_of_Doom
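
        A toy sketch of why the JavaScript check didn't help: a naive crawler pulls hrefs straight out of the raw HTML and GETs each one, never executing any script (the start URL and limit here are hypothetical).

          # Toy crawler: fetch a page, extract every href, GET them all.
          # No JavaScript runs, so JS-"hidden" delete links get followed anyway.
          import re
          from urllib.parse import urljoin

          import requests

          def crawl(start: str, limit: int = 100) -> None:
              queue, seen = [start], set()
              while queue and len(seen) < limit:
                  url = queue.pop()
                  if url in seen:
                      continue
                  seen.add(url)
                  html = requests.get(url, timeout=10).text  # destructive GETs fire here
                  for href in re.findall(r'href="([^"#]+)"', html):
                      queue.append(urljoin(url, href))

          crawl("https://example.com/")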

    • userbinator a day ago

      Google and all the other search engines will crawl any public site too.

    • leobg 13 hours ago

      Would have been ironic if it had been the crawler from OpenAI… :)

    • blourvim a day ago

      More clicks mean a bigger wiki, which I guess is the point, unless the generated articles lead to nonsensical strings, which would suck but should be reasonable to prevent.

    • UltraSane a day ago

      You should always have per-IP rate limiting.
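
      A minimal in-memory sketch of that, assuming a single server process (a real deployment would push this into nginx's limit_req or a shared Redis counter); the window and budget numbers are made up:

        # Sliding-window rate limiter keyed by client IP (single-process sketch).
        import time
        from collections import defaultdict, deque

        WINDOW_SECONDS = 60
        MAX_REQUESTS = 30  # hypothetical per-IP budget per window
        _hits: defaultdict[str, deque] = defaultdict(deque)

        def allow(ip: str) -> bool:
            now = time.monotonic()
            q = _hits[ip]
            while q and now - q[0] > WINDOW_SECONDS:  # expire hits outside the window
                q.popleft()
            if len(q) >= MAX_REQUESTS:
                return False  # over budget: reject the request (e.g., HTTP 429)
            q.append(now)
            return True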

  • 000ooo000 a day ago

    >I’m not worried about one power user costing me a lot of money in inference

    >edit: I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70.

  • kristianp a day ago

    I noticed it isn't that eager to generate links; for example, the game names "Virtua Fighter" and "Daytona USA" are italicized, but not links, in https://www.endlesswiki.com/wiki/sega_studio_tokyo

  • AaronAPU 17 hours ago

    This was literally the first idea I had when GPT was initially released. I prototyped it in about 30 minutes, then thought “bots will obviously just destroy this” and discarded it.

  • kiriberty a day ago

    This is a slippery slope to hallucinated hell

    • visarga a day ago

      I would use Deep Research mode outputs. Sometimes I run several of these in parallel on different models, then compare them to catch hallucinations. If I wanted to publish that, I would also double-check each citation link.

      I think the idea is sound; the potential is to have a much larger AI Wikipedia than the human one. Can it cover all known entities, events, concepts, and places? All scientific publications? It could get 1000x larger than Wikipedia and be a good pre-training source of text.

      When covering a topic, I would not make the AI agent try to find the "Truth" but just analyze the distribution of information out there. What are the opinions, and who holds them? I would also test a host of models in closed-book mode and include an analysis of how AI covers the topic on its own; that is useful information to have.

      This method has the potential to create much higher-quality text than the usual internet scrape, in large quantities. It would be comparative-analysis text connecting many sources, which would be better for the model than training on separate pieces of text. Information needs to circulate to be understood better.
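
      A sketch of that fan-out-and-compare step, assuming an OpenAI-compatible client; the model names and the question are hypothetical:

        # Ask several models the same question in parallel; disagreement
        # between the answers is a cheap signal of likely hallucination.
        import concurrent.futures as cf

        from openai import OpenAI

        client = OpenAI()  # assumes an OpenAI-compatible endpoint
        MODELS = ["gpt-4o", "gpt-4o-mini"]  # hypothetical choice of models

        def ask(model: str, question: str) -> str:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": question}],
            )
            return resp.choices[0].message.content

        question = "When was Sega Studio Tokyo founded? Cite sources."
        with cf.ThreadPoolExecutor() as pool:
            answers = list(pool.map(lambda m: ask(m, question), MODELS))
        for model, answer in zip(MODELS, answers):
            print(f"--- {model} ---\n{answer}\n")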

  • dcreater a day ago

    So will this end up being part of the training dataset for future LLMs?

  • hliyan a day ago

    Wouldn't this be better as a browser extension, where the user can highlight some text and have it explained, like these: https://chromewebstore.google.com/search/ai%20explain?filter...

  • j_juggernaut a day ago

    Solved the Neon Genesis Evangelion challenge using ChatGPT Agents; take a look.

  • blourvim a day ago

    I wonder if the first-link chain here would also lead to "Philosophy".
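
    A quick sketch for checking that, assuming pages link internally via /wiki/<slug> hrefs (the regex is a guess at the site's markup): follow the first in-article link until reaching "philosophy", a loop, or a dead end.

      # Follow the first internal link on each page; stop at "philosophy",
      # a repeated page (loop), or a page with no links (dead end).
      import re

      import requests

      BASE = "https://www.endlesswiki.com"

      def first_link(slug: str) -> str | None:
          html = requests.get(f"{BASE}/wiki/{slug}", timeout=10).text
          m = re.search(r'href="/wiki/([^"#?]+)"', html)  # first wiki href (a guess)
          return m.group(1) if m else None

      def chain(start: str, limit: int = 50) -> list[str]:
          path, seen = [start], {start}
          while path[-1] != "philosophy" and len(path) < limit:
              nxt = first_link(path[-1])
              if nxt is None or nxt in seen:
                  break
              path.append(nxt)
              seen.add(nxt)
          return path

      print(chain("minimalism"))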

  • indigodaddy a day ago

    I'm trying to link to Philip Glass. This could take a while. Kinda fun and a bit reminiscent of Googlewhacking, or maybe the LLM equivalent of Six Degrees of Kevin Bacon, but it's gonna be way more than six to get to Philip Glass.

    Edit: well, shit, looks like there is a Minimalism page, but it didn't make any names clickable. Sean, looks like you need to tweak the code a bit?

    https://www.endlesswiki.com/wiki/minimalism

  • _def a day ago

    Huh, I found a dead end: a 404.

  • tehjoker a day ago

    Interesting idea, but while it is sold as a way to interact with the knowledge in a model, I suspect the rabbit-hole effect means the most tantalizing information in it will be subtly hallucinated: an efficient delivery vehicle for “computer madness”.

  • oidar a day ago

    hugged