How I block all 26M of your curl requests

(foxmoss.com)

42 points | by foxmoss 2 days ago

8 comments

  • seba_dos1 2 days ago

    > with tools like Anubis being largely ineffective

    On the contrary: if someone "bypasses" Anubis by setting the user agent to Googlebot (or curl), it means it's effective. Every Anubis installation I've been involved with so far explicitly allowed curl. If you think it's counterproductive, you probably just don't understand why it's there in the first place.

    • jgalt212 a day ago

      If you're installing Anubis, why are you configuring it to let curl bypass it?

      • seba_dos1 a day ago

        The problem you usually try to alleviate with Anubis is load generated by aggressive AI scrapers that are otherwise indistinguishable from real users. As soon as a bot is polite enough to identify itself as some kind of bot, the problem is gone, because you can now apply your regular rate-limiting and access-control measures.

        (yes, there are also people who use it as an anti-AI statement, but that's not the reason why it's used on the most high-profile installations out there)

  • keanb 2 days ago

    Those bots would be really naive not to use curl-impersonate. I basically use it for any request I make even if I don’t expect to be blocked because why wouldn’t I.

    • f4uCL9dNSnQm 2 days ago

      There are plenty of naive bots. That is why tar pits work so well at trapping them. And to a bot, this TLS-based detection looks just like an offline or broken site, so it will be harder to spot unless you are trying to scrape only that one site.

    • _boffin_ a day ago

      I heard about curl-impersonate yesterday when I was hitting a CF page. I did something else to bypass it completely, which has been successful, but I should try this.

  • unwind 2 days ago

    I got exactly this far:

        uint8_t *data = (void *)(long)ctx->data;
    
    before I stopped reading. I had to go look up the struct xdp_md [1], it is declared like this:

        struct xdp_md {
            __u32 data;
            __u32 data_end;
            __u32 data_meta;
            /* ... further fields elided ... */
        };
    
    So clearly the `data` member is already an integer. The sane way to cast it would be to cast to the actual desired destination type, rather than first to some other random integer and then to a `void` pointer.

    Like so:

        uint8_t * const data = (uint8_t *) ctx->data;
    
    I added the `const` since the pointer value is not supposed to change, since we got it from the incoming structure. Note that that `const` does not mean we can't write to `data` if we feel like it, it means the base pointer itself can't change, we can't "re-point" the pointer. This is often a nice property, of course.

    [1]: https://elixir.bootlin.com/linux/v6.17/source/include/uapi/l...
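The `const` placement described above can be sketched in plain user-space C. This is only an illustration of the pointer semantics, not code from the article; the function name `first_byte_after_write` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shows that `uint8_t * const` fixes the pointer,
 * not the bytes it points at. */
uint8_t first_byte_after_write(uint8_t *buf, uint8_t value)
{
    assert(buf != NULL);

    uint8_t * const data = buf;  /* const pointer, mutable pointee */

    data[0] = value;             /* allowed: writes through the pointer */
    /* data = buf + 1; */        /* would not compile: `data` itself is const */

    return buf[0];               /* the write is visible through `buf` too */
}
```

To forbid writes through the pointer as well, the declaration would need `const` on the pointee side, for example `const uint8_t * const data`.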

    • ziml77 16 hours ago

      Your code emits a compiler warning about casting an integer to a pointer. Changing the cast to `void *` emits a slightly different warning, about the integer being cast having a smaller size than the pointer type. Casting to a `long` first and then to a `void *` avoids both of these warnings.
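A minimal sketch of the two-step cast discussed in this subthread, written as ordinary user-space C so it can be compiled outside a BPF toolchain. The names `fake_xdp_md` and `field_to_ptr` are hypothetical stand-ins, and `uintptr_t` is swapped in here for the `long` used in the article's code; it plays the same role of widening the 32-bit field to a pointer-sized integer before the pointer cast. The exact warnings depend on the compiler and target, so treat this as an illustration rather than a statement about any particular toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the first fields of struct xdp_md.
 * The real definition lives in the kernel's UAPI headers. */
struct fake_xdp_md {
    uint32_t data;
    uint32_t data_end;
};

/* The intermediate type must be at least as wide as the field. */
static_assert(sizeof(uintptr_t) >= sizeof(uint32_t),
              "uintptr_t must be able to hold a 32-bit field");

/* Widen the 32-bit field to a pointer-sized integer first, then convert
 * that integer to a pointer. A direct (uint8_t *)field cast converts
 * between an integer and a pointer of different sizes on 64-bit targets,
 * which is what compilers tend to warn about. */
uint8_t *field_to_ptr(uint32_t field)
{
    return (uint8_t *)(uintptr_t)field;
}
```

The resulting pointer is never dereferenced in this sketch, since an arbitrary 32-bit value is not a valid address in a user-space process.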