ell1e

1d ago

Why are there so many german communities on Lemmy?

🤫🤫🤫

1d ago

My Ultimate Self-hosting Setup

A rollback mechanism is the best thing to have for server tweaks. I achieve the same with Docker. Something similar might be possible with FreeBSD jails, Podman, or similar tools. (Not that NixOS is a bad choice, I just wanted to share some more options for anybody looking for something to try.)

1d ago

Why do people hate coldplay?

This is the most epic comment I've read on lemmy so far 😩👌

2d ago

I was wrong about robots.txt

Right, but the article does. Anyway, I'm moving on. Thanks for the discussion.

2d ago

Why are there so many german communities on Lemmy?

you made us proud!

2d ago

Why are there so many german communities on Lemmy?

oh, is it? 👀

2d ago

Why are there so many german communities on Lemmy?

sadly, data that is too centralized and easily available will always be abused at some point. the recent US developments are showing this nicely.

2d ago

Why are there so many german communities on Lemmy?

you deserve a trophy 🏆 🥰

2d ago

Why are there so many german communities on Lemmy?

not that the recent governments care, they want to centralize most of the data of citizens now with pretty poor protections in a lot of cases. sads

2d ago

Why are there so many german communities on Lemmy?

i may or may not be german as well 🫣

2d ago

Why are there so many german communities on Lemmy?

interestingly, most commenters here don't seem to be on .world 🤔

2d ago

Why are there so many german communities on Lemmy?

surprise germans 🫨

2d ago

I was wrong about robots.txt

But the article later does back it up: "Although Cloudflare singled out Google, other search engines that view AI search features as part of their search products also use the same bots for training as they do for search indexing."

In any case, I'm okay with admitting that neither you nor I can look inside Google to see what they're doing. But the claims are out there; I didn't make them up, whether they're true or not. Thank you for the Google crawler info link, it was certainly interesting.

3d ago

I was wrong about robots.txt

You look up what Googlebot does. No AI.

The page seems written to perhaps suggest it but doesn't explicitly say the other bots can't feed into some other sort of AI training. It would be in Google's interest to mislead the users here.

Edit: I found a quote where it says Googlebot does both in one: "Google-Extended doesn't have a separate HTTP request user agent string. Crawling is done with existing Google user agent [...]" and I guess Cloudflare doesn't trust Google to abide by the access controls. That seems sensible to me. Edit 2: What exactly the CEO believes was perhaps rightfully disputed below, it was just my guess.
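
For anyone curious what that opt-out looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule set and the `example.com` URL are my own assumptions for illustration, not taken from Google's docs or the article. It only shows what a robots.txt declares for the `Googlebot` and `Google-Extended` tokens; whether a crawler actually honors it is exactly the trust question discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow search indexing, opt out of Google-Extended.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def allowed(agent: str, url: str = "https://example.com/post/1") -> bool:
    """Return what the rules above declare for the given robots.txt token."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for token in ("Googlebot", "Google-Extended"):
        print(token, allowed(token))
```

Since both tokens are reportedly served by the same HTTP user agent, a site cannot verify from its own logs which purpose a given request served. The file expresses a preference and nothing more.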

3d ago

I was wrong about robots.txt

Nothing on this page seems to contradict the article. But if I simply missed the part that does, I'd be happy to learn.

3d ago

I was wrong about robots.txt

So what's the quote from the documentation that backs up your claim? The line "perform other product specific crawls" seems extremely vague by design.

3d ago

I was wrong about robots.txt

And allowing the public crawler might also have it feed their AI: https://arstechnica.com/tech-policy/2025/07/cloudflare-wants-google-to-change-its-ai-search-crawling-google-likely-wont/

3d ago

I was wrong about robots.txt

See here: https://arstechnica.com/tech-policy/2025/07/cloudflare-wants-google-to-change-its-ai-search-crawling-google-likely-wont/ If you have a source that says it's false, I'd be curious.

3d ago

I was wrong about robots.txt

Often it is respected, but the resulting problem is that platforms bundle their regular crawlers with questionable AI scraping, which blackmails websites into feeding AI.

For example, Googlebot, if enabled, won't just list you for search, but will also scrape your contents for Google's AI. Edit: see https://arstechnica.com/tech-policy/2025/07/cloudflare-wants-google-to-change-its-ai-search-crawling-google-likely-wont/ as a source. I imagine LinkedInBot, given it's Microsoft, will feed some other AI of theirs as well on top of the previews.

Until regulation steps in to require AI bots to separately ask for crawling permission, or to actually get a proper license for reuse of the contents, this situation isn't going to improve.