It’ll happen in the Fediverse too.
I know the Fediverse needs users and visibility, but with everything that’s happening I wish it were closed to every crawler/bot.
Not if you use NORD VPN ™, fellow human. NORD VPN ™ guarantees filtering of astroturfing comments made by LLMs! That’s right - NORD VPN ™ does the following:
For it to happen in the Fediverse AI would have to be training on the Fediverse.
That’s what this post is about. Using reddit to plant comments that AI trains on, and subsequently getting AI to spit out your answer to questions it’s asked.
As such this can happen anywhere where AI is being trained. The issue is with how AI is training, not with how websites it trains on are being operated.
There isn’t really any reason to think the Fediverse won’t be used for AI training, if it isn’t already. Everything is in the open here, it’s easy enough to scrape all the data.
I’m looking for a network and/or internet with strong authentication which is open for unique human users only. Sure, bots could still use someone’s credentials but at least their scale & impact would be limited.
Unless you completely ditch anonymity, this can only turn into a state-captured propaganda platform. Whoever controls access/auth will have the keys to the content.
If you’ve any suggestion on how to implement that, then it’s a million-dollar idea.
The “I’m a human” test that only takes a few seconds and then lets you do what you like for an hour was always vulnerable to ‘auth farms’. Pay some poor bastards in the third world a pittance to pass the test a thousand times an hour, let the bots run wild. And the bots have gained the ability to pass the tests themselves, at least by boiling the oceans in some datacentre while the VC money holds out.
Finding the people running the bots, fitting them with some very heavy boots and then seeing if they can swim in the deep ocean is probably needlessly cruel, but I’d be up for tarring and feathering a few. Once the videos got out, the rest might think harder about their life choices…
Most of this work is done via emulators; if you have a hardware attestation process built in, it will stop most of it. Obviously you still have a problem with phone farms, but those are much more expensive than emulators and there’s a physical capacity to them.
People downvote me when I say it. That’s all cope. We’re not wrong; if and when this goes mainstream, it’ll attract the same bad actors just as heavily.
Of course, there are surely already a few here testing the waters.
I agree the Fediverse is not in a good spot for something like this.
I actually think this is where the identity system of atproto will be more impactful, as it allows a better verification system. I’ve been thinking lately that you should be able to use hardware attestation + biometric attestation in apps to filter these emulated users out.
I mean there’s nothing preventing them from doing the same thing here. But if we could get a more even split of users between instances, it would arguably be harder for them to pull the same thing because a) the admins can intervene and ban those accounts because the admins are not corporate slaves, unless they are, in which case b) other instances can just ban the instance that is letting corporations go wild. We’ve already seen that level of “moderation” with Lemmygrad being ostracized from the wider Lemmy/Piefed ecosystem. It wouldn’t work with disproportionately large instances, because defederating lemmy.world would be a massive hit on users’ feeds and the higher user count would make it harder to moderate against these actions.
It’s going to require more work from mods and admins, but I imagine we’ll fare better than Reddit. After all, Reddit has an incentive to support this kind of behavior.
There’s no algorithm to be played in the fediverse. The reward is too low for all the work of making a post visible, and it won’t carry to the next post, essentially starting all over again.
The Hot ordering is itself an algorithm.
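To make that concrete, here is a rough sketch of what a "hot"-style ordering tends to look like: votes push a post up, age pulls it down. The function name `hot_score`, the `+ 3` and `+ 2` offsets, and the `gravity` value are assumptions picked for illustration, not a claim about the exact formula any Fediverse software uses.

```python
import math

def hot_score(votes: int, age_hours: float, gravity: float = 1.8) -> float:
    """Illustrative time-decay rank: more votes raise it, more age lowers it."""
    # log() dampens large vote counts; max() keeps the log argument >= 1.
    # All constants here are assumptions chosen for the example.
    return math.log(max(1, votes + 3)) / (age_hours + 2) ** gravity

# (title, votes, age in hours) -- made-up sample posts
posts = [
    ("old but popular", 500, 48.0),
    ("new with a few votes", 10, 1.0),
]

# Highest score first, which is what a "Hot" feed would show on top.
ranked = sorted(posts, key=lambda p: hot_score(p[1], p[2]), reverse=True)
```

With these made-up numbers the fresh post outranks the older, more upvoted one. That is the part someone could try to game: a handful of early votes matters far more than a pile of late ones.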
There presumably is. Some metric decides visibility on the feeds. That algorithm not being based on corporate profitability doesn’t mean it doesn’t exist.
In fact, it doesn’t matter if there’s no internal incentive. If it’s being indexed and shows in searches it will have all the incentive needed to maximise SEO for profit.
Google will never rank fediverse posts high. Said otherwise, the external incentives are not there either.
I think there are some differences that make the fediverse more resilient to this. For example, the absence of cumulative account karma keeps out the Reddit-style karma farming. The ability to ban whole instances also makes it easier to kick out bad actors. Instance admins could also implement their own rules, like switching to an invite-based system to reduce bot spam. Also, it seems to me that Reddit is actively encouraging this kind of behaviour to inflate their user statistics, and there is no incentive to tolerate this kind of spam for a fediverse server admin.
karma is meaningless to seo outside of account restrictions. the people doing this as a job aren’t doing it for imaginary internet points
it doesn’t matter what individual instances do as long as the largest ones have open signups
Yet you didn’t respond to the point that makes the difference:
We are all rats. When this ship sinks, we will float to the next, or (decide to) drop off.
All things considered, how much would actually be lost?
The alternative being…
Squeak
This comment made me realise that the internet could have been born, lived, and died, within my lifetime.
Thank you for the strange compliment.
I consider the internet dead at this point. I hang out on the outer edge in niche, low-population servers like the fediverse that the worst humans mostly ignore, because spamming here isn’t profitable and manipulating politics doesn’t gain much for their efforts at the moment.
Like phone calls, and texting, bad actors ruin everything that they touch.
Even snail mail.
It infuriates me to no end that almost every day I get new plasticized paper ads in my snail mail. Like yeah, keep destroying the planet in hopes of selling more junk to destroy the planet with. And my city made it mandatory! They are not allowed to skip a house.
Idk their new album sounds pretty good
Good actors too, it’s the nature of capitalism.
Who is the ‘good’ actor in this ‘capitalism’ thing?
It’s more of a technical concept within the lore than an actual character.
And when that happens, we move instances.
I wonder if we could make only the sign-up page of Lemmy and Piefed public to the internet, and the rest only accessible through login and verification of being actually bloody human? Could use anti-scraping measures…
Yes. If you can’t fight the death of the www, embrace it! Help make it happen!
/s
I don’t need one of those stupid ID verifications. Something else should be that instead, but what, I do not know. Whatever helps counter AI scraping and preserves anonymity.
Lemmy also has an admin setting like that. Additionally there will be private, federated communities available in version 1.0.
That’s how it’s been on my mbin instance (fedia.io) for a while now.
Yes PieFed has a setting for that. It makes scrapers give up pretty fast but ruins the experience for people without an account so I only use it on really bad days.
If the idea of a healthy Fediverse requires people moving instances whenever one finds themselves close to bottom-feeders and opportunistic parasites, we already lost.
I see your point, though for me it’s not so much the requirement of moving as the ease of doing so.
With traditional social media, you’d need to move entirely to another platform, and you might not even be able to enjoy similar content there. With Lemmy and PieFed, you can switch instances and keep the same content.
That’s the Discord model.
Fediverse needs to have a layer which traps AI in a never-ending maze.
Fortunately AI is taking care of that on its own https://doi.org/10.1038/s41586-024-07566-y
That’s the job of the web server, not of the application that runs on it.
There is already software you can get that feeds a never-ending maze of text to AI scrapers, some of which is AI generated and/or designed to poison LLM training. The problem is that these still use up a ton of bandwidth.
How would that layer distinguish AI from non-AI?
A never-ending maze would mean the scrapers just hammer our servers forever. Better to lead them into a honeypot and automatically ban their IP. Like PieFed does.
So just find scrapers and bot farm owners IRL and burn down their houses, easy
What about a maze that adds a few hundred ms to the response time with each request, so the load gets less the longer it’s trapped?
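Something like this is what I have in mind, as a minimal sketch only. It just computes how long to stall each client; the class name `EscalatingDelay`, the 200 ms step, and the 30 s cap are all assumptions, and the actual waiting would have to be done by whatever server sits in front.

```python
from collections import defaultdict

class EscalatingDelay:
    """Tracks requests per client and returns a delay that grows each time."""

    def __init__(self, step_ms: int = 200, cap_ms: int = 30_000):
        self.step_ms = step_ms        # extra delay added per request (assumed value)
        self.cap_ms = cap_ms          # upper bound so a delay never grows forever
        self.hits = defaultdict(int)  # client id -> number of requests seen

    def next_delay_ms(self, client_id: str) -> int:
        # Count this request, then scale the delay linearly up to the cap.
        self.hits[client_id] += 1
        return min(self.cap_ms, self.step_ms * self.hits[client_id])

delays = EscalatingDelay(step_ms=200, cap_ms=1000)
first_six = [delays.next_delay_ms("scraper-1") for _ in range(6)]
```

The catch is that every stalled request is still an open connection on our side, so the wait would need to be non-blocking (an async sleep, for example) rather than tying up a worker.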
I haven’t tried to make something like that. I think it’d be hard to do that without also exhausting our resources too.
Ah, that makes sense
Sadly that only works for scrapers, content engaging bots are immune to it.
Is that how tarpitting works? I didn’t know.
There are a lot of strategies. AFAIK a tar pit tries to waste the attacker’s resources by delaying our responses to their traffic. A honeypot tries to funnel bot traffic towards a place which only bots would go to. Once they go there, you know they’re a bot and they can be banned.
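The honeypot half of that can be sketched in a few lines. This is only an illustration under assumptions: `TRAP_PATH` is a hypothetical URL that no human would normally reach (for example, one linked invisibly and disallowed in robots.txt), `HoneypotBanList` is a made-up name, and none of this is taken from PieFed's actual code.

```python
TRAP_PATH = "/trap/do-not-follow"  # hypothetical path, only bots should request it

class HoneypotBanList:
    """Bans any client IP that requests the trap path."""

    def __init__(self, trap_path: str = TRAP_PATH):
        self.trap_path = trap_path
        self.banned = set()  # IPs that have hit the trap

    def handle(self, ip: str, path: str) -> int:
        """Return an HTTP-style status code for this request."""
        if ip in self.banned:
            return 403  # already caught, refuse everything
        if path == self.trap_path:
            self.banned.add(ip)  # only a bot would follow this link
            return 403
        return 200  # normal request, let it through

pot = HoneypotBanList()
statuses = [
    pot.handle("203.0.113.5", "/post/1"),   # fine so far
    pot.handle("203.0.113.5", TRAP_PATH),   # walks into the trap
    pot.handle("203.0.113.5", "/post/1"),   # now banned
    pot.handle("198.51.100.7", "/post/1"),  # unrelated client is unaffected
]
```

Banning by IP is the simplest version; it will miss scrapers that rotate addresses, and it can catch real users behind a shared IP, so a real setup would probably want bans that expire.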