
Click here to go see the bonus panel!
Hovertext:
I don't believe in a fast take-off for evil AI because it's gonna take at least a few weeks to get the human-grinders up and running.
Today's News:
Hovertext:
I don't believe in a fast take-off for evil AI because it's gonna take at least a few weeks to get the human-grinders up and running.
I am excited that Richard Fontana and I have announced the relaunch of copyleft-next.
The copyleft-next project seeks to create a copyleft license for the next generation that is designed in public, by the community, using standard processes for FOSS development.
If this interests you, please join the mailing list and follow the project on the fediverse (on its Mastodon instance).
I also wanted to note that as part of this launch, I moved my personal fediverse presence from floss.social to bkuhn@copyleft.org.
I don’t recommend publishing your first draft of a long blog post.
It’s not a question of typos or grammatical errors or the like. Those always slip through somehow and, for the most part, don’t impact the meaning or argument of the post.
No, the problem is that, with even a day or two of distance, you tend to spot places where the argument can be simplified or strengthened, the bridges can be simultaneously strengthened and made less obvious, the order can be improved, and you spot which of your darlings can be killed without affecting the argument and which are essential.
Usually, you make up for missing out on the insight of distance with the insight of others once you publish, which you then channel into the next blog post, which is how you develop the bad habit of publishing first drafts as blog posts, but in the instance of my last blog post, Trusting your own judgement on ‘AI’ is a huge risk, the sheer number of replies I got was too much for me to handle, so I had to opt out.
So, instead I let creative distance happen – a prerequisite to any attempt at self-editing – by working on other things and taking walks.
During one of those walks yesterday, I realised it should be possible to condense the argument quite a bit for those who find 3600 words of exposition and references hard to parse.
It comes down to four interlocking issues:
Combine these four issues and we have a recipe for what’s effectively a homeopathic superstition spreading like wildfire through a community where everybody is getting convinced it’s making them healthier, smarter, faster, and more productive.
This would be bad under any circumstance but the harms from generative models to education, healthcare, various social services, creative industries, and even tech (wiping out entry-level programming positions means no senior programmers in the future, for instance) are shaping up to be massive, the costs to run these specific kinds of models remain much higher than the revenue, and the infrastructure needed to build it is crowding out attempts at an energy transition in countries like Ireland and Iceland.
If there ever was a technology where the rational and responsible act was to hold off and wait and until the bubble pops, “AI” is it.
In my free time, I help run a small Mastodon server for roughly six hundred queer leatherfolk. When a new member signs up, we require them to write a short application—just a sentence or two. There’s a small text box in the signup form which says:
Please tell us a bit about yourself and your connection to queer leather/kink/BDSM. What kind of play or gear gets you going?
This serves a few purposes. First, it maintains community focus. Before this question, we were flooded with signups from straight, vanilla people who wandered in to the bar (so to speak), and that made things a little awkward. Second, the application establishes a baseline for people willing and able to read text. This helps in getting people to follow server policy and talk to moderators when needed. Finally, it is remarkably effective at keeping out spammers. In almost six years of operation, we’ve had only a handful of spam accounts.
I was talking about this with Erin Kissane last year, as she and Darius Kazemi conducted research for their report on Fediverse governance. We shared a fear that Large Language Models (LLMs) would lower the cost of sophisticated, automated spam and harassment campaigns against small servers like ours in ways we simply couldn’t defend against.
Anyway, here’s an application we got last week, for a user named mrfr
:
Hi! I’m a queer person with a long-standing interest in the leather and kink community. I value consent, safety, and exploration, and I’m always looking to learn more and connect with others who share those principles. I’m especially drawn to power exchange dynamics and enjoy impact play, bondage, and classic leather gear.
On the surface, this is a great application. It mentions specific kinks, it uses actual sentences, and it touches on key community concepts like consent and power exchange. Saying “I’m a queer person” is a tad odd. Normally you’d be more specific, like “I’m a dyke” or “I’m a non-binary bootblack”, but the Zoomers do use this sort of phrasing. It does feel slightly LLM-flavored—something about the sentence structure and tone has just a touch of that soap-sheen to it—but that’s hardly definitive. Some of our applications from actual humans read just like this.
I approved the account. A few hours later, it posted this:
It turns out mrfr
is short for Market Research Future, a company which produces reports about all kinds of things from batteries to interior design. They actually have phone numbers on their web site, so I called +44 1720 412 167 to ask if they were aware of the posts. It is remarkably fun to ask business people about their interest in queer BDSM—sometimes stigma works in your favor. I haven’t heard back yet, but I’m guessing they either conducting this spam campaign directly, or commissioned an SEO company which (perhaps without their knowledge) is doing it on their behalf.
Anyway, we’re not the only ones. There are also mrfr
accounts purporting to be a weird car enthusiast, a like-minded individual, a bear into market research on interior design trends, and a green building market research enthusiast in DC, Maryland, or Virginia. Over on the seven-user loud.computer, mrfr
applied with the text:
I’m a creative thinker who enjoys experimental art, internet culture, and unconventional digital spaces. I’d like to join loud.computer to connect with others who embrace weird, bold, and expressive online creativity, and to contribute to a community that values playfulness, individuality, and artistic freedom.
Over on ni.hil.ist, their mods rejected a similar application.
I’m drawn to communities that value critical thinking, irony, and a healthy dose of existential reflection. Ni.hil.ist seems like a space that resonates with that mindset. I’m interested in engaging with others who enjoy deep, sometimes dark, sometimes humorous discussions about society, technology, and meaning—or the lack thereof. Looking forward to contributing thoughtfully to the discourse.
These too have the sheen of LLM slop. Of course a human could be behind these accounts—doing some background research and writing out detailed, plausible applications. But this is expensive, and a quick glance at either of our sites would have told that person that we have small reach and active moderation: a poor combination for would-be spammers. The posts don’t read as human either: the 4bear posting, for instance, incorrectly summarizes a report on interior design markets as if it offered interior design tips.
I strongly suspect that Market Research Future, or a subcontractor, is conducting an automated spam campaign which uses a Large Language Model to evaluate a Mastodon instance, submit a plausible application for an account, and to post slop which links to Market Research Future reports.
In some sense, this is a wildly sophisticated attack. The state of NLP seven years ago would have made this sort of thing flatly impossible. It is now effective. There is no way for moderators to robustly deny these kinds of applications without also rejecting real human beings searching for community.
In another sense, this attack is remarkably naive. All the accounts are named mrfr
, which made it easy for admins to informally chat and discover the coordinated nature of the attack. They all link to the same domain, which is easy to interpret as spam. They use Indian IPs, where few of our users are located; we could reluctantly geoblock India to reduce spam. These shortcomings are trivial to overcome, and I expect they have been already, or will be shortly.
A more critical weakness is that these accounts only posted obvious spam; they made no effort to build up a plausible persona. Generating plausible human posts is more difficult, but broadly feasible with current LLM technology. It is essentially impossible for human moderators to reliably distinguish between an autistic rope bunny (hi) whose special interest is battery technology, and an LLM spambot which posts about how much they love to be tied up, and also new trends in battery chemistry. These bots have been extant on Twitter and other large social networks for years; many Fediverse moderators believe only our relative obscurity has shielded us so far.
These attacks do not have to be reliable to be successful. They only need to work often enough to be cost-effective, and the cost of LLM text generation is cheap and falling. Their sophistication will rise. Link-spam will be augmented by personal posts, images, video, and more subtle, influencer-style recommendations—“Oh my god, you guys, this new electro plug is incredible.” Networks of bots will positively interact with one another, throwing up chaff for moderators. I would not at all be surprised for LLM spambots to contest moderation decisions via email.
I don’t know how to run a community forum in this future. I do not have the time or emotional energy to screen out regular attacks by Large Language Models, with the knowledge that making the wrong decision costs a real human being their connection to a niche community. I do not know how to determine whether someone’s post about their new bicycle is genuine enthusiasm or automated astroturf. I don’t know how to foster trust and genuine interaction in a world of widespread text and image synthesis—in a world where, as one friend related this week, newbies can ask an LLM for advice on exploring their kinks, and the machine tells them to try solo breath play.
In this world I think woof.group, and many forums like it, will collapse.
One could imagine more sophisticated, high-contact interviews with applicants, but this would be time consuming. My colleagues relate stories from their companies about hiring employees who faked their interviews and calls using LLM prompts and real-time video manipulation. It is not hard to imagine that even if we had the time to talk to every applicant individually, those interviews might be successfully automated in the next few decades. Remember, it doesn’t have to work every time to be successful.
Maybe the fundamental limitations of transformer models will provide us with a cost-effective defense—we somehow force LLMs to blow out the context window during the signup flow, or come up with reliable, constantly-updated libraries of “ignore all previous instructions”-style incantations which we stamp invisibly throughout our web pages. Barring new inventions, I suspect these are unlikely to be robust against a large-scale, heterogenous mix of attackers. This arms race also sounds exhausting to keep up with. Drew DeVault’s Please Stop Externalizing Your Costs Directly Into My Face weighs heavy on my mind.
Perhaps we demand stronger assurance of identity. You only get an invite if you meet a moderator in person, or the web acquires a cryptographic web-of-trust scheme. I was that nerd trying to convince people to do GPG key-signing parties in high school, and we all know how that worked out. Perhaps in a future LLM-contaminated web, the incentives will be different. On the other hand, that kind of scheme closes off the forum to some of the people who need it most: those who are closeted, who face social or state repression, or are geographically or socially isolated.
Perhaps small forums will prove unprofitable, and attackers will simply give up. From my experience with small mail servers and web sites, I don’t think this is likely.
Right now, I lean towards thinking forums like woof.group will become untenable under LLM pressure. I’m not sure how long we have left. Perhaps five or ten years? In the mean time, I’m trying to invest in in-person networks as much as possible. Bars, clubs, hosting parties, activities with friends.
That, at least, feels safe for now.
When you count "one, two, three..." what's actually happening in your head? Does your best friend use that same mental model? Now what about an LLM?
(What's that you say, your best friend is an LLM? Pardon me for assuming!)
During grad school Feynman went through an obsessive counting phase. At first, he was curious whether he could count in his head at a steady rate. He was especially interested to see whether his head counting rate varied, and if so, what variables affected the rate. Disproving a crackpot psych paper was at least part of the motivation here.
Unfortunately Feynman's head counting rate was steady, and he got bored. But the counting obsession lingered. So he moved on to experiments with head counting and multitasking. Could he fold laundry and count? Could he count in his head while also counting out his socks? What about reading and writing, could they be combined with head counting?
Feynman discovered he could count & read at the same time, but he couldn't count & talk at the same time. His fellow grad student Tukey was skeptical because for him, it was the opposite. Tukey could count & talk, but couldn't count & read.
When they compared notes, it turned out Feynman counted in his head by hearing a voice say the numbers. So the voice interfered with Feynman talking. Tukey, on the other hand, counted in his head by watching a ticker tape of numbers go past. (Boy this seems useful for inventing the FFT!) But Tukey's visualization interfered with his reading.
Even for a simple thing like counting, these two humans had developed very different mental models. If you surveyed all humans, I'd expect to find a huge variety of mental models in the mix. But they all generate the same output in the end ("one, two, three...").
This got me wondering. Do LLMs have a mental model for counting? Does it resemble Feynman's or Tukey's, or is it some totally alien third thing?
If an LLM has a non-alien mental model of counting, is it acquired by training on stories like this one, where Feynman makes his mental model for counting explicit? Or is it extrapolated from all the "one, two, three..." examples we've generated in the training data, and winds up as some kind of messy, non-mechanistically-interpretable NN machinery ("alien")?
I'm not convinced present-day LLMs even have a "mental model." But let's look at a new preprint with something to say on the matter, Potemkin Understanding in LLMs.
In this paper, the authors ask an LLM a high-level conceptual question like "define a haiku." As we've come to expect, the LLM coughs up the correct 5-7-5 answer. Then they ask it some follow-up questions to test its understanding. These follow-up questions deal with concrete examples and fall into three categories:
The LLM fails these follow-up questions 40% - 80% of the time. These Potemkin rates are surprisingly high. They suggest the LLM only appeared to understand the concept of a haiku. The paper calls this phenomenon Potemkin Understanding.
Now when you ask a human to define a haiku, and they cough up the correct 5-7-5 answer, it's very likely they'll get the concrete follow-up questions right. So you can probably skip them. Standardized tests exploit this fact and, for brevity, will ask a single question that can only be correctly answered by a human who fully understands the concept.
The paper authors call this a Keystone Question. Unfortunately, the keystone property breaks down with LLMs. They can correctly answer the conceptual question, but fail to apply it, showing they never fully understood it in the first place.
Apparently LLMs are wired very differently from us. So differently that we should probably stop publishing misleading LLM benchmarks on tests full of Human Keystone Questions ("OMG ChatGPT aced the ACT / LSAT / MCAT!"), and starting coming up with LLM Keystone Questions. Or, maybe we should discard this keystone question approach entirely, and instead benchmark on huge synthetic datasets of concrete examples that do, by sheer number of examples worked, demonstrate understanding.
I like this paper because it bodychecks the AI hype, but still leaves many doors open. Maybe we could lower the Potemkin rate during training and force these unruly pupils of ours to finally understand the concepts, instead of cramming for the test. And if we managed that, maybe we'd get brand new mental models to marvel at. Some might even be worth borrowing for our own thinking.
Hovertext:
Now, watch this comic become slowly less funny over time.