Why isn’t Twitter working? How Elon Musk finally broke his site – and why the internet might be about to get worse

2023-07-04 15:49

It started like any other outage: unexplained error messages that told users they had hit their “rate limit”, and Twitter posts refusing to load. But as the weekend progressed, it became clear that these weren’t just any old technical problems, but rather issues that could define the future not only of Twitter but of the internet.

Elon Musk took to Twitter on Saturday and announced that he would be introducing a range of changes “to address extreme levels of data scraping [and] system manipulation”. Users would only be able to see a limited number of posts, and those who are not logged in wouldn’t be able to see the site at all.

That decision triggered those error messages, since users were hitting the “rate limit” that meant they were requesting too many posts for Twitter to be able to handle. The new limits – apparently temporary, though still in effect – meant that users were being rationed on how many tweets they were able to see, and would see frustrating and unexplained messages when they actually hit that limit.
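For readers curious how such rationing works in practice, the idea of a rate limit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ReadLimiter`, the fixed-window approach and the numbers used are all hypothetical, and nothing here reflects how Twitter's own limiter is built.

```python
import time
from collections import defaultdict


class ReadLimiter:
    """Hypothetical fixed-window limiter: each user may read `limit` posts per window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be checked without waiting
        # user id -> [window start time, posts read in that window]
        self.state = defaultdict(lambda: [None, 0])

    def allow(self, user_id):
        """Return True if this user may load one more post, False if they are over the limit."""
        now = self.clock()
        entry = self.state[user_id]
        # Start a fresh window on first use or once the old one has expired.
        if entry[0] is None or now - entry[0] >= self.window:
            entry[0] = now
            entry[1] = 0
        if entry[1] >= self.limit:
            # This is the point at which a user would see a "rate limit" error.
            return False
        entry[1] += 1
        return True


if __name__ == "__main__":
    limiter = ReadLimiter(limit=3, window_seconds=60)  # illustrative numbers only
    for i in range(5):
        print(i, limiter.allow("reader"))
```

Each user gets their own counter, so one heavy reader (or scraper) hitting the cap does not block anyone else, and the counter resets once the window has passed.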

In many ways it was yet another perplexing and worrying decision by Mr Musk, whose stewardship of Twitter has lurched from scandal to scandal since he took over the company in October last year. (He appointed a chief executive, Linda Yaccarino, last month, but is still seemingly deciding, executing and communicating the company’s strategy.)

But something seems different about the chaos this time around. For one, it is not one of the many content policy issues or potentially hostile ways of encouraging people to sign up for Twitter’s premium service that have marked Mr Musk’s leadership of Twitter so far; for another, it seems to be part of a broader issue that is rattling the whole internet, and of which Twitter might be only one symptom.

It remains unclear whether Mr Musk’s latest decision really has anything to do with scraping by artificial intelligence systems, as he claimed. But the explanation certainly makes sense: AI systems require vast corpuses of text and images to be trained on, and the companies that make them have built those corpuses by scraping and regurgitating the text that can be easily found across the web.

Every time someone wants to load a web page, their computer makes a request to that company’s servers, which then provide the data that can be reconstructed on the user’s web browser. If you want to load Elon Musk’s Twitter account, for instance, you direct your browser to the relevant address and it will show his Twitter posts, pulled down from the internet.

That comes with costs, of course, including the price of running those servers and the bandwidth required to be sending vast amounts of data quickly across the internet. For the most part on the modern internet, that cost has been covered by also sending along some advertising, or requiring that people sign up for a subscription to see the content they are asking for.

AI companies that are scraping those sites, however, make requests for that data frequently and quickly. And since the system is automated, they are not able to look at ads or pay for subscriptions, meaning that companies are not paid for the content they are providing.

That issue looks to be growing across the internet. Companies that host text discussions, such as Twitter, are very aware that they might be serving up the same data that could one day render them obsolete, and are keen to at least make some money from that process.

It also looks to be part of the reason behind the recent fallout on Reddit. That site is especially useful for feeding to an AI – it includes very human and very helpful answers to the kinds of questions that users might ask an AI system – and the company is very aware that it is, once again, giving up the information that might also be used to overtake it.

To try and solve that, it recently announced that it would be charging large amounts for access to its API, which serves as the interface through which automated systems can hoover up that data. It was at least partly intended as a way to generate money from those AI companies, though it also had the effect of making it too expensive for third-party Reddit clients – which also rely on that API – to keep running, and the most popular ones have since shut down.

There is good reason to think that this will keep happening. The web is increasingly being hoovered up by the same AI systems that will eventually be used to further degrade the experience of using it: Twitter is, in effect, being used to train the same bots that will one day post misleading and annoying messages all over Twitter.

Every website that hosts text, images or video could face the same problems, as AI companies look to build up their datasets and train up their systems. As such, all of the internet could become more like Mr Musk’s Twitter did over the weekend: actively hostile to actual users, as it attempts to keep the fake users away.

But it is just as likely that Mr Musk’s explanation for why the site went down simply chimes conveniently with the zeitgeist, and helpfully shifts blame to the AI companies that he has already voiced significant skepticism about. The truth may be that Twitter – which has fired the vast majority of its staff, including those in its engineering teams – might finally be running into the infrastructure problems that happen when fewer people are around to keep the site online.

Twitter’s former head of trust and safety, Yoel Roth, is perhaps the best qualified person to suggest that is the case. He said that Mr Musk’s argument for the new limits “doesn’t pass the sniff test” and instead suggested that it was the result of someone mistakenly breaking the rate limiter and then having that accident passed off by Mr Musk as being intentional, whether he knows that or not.

“For anyone keeping track, this isn’t even the first time they’ve completely broken the site by bumbling around in the rate limiter,” Mr Roth wrote on Twitter rival Bluesky. “There’s a reason the limiter was one of the most locked down internal tools. Futzing around with rate limits is probably the easiest way to break Twitter.”

Mr Roth also said that Twitter has long been aware that it was being scraped – and that it was OK with it. He called it the “open secret of Twitter data access” and said the company considered it “fine”.

And he too suggested that the events of the weekend could be a hint about what is coming to the internet, offering an entirely different alternative. It’s not Twitter, Reddit and other companies who should really be upset about what is going on, he suggested.

“There’s some legitimacy to Twitter and Reddit being upset with AI companies for slurping up social data gratis in order to train commercially lucrative models,” Mr Roth said.

“But they should never forget that it’s not *their* data — it’s ours. A solution to parasitic AI needs to be user-centric, not profit-centric.”
