From Fringe Theory to Documented Reality

The dead internet theory began as a fringe idea circulating on forums like 4chan and Agora Road’s Macintosh Café, where users suspected the web had been quietly hollowed out and replaced with automated noise. For years it was dismissed as paranoia. The rapid rise of generative AI has since moved the conversation from speculation to something observable.
The theory entered academic literature in 2023, when a book published by CRC Press included a definition of it in its glossary. In 2024, an opinion piece titled “Artificial influencers and the dead internet theory” appeared in the Curmudgeon Corner of AI & Society; in it, Yoshija Walter argued that the once-speculative theory had become observable with the arrival of AI-generated content.
In 2026, a publication in Computer magazine built on the earlier AI & Society article by distinguishing a “leaner” version of the dead internet theory, centered on the core evidence, from the “conspiracy-laden” full version. That core evidence, stripped of the more extreme claims, is compelling on its own.
The Numbers Behind the AI Content Flood

In November 2024, the volume of AI-generated articles published on the web surpassed the volume of human-written articles. That is not a projection; it has already happened. Graphite, an SEO research firm, published data showing that more than half of all written articles on the internet are now created by AI, and pinpointed November 2024 as the month AI-generated articles first overtook human-written ones.
The spike coincided with the release of ChatGPT: the share of AI-generated articles climbed from roughly 10 percent in late 2022 to over 40 percent by 2024 before settling into a steadier climb. A study by Ahrefs that analyzed 900,000 web pages newly created in April 2025 found that 74.2 percent of them contained AI-generated content. That pace of change, in under three years, is striking.
The proportion of AI-generated articles has plateaued since May 2024, and despite their prevalence on the web, these articles rarely appear prominently in Google or ChatGPT results. Still, the sheer volume of AI content in the broader web ecosystem is a structural fact, not a forecast.
Bot Traffic Has Now Crossed the Human Traffic Line

Bot activity overtook human traffic for the first time in 2024. According to Imperva’s 2025 Bad Bot Report, a global study of automated traffic on the internet, automated systems accounted for 51 percent of all web traffic in 2024. That figure represents a milestone: for the first time in a decade, more than half of all internet activity was not generated by a human being.
A February 2025 paper in the Asian Journal of Research in Computer Science described social platforms as “machine-driven ecosystems,” arguing that bots generate between 40 and 60 percent of web traffic. This shift is largely attributed to the rise of AI and large language models, which have simplified the creation and scaling of bots.
Even more concerning is that “bad bots,” those used for malicious purposes, now comprise almost a third of all traffic. The web’s automation problem is not just a matter of volume. It involves a meaningful slice of deliberately harmful activity running through the system every day.
Social Media Is Quietly Overrun

By one estimate, as many as 64 percent of accounts on X could be bots responsible for 76 percent of peak traffic. That is an extraordinary ratio, even accounting for methodological debate over how bots are counted. The same study estimated that as many as 95 million Instagram accounts, roughly 9.5 percent of the total, could be fake or automated.
A WIRED analysis of more than 274,000 Medium posts estimated that between 40 and 47 percent were likely AI-generated, and Medium’s CEO said AI posts were “up tenfold” compared to early 2024. Those figures reflect what is happening across writing platforms more broadly.
At its core, the dead internet theory claims that activity and content on the internet, including social media accounts, are predominantly created and automated by artificial intelligence agents, which can rapidly produce posts and AI-generated images designed to farm engagement on platforms such as Facebook, Instagram, and TikTok. The mechanics are now well documented.
Google Is Struggling to Keep Search Clean

In 2024, Google reported that its search results were being inundated with websites that “feel like they were created for search engines instead of people,” and acknowledged the role of generative AI in the rapid proliferation of such content, noting that it could displace more valuable human-made alternatives. That is a significant admission from the company that effectively controls how most people find information online.
Google’s March 2024 search update took aim at AI-generated “copycat content,” with the company expecting it to reduce unhelpful, low-quality content in results by 40 percent. Major updates in both March 2024 and February 2025 targeted low-quality content, and Google has continued issuing spam updates through 2025, including a significant rollout in August of that year.
The platform incentive structure never punished AI content at scale, which meant content farms could flood the web with AI-generated pages targeting long-tail keywords at near-zero cost. AI-generated content costs a fraction of human writing to produce, and Google’s own research found no consistent correlation between AI content and search ranking penalty. That combination created exactly the conditions for runaway automation.
The Rise of AI-Generated “News” Factories

NewsGuard tracked the rise of AI-generated “news” sites from 49 in May 2023 to 1,121 by November 2024 and 1,271 by May 2025, with the count continuing to grow. These are not fringe operations; many appear in mainstream search results.
These sites operate with a simple but effective business model: generate massive volumes of content using AI tools like ChatGPT or similar language models, optimize that content for search engines, and harvest advertising revenue when users click through from Google Discover or search results. Some operators have reportedly become millionaires through this scheme.
French investigative journalist Jean-Marc Manach uncovered a sprawling ecosystem of over 4,000 AI-generated news websites operating primarily in French, with at least 100 already appearing in English. What began as 70 suspicious sites in early 2024 metastasized into thousands. The scale of this problem is not limited to any one country or language.
Humans Can Barely Tell the Difference Anymore

A recent study found that humans can distinguish AI-generated text from human-written text roughly 53 percent of the time, just barely better than making a random guess, and this holds across all levels of education and expertise. That near-coin-flip accuracy has real consequences for how we process information.
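To put that 53 percent figure in perspective, a standard two-proportion power calculation shows how many labelled samples it would take to statistically distinguish such a judge from a coin flip. The significance level (5 percent) and power (80 percent) below are conventional assumptions for illustration, not values from the cited study:

```python
import math

# How many labelled text samples are needed to reliably tell a
# 53%-accurate judge apart from pure 50/50 guessing?
p_guess, p_human = 0.50, 0.53
z_alpha, z_beta = 1.96, 0.84   # 5% significance, 80% power (assumed)

numerator = (z_alpha * math.sqrt(p_guess * (1 - p_guess))
             + z_beta * math.sqrt(p_human * (1 - p_human))) ** 2
n = numerator / (p_human - p_guess) ** 2
print(f"samples needed: {math.ceil(n)}")  # on the order of 2,000+
```

The point of the arithmetic: an individual reader evaluating a handful of articles has effectively no chance of detecting a 3-point edge over guessing, which is why the 53 percent result translates to "indistinguishable in practice."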
In one healthcare-related study, professionals correctly identified whether abstracts were written by a human or generated by AI only 43 percent of the time on average, with accuracy ranging from 20 to 57 percent. Medical professionals reading research summaries in their own fields are essentially guessing. That is a sobering data point.
The practical implication is straightforward: the cues people have historically used to evaluate trustworthiness no longer work reliably. The broader concern is not that fewer people are online, but that automated activity is eroding the basic cues people use to tell who’s real. That erosion is now measurable and progressing.
The “Model Collapse” Problem Lurking Beneath the Surface

There is a deeper technical consequence to all of this AI content flooding the internet, one that reaches beyond what users see in their search results. Analysis published in Nature shows that indiscriminately training generative AI on real and generated content, usually done by scraping data from the internet, can lead to a collapse in the ability of the models to generate diverse high-quality output.
Research found that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear, and this effect can occur in variational autoencoders, Gaussian mixture models, and LLMs. In plain terms: AI trained on AI output gradually loses touch with the messy, diverse reality that made the original training data valuable.
When a new model is trained primarily or partially on AI-generated data, it inherits the biases, errors, and distortions embedded in those outputs. As this pattern continues over multiple generations of model training, the distribution of training data drifts away from real-world information, causing outputs to become less coherent and eventually collapse into gibberish or repetitive, low-value text. This is not a hypothetical future risk. It describes the direction the current training pipeline is moving.
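A minimal simulation makes the tail-loss mechanism concrete. The sketch below is an illustrative toy, using a one-dimensional Gaussian in place of a real generative model: each generation fits a distribution to the previous generation's output and resamples from the fit, and the estimated spread shrinks over time, mirroring the disappearing tails described above:

```python
import random
import statistics

random.seed(0)

def fit_and_resample(samples):
    # "Train" on the previous generation: fit a Gaussian by maximum
    # likelihood, then emit a synthetic dataset of the same size.
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)  # MLE standard deviation
    return [random.gauss(mu, sigma) for _ in samples]

# Generation 0: "real" data drawn from a standard normal.
data = [random.gauss(0.0, 1.0) for _ in range(50)]
history = [statistics.pstdev(data)]

# Each later generation trains only on the previous one's output.
for _ in range(300):
    data = fit_and_resample(data)
    history.append(statistics.pstdev(data))

print(f"spread at generation 0:   {history[0]:.4f}")
print(f"spread at generation 300: {history[-1]:.4f}")
```

Each resampling step loses a little of the distribution's variance on average, and the loss compounds: the tails vanish first, then the output narrows toward a near-constant value. That is the qualitative pattern, not a model of any specific system, but it matches the direction the Nature analysis reports for real models.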
The Political and Social Stakes

Bots on social media have been found to be significantly involved in disseminating articles from unreliable sources, with accounts with high numbers of followers legitimizing misinformation and leading real users to believe, engage with, and reshare bot-posted content. This is not a new problem, but AI has made it dramatically cheaper to operate at scale.
AI-generated posts receive inflated engagement from bot networks, which in turn triggers platform algorithms to prioritize them, creating a self-reinforcing loop in which synthetic signals create the illusion of widespread political support. Visibility online is no longer necessarily a signal of genuine human interest. It can be manufactured.
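The loop described above is easy to sketch. In the toy model below, every number is invented for the example (the ranking rule, the size of the bot boost, and the attention curve are not taken from any real platform): a bot network gives synthetic posts a small constant engagement boost, the feed ranks posts by engagement, and human attention follows rank:

```python
# Toy feedback loop: bot engagement -> higher rank -> more human
# attention -> even higher rank. All parameters are illustrative.
posts = [{"synthetic": i < 10, "engagement": 1.0} for i in range(100)]

for _ in range(50):
    for p in posts:
        if p["synthetic"]:
            p["engagement"] += 2.0          # bot-network boost
    # The feed ranks purely by accumulated engagement.
    posts.sort(key=lambda p: p["engagement"], reverse=True)
    for rank, p in enumerate(posts):
        # Human attention follows the ranking: top slots get the most.
        p["engagement"] += 10.0 / (rank + 1)

share = sum(p["synthetic"] for p in posts[:10]) / 10
print(f"synthetic share of top-10 feed slots: {share:.0%}")  # prints 100%
```

Even though synthetic posts start level with everyone else and make up only 10 percent of the pool, the constant bot boost is enough to capture the top slots, after which the attention curve does the rest. That is the "manufactured visibility" dynamic in miniature.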
Studies of the 2024 election campaigns found that, despite Russia’s persistent interference on X, there were no measurable changes in attitudes, polarization, or voting behavior, and Meta reported that AI-generated misinformation had a “modest and limited” impact. This suggests we may be overestimating AI’s direct persuasive power while underestimating its capacity to corrode institutional trust. The slow erosion of trust may be the more consequential effect.
What Authentic Content Actually Looks Like Now

Researchers hypothesize that the plateau in AI-generated article growth is because practitioners found that AI-generated articles do not perform well in search. Large-scale studies show that although AI articles are published in huge volumes, the top-ranking pages tend to be human-written or heavily human-edited because they offer more originality and value. There is still a meaningful gap at the quality frontier.
A growing number of blockchain projects, including World (formerly Worldcoin) and Human Passport, are rolling out proof-of-personhood systems meant to tie online activity to a verified human. Proving you are real online has gone from philosophical curiosity to practical infrastructure challenge.
Legislation is slowly catching up. The EU AI Act, particularly Article 50, introduces transparency obligations that require users to be informed when they are interacting with AI systems, a step that is inspiring similar regulatory frameworks globally, from China to South Korea and Brazil. Regulation alone will not reverse the content flood, but it does begin to set expectations around disclosure.
What This Means for Anyone Using the Web

Google’s response, serving AI-generated summaries via AI Overviews rather than linking to sources, has created a feedback loop where AI content generates AI summaries that further displace human publishers. The architecture of how search works is itself reinforcing the synthetic content cycle. As of August 2025, roughly 10 percent of sources cited inside Google’s AI Overviews were themselves AI-generated.
As the web tilts toward AI-generated content, future models trained on it will reflect a degraded, homogenized signal. Human insight, the original data source, is being diluted at the input layer of AI development. That is the longer-term structural problem that makes the current moment more consequential than a simple spam problem.
Human expertise, originality, firsthand experience, and well-sourced data are ranking signals that AI systems cannot cheaply mimic. AI can generate words, images, and videos, but it cannot replicate lived experience, strategic thinking, or ethical insight. That remains true, and it may be the most useful thing to hold onto as this landscape continues to shift.
Conclusion: Reading the Web More Carefully

The “Dead Web” is not a metaphor for total collapse. The internet still carries real human voices, genuine research, and authentic community. What has changed is the signal-to-noise ratio, and that change is now large enough to show up clearly in independent data from multiple sources, from Ahrefs to Graphite to Imperva.
The practical response is less about alarm and more about recalibrating habits. Source transparency, author bylines, verifiable expertise, and original reporting matter more now than they did three years ago, not because they were always ignored, but because the volume of synthetic content surrounding them has grown so dramatically.
The deeper question is not whether humans can outpace bots in raw content volume. They clearly cannot. The question is whether the parts of the web that still carry genuine human thought remain findable, trusted, and worth protecting. That is less a technical challenge than a collective choice about what we decide to value.

