Gaining Visibility in the Algorithm-Based Internet by Reclaiming Your Space
If you are writing, one of your primary objectives is to be seen and read—two distinct but interconnected goals. Visibility helps your work reach a wider audience, which in turn means more readers engaging with your content. However, in today’s Internet landscape, visibility is shaped by algorithms on social networks, search engines, and advertising networks. These systems often prioritize content that aligns with their business interests rather than content that is genuinely good or relevant.
The traditional bookstore model offers an insightful comparison. In the past, a bookseller might carefully curate their inventory based on relationships with publishers or their own judgment of quality and market interest. As a reader, you could walk into a bookshop, browse the shelves, or explore a table displaying new releases. Often, a thoughtful recommendation from the bookseller could guide your choice. If a book sparked your curiosity, you might purchase it and, if it resonated, spend time reading it and reflecting on what you had read.
In the algorithmic age, or perhaps more accurately, the digital advertising age, the human touch of discovery has been replaced by calculated exposure. Content is surfaced not by serendipity or personal curation but by systems optimised to serve business objectives, often prioritizing engagement metrics over quality.
We often hear that to stand out in this environment, authors or writers must adopt a strategic approach, blending authenticity with a deep understanding of how algorithms operate. To be honest, that’s complete bullshit. The content that is most visible today is rarely the most authentic or useful. It’s simply the content that can generate clicks and capture brain time for advertisers.
I recently rediscovered an old backup of my web server logs from the early 2000s and compared them with the current logs from 2024. The differences are striking. Back then, around 80% of the accesses came from actual users opening URLs in their browsers, while the remaining 20% was machine crawling or other noise.
Today (2024), noise accounts for 90% of the accesses to my blog posts. This noise primarily comes from:
- Trending bots trying to determine whether a link is interesting enough to feed into their algorithms and notify customers that the information is relevant. Companies like Hootsuite, Zoho, Ahrefs/Yep and many others operate in this space—though I suspect they rarely, if ever, actually read the content. It seems more about branding trends than genuine reader engagement.
- Aggressive crawling bots, often associated with AI or linguistic projects, whose user-agent strings provide vague or meaningless explanations of their purpose. Companies like Facebook and Amazon are among the most common in this space. However, it’s clear that your writing will never actually appear on their platforms—it’s simply harvested to feed their data-gathering efforts and refine their own content.
- Search engine bots, like Google’s, which crawl your pages and might include them in their index weeks or months later. However, don’t expect to appear in the top ten results unless your content is an exact match—and even then, if you’re not part of their advertising network, you’re likely out. On a side note, the SEO industry has effectively destroyed and fucked up the organic nature of the Internet.
- Security bots and brute-force attempts, which are increasingly difficult to differentiate from legitimate activity. Many security vendors operate across multiple domains, including exploiting or reselling their data feeds for purposes beyond simply notifying website owners of vulnerabilities.
- Mastodon “pre-fetching” URLs, which provides no indication of how many people actually read the content. The overhead can be significant, especially if your post is boosted by an account with many followers, since each Fediverse server that sees the post performs its own GET request. Out of curiosity, I analysed a specific URL that was shared only on the Fediverse: it received approximately 34,000 GET requests, yet this translated to just two actual users reading the content. A minimal sketch of this kind of measurement follows the list.
- Pre-fetching bots designed to display your content within ad-supported interfaces. This practice is increasingly common, with bots often testing how the content fits different devices. Unsurprisingly, this doesn’t benefit the content creator in any meaningful way. Instead, it’s aimed at keeping users within the ad-supported application, preventing them from venturing into the broader Internet beyond the confines of these advertising-driven ecosystems.
The remaining entries in the logs, less than 10% of the total, represent actual users browsing and reading the content (and that’s without delving into questions about attention span or whether the content is truly read beyond the title). Even this 10% is an optimistic estimate based on this quick analysis.
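A classification of this kind can be approximated by grouping log entries by user-agent substrings. The sketch below shows one way to do it, reusing the same crude parsing as above; the bucket-to-substring mapping is an assumption based on user-agents commonly seen in such logs, not the exact heuristics behind the numbers above, and anything that matches no bucket is optimistically counted as a human reader.

```python
# Bucket access-log entries by user-agent substrings and print rough percentages.
# The mapping below is an assumption drawn from user-agents commonly seen in such
# logs; anything that matches no bucket is optimistically counted as a human reader.
from collections import Counter

LOG_FILE = "access.log"  # placeholder path

BUCKETS = {
    "trending/SEO bots": ("Hootsuite", "AhrefsBot", "Zoho"),
    "AI / data-gathering crawlers": ("facebookexternalhit", "Amazonbot", "GPTBot", "CCBot"),
    "search engine bots": ("Googlebot", "bingbot", "DuckDuckBot"),
    "security scanners": ("Censys", "zgrab", "Nmap Scripting Engine"),
    "fediverse pre-fetch": ("Mastodon", "Pleroma", "http.rb"),
}

counts = Counter()
total = 0
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        fields = line.split('"')
        if len(fields) < 6:
            continue  # not a combined-format line
        user_agent = fields[5]
        total += 1
        for bucket, hints in BUCKETS.items():
            if any(hint in user_agent for hint in hints):
                counts[bucket] += 1
                break
        else:
            counts["presumed human readers"] += 1

for bucket, hits in counts.most_common():
    print(f"{bucket:30s} {hits:8d}  ({100 * hits / total:.1f}%)")
```

Substring matching obviously misclassifies bots that spoof browser user-agents, which is one more reason to treat the 10% figure as optimistic.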
How did we go from an Internet where 80% of the traffic came from genuine readers to a mere 10%, with the rest being noise? Have we really spent the last twenty years investing in Internet routing, bandwidth, server storage, hardware, and energy only to support access via advertising-gated platforms? In doing so, we’ve limited the original promise of the Internet: providing diversity and access to a wealth of knowledge.
What would be the strategy to return to the original promise of the Internet? Blogrolls, random access to blog posts or websites, RSS feeds, or perhaps running your own server? I’m still exploring, but I’m trying to contribute my part to that original vision.
We’ve set up a book club aligned with this promise, called le sillon fictionnel. The site is now one year old, and we have 34 followers on Mastodon. According to the logs, we have some actual readers, despite 95% of the traffic being noise generated by bots.