Tech
6 min

404news
8/13/2025
Reddit will restrict the Internet Archive's Wayback Machine access, citing data scraping by AI companies

Reddit has announced that it will restrict the Internet Archive's Wayback Machine from crawling most of its platform. The decision comes after Reddit found that AI companies were scraping its data from the Wayback Machine, which it says violates platform policies and compromises user privacy. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; its access will be limited to the Reddit.com homepage. In practice, the Internet Archive will be able to record only which news headlines and posts were most popular on a given day, not a comprehensive record of Reddit's content.

The Internet Archive's mission is to build a digital archive of websites and other cultural artifacts, and the Wayback Machine lets people browse pages as they appeared on specific dates. Reddit, however, believes that not all of its content should be archived this way, particularly sensitive user data. According to Reddit spokesperson Tim Rathschmidt, the company has become aware of instances where AI companies scraped Reddit data from the Wayback Machine in violation of platform policies, including those that protect user privacy. Reddit has therefore decided to limit the Internet Archive's access to its data until the Internet Archive can defend its site and comply with platform policies.

The restrictions will begin rolling out immediately, and Reddit informed the Internet Archive of the changes in advance. This is not the first time Reddit has cut off access to scrapers: it previously blocked major search engines from crawling its data unless they paid for the privilege. Last year, Reddit struck a deal with Google covering both search and AI training data, and later blocked other search engines from accessing its platform. The company also changed its API in a way that forced some third-party apps to shut down, citing abuse by AI companies as the reason. Reddit has an AI deal with OpenAI as well, but it is currently suing Anthropic, which it accuses of continuing to scrape its data despite claims to the contrary.

Reddit's decision highlights the ongoing tension between preserving online content and protecting user data. The Internet Archive's Mark Graham has stated that the organization has a longstanding relationship with Reddit and is engaged in ongoing discussions about the matter. As the use of AI grows, companies like Reddit face increasing pressure to balance providing data for AI training against protecting user privacy and preventing unauthorized scraping. The decision shows that companies are taking steps to assert control over their data and how it is used.

The move also raises questions about the Internet Archive's role in preserving online content. The Internet Archive aims to create a comprehensive digital archive of the internet, but not all companies are comfortable with their data being preserved in this way. More companies are likely to limit access to their data as the online landscape evolves, and the Internet Archive will need to navigate these changing attitudes to continue its mission.

In conclusion, Reddit's decision to restrict the Wayback Machine's access shows how hard it is to balance data preservation, user privacy, and responsible AI use. The Internet Archive must adapt to platforms that want tighter control over their data, while companies like Reddit must find ways to provide data for AI training without exposing users to unauthorized scraping.

Community Journalism

This article was written by 404news, a verified contributor to the Crene community.
