Google Search now trains AI on your media uploads: how to opt out

Google has quietly expanded the scope of data collection for training its artificial intelligence models. Following a recent update, Google Search history now retains media that users upload during their interactions, such as images submitted for reverse image searches, and repurposes it to improve the company’s AI systems. The setting is enabled by default for all accounts and stays on until the user deliberately turns it off.

The silent update

The change extends Google Search history, which already logged queries and visited links. Now the multimedia files users submit to the search engine also become part of the training dataset. Google has not disclosed technical details about which models are trained or how the data is anonymized, but the move fits the wider race to acquire diverse data for large language models (LLMs) and multimodal systems. For organizations that use cloud services like Google Workspace or rely on search APIs, the implication is immediate: their content could inadvertently contribute to models that competitors or external actors may later use.

How to opt out of data usage

The opt-out process is not complicated, but it does require digging into account settings. Typically, users can go to “Data & privacy,” locate the search history settings, and disable the option that allows data to be used for AI model improvement. The company warns that opting out may reduce personalization of services, but for those who care about confidentiality, it’s an acceptable trade-off. Transparency remains partial: it is unclear how long media files are retained, and whether opting out also deletes data that has already been collected.

Why businesses need to raise their guard

For organizations handling intellectual property, healthcare data, or trade secrets, Google’s new policy poses a concrete risk. Imagine an employee mistakenly uploading an image of an unannounced product for a reverse image search: unless the setting has been disabled, that image could end up in a training dataset, with potential legal and competitive repercussions. The issue goes beyond a single feature. It signals that mainstream cloud services are shifting toward a model where user data becomes raw material for AI, often without explicit, granular consent. The General Data Protection Regulation (GDPR) imposes strict constraints on how personal data is processed, but the opacity of these usage practices makes it hard for companies to prove compliance.

On-premise and data sovereignty: the AI-RADAR perspective

This is where on-premise deployment gains strategic importance. Running LLMs and AI systems entirely on in-house infrastructure, whether an on-site GPU-equipped server or an air-gapped cluster, keeps data inside the corporate perimeter. There’s no need to trust third-party opt-out policies or accept privacy compromises in exchange for services. Of course, self-hosting requires hardware investments, in-house expertise, and a careful evaluation of total cost of ownership (TCO). But for organizations in regulated sectors, or those that treat data sovereignty as a competitive pillar, refusing to hand data over to external providers for training becomes an enabler, not a burden. AI-RADAR will continue to explore frameworks and architectures for on-premise deployment, equipping businesses with the analytical tools needed to weigh these trade-offs.