To help journalists better understand the risks of artificial intelligence and decide which AI systems and tools make sense to use, adjust, or avoid, we’ve put together a series of guides on the most common ways you, and your sensitive data, are likely to interact with AI.
In the first part of this series, we outlined safety considerations for using stand-alone AI tools like dedicated chatbots and transcription apps.
Next up: AI enhancements to existing tools.
It is becoming increasingly common to find AI features integrated directly into the platforms and software used by journalists and newsrooms. Examples of this trend include Gemini sprinkled throughout Google Workspace and generative AI features popping up across Adobe’s Creative Cloud suite.
Whether you are using these tools to help draft a story outline or to edit photos and videos, such enhancements can offer convenience, but they also complicate the privacy picture compared with stand-alone AI tools.
Assuming you are comfortable with the service provider (e.g., Google or Adobe) having access to your data in the first place, the main risk these integrations present is that more of your data can be exposed to the underlying AI model(s) with less direct awareness on your part.
For instance, while submitting a prompt to a chatbot like Claude is a deliberate action that requires active effort to share data, the AI enhancements built into existing tools often offer more subtle ways (like the click of a button) to expose wider swathes of data (like the text of an entire Google Doc) to AI analysis.
Given this increased exposure, the key question becomes whether all this additional data (e.g., the Google Doc being summarized by Gemini within Google Drive) is used to train the underlying AI model.
Why care about whether your data is used to train the AI? While the risk is relatively rare, AI systems can accidentally expose (or be tricked by an attacker into exposing) content — including prompts and chat histories — that’s been previously used to train the system. When it comes to highly sensitive data like notes from an off-the-record conversation or source-identifying details in an initial draft, that risk of future exposure is probably not one worth taking.
Unfortunately, there is no uniform answer as to whether platforms are using data to train their underlying AI models, and the devil often lies in the details of convoluted privacy policies, which are subject to change.
Take Gemini in Google’s apps as one example. Many journalists use both paid Google Workspace (through work accounts) and the free versions of Google’s apps (through personal accounts), often interchangeably, without realizing that the two are governed by different privacy policies. Data exposed to Gemini under paid Workspace plans does not appear to be used to train Google’s AI models. The picture changes, however, for personal Google accounts.
According to Google’s documentation, “if you’re a Workspace user with a personal account and you choose to share data, including your Workspace data, with Gemini Apps or Search services… or through screen actions (including screenshots), this data… may be used for model training and improvement.” If this seems unclear and contradictory to you, you’re not alone!
Google’s AI privacy policy is far from the only one to cause confusion. Adobe sparked significant backlash a couple of years ago when an update to its terms of service led users to believe that the company was using customer projects to train its internal AI. To its credit, Adobe clarified the changes and has since emphasized that it does not train its Firefly generative AI service on any Creative Cloud subscribers’ personal content.
While Adobe’s approach to keeping customers’ personal content out of AI training data ended up in a solid place, the episode highlights that companies’ privacy policies and terms of service are rarely static. So even in the best of circumstances, it’s important to keep an eye out for notifications about policy changes.
As a general rule, if you’re concerned about certain data being used to train AI models for the reasons described above, then it’s best to keep it out of any services whose AI features are difficult to disable or avoid, and whose privacy policies don’t clearly state that such data is not being used to train their AI.
To read about additional risks and mitigations in other popular AI tools, take a look at the other pieces in our series on stand-alone AI tools, and on AI agents and operating system integrations.