Metadata 101: Understanding the basics of media metadata
Anastasia Kolobrodova
March 26, 2024
I have a Polaroid photo on my bookshelf. In it, a friend and I are standing in the sun in a slightly blurry courtyard. There’s no way for anyone but the two of us to recognize the context at a glance, but I know that the photo was taken in the afternoon in the center of Prague on a specific day in 2018. It’s the only copy that exists, and if I gave it to a journalist I’m sure they could track down the who/what/where/when/whys, but it would take some legwork.
I have another photo from later that month on my phone. It’s a close-up of my friend’s dog moping under a chair. The difference is that if I texted or emailed it to you right now, you would immediately be able to determine exactly when and where it was taken – as well as what kind of phone I had at the time.
This is because of the photo’s metadata.
Metadata is data about data. In short, this is anything that can be found in the file (timestamp, location info, camera type, and more) that isn’t the actual content (dog under a chair).
Most digital devices attach this type of information to every photo by default. While unintentionally sharing it can present a safety risk for many of us, for journalists it becomes a serious matter of source protection. In 2012, Vice inadvertently revealed the location of John McAfee, who was on the run from authorities in Belize seeking to question him in a murder investigation, by publishing a photo with its location metadata intact, allowing authorities to find him in Guatemala.
Luckily, it’s possible to easily remove metadata from your photographs.
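To make the idea concrete, here is a minimal Python sketch of what stripping a photo's metadata involves, assuming the photo is a JPEG. A JPEG file is a sequence of labeled segments. Exif data (timestamp, location, camera model) and XMP sit in "APP1" segments, IPTC data in "APP13", and free-text comments in "COM". The function name `strip_jpeg_metadata` and the choice of which segments to drop are illustrative assumptions, not a vetted tool, and the sketch ignores rarer edge cases such as padding bytes between segments.

```python
import struct

# Segment markers that carry metadata rather than image data:
# 0xE1 = APP1 (Exif, XMP), 0xED = APP13 (IPTC/Photoshop), 0xFE = COM (comments).
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG's bytes with its metadata segments removed."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG file")

    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG segment")
        marker = data[pos + 1]

        if marker == 0xDA:
            # Start of scan: everything from here on is compressed image data.
            out += data[pos:]
            break

        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        if marker not in METADATA_MARKERS:
            out += segment  # keep everything that is not metadata
        pos += 2 + length

    return bytes(out)
```

You could read a photo with `open(path, "rb").read()`, pass the bytes through the function, and write the result to a new file. For anything sensitive, check the output with a metadata viewer before sharing it, since this sketch only covers the most common places metadata lives.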
This isn’t the only type of basic metadata that you need to know about. Your Google Docs, Microsoft Word files, and PDFs are happy to reveal where they originated, who worked on them, and when.
An example of metadata from a Microsoft Word document. This is accessible by going to the File -> Properties menu path on any document. Image credit: Anastasia Kolobrodova (CC BY 4.0)
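The same properties can be read without opening Word at all. A .docx file is a ZIP archive, and its core properties (author, last editor, creation and modification times) are stored in an XML file named `docProps/core.xml` inside it. The following is a minimal Python sketch under that assumption. The function name `read_docx_properties` and the selection of fields are illustrative choices, and the sketch does not cover older .doc files, Google Docs, or PDFs.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a .docx file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly name -> namespaced tag for the fields this sketch reports.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_docx_properties(source):
    """Return a dict of core properties from a .docx path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))

    properties = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            properties[name] = element.text  # skip fields that are absent or empty
    return properties
```

Calling `read_docx_properties("leaked-memo.docx")` (a hypothetical file name) would return whichever of those four fields the document contains, which is a quick way to see what a file would reveal before you pass it along.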
While you can screenshot each page of these files to strip that metadata, there’s an easier solution for larger files. Freedom of the Press Foundation (FPF) maintains a tool called Dangerzone, which will not only remove most metadata from your documents but also make potentially dangerous files safe.
One thing to keep in mind about files, however, is that metadata isn’t always going to be digital. Printed or scanned documents can have printer dots or other nearly undetectable identifiers that can reveal the origin of the document – or even the source.
One high-profile example of this was the Reality Leigh Winner case in 2017. Winner leaked a document to The Intercept, which then shared a copy of the printed document with the National Security Agency. A visible crease showed that the document had been printed and carried out by hand, which helped the NSA narrow down who had printed it, identify Winner as the source, and led to her arrest. To keep yourself and your sources protected in particularly sensitive cases, the most responsible move for news organizations is to fully recreate the file in question before publishing it or showing it to another person.
While this article covers the basics of document metadata, there’s much more to explore. Want to learn more? Check out our more comprehensive guide: Everything you wanted to know about media metadata, but were afraid to ask. Our digital security training team is also always ready to support news organizations that want assistance learning how to handle sensitive files. If you are a journalist in need, reach out to us!