How secure are journalists’ favorite transcription tools?

Martin Shelton

Principal Researcher

Yael Grauer

Investigative Tech Reporter

Photo by Randarey. CC-BY-2.0

Journalistic work often depends on transcription services for creating written logs of recorded audio, from assisting in research to captioning videos to publication of interviews. But uploading audio to a transcription service means giving a copy of that — sometimes sensitive — recording over to a company. While there is no single service that meets all of our data privacy needs, here we unpack security and privacy practices for popular transcription services, weigh when journalists should use remote transcription services and explore how to minimize risk when working with sensitive audio.

In an informal poll, we heard from more than 50 journalists about their favorite transcription services and what they liked about each. Given the necessity of transcription, we emailed and researched the five most-referenced services (Descript, Otter.ai, Rev, Temi and Trint) and asked: How safe are these services, and what are they doing to protect your data and your private recordings and transcripts?

We wanted to better understand:

  • Does the platform support two-factor authentication to help you protect your data from password breaches;
  • Does the service have the ability to access your audio and/or transcripts;
  • Under what circumstances would an employee have access to your audio and/or transcripts;
  • Are there any third parties who also get access to your transcripts and audio;
  • Does the platform offer a transparency report to disclose whether or not they have received or complied with law enforcement requests for user data;
  • How does the company use encryption to protect this data?

Many of the security features of popular transcription services are nearly identical. They all use standard TLS encryption to protect your traffic to and from the website, as well as AES-256 encryption to store data safely on their Amazon Web Services servers. That means that your transcript is less likely to be compromised by an attacker outside of the organization. However, because the companies hold the encryption keys, each one has the technical ability to access the audio you’ve uploaded. Likewise, to provide these services, the companies have the technical ability to read the transcripts they create and store, and each has a different policy on when it’s necessary to do so.

Even if you’re OK with people inside the company, or those who are contracting for the company, accessing your data, you may feel differently about third-party requests for data. Unfortunately, none of the five services we looked at offers a transparency report, so there’s no way to know how frequently they receive law enforcement requests for user data or how often they comply. We would love to see all major transcription service providers join the growing list of organizations that provide a transparency report so we can include that information when weighing privacy concerns.

Descript

Descript says in its security documentation that while it has the technical capability, it commits to not look at users’ data except in certain circumstances, such as when processing specific computer-generated voices or when a user has requested a review from a customer service representative. Users can also opt in to share information to help improve the service.

Behind the scenes, Descript is using a small handful of services to process transcripts. Descript uses Google Cloud Speech-to-Text to provide automatic transcription. Google deletes your data from its servers after the transcription is completed. According to its documentation, Descript also uses Rev to provide automatic or human transcription. Descript says, “If you request a White Glove transcription, we will share your audio files with Rev, which has strict confidentiality agreements with all of its employees.” (See the Rev section for our notes on that agreement and its caveats.)

Descript offers a powerful feature called Overdub, which allows users to insert realistic computer-generated voices into the transcript. To accomplish this, Descript uses Google Cloud to process and reproduce your voice. Descript will generate “non-defamatory” samples of your voice, and human reviewers on Amazon’s Mechanical Turk will listen to this sample audio to confirm it sounds right. Descript says its employees may also review the uploaded audio, as well as the computer-generated output audio, for quality assurance.

Descript does not offer two-factor authentication.

Otter.ai

Like the other transcription services outlined here, Otter.ai encrypts its data on Amazon Web Services servers and holds the encryption keys. Its privacy policy suggests it uses this access to provide the service and to train its transcription artificial intelligence with collective “de-identified” audio recordings. Otter.ai’s privacy policy says, “Only with your explicit permission will we manually review certain audio recordings to further refine our model training data.”

Otter.ai tells Freedom of the Press Foundation in an email, “We do not sell or share your data with third parties, nor access your data without your explicit permission. You also have full control to delete your conversations. Deleting a conversation permanently deletes it from Otter's servers, and can't be undone.”

The company’s security white paper says only two administrators have access to its database “as required by their job function.” We asked Otter.ai for further clarification about job functions that could merit access to user data, but have not heard back as of publication.

Otter.ai’s security white paper suggests it does not rely on third party services to process audio or transcripts, only to store user data.

As of May 2022, Otter.ai enabled two-factor authentication for all accounts.

Rev

Rev offers both automatic and human transcription. According to Rev’s security documentation, employees are “restricted to handle data required to perform their job. Our staff is trained on proper use of our systems and best practices for security & privacy.”

However, the circumstances under which employees may access user data are unclear. We reached out to Rev for more details, but did not receive a response as of publication.

Rev’s security documentation suggests it does not rely on third parties to automate transcription, but instead relies on its more than 60,000 freelance manual transcriptionists known as Revvers.

Rev requires strict confidentiality agreements, and following a 2019 report by OneZero, Rev now prevents transcriptionists from downloading customer audio.

Rev and Otter.ai are the only two services in this group that offer two-factor authentication to all users.

Temi

Temi is an audio-to-text transcription service that uses advanced speech recognition software. Temi is operated by Rev, which is why the two services have virtually identical privacy policies and similar security properties. Temi does not appear to use any third party processing services. Unlike Rev, Temi does not offer human transcriptions and says on its website that, “Files are transcribed by machines and are never seen by a human.”

Temi does not offer two-factor authentication.

Trint

Trint is an AI-based transcription service for both audio and video files. It is popular with videographers because, among other features, it integrates with the video editor Adobe Premiere Pro.

Trint’s documentation is clear about its security measures. It does have the ability to decrypt users’ transcription and audio data, though it affirmatively commits to not doing so except in unusual cases, and only with written consent from a client.

Trint’s platform privacy policy says that it relies on MongoDB Atlas. While MongoDB’s employees can technically access data uploaded to Trint, the cloud database service has policies and controls to constrain such access to a small group of engineers, “only to ensure the quality of the service.”

Trint also uses a service called Transloadit, which helps upload and process files. Transloadit commits to store processed files for 24 hours before purging them, adding, “Transloadit employees only look at your files to troubleshoot problems. This rarely happens, and when it does, we do so with the understanding that anything we see is to be kept strictly confidential.”

Trint does not offer two-factor authentication.

Threat modeling: When is a remote transcription service appropriate?

We recommend avoiding transcription services altogether if your audio, in the wrong hands, could put people at risk. However, there are some situations where a transcription service is a necessary or easier choice, like when transcribing an interview that will be published in full. Because these services usually have access to your audio and transcripts, journalists must still make subjective decisions about when to share files with a transcription service.

Questions to consider before uploading:

  • Do you intend to publish this transcript somewhere public?
  • How sensitive is this audio/transcript?
  • Have you committed to keeping this audio/transcript confidential?
  • How comfortable are you with a human transcriptionist listening to the audio, as opposed to an automated speech recognition tool?
  • Some transcription services based on automated speech recognition (e.g., Trint) commit to not look at your transcriptions. Are these reasonable enough assurances for this particular audio/transcript?
  • Similarly, many services (e.g., Rev) that offer human transcriptionists also require those transcriptionists to sign confidentiality and/or non-disclosure agreements. Are these reasonable enough assurances for this particular audio/transcript?
  • Many transcription services depend on third parties to help process the transcript. Are the third parties who have access to the audio acceptable for this particular file?
  • Does your account with the service store any particularly sensitive audio or transcripts? What assurances do you have from the platform to keep your account secure?

If you are working with sensitive material (recordings that could put someone at risk if they were made public or turned over to authorities), we suggest severely limiting who has access to that data. If you are working with an editor or on a deadline and your recordings are particularly sensitive, consider pushing for help with hand transcription or for later deadlines, given the sensitivity of the recordings.

If you are working with material that’s sensitive until it’s published — embargoed research, for example — the risk of leaks is low and the outcome would be harmful, but not catastrophic. In this instance, using a service makes more sense, though you’d still want to use one with good privacy and security protections.

Consider using AI instead of human transcription. Although files can still be leaked through AI transcription services, you might feel more comfortable knowing a human doesn’t need to listen in.

Minimizing risk with transcription services

Transcription services can be an important part of the journalistic workflow, but the services available today may introduce risk.

Take a few steps to use transcription services safely:

  • Most popular transcription services do not offer two-factor authentication, meaning someone could log into the account with just a password and download your transcripts. If you or anyone on your team reuses passwords, an attacker who obtains a password from one breached website can try it on others. That’s why we always recommend using long, unique passwords, ideally randomized with a password manager. These tools are easy to use, and some are free, such as through the 1Password for Journalism program.
  • If you have access to a service that does use two-factor authentication, like Rev or Otter.ai, it’s time to turn it on. If your team has a shared organizational subscription to a transcription service, everyone on the team can still access the two-factor authentication codes. Read more about how to use Authy or a password manager such as 1Password to share these codes across all of your team’s devices.
  • After downloading transcription files safely to your device, consider deleting them from the transcription service so that they cannot be accessed if your account is ever breached. For those working within a news organization, consider reaching out to your general counsel to learn if you have any policies in place for retention of sensitive transcription data. Freelancers should consider a consistent personal retention policy for recordings and transcripts. You may also want to check with your editor and look for any retention language in your contract.
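
For readers curious about what is happening under the hood, here is a minimal Python sketch of two ideas from the list above: generating a long, randomized password, and why a team can share two-factor codes. It is an illustration under stated assumptions, not a replacement for a password manager or an authenticator app. The function names are our own, and we assume the service uses the standard time-based one-time password scheme (RFC 6238) that authenticator apps typically implement; we have not verified which scheme each transcription service uses.

```python
import base64
import hashlib
import hmac
import secrets
import string
import struct
import time
from typing import Optional


def generate_password(length: int = 24) -> str:
    """Return a random password of letters, digits, and punctuation.

    Uses the `secrets` module, which draws from the operating system's
    cryptographically secure random source.
    """
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))


def totp(shared_secret_b32: str, at_time: Optional[float] = None,
         digits: int = 6, step: int = 30) -> str:
    """Compute a time-based one-time code (RFC 6238, HMAC-SHA1).

    The code depends only on the shared secret and the current time window,
    so every device holding the same secret produces the same code.
    """
    key = base64.b32decode(shared_secret_b32.upper())
    now = time.time() if at_time is None else at_time
    counter = int(now // step)  # number of 30-second windows since the epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, per RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    print("Random password:", generate_password())
    # Hypothetical shared secret (the RFC 6238 test key, base32-encoded).
    demo_secret = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"
    print("Current code:", totp(demo_secret))
```

Because the one-time code is derived from the shared secret and the clock, two teammates who store the same secret will see the same code at the same moment. That also means the secret itself deserves the same care as a password.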

Can I automate transcriptions privately on my own device?

There are some alternative transcription tools that avoid sharing a transcript with a third party, but none are perfect and the quality can be inconsistent. Gentle can run on your own computer, and there are a number of “offline-friendly” transcription apps if you have a compatible device. Google Recorder only works on Google Pixel phones, while Ada Dictation only works on Apple devices. These are just a few examples; there are thousands of these tools out there, each with their own tradeoffs.

Whatever service you choose, glance through its security documentation to make sure there aren’t any unwanted surprises. In particular, look for policies surrounding the circumstances under which employees may access data and how the organization stores its data. If possible, opt for services that enable two-factor authentication. You can also reach out to our digital security team if you need help.
