When you are reading the news, it is reading you back. According to new research out of the University of Pennsylvania, visiting news websites exposes you to more than twice as much tracking software as the rest of the web.
Researchers Tim Libert and Victor Pickard used open-source software to analyze Alexa’s top 100,000 websites, 2,000 of which were news-related. "A visitor to The New York Times' homepage is potentially connected to a whopping 44 third-party servers," the researchers report, "while visitors to the Los Angeles Times' website get their browsing history leaked to 32 external servers."
Based on my own analysis of news websites, those numbers actually seem low. I've regularly seen between 60 and 80 trackers on news websites. Just over a year ago I wrote about the need for the news industry to have a real debate about reader privacy. Over the past year we have seen some movement on this front, and this month market research group Forrester Research predicted that 2016 could be a tipping point for online privacy, with more people than ever demanding greater protections from the apps and services they use.
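To make the idea of "third-party servers" concrete, here is a minimal sketch of how such a count can be made from the list of URLs a browser requests while loading one page. This is not the researchers' software. The hostnames are made up, the function names are my own, and the "last two labels" domain comparison is a simplifying assumption (real tools consult the Public Suffix List).

```python
from urllib.parse import urlsplit


def registrable_domain(host: str) -> str:
    """Naive approximation: keep the last two labels of the hostname.

    Assumption for illustration only; real tools consult the Public Suffix
    List so that hosts like example.co.uk are handled correctly.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def third_party_hosts(page_url: str, request_urls: list[str]) -> set[str]:
    """Return the distinct hosts contacted that are outside the page's own domain."""
    first_party = registrable_domain(urlsplit(page_url).hostname or "")
    hosts = set()
    for url in request_urls:
        host = urlsplit(url).hostname
        if host and registrable_domain(host) != first_party:
            hosts.add(host.lower())
    return hosts


# Hypothetical request log for a single page load (made-up hostnames).
requests_seen = [
    "https://www.news-site.example/story.html",
    "https://static.news-site.example/app.js",
    "https://tracker.ads.example/pixel.gif",
    "https://tracker.ads.example/sync?id=1",
    "https://cdn.metrics.test/beacon.js",
]
found = third_party_hosts("https://www.news-site.example/", requests_seen)
print(sorted(found))
```

Repeated requests to the same outside host count once, which is why a figure like "44 third-party servers" describes distinct servers rather than total requests.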
The Ethics of Reader Privacy
This isn't just a business issue; it is an ethical issue about how we relate to the communities we serve. And for readers, it’s about much more than agreeing to view ads in exchange for free content. Libert and Pickard agree, writing that publishers have to "consider the ethics of tracking users and their outsize role in widely reviled annoyances such as increasing page load times, invading privacy, sucking up data on limited plans and imposing distracting animations and sounds on the viewer."
Recent events remind us that addressing reader privacy isn't just about data collection; it is also about the other risks that trackers expose users to. Just last week The Economist notified its readers that a prominent ad-blocker blocker called Pagefair had exposed them to malware. The Economist was one of about 500 publishers affected by the breach. In February, hackers used ads on Forbes' website to distribute malware.
Journalists ascribe incredible importance to protecting sources, and rightfully so. But today we have to prioritize protecting our audiences too.
In a July post at the Columbia Journalism Review, the executive director of the Committee to Protect Journalists, Joel Simon, argued that "news organizations don’t worry enough about [...] keeping the identity of their readers secret." He continues:
In an era when electronic spycraft is rampant, people who go to a website looking for news can unwittingly endanger themselves just by clicking on a story or video. Governments that know who is accessing specific information can intrude in a variety of ways—by blocking or censoring the story or by targeting individuals who access prohibited information for harassment or even legal action.
Simon's piece harkens back to debates in the last decade over the Patriot Act's "library provision," which was fought fiercely by librarians across the country. Librarians have long fought for the privacy of their users. The American Library Association even has a formal statement on the "Freedom to Read." It is time for journalists to follow their lead.
Transparency and Informed Consent
If you want to know what kind of data news websites are collecting about your browsing, you have to install special software similar to what the University of Pennsylvania researchers used. For a long time there has been little or no transparency about these systems, or why newsrooms have adopted them. However, that might be beginning to change.
This week The Intercept published a blog post about the new analytics software which they'll be incorporating into their website in the coming weeks. "We thought it was important to describe this system — and its privacy implications and safeguards — to you in a transparent fashion," wrote Ryan Tate and Betsy Reed, the site's deputy editor and editor-in-chief respectively. The statement is so unique, and refreshing, it is worth quoting at length:
The biggest challenge we faced in adopting a new audience measurement system was preserving reader privacy; modern analytics tools virtually always come from outside vendors who become intimate third parties in the relationship between publishers and readers. It was important to us to try and rebalance this relationship in favor of the reader. Since launching a little over a year and a half ago, The Intercept has always coupled its drive to expose information closely held by the powerful with efforts to protect data that rightfully belongs to our readers. That’s why we serve all our content over well-encrypted “HTTPS” web connections and why in April we became only the third internet service, behind Facebook and Blockchain.info, to allow people to contact us over HTTPS-encrypted connections to the anonymity network Tor.
To address these concerns, The Intercept worked with analytics firm Parse.ly to customize a solution that strips away identifying information like a reader's IP address, does not store geolocation data of visitors to the site, and avoids cookies that could track readers across other news sites that also use Parse.ly analytics. Together, The Intercept and Parse.ly are illustrating how publishers can stand up for their users and chart a more balanced approach.
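As a rough illustration of what "stripping away identifying information" can look like in practice, here is a small sketch that drops identifying fields from a page-view record before it is stored. It is not Parse.ly's or The Intercept's actual implementation. The function name, the field names and the sample record are all hypothetical.

```python
def scrub_event(event: dict) -> dict:
    """Return a copy of a page-view event with identifying fields removed."""
    # Hypothetical field names; a real analytics payload will differ.
    identifying = {"ip", "geo", "cross_site_cookie"}
    return {key: value for key, value in event.items() if key not in identifying}


# Made-up example record (203.0.113.7 is a reserved documentation address).
raw = {
    "url": "/story/example",
    "timestamp": "2015-01-01T00:00:00Z",
    "ip": "203.0.113.7",
    "geo": {"country": "US"},
    "cross_site_cookie": "abc123",
}
clean = scrub_event(raw)
print(clean)
```

The point of the design is that the publisher still learns which story was read and when, but the stored record no longer says who read it or where they were.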
The Intercept was an early leader in protecting its readers by using HTTPS web connections, which protect readers from having their reading habits spied on, but we are beginning to see an industry-wide shift. At the end of last year three staff at the New York Times called on journalists to embrace HTTPS. "If you run a news site, or any site at all, we’d like to issue a friendly challenge to you," wrote Eitan Konigsburg, Rajiv Pant and Elena Kvochko. "Make a commitment to have your site fully on HTTPS by the end of 2015." In June of this year the Washington Post reported that it would begin encrypting parts of its site with HTTPS, "making it more difficult for hackers, government agencies and others to track the reading habits of people who visit the site." The Marshall Project, ProPublica and TechDirt also encrypt their sites.
However, few other news organizations seem to have made the move, even though HTTPS is standard for most large tech companies, like Google and Facebook. It’s unclear whether the New York Times, with six weeks left in 2015, will keep its pledge either.
Who Controls the Data About Our Readers?
I know this is an issue a lot of individual journalists care about. Last week I got to hear journalist Quinn Norton talk about how conflicted she felt having built a career reporting on surveillance and security while publishing on platforms that were violating her readers' privacy and trust at every turn. She described the business model of most major news organizations as invasive surveillance without informed consent. At a bare minimum, she said, news websites "should be seeking meaningful consent from our users for the data we are collecting."
The problem is, for the most part, news organizations themselves aren't collecting that much data. Most of the data collection that happens on news websites is done by third parties. News organizations aren't so much collecting data as serving as vehicles for data collection. And in many ways that is worse, because news organizations have acknowledged how valuable data about their readers is, but have then given most of that value away. In so doing they have also largely abdicated control over, and responsibility for, how that data is used.
Jeff Jarvis, director of the CUNY Tow Knight Center for Entrepreneurial Journalism, argues that newsrooms have to seek another path. Right now, Jarvis says, news organizations "creep out their own customers by collecting data on them without being open about it, without revealing the reason and the benefits (free content! less noise!), without giving them any control over the data." Instead of giving data, and the value it represents, away to advertisers, news organizations should be using data to learn how to better serve their readers. If news organizations were collecting data themselves, they could use it to build stronger relationships with readers and build trust by giving readers more control over how that data is used.
The debate over ad-blockers has brought reader privacy and control back to the fore and highlighted the risks of not addressing this issue more openly with readers. In the absence of transparency and engagement from publishers, people are turning to ad-blockers to regain some amount of control. I understand the frustration and anger media companies feel over ad-blockers, but the answer is not to attack the people who use them. Instead of attacking the technology, journalists and publishers should engage the people and address the deeper issues at play.
Disclosures: One of the University of Pennsylvania researchers, Victor Pickard, is a friend, and we have coauthored research on media and technology policy in the past. This summer I spoke about some of these issues at a forum on the economics of digital news organized by Parse.ly. I was not paid for that talk, nor do I have any formal affiliation with the company.
Josh Stearns is a journalist, community builder and civic strategist. He directs the journalism sustainability project at the Geraldine R. Dodge Foundation. He was a founding board member of the Freedom of the Press Foundation and was previously the press freedom director at Free Press.