New ruling gives legal boost to a key data journalism tool
Parker Higgins
September 20, 2019
A recent federal appeals court ruling may be a big win for data journalists and researchers who depend on scraping (the automated collection of data from websites) to gather the information they report on.
The case, involving the professional social networking giant LinkedIn and the data startup hiQ Labs, has been widely discussed in legal circles and the security community, but may be just as significant for journalists.
The legal controversy centers on a database of information that hiQ had scraped from the portions of LinkedIn profiles set to be publicly accessible on the web. LinkedIn sought to stop hiQ from accessing that data, sending a letter threatening to sue on several grounds, including the Computer Fraud and Abuse Act (CFAA). hiQ then challenged that letter in court and won a preliminary injunction. On LinkedIn's appeal, the Ninth Circuit in San Francisco ruled last week that the CFAA likely does not prohibit the scraping of public web pages.
Strictly speaking, the court has not yet definitively ruled on the issues at the heart of hiQ Labs v. LinkedIn. Rather, it said hiQ is likely enough to win on the merits of its argument — including the CFAA question — that LinkedIn must allow the startup to continue scraping while the case continues. It's a bit of a confusing procedural posture, but some takeaways are clear and important.
While CFAA rulings, in this circuit and elsewhere, have become something of a contradictory thicket, here the Ninth Circuit digs specifically into the statutory language on authorization, or who can access what. The CFAA prohibits accessing a computer "without authorization" or "exceed[ing] authorized access."
The ruling lays out a neat taxonomy of computer information, dividing it into three parts: information for which access is open to the general public and permission is not required; information for which authorization is required and has been given; and information for which authorization is required but has not been given.
Last week’s ruling specifically covers LinkedIn’s publicly available data, which the court correctly places in the first category, one that also includes the vast swath of information available on the public web. LinkedIn had argued that, by sending a cease-and-desist letter, it revoked hiQ’s “authorization” to use the site. The court dispensed with that idea: information that is presumptively available to all requires no special authorization to access, so there is no authorization to revoke.
Many journalistic endeavors that involve scraping fall precisely into that first category, which is why the ruling is significant.
Journalists may automate visits to an Inspector General’s web page, to be alerted when there are newly published reports. They may write a script to download all the previous meeting agendas of a community board committee at once, to analyze how often a topic has been discussed. They may back up the online marketing materials for a business they’re reporting on, to monitor whether it quietly makes changes after an exposé is published.
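As a rough illustration of the monitoring cases (spotting a new Inspector General report, or catching a quiet edit to marketing copy), here is a minimal Python sketch using only the standard library. It fingerprints a page with a SHA-256 hash and saves a timestamped copy whenever the fingerprint changes. The URL, file names, User-Agent string, and helper names (`fetch`, `record_if_changed`, `main`) are hypothetical placeholders, not any particular newsroom's tool.

```python
import hashlib
import json
import time
import urllib.request
from pathlib import Path

# Hypothetical page to watch; substitute the public page you are reporting on.
WATCH_URL = "https://example.gov/oig/reports"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes of a public web page."""
    req = urllib.request.Request(url, headers={"User-Agent": "newsroom-monitor/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def record_if_changed(url: str, content: bytes, state_file: Path, archive_dir: Path) -> bool:
    """Compare the page's SHA-256 fingerprint with the last one recorded.

    If it differs (or the URL has never been seen), save a timestamped
    snapshot to archive_dir, update the state file, and return True.
    """
    digest = hashlib.sha256(content).hexdigest()
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    if state.get(url) == digest:
        return False  # unchanged since the last check

    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    # The digest prefix keeps snapshot names unique even within the same second.
    (archive_dir / f"{stamp}-{digest[:12]}.html").write_bytes(content)

    state[url] = digest
    state_file.write_text(json.dumps(state, indent=2))
    return True


def main() -> None:
    """One check of WATCH_URL; meant to be called from a scheduled job."""
    page = fetch(WATCH_URL)
    if record_if_changed(WATCH_URL, page, Path("monitor_state.json"), Path("snapshots")):
        print(f"Change detected at {WATCH_URL}; snapshot saved.")
    else:
        print("No change.")
```

Calling `main()` from a scheduler such as cron turns this into a simple alert. Keeping the download separate from the comparison also means the snapshots double as the kind of backup described above. Note that a raw hash flags any byte-level change, including rotating ads or timestamps, so a real monitor might first extract just the relevant part of the page.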
For journalists and researchers, web scraping (and other ways of automating computer use) can be an invaluable source of raw data, but it has occasionally run into legal friction, especially around the CFAA.
Not every such project deals exclusively with the sort of entirely public web content addressed in the hiQ ruling, but together these uses demonstrate both the power of scraping as a tool and the peril of the CFAA as a threat.
By taking the common-sense position that these activities are “not analogous to ‘breaking-and-entering,’” last week’s ruling provides legal cover for the myriad journalistic uses of public web scraping against the dark cloud of the CFAA.