Hedge funds are watching a key lawsuit involving LinkedIn to see if they can spend billions on web-scraped data

Hedge funds are expected to spend nearly $2 billion on web scraping alone in 2020, a sliver of the overall money that is pouring into the exploding alternative-data scene.

But hedge funds' pedal-to-the-metal approach to scaling up and building out web-scraping units may be for naught as the courts try to create a framework for what is allowed on the web.

In August 2017, a judge in the Northern District of California ruled that public LinkedIn pages could be scraped by hiQ, a company that used the data trawled by its bots to inform employers about their employees’ web activity, even though LinkedIn’s terms of use forbade the use of any web-scraping bot.

The case, which is on appeal in the 9th Circuit, is being closely watched by lawyers and funds to determine the future of an increasingly important part of hedge funds’ investment process.

With the judge ruling against LinkedIn, funds are still ramping up their web scraping for now, according to Peter Greene, the vice chair of the investment-management group at the law firm Lowenstein Sandler.

“I don’t think it is scaring folks from using it,” Greene said of LinkedIn’s call to stop scraping. The most significant change has come from hedge funds’ compliance departments that are tracking litigation related to web scraping and are generally more knowledgeable and sophisticated around digital data collection, he said.

A backlash has mounted over perceived invasions of privacy

But as a backlash has mounted over perceived invasions of privacy by tech companies like Facebook and Google, hedge funds need to be prepared to defend and possibly alter their data strategies if there is a sudden pullback on what is legally allowed to be scraped, lawyers say.

“The key is determining what does it mean to be public on the internet,” Greene said.

Just a couple of years ago, he said, it was only the biggest firms that understood the ins and outs of the litigation around the space, but now it’s “all sizes of managers.”

And beyond the legal risk, the headline risk that comes with collecting and using large swaths of online data needs to be top of mind as well, said Stacey Brandenburg, a lawyer with ZwillGen, in a presentation at the data company Quandl’s annual conference last month.

Web scraping in particular is an area where there are motivated third parties — the websites — that are unhappy with the practice.

“When you’re developing a web-scraping program, you want to think through and monitor carefully what your web-scrapes look like and how they are being responded to by a site so you could be on notice to the point where it is unequivocally clear that a site has revoked your authorization and doesn’t want you to be there, because the next step is to send a cease-and-desist and potentially to sue you,” Brandenburg said.

Fund managers are getting overwhelmed

A pullback on the amount of data that can be scraped could be a good thing for managers that are being overwhelmed by the amount of information coming in, said Fidelity’s head of artificial intelligence and advanced data, John Avery, at an industry conference earlier this year.

“If anything, I think folks are scraping more than they need,” said Evan Reich, a data strategist at the $20 billion BlueMountain Capital Management. He warned against overreliance on web-scraping data that wasn’t properly filtered.

If a hedge fund accidentally pulls in data that it can’t legally use, such as personally identifiable credit-card information, and builds models using it, “then it may not be able to be purged,” Reich said.

“Ideally you never want to have, 10 years down the road, someone says purge my data, and you’d rather it not be something that’s impossible, or extremely difficult, to purge,” Reich said.

“No data set is so good that it is worth betting the firm on.”
