Guest Column: ‘Big Data’ can change the world

The following editorial appeared in the Los Angeles Times on Monday, Nov. 19:

The Israeli company Seculert offers a service that identifies malware-infected computers without requiring its customers to install any new equipment or software. According to Aviv Raff, the company’s CTO and co-founder, Seculert deliberately exposes its own computers to malware to become part of the chains of infected computers that cyber criminals are assembling around the Internet. By analyzing the communications on these “botnets,” Seculert can identify computers on its customers’ networks that have been infected. Then it goes one significant step further. Using the malware profiles it develops, it sorts through the staggering amount of data logged by its customers’ computers to find other machines on their networks that have been infected by other botnets.

Seculert is one of the many new tech companies capitalizing on what the industry calls “Big Data,” the trove of information being captured by websites, servers, mobile phone networks and the growing number of electronic sensors around the planet. Like high-tech dumpster divers, data scientists search through an unstructured mess of unrelated data feeds for insights about business opportunities and risks. It can be a purely commercial exercise, such as combing through tweets and Facebook posts to measure how consumers feel about a brand. Or it can serve a higher purpose: Regional records of over-the-counter drug sales can help identify flu outbreaks as they’re happening. A hospital’s accumulated electrocardiogram test records can help predict which heart attack patients will have another attack within two years. The records of mobile phone movements can track the likely spread of cholera in a developing country after a natural disaster.

Big Data analysis has the potential to upend how businesses and nonprofits find and serve customers the way “Moneyball” techniques challenged Major League Baseball’s traditional notions of how to evaluate talent. But the pursuit of Big Data has also led some privacy advocates and regulators to worry about the collection and retention of sensitive personal information, as well as the potential use of the new tools to discriminate against individual consumers — for example, by targeting clusters of customers for onerous credit terms. These concerns should be addressed, with one caveat: Regulators need to take care not to treat the analysis of anonymized information as if it were an ominous new form of surveillance.

Privacy advocates’ main focus of late has been the threat posed by marketers and “data brokers” who monitor people’s browsing habits and disclosures on social networks. The Big Data phenomenon raises different issues. It’s premised on the notion that information collected routinely in the course of providing one set of services can reveal something useful about another set of services. Seculert, for example, gleans important computer security information from seemingly innocuous entries in a computer network’s voluminous digital traffic records. So some of the principles of “fair information practices” that privacy advocates swear by — for example, that sites shouldn’t collect more information than needed to provide their services, and that people should be able to correct erroneous data about themselves — don’t make sense in this context. With Big Data, the information that one site discards can power another company’s business. And the occasional errors in the data don’t really matter because they can be factored out in the analysis.

Nevertheless, regulators should apply the same basic fairness test to Big Data applications that they do to other uses of online data. That starts with ensuring that supposedly anonymous information really can’t be traced back to the individuals, computers or smartphones from which it was collected. And as Commissioner Julie Brill of the Federal Trade Commission noted earlier this year, analyses that involve particularly sensitive data, such as financial or health information, shouldn’t circumvent existing federal restrictions and notification requirements. Granted, mapping those laws to Big Data practices can be tricky, and even truly anonymized data could conceivably be used for discriminatory purposes. Yet regulators should bear in mind how much there is to be gained from Big Data as connected devices proliferate and the amount of data multiplies, like an ever-growing mountain of books just waiting to be alphabetized.