CCPA and the Need for Data Anonymization


For many businesses, the collection and processing of customers’ personal data is part of their core business activities. However, the type and degree of processing can vary greatly. Any organization that sells a physical product likely processes payment card and address information as part of its sales process, which is a relatively harmless and widely accepted type of processing. Then there are organizations that make their money from data processing that is unlikely to be acceptable to the people to whom that data belongs.

As data protection regulations continue to be passed and to come into effect, this second type of data processing will become more and more difficult to do. These regulations have strict protections in place for data that can be used to uniquely identify an individual or household. However, these rules only apply in the case where data is identifiable. In order to support their modern business practices as the regulatory landscape grows more complex, many organizations will need to adopt strong data anonymization solutions.

Introduction to the CCPA

The California Consumer Privacy Act (CCPA) is California’s new data protection regulation. While it was signed into law in June 2018 (and has been heavily revised and updated since), it only went into effect on January 1, 2020.

The CCPA is California’s answer to the EU’s General Data Protection Regulation (GDPR). The GDPR dramatically increased the protections provided to consumers regarding the personal data that companies collect and use. Part of the GDPR was a requirement that any organization that wished to collect, store, and process the data of individuals in the EU abide by equivalent rules, either by operating in a country with a similar data protection law in place or by adopting GDPR-compliant policies within the organization. The CCPA is one of many new laws that have been put into place in the aftermath of the GDPR.

While many of these laws have the same intention, they accomplish their shared goals in very different ways. Regulations like the CCPA and GDPR have some provisions in common, but each also has requirements that the other lacks. For example, the GDPR provides protection of personal data only at the level of the individual, while the CCPA applies to any data that can be used to identify an individual or a specific household.

Most of these data protection regulations, including the GDPR and CCPA, limit an organization’s ability to process the data of its customers. Every use of consumer data, no matter how benign, requires explicit permission from the data subject (the customer). Since many common uses of consumer data are not approved by the data subjects, as demonstrated by the outrage over the numerous Facebook scandals, businesses wishing to use customer data for business purposes are placed in a tough position by the new data protection laws.

Data Protection and Anonymization

The limitations placed upon the use of consumers’ personal data can make it difficult for many modern organizations to carry out their core business practices. For some organizations that offer “free” services, like social media sites, the revenue model is based upon the sale of targeted advertising to their users, which requires data collection and analysis for the targeting. Other organizations, like Google, are actively developing and training machine learning (ML) and artificial intelligence (AI) algorithms that require realistic data for training and testing purposes.

Data protection regulations limit the use of consumer data and require explicit user consent for data collection and processing. As a result, businesses must take advantage of the fact that these regulations only protect data that can be used to uniquely identify an individual or a household. If data is sufficiently anonymized, then the data protection regulations do not apply.

However, organizations that rely on data anonymization to process customer data must walk a fine line. If data is anonymized too aggressively, it is essentially random and of little or no value for common business applications. On the other hand, data that is not sufficiently anonymized can be de-anonymized, putting the organization in a state of regulatory non-compliance.
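One common way to reason about this trade-off is k-anonymity, where every record must share its combination of identifying attributes with at least k - 1 other records. The article does not name a specific technique, so the following is only an illustrative sketch: the field names (`zip`, `birth_year`), the sample records, and the helper functions are all hypothetical. It shows how coarser generalization raises k while erasing the detail that makes the data useful.

```python
from collections import Counter

# Hypothetical sample records; the field names are assumptions for illustration.
customers = [
    {"zip": "94105", "birth_year": 1985},
    {"zip": "94107", "birth_year": 1987},
    {"zip": "94110", "birth_year": 1991},
    {"zip": "94117", "birth_year": 1994},
]

def generalize(record, zip_digits, year_bucket):
    """Keep only the first `zip_digits` of the ZIP code and round the
    birth year down to the start of a `year_bucket`-year interval."""
    zip_code = record["zip"]
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    bucket_start = record["birth_year"] // year_bucket * year_bucket
    return (masked_zip, bucket_start)

def smallest_group(records, zip_digits, year_bucket):
    """Return k: the size of the smallest group of records that become
    indistinguishable after generalization."""
    groups = Counter(generalize(r, zip_digits, year_bucket) for r in records)
    return min(groups.values())

k_raw = smallest_group(customers, zip_digits=5, year_bucket=1)       # no generalization
k_balanced = smallest_group(customers, zip_digits=3, year_bucket=10)  # ZIP prefix + decade
k_extreme = smallest_group(customers, zip_digits=0, year_bucket=100)  # everything masked

print(k_raw, k_balanced, k_extreme)
```

With the raw values every record stands alone (k = 1), so each one is potentially identifiable. Masking the ZIP code to three digits and bucketing birth years by decade yields groups of two, while masking everything places all four records in one group but leaves nothing worth analyzing.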

Research has demonstrated that, for many organizations, the data anonymization processes in use are woefully inadequate. Using only 15 features, researchers from Imperial College London were able to uniquely identify 99.98% of Americans in a dataset anonymized using commonly used techniques. Since most datasets contain significantly more than 15 features (and additional features simplify the process of de-anonymization), the anonymization techniques used in the study are clearly inadequate.
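The point that additional features simplify de-anonymization can be seen even on a toy dataset. The sketch below is not the method used in the study. It simply counts how many records have a unique combination of values as more attributes are considered. The records, the field names, and the `unique_fraction` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records with names already removed; field names are assumptions.
people = [
    {"zip": "94105", "birth_year": 1985, "gender": "F"},
    {"zip": "94105", "birth_year": 1985, "gender": "M"},
    {"zip": "94105", "birth_year": 1990, "gender": "F"},
    {"zip": "94110", "birth_year": 1990, "gender": "F"},
]

def unique_fraction(records, features):
    """Return the share of records whose combination of `features`
    values appears exactly once in the dataset."""
    keys = [tuple(r[f] for f in features) for r in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys) if keys else 0.0

one_feature = unique_fraction(people, ["zip"])
two_features = unique_fraction(people, ["zip", "birth_year"])
three_features = unique_fraction(people, ["zip", "birth_year", "gender"])

print(one_feature, two_features, three_features)
```

In this toy data, the ZIP code alone singles out one record in four, ZIP code plus birth year singles out half, and all three attributes together make every record unique, even though no name or account number is present.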

As additional data protection laws, like the CCPA, come into effect, the need for businesses to perform effective data anonymization will only grow. Enforcement of the GDPR, which went into effect in May 2018, is still in its early stages as regulators scale up to manage the volume of reports that they receive. (GDPR regulators received 160,000 data breach reports by the end of 2019.) As these regulatory agencies and those enforcing other data privacy laws mature, failing to properly anonymize customers’ personal data will be more easily detected and more strongly penalized.

Balancing Compliance and Business Needs

As regulations like the GDPR and CCPA come into effect, consumer data is more strongly protected. This can create issues for businesses that need to process customer data and sell the resulting insights in order to make money. As regulators start taking advantage of their newfound powers, these businesses will need access to stronger data anonymization solutions than those currently in place.