Automatically classify sensitive data

Tag data according to its type, sensitivity, and value to the organization

Understand the value of your data, determine whether the data is at risk, and implement controls to mitigate risks

Approaches to data classification

While the importance of data classification is increasingly understood, it can be a daunting task.

We propose a top-down approach: identify high-risk systems and processes, classify the data they hold first, and then extend classification to further systems and processes. Over time, this allows you to classify all of your sensitive data.

Our approach, together with our technology partnerships, combines three primary methods of data classification:

  1. Classification based on context. This method uses an item's metadata, such as the system or application it belongs to or the attribute name, or traces the lineage of sensitive data.
  2. Classification based on user knowledge. This method draws on the knowledge of your subject matter experts, who classify the attributes they create or work with regularly.
  3. Classification based on content. This method inspects the items and documents themselves and applies a classification based on their contents.
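
To make the three methods concrete, here is a minimal Python sketch that combines them for a single column. It is not Okera's crawler or API: the names `classify_column`, `NAME_HINTS` and `VALUE_PATTERNS`, the rules themselves, and the 80% match threshold are all hypothetical. The 13-digit pattern assumes an identifier format like the South African ID number.

```python
import re
from typing import Iterable, Optional, Set

# Hypothetical context rules: substrings of attribute names mapped to tags.
NAME_HINTS = {
    "email": "EMAIL",
    "phone": "PHONE",
    "id_number": "NATIONAL_ID",
}

# Hypothetical content rules: patterns that sampled values must fully match.
VALUE_PATTERNS = {
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "NATIONAL_ID": re.compile(r"\d{13}"),  # assumes a 13-digit identifier
}


def classify_column(name: str, samples: Iterable[str],
                    sme_tag: Optional[str] = None,
                    min_match_ratio: float = 0.8) -> Set[str]:
    """Return the set of tags assigned to one column."""
    tags: Set[str] = set()

    # 1. Context: look for known hints in the attribute name.
    lowered = name.lower()
    for hint, tag in NAME_HINTS.items():
        if hint in lowered:
            tags.add(tag)

    # 2. User knowledge: keep any tag supplied by a subject matter expert.
    if sme_tag:
        tags.add(sme_tag)

    # 3. Content: tag the column if enough non-empty samples match a pattern.
    values = [v.strip() for v in samples if v and v.strip()]
    if values:
        for tag, pattern in VALUE_PATTERNS.items():
            hits = sum(1 for v in values if pattern.fullmatch(v))
            if hits / len(values) >= min_match_ratio:
                tags.add(tag)

    return tags
```

For example, `classify_column("contact", ["a@b.com", "c@d.org"])` returns `{"EMAIL"}` on content alone, while `classify_column("customer_email", [])` returns the same tag from context. Requiring a minimum match ratio, rather than a single match, is one simple way to reduce false positives from free-text columns.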

Leverage Technology to Simplify Classification at Scale

Govern, Protect and Monitor your Personal Data

Automate Data Discovery and Classification

Okera’s crawler automatically identifies and tags sensitive data. The platform comes with a set of out-of-the-box, ML-driven rules to classify sensitive values (such as email addresses), but users can also create their own custom rules.

Register Data Across Multiple Sources for Full Visibility

Modern data platforms contain data from many sources and in many formats. Okera's automated schema registration makes it easier for data producers to onboard datasets and expedites data discovery for analysts and data scientists across the enterprise.

Leverage Business Context for Enforcement

The Okera platform can use the rich business context and tags from an enterprise data catalogue to define data access control policies and enforce them at scale. Additionally, Okera's technical metadata registry gives the governance team better insight into what data actually exists in their platform.

Protect Sensitive Data and Stay Ahead of Regulatory Audits

In order to create and manage policies for secure data access, it is critical to know where the sensitive data exists in the data lake. Having all of the technical metadata and tags registered in Okera allows for better monitoring, auditing, and reporting on sensitive data.

Classify Personal Information to Protect Data Privacy

Data privacy is the right of a citizen to control how their personal information is collected and used. Data protection is a subset of privacy, because protecting user data and sensitive information is the first step to keeping it private.

South Africa's data privacy law, the Protection of Personal Information Act (PoPIA), extends the definition of a data subject to include juristic persons. This means that the Act protects against the abuse of personal information relating both to individuals (such as customers and employees) and to companies (such as suppliers and partners).

Ensuring data privacy requires sound data management practices

Whilst legal compliance is of course essential, most of the effort of achieving it lies in establishing sound data management practices.

Data privacy is built upon a foundation of accountability.

In larger organisations, accountability means defining and sharing clear policies for the use of personal data, and ensuring that these policies are followed throughout the organisation. Policies can be tied to specific business processes and systems, connecting individual actions to the underlying data and helping to ensure that privacy is not infringed. A governed data catalogue can be an invaluable tool for tracking your policies, sharing their details within the organisation (and with external auditors and regulators), and putting them into the context of actual data use.

Locate and document personal data

In order to protect personal data, we need to understand where it is captured and stored, and for what purpose.

This can be a very large challenge, particularly for larger organisations. Automated scanners that locate PII and other sensitive data can be extremely helpful here, providing a starting point for Impact Assessments, Risk Assessments, and Data Cataloguing and Classification exercises.

Data security

Once personal data has been located, we need to protect it from both internal and external threats. Whilst it may be tempting to focus on securing the perimeter of the organisation, many cases of abuse of personal data are internal. Managing data privacy effectively requires a nuanced approach to data security: role-based access to individual fields, based on the user's processing purpose. Combining technologies such as masking, encryption and user behaviour analytics provides the granular level of security needed to protect the data subject.
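
Purpose-based field masking can be sketched as follows. This is a minimal illustration under stated assumptions, not a specific product's policy engine: the purposes, field lists, and the helpers `apply_policy` and `mask_value` are hypothetical, and encryption and behaviour analytics are left out.

```python
from typing import Dict, Mapping, Set

# Hypothetical policy: fields each processing purpose may see in clear text.
CLEAR_FIELDS_BY_PURPOSE: Dict[str, Set[str]] = {
    "billing": {"customer_id", "name", "email"},
    "analytics": {"customer_id"},
}

# Hypothetical output of classification: fields tagged as personal data.
SENSITIVE_FIELDS: Set[str] = {"name", "email", "id_number"}


def mask_value(value: str) -> str:
    """Replace all but the last two characters with asterisks.

    Values of two characters or fewer are masked completely.
    """
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def apply_policy(record: Mapping[str, str], purpose: str) -> Dict[str, str]:
    """Mask sensitive fields that the purpose is not allowed to see in clear."""
    # Unknown purpose: empty allow-list, so every sensitive field is masked.
    allowed = CLEAR_FIELDS_BY_PURPOSE.get(purpose, set())
    result: Dict[str, str] = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS and field not in allowed:
            result[field] = mask_value(value)
        else:
            result[field] = value
    return result
```

With a record `{"customer_id": "42", "name": "Thandi", "email": "t@x.co"}`, the `analytics` purpose receives `{"customer_id": "42", "name": "****di", "email": "****co"}`, while `billing` sees the record unchanged. Unknown purposes fall back to an empty allow-list, so sensitive fields are masked by default.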


A final piece of the puzzle is monitoring access to personal data for suspicious activity. A number of breaches have occurred through the illegal activities of legitimately authorised users. By monitoring user activity and access, and flagging unusual or suspicious activity for further investigation, we minimise the risk of abuse.
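
Flagging unusual access can start from something as simple as comparing each user's activity to their own history. The sketch below is a hypothetical baseline, not a description of any particular user behaviour analytics product: it assumes daily counts of personal-data records read per user, and flags a user when today's count is more than `threshold` standard deviations above their mean, or when there is too little history to judge.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_unusual_access(history: Dict[str, List[int]], today: Dict[str, int],
                        threshold: float = 3.0, min_history: int = 5) -> List[str]:
    """Return users whose access count today is far above their own baseline."""
    flagged: List[str] = []
    for user, count in today.items():
        past = history.get(user, [])
        if len(past) < min_history:
            # Too little history to establish a baseline: send for review.
            flagged.append(user)
            continue
        baseline = mean(past)
        # Floor of one record avoids division by zero for perfectly steady users.
        spread = max(pstdev(past), 1.0)
        if (count - baseline) / spread > threshold:
            flagged.append(user)
    return sorted(flagged)
```

With histories of `[100, 110, 95, 105, 90]` for one user and `[20, 25, 22, 18, 30]` for another, today's counts of 108 and 500 give z-scores of roughly 1.1 and 114, so only the second user is flagged. A per-user baseline is used because normal access volumes differ widely between roles.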