IBM InfoSphere Optim Data Privacy protects privacy and supports compliance using extensive capabilities to de-identify sensitive information across applications, databases and operating systems
AI-Driven Data Anonymization For Knowledge Management -> Nymiz detects sensitive data in unstructured files (doc, docx, xls, xlsx, jpg, tlf, png, pdf) and also in structured data (databases), and
Tumult Analytics is an open-source Python library making it easy and safe to use differential privacy; enabling organizations to safely release statistical summaries of sensitive data. Tumult Analyti
Private AI is at the forefront of privacy solutions, providing an advanced machine learning (ML) system that identifies, redacts, and replaces personally identifiable information (PII) across a wide s
Tonic.ai offers a developer platform for data de-identification, synthesis, subsetting, and provisioning to keep test data secure, accessible, and in sync across testing and development environments.
Salesforce Shield is a suite of products that provides an extra level of security and protection above and beyond what’s already built into Salesforce. Salesforce Shield capabilities help improve data
Data security and privacy for data in use by both mission-critical and line-of-business applications.
Very Good Security (“VGS”) makes it easy for customers to collect, protect and share sensitive financial data in a way that accelerates revenue, eliminates risk, ensures compliance, and drives profita
KIProtect makes it easy to ensure compliance and security when working with sensitive or personal data.
Evervault eliminates the security and compliance burden of handling sensitive user data, by equipping developers with easy-to-use tools to encrypt, process, and share that data, without touching it in
PRIVACY VAULT is intended to support industries that collect and process personal profiles, high-velocity consumer activity and IoT data, plus unstructured documents, images, voice and video.
Privacy1 is a software company in Stockholm and London that develops technologies for practical management of personal data. Our mission is to be an enabler to make data protection easier and accessib
brighter AI provides anonymization solutions based on state-of-the-art deep learning technology to protect every identity in public. We develop game-changing image and video anonymization software t
Aircloak enables organisations to gain flexible and secure insights into sensitive data sets through a smart, automatic, on-demand anonymization engine. It ensures compliance for both internal analyst
Data de-identification tools remove direct and indirect identifiers, including sensitive data and personally identifying information, from datasets to reduce the risk that the data can be re-identified. Data de-identification is particularly important for companies working with sensitive and highly regulated data, such as healthcare organizations handling protected health information (PHI) in medical records, or companies handling financial data.
Companies may be prohibited from analyzing datasets that include sensitive and personally identifiable information (PII) in order to comply with internal policies and meet data privacy and data protection regulations. However, if the sensitive data is removed so that the remaining records can no longer be tied to individuals, the dataset may become usable. For example, data de-identification software can remove information such as people's names, addresses, protected health information, tax identification numbers, Social Security numbers, account numbers, and other personally identifying or sensitive data from datasets, enabling companies to extract analytical value from the remaining de-identified data.
When considering the use of de-identified datasets, companies should understand the risk that the data could be re-identified. Re-identification risks include differencing attacks, in which bad actors use what they already know about people to determine whether specific individuals' personal data is included in a dataset, and reconstruction attacks, in which someone combines data from other sources to reconstruct the original dataset. When evaluating data de-identification methods, it is important to quantify the degree of anonymity, for example with k-anonymity, which requires every record to share its combination of indirectly identifying attributes with at least k-1 other records.
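To make the k-anonymity measure concrete, the following is a minimal Python sketch under stated assumptions: the field names (age_band, zip3, diagnosis), the sample records, and the k_anonymity helper are all hypothetical and do not come from any product listed on this page. It computes k as the size of the smallest group of records sharing the same quasi-identifier values.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for every quasi-identifier column."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0

# Hypothetical, already-generalized records (illustration only).
records = [
    {"age_band": "30-39", "zip3": "941", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "941", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "941", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "941", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "100", "diagnosis": "flu"},
]

# The last record is alone in its (age_band, zip3) group, so k is 1.
k = k_anonymity(records, ["age_band", "zip3"])
print(k)
```

A result of 1 means at least one person is unique on the chosen attributes and could be singled out; removing or further generalizing that record raises k.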
The following are some core features within data de-identification tools:
Anonymization: Some data de-identification solutions offer statistical data anonymization methods, including k-anonymity, low-count suppression, and noise insertion. When working with sensitive data, particularly regulated data, the required degree of anonymization and the techniques used to achieve it must be considered. The more anonymized the data is, the lower the risk of re-identification. However, the more anonymous a dataset is made, the lower its utility and accuracy.
Tokenization or pseudonymization: Tokenization or pseudonymization replaces sensitive data with a token value, while the mapping between tokens and original values is stored outside the production dataset. The dataset in use is effectively de-identified, but the original values can be restored when needed.
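As an illustration of the tokenization feature described above, here is a minimal Python sketch. The TokenVault class, the tok_ prefix, and the sample rows are hypothetical; an in-memory dictionary stands in for the separately secured token store that a real product would provide.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory token store. In practice the mapping
    would live in a separately secured system, not beside the data."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to
        # the same token, which keeps joins and group-bys working.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token):
        # Only callers with access to the vault can reverse a token.
        return self._value_by_token[token]

vault = TokenVault()
rows = [
    {"ssn": "078-05-1120", "balance": 1200},
    {"ssn": "219-09-9999", "balance": 310},
    {"ssn": "078-05-1120", "balance": 55},
]

# Replace the sensitive column; leave the analytical column untouched.
tokenized = [{**row, "ssn": vault.tokenize(row["ssn"])} for row in rows]
```

Because the tokens are random rather than derived from the values, the tokenized rows reveal nothing about the original identifiers without access to the vault. The design choice to reuse tokens for repeated values preserves the ability to link records, which is also why pseudonymized data is generally still treated as re-identifiable.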
The biggest benefit of using data de-identification tools is enabling analyses of data that would otherwise be prohibited from use. This allows companies to extract insights from their data while following data privacy and protection regulations by protecting sensitive information.
Data usability for data analysis: Enables companies to analyze and extract value from datasets that could not otherwise be processed because of the sensitivity of the data they contain.
Regulatory compliance: Global data privacy and protection regulations require companies to treat sensitive data differently than non-sensitive data. If a dataset can be made non-sensitive using data de-identification software techniques, it may no longer be in the scope of data privacy or data protection regulations.
Data de-identification solutions are used by people analyzing production data or those creating algorithms. De-identified data can also be used for safe data sharing.
Data managers, administrators, and data scientists: These professionals interact with datasets regularly and will likely work with data de-identification software tools.
Qualified experts: Under HIPAA, qualified experts can provide an expert determination attesting that a dataset is de-identified and that the risk of re-identification is very small, based on generally accepted statistical and scientific methods.
Depending on the type of data protection a company needs, alternatives to data de-identification tools may be worth considering. For example, data masking may be a better option for companies that want to limit who can view sensitive data within applications. If the data only needs to be protected in transit or at rest, encryption software may suffice. If privacy-safe test data is needed, synthetic data may be an alternative.
Data masking software: Data masking software obfuscates the data while retaining the original data. The mask can be lifted to reveal the original dataset.
Encryption software: Encryption software protects data by converting plaintext into scrambled text, known as ciphertext, which can only be decrypted using the appropriate key.
Synthetic data software: Synthetic data software helps companies create artificial datasets, including images, text, and other data, from scratch using computer-generated imagery (CGI), generative adversarial networks (GANs), and heuristics. Synthetic data is most commonly used for testing and training machine learning models.
Software solutions can come with their own set of challenges.
Minimizing re-identification risks: Simply removing personal information from a dataset may not be enough to consider the dataset de-identified. Indirect personal identifiers (contextual personal information within the data) may be used to re-identify a person in the data. Re-identification can happen by cross-referencing one dataset with another, by singling out specific factors that relate to a known individual, or through general inferences from data that tend to correlate. De-identifying both direct and indirect identifiers, introducing noise (random data), and generalizing the data by reducing its granularity and analyzing it in aggregate can help prevent re-identification.
Meeting regulatory requirements: Many data privacy and data protection laws do not specify technical requirements for what counts as de-identified or anonymous data, so it is up to companies to understand the technical capabilities of their software solutions and how those capabilities relate to the regulations they must follow.
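The mitigation techniques named under minimizing re-identification risks (generalization, aggregation, and noise) can be sketched in a few lines of Python. Everything here is an illustrative assumption: the field names, the 10-year age bands, the 3-digit ZIP prefix, the min_count threshold of 3, and the noise scale are hypothetical, and the noise is not calibrated to any formal privacy guarantee.

```python
import random
from collections import Counter

def generalize(record):
    """Reduce granularity: exact age -> 10-year band,
    5-digit ZIP code -> 3-digit prefix."""
    low = (record["age"] // 10) * 10
    return {"age_band": f"{low}-{low + 9}", "zip3": record["zip"][:3]}

def aggregate_counts(records, min_count=3):
    """Count records per generalized group and suppress any group
    smaller than min_count (low-count suppression)."""
    counts = Counter(
        (g["age_band"], g["zip3"]) for g in map(generalize, records)
    )
    return {group: n for group, n in counts.items() if n >= min_count}

def add_laplace_noise(counts, scale=1.0, rng=random):
    """Add Laplace(0, scale) noise to each count. The difference of
    two exponential draws with rate 1/scale is Laplace-distributed."""
    return {
        group: n + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        for group, n in counts.items()
    }

# Hypothetical raw records (illustration only).
records = [
    {"age": 34, "zip": "94110"},
    {"age": 36, "zip": "94117"},
    {"age": 31, "zip": "94103"},
    {"age": 47, "zip": "10001"},
]

safe_counts = aggregate_counts(records, min_count=3)
noisy_counts = add_laplace_noise(safe_counts, rng=random.Random(0))
```

In this sample, the single 47-year-old in the 100 ZIP prefix forms a group of one and is suppressed, while the three records in the 30-39 band are released only as a noisy count. Larger bands, higher thresholds, and more noise lower the re-identification risk at the cost of accuracy, which is the trade-off described for anonymization features.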
Users must determine their specific needs for data de-identification tools. They can answer the questions below to get a better understanding:
Create a long list
Buyers can visit G2’s Data De-identification Software category, read reviews about data de-identification products, and determine which products fit their businesses’ specific needs. They can then create a list of products that match those needs.
Create a short list
After creating a long list, buyers can review their choices and eliminate some products to create a shorter, more precise list.
Conduct demos
Once buyers have narrowed down their software search, they can connect with the vendor to view demonstrations of the software product and how it relates to their company’s specific use cases. They can ask about the de-identification methods. Buyers can also ask about integrations with their existing tech stack, licensing methods, and pricing—whether fees are based on the number of projects, databases, executions, etc.
Choose a selection team
Buyers must determine which team is responsible for implementing and managing this software. Often, that may be someone from the data team. It is important to have a representative from the financial team on the selection committee to ensure the license is within budget.
Negotiation
Buyers should get specific answers about the license cost and how it is priced, including whether pricing for the data de-identification software is based on dataset size, features, or executions. They must keep in mind the company's data de-identification needs for today and the future.
Final decision
The final decision will come down to whether the software solution meets the technical requirements, the usability, the implementation, other support, the expected return on investment, and more. Ideally, the data team will make the final decision, alongside input from other stakeholders like software development teams.