What is data masking?
Data masking is a method of protecting sensitive data in use from unintended exposure by obfuscating it while preserving its functional value. Masking techniques include substituting parts of datasets, shuffling data, translating specific numbers into ranges, scrambling data, and more. A common use case is masking the data available to call center representatives: replacing customers’ birth dates with age ranges (30 to 50 years old, for example) protects the sensitive birth dates while keeping the age information useful to the representative.
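The birth date example can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names and the 10-year band width are assumptions chosen for the example, and a real deployment would pick bands that suit its own privacy requirements.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Return the age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def mask_birth_date(birth_date: date, today: date, width: int = 10) -> str:
    """Replace a birth date with a coarse age band such as '30-39'."""
    age = age_on(birth_date, today)
    lower = (age // width) * width  # round down to the start of the band
    return f"{lower}-{lower + width - 1}"


# The representative sees only the band, never the birth date itself.
print(mask_birth_date(date(1985, 6, 15), today=date(2024, 1, 1)))  # 30-39
```

Passing `today` explicitly keeps the function reproducible; wider bands reveal less but are also less useful to the person reading them.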
Types of data masking
Types of data masking differ in when and where the original values are masked. The main types include:
- Static: Creates one sanitized version of the database by altering all sensitive information. A backup of a database in production is created and moved to a different location. After removing unnecessary data, the remaining information is masked while in stasis. Once this is complete, the new copy can be safely distributed.
- Deterministic: Maps values of the same data type between two data sets so that each original value is always replaced by the same substitute. For example, the term “Verbena” would always be replaced by the term “Amina.” This method can be convenient but isn’t the most secure.
- On-the-fly: Useful in a development environment, this type masks data as it is transferred from production systems to development systems before being saved. Instead of creating a backup, data is automatically masked while continuously streaming from production to the desired destination.
- Dynamic: Whereas on-the-fly masking stores the masked information in a secondary data store in the development environment, dynamic data masking stores no masked copy. Instead, it masks these details as they stream directly from production to the development environment.
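The difference between these types can be sketched in Python. The example below is a simplified illustration under stated assumptions: the `DeterministicMasker` class, the `read_rows` function, the `dba` role, and the sample rows are all hypothetical, and the lookup-table approach is only one way to make a mapping deterministic.

```python
class DeterministicMasker:
    """Replace each distinct value with the same substitute every time."""

    def __init__(self, substitutes):
        self._substitutes = list(substitutes)
        self._mapping = {}  # original -> substitute; must itself be kept secret

    def mask(self, value):
        if value not in self._mapping:
            if len(self._mapping) >= len(self._substitutes):
                raise ValueError("ran out of substitute values")
            # Assign the next unused substitute and remember the pairing.
            self._mapping[value] = self._substitutes[len(self._mapping)]
        return self._mapping[value]


def read_rows(rows, role, masker):
    """Dynamic-style read: stored rows stay untouched, masking happens per request."""
    if role == "dba":  # hypothetical privileged role that sees real values
        return [dict(row) for row in rows]
    return [{**row, "name": masker.mask(row["name"])} for row in rows]


production = [
    {"name": "Verbena", "plan": "pro"},
    {"name": "Otto", "plan": "free"},
    {"name": "Verbena", "plan": "pro"},
]
masker = DeterministicMasker(["Amina", "Jonas", "Priya"])

# Static-style masking: build a separate sanitized copy that can be handed out.
static_copy = [{**row, "name": masker.mask(row["name"])} for row in production]
print([row["name"] for row in static_copy])  # ['Amina', 'Jonas', 'Amina']

# Dynamic-style masking: no copy is stored, and production is left unchanged.
print(read_rows(production, "support", masker)[0]["name"])  # Amina
print(production[0]["name"])  # Verbena
```

Because the mapping is stable, anyone who obtains the lookup table can reverse it, which is one reason deterministic masking is convenient but not the most secure option.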
Benefits of data masking
Data masking is a process that keeps sensitive information away from prying eyes while in use. Organizations using this strategy experience the following security benefits:
- Proactive security measure: Helps organizations avoid critical threats like data loss, exfiltration, account compromise, insecure interfaces, and insider threats.
- Safer cloud adoption: Some organizations might be hesitant to operate in the cloud due to potential security risks. Masking reduces these risks because the data that reaches the cloud is already obfuscated.
- Usable, low-risk data: While of little value to an attacker, masked data is still functional for the organization’s internal use.
- Safe sharing: Data sets can be shared with testers and developers without exposing the original, unmasked details.
Data masking techniques
Organizations can choose from various masking techniques, which differ in method and level of security. The most common techniques include:
- Encryption: Renders the data useless unless the viewer has the encryption key. This technique is the most secure, as it uses an algorithm to mask the data fully. It’s also the most complicated, as it relies on encryption software and ongoing key management.
- Scrambling: Rearranges characters in a randomized order. This method is simple but not as secure as encryption.
- Nulling: Presents specific values as missing (null) when viewed by certain users.
- Value variance: Original values are concealed by presenting the result of a function instead, like the difference between the highest and lowest value in a series.
- Substitution: Values are replaced with fake details that seem realistic. For example, names might be replaced by a random selection of other names.
- Shuffling: Instead of replacing data values with fake alternatives, the actual values within the set are shuffled across records. Every value still comes from a real record, but it is no longer attached to the record it belongs to, which safeguards sensitive information.
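Most of the techniques above can be sketched with Python’s standard library. The functions below are illustrative only: their names, the sample records, and the fixed random seed are assumptions made for the example. Encryption is left out because a realistic version depends on a dedicated cryptography library and key management.

```python
import random


def scramble(value: str, rng: random.Random) -> str:
    """Scrambling: rearrange the characters of a value in random order."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def null_out(record: dict, fields: set) -> dict:
    """Nulling: present the chosen fields as missing (None)."""
    return {key: (None if key in fields else val) for key, val in record.items()}


def value_range(values: list) -> float:
    """Value variance: expose only the spread between highest and lowest value."""
    return max(values) - min(values)


def substitute(values: list, fakes: list, rng: random.Random) -> list:
    """Substitution: replace every value with a realistic-looking fake."""
    return [rng.choice(fakes) for _ in values]


def shuffle_column(rows: list, column: str, rng: random.Random) -> list:
    """Shuffling: redistribute the real values of one column across the records."""
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    return [{**row, column: val} for row, val in zip(rows, shuffled)]


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
staff = [
    {"name": "Verbena", "salary": 52000},
    {"name": "Otto", "salary": 61000},
    {"name": "Ines", "salary": 48000},
]

print(scramble("4111222233334444", rng))
print(null_out(staff[0], {"salary"}))  # {'name': 'Verbena', 'salary': None}
print(value_range([row["salary"] for row in staff]))  # 13000
print(substitute([row["name"] for row in staff], ["Amina", "Jonas", "Priya"], rng))
print(shuffle_column(staff, "salary", rng))
```

Note that shuffling keeps column-level statistics such as totals and averages intact, whereas scrambling and substitution do not preserve the original values at all.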
Data masking best practices
Certain measures can be taken to ensure data masking processes are effective. For the best results, follow these precautions:
- Plan ahead: An organization should identify the information that requires protection before beginning the masking process. It should also establish who will be authorized to view specific details, where the data will be stored, and which applications will be involved.
- Prioritize referential integrity: Each type of information should be masked with the same algorithm wherever it appears, so that a masked value in one data set still matches the same masked value in another. While a single masking tool might not be an option for large businesses, all masking tools should be synchronized so data can be shared across department lines without issue.
- Secure the algorithms: Algorithms, alternative data sets, and keys must be secured to prevent unauthorized users from reverse engineering sensitive information.
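The referential integrity point can be illustrated with a short Python sketch. It assumes a keyed hash (HMAC-SHA256 from the standard library) as the shared masking algorithm; the key, the `mask_id` function, the `cust_` prefix, and the sample tables are hypothetical. The key also shows why algorithms and keys need protecting: anyone holding it can recompute the mapping for known identifiers.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would live in a secrets manager, not in code.
MASKING_KEY = b"demo-key-not-for-production"


def mask_id(customer_id: str) -> str:
    """Mask an identifier the same way wherever it appears."""
    digest = hmac.new(MASKING_KEY, customer_id.encode("utf-8"), hashlib.sha256)
    return "cust_" + digest.hexdigest()[:12]


customers = [
    {"customer_id": "C-1001", "segment": "retail"},
    {"customer_id": "C-1002", "segment": "business"},
]
orders = [
    {"order_id": 1, "customer_id": "C-1002"},
    {"order_id": 2, "customer_id": "C-1001"},
    {"order_id": 3, "customer_id": "C-1002"},
]

# Both data sets are masked with the same function and the same key.
masked_customers = [{**c, "customer_id": mask_id(c["customer_id"])} for c in customers]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"])} for o in orders]

# The join still works on masked IDs, so referential integrity is preserved.
segment_by_id = {c["customer_id"]: c["segment"] for c in masked_customers}
joined = [(o["order_id"], segment_by_id[o["customer_id"]]) for o in masked_orders]
print(joined)  # [(1, 'business'), (2, 'retail'), (3, 'business')]
```

If two departments masked the same identifier with different keys or algorithms, the lookup in the last step would fail, which is the synchronization problem the practice above is meant to avoid.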

Martha Kendall Custard
Martha Kendall Custard is a former freelance writer for G2. She creates specialized, industry-specific content for SaaS and software companies. When she isn't freelance writing for various organizations, she is working on her middle grade WIP or playing with her two kitties, Verbena and Baby Cat.