Data masking sounds a bit ambiguous, doesn’t it? What is it really?
We know it has to do with data, but what does masking mean in this context? Contrary to what you might imagine, it isn’t a single tool. It’s a set of techniques that is highly valuable to businesses because it protects data from prying eyes.
In this post, we will discuss what data masking is, why it matters, its main types and techniques, and some best practices for implementing it.
Without further ado, let's dive right in.
What is data masking?
Data masking is any method used to obfuscate data for the purpose of protecting sensitive information. In more technical terms, data masking covers the anonymization, pseudonymization, redaction, scrubbing, or de-identification of sensitive data. Data masking, also known as data obfuscation, is generally done by replacing actual data values with fictitious but realistic equivalents.
Why is data masking important?
Information technology is becoming increasingly data-driven; data-driven applications and data-driven competition are two prominent examples. Maintaining a competitive edge in a data-driven world is vital, and as this becomes more evident, the need for data security continues to grow.
Businesses and individuals use data masking to protect not just their own information but also the information of others. For example, businesses use data masking techniques to protect their company’s information as well as the information of their users and customers.
With security compliance and data privacy regulations moving to the forefront of concerns, data masking and information obfuscation now play a major role in meeting fundamental security requirements.
Data Masking Types and Techniques
- In-place masking
- On-the-fly masking
- Static data masking
- Dynamic data masking
- Synthetic data generation
- Data encryption
- Tokenization
- Scrambling
- Nulling out or deletion
- Variance
- Substitution
- Shuffling
- Redaction
Types of Data Masking
1. In-place Masking
In-place masking reads data from a target and then overwrites any sensitive information in that same target with masked data.
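As a minimal sketch, assuming a SQLite database with a hypothetical `customers` table, in-place masking could look like the Python below: it reads the rows from the target and then overwrites the sensitive columns in that same table. The table, column names, and placeholder values are illustrative only.

```python
import sqlite3


def mask_in_place(conn: sqlite3.Connection) -> None:
    """Overwrite the sensitive columns of the hypothetical `customers` table in place."""
    # Read from the target...
    rows = conn.execute("SELECT id FROM customers").fetchall()
    # ...then overwrite the sensitive values in that same target.
    for (row_id,) in rows:
        conn.execute(
            "UPDATE customers SET name = ?, email = ? WHERE id = ?",
            (f"Customer {row_id}", f"user{row_id}@example.com", row_id),
        )
    conn.commit()


# Hypothetical example table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
    [(1, "Alice Smith", "alice@corp.com"), (2, "Bob Jones", "bob@corp.com")],
)
mask_in_place(conn)
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
# [(1, 'Customer 1', 'user1@example.com'), (2, 'Customer 2', 'user2@example.com')]
```

Because the original values are overwritten, this approach should only be run against a copy that is allowed to lose its real data.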
2. On-the-Fly Masking
On-the-fly masking reads data from a source location, such as production, and writes masked data into a non-production target.
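For contrast, here is a minimal on-the-fly sketch under similar assumptions (a hypothetical `customers` table, with two in-memory SQLite databases standing in for production and a test environment): rows are read from the source, masked in memory, and only the masked values are written to the non-production target, leaving the source untouched.

```python
import sqlite3


def mask_on_the_fly(source: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Read rows from `source`, mask them in memory, and write them to `target`."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    for row_id, _name, _email in source.execute("SELECT id, name, email FROM customers"):
        # Only masked values ever reach the non-production target.
        target.execute(
            "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
            (row_id, f"Customer {row_id}", f"user{row_id}@example.com"),
        )
    target.commit()


production = sqlite3.connect(":memory:")  # stands in for the production database
production.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
production.execute("INSERT INTO customers VALUES (1, 'Alice Smith', 'alice@corp.com')")

test_db = sqlite3.connect(":memory:")  # stands in for a non-production target
mask_on_the_fly(production, test_db)
print(production.execute("SELECT name FROM customers").fetchone())  # ('Alice Smith',)
print(test_db.execute("SELECT name FROM customers").fetchone())  # ('Customer 1',)
```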
3. Static Data Masking
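Static data masking masks data in storage (at rest), typically producing a masked copy of the database. Because the stored data itself is masked, it leaves no traces of the original values in places such as logs or change data capture (CDC) records. This helps by removing sensitive static data that interactions with storage would otherwise leave behind.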
4. Dynamic Data Masking
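Dynamic data masking is similar to on-the-fly masking, except that the masked data is never stored in a secondary data store. Instead, data is streamed directly from the production system, masked in transit, and consumed by another system in the dev/test environment.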
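A rough sketch of the streaming idea, assuming rows arrive as Python dictionaries from a hypothetical `production_feed()` source: a generator masks each row as it passes through, so the consumer never receives the unmasked values and nothing masked is written to storage along the way.

```python
from typing import Dict, Iterable, Iterator, Set

Row = Dict[str, str]


def stream_masked(rows: Iterable[Row], sensitive: Set[str]) -> Iterator[Row]:
    """Yield masked copies of rows one at a time; nothing is written to storage."""
    for row in rows:
        yield {key: ("****" if key in sensitive else value) for key, value in row.items()}


def production_feed() -> Iterator[Row]:
    """Hypothetical stand-in for rows streamed from a production system."""
    yield {"id": "1", "name": "Alice Smith", "ssn": "123-45-6789"}
    yield {"id": "2", "name": "Bob Jones", "ssn": "987-65-4321"}


# The dev/test consumer only ever sees the masked stream.
for masked in stream_masked(production_feed(), {"name", "ssn"}):
    print(masked)
# {'id': '1', 'name': '****', 'ssn': '****'}
# {'id': '2', 'name': '****', 'ssn': '****'}
```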
5. Synthetic Data Generation
Instead of masking existing data, this approach generates entirely new data in its place while preserving the original data structure. It’s used in scenarios such as greenfield application development, that is, building software systems for a totally new environment.
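A minimal synthetic data sketch, assuming a hypothetical `Customer` schema and small hand-written name lists: every row is generated from scratch with a seeded random generator, so no production data is read, yet the output keeps the same structure.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Customer:
    """Hypothetical schema that the synthetic rows must match."""
    id: int
    name: str
    email: str
    age: int


FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Emma"]
LAST_NAMES = ["Garcia", "Ito", "Khan", "Novak", "Smith"]


def generate_customers(count: int, seed: int = 42) -> List[Customer]:
    """Create `count` fictitious customers; no production data is read."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    customers = []
    for i in range(1, count + 1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        customers.append(
            Customer(
                id=i,
                name=f"{first} {last}",
                email=f"{first.lower()}.{last.lower()}{i}@example.com",
                age=rng.randint(18, 90),
            )
        )
    return customers


for customer in generate_customers(3):
    print(customer)
```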
Data Masking Techniques
1. Data Encryption
Encryption converts data into ciphertext that is useless without the decryption key for the encryption method being used. The encryption algorithm masks the data by transforming it into an unreadable string that can be restored only with that key.
When a strong, well-vetted algorithm is used, encryption is one of the most secure ways to mask data. However, it requires ongoing encryption as well as mechanisms to manage and share encryption keys, which can complicate the data masking process.
2. Tokenization
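Tokenization is closely related to encryption: it replaces a sensitive value with a surrogate token. Tokens can be stateful (the token-to-value mapping is stored in a secure vault) or stateless (derived algorithmically, without a stored mapping), and most of them can be re-identified by an authorized system. Because each token is unique to the information it replaces, tokens allow sensitive data to be transmitted and stored securely; payment card processing is a common example.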
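To make the stateful case concrete, here is a minimal vault-based tokenization sketch. The `TokenVault` class is hypothetical and keeps its mapping in memory purely for illustration; a real token vault would be a hardened, access-controlled store.

```python
import secrets
from typing import Dict


class TokenVault:
    """Minimal stateful (vault-based) tokenization sketch, kept in memory for illustration."""

    def __init__(self) -> None:
        self._token_to_value: Dict[str, str] = {}
        self._value_to_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # Random token, so it reveals nothing about `value` on its own.
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # guard against an (unlikely) collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Re-identification is only possible for whoever holds the vault.
        return self._token_to_value[token]


vault = TokenVault()
card_token = vault.tokenize("4111 1111 1111 1111")
print(card_token)  # random: "tok_" followed by 16 hex characters
print(vault.detokenize(card_token))  # 4111 1111 1111 1111
```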
3. Scrambling
This technique randomly reorders the characters or digits within a value. As a result, the original information is technically still present, so with the right tools someone could reassemble the data. That makes scrambling a poor choice for masking highly sensitive data.
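A minimal scrambling sketch, assuming a seeded `random.Random` for reproducibility and a hypothetical `scramble` helper. The final check shows the weakness described above: the scrambled value contains exactly the same characters as the original.

```python
import random


def scramble(value: str, rng: random.Random) -> str:
    """Randomly reorder the characters of `value`."""
    return "".join(rng.sample(value, len(value)))


rng = random.Random(7)
original = "4929 1234"
scrambled = scramble(original, rng)
print(scrambled)
# Every original character is still present, only in a different order,
# which is why scrambling is weak protection for highly sensitive data.
print(sorted(scrambled) == sorted(original))  # True
```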
4. Nulling Out or Deletion
This process replaces data with empty values and thus removes any usefulness from the data.
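In code, nulling out can be as simple as the sketch below, which assumes records are dictionaries and uses Python’s `None` as the empty value; the `null_out` helper and field names are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def null_out(records: List[Record], fields: List[str]) -> List[Record]:
    """Return copies of `records` with the given fields replaced by None."""
    return [{k: (None if k in fields else v) for k, v in r.items()} for r in records]


rows = [{"id": "1", "phone": "555-0100"}, {"id": "2", "phone": "555-0199"}]
print(null_out(rows, ["phone"]))
# [{'id': '1', 'phone': None}, {'id': '2', 'phone': None}]
```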
5. Variance
With variance, each value is shifted by a random amount within a predefined range, for example ±10% for a transaction amount or ±30 days for a date. It is useful for transactional data that is not highly sensitive but still needs protection, because the masked values remain realistic enough for aggregation, analytics, and testing.
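A minimal variance sketch, assuming ±10% for amounts and ±30 days for dates as example ranges; the helper names and ranges are illustrative, not a standard.

```python
import random
from datetime import date, timedelta


def vary_amount(amount: float, pct: float, rng: random.Random) -> float:
    """Shift `amount` by a random percentage within +/- `pct` (e.g. 0.10 for 10%)."""
    return round(amount * (1 + rng.uniform(-pct, pct)), 2)


def vary_date(day: date, max_days: int, rng: random.Random) -> date:
    """Shift `day` by a random number of days within +/- `max_days`."""
    return day + timedelta(days=rng.randint(-max_days, max_days))


rng = random.Random(1)
print(vary_amount(1250.00, 0.10, rng))  # somewhere in [1125.00, 1375.00]
print(vary_date(date(2023, 3, 15), 30, rng))  # within 30 days of 2023-03-15
```

Because each value moves only within its range, totals and averages over many rows stay roughly realistic.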
6. Substitution
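Data is replaced with a different but realistic value, typically drawn from a lookup set of plausible alternatives (for example, replacing real names with names from a list). How difficult this is to execute varies considerably, depending on factors such as the size and realism of the lookup data and whether substitutions must stay consistent across systems. The difficulty generally grows with the level of security required, but when done right, substitution is one of the most effective ways to mask data while keeping it realistic.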
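A minimal substitution sketch, assuming a small hypothetical lookup list of realistic names. The hypothetical `NameSubstituter` remembers each substitution so the same real name gets the same replacement within a run, which hints at why consistency adds to the difficulty.

```python
import random
from typing import Dict, List

LOOKUP = ["Jordan Lee", "Sam Patel", "Riley Chen", "Casey Moreno"]


class NameSubstituter:
    """Replace real names with realistic names drawn from a lookup list."""

    def __init__(self, lookup: List[str], seed: int = 0) -> None:
        self._lookup = lookup
        self._rng = random.Random(seed)
        # Remember assignments so substitutions stay consistent within a run.
        self._assigned: Dict[str, str] = {}

    def substitute(self, real_name: str) -> str:
        if real_name not in self._assigned:
            self._assigned[real_name] = self._rng.choice(self._lookup)
        return self._assigned[real_name]


sub = NameSubstituter(LOOKUP)
print(sub.substitute("Alice Smith"))  # a realistic name from the lookup list
print(sub.substitute("Alice Smith") == sub.substitute("Alice Smith"))  # True
```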
7. Shuffling
Shuffling moves values between rows within the same column, so each value ends up attached to a different record. This can be useful because the result still looks like real information even though no value is associated with its true record. However, like scrambling, shuffling leaves the original values intact in the data set, so how secure the masking is can vary.
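A minimal shuffling sketch on a hypothetical employee table: the `salary` column is shuffled across rows with a seeded generator, while the `name` column stays in place. Every original salary is still present, which is the security caveat noted above.

```python
import random
from typing import Dict, List


def shuffle_column(rows: List[Dict[str, str]], column: str, seed: int = 0) -> List[Dict[str, str]]:
    """Return copies of `rows` with `column`'s values shuffled across rows."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


employees = [
    {"name": "Alice", "salary": "91000"},
    {"name": "Bob", "salary": "67000"},
    {"name": "Chen", "salary": "78000"},
]
for row in shuffle_column(employees, "salary"):
    print(row)
```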
8. Redaction
This type of data masking changes all target characters to the same character, such as replacing every digit with an asterisk. The downside of this method is that it removes all value from the target data, which limits its usefulness to businesses, for example for testing or analytics.
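A minimal redaction sketch that replaces every digit with an asterisk, matching the example above; the `redact_digits` helper is hypothetical.

```python
import re


def redact_digits(value: str, mask_char: str = "*") -> str:
    """Replace every digit in `value` with `mask_char`, keeping other characters."""
    # A function replacement avoids any special handling of backslashes in `mask_char`.
    return re.sub(r"\d", lambda _match: mask_char, value)


print(redact_digits("555-123-4567"))  # ***-***-****
print(redact_digits("Card: 4111 1111 1111 1111"))  # Card: **** **** **** ****
```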
Data Masking Best Practices
Having covered the types and techniques of data masking, we would be remiss not to discuss the best practices surrounding these concepts. Let’s go through those next and then wrap up with the key takeaways.
Identify the scope of your project.
To properly identify your project’s data masking needs, you must first know its full scope. Knowing the scope lets you determine what information needs to be masked, who and what is authorized to view or interact with that information, and where the data lives. Gathering this information may sound simple, but its complexity can vary with the individual needs of your project or business.
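One way to record the answers to those three questions is a small inventory. The sketch below assumes a hypothetical `SensitiveField` structure with made-up database, column, and role names; it captures what is masked, where it lives, and who may see it unmasked.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SensitiveField:
    """One entry in a hypothetical data masking inventory."""
    location: str  # where the data lives, e.g. "crm_db.customers"
    column: str  # what information needs masking
    technique: str  # chosen masking technique
    authorized_roles: FrozenSet[str]  # who may view the unmasked value


INVENTORY: List[SensitiveField] = [
    SensitiveField("crm_db.customers", "email", "substitution", frozenset({"support_lead"})),
    SensitiveField("crm_db.customers", "ssn", "redaction", frozenset()),
    SensitiveField("billing_db.payments", "card_number", "tokenization", frozenset({"billing_admin"})),
]


def fields_masked_for(role: str) -> List[str]:
    """List the fields a given role sees only in masked form."""
    return [f"{f.location}.{f.column}" for f in INVENTORY if role not in f.authorized_roles]


print(fields_masked_for("support_lead"))
# ['crm_db.customers.ssn', 'billing_db.payments.card_number']
```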
Establish consistent referential integrity.
You should ensure that, across the entire business application, each type of data is masked using the same algorithm, so that a given value (such as a customer ID) masks to the same output everywhere and relationships between tables remain intact. Using a single data masking tool across an entire business application is usually not feasible. Instead, it is best to synchronize practices across your whole business so that your project’s data masking stays consistent.
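A minimal sketch of consistent, deterministic masking, assuming a keyed HMAC-SHA256 as the shared algorithm (one common choice, not the only one) and a hypothetical `MASKING_KEY`: because the same ID always masks to the same value, the masked `orders` rows still join to the masked `customers` rows.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets manager, not source code.
MASKING_KEY = b"replace-with-a-secret-key"


def mask_id(customer_id: str) -> str:
    """Deterministically mask an ID: the same input always yields the same output."""
    digest = hmac.new(MASKING_KEY, customer_id.encode(), hashlib.sha256).hexdigest()
    # Truncated for readability; keep more of the digest to lower collision risk.
    return "CUST-" + digest[:12]


customers = [{"customer_id": "C1001", "name": "Alice"}]
orders = [{"order_id": "O1", "customer_id": "C1001"}]

masked_customers = [{**c, "customer_id": mask_id(c["customer_id"])} for c in customers]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"])} for o in orders]

# The join key still matches across both masked tables.
print(masked_orders[0]["customer_id"] == masked_customers[0]["customer_id"])  # True
```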
Secure your masking algorithms.
Securing your masking algorithms is one of the most vital steps in the data masking process, because if attackers discover them, your application data can be completely exposed.
Exposed algorithms can cripple your entire application and even your business. Furthermore, they can leave contacts and customers vulnerable to theft or worse.
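To illustrate why the algorithm, and any key it depends on, must stay secret, the sketch below assumes 4-digit PINs masked with a plain SHA-256 hash. Anyone who knows that algorithm can simply hash all 10,000 possible PINs and reverse the masking. A keyed HMAC resists this only as long as the key itself stays protected; the helper names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def unkeyed_mask(pin: str) -> str:
    """Weak masking: a plain SHA-256 hash with no secret."""
    return hashlib.sha256(pin.encode()).hexdigest()


def recover_pin(masked: str) -> Optional[str]:
    """An attacker who knows the algorithm simply tries every 4-digit PIN."""
    for candidate in (f"{n:04d}" for n in range(10_000)):
        if unkeyed_mask(candidate) == masked:
            return candidate
    return None


def keyed_mask(pin: str, key: bytes) -> str:
    """Keyed masking: without `key`, the same brute force no longer works."""
    return hmac.new(key, pin.encode(), hashlib.sha256).hexdigest()


secret_key = secrets.token_bytes(32)  # must be kept away from attackers
print(recover_pin(unkeyed_mask("4821")))  # 4821
print(recover_pin(keyed_mask("4821", secret_key)))  # None
```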
A best practice is to enforce separation of roles and duties. In fact, some regulations make it a security requirement.
In addition, while a team’s personnel may decide which data masking methods to use, only the department’s data owners should have access to the specific algorithms, keys, and data sets. This minimizes the attack surface around sensitive algorithm data, further protecting the integrity of your data masking algorithms.
Data Masking: Final Takeaways
There are a few important points to remember from this post, so let’s wrap up with some key takeaways.
- Data masking is the process of obfuscating information with the express intent of protecting sensitive data from exposure or breach.
- Data masking matters because business, customer, user, and any other sensitive information must be protected.
- There are several types of data masking used for different purposes, including In-place Masking, On-the-Fly Masking, Static Data Masking, Dynamic Data Masking, and Synthetic Data Generation.
- Data masking also relies on several techniques, such as Data Encryption, Tokenization, Scrambling, Nulling Out or Deletion, Variance, Substitution, Shuffling, and Redaction.
- Best practices for successful data masking include identifying the scope of your project, establishing consistent referential integrity, and securing your masking algorithms.
From this post, you should have a better understanding of the basic concepts of data masking, the importance of it, types of data masking, techniques used, and some best practices to consider when implementing it.