Posts

Showing posts with the label masking

Masking Data Before Ingest

Masking data before ingesting it into Azure Data Lake Storage (ADLS) Gen2 or any cloud-based data lake involves transforming sensitive data elements into a protected format to prevent unauthorized access. Here's a high-level approach to achieving this (a sketch follows the list):

1. Identify Sensitive Data:
   - Determine which fields or data elements need to be masked, such as personally identifiable information (PII), financial data, or health records.

2. Choose a Masking Strategy:
   - Static Data Masking (SDM): Mask data at rest, before ingestion.
   - Dynamic Data Masking (DDM): Mask data in real time as it is being accessed.

3. Implement Masking Techniques:
   - Substitution: Replace sensitive data with fictitious but realistic data.
   - Shuffling: Randomly reorder data within a column.
   - Encryption: Encrypt sensitive data and decrypt it when needed.
   - Nulling Out: Replace sensitive data with null values.
   - Tokenization:...
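For illustration, here is a minimal sketch of static masking in Spark SQL, applied in a staging area before the data lands in the lake. The table and column names (raw_customers, masked_customers, email, ssn, phone) are hypothetical, and the one-way hash is a simple stand-in for a real tokenization service:

-- Mask sensitive columns in a staging table before writing to ADLS
CREATE TABLE masked_customers AS
SELECT
  customer_id,                                     -- non-sensitive key, kept as-is
  sha2(email, 256)      AS email_token,            -- tokenization stand-in: one-way hash
  CAST(NULL AS STRING)  AS ssn,                    -- nulling out
  concat('XXX-XXX-', right(phone, 4)) AS phone     -- partial substitution
FROM raw_customers;

Only the masked table is then copied into the lake, so the raw values never leave the staging environment.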

Data Masking When Ingesting Into Databricks

Data masking is a data security technique that hides sensitive data by changing its original numbers and letters. It creates a fake version of the data that is similar enough to the real thing to remain useful, while still protecting it. This fake data can then be used as a functional alternative whenever the real data isn't needed.

In Databricks, Unity Catalog is the data governance layer: it lets you apply policies such as row filters and column masks to sensitive data (Delta Lake, by contrast, is the storage layer the governed tables typically sit on). Let's break it down:

Row Filters: Row filters enable you to apply a filter to a table so that subsequent queries only return rows for which the filter predicate evaluates to true. To create a row filter, write a SQL user-defined function (UDF) to define the filter policy (a sketch follows this excerpt). CREATE FUNCTIO...
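Picking up where the excerpt cuts off, here is a minimal sketch of a row filter and a column mask in Unity Catalog SQL. The table, column, and group names (sales, region, users, ssn, admin, HumanResourceDept) are hypothetical:

-- Row filter: admins see every row, everyone else sees only US rows
CREATE FUNCTION us_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), true, region = 'US');

ALTER TABLE sales SET ROW FILTER us_filter ON (region);

-- Column mask: only HR group members see the raw SSN
CREATE FUNCTION ssn_mask(ssn STRING)
RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('HumanResourceDept') THEN ssn
            ELSE '***-**-****' END;

ALTER TABLE users ALTER COLUMN ssn SET MASK ssn_mask;

Because the UDFs are evaluated on every query, the policy travels with the table itself: no masked copy of the data has to be created or kept in sync.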