Posts

Showing posts from March 12, 2024

Data Masking When Ingesting Into Databricks

  Photo by Alba Leader

Data masking is a data security technique that hides data by changing its original numbers and letters. It creates a fake version of the data that is structurally similar to the real data while still protecting it. This fake data can then be used as a functional alternative when the real data isn't needed.

Unity Catalog is the data governance feature within Databricks. Delta Lake is the table storage format; governance capabilities such as row filters and column masks are provided by Unity Catalog, which lets you apply these policies to sensitive data. Let’s break it down:

Row Filters: Row filters let you apply a filter to a table so that subsequent queries only return rows for which the filter predicate evaluates to true. To create a row filter, follow these steps:

Write a SQL user-defined function (UDF) to define the filter policy. CREATE FUNCTIO...
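To make the two ideas in this excerpt concrete, here is a minimal sketch in plain Python. It is not the Unity Catalog API and not the SQL from the post; `mask_value`, `apply_row_filter`, the `demo-secret` key, and the sample rows are all hypothetical names chosen for illustration. It assumes masking means swapping each letter for another letter and each digit for another digit so the layout survives, and that a row filter keeps only rows for which a predicate returns true.

```python
import hashlib
import string
from typing import Callable, Dict, Iterable, List


def mask_value(value: str, secret: str = "demo-secret") -> str:
    """Swap letters for letters and digits for digits, keeping the layout.

    Assumption: a keyed SHA-256 digest drives the substitution so the same
    input always yields the same masked output. This is an illustration,
    not a vetted anonymization scheme.
    """
    digest = hashlib.sha256((secret + value).encode("utf-8")).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)  # punctuation and spaces are left untouched
    return "".join(out)


def apply_row_filter(
    rows: Iterable[Dict[str, str]],
    predicate: Callable[[Dict[str, str]], bool],
) -> List[Dict[str, str]]:
    """Return only the rows for which the predicate evaluates to true."""
    return [row for row in rows if predicate(row)]


# Hypothetical sample data
rows = [
    {"name": "Jane Doe", "phone": "555-1234", "region": "US"},
    {"name": "Max Roe", "phone": "555-9876", "region": "EU"},
]

# Row filter: this caller may only see US rows
visible = apply_row_filter(rows, lambda row: row["region"] == "US")

# Column mask: the phone column is replaced by a look-alike value
masked = [{**row, "phone": mask_value(row["phone"])} for row in visible]
```

In Databricks itself, both policies are expressed as SQL UDFs attached to a table rather than as Python functions applied by the caller; the sketch only mirrors the semantics.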

GenAI Speech to Sentiment Analysis with Azure Data Factory

  Photo by Tara Winstead

Azure Data Factory (ADF) is a data integration service that can be combined with several other Azure services to enhance your data workflows. Here are some key services that work closely with ADF:

Azure Synapse Analytics: Formerly known as Azure SQL Data Warehouse, Azure Synapse Analytics is an integrated analytics service that combines big data and data warehousing. You can use ADF to move data into Synapse Analytics for advanced analytics, reporting, and business intelligence.

Azure Databricks: Azure Databricks is a collaborative Apache Spark-based analytics platform. ADF can orchestrate data movement between Databricks and other data stores, so you can process and analyze large datasets efficiently.

Azure Blob Storage: ADF can copy data to and from Azure Blob Storage, a cost-effective storage solution for unstructured data, backups, and serving static content.

Azure SQL Database: Use ADF ...