
Posts

MySQL with Docker

Running a MySQL database in a Docker container is straightforward. Here are the steps:

Pull the Official MySQL Image: The official MySQL image is available on Docker Hub. Choose the version you want (e.g., MySQL 8.0):

docker pull mysql:8.0

Create a Docker Volume (Optional): To persist your database data, create a Docker volume or bind mount; otherwise, the data will be lost when the container is removed. Example using a volume:

docker volume create mysql-data

Run the MySQL Container: Use the following command to start a MySQL container:

docker run --name my-mysql -e MYSQL_ROOT_PASSWORD=secret -v mysql-data:/var/lib/mysql -d mysql:8.0

Replace secret with your desired root password. The MySQL first-run routine takes a few seconds to complete. Check whether the database is up by running:

docker logs my-mysql

Look for a line that says “ready for connections.”

Access the MySQL Shell: To interact with MySQL, attach to the container and run the mysql command:

docker exec -it my-mysql mysql -p...
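The run command above can also be captured declaratively with Docker Compose. A minimal sketch, assuming the same image, password, and volume as in the steps (the service name db is illustrative):

```yaml
# docker-compose.yml -- Compose equivalent of the docker run command above
services:
  db:
    image: mysql:8.0
    container_name: my-mysql
    environment:
      MYSQL_ROOT_PASSWORD: secret   # replace with your desired root password
    volumes:
      - mysql-data:/var/lib/mysql   # named volume persists data across container recreation
volumes:
  mysql-data:
```

Start it with docker compose up -d and watch the first-run output with docker compose logs db until you see “ready for connections.”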

Data Masking When Ingesting Into Databricks

  Photo by Alba Leader

Data masking is a data security technique that hides sensitive data by altering its original values. It creates a fake version of the data that is similar enough to the real thing to remain useful, while still protecting the original. This masked data can then serve as a functional substitute whenever the real data isn't needed.

Unity Catalog is Databricks' data governance layer. It allows you to apply data governance policies such as row filters and column masks to sensitive data. Let’s break it down:

Row Filters: Row filters enable you to apply a filter to a table so that subsequent queries only return rows for which the filter predicate evaluates to true. To create a row filter, follow these steps: Write a SQL user-defined function (UDF) to define the filter policy. CREATE FUNCTIO...
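As a sketch of the steps above, a Unity Catalog row filter policy can look like the following (the function name, table name, column, and group name are illustrative, not from the original post):

```sql
-- UDF defining the filter policy: members of the 'admins' account group
-- see every row; everyone else sees only rows where region = 'US'
CREATE FUNCTION us_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admins'), TRUE, region = 'US');

-- Attach the filter to a table; subsequent queries only return rows
-- for which the predicate evaluates to true
ALTER TABLE sales SET ROW FILTER us_filter ON (region);
```

Column masks follow the same pattern: a UDF that transforms the column value, attached with SET MASK instead of SET ROW FILTER.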

GenAI Speech to Sentiment Analysis with Azure Data Factory

  Photo by Tara Winstead

Azure Data Factory (ADF) is a powerful data integration service, and it can be seamlessly integrated with several other Azure services to enhance your data workflows. Here are some key services that work closely with ADF:

Azure Synapse Analytics: Formerly known as Azure SQL Data Warehouse, Azure Synapse Analytics provides an integrated analytics service that combines big data and data warehousing. You can use ADF to move data into Synapse Analytics for advanced analytics, reporting, and business intelligence.

Azure Databricks: Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform. ADF can orchestrate data movement between Databricks and other data stores, enabling you to process and analyze large datasets efficiently.

Azure Blob Storage: ADF can seamlessly copy data to and from Azure Blob Storage. It’s a cost-effective storage solution for unstructured data, backups, and serving static content.

Azure SQL Database: Use ADF ...

Databricks with Azure Past and Present

  Let's dive into the evolution of Azure Databricks and its performance differences.

Azure Databricks is a powerful analytics platform built on Apache Spark, designed to process large-scale data workloads. It provides a collaborative environment for data engineers, data scientists, and analysts. Over time, Databricks has undergone significant changes, impacting its performance and capabilities.

Previous State: In the past, Databricks primarily relied on the open-source version of Apache Spark. While this version was versatile, it had limitations in terms of performance and scalability. Users could run Spark workloads, but there was room for improvement.

Current State: Today, Azure Databricks has evolved significantly. Here’s what’s changed:

Optimized Spark Engine: Databricks now offers an optimized version of Apache Spark. Databricks claims this enhanced engine can be up to 50 times faster than the open-source version for some workloads. Users can leverage GPU-enabled clusters, enabling faster ...

Redhat Openshift for Data Science Project

  Photo by Tim Mossholder

Red Hat OpenShift Data Science is a powerful platform designed for data scientists and developers working on artificial intelligence (AI) applications. Let’s dive into the details:

What is Red Hat OpenShift Data Science? Red Hat OpenShift Data Science provides a fully supported environment for developing, training, testing, and deploying machine learning models. It allows you to work with AI applications both on-premises and in the public cloud. You can use it as a managed cloud service add-on to Red Hat’s OpenShift cloud services or as self-managed software that you can install on-premises or in the public cloud.

Key Features and Benefits:

Rapid Development: OpenShift Data Science streamlines the development process, allowing you to focus on building and refining your models.

Model Training: Train your machine learning models efficiently within the platform.

Testing and Validation: Easily validate your models before deployment.

Deployment Flexibi...

On-Premises vs Cloud

Organizations often face the dilemma of choosing between #onpremises servers and a #cloud-only approach. Let’s explore the pros and cons of each:

Costs and Maintenance:

On-Premises: Requires upfront capital investment in hardware, installation, software licensing, and IT services. Ongoing costs include staff salaries, energy expenses, hosting fees, and office space. Regular updates and replacements add to the financial burden.

Cloud: Subscription-based model, reducing upfront costs. Managed by the cloud provider, minimizing maintenance efforts. Scalability without significant capital investment.

Security and Compliance:

On-Premises: Provides direct control over security measures. Suits organizations with strict compliance requirements.

Cloud: Robust security measures implemented by cloud providers. Compliance certifications (e.g., ISO, SOC) for data protection. Shared responsibility model: the cloud provider secures the infrastructure, while you secure your data.

Scalability and F...