Posts

Bot State with Azure

We can build an Azure Bot application with FastAPI that integrates with Azure Cache for Redis for session management and uses Azure Cosmos DB for state management. Here are the steps to achieve this: State Management with Azure Cosmos DB : Why do you need state? Maintaining state allows your bot to have more meaningful conversations by remembering certain things about a user or conversation. For example, if you’ve talked to a user previously, you can save previous information about them so that you don’t have to ask for it again. State also keeps data for longer than the current turn, so your bot retains information over the course of a multi-turn conversation. Storage Layer : The backend storage layer is where the state information is actually stored. You can choose from different storage options: Memory Storage : For local testing only; volatile and temporary. Azure Blob Storage : Connects to an Azure Blob Storage object database. Azure Cosmos DB Partitioned Storage : Connects ...
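As a rough illustration of the Cosmos DB option, here is a minimal sketch using the botbuilder-azure package for Python; the endpoint, key, database, and container names are placeholders, and the class and parameter names reflect that package as I understand it:

    from botbuilder.azure import CosmosDbPartitionedStorage, CosmosDbPartitionedConfig
    from botbuilder.core import ConversationState, UserState

    # Placeholder Cosmos DB settings -- replace with your own account values.
    cosmos_config = CosmosDbPartitionedConfig(
        cosmos_db_endpoint="https://<your-account>.documents.azure.com:443/",
        auth_key="<your-cosmos-key>",
        database_id="bot-db",
        container_id="bot-state",
    )

    # Cosmos DB-backed storage layer for bot state.
    storage = CosmosDbPartitionedStorage(cosmos_config)

    # State property accessors that persist data across turns of the conversation.
    conversation_state = ConversationState(storage)
    user_state = UserState(storage)

The same ConversationState and UserState objects can then be wired into the bot's adapter so that state is loaded at the start of each turn and saved back to Cosmos DB at the end.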

Azure Data Factory, ADLS Gen2 Blob Storage and Syncing Data from a SharePoint Folder

  Photo by Manuel Geissinger Today we are going to discuss data sync between an on-premises SharePoint folder and Azure Blob Storage. When we need to upload or download files from a SharePoint folder within the home network to Azure, we must also consider the best way to auto-sync them. Let's discuss this step by step. Azure Data Factory (ADF) is a powerful cloud-based service provided by Microsoft Azure . Let me break it down for you: Purpose and Context : In the world of big data , we often deal with raw, unorganized data stored in various systems. However, raw data alone lacks the context needed for meaningful insights. Azure Data Factory (ADF) steps in to orchestrate and operationalize processes, transforming massive raw data into actionable business insights. What Does ADF Do? : ADF is a managed cloud service designed for complex data integration projects. It handles hybrid extract-transform-load (ETL) and extract-load-transform (ELT) scenarios. It enables data movement...
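One piece of this flow, pushing files from a locally synced SharePoint folder into Blob Storage, can be sketched with the azure-storage-blob SDK; the connection string, container name, and folder path below are hypothetical placeholders, and in practice an ADF copy activity can take over the same role on a schedule:

    import os
    from azure.storage.blob import BlobServiceClient

    # Hypothetical settings -- replace with your storage account and synced folder.
    CONNECTION_STRING = "<your-storage-connection-string>"
    CONTAINER_NAME = "sharepoint-sync"
    LOCAL_FOLDER = r"C:\Users\me\SharePoint\Shared Documents"

    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    container = service.get_container_client(CONTAINER_NAME)

    # Upload every file in the synced folder, overwriting blobs that already exist.
    for name in os.listdir(LOCAL_FOLDER):
        path = os.path.join(LOCAL_FOLDER, name)
        if os.path.isfile(path):
            with open(path, "rb") as data:
                container.upload_blob(name=name, data=data, overwrite=True)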

MySQL with Docker

Running a MySQL database in a Docker container is straightforward. Here are the steps: Pull the Official MySQL Image: The official MySQL image is available on Docker Hub. You can choose the version you want (e.g., MySQL 8.0): docker pull mysql:8.0 Create a Docker Volume (Optional): To persist your database data, create a Docker volume or bind mount. Otherwise, data will be lost when the container restarts. Example using a volume: docker volume create mysql-data Run the MySQL Container: Use the following command to start a MySQL container: docker run --name my-mysql -e MYSQL_ROOT_PASSWORD=secret -v mysql-data:/var/lib/mysql -d mysql:8.0 Replace secret with your desired root password. The MySQL first-run routine will take a few seconds to complete. Check if the database is up by running: docker logs my-mysql Look for a line that says “ready for connections.” Access MySQL Shell: To interact with MySQL, attach to the container and run the mysql command: docker exec -it my-mysql mysql -p...
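Once the log shows "ready for connections," you can also connect from application code instead of the shell. A minimal sketch with the mysql-connector-python package (one of several possible client libraries), assuming the container's port 3306 has been published to the host (e.g. with -p 3306:3306) and using the root password from the run command above:

    import mysql.connector  # pip install mysql-connector-python

    # Assumes the container's port is published to the host and that "secret"
    # is the MYSQL_ROOT_PASSWORD used in the docker run command above.
    conn = mysql.connector.connect(
        host="127.0.0.1",
        port=3306,
        user="root",
        password="secret",
    )

    cursor = conn.cursor()
    cursor.execute("SELECT VERSION()")
    print(cursor.fetchone())  # e.g. ('8.0.x',)

    cursor.close()
    conn.close()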

Data Masking When Ingesting Into Databricks

  Photo by Alba Leader Data masking is a data security technique that involves hiding data by changing its original numbers and letters. It's a way to create a fake version of data that's similar enough to the actual data, while still protecting it. This fake data can then be used as a functional alternative when the real data isn't needed.  Unity Catalog  is Databricks' data governance layer: it lets you apply governance policies such as row filters and column masks to sensitive data ( Delta Lake , by contrast, is the underlying storage format). Let’s break it down: Row Filters : Row filters enable you to apply a filter to a table so that subsequent queries only return rows for which the filter predicate evaluates to true. To create a row filter, follow these steps: Write a SQL user-defined function (UDF) to define the filter policy. CREATE FUNCTIO...
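The excerpt's SQL is cut off, but as a hedged illustration of the pattern, a row-filter UDF and a column mask in Unity Catalog can look roughly like this; the catalog, table, column, and group names are hypothetical, and the statements are run from a Databricks notebook via spark.sql:

    # Hypothetical Unity Catalog row filter and column mask, executed from a
    # Databricks notebook where `spark` is already defined.

    # Row filter: non-admins only see rows for their own region.
    spark.sql("""
        CREATE OR REPLACE FUNCTION main.default.us_region_filter(region STRING)
        RETURN IF(is_account_group_member('admins'), TRUE, region = 'US')
    """)
    spark.sql("""
        ALTER TABLE main.default.sales
        SET ROW FILTER main.default.us_region_filter ON (region)
    """)

    # Column mask: hide the SSN column from everyone outside the admins group.
    spark.sql("""
        CREATE OR REPLACE FUNCTION main.default.ssn_mask(ssn STRING)
        RETURN CASE WHEN is_account_group_member('admins') THEN ssn ELSE '***-**-****' END
    """)
    spark.sql("""
        ALTER TABLE main.default.customers
        ALTER COLUMN ssn SET MASK main.default.ssn_mask
    """)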

GenAI Speech to Sentiment Analysis with Azure Data Factory

  Photo by Tara Winstead Azure Data Factory (ADF) is a powerful data integration service, and it can be seamlessly integrated with several other Azure services to enhance your data workflows. Here are some key services that work closely with ADF: Azure Synapse Analytics : Formerly known as SQL Data Warehouse, Azure Synapse Analytics provides an integrated analytics service that combines big data and data warehousing. You can use ADF to move data into Synapse Analytics for advanced analytics, reporting, and business intelligence. Azure Databricks : Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform. ADF can orchestrate data movement between Databricks and other data stores, enabling you to process and analyze large datasets efficiently. Azure Blob Storage : ADF can seamlessly copy data to and from Azure Blob Storage. It’s a cost-effective storage solution for unstructured data, backups, and serving static content. Azure SQL Database : Use ADF ...
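The sentiment-analysis step the title refers to could, for example, be driven by the Azure AI Language SDK downstream of the speech-to-text output; a minimal sketch, assuming a Language resource whose endpoint and key are the placeholders below:

    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    # Placeholder endpoint and key for an Azure AI Language resource.
    client = TextAnalyticsClient(
        endpoint="https://<your-language-resource>.cognitiveservices.azure.com/",
        credential=AzureKeyCredential("<your-key>"),
    )

    # Transcripts produced earlier in the pipeline (e.g. by speech-to-text).
    transcripts = ["The support call went great, thank you!"]

    for result in client.analyze_sentiment(documents=transcripts):
        if not result.is_error:
            print(result.sentiment, result.confidence_scores)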

Databricks with Azure Past and Present

  Let's dive into the evolution of Azure Databricks and its performance differences. Azure Databricks is a powerful analytics platform built on Apache Spark, designed to process large-scale data workloads. It provides a collaborative environment for data engineers, data scientists, and analysts. Over time, Databricks has undergone significant changes, impacting its performance and capabilities. Previous State: In the past, Databricks primarily relied on the open-source version of Apache Spark . While this version was versatile, it had limitations in terms of performance and scalability. Users could run Spark workloads, but there was room for improvement. Current State: Today, Azure Databricks has evolved significantly. Here’s what’s changed: Optimized Spark Engine: Databricks now offers an optimized version of Apache Spark . This enhanced engine delivers up to 50 times the performance of the open-source version. Users can leverage GPU-enabled clusters, enabling faster ...

Red Hat OpenShift for Data Science Projects

  Photo by Tim Mossholder Red Hat OpenShift Data Science is a powerful platform designed for data scientists and developers working on artificial intelligence (AI) applications. Let’s dive into the details: What is Red Hat OpenShift Data Science? Red Hat OpenShift Data Science provides a fully supported environment for developing, training, testing, and deploying machine learning models. It allows you to work with AI applications both on-premises and in the public cloud . You can use it as a managed cloud service add-on to Red Hat’s OpenShift cloud services or as self-managed software that you can install on-premises or in the public cloud. Key Features and Benefits : Rapid Development : OpenShift Data Science streamlines the development process, allowing you to focus on building and refining your models. Model Training : Train your machine learning models efficiently within the platform. Testing and Validation : Easily validate your models before deployment. Deployment Flexibi...