
Posts

Showing posts from November 14, 2023

Data Drift and MLOps

Photo by chris howard

Data drift refers to the phenomenon where the statistical properties of the data a machine learning model receives in production change over time, so that they no longer match the data the model was trained on. This shift in data distribution can degrade the model's performance and predictive accuracy. Data drift can arise for many reasons and has significant implications for how well machine learning models perform in production. Key points about data drift include:

1. Causes of Data Drift:
   - Seasonal Changes: Data patterns may vary with seasons or other periodic trends.
   - External Factors: Changes in the external environment, such as economic conditions, regulations, ...
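One common way to put a number on drift in a single numeric feature is the population stability index (PSI), which compares the binned distribution of a reference (training) sample with that of a current (production) sample. The sketch below is an illustration, not a method taken from the post. It assumes equal-width bins over the reference range and a small epsilon to keep the logarithm finite for empty bins. The function name `population_stability_index` is hypothetical, and the 0.1 / 0.25 thresholds often quoted for PSI are rules of thumb rather than a fixed standard.

```python
import bisect
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Hypothetical helper: PSI between two samples of one numeric feature.

    Bins are equal-width over the reference range; values outside that range
    fall into the first or last bin. `eps` replaces empty-bin fractions so the
    logarithm stays finite.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / n_bins
    # n_bins - 1 interior edges; bisect_right maps each value to a bin index 0..n_bins-1
    inner_edges = [lo + width * i for i in range(1, n_bins)]

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(inner_edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    # Each term is >= 0 because (c - r) and log(c / r) always share a sign.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

Identical samples give a PSI of 0.0. With `ref = [float(i % 100) for i in range(1000)]`, shifting every value up by 30 (`[v + 30 for v in ref]`) pushes the PSI well above 0.25, which the rule of thumb would flag as significant drift. In an MLOps pipeline, a check like this would typically run per feature on each new batch of production data.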

Quick Start with PySpark and Snowflake

Snowflake is a cloud-based data warehouse that provides a secure, scalable, and high-performance platform for data storage, processing, and analytics. It is a fully managed service, so you don't need to provision or maintain the underlying infrastructure or software. Snowflake is used by a wide range of customers, including businesses of all sizes, government agencies, and educational institutions. Here is an example of an end-to-end Snowflake workflow:

Data ingestion: Snowflake can load data in a variety of file formats, including CSV, JSON, Parquet, and ORC. You can load data into Snowflake from on-premises systems, cloud storage, or SaaS applications.

Data storage: Snowflake stores data in a columnar format, which makes analytical queries efficient because only the columns a query needs are read. Snowflake also offers permanent, transient, and temporary tables with different data-retention and Fail-safe settings, so you can reduce storage costs for data that does not need long-term protection.

Data processing: Snowflake provides a variety of data processing capabilities, including SQL, Spark, an...
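As a quick-start sketch for the PySpark side, the code below reads a Snowflake table into a Spark DataFrame and writes one back through the Snowflake Connector for Spark, using the connector's `net.snowflake.spark.snowflake` source name and its `sfURL`/`sfUser`/`sfPassword`/`sfDatabase`/`sfSchema`/`sfWarehouse` options. It assumes the connector and Snowflake JDBC driver JARs are already on the Spark classpath. The `SNOWFLAKE_*` environment variable names and the helper functions `snowflake_options`, `read_table`, and `write_table` are hypothetical conventions for this example, not part of either library.

```python
import os
from typing import Dict, Mapping

# Data source name used by the Snowflake Connector for Spark.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Connector option -> hypothetical environment variable holding its value.
_OPTION_ENV_VARS = {
    "sfURL": "SNOWFLAKE_URL",  # e.g. <account_identifier>.snowflakecomputing.com
    "sfUser": "SNOWFLAKE_USER",
    "sfPassword": "SNOWFLAKE_PASSWORD",
    "sfDatabase": "SNOWFLAKE_DATABASE",
    "sfSchema": "SNOWFLAKE_SCHEMA",
    "sfWarehouse": "SNOWFLAKE_WAREHOUSE",
}


def snowflake_options(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build connector options from environment variables (hypothetical names)."""
    missing = [var for var in _OPTION_ENV_VARS.values() if var not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {opt: env[var] for opt, var in _OPTION_ENV_VARS.items()}


def read_table(spark, table: str, options: Mapping[str, str]):
    """Load a Snowflake table into a Spark DataFrame."""
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("dbtable", table)
        .load()
    )


def write_table(df, table: str, options: Mapping[str, str], mode: str = "append") -> None:
    """Write a Spark DataFrame to a Snowflake table."""
    (
        df.write.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("dbtable", table)
        .mode(mode)
        .save()
    )
```

Usage, assuming the environment variables are set and `MY_TABLE` is a placeholder table name: create a session with `spark = SparkSession.builder.appName("snowflake-quickstart").getOrCreate()` (after `from pyspark.sql import SparkSession`), then call `df = read_table(spark, "MY_TABLE", snowflake_options())`. Keeping credentials in the environment rather than in code is one reasonable choice; a secrets manager would serve the same purpose.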