
Basic Machine Learning Algorithms


Here is a table of common machine learning algorithms, along with whether each is a supervised or an unsupervised learning algorithm:

Algorithm | Supervised | Unsupervised
Linear regression | Yes | No
Decision trees | Yes | No
Random forest | Yes | No
AdaBoost | Yes | No
Gradient boosting | Yes | No
Logistic regression | Yes | No
K-nearest neighbors (KNN) | Yes | No
Support vector machines (SVM) | Yes | No
K-means | No | Yes
Collaborative filtering | No | Yes
Principal component analysis (PCA) | No | Yes

In supervised learning, the algorithm is given labeled data, which means that the data is paired with the correct output. The algorithm then learns to map the input data to the output data. In unsupervised learning, the algorithm is not given labeled data. The algorithm must learn to find patterns in the data without any guidance.
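
To make the distinction concrete, here is a minimal sketch (using scikit-learn, with toy data invented for illustration): the supervised model is fit on inputs paired with labels, while the unsupervised model is fit on the inputs alone.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy data: six samples with two features each
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the supervised model

# Supervised: the model sees both the inputs X and the correct outputs y
supervised = LogisticRegression().fit(X, y)

# Unsupervised: the model sees only X and must find structure on its own
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)

print(supervised.predict([[9, 3]]))  # label predicted from what y taught it
print(unsupervised.labels_)          # cluster assignments discovered from X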

Here is a table showing whether each of the above algorithms can be used for regression or for classification:

Algorithm | Regression | Classification
Linear regression | Yes | No
Decision trees | Yes | Yes
Random forest | Yes | Yes
AdaBoost | Yes | Yes
Gradient boosting | Yes | Yes
Logistic regression | No | Yes
K-nearest neighbors (KNN) | Yes | Yes
Support vector machines (SVM) | Yes | Yes
K-means | No | No
Collaborative filtering | No | No
Principal component analysis (PCA) | No | No

As the table shows, decision trees, random forests, AdaBoost, gradient boosting, KNN, and SVMs can handle both tasks. Linear regression is limited to regression and logistic regression to classification (despite its name, it predicts class probabilities), while K-means, collaborative filtering, and PCA solve different problems entirely: clustering, recommendation, and dimensionality reduction. Even among the flexible algorithms, some are better suited to one task than the other.
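
As a quick illustration of one algorithm family serving both tasks, here is a small sketch (with made-up toy data) using scikit-learn's decision tree regressor and classifier:

import numpy as np
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier

X = np.array([[1], [2], [3], [4], [5], [6]])

# Regression: the target is a continuous value
y_continuous = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 5.8])
reg = DecisionTreeRegressor().fit(X, y_continuous)
print(reg.predict([[3.5]]))  # a continuous prediction

# Classification: the target is a discrete class label
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[3.5]]))  # a predicted class label, 0 or 1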

Here are some specific examples of how these algorithms can be used for regression and classification:

  • Linear regression can be used to predict the price of a house based on its features, such as the number of bedrooms, the square footage, and the location.
  • Decision trees can be used to classify emails as spam or not spam based on their content.
  • Random forests can be used to classify images of animals based on their features.
  • Logistic regression can be used to predict whether a patient will have a heart attack based on their medical history.
  • K-nearest neighbors can be used to recommend movies to users based on their ratings of other movies.
  • Support vector machines can be used to classify handwritten digits.


  • Linear regression is a supervised learning algorithm that predicts a continuous value. It works by fitting a straight line (or, with several features, a hyperplane) to the data points. The line is chosen so that it minimizes the squared errors between the predicted values and the actual values.





import numpy as np
import matplotlib.pyplot as plt

# Generate noisy data scattered around the line y = 2x + 5
x = np.linspace(0, 10, 100)
y = 2 * x + 5 + np.random.normal(0, 1, size=x.shape)

# Fit a linear regression model (a degree-1 polynomial: slope and intercept)
model = np.polyfit(x, y, 1)

# Predict the values of y for the given values of x
y_pred = model[0] * x + model[1]

# Plot the data and the fitted line
plt.plot(x, y, 'o')
plt.plot(x, y_pred)
plt.show()
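
The fitted coefficients can be inspected directly: model[0] is the learned slope and model[1] the intercept, which should come out close to the true values of 2 and 5 used to generate the data.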

  • Decision trees are supervised learning models that predict a categorical value (a decision tree regressor predicts a continuous one). They work by building a tree-like structure of decisions: each decision splits the data into two or more smaller groups, and the process repeats until the data points are classified.




from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small example dataset and split it into training and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Fit the classifier to the training data
clf.fit(X_train, y_train)

# Make predictions on new data
predictions = clf.predict(X_test)

  • Random forest is an ensemble learning algorithm that combines multiple decision trees. It works by training each tree on a random subset of the data (and of the features) and then aggregating the trees' predictions: a majority vote for classification, an average for regression. This reduces the variance of the predictions and improves the accuracy of the model.




from sklearn.ensemble import RandomForestClassifier

# X, y, and X_test are assumed to be defined as in the decision tree example

# Create a random forest classifier with 100 trees
clf = RandomForestClassifier(n_estimators=100)

# Fit the classifier to the data
clf.fit(X, y)

# Make predictions on new data
predictions = clf.predict(X_test)

  • AdaBoost is an ensemble learning algorithm that combines multiple weak learners, typically shallow decision trees. It works by training each learner on a weighted version of the data; the weights are adjusted after each round so that later learners focus on the data points the earlier ones misclassified.




from sklearn.ensemble import AdaBoostClassifier

# X, y, and X_test are assumed to be defined as in the decision tree example

# Create an AdaBoost classifier with 100 boosting rounds
clf = AdaBoostClassifier(n_estimators=100)

# Fit the classifier to the data
clf.fit(X, y)

# Make predictions on new data
predictions = clf.predict(X_test)

  • Gradient boosting is an ensemble learning algorithm that combines multiple decision trees. It works by training each new tree to correct the errors of the trees built so far, which improves the accuracy of the model with each added tree.




from sklearn.ensemble import GradientBoostingClassifier

# X, y, and X_test are assumed to be defined as in the decision tree example

# Create a gradient boosting classifier with 100 trees
clf = GradientBoostingClassifier(n_estimators=100)

# Fit the classifier to the data
clf.fit(X, y)

# Make predictions on new data
predictions = clf.predict(X_test)

  • Logistic regression is a supervised learning algorithm that predicts a binary class label. It works by fitting a logistic curve to the data points. The logistic curve is a sigmoid function that maps the model's output to a probability between 0 and 1.




from sklearn.linear_model import LogisticRegression

# X, y, and X_test are assumed to be defined as in the decision tree example

# Create a logistic regression classifier
clf = LogisticRegression()

# Fit the classifier to the data
clf.fit(X, y)

# Make predictions on new data
predictions = clf.predict(X_test)

  • K-nearest neighbors (KNN) is a non-parametric supervised learning algorithm that predicts a value from the k most similar training examples, that is, the k training points closest to the new data point (typically by Euclidean distance).




from sklearn.neighbors import KNeighborsClassifier

# X, y, and X_test are assumed to be defined as in the decision tree example

# Create a KNN classifier that votes among the 5 nearest neighbors
clf = KNeighborsClassifier(n_neighbors=5)

# Fit the classifier to the data
clf.fit(X, y)

# Make predictions on new data
predictions = clf.predict(X_test)

  • Support vector machines (SVM) are supervised learning models that can be used for both classification and regression tasks. An SVM works by finding the hyperplane that best separates the data points, i.e. the one with the largest margin between the classes; with kernel functions, it can also learn nonlinear decision boundaries.




from sklearn.svm import SVC

# X, y, and X_test are assumed to be defined as in the decision tree example

# Create an SVM classifier with a linear kernel
clf = SVC(kernel='linear')

# Fit the classifier to the data
clf.fit(X, y)

# Make predictions on new data
predictions = clf.predict(X_test)

  • K-means is an unsupervised learning algorithm that clusters data points into k groups. The k clusters are chosen in such a way that the sum of the squared distances between the data points and the cluster centroids is minimized.




from sklearn.cluster import KMeans

# X is assumed to be defined as in the earlier examples; no labels are used

# Create a KMeans clustering model with 3 clusters
kmeans = KMeans(n_clusters=3)

# Fit the model to the data
kmeans.fit(X)

# Get the cluster label assigned to each data point
labels = kmeans.labels_

  • Collaborative filtering is a technique that recommends items to users based on the ratings of other users. The user-based variant works by finding users who have similar tastes and then recommending items that those users have rated highly. A simple way to sketch it is a nearest-neighbors search over a user-item rating matrix, as below.





from sklearn.neighbors import NearestNeighbors

# X is assumed to be a user-item rating matrix (one row of ratings per user)
model = NearestNeighbors(n_neighbors=5)

# Fit the model to the rating matrix
model.fit(X)

# Find the 5 most similar users for each user in X_test; items those
# neighbors rated highly can then be recommended
distances, neighbor_indices = model.kneighbors(X_test)

  • Principal component analysis (PCA) is a dimensionality reduction technique that reduces the number of features in a dataset while preserving the most important information. PCA works by finding the principal components, which are the directions in which the data varies the most.




from sklearn.decomposition import PCA

# X is assumed to be defined as in the earlier examples

# Create a PCA model that keeps the two strongest directions of variation
pca = PCA(n_components=2)

# Fit the model to the data
pca.fit(X)

# Project the data onto the two principal components
X_new = pca.transform(X)
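
After fitting, the explained_variance_ratio_ attribute shows what fraction of the original variance each kept component retains, which is a quick way to judge whether two components are enough:

# Fraction of the original variance captured by each kept component
print(pca.explained_variance_ratio_)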


