Step 5: Model Deployment
The four main topics in EDA are descriptive statistics, univariate analysis, bivariate analysis, and multivariate analysis.
Descriptive statistics summarize the main features of a dataset, such as its central tendency, variability, and distribution. The most common ones are the mean, median, mode, range, variance, and standard deviation.
Here is an example of code that calculates each of them for a dataset:
import numpy as np
import pandas as pd
# Create a dataset.
data = np.random.randint(0, 100, 100)
# Calculate the mean.
mean = np.mean(data)
# Calculate the median.
median = np.median(data)
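# Calculate the mode (the most frequent value; the smallest one if several tie).
mode = pd.Series(data).mode()[0]
# Calculate the range (named data_range to avoid shadowing the built-in range).
data_range = np.max(data) - np.min(data)
# Calculate the variance (population variance, the NumPy default).
variance = np.var(data)
# Calculate the standard deviation (population standard deviation).
standard_deviation = np.std(data)
# Print the results.
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Range:", data_range)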
print("Variance:", variance)
print("Standard deviation:", standard_deviation)
Univariate analysis examines a single variable. It can be used to describe the distribution of the variable, to identify outliers, and to test hypotheses about the variable. Common methods include frequency distributions and histograms.
Here is an example of code that creates a frequency distribution and a histogram of a variable:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Create a dataset.
data = np.random.randint(0, 100, 100)
# Create a frequency distribution (how many times each value occurs).
frequency_distribution = pd.Series(data).value_counts()
print(frequency_distribution)
# Create a histogram.
plt.hist(data)
plt.show()
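Univariate analysis is also described above as a way to identify outliers. As a minimal sketch of one common approach, the 1.5 × IQR rule, the snippet below flags values that fall far outside the middle half of the data. The sample data, the two injected extreme values (300 and -200), and the names `lower_bound`, `upper_bound`, and `outliers` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sample: 100 integers in [0, 100) plus two extreme values
# added on purpose so there is something to detect.
rng = np.random.default_rng(seed=0)
data = np.concatenate([rng.integers(0, 100, 100), [300, -200]])

# First and third quartiles, and the interquartile range (IQR).
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Values more than 1.5 * IQR outside the quartiles are flagged as outliers.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = data[(data < lower_bound) | (data > upper_bound)]

print("Bounds:", lower_bound, upper_bound)
print("Outliers:", outliers)
```

The 1.5 multiplier is a convention rather than a law; a larger value flags fewer points.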
Bivariate analysis examines two variables together. It can be used to investigate the relationship between two variables, to identify factors that influence a variable, and to make predictions about a variable. One of the most common methods is the correlation coefficient.
Here is an example of code that calculates the correlation coefficient between two variables:
import numpy as np
# Create two variables.
variable_1 = np.random.randint(0, 100, 100)
variable_2 = np.random.randint(0, 100, 100)
# Calculate the correlation coefficient.
correlation_coefficient = np.corrcoef(variable_1, variable_2)[0, 1]
# Print the correlation coefficient.
print("Correlation coefficient:", correlation_coefficient)
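Bivariate analysis is also described above as a way to make predictions about a variable. As a rough sketch, a straight line can be fitted to two variables with least squares and then used to predict one from the other. The synthetic data here (a slope of 2 and an intercept of 5, plus noise) and the name `new_value` are assumptions chosen so that the relationship is visible; they are not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: variable_2 depends linearly on variable_1, plus noise.
variable_1 = rng.uniform(0, 100, 100)
variable_2 = 2.0 * variable_1 + 5.0 + rng.normal(0, 1.0, 100)

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(variable_1, variable_2, 1)

# Predict variable_2 for a new value of variable_1.
new_value = 50.0
prediction = slope * new_value + intercept

print("Slope:", slope)
print("Intercept:", intercept)
print("Prediction for", new_value, ":", prediction)
```

With two independent random variables, as in the correlation example, the fitted slope would be close to zero and the prediction would carry little information.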
Multivariate analysis examines several variables at once. It can be used to investigate the relationships between multiple variables, to identify factors that influence multiple variables, and to make predictions about multiple variables.
It can be done in several ways:
Principal component analysis (PCA)
import numpy as np
from sklearn.decomposition import PCA
# Create a dataset.
data = np.random.randint(0, 100, (100, 3))
# Create a PCA model.
pca = PCA(n_components=2)
# Fit the PCA model to the data.
pca.fit(data)
# Transform the data to the principal components.
principal_components = pca.transform(data)
# Print the principal components.
print(principal_components)
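To give an idea of what PCA does under the hood, here is a minimal NumPy-only sketch: center the data, take its singular value decomposition, and project onto the leading directions. It is an illustration of the technique, not a copy of the scikit-learn implementation, and the signs of the resulting components may differ from what `PCA.transform` returns. The variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical dataset: 100 samples, 3 features.
data = rng.integers(0, 100, (100, 3)).astype(float)

# Center each column; PCA works on deviations from the column means.
centered = data - data.mean(axis=0)

# SVD of the centered data: the rows of vt are the principal directions,
# ordered from the largest singular value to the smallest.
u, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Share of the total variance captured by each component.
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Project the data onto the first two principal directions.
principal_components = centered @ vt[:2].T

print("Explained variance ratio:", explained_variance_ratio)
print("Projected shape:", principal_components.shape)
```

The explained variance ratio is a quick check on how much information is lost by keeping only two components.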
Factor analysis (FA)
import numpy as np
from sklearn.decomposition import FactorAnalysis
# Create a dataset.
data = np.random.randint(0, 100, (100, 5))
# Create a FA model.
fa = FactorAnalysis(n_components=3)
# Fit the FA model to the data.
fa.fit(data)
# Transform the data to the factors.
factors = fa.transform(data)
# Print the factors.
print(factors)
Linear discriminant analysis (LDA)
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Create a dataset.
data = np.random.randint(0, 100, (100, 2))
labels = np.random.randint(0, 2, 100)
# Create an LDA model.
lda = LinearDiscriminantAnalysis()
# Fit the LDA model to the data.
lda.fit(data, labels)
# Predict the labels for the data.
predicted_labels = lda.predict(data)
# Print the accuracy of the model.
print(lda.score(data, labels))
Logistic regression
import numpy as np
from sklearn.linear_model import LogisticRegression
# Create a dataset.
data = np.random.randint(0, 100, (100, 2))
labels = np.random.randint(0, 2, 100)
# Create a logistic regression model.
logistic_regression = LogisticRegression()
# Fit the logistic regression model to the data.
logistic_regression.fit(data, labels)
# Predict the labels for the data.
predicted_labels = logistic_regression.predict(data)
# Print the accuracy of the model.
print(logistic_regression.score(data, labels))
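The two classification examples above report accuracy on the same data the model was fitted to, which tends to look better than performance on unseen data. As a hedged sketch of a more cautious check, the snippet below holds out part of the data for testing. The synthetic features, the rule used to generate the labels, the 25% test size, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)

# Hypothetical dataset: the label is 1 when the two features sum to more than 0.
data = rng.normal(size=(200, 2))
labels = (data[:, 0] + data[:, 1] > 0).astype(int)

# Hold out 25% of the samples; the model never sees them during fitting.
train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.25, random_state=0
)

# Fit on the training split only.
model = LogisticRegression()
model.fit(train_data, train_labels)

# Compare accuracy on the data the model has seen and on the data it has not.
train_accuracy = model.score(train_data, train_labels)
test_accuracy = model.score(test_data, test_labels)
print("Training accuracy:", train_accuracy)
print("Test accuracy:", test_accuracy)
```

With purely random labels, as in the examples above, the test accuracy would be expected to hover around 0.5, since there is no real pattern to learn.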
There are many good tutorials on the subjects above. This article is meant to give you a quick idea of each one, along with an example.
I am a Software Architect | AI, Data Science, IoT, Cloud ⌨️ 👨🏽 💻
Developing software, system ….. for more than 26 years. Thank you.