LSTM and GRU

Long Short-Term Memory (LSTM) Networks

LSTMs are a type of Recurrent Neural Network (RNN) designed to handle sequential data with long-term dependencies.

Key Features:

Cell State: Preserves information over long periods.

Gates: Control information flow (input, output, and forget gates).

Hidden State: The output at each time step, carrying short-term information forward to the next step.

Related Technologies:

Recurrent Neural Networks (RNNs): Basic architecture for sequential data.

Gated Recurrent Units (GRUs): Simplified version of LSTMs.

Bidirectional RNNs/LSTMs: Process input sequences in both directions.

Encoder-Decoder Architecture: Used for sequence-to-sequence tasks.

Real-World Applications:

Language Translation

Speech Recognition

Text Generation

Time Series Forecasting

Gated Recurrent Unit (GRU) Networks

GRUs are an alternative to LSTMs, designed to be faster and more efficient while still capturing long-term dependencies.

Key Differences from LSTMs:

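Simplified Architecture: Two gates (update and reset) instead of three, and a single hidden state with no separate cell state.

Faster Computation: Three sets of weights instead of four, so there are fewer parameters and each time step is cheaper to compute.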
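To make the parameter difference concrete, here is a minimal sketch that counts parameters for both cell types. It assumes the textbook formulation used in this post, where every gate or candidate has an input matrix W, a recurrent matrix U, and a bias b. The function names are hypothetical, and library implementations can differ slightly (for example, Keras GRU with its default reset_after=True carries an extra bias vector per weight set).

```python
def recurrent_param_count(input_dim: int, hidden_dim: int, num_weight_sets: int) -> int:
    """Count parameters of a gated recurrent layer.

    Each weight set consists of an input matrix W (hidden_dim x input_dim),
    a recurrent matrix U (hidden_dim x hidden_dim), and a bias b (hidden_dim).
    """
    per_set = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return num_weight_sets * per_set


def lstm_param_count(input_dim: int, hidden_dim: int) -> int:
    # Four weight sets: input gate, forget gate, cell candidate, output gate.
    return recurrent_param_count(input_dim, hidden_dim, 4)


def gru_param_count(input_dim: int, hidden_dim: int) -> int:
    # Three weight sets: update gate, reset gate, candidate hidden state.
    return recurrent_param_count(input_dim, hidden_dim, 3)


for n_in, n_hid in [(1, 2), (1, 50), (128, 256)]:
    lstm = lstm_param_count(n_in, n_hid)
    gru = gru_param_count(n_in, n_hid)
    print(f"input={n_in}, hidden={n_hid}: LSTM={lstm}, GRU={gru}, ratio={gru / lstm:.2f}")
```

Under this formulation the GRU always has three quarters of the LSTM's parameters for the same input and hidden sizes.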

Technical Details for LSTMs and GRUs:

LSTM Mathematical Formulation:

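Let x_t be the input at time t, h_t be the hidden state, and c_t be the cell state. In the formulas below, * between a weight matrix and a vector is a matrix-vector product, while * between two vectors (for example f_t * c_(t-1)) is element-wise multiplication.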

Input Gate: i_t = sigmoid(W_i * x_t + U_i * h_(t-1) + b_i)

Forget Gate: f_t = sigmoid(W_f * x_t + U_f * h_(t-1) + b_f)

Cell State Update: c_t = f_t * c_(t-1) + i_t * tanh(W_c * x_t + U_c * h_(t-1) + b_c)

Output Gate: o_t = sigmoid(W_o * x_t + U_o * h_(t-1) + b_o)

Hidden State Update: h_t = o_t * tanh(c_t)

Parameters:

W_i, W_f, W_c, W_o: Input weight matrices (applied to x_t) for the input gate, forget gate, cell candidate, and output gate.

U_i, U_f, U_c, U_o: Recurrent weight matrices (applied to the previous hidden state h_(t-1)).

b_i, b_f, b_c, b_o: Bias vectors.
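The LSTM equations above can be sketched as a single NumPy function. This is an illustrative sketch, not a library implementation: the function name lstm_step, the dictionary layout of the weights, and the symbol g_t for the cell candidate are assumptions made for this example, and the weights are random.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U, b are dicts keyed by "i", "f", "c", "o" holding, per gate, the
    input weights (hidden x input), recurrent weights (hidden x hidden),
    and bias (hidden).
    """
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    g_t = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # cell candidate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    c_t = f_t * c_prev + i_t * g_t                          # element-wise cell update
    h_t = o_t * np.tanh(c_t)                                # element-wise hidden update
    return h_t, c_t


# Run a short random sequence through the cell (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W = {k: rng.normal(scale=0.5, size=(hidden_dim, input_dim)) for k in "ifco"}
U = {k: rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)) for k in "ifco"}
b = {k: np.zeros(hidden_dim) for k in "ifco"}

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):  # 5 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)

print("final hidden state:", h)
print("final cell state:  ", c)
```

Note that the same weights are reused at every time step; only h and c change as the sequence is consumed.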

GRU Mathematical Formulation:

Let x_t be the input at time t, h_t be the hidden state.

Update Gate: z_t = sigmoid(W_z * x_t + U_z * h_(t-1) + b_z)

Reset Gate: r_t = sigmoid(W_r * x_t + U_r * h_(t-1) + b_r)

Hidden State Update: h_t = (1 - z_t) * h_(t-1) + z_t * tanh(W_h * x_t + U_h * (r_t * h_(t-1)) + b_h)

Parameters:

W_z, W_r, W_h: Weight matrices for update, reset, and hidden state.

U_z, U_r, U_h: Weight matrices for hidden state.

b_z, b_r, b_h: Bias vectors.
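As with the LSTM, the GRU equations above translate almost line for line into NumPy. The sketch below is illustrative only: gru_step and the dictionary layout are hypothetical names chosen for this example, the weights are random, and it follows this post's convention in which (1 - z_t) weights the previous hidden state.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step.

    W, U, b are dicts keyed by "z", "r", "h" holding, per weight set, the
    input weights (hidden x input), recurrent weights (hidden x hidden),
    and bias (hidden).
    """
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])  # update gate
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])  # reset gate
    # Candidate state: the reset gate scales h_prev element-wise before U_h is applied.
    h_cand = np.tanh(W["h"] @ x_t + U["h"] @ (r_t * h_prev) + b["h"])
    # Blend old state and candidate element-wise.
    h_t = (1.0 - z_t) * h_prev + z_t * h_cand
    return h_t


# Run a short random sequence through the cell (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W = {k: rng.normal(scale=0.5, size=(hidden_dim, input_dim)) for k in "zrh"}
U = {k: rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)) for k in "zrh"}
b = {k: np.zeros(hidden_dim) for k in "zrh"}

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):  # 5 time steps
    h = gru_step(x_t, h, W, U, b)

print("final hidden state:", h)
```

Because h_t is a gate-weighted average of h_(t-1) and a tanh output, it stays within (-1, 1) whenever the initial state does.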

Here's a small mathematical example for an LSTM network:

Example:

Suppose we have an LSTM network with:

Input dimension: 1

Hidden dimension: 2

Output dimension: 1

Input at time t (x_t)

x_t = 0.5

Previous Hidden State (h_(t-1)) and Cell State (c_(t-1))

h_(t-1) = [0.2, 0.3]

c_(t-1) = [0.4, 0.5]

Weight Matrices and Bias Vectors

Because the input is one-dimensional, each input weight W is a vector with one entry per hidden unit, while each recurrent weight U is a 2x2 matrix.

W_i = [0.1, 0.3]

W_f = [0.5, 0.7]

W_c = [0.9, 1.1]

W_o = [1.3, 1.5]

U_i = [[1.7, 1.8], [1.9, 2.0]]

U_f = [[2.1, 2.2], [2.3, 2.4]]

U_c = [[2.5, 2.6], [2.7, 2.8]]

U_o = [[2.9, 3.0], [3.1, 3.2]]

b_i = [0.1, 0.2]

b_f = [0.3, 0.4]

b_c = [0.5, 0.6]

b_o = [0.7, 0.8]

Calculations

Values are rounded to three decimals for display. Intermediate results are carried at full precision, so the last digit may differ slightly from arithmetic on the rounded numbers.

Input Gate

i_t = sigmoid(W_i * x_t + U_i * h_(t-1) + b_i)

= sigmoid([0.1, 0.3] * 0.5 + [[1.7, 1.8], [1.9, 2.0]] * [0.2, 0.3] + [0.1, 0.2])

= sigmoid([0.05 + 0.88 + 0.1, 0.15 + 0.98 + 0.2])

= sigmoid([1.03, 1.33])

= [0.737, 0.791]

Forget Gate

f_t = sigmoid(W_f * x_t + U_f * h_(t-1) + b_f)

= sigmoid([0.5, 0.7] * 0.5 + [[2.1, 2.2], [2.3, 2.4]] * [0.2, 0.3] + [0.3, 0.4])

= sigmoid([0.25 + 1.08 + 0.3, 0.35 + 1.18 + 0.4])

= sigmoid([1.63, 1.93])

= [0.836, 0.873]

Cell State Update

c_t = f_t * c_(t-1) + i_t * tanh(W_c * x_t + U_c * h_(t-1) + b_c)

= [0.836, 0.873] * [0.4, 0.5] + [0.737, 0.791] * tanh([0.9, 1.1] * 0.5 + [[2.5, 2.6], [2.7, 2.8]] * [0.2, 0.3] + [0.5, 0.6])

= [0.334, 0.437] + [0.737, 0.791] * tanh([0.45 + 1.28 + 0.5, 0.55 + 1.38 + 0.6])

= [0.334, 0.437] + [0.737, 0.791] * tanh([2.23, 2.53])

= [0.334, 0.437] + [0.737, 0.791] * [0.977, 0.987]

= [0.334, 0.437] + [0.720, 0.781]

= [1.055, 1.217]

Output Gate

o_t = sigmoid(W_o * x_t + U_o * h_(t-1) + b_o)

= sigmoid([1.3, 1.5] * 0.5 + [[2.9, 3.0], [3.1, 3.2]] * [0.2, 0.3] + [0.7, 0.8])

= sigmoid([0.65 + 1.48 + 0.7, 0.75 + 1.58 + 0.8])

= sigmoid([2.83, 3.13])

= [0.944, 0.958]

Hidden State Update

h_t = o_t * tanh(c_t)

= [0.944, 0.958] * tanh([1.055, 1.217])

= [0.944, 0.958] * [0.784, 0.839]

= [0.740, 0.804]

Output

No output-layer weights are specified in this example, so the hidden state is taken as the output of this step. A dense layer on top of h_t would be needed to produce the 1-dimensional output listed above.

y_t = h_t

= [0.740, 0.804]

This completes the LSTM calculation for one time step.
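The arithmetic of this LSTM step can be checked numerically with a few lines of NumPy. This is a sketch under one assumption: each input weight W_* is taken to be a length-2 vector (one weight per hidden unit, since the input is a single number), with the values listed in the code. The recurrent weights, biases, input, and previous states are the ones given in the example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


x_t = 0.5
h_prev = np.array([0.2, 0.3])
c_prev = np.array([0.4, 0.5])

# ASSUMPTION: input weights are length-2 vectors because the input is a scalar.
W = {"i": np.array([0.1, 0.3]), "f": np.array([0.5, 0.7]),
     "c": np.array([0.9, 1.1]), "o": np.array([1.3, 1.5])}
U = {"i": np.array([[1.7, 1.8], [1.9, 2.0]]),
     "f": np.array([[2.1, 2.2], [2.3, 2.4]]),
     "c": np.array([[2.5, 2.6], [2.7, 2.8]]),
     "o": np.array([[2.9, 3.0], [3.1, 3.2]])}
b = {"i": np.array([0.1, 0.2]), "f": np.array([0.3, 0.4]),
     "c": np.array([0.5, 0.6]), "o": np.array([0.7, 0.8])}


def pre(gate):
    """Pre-activation W * x_t + U * h_(t-1) + b for the named gate."""
    return W[gate] * x_t + U[gate] @ h_prev + b[gate]


i_t = sigmoid(pre("i"))
f_t = sigmoid(pre("f"))
o_t = sigmoid(pre("o"))
c_t = f_t * c_prev + i_t * np.tanh(pre("c"))
h_t = o_t * np.tanh(c_t)

for name, value in [("i_t", i_t), ("f_t", f_t), ("o_t", o_t), ("c_t", c_t), ("h_t", h_t)]:
    print(name, np.round(value, 3))
```

Printing the pre-activations returned by pre(...) as well is a quick way to locate which term a hand calculation got wrong.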

Here's a small mathematical example for a GRU (Gated Recurrent Unit) network:

Example:

Suppose we have a GRU network with:

Input dimension: 1

Hidden dimension: 2

Input at time t (x_t)

x_t = 0.5

Previous Hidden State (h_(t-1))

h_(t-1) = [0.2, 0.3]

Weight Matrices and Bias Vectors

Because the input is one-dimensional, each input weight W is a vector with one entry per hidden unit, while each recurrent weight U is a 2x2 matrix.

W_z = [0.1, 0.3]

W_r = [0.5, 0.7]

W_h = [0.9, 1.1]

U_z = [[1.3, 1.4], [1.5, 1.6]]

U_r = [[1.7, 1.8], [1.9, 2.0]]

U_h = [[2.1, 2.2], [2.3, 2.4]]

b_z = [0.1, 0.2]

b_r = [0.3, 0.4]

b_h = [0.5, 0.6]

Calculations

Values are rounded to three decimals for display. Intermediate results are carried at full precision, so the last digit may differ slightly from arithmetic on the rounded numbers.

Update Gate

z_t = sigmoid(W_z * x_t + U_z * h_(t-1) + b_z)

= sigmoid([0.1, 0.3] * 0.5 + [[1.3, 1.4], [1.5, 1.6]] * [0.2, 0.3] + [0.1, 0.2])

= sigmoid([0.05 + 0.68 + 0.1, 0.15 + 0.78 + 0.2])

= sigmoid([0.83, 1.13])

= [0.696, 0.756]

Reset Gate

r_t = sigmoid(W_r * x_t + U_r * h_(t-1) + b_r)

= sigmoid([0.5, 0.7] * 0.5 + [[1.7, 1.8], [1.9, 2.0]] * [0.2, 0.3] + [0.3, 0.4])

= sigmoid([0.25 + 0.88 + 0.3, 0.35 + 0.98 + 0.4])

= sigmoid([1.43, 1.73])

= [0.807, 0.849]

Candidate Hidden State

h~_t = tanh(W_h * x_t + U_h * (r_t * h_(t-1)) + b_h)

= tanh([0.9, 1.1] * 0.5 + [[2.1, 2.2], [2.3, 2.4]] * ([0.807, 0.849] * [0.2, 0.3]) + [0.5, 0.6])

= tanh([0.9, 1.1] * 0.5 + [[2.1, 2.2], [2.3, 2.4]] * [0.161, 0.255] + [0.5, 0.6])

= tanh([0.45 + 0.900 + 0.5, 0.55 + 0.983 + 0.6])

= tanh([1.850, 2.133])

= [0.952, 0.972]

Hidden State

h_t = (1 - z_t) * h_(t-1) + z_t * h~_t

= (1 - [0.696, 0.756]) * [0.2, 0.3] + [0.696, 0.756] * [0.952, 0.972]

= [0.304, 0.244] * [0.2, 0.3] + [0.696, 0.756] * [0.952, 0.972]

= [0.061, 0.073] + [0.663, 0.735]

= [0.723, 0.808]

This completes the GRU calculation for one time step.
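The GRU step can be checked numerically in the same way. This sketch assumes each input weight W_* is a length-2 vector (one weight per hidden unit, since the input is a single number), with the values listed in the code. The recurrent weights, biases, input, and previous hidden state are the ones given in the example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


x_t = 0.5
h_prev = np.array([0.2, 0.3])

# ASSUMPTION: input weights are length-2 vectors because the input is a scalar.
W = {"z": np.array([0.1, 0.3]), "r": np.array([0.5, 0.7]), "h": np.array([0.9, 1.1])}
U = {"z": np.array([[1.3, 1.4], [1.5, 1.6]]),
     "r": np.array([[1.7, 1.8], [1.9, 2.0]]),
     "h": np.array([[2.1, 2.2], [2.3, 2.4]])}
b = {"z": np.array([0.1, 0.2]), "r": np.array([0.3, 0.4]), "h": np.array([0.5, 0.6])}

z_t = sigmoid(W["z"] * x_t + U["z"] @ h_prev + b["z"])  # update gate
r_t = sigmoid(W["r"] * x_t + U["r"] @ h_prev + b["r"])  # reset gate
# The reset gate scales h_prev element-wise before the recurrent matrix is applied.
h_cand = np.tanh(W["h"] * x_t + U["h"] @ (r_t * h_prev) + b["h"])
h_t = (1.0 - z_t) * h_prev + z_t * h_cand

for name, value in [("z_t", z_t), ("r_t", r_t), ("h_cand", h_cand), ("h_t", h_t)]:
    print(name, np.round(value, 3))
```

A useful sanity check on any hand calculation is that each entry of h_t must lie between the corresponding entries of h_(t-1) and the candidate, because the update gate forms a weighted average of the two.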

Here are examples of Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks:

LSTM Example

Python

# Import necessary libraries
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

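# Generate sample dataset (noisy sine wave)
np.random.seed(0)
time_steps = 500  # the 20% test split (100 points) must be longer than the 30-step window
future_pred = 30  # number of past steps fed to the model (the window length)
data = np.sin(np.linspace(0, 10 * np.pi, time_steps)) + 0.2 * np.random.normal(0, 1, time_steps)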

# Plot original data
plt.figure(figsize=(10, 6))
plt.plot(data)
plt.title('Original Data')
plt.show()

# Scale data
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data.reshape(-1, 1))

# Split data into training and testing sets
train_size = int(0.8 * len(data_scaled))
train_data, test_data = data_scaled[0:train_size], data_scaled[train_size:]

# Split data into X (input windows) and y (the value that follows each window)
def split_data(data, future_pred):
    X, y = [], []
    for i in range(len(data) - future_pred):
        X.append(data[i:i + future_pred])
        y.append(data[i + future_pred])
    return np.array(X), np.array(y)

X_train, y_train = split_data(train_data, future_pred)
X_test, y_test = split_data(test_data, future_pred)

# Reshape data for LSTM input
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# Build LSTM model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(future_pred, 1)))
model.add(LSTM(50, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))

# Compile model
model.compile(optimizer='adam', loss='mean_squared_error')

# Early stopping callback
early_stopping = EarlyStopping(patience=5, min_delta=0.001)

# Train model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), callbacks=[early_stopping])

# Make predictions
predictions = model.predict(X_test)

# Plot predictions
plt.figure(figsize=(10, 6))
plt.plot(y_test, label='Actual')
plt.plot(predictions, label='Predicted')
plt.legend()
plt.title('Predictions')
plt.show()
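The split_data helper above builds sliding windows: each input sample is a run of consecutive values and the target is the value that follows it. Here is a minimal sketch of the same idea on a toy series, so the resulting shapes are easy to inspect. The name make_windows and the explicit length check are additions for illustration and are not part of the example above.

```python
import numpy as np


def make_windows(series, window):
    """Return (X, y): X[k] holds `window` consecutive values, y[k] the next value."""
    series = np.asarray(series)
    n_samples = len(series) - window
    if n_samples <= 0:
        raise ValueError(
            f"series of length {len(series)} is too short for window {window}"
        )
    X = np.stack([series[k:k + window] for k in range(n_samples)])
    y = series[window:]
    return X, y


toy = np.arange(6)  # [0, 1, 2, 3, 4, 5]
X, y = make_windows(toy, window=3)
print(X)  # rows: [0 1 2], [1 2 3], [2 3 4]
print(y)  # [3 4 5]
```

A segment that is not longer than the window yields no samples at all, so it is worth checking the length of the train and test splits before reshaping them for the model.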

GRU Example

Python
# Import necessary libraries
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

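# Generate sample dataset (noisy sine wave)
np.random.seed(0)
time_steps = 500  # the 20% test split (100 points) must be longer than the 30-step window
future_pred = 30  # number of past steps fed to the model (the window length)
data = np.sin(np.linspace(0, 10 * np.pi, time_steps)) + 0.2 * np.random.normal(0, 1, time_steps)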

# Plot original data
plt.figure(figsize=(10, 6))
plt.plot(data)
plt.title('Original Data')
plt.show()

# Scale data
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data.reshape(-1, 1))

# Split data into training and testing sets
train_size = int(0.8 * len(data_scaled))
train_data, test_data = data_scaled[0:train_size], data_scaled[train_size:]

# Split data into X (input windows) and y (the value that follows each window)
def split_data(data, future_pred):
    X, y = [], []
    for i in range(len(data) - future_pred):
        X.append(data[i:i + future_pred])
        y.append(data[i + future_pred])
    return np.array(X), np.array(y)

X_train, y_train = split_data(train_data, future_pred)
X_test, y_test = split_data(test_data, future_pred)

# Reshape data for GRU input
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# Build GRU model
model = Sequential()
model.add(GRU(50, activation='relu', return_sequences=True, input_shape=(future_pred, 1)))
model.add(GRU(50, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))

# Compile model
model.compile(optimizer='adam', loss='mean_squared_error')

# Early stopping callback
early_stopping = EarlyStopping(patience=5, min_delta=0.001)

# Train model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), callbacks=[early_stopping])

# Make predictions
predictions = model.predict(X_test)

# Plot predictions
plt.figure(figsize=(10, 6))
plt.plot(y_test, label='Actual')
plt.plot(predictions, label='Predicted')
plt.legend()
plt.title('Predictions')
plt.show()

Key Differences:

Architecture:

LSTM has three gates (input, forget, and output) and two state vectors (cell state and hidden state).

GRU has two gates (update and reset) and a single state vector (hidden state).

Computational Complexity:

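LSTM is computationally more expensive because it has four sets of weights (three gates plus the cell candidate) and maintains a separate cell state.

GRU has three sets of weights and one state vector, so for the same hidden size it has about three quarters of the parameters and is typically faster per step.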

Performance:

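LSTM is often preferred for tasks requiring longer-term dependencies, since its separate cell state can carry information across many steps.

GRU often performs comparably, particularly on shorter sequences and smaller datasets. Which one works better is task-dependent and is best settled by experiment.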

Use Cases:

LSTM:

Language modeling

Text generation

Speech recognition

GRU:

Time series forecasting

Speech recognition

Machine translation

These examples demonstrate basic LSTM and GRU architectures. Depending on your specific task, you may need to adjust parameters, add layers, or experiment with different optimizers and loss functions.
