
Multi-Head Attention and Self-Attention in Transformers

Transformer Architecture

Multi-Head Attention and Self-Attention are key components of the Transformer architecture, introduced in the paper "Attention Is All You Need" by Vaswani et al. in 2017.

Self-Attention (or Intra-Attention)

Self-Attention is a mechanism that allows the model to attend to different parts of the input sequence simultaneously and weigh their importance. It is called "self" attention because the attention is applied to the input sequence itself rather than to some external context. Given an input sequence of tokens (e.g., words or characters), the Self-Attention mechanism computes the representation of each token by attending to all other tokens. This is done in four steps:

1. Query (Q): The input sequence is linearly transformed into a query matrix.
2. Key (K): The input sequence is linearly transformed into a key matrix.
3. Value (V): The input sequence is linearly transformed into a value matrix.
4. Compute Attention Weights: The dot product of Q and K is scaled by the square root of the key dimension and passed through a softmax; the resulting weights are used to compute a weighted sum of the rows of V.
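To make these four steps concrete, here is a minimal NumPy sketch of single-head self-attention. The projection matrices W_q, W_k, and W_v stand in for the learned weights and are filled with random values purely for illustration.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, W_q, W_k, W_v):
        Q = X @ W_q                          # queries: one row per token
        K = X @ W_k                          # keys
        V = X @ W_v                          # values
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)      # scaled dot products between all token pairs
        weights = softmax(scores, axis=-1)   # attention weights, each row sums to 1
        return weights @ V                   # each token's output is a weighted sum of values

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 16))         # 5 tokens, model dimension 16
    W_q, W_k, W_v = (rng.standard_normal((16, 16)) for _ in range(3))
    out = self_attention(X, W_q, W_k, W_v)   # shape (5, 16)

Multi-Head Attention runs several such attention functions in parallel, each with its own Q/K/V projections, and concatenates the results before a final linear projection.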

CNN, RNN & Transformers

Let's first look at the most popular deep learning models.

Deep Learning Models

Deep learning models are a subset of machine learning algorithms that use artificial neural networks to analyze complex patterns in data. Inspired by the neural structure of the human brain, these models comprise multiple layers of interconnected nodes (neurons) that process and transform inputs into meaningful representations. Deep learning has revolutionized domains such as computer vision, natural language processing, speech recognition, and recommender systems, thanks to its ability to learn hierarchical representations, capture non-linear relationships, and generalize well to unseen data.

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)

The emergence of CNNs and RNNs marked significant milestones in deep learning's evolution. CNNs, introduced in the 1980s, excel at image and signal processing tasks, leveraging convolutional and pooling layers to extract local spatial features.
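As a concrete illustration of the convolution-plus-pooling pattern described above, here is a minimal sketch assuming PyTorch; the layer sizes and the TinyCNN name are hypothetical, chosen only for illustration.

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learn local 3x3 filters
                nn.ReLU(),
                nn.MaxPool2d(2),                             # downsample 2x
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # assumes 32x32 input

        def forward(self, x):
            x = self.features(x)               # extract feature maps
            return self.classifier(x.flatten(1))

    x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
    logits = TinyCNN()(x)           # shape (1, 10)

Each convolution detects local patterns (edges, textures), and each pooling step halves the spatial resolution, so deeper layers see progressively larger regions of the image.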