Introducing the attention mechanism into RNNs to solve sequence prediction problems in five major areas

The encoder-decoder architecture is popular because it has achieved state-of-the-art results in many fields. A limitation of this architecture is that it encodes the input sequence into a fixed-length internal representation. This constrains the length of input sequence that can be handled well and causes the model to perform poorly on very long input sequences.

In this blog post, we will discover that we can overcome this limitation by using an attention mechanism in recurrent neural networks.

After reading this blog, you will know:

  • The limitation of the encoder-decoder architecture and its fixed-length internal representation
  • How the attention mechanism lets the network learn where to attend in the input sequence for each item in the output sequence
  • Five applications of recurrent neural networks with attention, including text translation and speech recognition

The problem with long sequences

In an encoder-decoder recurrent neural network, one set of long short-term memory (LSTM) networks learns to encode the input sequence into a fixed-length internal representation, and a second set of LSTM networks reads that internal representation and decodes it into the output sequence. This architecture has demonstrated state-of-the-art results on difficult sequence prediction problems such as text translation and quickly became the dominant approach. For example, see these two papers:

  • Sequence to Sequence Learning with Neural Networks (2014)
  • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (2014)

The encoder-decoder architecture still achieves excellent results on many problems. However, it suffers from the constraint that all input sequences are forced into an internal vector of fixed length. This limits the performance of these networks, especially for long input sequences such as long sentences in text translation.

“A potential issue with this encoder-decoder approach is that the neural network needs to be able to compress all the necessary information of a source sentence into a fixed-length vector. This may make it difficult for the neural network to cope with long sentences, especially those that are longer than the sentences in the training corpus.”

—— Dzmitry Bahdanau, et al., Neural machine translation by jointly learning to align and translate, 2015
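
To make the bottleneck concrete, here is a minimal toy sketch in NumPy (not the implementation from either paper, and with random untrained weights): however long the input sequence is, the decoder only ever sees the encoder's final hidden and cell state, a single fixed-length pair of vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W):
    """One toy LSTM step with a single stacked weight matrix W (no biases)."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)
    return o * np.tanh(c), c

hidden, in_dim, T_in, T_out = 8, 4, 6, 3
W_enc = rng.normal(size=(4 * hidden, in_dim + hidden)) * 0.1
W_dec = rng.normal(size=(4 * hidden, hidden + hidden)) * 0.1

# Encoder: fold the whole input sequence into one fixed-length (h, c).
inputs = rng.normal(size=(T_in, in_dim))
h = c = np.zeros(hidden)
for x in inputs:
    h, c = lstm_step(x, h, c, W_enc)

# Decoder: sees only the final (h, c), however long the input was;
# that single fixed-length state is the bottleneck discussed above.
y = np.zeros(hidden)
outputs = []
for _ in range(T_out):
    h, c = lstm_step(y, h, c, W_dec)
    y = h                     # simplified readout of the next output item
    outputs.append(y)
print(len(outputs), outputs[0].shape)   # 3 (8,)
```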

The attention mechanism in sequences

The attention mechanism is a method that frees the encoder-decoder architecture from fixed-length internal representations. It works by keeping the intermediate output of the LSTM encoder for each step of the input sequence, and then training the model to learn how to selectively pay attention to the inputs and associate them with items in the output sequence. In other words, each item in the output sequence depends on the selected items in the input sequence.

“Each time the proposed model generates a word in a translation, it (soft-)searches for a set of positions in the source sentence where the most relevant information is concentrated. The model then predicts the target word based on the context vectors associated with these source positions and all the previously generated target words.”

“…It encodes the input sentence into a sequence of vectors and chooses a subset of these vectors adaptively while decoding the translation. This frees the neural translation model from having to squash all the information of a source sentence, regardless of its length, into a fixed-length vector.”

——Dzmitry Bahdanau, et al., Neural machine translation by jointly learning to align and translate (https://arxiv.org/abs/1409.0473), 2015
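
The core computation is small. The sketch below (plain NumPy, toy sizes, dot-product scoring for brevity rather than the feed-forward alignment network used by Bahdanau et al.) shows one attention step: score every stored encoder output against the current decoder state, normalize the scores into weights, and take the weighted sum as a context vector.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(encoder_states, decoder_state):
    """Score every kept encoder state against the current decoder state,
    normalize the scores, and return the weighted sum as a context vector."""
    scores = encoder_states @ decoder_state      # one score per input step
    weights = softmax(scores)                    # attention distribution
    context = weights @ encoder_states           # weighted sum of states
    return context, weights

# Toy example: 5 input steps, hidden size 8, random stand-in values.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))   # kept outputs, one per input step
decoder_state = rng.normal(size=8)         # current decoder hidden state
context, weights = attend(encoder_states, decoder_state)
print(weights.round(3), context.shape)     # weights sum to 1, context is (8,)
```

In a full model the context vector would be combined with the decoder state to predict the next output item, and the scoring function would be learned jointly with the rest of the network.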

Although this increases the computational burden on the model, it yields a more focused and better-performing model. In addition, the model can show how it attends to the input sequence while predicting the output sequence, which helps us understand and analyze what the model is focusing on for each input-output pair, and to what extent.

“The proposed approach provides an intuitive way to inspect the (soft-)alignment between each word in the generated sequence and the words in the input sequence, which can be done by visualizing the annotation weights… Each row of the matrix in each plot indicates the weights associated with the annotations. From this we see which positions in the source sentence were emphasized when generating the target word.”

——Dzmitry Bahdanau, et al., Neural machine translation by jointly learning to align and translate (https://arxiv.org/abs/1409.0473), 2015
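
Because the attention weights at each decoding step form a matrix of output positions by input positions, they can be plotted directly. A minimal sketch with matplotlib, using made-up words and random weights in place of a trained model's annotation weights:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up attention weights: rows are generated target words, columns are
# source words, and each row sums to 1 (as softmax outputs would).
source = ["the", "agreement", "on", "the", "economic", "area"]
target = ["l'", "accord", "sur", "la", "zone", "economique"]
weights = np.random.default_rng(0).dirichlet(np.ones(len(source)), size=len(target))

fig, ax = plt.subplots()
ax.imshow(weights, cmap="gray")        # lighter cell = larger weight
ax.set_xticks(range(len(source)))
ax.set_xticklabels(source, rotation=45)
ax.set_yticks(range(len(target)))
ax.set_yticklabels(target)
ax.set_xlabel("source words")
ax.set_ylabel("generated words")
plt.tight_layout()
plt.show()
```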

Problems with using large images

Convolutional neural networks applied to computer vision face a similar problem: it can be difficult to train models on very large images. Instead, the model takes a sequence of glimpses over the image to build up an approximate impression before making a prediction.

“An important feature of human perception is that we tend not to process the entire scene at once, but selectively focus our attention on certain parts of the visual space to obtain the required information, and combine local information at different time points to construct an internal representation of the entire scene, thereby guiding subsequent eye movements and decisions.”

——Recurrent Models of Visual Attention (https://arxiv.org/abs/1406.6247), 2014

These glimpse-based modifications can also be considered attention mechanisms, but they are not the kind of attention mechanism discussed in this article.

Related papers:

  • Recurrent Models of Visual Attention, 2014
  • DRAW: A Recurrent Neural Network For Image Generation, 2015
  • Multiple Object Recognition with Visual Attention, 2014

5 Examples of Using Attention Mechanism for Sequence Prediction

This section gives some concrete examples of combining attention mechanisms with recurrent neural networks for sequence prediction.

1. Attention Mechanism in Text Translation

We have already mentioned the text translation example. Given a French sentence as the input sequence, translate it into an English sentence as the output sequence. The attention mechanism is used to associate each word in the output sequence with specific words in the input sequence.

“We extend the basic encoder-decoder architecture by having the model search for some input words or word annotations computed by the encoder when generating each target word. This frees the model from having to encode the entire source sentence into a fixed-length vector and allows the model to focus only on information relevant to the next target word.”

——Dzmitry Bahdanau, et al., Neural machine translation by jointly learning to align and translate (https://arxiv.org/abs/1409.0473), 2015

Figure caption: Columns are input sequences, rows are output sequences, and the highlighted blocks represent the association between the two. The lighter the color, the stronger the association.

Image from the paper: Dzmitry Bahdanau, et al., Neural machine translation by jointly learning to align and translate, 2015
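
For reference, the alignment model in Bahdanau et al. scores each source annotation against the previous decoder state with a small feed-forward network (additive attention). A rough NumPy sketch of that scoring, with toy sizes and random untrained weights:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(annotations, s_prev, Wa, Ua, va):
    """Additive (Bahdanau-style) alignment: score each source annotation h_j
    against the previous decoder state s_{i-1} with a one-layer network."""
    scores = np.tanh(annotations @ Ua.T + s_prev @ Wa.T) @ va   # e_ij
    alphas = softmax(scores)                                    # alignment weights
    context = alphas @ annotations                              # context vector c_i
    return context, alphas

rng = np.random.default_rng(0)
T, n, p = 6, 16, 12                     # source length, annotation size, state size
annotations = rng.normal(size=(T, n))   # encoder outputs h_1..h_T
s_prev = rng.normal(size=p)             # previous decoder state
Wa = rng.normal(size=(p, p)) * 0.1      # applied to the decoder state
Ua = rng.normal(size=(p, n)) * 0.1      # applied to each annotation
va = rng.normal(size=p) * 0.1
context, alphas = additive_attention(annotations, s_prev, Wa, Ua, va)
print(alphas.round(3), context.shape)   # T weights summing to 1, context (16,)
```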

2. Attention Mechanism in Image Description

Unlike the glimpse approach, sequence-based attention mechanisms can be applied to computer vision problems to help a convolutional neural network focus on the relevant parts of an input image while generating an output sequence, as in the typical image captioning task. Given an input image, output an English description of that image. The attention mechanism is used to focus on a different local region of the image for each word in the output sequence.

“We propose an attention-based approach that achieves state-of-the-art performance on three benchmark datasets… We also show how to use the learned attention mechanism to provide more interpretability to the model generation process, demonstrating that the learned alignment is highly consistent with human intuition.”

—— Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2016

Figure caption: Similar to the figure above, the underlined words in the output text correspond to the highlighted regions in the image on the right

Image from the paper: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2016
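
In the image case, the items being attended over are spatial positions in a convolutional feature map rather than words. A rough sketch of one captioning step (toy sizes, random untrained weights, and a hypothetical 14x14x512 feature map):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical CNN feature map: a 14x14 grid of 512-d feature vectors,
# flattened into 196 image "locations" the captioner can attend over.
features = rng.normal(size=(14 * 14, 512))
decoder_state = rng.normal(size=256)          # caption LSTM hidden state
W = rng.normal(size=(512, 256)) * 0.02        # projects state into feature space

# One decoding step: score every location against the current state,
# then build a context vector that feeds the next word prediction.
scores = features @ (W @ decoder_state)
alphas = softmax(scores)                      # one weight per image location
context = alphas @ features                   # (512,) attended image summary
focus = alphas.reshape(14, 14)                # can be overlaid on the image
print(focus.shape, context.shape)
```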

3. Attention Mechanism in Textual Entailment

Given a premise scenario and a hypothesis about that scenario, both in English, output whether the premise contradicts the hypothesis, is unrelated to it (neutral), or entails it.

For example:

  • Premise: "Photos from the wedding"
  • Hypothesis: "Someone is getting married"

The attention mechanism is used to associate each word in the hypothesis with the words in the premise and vice versa.

“We present an LSTM-based neural model that reads the two sentences together to determine entailment, rather than encoding each sentence independently into a semantic vector. We then extend the model with a neural word-by-word attention mechanism to encourage reasoning over entailments of pairs of words and phrases… The extended model outperforms the LSTM benchmark by 2.6 percentage points, setting a new state-of-the-art accuracy…”

——Reasoning about Entailment with Neural Attention (https://arxiv.org/abs/1509.06664), 2016

Image from the paper: Reasoning about Entailment with Neural Attention, 2016
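
Word-by-word attention produces a full hypothesis-by-premise alignment matrix rather than a single distribution. A minimal NumPy sketch with random stand-in word representations:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 32                                   # toy hidden size
premise = rng.normal(size=(5, d))        # e.g. states for 5 premise words
hypothesis = rng.normal(size=(4, d))     # e.g. states for 4 hypothesis words

# Word-by-word attention: each hypothesis word gets a distribution over
# the premise words, giving a (hypothesis x premise) alignment matrix.
scores = hypothesis @ premise.T          # (4, 5) similarity scores
alignment = softmax(scores, axis=1)      # rows sum to 1
contexts = alignment @ premise           # one premise summary per hypothesis word
print(alignment.round(2))
print(contexts.shape)                    # (4, 32)
```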

4. Attention Mechanism in Speech Recognition

Given an English speech segment as the input sequence, output a sequence of phonemes. The attention mechanism is used to associate each phoneme in the output sequence with specific speech frames in the input sequence.

“…we propose a novel end-to-end trainable speech recognition architecture based on a hybrid attention mechanism that combines both content and location information in order to select the next position in the input sequence during decoding. A promising property of the model is that it can recognize utterances longer than the ones it was trained on.”

——Attention-Based Models for Speech Recognition (https://arxiv.org/abs/1506.07503), 2015.

Image from the paper: Attention-Based Models for Speech Recognition, 2015.
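
A rough sketch of the hybrid idea, loosely following Chorowski et al.: the score for each speech frame combines a content term (frame versus decoder state) with location features obtained by convolving the previous step's attention weights, so the scorer knows where attention was focused last time. Toy sizes and random untrained weights throughout:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, d, k = 50, 64, 8                      # frames, hidden size, location features
enc = rng.normal(size=(T, d))            # encoded speech frames
s = rng.normal(size=d)                   # decoder state
alpha_prev = np.full(T, 1.0 / T)         # previous attention distribution

W = rng.normal(size=(d, d)) * 0.05       # applied to the decoder state
V = rng.normal(size=(d, d)) * 0.05       # applied to each frame
U = rng.normal(size=(d, k)) * 0.05       # applied to the location features
w = rng.normal(size=d) * 0.05
conv_filters = rng.normal(size=(k, 11)) * 0.1   # 1-D filters over alpha_prev

# Location features: convolve the previous alignment.
loc = np.stack([np.convolve(alpha_prev, f, mode="same") for f in conv_filters],
               axis=1)                   # (T, k)

# Hybrid score: content term plus location term.
scores = np.tanh(enc @ V.T + s @ W.T + loc @ U.T) @ w
alpha = softmax(scores)                  # new alignment over speech frames
context = alpha @ enc                    # attended acoustic summary
print(alpha.shape, context.shape)        # (50,) (64,)
```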

5. Attention Mechanism in Text Summarization

Given an English article as an input sequence, output an English text to summarize the input sequence. The attention mechanism is used to associate each word in the summary text with the corresponding word in the source text.

“…we propose a model for abstractive sentence summarization based on a neural attention mechanism, building on recent advances in neural machine translation. We combine this probabilistic model with a generation algorithm that produces accurate abstractive summaries.”

——A Neural Attention Model for Abstractive Sentence Summarization (https://arxiv.org/abs/1509.00685), 2015

Image from the paper: A Neural Attention Model for Abstractive Sentence Summarization, 2015.
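
In the summarization model of Rush et al., the attention over source words is conditioned on the few most recently generated summary words. A loose NumPy sketch of that idea, with made-up sizes and random untrained parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, src_len, C = 50, 30, 3                 # embedding size, source length, context window
src = rng.normal(size=(src_len, d))       # embeddings of the source words
recent = rng.normal(size=(C, d))          # embeddings of the last C summary words
P = rng.normal(size=(d, C * d)) * 0.02    # maps the summary context to a query

# Attention-based encoder: the distribution over source words is
# conditioned on the recently generated summary words.
query = P @ recent.reshape(-1)            # (d,) query from the summary context
weights = softmax(src @ query)            # one weight per source word
context = weights @ src                   # source summary used to pick the next word
print(weights.argmax(), context.shape)
```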

Further reading

If you are interested in adding an attention mechanism to LSTMs, you can read the following:

  • Attention and memory in deep learning and NLP
  • Attention Mechanism
  • Survey on Attention-based Models Applied in NLP (http://yanran.li/peppypapers/2015/10/07/survey-attention-model-1.html)
  • [Quora Q&A] What is exactly the attention mechanism introduced to RNN? (https://www.quora.com/What-is-exactly-the-attention-mechanism-introduced-to-RNN-recurrent-neural-network-It-would-be-nice-if-you-could-make-it-easy-to-understand)
  • [Quora Q&A] What is Attention Mechanism in Neural Networks? (https://www.quora.com/What-is-Attention-Mechanism-in-Neural-Networks)

Summary

This blog post introduced the use of the attention mechanism in LSTM recurrent neural networks for sequence prediction.

Specifically:

  • The encoder-decoder structure in recurrent neural networks uses a fixed-length internal representation, which imposes limitations on the learning of very long sequences.
  • The attention mechanism overcomes this limitation of the encoder-decoder architecture by letting the network learn which items in the input sequence to attend to for each item in the output sequence.
  • This method has been applied to a variety of sequence prediction problems, including text translation, speech recognition, etc.
