One-Sentence Definition
A recurrent neural network (RNN) is a type of neural network designed to process sequential data -- like text, audio, or time series -- by maintaining a hidden state that carries information from one step in the sequence to the next.
How It Works
Standard neural networks process each input independently. An RNN adds a feedback loop: at each step in a sequence, the network takes two inputs -- the current data point and a hidden state vector summarizing everything it has seen so far. It produces an output and an updated hidden state, which is passed to the next step. This allows the model to maintain context across a sequence.
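The recurrence described above can be sketched in a few lines of numpy. This is a minimal illustration, not a trained model: the sizes, weights, and input sequence are all made up, and the weights are random rather than learned.

```python
import numpy as np

# Illustrative sizes: 8-dim inputs, 16-dim hidden state (arbitrary choices).
input_size, hidden_size = 8, 16
rng = np.random.default_rng(0)

# Parameters would normally be learned; here they are random placeholders.
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One step: combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Process a 5-step sequence, carrying the hidden state forward each step.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = rnn_step(x_t, h)

print(h.shape)  # (16,) -- the running summary of the sequence so far
```

Note that the same weight matrices are reused at every step; only the hidden state changes, which is what lets the network handle sequences of any length.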
Consider reading a sentence word by word. After processing "The cat sat on the," the hidden state encodes the context needed to predict that "mat" or "floor" is more likely than "airplane." The hidden state acts as a compressed memory of the sequence so far.
Basic RNNs struggle with long sequences because gradients can vanish or explode during training, making it hard to learn dependencies between words that are far apart. Two variants largely mitigated this problem. Long Short-Term Memory networks (LSTMs), introduced by Hochreiter and Schmidhuber in 1997, added gating mechanisms -- input, forget, and output gates -- that control what information to keep or discard. Gated Recurrent Units (GRUs) simplified the LSTM design with fewer parameters while achieving comparable performance.
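The three gates can be sketched as one LSTM step, again with random placeholder weights and arbitrary sizes (biases are omitted for brevity, so this is a simplified sketch of the standard cell, not a full implementation):

```python
import numpy as np

input_size, hidden_size = 8, 16
rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate plus one for the candidate memory,
# each acting on the concatenation [x_t; h_prev].
W_i, W_f, W_o, W_c = (
    rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))
    for _ in range(4)
)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W_i @ z)           # input gate: how much new info to write
    f = sigmoid(W_f @ z)           # forget gate: how much old memory to keep
    o = sigmoid(W_o @ z)           # output gate: how much memory to expose
    c_tilde = np.tanh(W_c @ z)     # candidate memory content
    c = f * c_prev + i * c_tilde   # updated cell state (long-term memory)
    h = o * np.tanh(c)             # updated hidden state (what the next step sees)
    return h, c

h = c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c)
```

The key design choice is the additive cell-state update `c = f * c_prev + i * c_tilde`: because old memory flows through a multiplication by the forget gate rather than repeated matrix multiplications, gradients survive over many more steps than in a basic RNN.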
From roughly 2014 to 2017, LSTMs and GRUs were the dominant architecture for machine translation (Google Translate), speech recognition (Apple Siri), and text generation. The encoder-decoder pattern, where one RNN reads an input sequence and another generates an output sequence, powered the first wave of neural machine translation.
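The encoder-decoder pattern can be sketched as two tiny RNNs that share only a hidden state. Everything here is invented for illustration -- the token ids, the "vocabulary" of 10 entries, and the random weights -- so the generated output is meaningless; the point is the hand-off of the encoder's final state to the decoder.

```python
import numpy as np

# Toy sizes: 8-dim embeddings, 16-dim hidden state, 10-token vocabulary.
emb, hid, vocab = 8, 16, 10
rng = np.random.default_rng(2)

W_enc_x = rng.normal(scale=0.1, size=(hid, emb))
W_enc_h = rng.normal(scale=0.1, size=(hid, hid))
W_dec_x = rng.normal(scale=0.1, size=(hid, emb))
W_dec_h = rng.normal(scale=0.1, size=(hid, hid))
W_out = rng.normal(scale=0.1, size=(vocab, hid))   # hidden -> token scores
embed = rng.normal(size=(vocab, emb))              # token id -> vector

def step(W_x, W_h, x, h):
    return np.tanh(W_x @ x + W_h @ h)

# Encoder: read the source tokens; the final h summarizes the whole input.
h = np.zeros(hid)
for tok in [3, 1, 4]:                              # arbitrary source token ids
    h = step(W_enc_x, W_enc_h, embed[tok], h)

# Decoder: start from the encoder's summary and emit tokens greedily.
out, tok = [], 0                                   # 0 plays the role of a start token
for _ in range(3):
    h = step(W_dec_x, W_dec_h, embed[tok], h)
    tok = int(np.argmax(W_out @ h))                # pick the highest-scoring next token
    out.append(tok)
```

In a real translation system both networks would be trained jointly, the decoder would sample from a softmax over a large vocabulary, and decoding would stop at an end-of-sequence token rather than after a fixed three steps.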
Why It Matters
RNNs introduced the idea that neural networks could process data with a temporal or sequential dimension, which was a conceptual breakthrough. However, the transformer architecture (introduced in 2017) largely replaced RNNs for most language and sequence tasks because transformers process all positions in parallel rather than one at a time, enabling much faster training on modern GPUs.
In 2026, pure RNNs are rare in production language systems. But the ideas live on. State-space models like Mamba borrow the sequential processing concept from RNNs while solving the parallelization problem. RWKV, an open-source model, combines RNN-like recurrence with transformer-like training efficiency. And LSTMs still run in embedded systems and edge devices where their small footprint and low latency matter -- hearing aids, keyword detection on smart speakers, and real-time sensor processing.
Key Takeaway
Recurrent neural networks process sequences by maintaining a running hidden state, and while transformers have replaced them for most language tasks, the RNN concept of sequential memory continues to influence modern architectures like Mamba and RWKV.
Part of the AI Weekly Glossary.