Understanding Markov Chains: Connections to Neuroscience, AI, and Poetry
Chapter 1: The Significance of Stochastic Processes
It may seem unusual to kick off a discussion about Markov chains—also known as Markov processes—with a quote from a Russian poem, but this link showcases the fascinating evolution of probability theory in the early 20th century.
Markov chains have become integral to modern technology, playing a vital role in fields like natural language processing, voice recognition, web search, and even self-driving vehicles. Before we explore the intersection of Markov chains and Russian literature, let's first highlight the broader importance of stochastic processes, starting with the concept of a random walk.
In scientific endeavors, ideas often seem to emerge simultaneously from various minds. This was evident at the turn of the 20th century, when researchers recognized the significance of stochastic processes such as random walks. Within just a few years, random walks were linked to topics like mosquito population dynamics, Brownian motion, acoustics, and financial markets.
Formally defined, a random walk is a stochastic process that traces the trajectory of a subject through a mathematical space, such as the integers or a multi-dimensional Euclidean space. Starting from an initial position, we take n steps, at each step adding a random variable drawn from a probability distribution:

Xn = X0 + Z1 + Z2 + … + Zn

Here Xn represents the position in the mathematical space after n steps, X0 the starting position, and the Zj are random variables following a common distribution.
To illustrate, picture yourself in an infinitely large room flipping coins. Each time you land on heads, you step left; on tails, you step right. After 100 tosses, where will you end up? This can be easily modeled using Python, allowing us to visualize multiple instances of this process.
import numpy as np

def random_walk(x0, p, steps):
    # Simulate a 1-D random walk: with probability p step right (+1),
    # otherwise step left (-1), recording the position after each step.
    positions = [x0]
    x = x0
    for _ in range(steps):
        if np.random.binomial(1, p) == 1:
            x += 1
        else:
            x -= 1
        positions.append(x)
    return positions
The outcome after 100 flips can vary greatly, even though each step has an equal chance of going either way. While any single walk is random, the central limit theorem tells us that the final positions will tend toward a Gaussian distribution with a mean of zero and a standard deviation proportional to the square root of the number of steps. This can be demonstrated by plotting a histogram of the final positions across a sample of 500 random walks.
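This is easy to check numerically. A minimal sketch, simulating the ±1 steps directly with numpy rather than reusing the function above (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# 500 independent walks, each the cumulative sum of 100 random +1/-1 steps;
# only the final position of each walk is kept.
steps = rng.choice([-1, 1], size=(500, 100))
finals = steps.sum(axis=1)

print(finals.mean())  # close to 0
print(finals.std())   # close to sqrt(100) = 10
```

Feeding `finals` to a histogram routine reproduces the Gaussian shape described above.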
Although random walks may seem straightforward, it took scientists considerable time to recognize the vital role randomness plays in modeling our understanding of the world. The term "random walk" itself was only coined by Karl Pearson in 1905.
Grasping that randomness can lead to vastly different outcomes can be challenging. As Robert Frost might put it: two paths diverged in a yellow wood, and I took the one determined by a coin toss, which made all the difference.
Despite our instinct to seek explanations, the properties of stochastic processes can be perplexing. Historically, Newtonian mechanics aimed to eliminate randomness, and the resistance to Boltzmann’s statistical interpretation of thermodynamics reflects our discomfort with incorporating chance into our theories. Nonetheless, random walks are now utilized across various scientific domains, where a stochastic description often proves more useful than a deterministic one, whether examining the flight patterns of malaria mosquitoes, molecular movements, or brain processes.
This conceptual shift has been crucial and remains relevant. Daniel Kahneman’s recent book "Noise" provides compelling examples of our struggle to acknowledge the significant impact of noisy, random processes in daily life, illustrating how poor judgments can lead to financial losses and, in the justice system, devastating consequences.
Section 1.1: Exploring Connections to Russian Poetry
Now that we grasp the basics of random walks, let’s circle back to Russian poetry. If we were to approach a text like "Eugene Onegin" without preconceived notions of language or poetry, we might view it abstractly as a sequence of interconnected events, where every letter represents an event. By analyzing "transition probabilities" between these events, we can uncover underlying structures.
We can categorize transitions into four types:
- Consonant → Consonant
- Consonant → Vowel
- Vowel → Vowel
- Vowel → Consonant
Andrey Markov, after whom these chains are named, pondered what these probabilities might look like for a poem like "Eugene Onegin." By stripping away punctuation and analyzing the first 20,000 letters, we can employ a short program to conduct this analysis.
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
vowels = ['a', 'e', 'i', 'o', 'u']
text = "mydreamsmydreamswhathasbecomeoftheirsweetnesswhatindeedhasbecomeofmyyouth"
def simple_markov_chain(text):
    # Tally each transition type over consecutive letter pairs,
    # then return the relative frequencies in the order CC, CV, VV, VC.
    cc = cv = vv = vc = 0
    for cur, nxt in zip(text, text[1:]):
        if cur in consonants and nxt in consonants:
            cc += 1
        elif cur in consonants and nxt in vowels:
            cv += 1
        elif cur in vowels and nxt in vowels:
            vv += 1
        elif cur in vowels and nxt in consonants:
            vc += 1
    n = len(text) - 1
    return cc / n, cv / n, vv / n, vc / n
Analyzing this brief excerpt yields the following transition frequencies:
- Consonant → Consonant: 0.39
- Consonant → Vowel: 0.25
- Vowel → Vowel: 0.11
- Vowel → Consonant: 0.25
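To make the dependence explicit, we can normalize the counts for each starting category into conditional probabilities P(next | current). A small self-contained sketch on the same excerpt (treating 'y' as a consonant and 'i' as a vowel):

```python
vowels = set('aeiou')

text = ("mydreamsmydreamswhathasbecomeoftheirsweetness"
        "whatindeedhasbecomeofmyyouth")

# Tally the four transition types over consecutive letter pairs.
counts = {'CC': 0, 'CV': 0, 'VV': 0, 'VC': 0}
for cur, nxt in zip(text, text[1:]):
    key = ('V' if cur in vowels else 'C') + ('V' if nxt in vowels else 'C')
    counts[key] += 1

# Normalize per starting category to get P(next | current).
p_c_after_c = counts['CC'] / (counts['CC'] + counts['CV'])
p_c_after_v = counts['VC'] / (counts['VC'] + counts['VV'])
```

On this tiny sample, consecutive letters are clearly dependent; the exact conditional values vary with the sample, but the qualitative dependence persists in longer texts.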
We can depict these transitions in a simple diagram representing a Markov chain. Although this analysis is based on a small sample, Markov noted that these transition probabilities can vary across texts by different authors, revealing insights into the probabilistic structure of the text itself.
Furthermore, it is intriguing to estimate these probabilities for various languages, allowing comparisons of conserved or differing structures, an area of great interest to linguists.
The advantage of modeling text as a Markov chain is that it avoids an unrealistic independence assumption. If letters were drawn completely independently, the probability of encountering a consonant would always hover around 68%; in real text it drops to about 50% immediately following a vowel, while it rises to about 80% after another consonant.
Thus, the simple Markov chain described earlier accurately captures properties of real text in a more nuanced manner.
Section 1.2: The Broader Implications of Stochastic Models
Taking a broader view, similar dependency assumptions apply in other areas, such as weather forecasting, where consecutive rainy days are more likely to follow one another.
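A two-state weather chain makes this concrete. As an illustrative sketch (the transition probabilities here are invented, not real climate data): suppose a rainy day is followed by rain 60% of the time, while a dry day turns rainy only 20% of the time. Iterating the transition matrix reveals the long-run fraction of rainy days:

```python
import numpy as np

# Rows: current state (rain, dry); columns: next state (rain, dry).
# These probabilities are made up for illustration.
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])

dist = np.array([0.5, 0.5])   # start from an arbitrary distribution
for _ in range(100):
    dist = dist @ P           # one step of the chain

print(dist)  # converges to the stationary distribution, roughly 1/3 rain
```

The starting distribution is quickly forgotten: whatever today's weather, the chain settles into the same stationary mix of rainy and dry days.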
The realization that many real-world processes conceal structures expressible in probabilistic terms is profound. Contemporary neuroscience theories, such as the Bayesian brain hypothesis, place this probabilistic framework at the forefront of understanding how our brains interpret the world. Operating in environments laden with partial information, our brains must work with assumptions that much of our surroundings are influenced by noise or unpredictable factors. As a result, planning becomes a probabilistic endeavor, much like our unpredictable journeys influenced by chance.
Probabilistic models also lend themselves to generative modeling. As the term suggests, generative models can create entirely new data after learning from an existing framework, allowing predictions about future states of a system. These models find applications in various fields, including image, text, and audio generation.
To illustrate a basic Markov chain, we can develop a generative model that predicts potential text by sampling an initial state (vowel or consonant) based on baseline probabilities, then generating a sequence of states akin to the earlier random walk example.
import random
import numpy as np

consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
vowels = ['a', 'e', 'i', 'o', 'u']

def markov(iterations):
    text = []
    # Initial state: vowel with baseline probability 0.32.
    s = np.random.binomial(1, 0.32)
    text.append(random.choice(vowels if s == 1 else consonants))
    for _ in range(iterations):
        # Transition: P(vowel | vowel) = 0.49, P(vowel | consonant) = 0.2.
        s = np.random.binomial(1, 0.49 if s == 1 else 0.2)
        text.append(random.choice(vowels if s == 1 else consonants))
    return ''.join(text)
Sampling 100 letters yields lines like:
iewnjgfqxqqlyoqbuoquwpgaavonozbylhjdnjyvtqaakxgmqpiigjauzwvxznqhqkhryiiaxcfblywigbjpshcnnhviqkfrpwai.
While we can now generate random text, it's evident that this basic Markov chain, though it captures some essential dependencies, is overly simplistic for generating meaningful language. Sampling letters uniformly within each category is unrealistic and typically produces nonsensical output.
Another limitation stems from the Markov property, which states that predicting the future state of the system requires only knowledge of its current state; the path by which the system arrived there is irrelevant. However, the definition of a "state" is open to interpretation. Analyzing language at the level of individual vowels and consonants is not the only approach.
For instance, while double consonants are common, three consecutive vowels are exceedingly rare, suggesting that a vowel following two vowels should be less likely. Therefore, we could expand our diagram to encompass larger sequences of letters (e.g., vowel-vowel → vowel-vowel, vowel-vowel → consonant-vowel) but this would exponentially increase the number of states in the chain.
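The counting itself is easy to extend. A sketch of a second-order analysis on the earlier excerpt, where each state is the pair of the previous two letter categories:

```python
from collections import Counter

vowels = set('aeiou')

def category(ch):
    return 'V' if ch in vowels else 'C'

def second_order_counts(text):
    # Count transitions from a two-letter state (e.g. 'VV') to the
    # category of the next letter.
    counts = Counter()
    for i in range(len(text) - 2):
        state = category(text[i]) + category(text[i + 1])
        counts[(state, category(text[i + 2]))] += 1
    return counts

counts = second_order_counts(
    "mydreamsmydreamswhathasbecomeoftheirsweetness"
    "whatindeedhasbecomeofmyyouth")
```

On this excerpt, the state vowel-vowel is never followed by a third vowel, matching the intuition above; but each extra letter of history doubles the number of states, which is exactly the exponential growth the text warns about.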
As with many computer science concepts, there’s a trade-off between model complexity, computational demands, and overall quality. To create a more realistic language model, we can connect every possible word to others and establish transition probabilities among them, or examine transitions between different letters.
Autocorrect is a familiar example of this concept. It demonstrates how predicting transition probabilities and likely words based on input can be immensely beneficial in daily life, even if it occasionally results in absurd yet grammatically correct sentences.
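A word-level chain of this kind can be sketched in a few lines; the tiny corpus below is invented for illustration, with each word mapped to the words observed to follow it:

```python
import random
from collections import defaultdict

def build_chain(words):
    # Map each word to the list of words that follow it; repeated
    # followers appear multiple times, encoding transition frequencies.
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, start, length, seed=0):
    # Walk the chain from `start`, sampling a follower at each step.
    random.seed(seed)
    out = [start]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return ' '.join(out)

corpus = "the cat sat on the mat and the dog sat on the rug".split()
chain = build_chain(corpus)
```

Sampling from `chain` produces locally plausible word sequences, which is precisely the behavior, both useful and occasionally absurd, that autocorrect exhibits.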
Markov chains have vast potential applications. Google’s PageRank algorithm, for example, employs Markov chains to link billions of web pages and assign probabilities to transitions. To accurately model language, we must consider incorporating sufficient long-term dependencies, as these are crucial for generating coherent human-like texts. Advanced models like GPT-3 utilize transformers to express these dependencies efficiently.
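At its core, PageRank treats a random surfer as a Markov chain: from each page the surfer follows a random outgoing link, occasionally jumping to a random page instead. A toy sketch on a hypothetical three-page web, using power iteration and the commonly cited damping factor of 0.85:

```python
import numpy as np

# Hypothetical link graph: A links to B and C, B links to C, C links to A.
# M[i, j] = probability of moving to page i from page j.
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

d = 0.85                      # damping factor (random-jump probability 1 - d)
n = M.shape[0]
rank = np.full(n, 1 / n)      # start from a uniform distribution
for _ in range(100):
    rank = (1 - d) / n + d * M @ rank

print(rank)  # page C ranks highest, then A, then B
```

Pages with many incoming links from well-ranked pages end up with higher stationary probability, which is the ordering PageRank reports.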
In summary, Markov chains serve as powerful tools with applications across AI and engineering. More importantly, they provide a conceptual framework that aids in understanding the probabilistic structures underpinning much of reality, offering insights into how scaling these models can lead to robust representations of the world.