Is a Markov Chain the Same as a Finite State Machine?

1. Introduction

In this tutorial, we’ll study the differences between finite-state machines and Markov chains. Both systems are useful tools to describe time-dependent processes. Therefore, it makes sense to study them together as they indeed have some relationships with one another.

We’ll begin by studying finite-state machines and Markov chains, independently. Then, we’ll compare their differences and similarities and learn how to distinguish between the two.

2. Finite-State Machines

A finite-state machine is an automaton that we typically use to represent a process of computation, like the one that takes place in computers. For this reason, it’s frequently introduced as the first model of computation in undergraduate handbooks for theoretical computer science.

2.1. State Space

A finite-state machine makes use of the concept of states and attempts to model the process of change between the one that assumes at any given time and the next. The definition of a state is normally left outside of the formal definition of an automaton. In this sense, it constitutes one of the foundational concepts that are first described by analogies and only later defined formally. We can think of states, counter-intuitively maybe, in terms of things that change.

Certain physical processes lead objects to change configuration, shape, or form and assume one of a finite number of configurations, shapes, or forms which we can track over time. The typical example, in this sense, is that of water. Water, in fact, can assume one of three possible states according to the temperature at which we heat it (or respectively, cool it):

It makes sense to treat the possible configurations of water as a collection of three states, and not to distinguish further or finer, because we cannot identify any discernible features that allow us to further subset the states into micro-states. If we cannot find any such features to discriminate between states, by the principle of identity of indiscernible, then the two states are the same.

The set of states associated with a particular process takes the name of state space. It’s common to refer to individual states with the letter $s$ and a sequential natural number in subscript, so that $s_n$ indicates the state number $n$ in a collection $S={s_0, s_1, s_2, ..., s_n, ..., s_m}$ of possible states.

2.2. Rule Set or State-Transition Function

The second component of a finite-state machine is a set of rules that indicate the transition between states. This set of rules comprises a set of operations associated with each of the possible states of a machine and assigns to them a corresponding output state. In this sense, we can consider the set of rules as a function of state transition: in fact, this function tells us how to associate any of the possible states in the state space with another.

If $\delta$ is the symbol that indicates the state transition function, then we can think of $\delta$ as a function that maps the state space $S$ onto itself: $\delta: S \to S$ . Technically, the formal definition includes an alphabet of symbols, but this can be omitted for simplification purposes.

The simplest state transition function is the one that associates every state with itself: $\forall s \in S: \delta(s)=s$ . This state machine does, today and for all eternity, absolutely nothing! We can imagine observing a machine with this rule in regular intervals, and we would observe that if we know the state at any given time, we also know its state at any time in the future.

More complex machines can have more sophisticated state transitions. For example, they could associate each state in the state space to the next: $\delta(s_n) = s_{n+1}$ . If the set of states is finite, the last state in the set would need to map to some other state or to itself, or the state transition function would not hold.

Much more complex finite-state machines exist, and the domain of applicability is the primary determinant of the size of the state space and the state transition function. Interestingly, even though finite-state machines were conceptual tools used to model the process of a simple mechanical computation, they can now be used to simulate and facilitate the design of much more sophisticated and modern architectures of computers.

3. Markov Chains

3.1. Definition of a Markov Chain

A Markov chain is a model that we can use to describe stochastic time-dependent processes. A stochastic process is a process where the transition from the current state to the next is determined not only by the current state of the system, but also by an element of surprise or randomness that require a probabilistic description as opposed to a deterministic one.

To describe a Markov chain, we need one extra element that’s missing in the definition of finite-state machines: a distribution of probabilities that maps each state to each state, including the original state itself. If the Markov chain comprises $n$ states, it’s thus appropriate to use an $n \times n$ matrix to store these probabilities:

	$1$	$2$	$\cdots$	$n-1$	$n$
$1$	$P(1,1)$	$P(1,2)$	$\cdots$	$P(1,n-1)$	$P(1,n)$
$2$	$P(2,1)$	$P(2,2)$	$\cdots$	$P(2,n-1)$	$P(2,n)$
$\vdots$	$\vdots$	$\vdots$	$\vdots$	$\vdots$	$\vdots$
$n-1$	$P(n-1,1)$	$P(n-1,2)$	$\cdots$	$P(n-1,n-1)$	$P(n-1,n)$
$n$	$P(n,1)$	$P(n,2)$	$\cdots$	$P(n,n-1)$	$P(n,n)$

Above, each row of the matrix indicates the initial state, and each column is the transition state. The values contained in each row must therefore sum to 1; if this is not the case, in the sense that the row-wise probability distribution for the observed states amounts to less than 1, then we’re likely working with a hidden Markov chain. The latter is a generalization of Markov chains for which the set of possible states of a system is not fully observed.

3.2. Simplest Markov Chains

The simplest and most trivial Markov chain is one in which the probability distribution is such that, for any given initial state, there is only one to which transition can occur. This corresponds to having, for each row of the transition matrix, one and only one element with probability 1.

If, for example, the system cycles between three out of its only three states, then the Markov chain is fully connected. Its corresponding transition matrix is this:

	$S_1$	$S_2$	$S_3$
$S_1$	0	1	0
$S_2$	0	0	1
$S_3$	1	0	0

We can represent graphically this state transition matrix as a directed graph that contains a cycle:

3.3. Markov Chains and Connectedness

A Markov chain does not need to be fully connected though, and can in fact contain connected subgraphs while itself being disconnected:

The disjoint union of non-hidden Markov chains always comprises disconnected graphs.

3.3. A Sample Application of Markov Chains

Certain stochastic processes in computer science are typically modeled with the use of Markov chains. Let’s suppose we are developing a videogame, and we want to model the behavior of the final boss that the player has to defeat.

The boss has three behaviors: it defends from the player most of the time and only sporadically attacks them while simultaneously exposing itself to counterattacks. Very rarely, the boss withdraws and summons minions to assist in combat. If it’s summoning, it never goes into attack immediately after finishing but defends and waits for the player to defeat the summoned minions instead.

The transition of this boss from one pattern of behavior to another can well be modeled via a Markov chain with a transition matrix similar to this:

	Defend	Attack	Summon
Defend	0.6	0.3	0.1
Attack	0.9	0	0.1
Summon	1	0	0

The graph below represents the Markov chain that describes the state transition for this boss:

4. Is a Markov Chain the Same as a Finite State Machine?

We’re now ready to determine whether a Markov chain is the same as a finite state machine. They’re very similar, but an important difference exists between them.

These are the similarities between the two models, where both models:

are used to describe processes that depend on the time
require a set of states and a set of rules for transitioning between them
depend exclusively on the current state of the system and hold no memory of the history of the system’s states

We have also noted how an important difference exists between them, though. While the finite-state machine is deterministic in nature, the Markov chain requires a distribution of probabilities to model the transition between states. In the limit case, where the transition from any state to the next is defined by a probability of 1, a Markov chain corresponds to a finite-state machine.

In practice, however, we’ll end up using Markov chains for modeling non-deterministic systems, and finite-state machines to model deterministic ones.

5. Conclusions

In this article, we studied the differences and the similarities between finite-state machines and Markov chains.

Learn Java Collections

Learn Spring

Learn Maven

View All Courses

Core Concepts

Operating Systems

Neural Networks

Graph Theory

Latex

Full Archive

About Baeldung