How to Convert Finite Automata to Regular Expressions?

1. Overview

In this tutorial, we’ll explain how to convert finite automata to regular expressions using the state elimination method.

Regular expressions are used in many applications, and most programming languages offer a way to implement them.

2. Problem Description

Let $A$ be a finite automaton and $\alpha_{pq}$ the transition label of the branch going from $p$ to $q$ , where they denote any two states of $A$ . The conversion algorithm should return a regular expression defining the regular language that $\boldsymbol{A}$ accepts.

For example, let’s say we have the following automaton:

The regular expression it’s equivalent to is $b^*a(a+bb^*a)^*$ .

A use case for this is natural language processing (NLP). NLP engineers often design string-matching patterns, and for them, it’s much easier to read a regular expression than to look at a complex finite automaton.

3. The State Elimination Method

We first need to make $\boldsymbol{A}$ uniform. A uniform finite automaton has an initial state with no incoming transitions and a final state with no outgoing transitions.

Then, we repeatedly remove a randomly selected non-endpoint state and compute the regular expressions of the paths between the remaining pairs until we reduce $\boldsymbol{A}$ to only the initial and final states.

There are other methods besides the state elimination method; these include Adren’s method, Brzozowski’s algebraic method and the transitive closure method. We’ll show this one because it’s simple and easy to understand and implement.

3.1. Pseudocode

Here’s the pseudocode:

algorithm StateEliminationMethod(A):
    // INPUT
    //    A = a uniform finite automaton
    // OUTPUT
    //    a regular expression defining the regular language that A accepts

    while A contains more than two states:
        Randomly choose a state k (not the initial or the final state)

        for each pair of states (p, q) connected via k:
            path_regex <- alpha[p,q] + alpha[p,k] alpha[k,k]* alpha[k,q]
            Simplify path_regex
            Set path_regex as the transition label alpha[p,q]
        
        Remove k from A

    return the regular expression defining the regular language that A accepts

As we see, the algorithm repeats two steps in each iteration.

The first one is to randomly choose a state $k$ (not the initial or the final state) and remove it from $A$ .

The second step is to compute the regular expressions for the paths between each pair $(p, q)$ of the remaining states connected directly through $k$ . Then, we change the transition labels $\boldsymbol{\alpha_{pq}}$ to the computed regular expressions. The regular expressions of pairs of states that aren’t directly connected through $k$ remain the same.

The final step is to remove $k$ from $A$ since all the paths going through it are now in the updated or newly added transition labels.

3.2. Computing Regular Expressions for Paths

Let $k$ be the randomly chosen state.

We compute the new regular expressions for all possible pairs of states $p$ and $q$ in $A$ ( $p, q \neq k$ ) such that there’s a path $p \rightarrow k \rightarrow q$ using the following equation:

$\alpha'_{pq} = \alpha_{pq} + \alpha_{pk} \alpha_{kk}^* \alpha_{kq}$

where:

$\alpha_{pq}$ is the transition label from state $p$ to state $q$
$\alpha_{pk}$ is the transition label from state $p$ to state $k$
$\alpha_{kk}$ is the transition label from state $k$ to state $k$
$\alpha_{kq}$ is the transition label from state $k$ to state $q$

Afterward, we simplify the obtained regular expressions $\alpha'_{pq}$ and set the corresponding transition labels to those expressions.

4. Example

Let’s work out an example from a paper on state elimination.

Our input finite-state machine is:

First, we make it uniform:

Now, we randomly choose a state. Let it be $q_0$ . The next step is to compute the regular expressions for paths between the states connected directly via $q_0$ .

The new regular expression for the transition from $i$ to $q_1$ is:

$\alpha'_{iq_1} = \alpha_{iq_1} + \alpha_{iq_0} \alpha_{q_0q_0}^* \alpha_{q_0q_1}$

Substituting the labels for their values, we get:

$\alpha'_{iq_1} = \varepsilon b^*a$

Following this, we simplify the regular expression $\varepsilon b^* a$ to $b^*a$ . Then, we add a new transition between the states $i$ and $q_1$ :

Since there’s a path $q_1 \rightarrow q_0 \rightarrow q_1$ , we need to update the transition label $\alpha_{q_1q_1}$ . Following the same procedure, we get that the regular expression for the transition from $q_1$ to itself is $a+bb^*a$ . Consequently, the transition label from state $q_1$ to state $q_1$ is replaced with the new regular expression:

The new regular expression for the transition from state $q_1$ to state $f$ remains the same because the path between them doesn’t go through $q_0$ .

Finally, we remove $q_0$ :

Now, the only choice is $q_1$ since it’s the only non-initial and non-ending state. Repeating the steps, we get:

As a result, the automaton now contains only the initial and final states. Hence, we can no longer choose more states to remove. The label of the transition $i \rightarrow f$ is the regular expression equivalent to the input automaton.

5. Complexity

In the worst-case scenario, all the states are directly connected in both directions ( $p \rightarrow q$ and $q \rightarrow p$ ), there’s a transition from each state to itself, and each state $r$ connects all the pairs of the automaton’s states ( $p \rightarrow r \rightarrow q$ and $p \rightarrow r \rightarrow p$ for any $r$ and $p, q \neq r$ ).

Let $n$ be the number of states (excluding the initial and the final states).

Since all the states are connected in both directions, there are $\boldsymbol{(n-1)^2}$ possible pairs of states in the first iteration. Hence, we compute $(n-1)^2$ regular expressions.

In the second iteration, we compute $\boldsymbol{(n-2)^2}$ expressions because there are $n-1$ states.

We keep removing states and computing the paths between states until there are no more states except the initial and the final ones. Hence, the last regular expression computation is for the transition from the initial to the final states.

Therefore, the worst-case time complexity of the state elimination method is $\boldsymbol{O(n^3)}$ :

$O((n-1)^2 + ... + 9 + 4 + 1)=O\left( \sum_{j=1}^{n-1}j^2 \right) = O\left( \frac{n(n+1)(2n+1)}{6} - n^2\right) = O(n^3)$

The assumption is that computing regular expressions for paths is an $O(1)$ operation. Since the labels and expressions are strings, we can achieve that if we concatenate them in $O(1)$ time. For example, we can treat strings as arrays that can end with a pointer to another array. That way, we can construct $\alpha_{pq}{+}\alpha_{pk}\alpha_{kk}^*\alpha_{kq}$ in five steps, so regex computation is $O(1)$ .

6. Conclusion

In this article, we showed how to convert a finite-state automaton to a regular expression. We explained the $O(n^3)$ state elimination method. It works by eliminating states and changing the transition labels to the regular expressions of the three-state paths going over the eliminated states.

Core Concepts

Operating Systems

Neural Networks

Graph Theory

Latex

Full Archive

About Baeldung