1. Introduction

In this tutorial, we’ll present three ways to check the periodicity of a string. In other words, we check if it’s a repetition of one of its substrings.

2. Problem Statement

We have a string \boldsymbol{s=s_1s_2\ldots s_n} (\boldsymbol{n>1}) and want to check if we can obtain it by concatenating one of its proper prefixes to itself multiple times. The substring of s we repeatedly concatenate to get s has to be its prefix. Otherwise, it wouldn’t be possible to form the whole string.

For example, if s=abcabcabc, then we get s by concatenating z=abc two times to itself:

    \[abc + abc + abc\]

which we write as s=z^3. Since bca isn’t a prefix of s, we can’t get abcabcabc from it.

So, formally, the prefix we want to find is of the form \boldsymbol{z=s_{1:m}=s_1 s_2 \ldots s_m} where \boldsymbol{1 \leq m < n}.

Let’s see how we can determine it.

3. The Brute-Force Approach

We can iterate over the prefixes of \boldsymbol{s} and check if concatenating them gives us \boldsymbol{s}. To test if z=s_1 s_2 \ldots s_m concatenates to s=s_1 s_2 \ldots s_n, we iterate over i=1,2,\ldots, n and check if s_{((i-1) \bmod m) + 1} = s_i for all i. This way, we cyclicly iterate over z:

    \[\begin{aligned} i && 1 && 2 && \ldots && m && m + 1 && m + 2 && \ldots \\ s_i && s_1 && s_2 && \ldots && s_m && s_{m+1} && s_{m+2} && \ldots \\ s_{((i-1) \bmod m) + 1} && s_1 && s_2 && \ldots && s_m && s_1 && s_2 && \ldots \\ \end{aligned}\]

If we find an i for which the test fails, we can stop the iteration. Otherwise, we return z since we’ve proven that s=z + z + \ldots + z \equiv z z \ldots z \equiv z^k for some integer k.

Here’s the pseudocode:

Rendered by QuickLaTeX.com

3.1. Complexity

This algorithm’s worst-case time complexity is \boldsymbol{O(n^2)}. Here’s why. The worst-case input is a string of the form s_1 = s_2 = \ldots s_{n-1} \neq s_n. If given such an input, the algorithm runs n symbol equality tests for each m=1,2,\ldots, n-1 of the outer loop. So, the total number of checks is (n-1)n \in O(n^2).

The complexity would stay quadratic even if we started checks from s_{m+1} in the inner loop, skipping the unnecessary checks of s_1 = s_1, s_2 = s_2, \ldots, s_m = s_m:

    \[\sum_{m=1}^{n-1}(n-m)= \left(\sum_{m=1}^{n-1}n \right) - \left(\sum_{m=1}^{n-1}m \right) = n(n-1)-\frac{n(n-1)}{2} = \frac{1}{2}n^2 - \frac{1}{2}n \in O(n^2)\]

The problem with this approach is that it tries to get \boldsymbol{s} even from the prefixes for which we can conclude in advance that they can’t form \boldsymbol{s}. The key is to observe that \boldsymbol{m} has to divide \boldsymbol{n} without remainder for \boldsymbol{s} to be a repetition of \boldsymbol{s_{1:m}}. For instance, |ab|=2 and |abcabcabc|=9. Even without trying to get the latter from the former, we can say that it’s an impossible task because 9 \mod 2 =1 \neq 0.

4. The Improved Algorithm

So, we start from \boldsymbol{m=1} in the outer loop as before but don’t test whether \boldsymbol{s=z^k} unless \boldsymbol{m} is a divisor of \boldsymbol{n}.

The iteration can stop at \boldsymbol{\lfloor n/2 \rfloor} because no number greater than n/2  divides n except n, and m has to be strictly lower than n:

Here’s the pseudocode:

Rendered by QuickLaTeX.com

Another way would be to identify the divisors of n in advance to skip unnecessary checks whether n \mod m = 0.

4.1. The Complexity

Let’s derive the worst-case complexity of this approach. The worst-case input is of the same form as in the previous algorithm (s_1 = s_2 = \ldots s_{n-1} \neq s_n). The iterations of the inner loop still perform n checks but since we enter the inner loops only if n \mod m = 0, the number of inner loops isn’t \boldsymbol{n-1} but is equal to \boldsymbol{d(n)-1}, where \boldsymbol{d(n)} is the number of different divisors of \boldsymbol{n}.

Since \boldsymbol{d(n) \in n^{o(1)}}, the total complexity is \boldsymbol{n^{1+o(1)}}, so the algorithm is definitely sub-quadratic. We can get a tighter bound if we use the result:

    \[\limsup_{n \rightarrow \infty} \frac{ \log d(n) }{ \frac{\log n}{ \log \log n}} = 2\]

From there, we conclude that d(n) \in O \left( \frac{ \log n }{\log \log n}\right), so the algorithms’ complexity is:

    \[\boldsymbol{O \left( n \frac{ \log n }{\log \log n} \right)}\]

That means that the algorithm’s worst-case performance is better than O(n \log n) but worse than O(n). Is there a way to solve the problem in the O(n) time?

5. The Rotation Algorithm

We can design a linear-time algorithm by using the theory of strings.

5.1. The Rotation Theorem

We’ll use the theorem which states that a string \boldsymbol{s=s_1 s_2 \ldots s_n} is a repetition of one of its substrings if and only if \boldsymbol{s} is a non-trivial rotation of itself. Let’s prove it.

If s=z z \ldots z for some z of length m, then by removing the first m symbols and appending them to the last occurrence of z in s, we get the same string.

To prove the theorem in the other direction, we assume that s is a non-trivial rotation of itself. That means that there’s an m such that we can remove the first m < n symbols, append them to the rest of s in the same order, and obtain the starting string, s:

    \[s_1 s_2 \ldots s_m s_{m+1} s_{m+2} \ldots s_n = s_{m+1} s_{m+2} \ldots s_{2m} s_{2m+1} \ldots s_{n-1} s_{n} s_1 s_2 \ldots s_{m}\]

Since the LHS and RHS are equal, it must hold that s_1 = s_{m+1}, s_2 = s_{m+2}, \ldots, s_{m} = s_{2m}. So, we’ve proven that:

    \[s = z z s_{2m:n} \quad \text{ for } z = s_{1:m}\]

However, since the rotation of s is equal to s, we can rotate it again by m symbols to get the same string and conclude that s_{m+1}=s_{2m+1}, s_{m+2} = s_{2m+2}, \ldots, s_{2m}=s_{3m}, so s=z z z s_{3m:n}. Following the approach, we get that s = z^{ \frac{n}{m}}.

5.2. The Algorithm

So, the problem is now to check whether the input string \boldsymbol{s} is a non-trivial rotation of itself. If that’s the case, \boldsymbol{s} will be a proper substring of \boldsymbol{s s} (not starting at the \boldsymbol{1}st and the \boldsymbol{n}th positions). Therefore, we reduce the original problem to a string search. We try to find the index k of s in s s different from 1 and n. If successful, we’ll show that s is periodic, and the building block z will be s_{1:(k-1)}.

We can use an efficient string matching algorithm that uses the substring index of s s (constructible in \Theta(n) time) and then look for a non-trivial occurrence of s in O(n) time. Therefore, we can solve the original problem in the linear worst-case time.

6. Conclusion

In this article, we presented three algorithms to check string periodicity. The first and the second ones are easier to understand and implement than the third one. However, the rotation algorithm is more efficient. Its worst-case complexity is O(n), whereas the other two have super-linear time complexities.

Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.