MODULE 2:   Finite Automata
Practice Set 2
1 Solved Problems
Problem (Odd ones out)

Draw a DFA that solves the language \[L = \{x: \text{$x$ has an even number of ${\texttt{1}}$'s and an odd number of ${\texttt{0}}$'s}\}\] over the alphabet \(\Sigma = \{{\texttt{0}}, {\texttt{1}}\}\).

Solution

We have \(4\) states with the following meanings. The string we have seen so far has:

  • even number of \({\texttt{0}}\)’s and even number of \({\texttt{1}}\)’s (state \(q_0\));

  • even number of \({\texttt{0}}\)’s and odd number of \({\texttt{1}}\)’s (state \(q_1\));

  • odd number of \({\texttt{0}}\)’s and even number of \({\texttt{1}}\)’s (state \(q_2\));

  • odd number of \({\texttt{0}}\)’s and odd number of \({\texttt{1}}\)’s (state \(q_3\)).

image

Problem (Dededen)

In Turkish, the word \(dededen\) means “from grandfather”. Suppose we wish to search for this word within some text. We will assume the text alphabet is \(\Sigma = \{{\texttt{n}}, {\texttt{e}}, {\texttt{r}}, {\texttt{d}}\}\). Let \[T = \{x \in \Sigma^* : x \text{ contains ${\texttt{dededen}}$ as a substring}\}\] Draw a DFA \(M\) that solves \(T\). Then define all the 5 components of the DFA.

For practice, you might test out your ideas on the following text: \[{\texttt{nerederenderendededededededededi}}\] Well, actually that practice text ends in an \({\texttt{i}}\); just pretend it’s an \({\texttt{r}}\). By the way, this text is a flattening of the following dramatic one-act play:

“Nerede rende?”

“Rende de dedede,” dede dedi.

Which in Turkish means:

“Where’s the grater?”

“The grater is also with grandfather,” grandfather said.

Solution

image

Let \(S = \{\epsilon, \text{d}, \text{de}, \text{ded}, \text{dede}, \text{deded}, \text{dedede}, \text{dededen}\}\). Then we define:

  • \(Q = \{q_s : s \in S\}\).

  • \(q_{0} = q_{\epsilon}\).

  • \(F = \{ q_{\text{dededen}} \}\).

  • For \(\sigma \in \Sigma\), \(\delta( q_{\text{dededen}}, \sigma ) = q_{\text{dededen}}\).

  • For \(\sigma \in \Sigma\) and \(s \neq \text{dededen}\), \(\delta(q_{s}, \sigma) = q_{u}\) where \(u\) is the longest prefix of “\(\text{dededen}\)” that is a suffix of \(s\sigma\).

Problem (Adam, I’m Ada!)

Show that, if \(|\Sigma| > 1\), then \[L = \{x \mid x \in \Sigma^* \text{ and } x = x^R \}\] is a non-regular language.

Solution

Any string \(x\) that satisfies \(x = x^R\) is called a palindrome.

As usual, we will prove that the language is non-regular by a proof by contradiction and apply the pigeonhole principle (PHP). So assume for the sake of contradiction that there exists a DFA with \(k\) states that solves \(L\). Consider two distinct symbols \(a,b \in \Sigma\) (since we assumed \(|\Sigma| > 1\)). Consider the \(k+1\) strings \(b^na\) for \(n \in \{0, \ldots, k\}\). Since there are only \(k\) states, by PHP, there must exist some \(i,j\), \(0 \leq i < j \leq k\), such that \(b^ia\) and \(b^ja\) end up in the same state. Thus, \(b^iab^i\) and \(b^jab^i\) must end up in the same state. However, \(b^iab^i\) is a palindrome, so it should end up in an accepting state, and \(b^jab^i\) is not a palindrome, so it should end up in a rejecting state. This is the desired contradiction since a state cannot be both accepting and rejecting.

Problem (Which one is regular?)

Consider the following languages over the alphabet \(\{{\texttt{a}}, {\texttt{b}}, {\texttt{c}}\}\): \[\begin{aligned} L & = \{{\texttt{a}}^n{\texttt{b}}^m{\texttt{c}}^s : n, m, s \in \mathbb N, n < m < s\},\\ K & = \{{\texttt{a}}^n{\texttt{b}}^m{\texttt{c}}^s : n, m, s \in \mathbb N\}.\end{aligned}\]

  1. Show that one of these languages is regular and the other is not. For the proof of regularity, all you need to do is draw or describe a DFA; a proof of correctness is not required. For showing that the language is non-regular, give a complete proof.

  2. For the regular language you have identified above, what is the minimum number of states needed in any DFA solving the language? Prove your answer.

Solution

\(L\) is non-regular. Assume for sake of contradiction that \(L\) is regular. Then there exists a DFA \(M\) with \(k\) states accepting \(L\). Consider the set of \(k + 1\) strings \(\{{\texttt{a}}^{n}{\texttt{b}}^{n + 1} : 1 \leq n \leq k + 1\}\). By the pigeonhole principle, there exist \(i,j\), \(i < j\), such that \({\texttt{a}}^{i}{\texttt{b}}^{i + 1}\) and \({\texttt{a}}^{j}{\texttt{b}}^{j + 1}\) end on the same state. This means that for all strings \(w\), \({\texttt{a}}^{i}{\texttt{b}}^{i + 1}w\) and \({\texttt{a}}^{j}{\texttt{b}}^{j + 1}w\) end on the same state. Consider \(w = {\texttt{c}}^{i + 2}\). Suppose \({\texttt{a}}^{i}{\texttt{b}}^{i + 1}{\texttt{c}}^{i + 2}\) ends in state \(q\). Then \({\texttt{a}}^{j}{\texttt{b}}^{j + 1}{\texttt{c}}^{i + 2}\) also ends in state \(q\). Since \({\texttt{a}}^{i}{\texttt{b}}^{i + 1}{\texttt{c}}^{i + 2} \in L\), we know \(q\) is an accepting state. However, \(i < j\) so \({\texttt{a}}^{j}{\texttt{b}}^{j + 1}{\texttt{c}}^{i + 2} \notin L\), implying \(q\) is a rejecting state. This is a contradiction, so \(L\) is non-regular.

\(K\) is regular. We can write it as \(\{ {\texttt{a}} \}^{*}\{ {\texttt{b}} \}^{*}\{ {\texttt{c}} \}^{*}\), which is regular by the fact that regular languages are closed under concatenation and star operations. The following DFA accepts \(K\):

image

To prove this is minimal, let \(M\) be a DFA accepting \(K\). Consider the strings \({\texttt{a}}\), \({\texttt{ab}}\), \({\texttt{abc}}\), and \({\texttt{abca}}\). We claim that all of these strings must end in a different state when run through \(M\), which establishes that \(M\) must have at least \(4\) states.

We now prove the claim. Note that only \({\texttt{abca}}\) is not in \(K\), so it must lead to a rejecting state while the other three strings must go to accepting states. If \({\texttt{ab}}\) and \({\texttt{abc}}\) go to the same state, so do \({\texttt{abb}}\) and \({\texttt{abcb}}\), which is a contradiction as the former but not the latter is in \(K\). Similarly, \({\texttt{a}}\) and \({\texttt{ab}}\) can’t go to the same state since \({\texttt{aa}} \in K\) while \({\texttt{aba}} \notin K\). Finally, \({\texttt{a}}\) and \({\texttt{abc}}\) can’t end in the same state since \({\texttt{aa}} \in K\) and \({\texttt{abca}} \notin K\). Thus, the DFA \(M\) must have at least \(4\) distinct states.

Problem (When is the language of a DFA infinite?)

Let \(M\) be a DFA with \(k\) states. Prove that \(M\) accepts infinitely many strings if and only if \(M\) accepts some string \(w\) with \(|w| > k\).

Solution

One of the directions is quite straightforward. If \(M\) accepts infinitely many strings, then for every \(m \in \mathbb N\), \(M\) must accept a string of length at least \(m\).

For the other direction, suppose \(M\) (which has \(k\) states) accepts the string \(w = w_1 w_2 \ldots w_n\), where \(n > k\). Then by the pigeonhole principle, there exists \(i\) and \(j\), \(i < j\), such that \(w_1 \ldots w_i\) and \(w_1\ldots w_j\) end up at the same state \(q\).

image

It follows that for all \(m \in \mathbb N\), the string \[w_1 \ldots w_i (w_{i + 1} \ldots w_j)^m\] ends up at state \(q\). So for all \(m \in \mathbb N\), the string \[w_1 \ldots w_i (w_{i + 1} \ldots w_j)^m w_{j + 1} \ldots w_n\] is accepted. Therefore, infinitely many strings are accepted.

Problem (State the number of states)

For any \(k \ge 1\), let \[R_k = \{ x : x \in \{{\texttt{0}},{\texttt{1}}\}^* \text{ and the $k$-th symbol from the right is a 1} \}.\] Show that any DFA that solves \(R_k\) must have at least \(2^k\) states.

Solution

Fix an arbitrary \(k \geq 1\). We will show, by contradiction, that any DFA solving \(R_k\) must have at least \(2^k\) states. So assume that there exists some DFA \(M\) with less than \(2^k\) states that solves \(R_k\). Consider the set \(B_k\) of all binary strings of length \(k\). Note that \(|B_k|=2^k\). By the pigeonhole principle, there must be at least two strings in \(B_k\), denoted by \(x\) and \(y\), that end up in the same state of \(M\). Since these are different strings, \(x\) and \(y\) must have some bit, say \(x_i\) and \(y_i\), such that \(x_i \neq y_i\).

Let’s now consider the strings \(x' = x{\texttt{0}}^{i}\) and \(y' = y{\texttt{0}}^{i}\). Note that in \(x'\) and \(y'\), the bits \(x_i\) and \(y_i\) are the \(k\)’th characters from the right. Since \(x\) and \(y\) end up in the same state, and we append the same string \({\texttt{0}}^i\) to both to obtain \(x'\) and \(y'\), we know \(x'\) and \(y'\) end up in the same state. However, since \(x_i \neq y_i\), only one of \(x'\) and \(y'\) should end up in an accepting state. This is the desired contradiction.

Problem (Infinity of primes using DFAs)

Fix the alphabet \(\Sigma = \{{\texttt{0}},{\texttt{1}}\}\). For a string \(w \in \Sigma^*\), let \(\text{diff}(w) \in \mathbb Z\) denote the number of \({\texttt{1}}\)’s in \(w\) minus the number of \({\texttt{0}}\)’s.

  1. Define \(L_n = \{w \in \Sigma^* : n \text{ divides } \text{diff}(w)\}\). Describe a DFA solving \(L_n\). A proof of correctness is not required.

  2. Define \(L = \bigcup_{p \text{ a prime}} L_p\). And define \(K = \{w \in \Sigma^* : \text{diff}(w) \not\in \{-1,1\}\}\). Show that \(L = K\).

  3. Using the “pigeon set” \(P = \{({\texttt{111}})^n : n \in \mathbb N\}\), show that \(K\) (and therefore \(L\)) is not regular.

  4. Using the fact that regular languages are closed under the union operation, conclude: There are infinitely many primes!

Solution
  1. Our set of states is \(Q = \{q_0, q_1, \ldots, q_{n-1}\}\). We want the meaning of \(q_i\) to be as follows: a string \(w\) that ends in \(q_i\) is such that \(\text{diff}(w) \equiv i \pmod n\). Therefore we set \(F = \{q_0\}\). To make sure the states have the advertised meaning, we define, for \(q_i \in Q\), \(\delta(q_i, {\texttt{1}}) = q_{(i+1) \mod n}\) and \(\delta(q_i, {\texttt{0}}) = q_{(i-1) \mod n}\). This completes the description of the DFA.

  2. We show \(L = K\) using a double-containment argument.

    To show \(L \subseteq K\), consider a word \(w \in L\). Then by the definition of \(L\), there is some prime number \(p\) such that \(w \in L_p\). By the definition of \(L_p\), this means \(p\) divides \(\text{diff}(w)\). This implies \(\text{diff}(w) \not \in \{-1,1\}\) since a prime number cannot divide \(-1\) or \(1\). Therefore, by the definition of \(K\), we have \(w \in K\).

    To show \(K \subseteq L\), consider a word \(w \in K\). Then by the definition of \(K\), \(\text{diff}(w) \not \in \{-1, 1\}\). This implies there exists some prime number \(p\) dividing \(\text{diff}(w)\), and therefore \(w\) is in \(L_p\). And this implies \(w \in L\).

  3. We will show that in any DFA solving \(K\), the strings in \(P\) must all end up in a separate state. Since \(P\) is an infinite set, this implies there cannot be a DFA solving \(K\) (since by definition, any DFA must have a finite number of states).

    To show that the strings in \(P\) must end up in different states, we pick two arbitrary strings \(x = {\texttt{1}}^{3i}\) and \(y = {\texttt{1}}^{3j}\) in \(P\) such that \(i \neq j\) and argue they must end up in separate states. Assume for the sake of contradiction that \(x\) and \(y\) do end up in the same state. Without loss of generality, assume \(i > j\). Let \(z = {\texttt{0}}^{3j + 1}\). Then \(xz\) and \(yz\) must also end up in the same state. However, \(xz = {\texttt{1}}^{3i}{\texttt{0}}^{3j + 1}\) and so \(\text{diff}(xz) \geq 2\). This means \(xz\) should end up in an accepting state. On the other hand, \(yz = {\texttt{1}}^{3j}{\texttt{0}}^{3j + 1}\) and so \(\text{diff}(yz) = -1\). This means \(yz\) should end up in a rejecting state. This is the desired contradiction.

  4. Assume for the sake of contradiction that there are only finitely many prime numbers. Then since each \(L_p\) is regular (as shown in part 1), and regular languages are closed under finite unions, we conclude \(L = K\) is regular. However, we showed in part 3 that \(K\) is not regular, so this is a contradiction. The conclusion is that there must be infinitely many prime numbers.

Problem (Suffering with suffixes)

Given a word \(w\), we say that \(u\) is a proper suffix of \(w\) if there is \(v \neq \epsilon\) such that \(w = vu\). For a language \(L\), define \[\text{SUFF}(L) = \{w \in L : \text{no proper suffix of $w$ is in $L$}\}.\] Show that if \(L\) is regular, then so is \(\text{SUFF}(L)\), as follows. Give an exact description of a DFA solving \(\text{SUFF}(L)\), explicitly stating how \(Q\), \(\delta\), \(q_0\) and \(F\) are defined. Furthermore, briefly explain the reasoning behind your construction. A detailed proof of correctness is not needed.

Hint

The construction is very similar to the construction we have seen for showing regular languages are closed under the concatenation operation. In that proof, we opened up a thread every time we encountered an accepting state of the first machine. In this case, you should open up a new thread after reading each symbol.

Solution

The hint gives the general idea behind the construction. We give the details below.

Since \(L\) is regular, we know that there is some DFA \(M = (Q, \Sigma, \delta, q_0, F)\) that solves \(L\). Recall that for \(S \subseteq Q\) and \(\sigma \in \Sigma\), we let \(\delta_{\wp}(S, \sigma) = \{\delta(s, \sigma) : s \in S\}\). Now, we define \(M' = (Q', \Sigma, \delta', q_0', F')\) that solves \(\text{SUFF}(L)\) as follows:

  • \(Q' = Q \times \wp(Q)\)

  • For any \(q \in Q\), \(S \subseteq Q\), and \(\sigma \in \Sigma\), define \(\delta'((q, S), \sigma) = (\delta(q, \sigma), \delta_{\wp}(S, \sigma) \cup \{q_0\})\)

  • \(q_0' = (q_0, \varnothing)\)

  • \(F' = \{(q, S) : q \in F, S \cap F = \varnothing\}\)

Intuitively, the first component of the state tuple (in \(M'\)) keeps track of ‘the main thread’ (i.e. the computation of the DFA on the whole string), while the second component keeps track of all the ‘suffix threads’ (i.e. the computation of the DFA on proper suffixes of the whole string). The transition function is defined with this design in mind. At each character, the main thread and each of the suffix threads takes one step; additionally, a new suffix thread is started. In the end, we accept if and only if the main thread has accepted and all the suffix threads have rejected.

Problem (Multiple multiples)

Let \(\Sigma = \{{\texttt{0}},{\texttt{1}}\}\). Define \[C_3 = \{x \in \Sigma^* \mid x \text{ is a binary number that is a multiple of } 3\}.\] Show that \(C_3\) is regular.

Solution

Our set of states will be \(Q = \{q_0, q_1, q_2\}\) corresponding to the possible remainders modulo \(3\). We want the following property to hold: a string corresponding to the binary representation of a number \(w\) should end on state \(q_i\) if \(w \equiv i \pmod 3\). If we can establish this, then we can set \(F = \{q_0\}\).

As we read a binary string \(s\), we can determine what number/value it corresponds to as follows: Start with value \(0\). Scan \(s\) from left to right. If we see a \({\texttt{0}}\), multiply the value by 2. If we see a \({\texttt{1}}\), multiply the value by 2 and add 1. (Take a moment to verify that this is true.)

With this observation in mind, we’ll define our transition function as follows. \(\delta(q_i, {\texttt{0}}) = q_{2i},\) and \(\delta(q_i, {\texttt{1}}) = q_{2i+1}\), where indices of the \(q\)’s are taken modulo \(3\).

We leave the following as exercises for you: (i) Why is the above transition function correct? (ii) Does the argument generalize to mod values other than \(3\)?

Problem (Regular expressions)

As mentioned in lecture, regular languages can be defined recursively/inductively as follows:

  • \(\varnothing\) is regular.

  • \(\{a\}\) for each \(a \in \Sigma\) is regular.

  • If \(L_1\) and \(L_2\) are regular, then \(L_1 \cup L_2\) is regular.

  • If \(L_1\) and \(L_2\) are regular, then \(L_1 L_2\) is regular.

  • If \(L\) is regular, then \(L^*\) is regular.

Often, regular languages are represented compactly using what is known as a regular expression. We define regular expressions inductively as follows:

  • Any single string is a regular expression (representing the singleton set containing that string).

  • If \(A\) and \(B\) are regular expressions, then \((A+B)\) is a regular expression (representing the union of \(A\) and \(B\)).

  • If \(A\) and \(B\) are regular expressions, then \((AB)\) is a regular expression (representing the concatenation of \(A\) and \(B\)).

  • If \(A\) is a regular expression, then \(A^*\) is the regular expression.

Noting the correspondence between the two inductive definitions above, a language is regular if and only if it can be represented by a regular expression.

Here are a few shortcuts related to regular expressions. We use \(\Sigma\) to denote the regular expression corresponding to taking the union of all strings of length \(1\). So if \(\Sigma = \{{\texttt{0}}, {\texttt{1}}\}\) then \(\Sigma\) would denote \(({\texttt{0}} + {\texttt{1}})\). We also use \(A^k\) to denote the regular expression corresponding to taking the concatenation of \(A\) with itself \(k\) times. Parentheses in regular expressions can be omitted to reduce clutter, provided the removal does not introduce any ambiguity.

For the problems below, you do not need to provide justifications for your answers.

  1. Let \(\Sigma = \{{\texttt{0}}, {\texttt{1}}\}\). For each regular expression below, describe in English the language it represents.

    • \({\texttt{1}}\Sigma^*\)

    • \(\Sigma^* {\texttt{101}} \Sigma^*\)

    • \({\texttt{1}}({\texttt{00}})^*{\texttt{1}}\)

    • \((\Sigma^{251})^*\)

  2. Let \(\Sigma = \{{\texttt{a}}, {\texttt{b}}\}\). Give a regular expression for the languages below.

    • The set of all strings with a single \({\texttt{a}}\).

    • The set of all strings that start and end with the same symbol.

    • The set of all strings that contain at least two \({\texttt{a}}\)’s and at most one \({\texttt{b}}\).

    • The set of all strings containing an even number of \({\texttt{a}}\)’s, an odd number of \({\texttt{b}}\)’s, and not containing the substring \({\texttt{ab}}\).

Solution
    • The set of all strings that start with a \({\texttt{1}}\).

    • The set of all strings that contain \({\texttt{101}}\) as a substring.

    • The set of all even-length strings that have exactly two \({\texttt{1}}\)’s, one at the start of the string and one at the end.

    • The set of all strings whose length is a multiple of \(251\).

    • \({\texttt{b}}^*{\texttt{a}}{\texttt{b}}^*\)

    • \({\texttt{a}}\Sigma^*{\texttt{a}} + {\texttt{b}}\Sigma^*{\texttt{b}} + {\texttt{a}} + {\texttt{b}}\)

    • \({\texttt{aa}}^*{\texttt{a}} + {\texttt{aa}}{\texttt{a}}^*{\texttt{b}}{\texttt{a}}^* + {\texttt{a}}{\texttt{a}}^*{\texttt{b}}{\texttt{a}}^*{\texttt{a}} + {\texttt{a}}^*{\texttt{b}}{\texttt{a}}^*{\texttt{a}}{\texttt{a}}\)

    • \({\texttt{b}}({\texttt{bb}})^*({\texttt{aa}})^*\)

2 Unsolved Problems
Problem (Regular or not)

Fix the alphabet \(\Sigma = \{{\texttt{a}},{\texttt{b}}\}\). Which of the following languages are regular? If you think a language is regular, draw the state diagram of a DFA that solves the language. You do not have to prove that the DFA solves the language. If you think a language is not regular, then provide a proof.

  1. \(L = \{ {\texttt{a}}^n{\texttt{b}}^m \; | \; n,m \geq 1, n \equiv m \text{ mod } 3\}\).

  2. \(L = \{ w \in \Sigma^* \; | \; \text{$w$ contains at least two ${\texttt{a}}$'s and at most one ${\texttt{b}}$}\}\).

  3. \(L = \{ x{\texttt{a}}y : x, y \in \Sigma^*, |x| = |y|\}\) (\(L\) is the set of strings with an \(a\) at the center).

  4. \(L = \{ xwx^R \; | \; x,w \in \Sigma^*, |x|, |w| > 0 \}\).

  5. \(L = \{{\texttt{a}}^n {\texttt{b}}^m \; | \; n,m \geq 1, \gcd(n,m)=1\}\), where \(\gcd(n,m)\) denotes the greatest common divisor of \(n\) and \(m\).

Problem (Regularly rational)

Let \(x \in [0, 1)\) be a real number written in binary as \(x = 0.b_1b_2\ldots\) (where each \(b_i \in \{0,1\}\)). We treat \(x\) as an infinite string, appending additional \(0\)’s if the binary representation is finite. We define the following language over the alphabet \(\Sigma = \{0,1\}\): \[L_x = \{b_1b_2 \ldots b_i : i \in \mathbb N^+ \} = \{b_1, b_1b_2, b_1b_2b_3,\ldots\}.\] Prove that for \(x \in [0, 1)\), \(L_x\) is regular if and only if \(x\) is rational.

For this problem, you may assume the following without proving it: The number \(x = 0.b_1b_2\ldots\) is rational if and only if there exists \(k, \ell \in \mathbb N^+\) such that for all \(i \geq k\), \(b_i = b_{i+\ell}\).

Problem (MIX it up)

Fix an alphabet \(\Sigma\). If \(x = a_1a_2 \ldots a_n\) and \(y = b_1b_2 \ldots b_n\) are two strings of the same length, define \(\text{mix}(x,y)\) to be the string in which the symbols of \(x\) and \(y\) alternate, starting with the first symbol of \(x\), that is, \(\text{mix}(x,y) = a_1b_1a_2b_2 \ldots a_nb_n\). If \(L_1\) and \(L_2\) are languages, define \(\text{MIX}(L_1, L_2)\) to be the language of all strings of the form \(\text{mix}(x, y)\), where \(x\) is any string in \(L_1\) and \(y\) is any string in \(L_2\) of the same length. Prove that if \(L_1\) and \(L_2\) are regular, then so is \(\text{MIX}(L_1, L_2)\). You should give an exact description of a DFA solving \(\text{MIX}(L_1,L_2)\), explicitly stating how \(Q\), \(\delta\), \(q_0\) and \(F\) are defined. Once you have the exact definition, give a short explanation why this DFA solves \(\text{MIX}(L_1,L_2)\). A detailed proof of correctness is not required.

Problem (The ONE)

Given a language \(L \subseteq \{{\texttt{0}},{\texttt{1}}\}^*\), define \[\text{ONE}(L) = \{{\texttt{1}}^{|x|} : x \in L\} \subseteq \{{\texttt{1}}\}^*,\] where we view \(\text{ONE}(L)\) as a language over the alphabet \(\Sigma = \{{\texttt{1}}\}\). Show that if \(L\) is regular, then \(\text{ONE}(L)\) must also be regular. To do this give an exact description of a DFA solving \(\text{ONE}(L)\), explicitly stating how \(Q\), \(\delta\), \(q_0\) and \(F\) are defined. Furthermore, briefly explain the reasoning behind your construction. A detailed proof of correctness is not required.

Problem (Fooling pigeons)

We have seen how to prove that a language is non-regular by identifying an appropriate set of “pigeons” and then applying the pigeonhole principle. In this problem we will explore whether this is a “complete” proof technique: If a language is non-regular, can we always find an appropriate set of pigeons and apply the pigeonhole principle?

Let \(L \subseteq \Sigma^*\) be a language. We say that two strings \(x\) and \(y\) are distinguishable (with respect to \(L\)) if there is some \(z \in \Sigma^*\) such that exactly one of the words \(xz\) and \(yz\) is in \(L\). We say that a set of words \(P\) is a fooling pigeon set for \(L\) if every pair of words in \(P\) is distinguishable.

  1. Show that \(A = \{{\texttt{0}}^n{\texttt{1}}^n : n \in \mathbb N\} \subseteq \{{\texttt{0}},{\texttt{1}}\}^*\) and \(B = \{{\texttt{a}}^{2^n}: n \in \mathbb N\} \subseteq \{{\texttt{a}}\}^*\) each have a fooling pigeon set of infinite size. Show that every fooling pigeon set for \(C = \{{\texttt{0}}^n{\texttt{1}}^m : n,m \in \mathbb N\} \subseteq \{{\texttt{0}},{\texttt{1}}\}^*\) is finite.

  2. Prove that \(L\) is a regular language if and only if every fooling pigeon set for \(L\) is finite. Or in other words, \(L\) is non-regular if and only if there is a fooling pigeon set for \(L\) which is infinite. (Note that this, combined with part 1, allows us to reprove that \(A\) and \(B\) above are non-regular, and \(C\) is regular.)

Hint

There are two directions to prove. For the harder direction, form an equivalence relation over \(\Sigma^*\) where two strings are related if they are indistinguishable. Observe that if every fooling pigeon set for \(L\) is finite, then there is a finite number of equivalence classes. Can you then use this to construct a DFA for \(L\)?