Of Automata, Epsilons and Rusty machines

It’s been a while, but with the exams finally behind me (for a few months at least), I recently found some time to work on a project that I’d started quite a long while ago - FAr, or Finite Automata in Rust.

As I will be a teaching assistant for the theoretical computer science course next semester, this little program was initially created because I wanted to recap some things and maybe develop something useful for students attending my exercise class. Now, I think it has progressed enough to be published.

A primer on automata

If you are aware of what finite automata are, feel free to skip this segment. I’ll try to keep it accessible, so that you can understand this post without any prior knowledge.

So what’s a finite automaton?

A finite automaton - abbreviated as FA - is a finite state machine that, given a string of symbols, rejects or accepts that string by means of running through a state sequence.

Now for an explanation that actually tells you what this is all about, consider the following image:

Example of a FA

The arrow that isn’t labeled shows us which state is the starting state - here, \(S_1\).

The labeled arrows are then simple instructions on what state our automaton should be in when reading the label: Say our automaton reads ‘0’. According to the diagram, it will then switch to state \( S_2 \). If it then reads ‘1’, it will stay in \( S_2 \).

The double-circled states are so-called accepting states. An automaton that ends reading a word in such a state will accept that word, and otherwise reject it.

Why should I care?

As computer scientists, or even just avid user of computers, you might be familiar with the term regular expression. Regular expressions are a sequence of characters that specify a search pattern. If you are familiar with sed, awk or even vim and its search and replace commands, a pattern such as /M | [TN]B/ tells you exactly what kind of string you are looking for.

It turns out, that FA can exactly describe any such regular expression - and therefore decide for any string whether it matches that pattern or not. So, although FAs might be a very mathematical model - they are actually implemented by soft- and hardware for problems like pattern matching.

The Birth of FAr

So, what’s the purpose of FAr? When I started to code, I had two goals in mind:

Read user input and output an automaton
Test an arbitrary word against one such automaton

Along the way, I also added the option to define a non-deterministic finite automaton (NFA) - more on that later on.

How To Model a Model

Once I had the initial idea, I started planning out how my code should end up looking. One of the very first questions, and probably the most important one for many aspects of the program, was how to model an FA. In the end, I settled for this:

pub struct Dfa {
    alphabet: Vec<char>,
    nextid: u32,
    states: HashMap<u32, State>,
    startid: Option<u32>,
    acceptids: Vec<u32>,
    transitions: HashMap<(char, u32), u32>,
    isconsistent: bool,
}

struct State {
    id: u32,
    accepting: bool,
    starting: bool,
    epsclosure: RefCell<HashSet<u32>>,
}

Do note that I called it Dfa - for deterministic finite automaton. The non-deterministic variant differs slightly.

The whole State struct came to be mostly for the non-deterministic variant - essentially, we are always working with their ids for the deterministic automata.

Some remarks:

nextid is conveniently both the ID that will be assigned to the next state, as well as the number of states at the moment (thanks 0-indexing!)
isconsistent is purely for convenience, and can help save on checks performed before and during the execution of a test
epsclosure is a rather simple thing that leads to quite the complication, but necessary for the (performant) implementation of NFAs

In- and exporting such a struct is made easy by the existence of a crate called serde. This saved me a lot of headache and let me focus on the details of implementing the inner workings of an FA.

Custom Errors

Since I would still consider myself a rookie when it comes to programming in Rust, I wanted to gather some experience with properly handling errors - similar to my discord bot project. Since the program was supposed to be easy to understand and give output understandable to someone that may not know Rust or even programming at all, I needed my errors to be descript, more flexible than just including a string description of what went wrong. So, I decided to create my own kind of errors:

pub enum Error {
    NonDetTransition((u32, u32, char), (u32, u32, char)),
    NonExistentState(u32),
    NotInAlphabet(char),
    IllegalCharacter(char),
    InconsistentFA(InconsistencyError),
}

pub enum InconsistencyError {
    MissingStartState,
    MissingTransition(u32, char),
}

Not only is the name of the error a pretty apt description of what went wrong, the added tuples include vital information that can neatly be presented to the user. To make the errors nicely printable, I added this code snippet:

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self{
    Error::NonDetTransition(first , second) => write!(f, "You tried adding a non-deterministic transition to a deterministic automaton. You tried to add {:?}, which conflicts with the existing transition {:?}", first, second),
    Error::NonExistantState(id) => write!(f, "There is no state with ID {}, you are likely out of bounds", id),
    Error::NotInAlphabet(character) => write!(f, "The character {} is not part of the this automaton's alphabet", character),
    Error::IllegalCharacter(character) => write!(f, "You have used an illegal character: {} is reserved and can't be used", character),
    Error::InconsistentFA(inner_error) => write!(f, "This automaton is inconsistent: {}", inner_error),
        }
    }
}

impl fmt::Display for InconsistencyError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            InconsistencyError::MissingStartState => {
                write!(f, "This automaton has no starting state")
            }
            InconsistencyError::MissingTransition(state, character) => write!(
                f,
                "This automaton is missing a transition for character {} from state {}",
                character, state
            ),
        }
    }
}

Now that I had erros and the basic structure of the FA nailed down - let’s work on implementing it!

The Tedium of Doing Things the Right Way

Surprisingly enough, basically every bit of code that followed concerning the functionality of the FA was rather simple. It was however still quite a bit of work to check all necessary conditions, properly return errors, and just implementing every function that I’d need before I could start working on user interaction. To show just one example, here’s the code responsible for adding a new transition:

fn add_transition(&mut self, c: char, first: u32, second: u32) -> Result<()> {
        if !(first < self.nextid) {
            return Err(Error::NonExistantState(first));
        }
        if !(second < self.nextid) {
            return Err(Error::NonExistantState(second));
        }
        if !(self.alphabet.contains(&c)) {
            return Err(Error::NotInAlphabet(c));
        }
        if let Some(q) = self.transitions.get(&(c, first)) {
            if q == &second {
                Ok(())
            } else {
                return Err(Error::NonDetTransition((first, second, c), (first, *q, c)));
            }
        } else {
            self.transitions.insert((c, first), second);
            Ok(())
        }
    }

It’s not very interesting to talk about this code, so I’ll leave it up to you whether or not you want to check out the full version of it on my gitlab.

Even the main goal of testing a given word turned out rather simple: Read a character, proceed according to transitions, repeat until word is completely read - accept if in accepting state. Done.

Spicing Things Up - NFAs

Non-deterministic FAs differ from ’normal’ FAs in that transitions do not have to adhere to the ‘one transition per character per state’ rule. Additionally, they allow for ’epsilon-transitions’, allowing to change state without reading any input. This complicated things a bit, but with some slight alterations, I could reuse most of the DFA structure.

One new thing I added to each state struct was a new field I called epsclosure. This would encapsulate the set of all states reachable with only epsilon-transitions. During the creation of the automaton, this would be updated as necessary, making it much less heavy on performance when running a test compared to dynamically calculating the epsilonclosure.

Testing things can be easy

With this rather simple addition, the testing process needed to be adapted slightly for NFAs. First of all, it needed to handle the newly introduced epsilonclosures, but it also needed to account for all the different possible states the automaton could be in at any given step. This essentially boiled down to having a set of possible current states instead of a single one.

When reading a character, every transition from every state in the current-state set had to be checked. While this sounds like an absolute horror for runtime, the lookup time is actually rather short thanks to hashmaps and -sets respectively - however, for very dense FAs, the sheer number of states that would need to be added in every step would indeed prove devastating for runtime (running in the ballpark of \( \mathcal{O}(n^2 k) \), where \( n \) is the number of states and \( k \) is the characters of the to-be-tested word). Since I mainly developed this for small scale automata, I didn’t bother to improve this, as I’d have to either minimise the automaton (which would be costly for large alphabets) or do some other shenanigans, and even then the worst case might not even improve.

Overall, I’m quite happy with how FAr turned out. It was probably the largest projects I have coded in Rust, and I’ve learned quite a bit - especially concerning handling user input, logging and handling errors. While I think those things are important to learn and master, they make for rather dry reading, so I focused this post mostly on the theoretical, underlying principles of FAr. Thanks for reading!

As per usual, you can check out the code in its current state on gitlab.