I’ve been using rust for side projects for just over a year now and it’s about time I got around to writing about it.

If you’ve heard of Rust you’ve probably heard that it provides a lot of memory safety through its type system - you can’t get use after free vulnerabilities, for example.

Getting memory safety for free (Rust is crazy fast) is pretty big for a system’s language. But we can take that safety further - using Rust’s type system we can encode logic into our programs that will prevent other errors before the program ever runs.

I’ve been working on an IMAP library for Rust in my free time with the goal of preventing illogical IMAP commands at compile time. For example, IMAP has a LOGIN command, and a SELECT command, and it makes no sense to SELECT before you LOGIN.

IMAP is a somewhat ideal usecase for this because IMAP client’s can easily be represented by a state machine. From the IMAP RFC:

+----------------------+
|connection established|
+----------------------+
           ||
           \/
+--------------------------------------+
|          server greeting             |
+--------------------------------------+
   || (1)       || (2)        || (3)
   \/           ||            ||
+-----------------+    ||            ||
|Not Authenticated|    ||            ||
+-----------------+    ||            ||
|| (7)   || (4)       ||            ||
||       \/           \/            ||
||     +----------------+           ||
||     | Authenticated  |<=++       ||
||     +----------------+  ||       ||
||       || (7)   || (5)   || (6)   ||
||       ||       \/       ||       ||
||       ||    +--------+  ||       ||
||       ||    |Selected|==++       ||
||       ||    +--------+           ||
||       ||       || (7)            ||
\/       \/       \/                \/
+--------------------------------------+
|               Logout                 |
+--------------------------------------+
           ||
           \/
+-------------------------------+
|both sides close the connection|
+-------------------------------+


(1) connection without pre-authentication (OK greeting)
(2) pre-authenticated connection (PREAUTH greeting)
(3) rejected connection (BYE greeting)
(4) successful LOGIN or AUTHENTICATE command
(5) successful SELECT or EXAMINE command
(6) CLOSE command, or failed SELECT or EXAMINE command
(7) LOGOUT command, server shutdown, or connection closed

Every one of the IMAP commands above either leaves the client in a new, defined state, or in the old state.

There are a few ways to build a client around this. The easiest would be something like this: (Note that this is pseduo-rust for simplicity)


  struct IMAPClient {}

  impl IMAPClient {

    fn connect(&mut self) -> Result<(), Error> {unimplemented!()}
    fn login(&mut self) -> Result<(), Error> {unimplemented!()}
    fn select(&mut self) -> Result<(), Error> {unimplemented!()}
    fn authenticate(&mut self) -> Result<(), Error> {unimplemented!()}
    fn logout(&mut self) -> Result<(), Error> {unimplemented!()}

  }

In this implementation the client exists in a single all-encompassing state. If a user tries to select before they connect, we can return an Err(Error). We can reason about this potential error at compile time, but we can’t really explain to the compiler that this makes no sense. We can easily run into runtime errors here.

In order to move that state machine logic into something the compiler can understand, we leverage Rust’s native affine type system to create a session type.

An affine type is a type that can only be used once. That means that if I have some affine type ‘x’, when I use ‘x’ it is consumed and no one else can use ‘x’.

By chaining a group of affine types together, where a value is consumed and a new value of a new type is produced, we create our session type - a compile time state machine.

We’ll encode every state into a type, a struct, and define a state transition where the transition consumes that state and produces a new state.

The code looks (mostly) like this:


  struct Initial{
      connection: connectionType
  }

  struct UnAuthenticated {
    connection: connectionType
  }

  struct Authenticated {
    connection: connectionType
  }

  struct Selected {
    connection: connectionType
  }

  struct LoggedOut {
    connection: connectionType
  }

  impl Initial {
    fn connect(self) -> Result<UnAuthenticated, (Initial, Error)> {
      unimplemented!()
    }
  }

  impl UnAuthenticated {
    fn login(self) -> Result<Authenticated, (UnAuthenticated, Error)> {
      unimplemented!()
    }

    fn authenticate(self) -> Result<Authenticated, (UnAuthenticated, Error)> {
      unimplemented!()
    }

    fn logout(self) -> Result<Logout, (UnAuthenticated, Error)> {
      unimplemented!()
    }
  }

  impl Authenticated {
    fn select(self) -> Result<Selected, (Authenticated, Error)> {
      unimplemented!()
    }

    fn logout(self) -> Result<Logout, (Authenticated, Error)> {
      unimplemented!()
    }
  }

  impl Selected {
    fn logout(self) -> Result<Logout, (Selected, Error)> {
      unimplemented!()
    }
  }

It all boils down to this:

                              //v-- Our successful or unsuccessful state transition
  fn transition(self) -> Result<NewState, OldState>
                //^--consumes self

As you can see, every state is its own type, and every type has access to a state transforming member function. Given a state transitioning function, you are either given a new state or an old state with an error. This means that, at compile time, every state transition is validated and any errors resulting in no state transition must be handled.

Note that because ‘self’ is being taken by value, and self is an affine type, it is statically verified that no references to an old state can exist after a transition. The affine type system in rust provides this for us for free.

Commands such as ‘fetch’, which do not have a state transition, are only logically callable from certain states - and we can not call these at compile time from an invalid state.

The code for using both of these versions of the IMAPClient looks nearly the same due to Rust’s type inference. The library code requires a bit of extra code but it’s mostly just splitting up functions - it took a trivial amount of time.

There are many other applications of Rust’s type system to prevent non-memory related errors. This is one technique for one type of error, but you could easily imagine other problems that can be solved through a type system. You could write code for a database that only ever inserts strings of the EscapedString type, or maybe encode a Same Origin Policy into the types used in a browser.

The ability to encode this information into your program means that you can enforce semantic characteristics of your API, and build very robust, easy to reason about software in Rust.



blog comments powered by Disqus

Published

30 May 2016

Categories