For the last few months I’ve been working on a side project: implementing the IMAP protocol in Rust. The goal is a type-safe library, meaning that you can reason about potential error situations, as well as your data, at compile time.

I’ve already written a bit about this process in the Beyond Memory Safety With Types post, in which I showed how to encode the IMAP state machine described in RFC 3501 into the Rust type system. I want to build the entire library around concepts like this, using Rust’s type system to make things easier and safer.

What I’ve been working on since then is parsing the IMAP response data into Rust structures. To do this I’ve been using nom, a parser combinator library with a focus on zero-copy (or close to it) parsing. So far working with nom has been a pretty solid experience. I’ve got dozens of parsers built up and the ‘flow’ of the parser combinator approach is very appealing: you break your data down into pieces, write a parser for each piece, and combine them as needed.

My one gripe so far has been the lack of documentation. I had to rely a lot on this article on parsing ISO 8601 dates, as well as asking questions on IRC and reddit. Thankfully, the Rust community is always eager to help and I got all of the input I needed whenever I hit a blocker.

The IMAP specification describes the protocol’s syntax in ABNF, so the bulk of the work has been rewriting ABNF rules as nom parsers. For example, this is the ABNF representation of an IMAP ‘address’:

    address = "(" addr-name SP addr-adl SP addr-mailbox SP
            addr-host ")"

And here is the nom parser:

named!(pub address <&[u8], Address>,
    chain!(
        char!('(') ~
        addr_name: nstring ~
        char!(' ') ~
        addr_adl: nstring ~
        char!(' ') ~
        addr_mailbox: nstring ~
        char!(' ') ~
        addr_host: nstring ~
        char!(')'),
        || {
            Address {
                addr_name: addr_name,
                addr_adl: addr_adl,
                addr_mailbox: addr_mailbox,
                addr_host: addr_host,
            }
        }
    )
);
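The Address struct itself isn’t shown here. A minimal sketch of what it might look like, assuming each field simply borrows the bytes matched by nstring (the lifetime parameter and the derives are assumptions, not taken from the library):

```rust
/// Hypothetical shape of the `Address` value the parser builds.
/// Each field is a slice into the original input buffer, so
/// constructing an `Address` does not copy any bytes.
#[derive(Debug, PartialEq)]
pub struct Address<'a> {
    pub addr_name: &'a [u8],
    pub addr_adl: &'a [u8],
    pub addr_mailbox: &'a [u8],
    pub addr_host: &'a [u8],
}
```

Because the fields are slices rather than owned buffers, the struct can only live as long as the input it was parsed from. That is the trade-off of parsing without copying.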

nstring is another rule, which is defined in ABNF as:

nil             = "NIL"
nstring         = string / nil

And we represent that in nom with:

named!(pub nil,
    alt!(
        tag!(b"NIL")
        | tag!(b"nil")
    )
);

named!(pub nstring,
    alt!(
        delimited!(
            char!('"'),
            is_not!("\""),
            char!('"')
        )
        | nil
    )
);

So the first step was to define nil, which matches either NIL or nil (I think; I still need to check which spellings the spec actually allows).
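On the casing question: ABNF string literals are case-insensitive, so spellings like Nil are valid too. Here is a small sketch in plain Rust (standard library only, not nom) of what a case-insensitive match could look like. The function name nil_any_case is made up for this example:

```rust
/// Match "NIL" in any mix of upper and lower case at the start of
/// `input`. On success, returns (remaining input, matched bytes).
/// `nil_any_case` is a hypothetical helper, not part of nom.
fn nil_any_case(input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.len() >= 3 && input[..3].eq_ignore_ascii_case(b"NIL") {
        let (head, rest) = input.split_at(3);
        Some((rest, head))
    } else {
        None
    }
}
```

The (remaining, matched) ordering follows the same idea as the parsers above: consume a prefix and hand back whatever is left for the next parser.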

Then define an nstring, which is either a string or nil.

Then define an address, which is a group of space-separated nstrings surrounded by parentheses.
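To make those three steps concrete without the macros, here is a rough hand-written equivalent using only the standard library. This is a sketch of the idea, not what nom’s macros expand to. The names Parsed, byte, tag and quoted are invented for the example, and the string handling is simplified (no escapes, no IMAP literals):

```rust
/// Remaining input plus the parsed value, or `None` on no match.
type Parsed<'a, T> = Option<(&'a [u8], T)>;

/// Match one expected byte.
fn byte<'a>(input: &'a [u8], expected: u8) -> Parsed<'a, u8> {
    match input.split_first() {
        Some((&b, rest)) if b == expected => Some((rest, b)),
        _ => None,
    }
}

/// Match a fixed byte string exactly (case-sensitive).
fn tag<'a>(input: &'a [u8], expected: &[u8]) -> Parsed<'a, &'a [u8]> {
    if input.starts_with(expected) {
        let (head, rest) = input.split_at(expected.len());
        Some((rest, head))
    } else {
        None
    }
}

/// Step 1: nil, accepting the same two spellings as the parser above.
fn nil<'a>(input: &'a [u8]) -> Parsed<'a, &'a [u8]> {
    tag(input, b"NIL").or_else(|| tag(input, b"nil"))
}

/// Simplified quoted string: everything between two double quotes.
fn quoted<'a>(input: &'a [u8]) -> Parsed<'a, &'a [u8]> {
    let (rest, _) = byte(input, b'"')?;
    let end = rest.iter().position(|&b| b == b'"')?;
    let (body, after) = rest.split_at(end);
    Some((&after[1..], body)) // skip the closing quote
}

/// Step 2: nstring is a (quoted) string or nil.
fn nstring<'a>(input: &'a [u8]) -> Parsed<'a, &'a [u8]> {
    quoted(input).or_else(|| nil(input))
}

/// Step 3: address is four space-separated nstrings in parentheses.
/// Returns the fields as [name, adl, mailbox, host].
fn address<'a>(input: &'a [u8]) -> Parsed<'a, [&'a [u8]; 4]> {
    let (i, _) = byte(input, b'(')?;
    let (i, name) = nstring(i)?;
    let (i, _) = byte(i, b' ')?;
    let (i, adl) = nstring(i)?;
    let (i, _) = byte(i, b' ')?;
    let (i, mailbox) = nstring(i)?;
    let (i, _) = byte(i, b' ')?;
    let (i, host) = nstring(i)?;
    let (i, _) = byte(i, b')')?;
    Some((i, [name, adl, mailbox, host]))
}
```

Each function takes the input, consumes a prefix, and returns what is left, so the bigger parsers are just the smaller ones called in sequence. The chain! and alt! macros express the same sequencing and choice far more compactly.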

This approach of writing small parsers and then combining them makes it very easy to build your parser out quickly. In a fairly short period of time I had written parsers for the address, envelope, and sequence-set structures; the biggest bottleneck was my unfamiliarity with nom.

When I feel like I actually have a firm grasp on nom I’ll try to write some better examples but I think this should be sufficient to get the general idea. Nom is not as hard to use as I’d expected, but it definitely needs more example code.

Hopefully I can continue to write these parsers at a decent pace; once they’re done, there will be very little work left before I can release the library. I don’t think IMAP is the most popular protocol, but if you’re looking to use it, hopefully this library will make that easier.

I only put time into this project once every few weeks, so if you’re interested in contributing there’s plenty to be done. In particular, there are dozens more parsers to write, tests to add for those parsers, testing to do against an actual IMAP server, and a few other things. Feel free to open a PR or message me about it.



Published

28 June 2016

Categories