Over the last few months I’ve been working on Grapl, a platform for DFIR (digital forensics and incident response) built largely around graph structures.

I wanted Grapl to be trivial to deploy, both because it would make it easier for others to get started with it, and because it would make my test cycle a lot faster.

Grapl consists of around 7 Lambdas, some S3 buckets, SNS topics, SQS queues, and the connections and policies between them. Configuring and changing these in the console quickly became untenable. I evaluated two projects to make my life easier here, with the goal of having a near-one-command deployment.

Terraform

The first project I started with was Terraform. Terraform is developed by HashiCorp, and is basically a DSL for describing infrastructure, written in HCL.

I found HCL mostly unbearable. It’s very ‘stringy’ and weird. It definitely feels like a DSL and not like a typical programming language, and I don’t really see the appeal of that since I spend 99% of my time using typical languages.

Here’s an example:

# Create a subnet to launch our instances into
resource "aws_subnet" "default" {
  vpc_id                  = "${aws_vpc.default.id}"
  cidr_block              = "10.0.1.0/24"
  map_public_ip_on_launch = true
}

This defines a VPC subnet resource that references a VPC by ID through string interpolation.

On top of that, Terraform does very little to help you out. Everything has to be defined explicitly: policies, subnets, and so on. Why? Why must I subscribe my Queue to my SNS Topic, and then separately state that the Topic is allowed to publish to the Queue? It felt redundant and easy to get wrong, and it meant I had to understand the underlying AWS resources and policies in detail.

I ditched Terraform and, for a while, just did things with the console.

aws-cdk

Later I found out about aws-cdk, a new approach to configuring AWS resources through code.

The nice thing about CDK is that it is not a DSL. It’s a library that I can use from various programming languages.

By leveraging real languages I can use the tools I’m used to - classes, functions, generics, loops, branches, etc. I didn’t have to learn the CDK way, I just approached it as I would any problem.

What this meant was I could move really fast, building my configurations in a way that was very readable and, at least to some extent, DRY.

function subscribe_lambda_to_queue(stack: cdk.Stack, id: string, fn: lambda.Function, queue: sqs.Queue) {

    // TODO: Build the S3 Endpoint and allow traffic only through that endpoint
    new lambda.cloudformation.EventSourceMappingResource(stack, id + 'Events', {
        functionName: fn.functionName,
        eventSourceArn: queue.queueArn
    });

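    // Let the lambda's role read from the queue.
    // NOTE: 'sqs:*' already covers the three specific actions listed before it;
    // dropping it would keep this policy closer to least privilege.
    fn.addToRolePolicy(new cdk.PolicyStatement()
        .addAction('sqs:ReceiveMessage')
        .addAction('sqs:DeleteMessage')
        .addAction('sqs:GetQueueAttributes')
        .addAction('sqs:*')
        .addResource(queue.queueArn));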
}

The function above factors out some common logic I had: subscribing my lambdas to an SQS queue.

Note that in this case I had to add the policy to my lambda. This is actually atypical - cdk will, in almost every case, generate these for you. I only had to do so here because it’s a very young library and they haven’t automated this yet.

But consider this code:

    const event_producer = new sns.Topic(this, "ProducerName");
    const graph_merger_queue = new sqs.Queue(this, "QueueName");
    event_producer.subscribeQueue(graph_merger_queue);

This code defines a Topic and a Queue, and subscribes the Queue to the Topic. I don’t need to define any policy. It’s obvious what it should be (allow the topic to send messages to the queue), so cdk just does it for me.
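To make concrete what is being generated here, this is a rough sketch of the kind of queue policy a topic-to-queue subscription needs. The helper `topicToQueuePolicy`, the interfaces, and the example ARNs are all hypothetical and not part of aws-cdk. The statement follows the standard AWS pattern for letting SNS deliver to SQS, and the exact document cdk emits may differ.

```typescript
// Hypothetical helper, not part of aws-cdk: builds the kind of SQS queue
// policy that lets one specific SNS topic deliver messages to one queue.
interface PolicyStatement {
    Effect: "Allow" | "Deny";
    Principal: { Service: string };
    Action: string;
    Resource: string;
    Condition: { ArnEquals: { "aws:SourceArn": string } };
}

interface PolicyDocument {
    Version: string;
    Statement: PolicyStatement[];
}

function topicToQueuePolicy(topicArn: string, queueArn: string): PolicyDocument {
    return {
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            // SNS is the service that performs the delivery.
            Principal: { Service: "sns.amazonaws.com" },
            // Only the one action SNS needs on the queue.
            Action: "sqs:SendMessage",
            Resource: queueArn,
            // Restrict delivery to this one topic.
            Condition: { ArnEquals: { "aws:SourceArn": topicArn } },
        }],
    };
}

// Made-up ARNs, for illustration only.
const policy = topicToQueuePolicy(
    "arn:aws:sns:us-east-1:123456789012:ProducerName",
    "arn:aws:sqs:us-east-1:123456789012:QueueName",
);
console.log(JSON.stringify(policy, null, 2));
```

This is the sort of thing I would otherwise have to write and keep in sync by hand for every topic and queue pair.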

I felt confident that cdk would build me policies that are least privilege by default, and that I couldn’t accidentally mess them up.

I also use a database, and I wanted my db username and password to be stored in environment variables. Because I was using TypeScript this was trivial: just npm install node-env-file and use it.

    const history_db = new HistoryDb(
        this,
        'history-db',
        network.grapl_vpc,
        new cdk.Token(process.env.HISTORY_DB_USERNAME),
        new cdk.Token(process.env.HISTORY_DB_PASSWORD)
    );

I pass my custom HistoryDb class the necessary information, pulling the credentials from my .env, and I’m done.
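One thing worth guarding against is a missing entry in the .env file, since `process.env` values can be `undefined`. Here is a minimal sketch of a check that fails early. The name `requireEnv` and its signature are my own illustration, not something from cdk or node-env-file, and it takes the environment as a parameter so it can be tried without touching real credentials.

```typescript
// Hypothetical helper: requireEnv is an illustrative name, not a library API.
type Env = Record<string, string | undefined>;

// Returns the variable's value, or throws if it is unset or empty,
// so a bad .env is caught before anything is deployed.
function requireEnv(env: Env, name: string): string {
    const value = env[name];
    if (value === undefined || value === "") {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}

// In-memory example environment; a real stack would pass process.env instead.
const exampleEnv: Env = {
    HISTORY_DB_USERNAME: "grapl",
    HISTORY_DB_PASSWORD: "",
};

const username = requireEnv(exampleEnv, "HISTORY_DB_USERNAME");
console.log(username);
```

In the stack itself the call would look like `requireEnv(process.env, "HISTORY_DB_USERNAME")`, assuming the .env file has already been loaded.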

Similarly, I pass these credentials to my lambdas that need to access the history db (I intend to move to KMS later).

    let node_identity_mapper = new lambda.Function(
        this, 'node-identity-mapper', {
            runtime: lambda.Runtime.Go1x,
            handler: 'node-identity-mapper',
            code: lambda.Code.file('./node-identity-mapper.zip'),
            vpc: vpc,
            environment: {
                "HISTORY_DB_USERNAME": process.env.HISTORY_DB_USERNAME,
                "HISTORY_DB_PASSWORD": process.env.HISTORY_DB_PASSWORD,
                "BUCKET_PREFIX": process.env.BUCKET_PREFIX
            },
            timeout: 30
        }
    );

This same approach allows me to deal with the fact that S3 bucket names are globally unique. If someone else wants to set up Grapl they just provide a unique prefix in the .env file and all services will be made aware of it. Easy.
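As a sketch of the prefix idea, here is one way the bucket names could be derived. The function `prefixedBucketName` and the suffix `raw-logs` are hypothetical and are not Grapl's actual names. The validation is a simplified version of the S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit).

```typescript
// Hypothetical helper: combines a user-supplied prefix with a per-service
// suffix and applies a simplified check of the S3 bucket naming rules.
function prefixedBucketName(prefix: string, suffix: string): string {
    const name = `${prefix}-${suffix}`.toLowerCase();
    // 3-63 characters; lowercase letters, digits, dots, hyphens;
    // must start and end with a letter or digit.
    const valid = /^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/.test(name);
    if (!valid) {
        throw new Error(`Invalid S3 bucket name: ${name}`);
    }
    return name;
}

// "myorg" stands in for the BUCKET_PREFIX value from the .env file.
const rawLogsBucket = prefixedBucketName("myorg", "raw-logs");
console.log(rawLogsBucket);
```

Every service that needs a bucket can then compute the same name from the shared prefix.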

CDK is still early days but I really couldn’t recommend it more. Deploying Grapl is practically trivial and adding new CloudFormation stacks, or modifying existing ones, has been incredibly smooth.



Published

05 November 2018