Over the last few months I’ve been working on Grapl, a platform for digital forensics and incident response (DFIR) built largely around graph structures.

I wanted Grapl to be trivial to deploy, both because that would make it easier for others to get started with it and because it would make my own test cycle a lot faster.

Grapl consists of around 7 Lambdas, some S3 buckets, SNS topics, SQS queues, and the connections and policies between them. Configuring and changing these in the console quickly became untenable. I evaluated two projects to make my life easier here, with the goal of having a near-one-command deployment.

Terraform

The first project I tried was Terraform. Terraform is developed by HashiCorp and is basically a DSL for describing infrastructure, such as AWS resources and their policies, written in HCL (HashiCorp’s configuration language).

I found HCL mostly unbearable. It’s very ‘stringy’ and weird: it definitely feels like a DSL rather than a typical programming language, and I don’t really see the appeal of that, since I spend 99% of my time using typical languages.

Here’s an example:

# Create a subnet to launch our instances into
resource "aws_subnet" "default" {
  vpc_id                  = "${aws_vpc.default.id}"
  cidr_block              = "10.0.1.0/24"
  map_public_ip_on_launch = true
}

This defines a VPC subnet resource that references a VPC by its id through string interpolation.

On top of that, Terraform does very little to help you out. Everything has to be defined explicitly: policies, subnets, etc. Why? Why must I state that my SNS topic is allowed to publish to my queue, and then separately define that my queue is subscribed to that topic? It felt redundant and easy to get wrong, and it meant I had to understand AWS’s resource and IAM policy models in detail.

I ditched Terraform and, for a while, just did things with the console.

aws-cdk

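Later I found out about aws-cdk, a new approach to configuring AWS resources through code: you describe your resources in an ordinary programming language, and cdk synthesizes them into CloudFormation templates.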

The nice thing about CDK is that it is not a DSL. It’s a library that I can use from various programming languages.

By leveraging real languages I can use the tools I’m used to - classes, functions, generics, loops, branches, etc. I didn’t have to learn the CDK way, I just approached it as I would any problem.

What this meant was I could move really fast, building my configurations in a way that was very readable and, at least to some extent, DRY.

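import cdk = require('@aws-cdk/cdk');
import lambda = require('@aws-cdk/aws-lambda');
import sqs = require('@aws-cdk/aws-sqs');

// Subscribe a Lambda to an SQS queue and grant it permission to consume from it.
function subscribe_lambda_to_queue(stack: cdk.Stack, id: string, fn: lambda.Function, queue: sqs.Queue) {

    // TODO: Build the S3 Endpoint and allow traffic only through that endpoint
    // Have Lambda poll the queue and invoke `fn` with batches of messages.
    new lambda.cloudformation.EventSourceMappingResource(stack, id + 'Events', {
        functionName: fn.functionName,
        eventSourceArn: queue.queueArn
    });

    // Grant only the actions the SQS event source needs, scoped to this queue.
    fn.addToRolePolicy(new cdk.PolicyStatement()
        .addAction('sqs:ReceiveMessage')
        .addAction('sqs:DeleteMessage')
        .addAction('sqs:GetQueueAttributes')
        .addResource(queue.queueArn));
}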

This is a function I wrote to factor out some common logic: subscribing my lambdas to an SQS queue.

Note that in this case I had to add the policy to my lambda myself. This is atypical: cdk will, in almost every case, generate these for you. I only had to do it here because the library is very young and hasn’t automated this case yet.
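For reference, a least-privilege version of that hand-written statement needs only three actions (the ones Lambda’s SQS event source uses to poll and delete messages), scoped to a single queue. The sketch below builds that statement as plain JSON; `sqsConsumerStatement` and the example ARN are hypothetical, and the exact layout cdk emits in the template may differ.

```typescript
// Shape of an IAM policy statement as it appears in a synthesized template.
interface PolicyStatementJson {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string;
}

// Hypothetical helper: the minimal statement a Lambda needs to consume from
// one SQS queue via an event source mapping.
function sqsConsumerStatement(queueArn: string): PolicyStatementJson {
  return {
    Effect: "Allow",
    Action: ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
    Resource: queueArn, // scoped to this queue only, no wildcard
  };
}

// Example ARN, for illustration only.
const stmt = sqsConsumerStatement("arn:aws:sqs:us-east-1:123456789012:graph-merger-queue");
console.log(JSON.stringify(stmt, null, 2));
```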

But consider this code:

    const event_producer = new sns.Topic(this, "ProducerName");
    const graph_merger_queue = new sqs.Queue(this, "QueueName");
    event_producer.subscribeQueue(graph_merger_queue);

This code defines a Topic and a Queue, and subscribes the Queue to the Topic. I don’t need to define any policy. It’s obvious what it should be (allow the topic to send messages to the queue), so cdk just does it for me.
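To make “it’s obvious what it should be” concrete: both the subscription and the queue policy follow mechanically from the topic and queue ARNs. The sketch below derives them by hand as plain objects; `wireTopicToQueue` and the example ARNs are hypothetical, and this approximates what cdk generates rather than reproducing its exact output.

```typescript
// The two pieces needed to deliver SNS messages to an SQS queue.
interface Subscription {
  TopicArn: string;
  Protocol: "sqs";
  Endpoint: string; // the queue ARN
}

interface QueuePolicy {
  Version: "2012-10-17";
  Statement: Array<{
    Effect: "Allow";
    Principal: { Service: string };
    Action: string;
    Resource: string;
    Condition: { ArnEquals: { "aws:SourceArn": string } };
  }>;
}

// Hypothetical helper: subscribe the queue to the topic, and allow only this
// topic (matched via aws:SourceArn) to send messages to the queue.
function wireTopicToQueue(
  topicArn: string,
  queueArn: string,
): { subscription: Subscription; policy: QueuePolicy } {
  return {
    subscription: { TopicArn: topicArn, Protocol: "sqs", Endpoint: queueArn },
    policy: {
      Version: "2012-10-17",
      Statement: [
        {
          Effect: "Allow",
          Principal: { Service: "sns.amazonaws.com" },
          Action: "sqs:SendMessage",
          Resource: queueArn,
          Condition: { ArnEquals: { "aws:SourceArn": topicArn } },
        },
      ],
    },
  };
}

const wired = wireTopicToQueue(
  "arn:aws:sns:us-east-1:123456789012:ProducerName",
  "arn:aws:sqs:us-east-1:123456789012:QueueName",
);
console.log(JSON.stringify(wired, null, 2));
```

Nothing in either object requires information beyond the two ARNs, which is why a library can fill it in for you.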

I felt confident that cdk would build me policies that are least privilege by default, and that I couldn’t accidentally mess them up.

I also use a database, and I wanted my database username and password to be stored in environment variables. Because I was using TypeScript this was trivial: just npm install node-env-file and use it to load a .env file into process.env.
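One caveat with reading process.env directly is that every value is typed `string | undefined`, so a missing .env entry silently becomes `undefined` inside the stack. A minimal sketch of a fail-fast lookup, assuming the .env file has already been loaded; `requireEnv` is a hypothetical helper, not part of node-env-file or cdk, and in a real stack you would call it as `requireEnv(process.env, "HISTORY_DB_USERNAME")`.

```typescript
// Hypothetical helper: read a required setting and throw at synth time
// instead of passing `undefined` into the stack.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable ${name}; is it set in .env?`);
  }
  return value;
}

// Stand-in for process.env, for illustration only.
const settings = { HISTORY_DB_USERNAME: "grapl", BUCKET_PREFIX: "" };
console.log(requireEnv(settings, "HISTORY_DB_USERNAME")); // prints "grapl"

try {
  requireEnv(settings, "BUCKET_PREFIX"); // empty string counts as missing
} catch (e) {
  console.log((e as Error).message);
}
```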

    const history_db = new HistoryDb(
        this,
        'history-db',
        network.grapl_vpc,
        new cdk.Token(process.env.HISTORY_DB_USERNAME),
        new cdk.Token(process.env.HISTORY_DB_PASSWORD)
    );

I pass my custom HistoryDb class the necessary information, pulling the credentials from my .env, and I’m done.

Similarly, I pass these credentials to my lambdas that need to access the history db (I intend to move to KMS later).

    let node_identity_mapper = new lambda.Function(
        this, 'node-identity-mapper', {
            runtime: lambda.Runtime.Go1x,
            handler: 'node-identity-mapper',
            code: lambda.Code.file('./node-identity-mapper.zip'),
            vpc: vpc,
            environment: {
                "HISTORY_DB_USERNAME": process.env.HISTORY_DB_USERNAME,
                "HISTORY_DB_PASSWORD": process.env.HISTORY_DB_PASSWORD,
                "BUCKET_PREFIX": process.env.BUCKET_PREFIX
            },
            timeout: 30
        }
    );

This same approach lets me deal with the fact that S3 bucket names are globally unique. If someone else wants to set up Grapl, they just provide a unique prefix in the .env file and all services are made aware of it. Easy.
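Since every bucket name lives in a single global namespace, the prefix is what keeps one deployment’s names from colliding with another’s. A minimal sketch of deriving and sanity-checking those names; `bucketName` and the suffixes are hypothetical, and the regex covers only a subset of S3’s naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit).

```typescript
// Subset of the S3 bucket naming rules: 3-63 chars, lowercase letters,
// digits, dots and hyphens, beginning and ending with a letter or digit.
const BUCKET_NAME = /^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/;

// Hypothetical helper: build a deployment-specific bucket name from the
// shared prefix (e.g. BUCKET_PREFIX from .env) and a fixed suffix.
function bucketName(prefix: string, suffix: string): string {
  const name = `${prefix}-${suffix}`;
  if (!BUCKET_NAME.test(name)) {
    throw new Error(`Invalid S3 bucket name: ${name}`);
  }
  return name;
}

// Example suffixes, chosen for illustration only.
const suffixes = ["raw-logs", "subgraphs"];
const names = suffixes.map((s) => bucketName("acme-grapl", s));
console.log(names);
```

Because every service receives the same prefix through its environment, each one can compute the same bucket names independently.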

CDK is still in its early days, but I really couldn’t recommend it more. Deploying Grapl is practically trivial, and adding new CloudFormation stacks, or modifying existing ones, has been incredibly smooth.




Published

05 November 2018
