Grapl’s Detection Story - Graph Analyzers, Risk, and Lenses
Grapl is a Graph based detection and response platform, but what does this workflow actually look like? What does Grapl do differently, and how does it all fit together?
Grapl does a ton of work to get you the data you need in the best format for analysis, and provide the tools you need to understand your environment; it provides your logs with identity, it combines them together into a concise format, and it links them together into a graph that exposes their relationships.
In this post I want to focus on some of the features I’ve been working on lately - the new Analyzer library, risk based alerting, and lenses.
Analyzers provide the first tier of what I call “local correlation” - it’s where we define things like TTPs or interesting, connected patterns in our master graph of events. “Local” means that the detection can be represented through a single connected graph.
Analyzers can be quite simple:
Here we have a query for any process with the process name evil.exe. At a minimum this gives us the basic power of most log based systems - we can query with regexes across various fields.
Where the real power comes in is when you want to look at behaviors.
Here is a look at abuse of the CMSTP process, inspired by https://attack.mitre.org/techniques/T1191/
    ProcessQuery()
    .with_process_name(ends_with="CMSTP.exe")
    .with_read_files(
        FileQuery().with_file_ext(eq=".inf")
    )
With this signature we can track all reads of .inf files by CMSTP.exe.
Further refinement of the alert can take a count of the combination of CMSTP.exe and the .inf file, and output only if the combination has been seen zero or one times. This way, even if your environment has legitimate executions of CMSTP.exe, you can take advantage of the attacker’s .inf file being non-standard.
    p = (
        ProcessQuery()
        .with_process_name(ends_with="CMSTP.exe")
        .with_read_files(
            FileQuery().with_file_path().with_file_ext(eq=".inf")
        )
        .query_first(dgraph_client)
    )

    count = (
        ProcessFileCounter(dgraph_client)
        .count(
            process_name=p.process_name,
            file_name=p.read_files.file_path,
        )
    )

    if count <= Seen.Once:
        print("Unique CMSTP.exe with .inf combination")
Grapl’s Python and Graph based signatures allow expression of complex behaviors and can track those behaviors over time using counters. The combination allows anyone to write high fidelity alerts quickly.
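The counting idea can be sketched without Grapl at all. Below is a minimal stand-in, assuming an in-memory tally keyed by process name and file path. `ComboCounter`, `is_rare`, this `Seen` enum, and the file paths are all hypothetical names made up for illustration; this is not Grapl's `ProcessFileCounter`.

```python
from collections import Counter
from enum import IntEnum


class Seen(IntEnum):
    """Hypothetical thresholds, mirroring the Seen.Never / Seen.Once idea."""
    Never = 0
    Once = 1


class ComboCounter:
    """In-memory stand-in for a counter of (process name, file path) pairs."""

    def __init__(self):
        self._counts = Counter()

    def record(self, process_name, file_path):
        # Tally one more observation of this combination.
        self._counts[(process_name, file_path)] += 1

    def count(self, process_name, file_path):
        # Counter returns 0 for combinations it has never recorded.
        return self._counts[(process_name, file_path)]


counter = ComboCounter()

# A made-up "legitimate" combination that shows up repeatedly.
for _ in range(3):
    counter.record("CMSTP.exe", "C:\\Windows\\corp_vpn.inf")

# A made-up combination that has only been observed once.
counter.record("CMSTP.exe", "C:\\Users\\bob\\Downloads\\payload.inf")


def is_rare(process_name, file_path):
    """True when the combination has been seen zero or one times."""
    return counter.count(process_name, file_path) <= Seen.Once
```

The routine combination falls out of the alert once it has been seen more than once, while the one-off .inf file still trips it.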
Of course, the reality of detection is that it’s impossible to say, in the general case, that anything is bad. It’s absurd what users will do - especially at a tech company, when you’ve got developers debugging systems in all sorts of ways.
Treating signatures as binary statements of badness is going to leave you in a bad situation - you’ll either be so inundated with triage that you never get anything done, or you’ll never manage to push signatures out because they have too many false positives.
The graph based approach is extraordinarily powerful and can help you build alerts with powerful whitelisting, but even still, these signatures are heuristics.
This is why Grapl provides a concept of risk. Risk is just a number indicating how suspicious you think this behavior is. Known malware executing? Maybe that’s a risk of 180. Unique parent child process? Maybe that’s closer to 50. The numbers are made up, it’s the relative distance that matters.
Let’s look at our previous example, modified to include risk:
    p = (
        ProcessQuery()
        .with_process_name(ends_with="CMSTP.exe")
        .with_read_files(
            FileQuery().with_file_path().with_file_ext(eq=".inf")
        )
        .query_first(dgraph_client)
    )

    count = (
        ProcessFileCounter(dgraph_client)
        .count(process_name=p.process_name, file_name=p.read_files.file_path)
    )

    if count == Seen.Never:
        output(
            suspicious_graph=p,
            risk=150,
        )
All that is needed is to add a score, stating that this is a “150” level risk.
Grapl leveraging Python means we can really easily express more dynamic scoring. For example,
    p = (
        ProcessQuery()
        .with_process_name(ends_with="CMSTP.exe")
        .with_read_files(
            FileQuery().with_file_path().with_file_ext(eq=".inf")
        )
        .query_first(dgraph_client)
    )

    count = (
        ProcessFileCounter(dgraph_client)
        .count(process_name=p.process_name, file_name=p.read_files.file_path)
    )

    # Unique is extra scary - risk is 150
    if count == Seen.Never:
        output(
            suspicious_graph=p,
            risk=150,
        )
    # If we've seen it once that's still sketchy - risk is 120
    elif count == Seen.Once:
        output(
            suspicious_graph=p,
            risk=120,
        )
    # If we've seen it more than once it *might* be sketchy, but not worth
    # raising alarms over, let's drop risk down to 20
    else:
        output(
            suspicious_graph=p,
            risk=20,
        )
We can pull in peripheral information, such as the count of the combination of process name and filename, and use that to determine a score. Maybe CMSTP.exe is actually something we see in the environment sometimes, so if it’s reading a file we’ve seen a lot, it could still be bad, but we’ll drop the score considerably.
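The branching above can also be factored into a small function so the thresholds live in one place. This is a minimal sketch in plain Python; `risk_for_count` is a hypothetical helper, not part of Grapl, and the scores are the same made-up values used above.

```python
def risk_for_count(times_seen: int) -> int:
    """Map how often a process/file combination has been seen to a risk.

    Hypothetical helper; the scores follow the example values in this post.
    """
    if times_seen < 0:
        raise ValueError("times_seen cannot be negative")
    if times_seen == 0:
        # Never seen before: extra scary.
        return 150
    if times_seen == 1:
        # Seen once: still sketchy.
        return 120
    # Seen repeatedly: probably routine, keep a small residual risk.
    return 20


scores = [risk_for_count(n) for n in (0, 1, 2, 50)]
```

Keeping the mapping separate from the query makes it easy to tune the numbers later without touching the signature itself.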
Risk is so powerful because you can throw everything into it. If you’ve ever wanted to write an alert but just couldn’t cut the false positives down, risk probably could have helped you.
It is too often the case that the signatures that are most likely to catch an attacker are too noisy to investigate every time - attach a risk to them, and now you can rank them against the other risks in the environment.
Of course, Grapl is a graph based system, and the real power of its analyzers and risks lies in that approach. Analyzers provide us with local correlation - we can see a process with a direct read connection to a file. But what if another analyzer had found a suspicious pattern elsewhere in that process tree? It would be great if we could do correlation even across disconnected graphs.
This is where lenses come in. The lens is a way to view groups of local correlations through some focal point - in Grapl’s case, the currently supported focal point is the asset lens. An asset would be someone’s laptop, or a server, so an asset lens would allow us to see all of the risks associated with, for example, various suspicious activities on a user’s laptop.
Consider the situation of Microsoft Word or Excel executing a child process.
    p = (
        ProcessQuery()
        .with_process_name(eq=["winword.exe", "excel.exe"])
        .with_children(ProcessQuery())
        .query_first(dgraph_client)
    )

    output(p, risk=120)
Well, there’s probably more to that story, right? A file must have been read to execute a macro, or something along those lines.
Maybe on that same asset we have a low risk signature, looking for files downloaded from common browsers.
    p = (
        ProcessQuery()
        .with_process_name(eq=["chrome.exe", "firefox.exe", "iexplore.exe"])
        .created_files(FileQuery())
        .query_first(dgraph_client)
    )

    output(p, risk=5)
This is an incredibly low risk signature. Users download files all the time. But this is where non-local correlation comes in.
Both of these signatures triggered for the same asset, and so we can view them through that lens.
We can now see a way to correlate these isolated subgraphs - when investigating, you can just start connecting the paths between these nodes.
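One way to picture an asset lens is as a grouping of analyzer hits by the asset they fired on. The sketch below assumes each hit is a plain record with hypothetical `asset`, `analyzer`, and `risk` fields, and the asset names are made up; it is an illustration of the idea, not Grapl's lens implementation.

```python
from collections import defaultdict

# Hypothetical analyzer hits; the field names are assumptions for illustration.
hits = [
    {"asset": "laptop-1", "analyzer": "Word With Child Process", "risk": 120},
    {"asset": "laptop-1", "analyzer": "Browser Created File", "risk": 5},
    {"asset": "server-7", "analyzer": "Browser Created File", "risk": 5},
]


def asset_lens(hits):
    """Group analyzer hits by the asset they fired on."""
    lens = defaultdict(list)
    for hit in hits:
        lens[hit["asset"]].append(hit)
    return dict(lens)


lenses = asset_lens(hits)

# Rank assets by the plain sum of their risks, highest first.
ranked = sorted(
    lenses,
    key=lambda asset: sum(h["risk"] for h in lenses[asset]),
    reverse=True,
)
```

Here the laptop with both signatures rises to the top, even though the two hits came from unrelated subgraphs.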
Let’s create another low-risk analyzer. We’ll call this one: “Commonly Targeted Application - Unique File Read”.
Certain applications are targeted a lot - Word, Excel, PDF readers, and similar software. These applications are often targeted through malicious file reads - for example, an attacker will convince a user to open a malicious PDF, exploit Adobe Reader, and take over their computer. Further, we can assume that the user downloaded the file from the browser.
So let’s build this signature out.
    common_targets = ["winword.exe", "excel.exe", "adobereader.exe"]

    p = (
        ProcessQuery()
        .with_process_name(eq=common_targets)
        .with_read_files(
            FileQuery()
            .created_by(
                ProcessQuery().with_process_name(eq=["chrome.exe", "firefox.exe"])
            )
        )
        .query_first(dgraph_client)
    )

    output(p, risk=10)
We can give this a risk of 10.
Now our lens shows us a new connection.
We have a pretty compelling story here for the attack - pretty easy to see what’s going on.
But more importantly, we have multiple overlapping risks within a lens. So let’s make those risks explicit.
What’s important to note here is that we have multiple distinct risks that are correlating both locally and non-locally.
The three risks are:
- “Browser Created File”
- “Word With Child Process”
- “Commonly Targeted App Read Browser Created File”
The local correlation is where the risks overlap - the winword.exe node has edges to two distinct risks. The non-local correlation is where the risks don’t overlap, but the lens allows us to see them together - Browser Created File, for example.
When a node has multiple risks, we get something like:
    risk_sum = sum(node_risks)
    risk_sum += risk_sum * (0.10 * (len(node_risks) - 1))
Essentially, if two risks correlate on a node, the summed risk is increased by 10%. If three correlate, 20%.
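To make the arithmetic concrete, here is a small worked sketch. It reads the multiplier as 10% of the sum for each risk beyond the first, which is how the prose describes it; `combined_risk` is a hypothetical name, and the inputs are the scores from the Word example in this post.

```python
def combined_risk(node_risks):
    """Sum a node's risks, adding 10% of the sum per additional risk."""
    risk_sum = sum(node_risks)
    # One risk gets no bonus, two risks get 10%, three get 20%, and so on.
    extra_risks = max(len(node_risks) - 1, 0)
    return risk_sum + risk_sum * (0.10 * extra_risks)


# The winword.exe node in the example carries two risks:
# "Word With Child Process" (120) and
# "Commonly Targeted App Read Browser Created File" (10).
winword_risk = combined_risk([120, 10])  # 130 plus 10% of 130

# A node with a single risk is left unchanged.
single_risk = combined_risk([5])
```

So the overlapping node scores 143 rather than 130: the overlap itself is treated as evidence.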
Lens views of your assets’ risk is a powerful concept, but it can go so much further. We can create arbitrary lenses to view your environment. A lens for users would track actions attributed to a user, regardless of which assets the actions occurred on. A lens for the kill chain would take attack signatures that map to the kill chain and provide a lens to correlate across them.
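The idea of arbitrary lenses can be sketched by making the focal point a parameter. This is a rough illustration only: `build_lens`, the `asset` and `user` fields, and the sample hits are all hypothetical, and Grapl's actual lens model may look quite different.

```python
from collections import defaultdict


def build_lens(hits, key):
    """Group analyzer hits under whatever focal point `key` extracts."""
    lens = defaultdict(list)
    for hit in hits:
        lens[key(hit)].append(hit["analyzer"])
    return dict(lens)


# Hypothetical hits; "asset" and "user" are assumed field names.
hits = [
    {"asset": "laptop-1", "user": "alice", "analyzer": "Browser Created File"},
    {"asset": "workstation-4", "user": "alice", "analyzer": "Word With Child Process"},
    {"asset": "laptop-2", "user": "bob", "analyzer": "Browser Created File"},
]

# The same hits, viewed through two different focal points.
by_asset = build_lens(hits, key=lambda h: h["asset"])
by_user = build_lens(hits, key=lambda h: h["user"])
```

Viewed by asset, the two hits attributed to alice sit in separate groups; viewed by user, they land together.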
Lens-based correlation is also a great example of how graphs apply to different areas of Detection and Response. Not only do graph based signatures let us express powerful attack signatures, but because the signatures output graphs we can trivially connect the outputs together, giving us an almost arbitrarily powerful tool for correlation.
If you’re interested in talking more about Grapl, check out the project or reach out - I’m always interested in hearing thoughts about the project.
Github: https://github.com/insanitybit/grapl Twitter: @insanitybit