I Rolled My Own Crypto
Disclaimer
- This code is not meaningfully tested or reviewed. It was about two hours of programming to get it done.
- I’m not releasing the code in full because I really can’t hammer home enough how unusable this is.
With that out of the way… blog post!
So, I rolled my own crypto. I was feeling inspired and PSN is down and it’s cold out.
I think the main thing I wanted to see was what it would be like to approach cryptography as a developer who genuinely wanted to roll their own crazy thing without giving care and consideration to the fact that it’s a bad idea. Like, normally if I were doing actual cryptography I’d offload to a high level library in as bog-standard a way as possible, get it reviewed by cryptography friends, and do as little as possible other than adding as much testing/ type annotations/ documentation as I could manage. But it was fun to think about a world in which those libraries didn’t exist, I wanted these properties, and I was unwilling to do things in a standard way for whatever reason.
I’ve been thinking about crypto stuff a lot lately, reading about it, and there are some features of crypto algorithms that I think would be cool to have, just neat things. I thought, I’ll just build one with those features.
The Features!
- Context committing. Modifying any byte of the output (ciphertext, aad, ivs, etc.) should lead to a failure to decrypt.
- IV-reuse resistance. We want to ensure that the odds of a key + IV being reused have bounds that allow for > 2^32 encryption operations per second while still feeling extremely comfy.
- Nonce-Hiding. I honestly wonder what the benefit is. In theory if the nonce is hidden our key is effectively “nonce-bits” stronger, right? So if I have a 16 byte nonce that the attacker doesn’t have access to it’s as if I’ve stuffed an additional 16 bytes into the key? That’s the idea, at least.
(1) and (2) are desirable properties. You can get both of those using standard algorithms + an HMAC, so if you want those, just go do that. I did it in a really silly way for fun. (3) is like… maybe good? Genuinely, I’m unconvinced of value here. Full context committing already prevents writes to the IVs, and adding more entropy to our key shouldn’t protect us from anything because brute forcing the secret should already be infeasible, right?
But this isn’t about security, this is about fun, or learning, or something idk. (future note: I did learn and it was fun).
So first things first, I decided on a few things.
- aes-256-cbc would be the baseline algorithm
- An additional 16 byte IV would be added to reduce (key + IV) reuse
- aes-128 to encrypt the IVs
- hmac-sha256 for the authentication and key derivation
- Derived keys for every operation
Let’s go through these one by one.
aes-256-cbc would be the baseline algorithm
I decided on aes-256-cbc because of its 16 byte IV and because I like that previous blocks “chain” into future blocks, which will come up later with another terrible idea I had (and it turns out my idea of how aes-256-cbc chains wasn’t even right!). I considered aes-256-ctr as well since it could be made parallel, but I’m less familiar with it.
I also considered something like ChaCha, or rolling my own xChaCha8Blake3, but I’ve actually done that before so it wouldn’t be as fun.
An additional 16 byte IV would be added to reduce (key + IV) reuse
aes-256-cbc uses a 16 byte IV. This is probably sufficient for a lot of people, maybe everyone, but I recall hearing that AWS can encrypt a value over 2^32 times in a single second. Even at that rate, a random 16 byte IV probably buys long enough odds to not care, I think: to reach a 1% chance of collision would take decades of encrypting constantly. But we can turn decades into “billions of times longer than the age of the universe” by increasing that.
The simple reason to choose 16 is because I see no benefit in choosing a smaller value. We’re going to be encrypting this value, aes uses 16 byte blocks, and I see no other good use of that space.
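For concreteness, here's a back-of-the-envelope sketch of that birthday math (not part of the project's code). The 2^32 ops/second rate is the AWS figure quoted above, and p ≈ n^2 / 2^(b+1) is the usual birthday approximation for n random b-bit IVs.

// Plain f64 arithmetic, no crypto involved.
fn years_to_one_percent_collision(iv_bits: i32, ops_per_second: f64) -> f64 {
    // Solve 0.01 ≈ n^2 / 2^(b+1) for n, then convert operations to years.
    let n = (0.01 * 2f64.powi(iv_bits + 1)).sqrt();
    n / ops_per_second / (60.0 * 60.0 * 24.0 * 365.25)
}

fn main() {
    let rate = 2f64.powi(32); // "over 2^32 encryptions per second"
    println!("128-bit IV: ~{:.0} years to a 1% collision chance", years_to_one_percent_collision(128, rate));
    println!("256-bit IV: ~{:.2e} years to a 1% collision chance", years_to_one_percent_collision(256, rate));
    // For comparison, the age of the universe is roughly 1.4e10 years.
}

This prints roughly 19 years for the 128-bit case and on the order of 10^20 years for the 256-bit case, which lines up with the "decades" and "billions of times the age of the universe" framing above.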
aes-128 to encrypt the IVs
Using plain aes-128 here should be a red flag. There’s no IV or counters involved; it’s basically aes-128-ecb without the padding logic. Some may be familiar with the ecb penguin, which demonstrates that ecb leaks patterns in the plaintext. So why choose it?
- The IV is generated randomly. The ECB Penguin shows that ECB leaks patterns but there are no patterns to leak when the IV is random.
- The value is 16 bytes so we don’t really get a lot out of the multi-block algorithms - there’s no place to incorporate a counter and we obviously don’t want to add another IV to the mix.
aes-128 vs aes-256 did not feel relevant here (if it ever is?). The block size is 16 bytes either way. What changes is the key size and the number of rounds used: aes-128 uses a 128-bit key with 10 rounds, aes-256 uses a 256-bit key with 14 rounds.
Thinking a bit more, even if we assumed that key recovery was feasible for aes-128 in a way that it wasn’t for aes-256, I use derived keys, so does it really matter? Genuine question.
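For illustration, here's roughly what "plain AES on a single 16 byte block per IV" can look like. This is a sketch using the aes crate (my assumption; the post doesn't name its AES implementation), with the function names borrowed from the listing further down rather than being the actual code.

use aes::Aes128;
use aes::cipher::{generic_array::GenericArray, BlockDecrypt, BlockEncrypt, KeyInit};

// One raw AES-128 block operation per IV: no padding, no chaining, no extra IV
// for this layer, because each IV is exactly one 16 byte block.
fn encrypt_ivs_plain_aes(plain_aes_key: &[u8; 16], iv_a: &[u8; 16], iv_b: &[u8; 16]) -> [u8; 32] {
    let cipher = Aes128::new(GenericArray::from_slice(plain_aes_key));
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(iv_a);
    out[16..].copy_from_slice(iv_b);
    for block in out.chunks_exact_mut(16) {
        cipher.encrypt_block(GenericArray::from_mut_slice(block));
    }
    out
}

fn decrypt_ivs_plain_aes(plain_aes_key: &[u8; 16], encrypted_ivs: &[u8; 32]) -> ([u8; 16], [u8; 16]) {
    let cipher = Aes128::new(GenericArray::from_slice(plain_aes_key));
    let mut buf = *encrypted_ivs;
    for block in buf.chunks_exact_mut(16) {
        cipher.decrypt_block(GenericArray::from_mut_slice(block));
    }
    let (a, b) = buf.split_at(16);
    (a.try_into().unwrap(), b.try_into().unwrap())
}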
hmac-sha256 for the authentication and key derivation
I guess my thinking here is “it’s very fast and very well understood”. I actually ended up using blake3 to hash the ciphertext in the end, with SHA256 for everything else.
Derived keys for every operation
I guess this is just a cleanliness thing/ good practice. At least one derived key makes a lot of sense as it allows me to mix in an IV/ otherwise prep the key and it puts a very clear layer between “the secret” and “the mess of code that wants to do stuff with a secret”, and ensuring key separation should help with other potential pitfalls. The cost is that I perform a few extra hmacs.
Alright, so those are the core choices I made. Here’s what the output ends up looking like:
+-----------------------+------------+-------------------+-----------+
| Ciphertext | AAD | Encrypted IVs | Auth Tag |
| (AES-256-CBC output) | (variable) |(32 bytes, AES-128)| (32 bytes)|
+-----------------------+------------+-------------------+-----------+
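Based on this layout and the slicing in custom_decrypt below, encode_message is presumably pure concatenation with no length prefixes, which is why decryption needs the caller to supply the AAD up front. A hypothetical reconstruction, not the post's actual code:

fn encode_message(ciphertext: &[u8], aad: &[u8], encrypted_ivs: &[u8; 32], auth_tag: &[u8; 32]) -> Vec<u8> {
    // ciphertext | aad | encrypted IVs (32 bytes) | auth tag (32 bytes)
    let mut out = Vec::with_capacity(ciphertext.len() + aad.len() + 64);
    out.extend_from_slice(ciphertext);
    out.extend_from_slice(aad);
    out.extend_from_slice(encrypted_ivs);
    out.extend_from_slice(auth_tag);
    out
}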
Derived Keys
My understanding is that so long as you’re using an algorithm like hmac-sha256 you can derive subkeys for operations very easily. For example,
hmac_sha256(base_key, "for authentication")
is a sufficient way to derive a key to be used for the auth tag. There’s no need to get fancy here like using a uuid or whatever, as far as I know, because of the properties of hmac_sha256. I guess in the interest of building the most homegrown bullshit ever I could have done something like make those domain values 64 bytes of random garbage but I’m not going back to adjust the code.
edit: Actually, I’ve now learned maybe this isn’t so straightforward? https://ciphersweet.paragonie.com/internals/key-hierarchy
Here are the various keys I use. Note that I call the ecb_key “ecb” even though in the actual code I just use plain AES on a single block; there’s no actual ECB mode involved, but it’s equivalent and I didn’t want to come up with another name.
base_key (32 bytes)
│
┌───────────┴──────────────┐
│ │
▼ ▼
Compute ecb_key Compute derived_subkey
via: HMAC-SHA256( via: HMAC-SHA256(base_key,
base_key, iv_b)
"ecb")
(Take first 16 bytes) (Result is 32 bytes)
│ │
│ └────────────┐
│ │
▼ ▼
ecb_key (16 bytes) Derived Subkey (32 bytes)
│
┌─────────────────┴─────────────────┐
│ │
▼ ▼
Compute encryption_key Compute auth_key
via: HMAC-SHA256(derived_subkey, via: HMAC-SHA256(derived_subkey,
"encryption_v1") "auth_v1")
(Result is 32 bytes) (Result is 32 bytes)
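Here's a sketch of those derivations using the hmac, sha2, and getrandom crates (my assumption about the crates). The domain strings and shapes come from the diagram and the function names from the listings below; everything else is a reconstruction, not the post's actual code.

use hmac::{Hmac, Mac};
use sha2::Sha256;

type HmacSha256 = Hmac<Sha256>;

fn hmac_sha256(key: &[u8], data: &[u8]) -> [u8; 32] {
    let mut mac = HmacSha256::new_from_slice(key).expect("HMAC accepts any key length");
    mac.update(data);
    let mut out = [0u8; 32];
    out.copy_from_slice(&mac.finalize().into_bytes());
    out
}

// The diagram's "ecb_key": the plain-AES key used to hide the IVs.
fn compute_plain_aes_key(base_key: &[u8; 32]) -> [u8; 16] {
    let full = hmac_sha256(base_key, b"ecb");
    let mut key = [0u8; 16];
    key.copy_from_slice(&full[..16]); // "take first 16 bytes"
    key
}

// derived_subkey mixes the random iv_b into the hierarchy.
fn compute_subkey(base_key: &[u8; 32], iv_b: &[u8; 16]) -> [u8; 32] {
    hmac_sha256(base_key, iv_b)
}

fn compute_encryption_key(derived_subkey: &[u8; 32]) -> [u8; 32] {
    hmac_sha256(derived_subkey, b"encryption_v1")
}

fn compute_auth_key(derived_subkey: &[u8; 32]) -> [u8; 32] {
    hmac_sha256(derived_subkey, b"auth_v1")
}

// derive_key, as used in custom_encrypt below: pick a fresh random iv_b and
// derive the per-message subkey from it.
fn derive_key(base_key: &[u8; 32]) -> ([u8; 16], [u8; 32]) {
    let mut iv_b = [0u8; 16];
    getrandom::fill(&mut iv_b).expect("Failed to generate IV");
    (iv_b, compute_subkey(base_key, &iv_b))
}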
Authentication
So one of the properties I wanted was context committing. The trivial way to get this would just be something like:
auth_tag = hmac_sha256(auth_key, ciphertext + encrypted_ivs + aad)
And that’s what I did at first. But then I was thinking… isn’t that a bit much? I mean, the entire ciphertext? Sha256 is really fast but could we take a shortcut? Again, this is all terrible, I feel like I should mention that again. The goal here was fun and to learn.
So I came up with an idea. What if instead of that approach we did something like this:
first_bytes = chomp_first(ciphertext, size)
last_bytes = chomp_last(ciphertext, size)
auth_tag = hmac_sha256(auth_key, first_bytes + last_bytes + encrypted_ivs + aad)
Now we’re only adding a fixed size of the ciphertext to our hmac. And that size could be really small… right? SHA256 works on 64 byte blocks, so there’s no need to take less than 64 bytes in terms of performance. So what about the first 32 bytes and the last 64 bytes?
Here’s my thinking.
- aes-256-cbc has a 16 byte block size.
- When encrypting block N, the previous ciphertext block N-1 is XOR’d into the plaintext before encryption, with block 0 being XOR’d against the IV. This property implies that if I tamper with block N all blocks after N are affected.
- If I only authenticate the first two blocks the attacker can modify all subsequent blocks without impact.
- If I only authenticate the last two blocks the attacker can’t modify any previous blocks without impacting them.
So at that point I’m thinking “okay so I could just validate the last 4 blocks”, right? That’d be one round of SHA256. Then I was thinking that the first block is kind of a weirdo, not having any previous blocks to XOR against. It feels like an important block to protect, and if we’re going to protect a block I think it’s fair to protect the block after it too.
And I didn’t really think about it more than that until later. So by default my algorithm was doing this for a while:
first_bytes = chomp_first(ciphertext, 2 * BLOCK_SIZE)
last_bytes = chomp_last(ciphertext, 2 * BLOCK_SIZE)
auth_tag = hmac_sha256(auth_key, first_bytes + last_bytes + encrypted_ivs + aad)
This meant that an attacker had total control over all intermediary blocks when the ciphertext is > 64 bytes; my assumption was that tampering with an intermediary block would corrupt all later blocks and therefore get caught by the auth tag.
Can you spot my mistake? I was flat out wrong on this. aes-256-cbc does not imply that a modification of block N does anything to blocks N+2 or beyond. All it does is scramble N, flip the corresponding bits in the plaintext of N+1, and do literally nothing to any future blocks. I very nearly optimized my way out of context committing for no reason at all.
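To make the locality of CBC tampering concrete, here's a standalone sketch (my own CBC loop over the aes crate, not the project's code) that flips one bit of ciphertext block 1 and checks exactly which plaintext blocks change.

use aes::Aes256;
use aes::cipher::{generic_array::GenericArray, BlockDecrypt, BlockEncrypt, KeyInit};

fn cbc_encrypt(key: &[u8; 32], iv: &[u8; 16], plaintext: &[u8]) -> Vec<u8> {
    assert_eq!(plaintext.len() % 16, 0, "this sketch skips padding");
    let cipher = Aes256::new(GenericArray::from_slice(key));
    let mut prev = *iv;
    let mut out = Vec::new();
    for block in plaintext.chunks_exact(16) {
        let mut buf = [0u8; 16];
        for i in 0..16 {
            buf[i] = block[i] ^ prev[i]; // chain: XOR with previous ciphertext (or the IV)
        }
        cipher.encrypt_block(GenericArray::from_mut_slice(&mut buf));
        prev = buf;
        out.extend_from_slice(&buf);
    }
    out
}

fn cbc_decrypt(key: &[u8; 32], iv: &[u8; 16], ciphertext: &[u8]) -> Vec<u8> {
    let cipher = Aes256::new(GenericArray::from_slice(key));
    let mut prev = *iv;
    let mut out = Vec::new();
    for block in ciphertext.chunks_exact(16) {
        let mut buf: [u8; 16] = block.try_into().unwrap();
        cipher.decrypt_block(GenericArray::from_mut_slice(&mut buf));
        for i in 0..16 {
            buf[i] ^= prev[i]; // P_n = D(C_n) XOR C_{n-1}
        }
        prev = block.try_into().unwrap();
        out.extend_from_slice(&buf);
    }
    out
}

fn main() {
    let key = [7u8; 32];
    let iv = [9u8; 16];
    let plaintext = [b'A'; 64]; // four blocks
    let mut ct = cbc_encrypt(&key, &iv, &plaintext);
    ct[16] ^= 0x01; // flip one bit in ciphertext block 1
    let pt = cbc_decrypt(&key, &iv, &ct);
    assert_eq!(&pt[..16], &plaintext[..16]);     // block 0: untouched
    assert_ne!(&pt[16..32], &plaintext[16..32]); // block 1: garbage
    assert_eq!(pt[32] ^ plaintext[32], 0x01);    // block 2: exactly one bit flipped
    assert_eq!(&pt[33..48], &plaintext[33..48]);
    assert_eq!(&pt[48..], &plaintext[48..]);     // block 3: untouched
    println!("tampering with block 1 touched only blocks 1 and 2");
}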
Instead, I decided to do this:
blake3_hash = blake3::hash(ciphertext)
auth_tag = hmac_sha256(auth_key, blake3_hash + encrypted_ivs + aad)
blake3 is fast, especially for larger amounts of data (I could have chosen the parallel impl but laziness). We rely on blake3 being collision resistant for committing to the ciphertext but that’s probably fine. I still chose sha256 for the final hmac because it feels like it’s well studied and easy to rely on. I could have, maybe should have, just done sha256 for the entire thing, but we’re rolling our own crypto here so I decided to lean in.
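Roughly what compute_auth_tag ends up doing under these choices, ignoring the auth_range knob from the listing below. This is a reconstruction assuming the hmac, sha2, and blake3 crates; note that streaming several update calls into the HMAC is equivalent to HMACing the concatenation.

use hmac::{Hmac, Mac};
use sha2::Sha256;

fn compute_auth_tag(auth_key: &[u8; 32], ciphertext: &[u8], encrypted_ivs: &[u8; 32], aad: &[u8]) -> [u8; 32] {
    // blake3 commits to the full ciphertext cheaply...
    let blake3_hash = blake3::hash(ciphertext);
    // ...and HMAC-SHA256 binds that hash, the encrypted IVs, and the AAD to auth_key.
    let mut mac = Hmac::<Sha256>::new_from_slice(auth_key).expect("HMAC accepts any key length");
    mac.update(blake3_hash.as_bytes());
    mac.update(encrypted_ivs);
    mac.update(aad);
    let mut tag = [0u8; 32];
    tag.copy_from_slice(&mac.finalize().into_bytes());
    tag
}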
Anyway, here’s what this looks like:
fn custom_encrypt(
base_key: &[u8; 32],
plaintext: &[u8],
aad: &[u8],
auth_range: AuthCiphertextRange,
) -> Vec<u8> {
// derive_key generates a random iv_b and mixes it into the subkey (see the key hierarchy above)
let (iv_b, derived_subkey) = derive_key(base_key);
let mut iv_a = [0u8; 16];
getrandom::fill(&mut iv_a).expect("Failed to generate IV");
let encryption_key = compute_encryption_key(&derived_subkey);
let ciphertext = encrypt_aes256_cbc(&encryption_key, &iv_a, plaintext);
let plain_aes_key = compute_plain_aes_key(base_key);
let encrypted_ivs = encrypt_ivs_plain_aes(&plain_aes_key, &iv_a, &iv_b);
let auth_key = compute_auth_key(&derived_subkey);
let auth_tag = compute_auth_tag(&auth_key, &ciphertext, &encrypted_ivs, aad, auth_range);
encode_message(&ciphertext, aad, &encrypted_ivs, &auth_tag)
}
fn custom_decrypt(
base_key: &[u8; 32],
encrypted_message: &[u8],
aad: &[u8],
auth_range: AuthCiphertextRange,
) -> Result<Vec<u8>, String> {
// Message layout: ciphertext | aad | encrypted_ivs (32 bytes) | auth_tag (32 bytes)
let encrypted_ivs_len = 32;
let auth_tag_len = 32;
let total_len = encrypted_message.len();
let ciphertext_len = total_len - (aad.len() + encrypted_ivs_len + auth_tag_len);
let ciphertext = &encrypted_message[..ciphertext_len];
let aad_in_message = &encrypted_message[ciphertext_len..ciphertext_len + aad.len()];
let encrypted_ivs = &encrypted_message
[ciphertext_len + aad.len()..ciphertext_len + aad.len() + encrypted_ivs_len];
let auth_tag = &encrypted_message[ciphertext_len + aad.len() + encrypted_ivs_len..];
if aad_in_message != aad {
return Err("AAD does not match".into());
}
let plain_aes_key = compute_plain_aes_key(base_key);
let (iv_a, iv_b) = decrypt_ivs_plain_aes(&plain_aes_key, encrypted_ivs);
let derived_subkey = compute_subkey(base_key, &iv_b);
let encryption_key = compute_encryption_key(&derived_subkey);
let auth_key = compute_auth_key(&derived_subkey);
let expected_auth_tag = compute_auth_tag(&auth_key, ciphertext, encrypted_ivs, aad, auth_range);
if !<subtle::Choice as Into<bool>>::into(expected_auth_tag.ct_eq(&auth_tag)) {
return Err("Authentication tag mismatch".into());
}
let plaintext = decrypt_aes256_cbc(&encryption_key, &iv_a, ciphertext);
Ok(plaintext.to_vec())
}
Testing
So I’m a developer and not a cryptographer. That means I really have no idea if all of this works. But again, this is all a terrible idea, and I want it to work, so I’m going to test it.
I ended up with about 20 tests. The idea was that I should be able to ensure that various properties hold. I wrote some simple unit tests, some not-so-simple ones, and some quickcheck tests.
I had some assumptions and the tests cover them.
- I can encrypt and decrypt successfully.
- Encrypting the same value twice with the same key and aad leads to different ciphertexts. I have two tests for this, one of which splits the entire thing into blocks and asserts that no blocks are identical.
- Tampering explicitly with specific parts of the encoded output leads to failure.
- Tampering via bruteforce of various interesting byte ranges leads to failure (I found that it was fast enough to brute force ranges of up to 2^20 candidate values, so I did so in various places).
- I tried tampering by applying various random masks across different ranges of the data.
test tests::prop_full_mask_tampering … ok
test tests::prop_key_tampering … ok
test tests::test_authentication_failure_on_aad_mismatch … ok
test tests::test_authentication_failure_on_tampering … ok
test tests::test_authentication_failure_on_tampering_with_encrypted_ivs … ok
test tests::prop_roundtrip … ok
test tests::prop_output_length … ok
test tests::prop_tamper_detection … ok
test tests::prop_non_determinism … ok
test tests::test_encrypt_block_differences … ok
test tests::test_encrypt_decrypt_happy_path … ok
test tests::test_encrypt_different_ciphertexts … ok
test tests::prop_truncated_message … ok
test tests::test_field_order_tampering … ok
test tests::test_iv_padding_mask_space … ok
test tests::prop_every_bit_tampering_bounded … ok
test tests::test_exhaust_iv_b_last4 … ok
test tests::test_exhaust_iv_a_first4 … ok
test tests::test_exhaust_iv_b_first4 … ok
Here’s a pretty simple example:
fn prop_tamper_detection(plaintext: Vec<u8>, aad: Vec<u8>) -> bool {
test_auth_ranges(|auth_range| {
let mut encrypted = custom_encrypt(&BASE_KEY, &plaintext, &aad, auth_range);
if encrypted.is_empty() { return true; }
let idx = encrypted.len() / 2;
encrypted[idx] ^= 0x01;
custom_decrypt(&BASE_KEY, &encrypted, &aad, auth_range).is_err()
})
}
test_auth_ranges just tests the default (first 2 blocks, last 2 blocks) and total (entire ciphertext).
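For reference, my guess at the shape of that helper (AuthCiphertextRange is the post's type; this sketch just runs the property under both ranges and requires both to pass):

fn test_auth_ranges(check: impl Fn(AuthCiphertextRange) -> bool) -> bool {
    // Run the same property under the "first/last blocks" default and the full range.
    [AuthCiphertextRange::default(), AuthCiphertextRange::total()]
        .into_iter()
        .all(|auth_range| check(auth_range))
}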
Here’s another:
fn prop_key_tampering(
mask: u8,
plaintext: super::tests::BoundedVec<72>,
aad: super::tests::BoundedVec<72>
) -> bool {
test_auth_ranges(|auth_range| {
if mask == 0 { return true; }
let plaintext = plaintext.0.clone();
let aad = aad.0.clone();
let encrypted = custom_encrypt(&BASE_KEY, &plaintext, &aad, auth_range);
let mut tampered_key = [0u8; 32];
for i in 0..BASE_KEY.len() {
tampered_key[i] = BASE_KEY[i] ^ mask;
}
custom_decrypt(&tampered_key, &encrypted, &aad, auth_range).is_err()
})
}
And another:
#[test]
fn test_exhaust_iv_b_last4() {
let plaintext = b"secret message";
let aad = b"additional data";
for &auth_range in &[AuthCiphertextRange::total(), AuthCiphertextRange::default()] {
let encrypted = custom_encrypt(&BASE_KEY, plaintext, aad, auth_range);
let aad_len = aad.len();
let total_len = encrypted.len();
let encrypted_ivs_len = 32;
let ciphertext_len = total_len - (aad_len + encrypted_ivs_len + 32);
let iv_b_last4_start = ciphertext_len + aad_len + 16 + 8;
for candidate in 0..1048576u32 {
let candidate_bytes = candidate.to_be_bytes();
let mut tampered = encrypted.clone();
tampered[iv_b_last4_start..iv_b_last4_start + 4].copy_from_slice(&candidate_bytes);
if custom_decrypt(&BASE_KEY, &tampered, aad, auth_range).is_ok() {
panic!(
"Decryption succeeded with a tampered iv_b last 4 bytes candidate: {:?}",
candidate_bytes
);
}
}
}
}
I was sort of throwing spaghetti and seeing what I could even feasibly test. I ended up with 228 lines of code to drive the crypto and 340 lines of tests. Most of them attempt to generate encrypted data using randomized plaintext/aads (via proptest) and then use some sort of masking/ bit flipping approach to brute force the bit space of various components as much as possible and ensure that decryption fails.
Part of the problem with testing cryptography is that the output is a random blob and it’s very hard to know if it’s a good blob or a bad blob. Notably, these tests don’t check for actual attack cases that I think may be viable.
Weaknesses
I think there’s probably a number of parts that are flat out broken. I have some ideas for attacks.
Given the auth optimization I had in place initially to only verify the first and last bytes of the ciphertext, is auth of ciphertext completely broken under that optimization? The answer is basically “yes”, and it’s one of the main things I learned from this. Earlier I wrote that aes cbc “chains” each block, but this was a misunderstanding. In fact, with aes-256-cbc, if you flip a single bit in a block N, only N and N+1 are impacted - all subsequent blocks will be perfectly fine. N will be “garbage” and N+1 will have one bit flipped. This means that with my optimization the attacker can modify any block up to the one before the blocks covered by the auth tag.
This was a great thing to learn and I caught it through testing, which led to me reading up on CBC again and learning how this attack works, printing out garbled (and surprisingly ungarbled!) outputs, etc. I learned about PCBC, which incorporates the plaintext into the mixing process in order to propagate errors. I ended up using blake3 instead, although I left the code for “first/last bytes” as an option because I don’t really care to rip it out, it’s never being released anyways.
There’s virtually no effort on my part to avoid leaking information in the current code. I use constant time equality checks in some places, but even still, there’s a lot of unwrapping and stuff and I bet an attacker could leak information based on that. I don’t know if there’s anything fundamentally leaky about the algorithm though.
This code does nothing to lessen side channel attacks on AES. For all I know it makes them worse; I’ve given this zero thought, but I suspect it’s worse in a handwavy way (there’s “more” aes going on, which feels like it adds more places for a side channel attack).
The code juggles tons of non-standard invariants that I’m just not in a position to ensure hold true; there’s so much space for something to go wrong.
Learning
So this was a fun enough way to waste 2 hours. I learned a few things, confirmed a few others, and have some open questions that I’d like to sit with. I found it valuable to even just think through things like the silly optimization I did for auth, like what would an attack that takes advantage of that look like? I wonder if hiding IVs has any value if they’re already authenticated, and I wonder if they have more value when you consider my silly optimization. I wonder how using CTR instead of CBC would change things - cryptopals has this challenge, which feels verrrry relevant.
It does seem that hiding the IV was ultimately a very straightforward process; if there is value in it, it’s at least seemingly achievable.
UPDATE: I was notified of some interesting conversations here. https://groups.google.com/g/crypto-competitions/c/n5ECGwYr6Vk/m/bsEfPWqSAU4J https://eprint.iacr.org/2019/624
The justification at a glance is primarily for counter-based IVs, which leak information like how many messages have been previously encrypted. The email also makes some interesting remarks about minimalism.
end of update :)
Despite not rolling my own underlying encryption algorithm in the sense of replacing AES with my own mixing function, despite plenty of confidence in the underlying technologies individually like hmac-sha256, aes-256-cbc, etc, I won’t be releasing this code and I’ll never use it myself. I lack confidence in it because of how extremely custom it is - the number of properties this code assumes are upheld is just ridiculous, like yeah okay I can assume sha256 is collision resistant but I’m juggling that and 50 other things in the air combinatorially.
The “developer approach” is fundamentally limited. Thinking about cryptography in terms of properties is the best that I can do, but it’s insufficient because I don’t understand the underlying mechanism and mathematics, which means that while I’m capable of saying “collision resistant” I can’t understand why, I don’t know the math, I don’t know how that property is obtained, and therefore I am in no position to uphold that property. While I trust myself to use a high level library properly given lots of testing, that is truly the limit of what I, as a developer, even can do. Writing this post, writing out the code, made it all the more clear that my level of abstraction from cryptography is so much higher.
That said, I think there were some good things from this:
- I learned stuff. I really hardened my underlying knowledge of some of these algorithms and their mechanics. I had never even heard of PCBC before this project, it’s interesting.
- I found that writing tests caught an actual bug. It was because I had a unit test for tampering with the ciphertext that I realized my mistaken assumption regarding aes-256-cbc. It was kind of validating that this “developer approach” had some value. I realized that this was sort of obvious in retrospect as I did know that decryption for CBC was parallelizable, implying limited dependence.