SE 4472b

Midterm Study Guide

The following guide is presented as a convenience. Note: The midterm questions will be derived from topics covered in the lectures and assignments, the emphasis of which varies from year to year.

Security Notions

Concepts

Rules to live by:
- Kerkhoff’s principle
  - The security of an encryption scheme should be based on the secrecy of the key, not the secrecy of the encryption algorithm. More specifically: always assume the encryption scheme is publicly known.
- Don’t roll your own
  - Crypto algorithms and software implementations are really really ridiculously easy to get wrong in subtle ways that can easily turn into big vulnerabilities. It only takes one line of code (e.g., Apple’s goto fail, or Heartbleed) to mean the difference between secure and vulnerable.
- Don’t assume something has a certain form unless you check it
Brute-force attack
- Try every key and/or message until one “works”
Bits of security
- Exponential value describing how many operations (encryption, hashes, etc) are necessary to recover the message and/or key in a particular cryptosystem. E.g. 128 bits of security means the attacker has to do on the order of 2^128 operations to succeed
- If a system can be broken in approximately 2^b operations, we say it has b-bits of security.
Negligible quantity
- A value that is less than one over any polynomial function with degree less than or equal to the security parameter
Indistinguishability
- The probability that you can tell the difference between two things is less than a negligible quantity
Pseudo-random functions
- A random looking mapping of inputs to outputs
- A potentially many-to-one mapping
- Inverse does not necessarily exist
Pseudo-random permutations
- One-to-one mapping
- Image and pre-image sets are equivalent
- An unique inverse exists for every element
Oracles
- An abstract black box. Ask a question, get an answer
- Oracles considered:
  - Encryption Oracle
    - Here’s the encryption of your query
  - Decryption Oracle
    - Here’s the decryption of your query
  - Random Oracle
    - Here’s a fixed-length random value that I associate with this particular query
  - Padding Oracle
    - Yes or No: Your query is the encryption of a plaintext with valid padding

Attack Games

Attack games are a way to qualitatively addressing the following question: how much information can about a plaintext by seeing its associated ciphertext.

Features common to all games

Games involve two players A and B. The game begins with B generating an independent random secret encryption key (n.b., every time the game is played, B picks a fresh key). A chooses two messages m1 and m2 and sends them to B. B flips a coin and chooses one of the messages, mb, and encrypts it. This value c=Enc(mb) is called the challenge. Challenge c is returned to A. The game concludes with A guessing if b=1 or b=2, i.e., if c=Enc(m1) or c=Enc(m2). If A guesses correctly, she is said to win the game. The encryption scheme is said be indistinguishable if A has negligible advantage over a random guess of guessing b.

Game types / Security levels

Eavesdropping attack (EAV)
- A doesn’t get to make any queries.
Chosen plaintext attack (CPA)
- B behaves as an encryption oracle. That is, at any point in the game, B will encrypt any plaintext and return the ciphertext to A. A variant is called an adaptive chosen plaintext attack in which B continues to behave as an encryption oracle
Chosen ciphertext attack (CCA1)
- Prior to the challenge B will decrypt any ciphertext and return the plaintext to A
Adaptive chosem ciphertext attack (CCA2)
- B with decrypt any ciphertext and return the plaintext to A both before and after the challenge. The only limitation is that A cannot submit the challenge to be decrypted (otherwise the game is trivial).

Implications of Security Levels

IND-CCA2 implies IND-CCA1
IND-CCA1 implies IND-CPA
IND-CPA implies IND-EAV

How to achieve security levels (informal)

IND-CCA2
- It should not be possible to create a valid ciphertext without knowledge of a private/secret key
- Achieved in practice through the use of message authentication codes
IND-CPA
- Encrypting the same message twice should give two totally different looking ciphertexts
- Achieved in practice by using randomized encryption
IND-EAV
- You should have negligible advantage telling the difference between ciphertext, and random noise

Symmetric-key Primitives

Block Ciphers

Used for efficient bulk encryption of data. Encryption takes message (plaintext) and key and produces encryption (ciphertext). Decryption takes a ciphertext and key and produces a plaintext. One secret key used for both encryption and decryption.

Ideal functionality:
- Pseudo-random permutation
  - Imagine shuffling a deck of cards
  - For a given key, every message has a unique ciphertext, and vice versa
- Secret key determines the particular permutation
- Fixed length input maps to fixed-length output
Feistel Network
- A simple, provable way to take a pseudo-random function and turn it into a pseudo random permutation
Cipher modes of operation examined:
- Electronic Codebook Mode (ECB)
  - Not IND-EAV secure
- Cipher Block Chaining (CBC)
  - Requires an initialization vector (IV)
- Counter Mode (CTR)
  - A block cipher that acts like a stream cipher
  - Also requires an IV
  - Pros: random access, no decryption function needed
Initialization vectors (IVs)
- Works toward CPA security goal by making encryption of related messages different
- Public value transmitted with ciphertext, necessary for decryption
- Must be independently and randomly generated
- Must be unpredictable to adversary, otherwise encryption oracle attacks are possible
Block ciphers examined:
- AES
  - 128-bit block, 128-, 192-, and 256-bit key options

Sample Problems

Fortuna is a pseudo-random number generator based on AES in counter mode. Recall that CTR mode encrypts a successively incrementing counter value and xors it with a message. Suppose you don’t xor the key stream with any message, you just output it. This could form a useful pseudo-random number generator. This is a actually a pretty decent approach, but it’s not perfect. After a certain amount of output you can distinguish between Fortuna’s output and that of a perfectly random sequence. How much output would you need to expect to be able to do this?
- A: recall a block cipher is a pseudo-random permutation, i.e., every plaintext maps to a unique ciphertext. That means if you were using counter mode, you’d never see a ciphertext block repeat until the counter rolled over. AES has a 128-bit block, meaning you wouldn’t see a block repeat until 2^128 blocks were generated. A truly random sequence, however, does not have this property. Each block is randomly generated, and thus has a small probability of matching a previously generated block. So how many blocks would it take before you’d expect to see a repeat (i.e., where the probability was >=1/2)?

Hash Functions

Used for producing a “fingerprint” or “digest” of a message. Hashing accepts a message and produces a hash (doesn’t use a key in its basic form). Used for checking file integrity, storing passwords, and for making certain public-key operations more efficient.

Ideal functionality:
- Random oracle
  - Input: arbitrary length message
  - Output: fixed length string, l bits long where l is the output length
  - Given an input message m, the oracle flips l coins and associates m with the result in a giant lookup table
  - Every time you input m again, it gives you back the same l-bit result
  - Cannot exist in practice (requires infinite memory)
  - Differs from a real hash function in that the hashes of two messages m and m’ are chosen by completely independent coin tosses, whereas a real hash function generates the hashes using a deterministic (though highly non-linear) function.
Terms:
- Pre-image: input to hash function (message)
- Image: output of hash function (sometimes called message digest, fingerprint or simply “hash”)
- Hash length: the length l (in bits) of the hash
- Collision: two pre-images hash to the same value. Since pre-image space is unbounded, but there are 2^l images, collisions must exist due to the pigeon hole principle
Necessary properties:
- Pre-image resistance
  - Given a hash y, it should be hard to find an x s.t. h(x)=y
  - If hash function is indistinguishable from a random oracle, difficulty is 2^(l), i.e., l-bits of security
- Second pre-image resistance
  - Given a pre-image x, it should be hard to find a second pre-image y such that h(x)=h(y)
  - If hash function is indistinguishable from a random oracle, difficulty is 2^(l)
- Collision resistance
  - It should be hard to find any pair x,y such that h(x)=h(y)
  - If hash function is indistinguishable from a random oracle, difficulty should be 2^(l/2), i.e., (l/2)-bits of security, due to birthday paradox
Hash functions examined:
- MD5
  - l = 128
  - Is no longer indistinguishable from random oracle
  - NOT collision resistant
  - Collisions can be generated in much less than 2^(128/2)
  - Still technically pre-image resistant, but the best practice is to not use it at all anymore
- SHA1
  - l = 160
  - Collision resistant to 2^80
  - No longer meets NIST’s minimum 112-bit security level for collision resistance
  - Can still be used for pre-image resistance, but SHA-256 is the current minimum overall best practice
- SHA-256
  - l = 256
  - 128 bits of collision resistant

Sample Problems

Suppose there are two files f1 and f2 and suppose SHA1(f1) = SHA1(f2). Are these files identical?
- A: Possibly. If they were the same, they would definitely have the same hash value since hash functions are deterministic. If they are different, they still could possibly have the same hash (called a collision)
Suppose two files f1 and f2 and suppose SHA1(f1) != SHA1(f2). Are these files identical?
- A: No. As before, SHA1 is deterministic, meaning the same input always gives the same output. So if the outputs are different, its because the inputs are different

Cipher padding

Plaintext must be a multiple of the block length
Idea: Add padding so that plaintext is a multiple of block length
AES is a 16 byte block. For this cipher, plaintext must be a multiple of 16
Padding must be unambiguous: cannot be confused with plaintext and vice versa
PKCS7 is a standardized padding scheme: add n bytes of 0xn.
When using padding you always need to pad—even when the block length is a multiple of 16. In this case, you add an entire block of padding!

Sample Problems

Is 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 a valid AES plaintext block?
- A: No. It is 14 bytes, and AES requires a block length that is a multiple of 16.
What is the PKCS7 padded version of the previous plaintext?
- A: Message is 14 bytes, so add 16-14=2 bytes of padding, in this case, two bytes of 0x02: 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x02

Message Authentication Codes (MACs)

Used for verifying the integrity of data by associating a fixed-length value called a ‘tag’ with a given message. The tag is derived from a message and a secret key.

Tag creation
- Input: message m, secret key k
- Tag t = MAC_k(m)
- Output t
Tag verification
- Input: message m’, secret key k, tag t
- Tag t’ = MAC_k(m’)
- Output t’ == t
Ideal functionality:
- Like a keyed hash function
- Variable length input maps to fixed length output
MACs examined:
- HMAC, a MAC built from a hash function
- GHASH, the MAC portion of AES-GCM
  - Essentially a polynomial evaluated over a Galois field

Authenticated Encryption (AE)

A means of securely packaging a cipher with a MAC under one common interface. Simplifies (i.e., protects developers from themselves) by preventing the plaintext from being returned if the MAC tag was invalid. Uses the encrypt-then-mac strategy.

Authenticated Encryption:
- Accepts: message m, encryption key ke, mac key km
- c = Enc_ke(m)
- t = MAC_km(c)
- Output: (c,t)
Authenticated Decryption:
- Accepts: ciphertext c, tag t, decryption key ke, mac key km
- Check tag: t’ = MAC_km(c)
- If t’ != t
  - Output tag error
- Else:
  - Return Dec_ke(c)

Authenticated encryption function accepts a message, an encryption key, and a MAC key and produces a ciphertext and authenticator tag. Authenticated decryption accepts a ciphertext, an authenticator tag, a decryption key, and a MAC key. If the authenticator tag is valid, the function returns the plaintext, otherwise it returns an error condition.

Modes:
- Encrypt-then-MAC
  - MAC on ciphertext
  - Preferred choice
  - Used in AES/GCM
- MAC-then-Encrypt
  - Append MAC to plaintext before encryption
  - Used in TLS
  - Not optimal (subtle attacks possible in some cases)
- Encrypt-and-MAC
  - MAC plaintext, append to ciphertext
  - Used in SSH
Encryption, MAC keys, and IVs must be independently generated
Provides IND-CCA2 security Using encryption without a MAC is exploitable
- In the CCA2 game, the attacker can use the decryption oracle to indirectly determine m
- In practice a padding oracle attack can be used to bootstrap a decryption oracle to determine m
- A MAC scheme prevents an attacker from being successful, because they cannot produce another valid ciphertext without knowledge of the MAC key.
AE’s examined:
- AES/GCM
  - Encrypt-then-MAC
  - Encryption = AES in CTR mode. MAC = GHASH function, i.e., a MAC function that iteratively multiplies each ciphertext block by the MAC key in GF(2^128)

Sample Problems

Prove AES-CTR is not IND-CCA2 secure. Argue why AES-CTR becomes IND-CCA2 secure when the ciphertext is MAC’d.