Side-Channel Attacks and Chipwhisperer

A side-channel attack is any attack that is based on gaining information from the physical system that the process is being implemented on. Rather than exploit weaknesses in the algorithm itself or peculiarities in the code, like cryptanalysis and software bugs, these attacks use timing information, power consumption, electromagnetic leaks, or even sounds from the device as sources of information, which can then be exploited.

Some side-channel attacks require technical knowledge of the internal operation of the system. Others, such as differential power analysis, are effective as black-box attacks, meaning that the internal processes do not need to be well known or understood by the attacker. The rise of Web 2.0 applications and software-as-a-service has also raised the possibility of side-channel attacks on the web, even when transmissions between a web browser and server are encrypted (e.g. through HTTPS or WiFi encryption).

Attempts to break a cryptosystem by deceiving or coercing people with legitimate access are not side-channel attacks, these are usually social engineering attacks.

History

Though many people believed that side-channel attacks began a few years into the cold war era (1947 – 1991), a declassified NSA document shows that for the U.S. they began in 1943, during WWII. At the time Bell Labs was working on the Bell Telephone model 131-B2, a top secret encrypted teletype terminal used by the U.S. to transmit wartime communications. An engineer noticed that a freestanding oscilloscope on the other side of the lab would spike at the same time that the teletype encrypted a letter. It was then discovered that the spikes could be decrypted into the entire plain text message. Bell Telephone told the (U.S. Army) Signal Corps, who were using the teletype, about the vulnerability, but they were met with skepticism and disbelief it could be exploited under practical field conditions, and were told to prove it. So Bell engineers were placed in a building across the street (approx. 80 feet away) from the Signal Corps cryptocenter. The engineers recorded signals from the center for an hour. Three or four hours later, they produced 75% of the plain text that had been processed by the center during the recorded time. [1]

This information seems to have then been ‘forgotten’ by the U.S government until 1951. Other countries such as Russia however, appear to have also discovered similar vulnerabilities and are believed to have begun implementing protections against this technique by the beginning of the cold war.

Since then various agencies around the world have been documented using similar techniques:

  • In 1951, the CIA told the newly formed NSA that they had tested the Bell teletype machines and found they could read plain text from a quarter mile (400m) down the signal line.
  • In 1953, the Soviets planted not just the famous 40 microphones in the U.S.’s Moscow embassy, but also seeded mesh antenna into the concrete ceiling. The theorized purpose of which being to detect leaked energy pulses. This was not discovered by the U.S. until 1964. [2]
  • In the 1960s according to a former MI5 officer, the British Security Service analyzed emissions from French cipher equipment. [3]
  • In 1962, the Japanese, according to NSA documents, attempted to aim an antenna placed on top of a hospital at a U.S. crypto center. [1]
  • In the 1980s, the KGB was suspected of having planted microphones inside IBM Selectric typewriters to monitor the electrical noise generated as the type ball rotated and pitched to strike the paper; the characteristics of which could reveal which key was pressed. [4]
  • In 1985, computer researcher Wim van Eck published a paper explaining how cheap equipment could be used to pick up and redisplay information from a computer monitor, hundreds of meters away. This technique has since been coined as “Van Eck Phreaking

Since 1951, the U.S. has developed and refined the science of tuning into radio waves leaking from electronic equipment. The NSA also issued strict standards for shielding sensitive buildings and equipment. Those rules are now known to government agencies and defense contractors as TEMPEST, and they apply to everything from computer monitors to encrypted cell phones that handle classified information. The paper titled “TEMPEST: A Signal Problem” detailing the fundamental security vulnerability called “compromising emanations”, was written in 1972 and published in the NSA’s secret in-house journal “Cryptologic Spectrum”. The document was not declassified until 2008. The NSA did not declassify the entire paper however, leaving the description of two separate, but apparently related, types of attacks called “Flooding” and “Seismic”, redacted.

Classes of Side-Channel Attacks

In all cases, the underlying principle is that physical effects caused by the operation of a system can provide useful information about secrets in the system.

  • Cache attacks: based on an attacker’s ability to monitor cache accesses made by the victim in a shared physical system. Such as a virtualized environment or a type of cloud service.
  • Timing attacks: based on measuring how much time various computations take to perform.
  • Power-monitoring attacks: analyze the variance in power consumption by the hardware during computation.
  • Electromagnetic attacks: based on leaked electromagnetic radiation, which can directly reveal plaintexts and other information. These measurements can be used to infer cryptographic keys or they can be used in non-cryptographic attacks.
  • Acoustic cryptanalysis: attacks that exploit sound produced during a computation (similar to power analysis).
  • Fault analysis: secrets can be discovered by introducing faults in a computation.
  • Data remanence: in which sensitive data are read after supposedly having been deleted. (i.e. Cold boot attack)
  • Optical attacks: sensitive data can be read by visual recording using a high resolution camera or other devices that have such capabilities.

Cache Attacks

Cache attacks are based on having access to the cache behavior. In some cases this is paired with timing attacks because memory accesses are not always performed in constant time, and they can also be paired with power analysis as the use of cache memory has an impact on the power consumption. Usually the data contained within the cache cannot be seen directly and instead the areas of memory being accessed is all that the attacker can see. However there have been vulnerabilities that allowed an attacker to leak the memory contents of other processes and the operating system itself, like the meltdown and spectre vulnerabilities. [5]

An example of a typical cache attack is a flush + reload attack in which the attacker can flush a cached memory location, then wait and re-access the location. If the victim has accessed the location the re-access time for the attacker is fast, if the victim has not it is slow.

Timing Attacks

Timing attacks exploit the data-dependent behavioral characteristics of the implementation of an algorithm. Every logical operation performed by a computer takes some amount of time to execute, and this can vary based on the data being operated on, as well. Using precise measurements of the time required by each operation, an attacker can work backwards to recover the input. The attacker can get timing information by measuring something like the time it takes for the system to respond to specific queries. How much useful information about the system that the attacker can get will depend on the accuracy of the measurements, the design of the system, the CPU running the design, the algorithms used, whether any countermeasures have been implemented, and other implementation details. Due to the nature of these types of attacks they are usually only seen in research conditions and rarely (if ever) ‘in the wild’. As such avoidance of timing attacks is rarely considered when designing systems. Though timing attacks can still be important because they can be applied remotely through a network, while other side-channel leakages require the attacker to have gained some physical contact with the target. [6]

For Example:

In 2003, two Stanford University professors demonstrated a practical network-based timing attack on SSL-enabled web servers, based on another vulnerability having to do with the use of RSA with Chinese remainder theorem optimizations. The network distance was small in their experiments, but the attack successfully recovered a server private key within a matter of hours. This demonstration led to the widespread deployment and use of blinding techniques, which are mentioned in the countermeasures section, in SSL implementations. In this context, blinding is intended to remove correlations between the key and encryption time. [7]

Power-Monitoring Attacks

These types of attacks can provide detailed information about processes being run by observing the power consumption of a hardware device such as a CPU or cryptographic circuit. In a microprocessor when different instructions are performed the power consumption of the processor changes, creating a unique pattern of power profiles that reveal the operations that have been performed. As an example, in a power trace from a smart card (chip card) using DES encryption, the sixteen rounds of encryption can be clearly seen. Similarly, squaring and multiplication operations in RSA implementations can often be distinguished, enabling an attacker to compute the secret key. Even if the magnitude of the variations in power consumption are small, standard digital oscilloscopes can easily show the data-induced variations. Frequency filters and averaging functions, which are built into oscilloscopes, are often used to filter out high-frequency components. These attacks are roughly categorized into simple power analysis (SPA) and differential power analysis (DPA)

SPA involves visual examination of power traces usually collected by monitoring the current used by a device over time. Variations in power consumption occur as the device performs different operations.

DPA is a more advanced form of power analysis, which can allow an attacker to compute the intermediate values within cryptographic computations through statistical analysis of data collected from multiple cryptographic operations.

Power analysis attacks cannot generally be detected by a device, since the attacker’s monitoring is normally passive. In addition, the attack is non-invasive. As a result, physical enclosures, auditing capabilities, and attack detectors are ineffective. Instead, engineers must ensure that devices’ power variations do not reveal information usable by attackers.

Further detail will be provided in the section about the chipwhisperer tool.

Electromagnetic Attacks

These types of attacks require measuring the amount of electromagnetic radiation being emitted from a device and then performing signal analysis on those measurements. These attacks are both non-invasive and passive in nature, as they are performed by monitoring the normal behavior of the target. The techniques mentioned in the ‘History’ section of this document are types of electromagnetic attacks, and they are sometimes referred to as TEMPEST, Van Eck Phreaking, or radiation monitoring attacks. Every wire that carries current creates a magnetic field because of Ampere’s Law, and so electronic devices create some small magnetic fields when in use. The strength of these magnetic fields can unintentionally reveal information about the operations being performed on a device or the data being used by it. The term ‘device’ can refer to anything from a desktop computer, to mobile phone, to a smart card. Like power monitoring attacks these too can be broken up into simple analysis, where the encryption key or other information is deduced directly from the trace. Or differential analysis, which uses comparisons to deduce the information.

For Example:

Smart credit cards (chip cards) contain embedded integrated circuits the perform cryptographic functions when connected to a card reader that provides the power to the circuit and the clock. Tampering with the card reader or placing a false one over the real one has been proven an effective method of obtaining the information on the card. [8] By using an external USB sound card and an induction coil salvaged from a wireless charging pad, researchers were able to extract a user’s signing key in Android’s OpenSSL from a cellphone. [9] LCD screens are suceptable to Van Eck Phreaking because the amount of electricity passing through the crystals controls the colour that they display, when this happens electromagnetic radiation is released, and with an antenna and decoder you can create a replica of what someone has on their screen.

Acoustic Cryptanalysis

This type of attack exploits the sounds emitted by devices such as keyboards and internal computer components. In the past this has also included things like impact printers and electromechanical deciphering machines like enigma or bombe, though many different devices have been susceptible to this type of attack. Modern computers are often quite quiet to the human ear, but the microphone of a nearby cellphone can still pickup noise made by the processor. Some computer components make enough noise that they have even been used to make music, for instance when the Mars Curiosity Rover sang itself ‘Happy Birthday’ by running specific operations of it’s soil sampling instruments, or a youtuber who has recreated various songs from pop culture. Processors can create acoustic emissions when under stress. The power consumption of devices cause heating, which is then offset by cooling effects. These temperature changes create thermally induced mechanical stress, which can create low level acoustic emissions from operating CPUs (about 10 kHz in some cases). These emissions are not the human-audible humming of a cooling fan, but ultrasonic noise from the capacitors and inductors in the motherboards. Acoustic emissions occur in coils and capacitors because of small movements in the components when a current surge passes through them. Capacitors in particular change diameter slightly as their many layers experience electrostatic attraction/repulsion or piezoelectric effects.

For Example:

In 2004, researchers at the IBM Almaden Research Center announced that computer keyboards and keypads used on telephones and ATMs are vulnerable to acoustic attacks because of the sounds produced by different keys. The researchers employed a neural network to recognize which key had been pressed, and then they could record the information entered by users. These techniques could allow an attacker using covert listening devices to obtain passwords, passphrases, PINs, and other information entered via keyboards. [10] In 2013 Israeli researchers successfully implemented a timing attack based on acoustic emissions, on a laptop running an RSA implementation, using a mobile phone located close to the laptop. [11]

Fault analysis

The principle of this attack is to induce faults into a device’s operation, with the goal of revealing to the attacker the internal states. By inducing these faults the attacker is hoping that the processor may begin to output incorrect results, potentially due to physical data corruption, which could help them to deduce the instructions that the processor is running. Some examples of environmental conditions that can influence the operation of a CPU and cause these types of faults are high temperatures, unsupported supply voltages or currents, high overclocking, strong electromagnetic fields, or even radiation. Another way of introducing a fault is with a clock-glitch. A device with an external clock that can be manipulated could be given shorter clock pulses that cause the processor to execute the next instruction before the previous one was finished or to store invalid data in a memory location. [12]

Most of the countermeasures that have been proposed against this type of attack are error detection. For instance processors that limit or halt computations above certain detected temperatures.

For Example:

In 2002 it was shown that in implementations of DES and Triple DES, the last DES round key can be found after running less than 200 cipher texts by assuming that one bit of data in one of the 16 rounds is flipped with a uniform probability. [13] A similar attack was performed on the RC5 cipher by introducing register faults that affected the current round and then compared the faulty result with the correct one to obtain the round key. They found the round key of the final round first, then the round key of the second last round, and so on. [13]

Data Remanence

Data remanence is the residual representation of digital data that can remain even after attempts have been made to remove or erase said data. This may result from data being left intact by a nominal file deletion operation, by reformatting of storage media that does not remove data previously written to the media, or through physical properties of the storage media that allow previously written data to be recovered. Data remanence may make inadvertent disclosure of sensitive information possible should the storage media be released into an uncontrolled environment (e.g. thrown in the trash or lost), or if someone were to gain access to the sections where the data remnants are located.

Various techniques have been developed to counter data remanence. These techniques are classified as clearing, purging/sanitizing, or destruction. Specific methods include overwriting, degaussing, encryption, and media destruction. However, effective application of countermeasures can be complicated by several factors, including inaccessible media, media that cannot effectively be erased, advanced storage systems that maintain histories of data throughout the data’s life cycle, and persistence of data in memory that is typically considered volatile.

There are many tools that have been developed to recover data deleted by conventional means. The purpose of many of these is to recover accidentally deleted data, or return the data to a previous state after being corrupted.

Optical Attacks

The threat of optical attacks has increased recently due to the widespread use of surveillance cameras that often have little security. These could be used to try and see what’s on a computers monitor, but the LEDs on a computer can also tell a lot about what the device is doing, especially because they are usually easier to read than the monitor itself. There are three classes of LED used in today’s computer equipment. [14]

  • Unmodulated LEDs that indicate device state, such as power on.
  • Time modulated LEDs that indicate device activity levels.
  • Modulated LEDs that indicate the content of the data being processed.

This also opens a channel for exfiltrating data. That is, someone gains nominal access to the computer but is able to change the operation of an LED to transmit specific data. Storage drives have microprocessors embedded in their controllers, making them hackable, and their activity LEDs have demonstrated transmission speeds up to 4kbps using surveillance cameras as optical receivers. This is fast enough to handle encryption keys, keystroke logging, and text and binary files. Since drive lights flicker during normal operation and the human eye can only observe flickering up to about 60 Hz, users are unlikely to notice any additional flickering during data transmission. [15]

Chipwhisperer

ChipWhisperer is an open source tool for exploring how side-channel power analysis and glitching attacks work against a target device. It is designed for use against what is referred to as a target board, meaning a processor that can be programmed to perform some kind of ‘secure operation’. As a general research tool chipwhisperer was designed for breaking AES-128 encryption with power analysis, but it’s functions have since expanded to include other types of side-channel attacks like glitching. There are other similar types of projects available that can perform attacks like this, Side Channel Marvels has many tools, OpenSCA is a MATLAB tool-box, DPAContest contains many examples, and attackers can develop their own custom boards and software for these types of attacks.

AES

Advanced Encryption Standard (AES) also known as Rijndael was adopted in 2002 by the U.S. National Institute of Standards and Technology (NIST) after a 1997 competition for a more secure block cipher than the standard of the time. The cipher always uses 128 bit plaintext block sizes, but allows for three different key lengths: 128, 192, and 256 bits. The process of AES encryption involves multiple cycles of the same operations called rounds. The operations are ByteSub transformation, ShiftRow transformation, MixColumn, and Add Roundkey (ARK). It does these operations in order 10 times. There is a unique key for each round that is also used, and each roundkey is derived from the original 128 bit master key. Importantly even with this shortest key length, a computer operating at 10 GHz would take 7.97*1012 billion years.

Block Diagram of AES encryption

Implementations of algorithms such as AES and triple DES that are believed to be mathematically strong may be trivially breakable using power analysis attacks.

Power Analysis

To see how ChipWhisperer can perform a power analysis attack we’ll go through the steps of the attack. More information about correlation power analysis can be found at the link, and there is also a tutorial for performing the attack with ChipWhisperer on a target device.

Basic Ideas

Power analysis takes advantage of the fact that changing the state of a digital line requires a small amount of power. To turn 0 into 1 a charge needs to be moved. Digital integrated circuits also pre-charge lines between transitions in an effort to reduce delays. So every clock cycle the data bus goes from an intermediate state to a final state, the final state being when the data can be trusted. This means that you can infer the hamming weight (number of one’s) on a bus based on the power consumption.

Digital bus

For a simple example we can look at a system that performs XOR on some input data with a secret key and then keeps the result secret.

Diagram of system

Now we have control over the input data and can observe the power consumption. In the diagram you can see that we associate the power observations with each operation that was run. The diagram shows the known input XOR’d with the same unknown value each time to create an unknown output, and each of these operations have observations of the hamming weight associated with them.

What we have

Since the observations consist of the number of one’s that were required we can then guess what the secret key was and compare that guess to the real measurements, to figure out the real secret key.

From guessing

This guess and check works so well on complex encryption because it allows the attacker to break each byte of the encryption key separately. The AES-128 master key has 16 bytes, bytes are 8 bits, which are 256 possible values. The attacker needs to only make 256 guesses for each byte, or 256 * 16 = 4096 guesses. Unlike brute forcing the key which would take 2128 = 3.40*1038 guesses. This attack can also be done if the input is unknown and instead the output is known.

Correlation Power Analysis (CPA)

Following the above example a full correlation power analysis (CPA) attack can be used to find the secret key stored within a device. The CPA attack has 4 main steps.

  1. Write out a model for the target’s power consumption. The model needs to look at one specific point in the encryption algorithm, for example after step 2 of the encryption process. The result of the encryption at this intermediate step is x and so the power consumption is some function f(x).
  2. Have the target encrypt several different plaintext messages and record a trace of the power consumption during each encryption.
  3. Attack the small parts (sub-keys) of the secret key
    1. You need to consider every possible option for the sub-key values, then for each guess and plaintext create the intermediate value and use the model to calculate the estimate of the power consumption trace.
    2. Calculate the Pearson correlation coefficient between the modeled power trace and the actual recorded power consumption of the target for that plaintext.
    3. Determine which subkey guess creates the modeled power consumption traces that best match the measured power consumption.
  4. Put together the most closely correlated subkey guesses to obtain the full secret key.
Modeling the Power Consumption

Electronic Computers such as microcontrollers have two components to their power consumption.

  1. Static Power: the power required to keep the device running. This depends on the number of transistors inside the device and other factors, but is fairly constant.
  2. Dynamic Power: the power that is used depending on the data moving inside of the device. This depends on how many one’s or zero’s are required, and changes based on the instructions being executed and data being processed.

The simplest model for power consumption is the Hamming Distance model. The Hamming distance of two binary numbers is the number of different bits between the two numbers. Here the 4th, 7th, and 8th bits are different, so the hamming distance is 3.

HammingDistance(00110000, 00100011) = 3

The hamming distance relates to the previously mentioned hamming weight using the following relation where ^ is the xor operation.

HammingDistance(x, y) = HammingWeight(x ^ y)

This works because the hamming weight is the number of 1’s in a binary number, or the hamming distance between a number and the binary number 0.

The hamming distance model can be used for CPA attacks. If there is a point in the encryption algorithm where the target changes a variable from x to y then the power consumption is proportional to HammingDistance(x, y). Which is what will be shown in the AES example.

Pearson’s Correlation coefficient

With a model created there needs to be a way to compare the estimated power to the measured power traces. Pearson’s correlation coefficient is a measure of the linear correlation between two variables. The formula outputs a value between -1 and +1, +1 is total positive linear correlation, 0 is no linear correlation, and -1 is total negative correlation. The formula uses n as the number of pairs of traces, and it is important to remember that the each measured trace is compared to the paired estimated trace for every subkey guess. So the sums are over all of the measured traces and all of the estimated traces for a single guess.

Pearson Formula

This means that:

  • If Y always increases when X increases, correlation will be 1
  • If Y always decreases when X increases, correlation will be -1
  • If Y is totally independent of X, correlation will be 0.

This equation is used to pick out patterns from noisy signals. In this attack we’re looking for a our model calculated pattern in measured power traces, which are noisy signals.

Picking a Subkey Guess

The values of r that were calculated previously then are used to decide which subkey creates the power estimate that best matches the measured traces. There are two steps.

  1. For each subkey i, find the highest value of ri,j where j is a point in the trace. This means that we want the highest match, regardless of where in the trace the match occurs.
  2. For each of these values find the highest value of ri . Which will be the subkey that correlated more closely than any other guess.
AES-128 example

The psuedo code for an AES-128 algorithm is:

// AES-128 Cipher
// in:  128 bits (plaintext)
// out: 128 bits (ciphertext)
// w:   44 words, 32 bits each (expanded key)
state = in

AddRoundKey(state, w[0, Nb-1])  

for round = 1 step 1 to Nr–1
    SubBytes(state)           // <-- Attack this point in round 1
    ShiftRows(state)
    MixColumns(state)
    AddRoundKey(state, w[round*Nb, (round+1)*Nb-1])
end for

SubBytes(state)
ShiftRows(state)
AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])

out = state

Since we only need to worry about one point in the algorithm, we’ll choose the point after the first SubBytes() step. After this step the state of the encryption is basically:

// State after AddRoundKey() and SubBytes()
state = sbox[in ^ key]

In AES the SubBytes() step uses a lookup table called an s-box. This is a good point to choose because we know that the power consumption here will be

hd,i = HammingWeight( sbox[pd ^ i])

Where:

  • pd is the plaintext byte
  • i is the Subkey
  • d denotes which of the set of plaintext inputs
  • ^ is the xor operation

The rest of the attack follows the steps laid out above in the example that follows.

Breaking AES (Manual CPA Attack)

To perform the attack four things need to be done

  1. Reading the data, which consists of the analog waveform (trace) and input text sent to the encryption core
  2. Making the power leakage model, where it takes a known input text along with a guess of the key byte
  3. Implementing the correlation equation, and then looping through all the traces
  4. Ranking the output of the correlation equation to determine the most likely key
Collecting Traces

The exact details of this will vary depending on the target board that is being used. First the power rail needs to be found and connected to the observation device, in this case a chipwhisperer board. The board also needs to be connected to a computer running the software that controls it. Then as long as all of the components have been detected by the software and the target board is running you can set the number of traces that you want to capture, 50 traces can often be enough for AES targets.

Chipwhisperer has python integrated into the API, so much of the setup process can be scripted. Setting up the connection is just a matter of connecting everything and running it. Each trace will look something like graph on the right of the image.

Power Traces

Creating the Guesses

Chipwhisperer supports python so the code here will be python

#Set 16 to something lower (like 1) to only go through a single subkey
for binaryNumber in range(0, 16):
    cpaOutput = [0]*256
    for keyGuess in range(0, 256):
        print "Subkey %d, hyp = %02x"%(binaryNumber, keyGuess)

This code goes through and creates the guesses for the subkeys. Once we have a guess we need to calculate the xor of the guessed key and the plaintext byte, then calculate the intermediate value so that we can create the estimated power trace for the guessed subkey. The diagram shows the point in the AES round 1 that we wish to create the intermediate value.

AES sub Bytes

After the xor operation the s-box output value is simple as AES always uses the same s-box so we use our value to select the value from the substitute box to receive the intermediate value. Then we need to take the intermediate value and get the hamming weight. The full code is below.

import numpy as np

HW = [bin(n).count("1") for n in range(0,256)]

sbox=(
0x63,0x7c,0x77,0x7b,0xf2,0x6b,0x6f,0xc5,0x30,0x01,0x67,0x2b,0xfe,0xd7,0xab,0x76,
0xca,0x82,0xc9,0x7d,0xfa,0x59,0x47,0xf0,0xad,0xd4,0xa2,0xaf,0x9c,0xa4,0x72,0xc0,
0xb7,0xfd,0x93,0x26,0x36,0x3f,0xf7,0xcc,0x34,0xa5,0xe5,0xf1,0x71,0xd8,0x31,0x15,
0x04,0xc7,0x23,0xc3,0x18,0x96,0x05,0x9a,0x07,0x12,0x80,0xe2,0xeb,0x27,0xb2,0x75,
0x09,0x83,0x2c,0x1a,0x1b,0x6e,0x5a,0xa0,0x52,0x3b,0xd6,0xb3,0x29,0xe3,0x2f,0x84,
0x53,0xd1,0x00,0xed,0x20,0xfc,0xb1,0x5b,0x6a,0xcb,0xbe,0x39,0x4a,0x4c,0x58,0xcf,
0xd0,0xef,0xaa,0xfb,0x43,0x4d,0x33,0x85,0x45,0xf9,0x02,0x7f,0x50,0x3c,0x9f,0xa8,
0x51,0xa3,0x40,0x8f,0x92,0x9d,0x38,0xf5,0xbc,0xb6,0xda,0x21,0x10,0xff,0xf3,0xd2,
0xcd,0x0c,0x13,0xec,0x5f,0x97,0x44,0x17,0xc4,0xa7,0x7e,0x3d,0x64,0x5d,0x19,0x73,
0x60,0x81,0x4f,0xdc,0x22,0x2a,0x90,0x88,0x46,0xee,0xb8,0x14,0xde,0x5e,0x0b,0xdb,
0xe0,0x32,0x3a,0x0a,0x49,0x06,0x24,0x5c,0xc2,0xd3,0xac,0x62,0x91,0x95,0xe4,0x79,
0xe7,0xc8,0x37,0x6d,0x8d,0xd5,0x4e,0xa9,0x6c,0x56,0xf4,0xea,0x65,0x7a,0xae,0x08,
0xba,0x78,0x25,0x2e,0x1c,0xa6,0xb4,0xc6,0xe8,0xdd,0x74,0x1f,0x4b,0xbd,0x8b,0x8a,
0x70,0x3e,0xb5,0x66,0x48,0x03,0xf6,0x0e,0x61,0x35,0x57,0xb9,0x86,0xc1,0x1d,0x9e,
0xe1,0xf8,0x98,0x11,0x69,0xd9,0x8e,0x94,0x9b,0x1e,0x87,0xe9,0xce,0x55,0x28,0xdf,
0x8c,0xa1,0x89,0x0d,0xbf,0xe6,0x42,0x68,0x41,0x99,0x2d,0x0f,0xb0,0x54,0xbb,0x16)

def intermediate(pt, keyguess):
    return sbox[pt ^ keyguess]

traces = np.load(r'#location of trace data.npy')
pt = np.load(r'#location of plaintext data.npy')

numtraces = np.shape(traces)[0]
numpoint = np.shape(traces)[1]

#Use less than the maximum traces by setting numtraces to something
#numtraces = 15

for bnum in range(0, 16):
    cpaoutput = [0]*256
    for kguess in range(0, 256):
        print "Subkey %d, hyp = %02x"%(bnum, kguess)

        for tnum in range(0, numtraces):
            hypint = HW[intermediate(pt[tnum][bnum], kguess)]
Performing the Correlation

Another way of writing out the equation for the correlation coefficient is

Pearson Equation

Equation Variable Python Variable
d tnum
i kguess
j j index trace point, e.g.: traces[tnum][j]
h hypint
t traces

The equation has three sums all of which are done over all traces, so we can calculate these first in the code and initialize them to zero so we can use loops to fill the sum. We also need to save the hypothesis values with the current subkey guess and associated plaintext. The equation also requires the average of the hypothesis for that plaintext, and the mean for the traces. Some other common terms in the equation can be calculated together and then we can add to the python.

bestguess = [0]*16
#Set 16 to something lower (like 1) to only go through a single subkey
for bnum in range(0, 16):
    cpaoutput = [0]*256
    maxcpa = [0]*256
    for kguess in range(0, 256):
        print "Subkey %2d, hyp = %02x: "%(bnum, kguess),


        #Initialize arrays & variables to zero
        sumnum = np.zeros(numpoint)
        sumden1 = np.zeros(numpoint)
        sumden2 = np.zeros(numpoint)

        hyp = np.zeros(numtraces)
        for tnum in range(0, numtraces):
            hyp[tnum] = HW[intermediate(pt[tnum][bnum], kguess)]


        #Mean of hypothesis
        meanh = np.mean(hyp, dtype=np.float64)

        #Mean of all points in trace
        meant = np.mean(traces, axis=0, dtype=np.float64)

        #For each trace, do the following
        for tnum in range(0, numtraces):
            hdiff = (hyp[tnum] - meanh)
            tdiff = traces[tnum,:] - meant

            sumnum = sumnum + (hdiff*tdiff)
            sumden1 = sumden1 + hdiff*hdiff
            sumden2 = sumden2 + tdiff*tdiff

        cpaoutput[kguess] = sumnum / np.sqrt( sumden1 * sumden2 )
        maxcpa[kguess] = max(abs(cpaoutput[kguess]))

        print maxcpa[kguess]

    #Find maximum value of key
    bestguess[bnum] = np.argmax(maxcpa)

print "Best Key Guess: "
for b in bestguess: print "%02x "%b,

The maxcpa value is stored as an absolute value, since we may end up with positive or negative correlation, but we only care about absolute value. We also store only the maximum cpa across all points in the trace. Typically only a few points in the trace are correlating, and it’s the maximum across the entire trace we are concerned with. This is done with this line of the code:

maxcpa[kguess] = max(abs(cpaoutput[kguess]))

The argmax() function finds the maximum for all the subkey candidates, though it gives the index, since the candidates are {0,1,2,…,255} they are the same as their index.

So we have the key guesses ranked by their correlation to the true key, and thus have discovered the true key.

Glitching

Chipwhisperer can be used to introduce glitches into an embedded system that could reveal information about the system. The basic idea of clock glitching will be reviewed here.

Glitching takes advantage of points in the code in which a comparison between an input and the desired value will be made. An pseudocode example would be:

if(authok(inputpassword)) {
  send_loginPrompt();
}
else {
  send_badPassword();
}

It is possible to manipulate the system so that this check fails or instructions are skipped. If the attacker can gain access to the clock then they can manipulate the clock, as was done in the image below, where the signal rose, then fell quickly before rising again. The double-edge will cause timing errors in the target device. How the device will behave after receiving this signal will vary. The two common results are an instruction skip or the wrong result of a comparison being loaded.

Clock Glitch

If the attacker can’t access the clock, they may be able to access the input power of the device (VCC). A ‘power glitch’ is when a low-voltage spike is introduced into the VCC line of the target. This has been shown to work on raspberry pi computers and android smartphones. That type of glitch can look something like this.

Power Glitch

Countermeasures

Side-Channel attacks rely on a relationship between the emitted or leaked information and the secret data. So the counter method itself depends on the side-channel being used and the information that can be accessed through it. There are two main categories of countermeasures:

  1. Eliminating or reducing the amount of information that is being released, or
  2. Eliminating the relationship between the leaked information and the secret data.

All countermeasures have drawbacks that can affect the operation of the system, usually by slowing it down, as many require adding calculations or further processing to the system. So designers need to make compromises when considering the types of countermeasures to implement.

Countermeasures Method 1

Depending on the type of side channel attack that the device needs to be protected against there are different methods that can be used to reduce the amount of information that is leaked. For example:

  • Faraday Cage based shielding can be used to prevent electromagnetic emissions. This would reduce the susceptibility to electromagnetic attacks.
  • Power line conditioning and filtering can deter power-monitoring attacks. Though some correlation can remain and compromise security.
  • The design of the physical enclosure can prevent the installation of microphones for acoustic attacks, or other micro-monitoring devices.
  • The side-channel can also have random noise directly added to it. A random delay on computations could deter timing attacks, though most random noise is pseudo-random and so an attacker could still use statistical computations to work around this, though it would require them to obtain more measurements.
  • Designing software to be isochronous, that is, designing every operation to take the same length of time, would completely prevent timing attacks. Though it is difficult to implement.
  • A countermeasure against simple power-analysis attacks is to design the software as “program counter secure”, meaning that the execution path of the program does not depend on any secret values and so there are no differences in power from choosing one branch over another.
  • Using memory in a predictable fixed pattern can prevent cache attacks, as well as avoiding data controlled memory accesses like table lookups as the would reveal which part of the table was accessed and thus information about the data.
  • It is possible to avoid leaking data-dependent power differences by correlating the number of 1 bits in a secret value. By manipulating both the value and it’s complement at the same time the power spikes can be made to be uniform and thus reveal nothing about the data itself. Although unless the balance is perfect correlation will remain.

Countermeasures Method 2

Depending on the type of side channel attack that the device needs to be protected against there are different methods that can be used to remove the relationship between the secret data and the information. For example:

  • [Blinding](https://en.wikipedia.org/wiki/Blinding_(cryptography) is an algorithm that allows the operation to be performed on a randomized version of the data. Which ‘blinds’ the attacker since they have no control or knowledge over this randomized data. Usually it works like this, Alice has an input x and Bob has a function f. Alice wants Bob to compute y = f(x) without him knowing x or y to him. Alice “blinds” the message by encoding it into some other function E(x); the encoding E must be a bijection on the input space of f, ideally a random permutation. Bob returns f(E(x)) to her, to which she applies a decoding D to obtain D(f(E(x))) = y.
  • A countermeasure that is effective against all side-channel attacks is masking. The principle of masking is to avoid manipulating any sensitive value directly, but rather manipulate a sharing of it. A set of variables (called “shares”) that xor’d together equal the original value are used. So an attacker must recover all the values of the shares to get any meaningful information. [16]

References

[1] https://www.wired.com/2008/04/nsa-releases-se/

[2] https://www.nytimes.com/1964/05/20/archives/in-moscow-walls-have-ears40-us-embassy-finds-microphones-after.html

[3] https://www.sans.org/reading-room/whitepapers/privacy/paper/981

[4] http://content.time.com/time/magazine/article/0,9171,964052-2,00.html

[5] https://arstechnica.com/gadgets/2018/01/meltdown-and-spectre-heres-what-intel-apple-microsoft-others-are-doing-about-it/

[6] https://www.bearssl.org/constanttime.html

[7] http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf

[8] Quisquater JJ, Samyde D (2001). Electromagnetic analysis (ema): Measures and counter-measures for smart cards. Smart Card Programming and Security.

[9] http://mostconf.org/2012/papers/21.pdf

[10] https://www.berkeley.edu/news/media/releases/2005/09/14_key.shtml

[11] http://cs.tau.ac.il/~tromer/acoustic/

[12] https://www.esat.kuleuven.be/cosic/publications/article-2204.pdf

[13] https://ieeexplore.ieee.org/abstract/document/966796

[14] https://www.securityweek.com/hard-drive-led-allows-data-theft-air-gapped-pcs

[15] https://www.zdnet.com/article/stealing-data-from-air-gapped-computers/

[16] https://www.iacr.org/archive/eurocrypt2013/78810139/78810139.pdf