A stack of New Year’s news posts about newly found vulnerabilities affecting all kinds of CPUs led to general panic. Programmers all over the world spent their holidays analysing the vulnerabilities and looking for ways to protect their systems from them. What vulnerabilities are we talking about?
Three were discovered: one was named Meltdown (CVE-2017-5754), while the other two are collectively referred to as Spectre (CVE-2017-5753 and CVE-2017-5715). To keep things simple, let's discuss just one of them: Meltdown.
Naturally, users with different privileges can use the same computer. And every user runs applications with various access permissions. Their data is processed by the same CPU(s) in such a way that an application can't access certain information if it is not permitted to do so. Until now, operating system and application vulnerabilities were known to let attackers elevate privileges in order to gain access to private data, but none of the previously discovered flaws pertained to a CPU.
Instead of accessing the physical memory directly, applications operate with virtual address spaces mapped to the physical memory. The ranges of virtual addresses for different applications do not overlap. A piece of hardware called the MMU (Memory Management Unit) enforces control over the address ranges.
It is somewhat similar to virtual machines that each use their own memory which is allocated to them when necessary. The hardware unit ensures that the data used by different processes doesn't overlap. And this appears to be a sound approach, but there’s a catch.
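To make the MMU's role more tangible, here is a minimal sketch in Python. It is a toy model, not how real hardware is built: the class `ToyMMU`, the 4,096-byte page size and the page numbers are all assumptions made up for illustration. Each process gets its own table that maps virtual pages to physical memory, and any address without a mapping is refused.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class ToyMMU:
    """Toy model: one page table per process (hypothetical, for illustration only)."""

    def __init__(self):
        # process id -> {virtual page number: physical frame number}
        self.page_tables = {}

    def map_page(self, pid, virtual_page, physical_frame):
        self.page_tables.setdefault(pid, {})[virtual_page] = physical_frame

    def translate(self, pid, virtual_address):
        """Turn a virtual address into a physical one, or refuse the access."""
        page, offset = divmod(virtual_address, PAGE_SIZE)
        table = self.page_tables.get(pid, {})
        if page not in table:
            raise PermissionError(f"process {pid}: address {virtual_address} is not mapped")
        return table[page] * PAGE_SIZE + offset


mmu = ToyMMU()
mmu.map_page(pid=1, virtual_page=0, physical_frame=7)
mmu.map_page(pid=2, virtual_page=5, physical_frame=3)

print(mmu.translate(1, 100))                  # 7 * 4096 + 100 = 28772
print(mmu.translate(2, 5 * PAGE_SIZE + 100))  # 3 * 4096 + 100 = 12388

try:
    mmu.translate(1, 5 * PAGE_SIZE)  # process 1 has no mapping for page 5
except PermissionError as error:
    print("refused:", error)
```

The important detail for the rest of the story is that `translate` has to finish before anyone knows whether the access is allowed, and that check takes time.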
Imagine you're walking down a street. To go from point A to point B, you need to take N steps. So you perform the same sequence of actions, step by step. And one can predict that if you’ve moved from point A1 to point A2, you are very likely to take similar steps from A2 to A3 and from A3 to A4 and so on. But if you could perform all those operations in one step, you would get to point B in an instant!
And this is how modern CPUs work. Processing each step of an application takes time, and most programs' actions are meant to be performed in sequence rather than simultaneously. But modern processors can carry out multiple tasks concurrently. If a program's code were processed strictly step by step, it would take longer and some CPU capacity would remain unused. That's why executable code is divided into parts that are processed simultaneously: this includes code that needs to be executed right now as well as code that is meant to be executed later. In technical terms, the techniques responsible for this feature are referred to as speculative execution and branch prediction.
These techniques improve overall system performance, provided other system components, including RAM, are much slower than the CPU. If, in order to execute a certain instruction, the CPU must await data input, instead of standing by, it will execute the code that follows the instruction in accordance with its predictions about the data it is about to receive.
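How does a CPU "predict" anything? One classic textbook scheme is a two-bit saturating counter, sketched below in Python. This is only an illustration of the idea: real predictors are far more elaborate, and the names `TwoBitPredictor` and `count_mispredictions` are invented for this example. The sketch feeds the predictor a loop branch that is taken 999 times and then not taken once, when the loop exits.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict 'not taken', 2-3 predict 'taken'."""

    def __init__(self):
        self.state = 0  # start at "strongly not taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, staying within 0..3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes):
    """Run the predictor over a sequence of branch outcomes and count wrong guesses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch: taken 999 times, then not taken when the loop exits.
outcomes = [True] * 999 + [False]
print(count_mispredictions(outcomes))  # 3
```

The predictor is wrong twice while it warms up and once more at the very end. That last wrong guess is the interesting one: the CPU has already started executing another iteration that the program never actually asked for.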
And this is when the first alarm bell goes off. Because the CPU has no clue as to what data the process will be allowed to access when that future step is actually reached, it assumes that any and all data will be accessible, including information the process is not supposed to access. Here is another simple example: a program declares a 200-element data array and spends a long time performing similar actions on its elements. The CPU says: “Aha!” and starts performing the actions beforehand. And then the code requests access to data outside the array (in some programming languages, reading past the bounds of a data structure is not treated as an error). And the CPU will eagerly request data the program is not supposed to use.
What will happen if the requested address (let's call it X) lies not just outside the data array but even outside the virtual address range allocated to the process? The CPU will execute the instruction anyway. This happens because the MMU, which is responsible for determining whether a process is allowed to access address X, also needs time to do its job. That's why in the case of speculative execution, the MMU is treated in exactly the same way as external buses: execution commences before the MMU replies as to whether or not the access is permitted.
Should the MMU report that the access is not allowed, the result of the execution will merely be discarded.
The same happens when a prediction turns out to be false and the actual data input for the executed instruction is not what was expected. No problem: the execution output is discarded and the instruction is executed again. The data the program is not supposed to use is not passed to the process. And everything would be perfectly secure if it weren't for another performance optimisation technique: the cache.
All processed data ends up in the cache. And the cache doesn't discriminate between the different chunks of data it stores. It doesn't care how and why the data was received, whether it came from speculative execution or from instructions executed in sequence, or whether it is correct or not. Once data has been received, it is stored.
This means that inappropriate output persists in the cache even after it has been determined that an illegal address was used. To exploit this, an attacker would need to get hold of the data in the cache, and a program can't access the cache directly. However, the response time can help determine whether or not the CPU accessed a memory area beforehand. If the response time is shorter than average, it probably did.
Programs operate with data and addresses. Addresses point to data locations. If data has been accessed before, finding it again will take less time.
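The timing difference can be shown with a few lines of Python. This is a toy model under stated assumptions: `ToyCache` is a hypothetical class, and the latencies of 4 and 200 "cycles" are round numbers picked for illustration, not measurements of any real CPU.

```python
HIT_CYCLES = 4     # assumed cost of reading an address that is already cached
MISS_CYCLES = 200  # assumed cost of fetching the data from RAM


class ToyCache:
    """Toy cache that only remembers which addresses have been read before."""

    def __init__(self):
        self.lines = set()

    def read(self, address):
        """Return the simulated access time and remember the address."""
        if address in self.lines:
            return HIT_CYCLES
        self.lines.add(address)
        return MISS_CYCLES


cache = ToyCache()
first = cache.read(98)   # never seen before: slow
second = cache.read(98)  # cached by the previous read: fast
print(first, second)     # 200 4
```

A program never sees the cache contents, but it can see the clock. A fast read gives away the fact that someone touched that address recently.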
Let's assume that a process can access addresses within the range 0...9,999, while the kernel's memory space is located within the range 10,000...20,000; in our case it does not matter whether the numbers represent actual address ranges. We arrange our code in such a way that a certain instruction will be processed in speculative execution mode (e.g., the thousandth iteration of a loop whose 999 previous iterations were executed without errors), and that instruction will require access to data whose address matches the value stored at address 15,000.
- As part of speculative execution, the CPU will retrieve data at address 15,000. Let's assume the value stored is 98 (performed in advance, the instruction requests data the process is not supposed to access; let’s remember that the process can access addresses within the range 0…9,999).
- The CPU reads the data at address 98 (the value located at address 15,000).
- The MMU replies that address 15,000 is invalid (remember: the instruction is executed before the unit determines whether the data is accessible).
- The CPU flushes the pipeline, and instead of the value at address 98 gives us an error.
- Our application starts reading data at addresses from 0 and higher within its allocated memory space and measures the time it takes to access each address. The addresses aren't accessed in sequence, so that the CPU doesn't start fetching them in advance and distort the measurements.
- And suddenly it turns out that it’s several times quicker to access address 98 than it is to read the data at other addresses.
That’s how we know someone has recently read data at this address, and because of that the data has been cached. Who could have done that? Yes, that's right, it was our beloved CPU. So the value stored at address 15,000 is 98.
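The steps above can be played out in a small Python simulation. To be clear, this is a toy model and not an exploit: the class `ToyCPU`, its methods and the cycle counts are invented for illustration, and the addresses are the simplified ones from our example (a real attack deals with cache lines and pages rather than single addresses).

```python
import random

HIT_CYCLES = 4       # assumed latency of a cached read
MISS_CYCLES = 200    # assumed latency of a read from RAM
USER_LIMIT = 10_000  # the process may access addresses 0..9,999


class ToyCPU:
    """Hypothetical CPU model with a cache and a late permission check."""

    def __init__(self, memory):
        self.memory = memory  # address -> value
        self.cached = set()   # addresses currently held in the cache

    def timed_read(self, address):
        """Ordinary, permitted read: returns (value, simulated cycles)."""
        if address >= USER_LIMIT:
            raise PermissionError(address)
        cycles = HIT_CYCLES if address in self.cached else MISS_CYCLES
        self.cached.add(address)
        return self.memory.get(address, 0), cycles

    def speculative_indirect_read(self, pointer_address):
        """Reads memory[memory[pointer_address]] before the MMU has answered."""
        secret = self.memory.get(pointer_address, 0)  # step 1: fetch the value
        self.cached.add(secret)                       # step 2: read at that address, which caches it
        if pointer_address >= USER_LIMIT:             # step 3: the MMU replies
            raise PermissionError(pointer_address)    # step 4: result discarded, cache left as is
        return self.memory.get(secret, 0)


def recover_value(cpu, kernel_address, candidates):
    """Trigger the forbidden read, then time reads of our own addresses."""
    try:
        cpu.speculative_indirect_read(kernel_address)
    except PermissionError:
        pass  # all the program officially gets is an error
    order = list(candidates)
    random.shuffle(order)  # step 5: probe in non-sequential order
    timings = {address: cpu.timed_read(address)[1] for address in order}
    return min(timings, key=timings.get)  # step 6: the fastest address


cpu = ToyCPU({15_000: 98})
print(recover_value(cpu, 15_000, range(256)))  # 98
```

The program never receives the value at address 15,000 directly. It only learns which of its own addresses became fast to read, and that is enough.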
Repeating this for one address after another, we can read the kernel’s entire memory, into which the entire physical memory is mapped in modern operating systems.
Which CPUs are affected by Meltdown?
At least all Intel Core CPUs as well as Xeon, Celeron and Core-series Pentium CPUs.
ARM Cortex-A75 chips are also vulnerable, but those aren't yet featured in any consumer devices. Meanwhile Qualcomm Snapdragon 845, featuring Kryo 385 and Cortex-A75 and Cortex-A53, was only announced a month ago. Kryo 385 is likely to be vulnerable to Meltdown too.
According to Apple, all devices running iOS are affected. Apparently, this doesn't include all the CPUs ever used in iPhones and iPads (for example, iPhone 4 uses Cortex-A8), but ARM chips in modern iPhones and iPads can be regarded as vulnerable.
The situation sounds really scary, but a few things are worth considering.
- The vulnerabilities discovered don't let attackers gain remote access to a device. To exploit the vulnerability, the malware must first be downloaded into the system. And that means that anti-viruses and policies preventing users from installing applications are our best friends when it comes to protecting our CPUs.
Meltdown patches move the kernel's memory into a separate address range, so memory protection no longer relies on the hardware alone: software now also controls how those addresses are accessed. Because this control is implemented in software, applications that make system calls suffer performance losses. But it's not as bad as it sounds. Ordinary applications make comparatively few system calls. The information we've gathered so far indicates that games remain mostly unaffected, so fear not and install the patch. Meanwhile, servers can be isolated to maintain security. If no new files appear on a server, the vulnerabilities can’t be exploited. But it is still recommended that you install the patch.
Indeed, synthetic benchmarks can show up to a 30% drop in performance, but in real life the impact won't be that significant even if system calls are used extensively. You will also need to take into account network latency and other factors affecting overall system operation. The portion of servers in large-scale data centers (such as GCP, AWS and FB data centers) that are vulnerable is not very substantial. As a result, the overall decline in performance will be less than 10%. Customers who have large quantities of servers suffering performance degradation will be affected more than others, but I don't believe that their number will be significant.
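A rough back-of-the-envelope calculation in Python shows why a benchmark figure doesn't translate directly into real-life losses. It rests on a deliberately simple assumption: only the part of the run time spent on system calls gets slower, and everything else is unaffected. The function name and the workload shares are hypothetical, and the 30% figure is used here simply as the worst case mentioned above.

```python
def overall_slowdown(kernel_fraction, kernel_slowdown):
    """Relative increase in total run time when only the kernel-bound part gets slower."""
    return kernel_fraction * kernel_slowdown


# Hypothetical workloads: share of run time spent on system calls.
for fraction in (0.05, 0.20, 1.00):
    slowdown = overall_slowdown(fraction, 0.30)
    print(f"{fraction:.0%} kernel-bound -> {slowdown:.1%} slower")
```

Under these assumptions, a program that spends a fifth of its time on system calls ends up about 6% slower, and only a workload that does nothing else pays the full price.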