What is a hypervisor? What are the different hypervisor vulnerabilities?
A low-level program is required to provide system resource access to virtual machines, and this program is referred to as the hypervisor or Virtual Machine Monitor (VMM). A hypervisor running on bare metal is called a Type 1 (native) hypervisor. Examples of Type 1 Virtual Machine Monitors are LynxSecure, RTS Hypervisor, Oracle VM, Sun xVM Server, VirtualLogix VLX, VMware ESX and ESXi, and Wind River VxWorks, among others. The operating system loaded into a virtual machine is referred to as the guest operating system, and there is no constraint on running the same guest on multiple VMs on a physical system. Type 1 hypervisors have no host operating system because they are installed on a bare system.
Some hypervisors are installed on top of an operating system and are referred to as Type 2 (hosted) hypervisors. Examples of Type 2 Virtual Machine Monitors are Containers, KVM, Microsoft Hyper-V, Parallels Desktop for Mac, Wind River Simics, VMware Fusion, Virtual Server 2005 R2, Windows Virtual PC, and VMware Workstation 6.0 and Server, among others.
Virtual Machines (VMs) have become commonplace in modern computing, as they enable the execution of multiple isolated Operating System instances on a single physical machine. This increases resource utilization, makes administrative tasks easier, lowers overall power consumption, and enables users to obtain computing resources on demand. Virtualized environments are usually implemented with the use of a Hypervisor, which is a software layer that lies between the Virtual Machines (VMs) and the physical hardware. The Hypervisor allocates resources to the VMs, such as main memory and peripherals. It is in charge of providing each VM with the illusion of being run on its own hardware, which is done by exposing a set of virtual hardware devices (e.g. CPU, Memory, NIC, Storage) whose tasks are then scheduled on the actual physical hardware. These services come at a price: Hypervisors are large pieces of software, with 100,000 lines of code or more.
Hypervisor vulnerabilities can be classified in three ways, based on:
(1) the Hypervisor functionality where the vulnerability arises,
(2) the source that triggers such a vulnerability, and
(3) the target that is affected by the security breach.
The categories below follow the first classification, grouping vulnerabilities by the Hypervisor functionality in which they arise.
① Virtual CPUs: A set of virtual CPUs (vCPUs) is assigned to each guest VM being hosted by a Hypervisor. The state of each of these vCPUs is saved to and loaded from their respective VM's Virtual Machine Control Structure (VMCS) guest state area. Since vCPUs must mirror a physical CPU's actions for each and every machine language instruction, the Hypervisor must handle register states appropriately and schedule vCPU tasks to the physical CPUs while making any necessary translations back and forth.
CVE-2010-4525 is an example of a disclosure of Hypervisor memory contents through vCPU registers because of an incomplete initialization of the vCPU data structures, where one of the padding fields was not zeroed out. Given that the memory for the data structure is allocated in kernel space, the padding field might end up containing information from data structures previously used by the Hypervisor.
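A minimal C sketch of this vulnerability class (illustrative only, not KVM's actual code): a structure with compiler-inserted padding is handed back to the caller without being fully zeroed, so whatever bytes previously occupied that memory leak alongside the legitimate fields. The structure and function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical register-state structure; the compiler inserts 7 bytes of
 * padding between 'flag' and 'rip' to align the 64-bit field. */
struct vcpu_regs {
    unsigned char flag;
    /* 7 bytes of implicit padding live here */
    unsigned long long rip;
};

/* Vulnerable pattern: only the named fields are initialized, so the
 * padding bytes keep whatever the memory held before. */
void fill_regs_leaky(struct vcpu_regs *r) {
    r->flag = 1;
    r->rip  = 0xffffffff81000000ULL;
}

/* Fixed pattern: clear the whole structure first, which is in effect what
 * the CVE-2010-4525 fix did for the affected field. */
void fill_regs_safe(struct vcpu_regs *r) {
    memset(r, 0, sizeof(*r));
    r->flag = 1;
    r->rip  = 0xffffffff81000000ULL;
}

int main(void) {
    /* Simulate "previously used" memory by writing a secret, freeing it,
     * and reusing an allocation of the same size. */
    unsigned char *old = malloc(sizeof(struct vcpu_regs));
    memset(old, 0x41, sizeof(struct vcpu_regs));   /* stale secret bytes */
    free(old);

    struct vcpu_regs *r = malloc(sizeof(struct vcpu_regs));
    fill_regs_leaky(r);

    /* Dumping the raw bytes (as a guest reading vCPU state would) may show
     * the stale 0x41 bytes surviving in the padding region. */
    unsigned char *p = (unsigned char *)r;
    for (size_t i = 0; i < sizeof(*r); i++)
        printf("%02x ", p[i]);
    printf("\n");
    free(r);
    return 0;
}
```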
② Symmetric Multiprocessing (SMP): Hypervisors can host guest VMs with SMP capabilities, which leads to the possibility of two or more vCPUs belonging to a single VM being scheduled to the physical CPU cores in parallel. This mode of operation adds complexity to the management of guest VM state and requires additional precautions at the moment of deciding a vCPU's Current Privilege Level (CPL, e.g., Ring 0 or Ring 3).
SMP vulnerabilities arise from Hypervisor code making assumptions that only hold true for single-threaded processes. For example, CVE-2010-0419 refers to a bug that permitted malicious Ring 3 processes to execute privileged instructions when SMP was enabled, because of a race condition. To do so, they would invoke a legitimate I/O instruction on one thread and attempt to replace it with a privileged one from another thread right after KVM had checked its validity, but before it was executed.
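The underlying pattern is a time-of-check-to-time-of-use race. The sketch below is a generic illustration with POSIX threads (compile with -pthread), not KVM's emulator: one thread validates a shared "instruction" and then acts on it, while a second thread rewrites it in the window between the check and the use. All names are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Shared "instruction" standing in for the I/O opcode a guest vCPU
 * submitted for emulation. 0 = harmless I/O, 1 = privileged operation. */
static volatile int pending_insn = 0;

/* Thread 1: the flawed check-then-use sequence. */
static void *emulate(void *arg) {
    (void)arg;
    if (pending_insn == 0) {          /* time of check: looks legitimate */
        usleep(1000);                  /* window in which the race happens */
        /* time of use: the other vCPU thread may have swapped it by now */
        if (pending_insn == 1)
            printf("privileged instruction executed after validation!\n");
        else
            printf("harmless instruction executed\n");
    }
    return NULL;
}

/* Thread 2: the attacker's second vCPU, rewriting the instruction
 * right after the validity check. */
static void *swap_insn(void *arg) {
    (void)arg;
    usleep(100);
    pending_insn = 1;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, emulate, NULL);
    pthread_create(&t2, NULL, swap_insn, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* The fix is to copy the instruction into private memory once, then
     * validate and execute that private copy. */
    return 0;
}
```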
③ Soft MMU: Guest VMs cannot be granted direct access to the MMU, as that would allow them to access memory belonging to the Hypervisor and other co-hosted VMs. In the absence of a virtualization-aware hardware MMU, such as Extended Page Tables (EPT), a Soft MMU is run by the Hypervisor to maintain a shadow page table for each guest VM. Every page mapping modification invoked by a VM is intercepted by the Soft MMU so as to adjust the shadow page tables accordingly.
Vulnerabilities in the Soft MMU's implementation are dangerous because they may lead to the disclosure of data in arbitrary address spaces, such as a co-hosted guest VM's memory segment or the Hypervisor's memory segment. In the specific case of CVE-2010-0298, KVM's emulator always used the Ring 0 privilege level when accessing a guest VM's memory on behalf of the guest VM's code. Given that MMIO instructions are emulated, an unprivileged (Ring 3) application running inside a VM could leverage access to an MMIO region (e.g., a framebuffer) to trick KVM into executing a malicious instruction that modifies that same VM's kernel-space memory.
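A hedged sketch of the access-control flaw, using an invented guest model rather than KVM's real emulator: the vulnerable path performs every emulated write with supervisor rights, whereas the corrected path honours the current CPL before touching supervisor-only pages.

```c
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical guest context: current privilege level and a toy
 * "physical memory" split into user and kernel halves. */
struct guest {
    int  cpl;                 /* 0 = kernel, 3 = user */
    char mem[256];            /* bytes 0..127 user, 128..255 kernel */
};

static bool page_is_kernel(int addr) { return addr >= 128; }

/* Vulnerable pattern: the emulator always writes with supervisor rights. */
static void emu_write_vuln(struct guest *g, int addr, char val) {
    g->mem[addr] = val;       /* no CPL check at all */
}

/* Fixed pattern: refuse supervisor-only pages when emulating for Ring 3. */
static bool emu_write_safe(struct guest *g, int addr, char val) {
    if (g->cpl == 3 && page_is_kernel(addr))
        return false;         /* inject a fault instead of writing */
    g->mem[addr] = val;
    return true;
}

int main(void) {
    struct guest g = { .cpl = 3 };
    memset(g.mem, 0, sizeof(g.mem));

    emu_write_vuln(&g, 200, 0x41);   /* Ring 3 code corrupts "kernel" memory */
    printf("vulnerable write to kernel page stored: %d\n", g.mem[200]);

    bool ok = emu_write_safe(&g, 200, 0x42);
    printf("safe write allowed? %s\n", ok ? "yes" : "no");
    return 0;
}
```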
④ Interrupt and Timer Mechanisms: A Hypervisor must emulate the interrupt and timer mechanisms that the motherboard provides to a physical machine. These include the Programmable Interval Timer (PIT), the Advanced Programmable Interrupt Controller (APIC), and the Interrupt Request (IRQ) mechanisms.
In the case of CVE-2010-0309, a lack of validation of the data contained in the PIT-related data structures enabled a rogue VM to cause a full host OS crash, a serious denial-of-service attack.
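The following is a minimal sketch of the missing-validation pattern, with a made-up PIT state structure rather than KVM's: a channel index taken from guest-controlled data is used to index a fixed three-channel array, and the fix is simply to range-check it first.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy model of PIT state: three timer channels, as on real hardware.
 * The structure and field names are hypothetical. */
struct pit_channel { uint16_t count; uint8_t mode; };
struct pit_state   { struct pit_channel ch[3]; };

/* Vulnerable pattern: the channel number comes straight from a
 * guest-written I/O port value and is never range-checked. */
static uint16_t pit_read_vuln(struct pit_state *s, unsigned chan) {
    return s->ch[chan].count;            /* chan >= 3 reads out of bounds */
}

/* Fixed pattern: reject anything outside the three real channels. */
static int pit_read_safe(struct pit_state *s, unsigned chan, uint16_t *out) {
    if (chan >= 3)
        return -1;                       /* ignore or fault the access */
    *out = s->ch[chan].count;
    return 0;
}

int main(void) {
    struct pit_state s = { .ch = { {100, 2}, {200, 3}, {300, 0} } };
    uint16_t v;
    if (pit_read_safe(&s, 7, &v) != 0)
        printf("out-of-range PIT channel rejected\n");
    printf("channel 1 count: %u\n", (unsigned)pit_read_vuln(&s, 1));
    return 0;
}
```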
⑤ I/O and Networking: The Hypervisor also emulates I/O and networking. Xen and KVM make device emulation possible through division of labor, by having two types of device drivers. Front-end drivers reside inside the guest VMs and run in Ring 0, providing the usual abstraction that the guest OS expects. Nonetheless, those drivers cannot access physical hardware directly, given that the Hypervisor must mediate user accesses to shared resources. Therefore, front-end drivers communicate with back-end drivers, which have full access to the underlying hardware, in order to fulfill the requested operations. In turn, back-end drivers enforce access policies and multiplex the actual devices. KVM and Xen employ QEMU's back-end drivers by default.
Device emulation is usually implemented in higher-level languages (e.g., C and C++), so the data abstractions are richer but more dangerous when hijacked. Very elaborate attacks are enabled by the expressiveness of higher-level languages like C. For example, CVE-2011-1751 describes a bug that was used to develop the Virtunoid attack: QEMU tried to hot-unplug whichever device it was asked to, regardless of the device's support for hot-unplugging. Therefore, the lack of state cleanup by some virtual devices resulted in use-after-free opportunities, where data structures that were previously being used by a hot-unplugged virtual device remained in memory and could be hijacked with executable code by an attacker.
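A compact C illustration of the use-after-free class behind Virtunoid, with hypothetical names rather than QEMU's device model: a timer keeps a pointer to device state that is freed on hot-unplug, and because the timer is never cancelled, firing it later would call through memory the attacker may meanwhile have reallocated and filled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical emulated device with a callback, mimicking a timer that
 * outlives the device it belongs to. */
struct device {
    void (*on_timer)(struct device *d);
    char  name[16];
};

static void legit_timer(struct device *d) {
    printf("timer fired for %s\n", d->name);
}

/* One-entry "pending timer" list, standing in for the emulator's timer queue. */
static struct device *pending_timer;

int main(void) {
    struct device *d = malloc(sizeof(*d));
    d->on_timer = legit_timer;
    strcpy(d->name, "virt-nic");
    pending_timer = d;          /* timer registered */

    /* Hot-unplug: the device is freed, but the timer entry is not
     * cancelled -- this is the missing state cleanup. */
    free(d);

    /* An attacker-influenced allocation may now reuse the freed chunk,
     * overwriting the function pointer with a value of their choosing. */
    struct device *groomed = malloc(sizeof(*groomed));
    memset(groomed, 0x41, sizeof(*groomed));   /* stand-in for hijacked data */

    /* When the stale timer later fires, the emulator would call through the
     * (possibly corrupted) pointer. We only print it here instead of calling it. */
    printf("stale callback pointer now: %p\n",
           (void *)pending_timer->on_timer);

    free(groomed);
    return 0;
}
```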
⑥ Paravirtualized I/O: Paravirtualized VMs run modified guest kernels that are virtualization-aware and use special hypercall APIs to interact with the Hypervisor directly. Paravirtualization of I/O operations decreases the number of transitions between the guest VM and the Hypervisor, resulting in performance gains. This scenario requires special front-end and back-end drivers which are not necessarily developed by the same vendor as the one responsible for regular device emulation (e.g., QEMU).
Paravirtualized I/O vulnerabilities and emulated I/O vulnerabilities are very much alike. They are rooted in the interactions between front-end and back-end drivers, as well as those between back-end drivers and the outside world. For instance, CVE-2008-1943 describes a vulnerability in Xen that allowed paravirtualized front-end drivers to cause denial-of-service conditions and possibly execute arbitrary code with Dom0 privileges. This could be done by sending a malicious shared framebuffer descriptor to trick Xen into allocating an arbitrarily large internal buffer inside Dom0.
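A hedged sketch of this pattern (invented structures, not Xen's back end): a size derived from a guest-supplied framebuffer descriptor drives an allocation in the privileged domain, so the fix is to bound every guest-controlled field before allocating.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shared-ring descriptor that a paravirtualized front end
 * hands to the back-end driver. */
struct fb_desc {
    uint32_t width;
    uint32_t height;
    uint32_t bytes_per_pixel;
};

#define FB_MAX_BYTES (64u * 1024 * 1024)   /* back end's own sanity limit */

/* Vulnerable pattern: trust the guest's geometry and allocate whatever it
 * asks for inside the privileged domain. */
static void *map_fb_vuln(const struct fb_desc *d) {
    size_t bytes = (size_t)d->width * d->height * d->bytes_per_pixel;
    return malloc(bytes);                  /* guest controls the size */
}

/* Fixed pattern: bound every field and the product before allocating. */
static void *map_fb_safe(const struct fb_desc *d) {
    if (d->width == 0 || d->height == 0 || d->bytes_per_pixel == 0 ||
        d->width > 4096 || d->height > 4096 || d->bytes_per_pixel > 4)
        return NULL;
    size_t bytes = (size_t)d->width * d->height * d->bytes_per_pixel;
    if (bytes > FB_MAX_BYTES)
        return NULL;
    return malloc(bytes);
}

int main(void) {
    struct fb_desc evil = { 100000, 100000, 4 };   /* ~40 GB request */
    void *v = map_fb_vuln(&evil);
    printf("unchecked allocation %s\n", v ? "succeeded" : "failed");
    free(v);
    printf("checked allocation %s\n",
           map_fb_safe(&evil) ? "succeeded" : "rejected");
    return 0;
}
```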
⑦ VM Exits are the mechanism used by the Hypervisor to intercept and carry out operations invoked by guest VMs that require Virtual Machine eXtensions (VMX) root privileges. These VM-to-Hypervisor interfaces are architecture-dependent (e.g., different code for x86 than for AMD64) and are very well specified in the architecture manuals. They are usually implemented using low-level programming languages (Assembly or machine language), relying on restrictive bitwise operations. For Intel VT-x, this code is the one supporting all operations described in chapters 23 through 33 of Intel's Software Developer's Manual.
The fact that VM Exit-handling code does not possess very rich data structures means that vulnerabilities hardly have any exploitable effects other than a host or guest VM crash (denial of service). For example, all VMCS fields have a unique 32-bit field encoding, which rules out common vulnerabilities that arise from variable-size input, such as buffer overflows. According to CVE-2010-2938, requesting a full VMCS dump of a guest VM would cause the entire host to crash when running Xen on a CPU without Extended Page Table (EPT) functionality. The reason for this was that Xen would try to access EPT-related VMCS fields without first verifying hardware support for those fields, allowing privileged (Ring 0) guest VM applications to trigger a full denial-of-service attack on certain hosts at any time.
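A minimal sketch of the missing capability check, using invented helpers rather than Xen's code: the VMCS dump should confirm that the processor actually implements EPT before touching the EPT-related fields, since reading them on older hardware is what brought the host down.

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the host CPU's capabilities; on real hardware this would be
 * derived from the VMX capability MSRs. */
struct cpu_caps { bool has_ept; };

/* Stand-in for a VMREAD of the EPT pointer field. On a CPU without EPT the
 * real access fails, which is what crashed the host in CVE-2010-2938. */
static uint64_t vmcs_read_eptp(const struct cpu_caps *cpu, bool *ok) {
    if (!cpu->has_ept) {
        *ok = false;          /* would fault / VMfail on real hardware */
        return 0;
    }
    *ok = true;
    return 0x1234000ULL;      /* dummy EPT pointer value */
}

/* Dump routine with the check the fix introduced: skip fields the
 * processor does not support instead of reading them unconditionally. */
static void dump_vmcs(const struct cpu_caps *cpu) {
    if (cpu->has_ept) {
        bool ok;
        uint64_t eptp = vmcs_read_eptp(cpu, &ok);
        printf("EPT pointer: 0x%llx (valid=%d)\n",
               (unsigned long long)eptp, (int)ok);
    } else {
        printf("EPT not supported, field skipped\n");
    }
}

int main(void) {
    struct cpu_caps old_cpu = { .has_ept = false };
    struct cpu_caps new_cpu = { .has_ept = true };
    dump_vmcs(&old_cpu);
    dump_vmcs(&new_cpu);
    return 0;
}
```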
⑧ Hypercalls are analogous to system calls in the OS world. While VM Exits are architecture-specific (e.g., AMD64, x86), hypercalls are Hypervisor-specific (e.g., Xen, KVM) and provide a procedural interface through which guest VMs can request privileged actions from the Hypervisor. Hypercalls can be used to query CPU activity, manage hard disk partitions, and create virtual interrupts.
Hypercall vulnerabilities can present an attacker, who controls a guest VM, with a way to attain escalated privileges over the host system's resources. Case in point, CVE-2009-3290 mentions the fact that KVM used to allow unprivileged (Ring 3) guest callers to issue MMU hypercalls. Since the MMU command structures must be passed as an argument to those hypercalls by their physical address, they only make sense when issued by a Ring 0 process. Having no access to the physical address space, the Ring 3 callers could still pass random addresses as arguments to the MMU hypercalls, which would either crash the guest VM or, in the worst case, read or write to kernel-space memory segments.
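Below is a hedged sketch of the kind of check the fix introduced, with a hypothetical dispatcher rather than KVM's actual code: the hypercall handler inspects the calling vCPU's CPL and rejects MMU hypercalls that do not originate from guest Ring 0.

```c
#include <stdio.h>
#include <stdint.h>

#define HC_MMU_OP   1          /* hypothetical hypercall number */
#define EPERM_GUEST -1

/* Minimal stand-in for the calling vCPU's state. */
struct vcpu { int cpl; };      /* 0 = guest kernel, 3 = guest userspace */

/* The privileged work itself: interprets 'arg' as the guest-physical
 * address of an MMU command structure (modeled as a plain value here). */
static int mmu_op(uint64_t arg) {
    printf("processing MMU command structure at GPA 0x%llx\n",
           (unsigned long long)arg);
    return 0;
}

/* Dispatcher with the privilege check: unprivileged guest code can no
 * longer reach the MMU hypercall. The pre-fix code went straight to mmu_op(). */
static int hypercall(struct vcpu *v, int nr, uint64_t arg) {
    switch (nr) {
    case HC_MMU_OP:
        if (v->cpl != 0)       /* reject callers outside guest Ring 0 */
            return EPERM_GUEST;
        return mmu_op(arg);
    default:
        return EPERM_GUEST;
    }
}

int main(void) {
    struct vcpu user_proc    = { .cpl = 3 };
    struct vcpu guest_kernel = { .cpl = 0 };

    printf("Ring 3 caller -> %d\n", hypercall(&user_proc, HC_MMU_OP, 0xdeadbeef));
    printf("Ring 0 caller -> %d\n", hypercall(&guest_kernel, HC_MMU_OP, 0x1000));
    return 0;
}
```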
⑨ VM Management functionalities make up the set of basic administrative operations that a Hypervisor must support. The configuration of guest VMs is expressed in terms of their assigned virtual devices, dedicated PCI devices, main memory quotas, virtual CPU topologies and priorities, etc. The Hypervisor must then be able to start, pause and stop VMs that are true to the configurations declared by the cloud provider. These tasks are initiated by Xen's Dom0 and KVM's libvirt toolkit.
Kernel images must be decompressed into memory and interpreted by the management domain when booting up a VM. CVE-2007-4993 indicates that Xen's bootloader for paravirtualized images used Python exec() statements to process the custom kernel's user-defined configuration file, leading to the possibility of executing arbitrary Python code inside Dom0. By adding a specially crafted line to the configuration file, a malicious user could trick Dom0 into issuing a command that would trigger the destruction of another co-hosted domain (substituting the victim domain's ID into the injected command).
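The flaw is an instance of evaluating untrusted configuration text instead of parsing it. As a rough C analogue (hypothetical code, not Xen's actual bootloader), the unsafe handler below hands a guest-controlled configuration value to the shell, while the safe handler treats it strictly as data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Unsafe pattern: builds a shell command out of a value read from the
 * guest's configuration file -- the moral equivalent of exec()'ing it.
 * A value like "foo; xm destroy <id>" would run the injected command. */
static void handle_entry_unsafe(const char *value) {
    char cmd[256];
    snprintf(cmd, sizeof(cmd), "echo default entry is %s", value);
    system(cmd);
}

/* Safe pattern: the value is only ever treated as data. */
static void handle_entry_safe(const char *value) {
    char entry[64];
    /* accept only a short token of benign characters, reject everything else */
    size_t n = strspn(value, "abcdefghijklmnopqrstuvwxyz"
                             "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-");
    if (n == 0 || n != strlen(value) || n >= sizeof(entry)) {
        fprintf(stderr, "rejected suspicious config value\n");
        return;
    }
    memcpy(entry, value, n + 1);
    printf("default entry is %s\n", entry);
}

int main(void) {
    const char *malicious = "linux; echo injected-command-ran-as-Dom0";
    handle_entry_unsafe(malicious);   /* the injected command executes */
    handle_entry_safe(malicious);     /* the same input is rejected */
    return 0;
}
```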
⑩ Remote Management Software: Hypervisors are usually bundled with management applications for administering the virtualized environment. Their purpose is generally to facilitate the Hypervisor's administration through user-friendly web interfaces and network-facing virtual consoles.
Vulnerabilities in these bundled applications can be exploited from anywhere and can lead to full control over the virtualized environment. For example, CVE-2008-3253 describes a Cross-Site Scripting attack on a remote administration console that exposed all of Xen's VM management actions to a remote attacker after stealing a victim's authentication cookies.
⑪ Hypervisor Add-ons: Hypervisors like Xen and KVM have modular designs that enable extensions to their basic functionalities – Hypervisor Add-ons. For example, the National Security Agency (NSA) has developed their own version of Xen's Security Modules (XSM) called FLASK.
Hypervisor add-ons increase the likelihood of Hypervisor vulnerabilities being present, since they increase the size of the Hypervisor’s codebase. For example, CVE-2008-3687 describes a heap overflow opportunity in one of Xen’s optional security modules, FLASK, which results in an escape from an unprivileged domain directly to the Hypervisor.
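A generic C sketch of the heap-overflow class (invented structures, not the actual FLASK code): a count taken from an unprivileged domain's request drives a copy into a fixed-size heap buffer, and the fix is to check it against the buffer's capacity before copying.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define BOOLMAP_SLOTS 8        /* hypothetical fixed-size policy table */

/* Vulnerable pattern: the count comes from the guest's request and is
 * trusted, so a large value writes past the end of the allocation. */
static void load_booleans_vuln(const uint32_t *vals, uint32_t count) {
    uint32_t *map = malloc(BOOLMAP_SLOTS * sizeof(uint32_t));
    memcpy(map, vals, (size_t)count * sizeof(uint32_t)); /* overflow if count > 8 */
    free(map);
}

/* Fixed pattern: reject requests larger than the table. */
static int load_booleans_safe(const uint32_t *vals, uint32_t count) {
    if (count > BOOLMAP_SLOTS)
        return -1;
    uint32_t *map = malloc(BOOLMAP_SLOTS * sizeof(uint32_t));
    memcpy(map, vals, (size_t)count * sizeof(uint32_t));
    free(map);
    return 0;
}

int main(void) {
    uint32_t attacker_vals[64] = { 0 };
    /* A guest-chosen count of 64 would overrun the 8-slot buffer above. */
    printf("safe loader result for count=64: %d\n",
           load_booleans_safe(attacker_vals, 64));
    (void)load_booleans_vuln;  /* not called: it would corrupt the heap */
    return 0;
}
```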