X86ISA - Introduction (2024)

A formal and executable model of the x86 Instruction Set Architecture.

These books contain the specification of x86 instruction set architecture (ISA); we characterize x86 machine instructions and model the instruction fetch, decode, and execute process using the ACL2 theorem-proving system. We use our x86 ISA specification to formally verify x86 machine-code programs.

These books include:

A formal, executable x86 ISA model (see machine sub-directory)
General theorems to aid in machine-code verification (see proofs/utilities sub-directory)
Examples of using these books to verify some programs (see proofs sub-directory)

For information on how to certify these books, see x86isa-build-instructions.

Completeness of the x86 Model

The utility of a formal ISA model for performing machine-code verification depends directly on the model's completeness (with respect to the ISA features specified), accuracy, and reasoning and execution efficiency. Intel software developer's manuals were used as specification documents, and ambiguities were resolved by cross-referencing AMD manuals and running tests on real machines to understand processor behavior.

The current focus of these books is Intel's IA-32e mode, which includes 64-bit and Compatibility sub-modes, and the 32-bit protected mode.

To see a list of opcodes implemented in these books, see implemented-opcodes.

Reasoning and Execution Efficiency

These books contain a formal x86 ISA model that is not only capable of reasoning about x86 machine-code programs, but also of simulating those programs. See program-execution for instructions on how to set up the x86isa model for executing a program.

Reasoning efficiency is desirable to make code proofs tractable and execution efficiency is desirable to enable faster model validation via co-simulations (see model-validation).

However, simple definitions that are suitable for reasoning typically offer poor execution performance and definitions optimized for execution efficiency often use a sufficiently different algorithm from the one used in the specification that they are difficult to reason about. However, ACL2 offers several features to mitigate this trade-off between reasoning and execution efficiency. For e.g.: mbe and its friends like mbt, and assert*, ACL2::stobj, and defabsstobj.

ACL2 features like abstract stobjs and assert* were a response by the ACL2 authors to provide mechanisms to mitigate the complexity of reasoning vs. execution efficiency that came up over the course of this project. This x86 ISA model uses abstract stobjs to layer the state of the x86 machine such that the lower layer (i.e., the concrete stobj) can be optimized for execution efficiency and the upper layer (i.e., the abstract stobj) can be optimized for reasoning efficiency, while a correspondence theorem is proved to hold between these two layers at all times. Therefore, we get the benefit of both fast execution and effective reasoning.

Modes of Operation

The complexity of the x86 ISA model will increase as more features are added to it, and this added complexity will make reasoning inevitably more involved. The issue of balancing verification effort and verification utility is a very pertinent one. For example, users might not want to reason about an application program at the level of physical memory, i.e., taking into account address translations and access rights management provided by the memory management data structures. This is because it is customary for application programs not to have direct access to the system data structures. The memory model seen by 64-bit application programs is that of linear memory, which is an OS-constructed abstraction of the complicated underlying memory management mechanisms like paging that are based on physical memory. Therefore, verification of application programs can be performed at the level of linear memory, if the OS routines that manage the linear memory abstraction can be either trusted or proved correct. However, the verification of system programs, like kernel routines, must necessarily be done at the level of physical memory since these programs can access/modify system data structures.

In light of the above, the x86 ISA model provides the option to deactivate some features of the ISA, enabling the user to do two kinds of analysis, depending on the kind of programs being considered for verification. Specifically, the model offers the following views of x86 machines:

Application-Level View:
In this view, the model attempts to provide the same environment for reasoning as is provided by an OS for programming. It allows the verification of an application program while assuming that memory management, I/O operations, and services offered via system calls are provided reliably by the underlying OS. The memory model here supports 64-bit linear addresses specified for IA-32e machines. A specification of system calls like read, write, open, and close is also included in this view.
System-level View:
This view includes the specification for IA-32e paging and segmentation; in particular, ISA-prescribed data structures for memory management and (partial) specifications of system-mode instructions like LLDT and LGDT are available in this mode. The memory model here characterizes a 52-bit physical address space, which is the largest physical address space provided by modern x86 implementations. This view is intended to be used to simulate and verify software that has supervisor privileges and interacts with I/O devices.

An added benefit of having these two separate views is the increased execution speed of programs in the application-level view; this is because executing these programs in this view does not require simulating both the physical address space (and hence, accesses/updates to the paging data structures) and the linear address space.

It would be beneficial, not to mention interesting, to verify whether the application-level view is an abstraction of the system-level view, given that the system data structures have been set up correctly. As of now, establishing this relationship between the two modes is out of the scope of this project.