x86 Translation Execution Steps

x86 translation processors run standard x86 code by translating it into internal RISC-like micro-instructions. These processors are much more complex because of the extra work of producing these micro-instructions and then managing them. In a way, they are like tiny multiprocessing computers within the CPU itself, complete with special logic to allocate tasks to the different execution units, much as a multiprocessing operating system allocates work when using more than one processor.
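As a rough illustration of what translating an instruction into micro-instructions can mean, the Python sketch below splits the register and memory forms of an x86 ADD into simpler load, operate and store steps. The micro-op names (LOAD, ADD, STORE), the temporary register "tmp" and the exact breakdown are hypothetical; real processors use their own internal formats, which differ from design to design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MicroOp:
    op: str       # hypothetical micro-op name
    dest: str     # destination register or memory operand
    srcs: tuple   # source operands

def translate_add(dest: str, src: str) -> list[MicroOp]:
    """Split 'add dest, src' into RISC-like micro-ops (illustrative only).

    Memory operands are written as '[reg]'; everything else is treated
    as a register or immediate.
    """
    def is_mem(operand: str) -> bool:
        return operand.startswith("[") and operand.endswith("]")

    if is_mem(dest) and is_mem(src):
        raise ValueError("x86 ADD cannot take two memory operands")
    if is_mem(dest):
        # Read-modify-write: load into a temporary, add, store back.
        return [
            MicroOp("LOAD", "tmp", (dest,)),
            MicroOp("ADD", "tmp", ("tmp", src)),
            MicroOp("STORE", dest, ("tmp",)),
        ]
    if is_mem(src):
        # Memory source: load into a temporary, then a register add.
        return [
            MicroOp("LOAD", "tmp", (src,)),
            MicroOp("ADD", dest, (dest, "tmp")),
        ]
    # Register (or immediate) source: a single micro-op is enough.
    return [MicroOp("ADD", dest, (dest, src))]

for args in [("eax", "ebx"), ("eax", "[ebx]"), ("[ebx]", "eax")]:
    uops = translate_add(*args)
    print(f"add {args[0]}, {args[1]} -> {[u.op for u in uops]}")
```

The point of the split is that each resulting micro-op does one simple job (access memory or do arithmetic), which is what lets the later steps schedule them onto separate execution units.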

The exact stages used in execution vary from processor to processor, with some designs using more, smaller steps than others. However, they all follow basically the same general path through the processor. These are the main steps; note that they are always pipelined:

Fetch: The first step is to bring the instruction into the processor so that it can be decoded and executed. Since memory is so slow compared to the processor, this stage doesn't involve a direct read from memory. Rather, special control circuitry loads larger blocks (16 or 32 bytes) of instruction data from memory into the primary instruction cache, from which it can be fed rapidly to the rest of the pipeline as needed. Some processors have dedicated prefetch units for this task.
Decode: Translating processors employ multiple decoders, each capable of decoding x86 instructions into micro-instructions. Since no x86 instruction is executed directly, every instruction must pass through a decoder, so using several decoders improves performance. The time needed to decode an instruction depends on its complexity: simple instructions can be decoded at a rate of several per clock cycle, while more complicated ones take more than a cycle each. Any memory addresses the instruction needs are also generated at this time.
Issue/Schedule: The micro-instructions are issued to the instruction pool, where they wait to be assigned to an execution unit. Internal circuitry optimizes this assignment, controlling which instructions go to which units. This is sometimes called instruction scheduling, since it amounts to scheduling tasks (instructions) onto available resources (execution units).
Execute: Each micro-instruction is actually executed here. Multiple execution units are normally used to improve performance, and some of them are dedicated to particular kinds of instructions. For example, complex floating-point operations are typically handled by the floating-point execution unit.
Retire: Because micro-instructions are issued to a pool and then sent to execution units, they execute independently and can even execute out of order (this is often advertised as a processor feature: out-of-order execution). To avoid problems, the results of execution are first stored in temporary locations. The retirement unit collects these results and makes sure that the output is produced correctly, exactly as the original x86 instructions intended and in their original program order. This is called retiring the instructions.
Write-Back: In this stage, the results of execution are written back to either an internal register or system memory. Again, system memory is very slow, so the result isn't written to it directly but rather to a write buffer, where it is held until it can be written to the cache or system memory.

Note: Don't confuse this write-back stage with the write-back policy used by caches; the two are similar concepts but refer to different things.
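Because the steps are pipelined, a new instruction can enter the pipeline while earlier ones are still in later stages. The Python sketch below prints an idealized timing chart for the six steps listed above. It assumes each step takes exactly one clock cycle, one instruction enters per cycle and nothing ever stalls; real processors split these steps differently and do stall.

```python
STAGES = ["Fetch", "Decode", "Issue", "Execute", "Retire", "Write-Back"]

def pipeline_chart(num_instructions: int) -> list[list[str]]:
    """Return rows[i][c]: the stage instruction i occupies in cycle c ('' if none).

    Idealized model: every stage takes one cycle and instruction i
    enters Fetch in cycle i (0-based), so there are no stalls.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i  # how many stages instruction i has advanced
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        rows.append(row)
    return rows

chart = pipeline_chart(3)
for i, row in enumerate(chart):
    print(f"I{i}: " + " | ".join(f"{s:<10}" for s in row))
```

With three instructions, all six steps complete in 3 + 6 - 1 = 8 cycles, instead of the 18 cycles needed if each instruction went through every step before the next one started.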
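The fetch step reads whole aligned blocks of instruction data rather than individual instructions. Because x86 instructions vary in length (from 1 up to 15 bytes), a single instruction can straddle two blocks. The sketch below assumes a 16-byte block by default (the text cites 16 or 32 bytes) and computes which block addresses must be loaded to obtain one instruction. The function name is hypothetical.

```python
def blocks_for_instruction(address: int, length: int, block_size: int = 16) -> list[int]:
    """Return the aligned block start addresses covering [address, address + length).

    Assumes block_size is a power of two, so alignment is a simple bit mask.
    """
    if not 1 <= length <= 15:
        raise ValueError("x86 instructions are 1 to 15 bytes long")
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    mask = ~(block_size - 1)               # clears the offset-within-block bits
    first = address & mask                 # block holding the first byte
    last = (address + length - 1) & mask   # block holding the last byte
    return list(range(first, last + 1, block_size))

print([hex(b) for b in blocks_for_instruction(0x1004, 3)])  # ['0x1000']
print([hex(b) for b in blocks_for_instruction(0x100E, 5)])  # ['0x1000', '0x1010']
```

The second instruction starts 2 bytes before a 16-byte boundary, so both blocks must be present in the instruction cache before it can be decoded; with 32-byte blocks the same instruction fits in one block.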
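The issue/schedule, execute and retire steps can be sketched together as a toy cycle-by-cycle simulation. Micro-instructions wait in a pool until their inputs are ready and a unit of the right type is free, so they may finish out of program order; retirement then commits results strictly in the original order. Everything specific here is an assumption for illustration: the names UOp and simulate, the unit types, the latencies, the retire width of 3, and the rule that a unit stays busy for an operation's whole latency.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UOp:
    name: str
    unit: str          # kind of execution unit needed (assumed types)
    latency: int       # cycles until the result is ready
    deps: tuple = ()   # indices of earlier micro-ops whose results it needs

def simulate(uops, units, retire_width=3):
    """Toy out-of-order core: returns (ready_cycle, retire_cycle) lists.

    units maps a unit type to how many units of that type exist.
    """
    for i, u in enumerate(uops):
        if units.get(u.unit, 0) < 1:
            raise ValueError(f"no execution unit of type {u.unit!r}")
        if u.latency < 1 or any(d >= i for d in u.deps):
            raise ValueError("latency must be >= 1 and deps must be earlier micro-ops")

    n = len(uops)
    ready = [None] * n      # cycle each result becomes available (None = not started)
    retired = [None] * n
    free_at = {kind: [0] * count for kind, count in units.items()}
    next_to_retire, cycle = 0, 0
    while next_to_retire < n:
        # Issue/schedule: scan the pool oldest-first and start every micro-op
        # whose inputs are ready and whose unit type has a free unit.
        for i, u in enumerate(uops):
            if ready[i] is not None:
                continue  # already started
            if any(ready[d] is None or ready[d] > cycle for d in u.deps):
                continue  # still waiting on an input
            slots = free_at[u.unit]
            k = min(range(len(slots)), key=slots.__getitem__)
            if slots[k] <= cycle:
                slots[k] = cycle + u.latency   # unit busy until the result is ready
                ready[i] = cycle + u.latency
        # Retire: commit finished results strictly in program order.
        count = 0
        while (next_to_retire < n and count < retire_width
               and ready[next_to_retire] is not None
               and ready[next_to_retire] <= cycle):
            retired[next_to_retire] = cycle
            next_to_retire += 1
            count += 1
        cycle += 1
    return ready, retired

program = [
    UOp("LOAD tmp,[ebx]", "load", 3),
    UOp("ADD eax,tmp", "alu", 1, deps=(0,)),
    UOp("ADD ecx,edx", "alu", 1),
    UOp("FMUL st0,st1", "fpu", 3),
]
ready, retired = simulate(program, {"load": 1, "alu": 1, "fpu": 1})
for u, r, t in zip(program, ready, retired):
    print(f"{u.name:<16} result ready at cycle {r}, retired at cycle {t}")
```

In this run the independent ADD ecx,edx has its result ready first (cycle 1) but is not retired until cycle 4, after the older LOAD and the ADD that depends on it. Until then its result sits in a temporary location, which is the out-of-order execution with in-order retirement described in the Retire step.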
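The write buffer used in the Write-Back step can be pictured as a first-in, first-out queue between the processor and memory. The sketch below is a simplified model under stated assumptions: the class and method names are hypothetical, memory is a plain dictionary, unwritten addresses read as 0, and one buffered write drains per call to drain_one. It also has reads check the buffer before memory so they never see a stale value; that is one common approach (some designs stall the read instead), not something the text specifies.

```python
from collections import deque

class WriteBuffer:
    """Toy FIFO write buffer in front of slow memory (hypothetical model)."""

    def __init__(self, memory: dict, capacity: int = 4):
        self.memory = memory      # stands in for the cache / system memory
        self.capacity = capacity
        self.pending = deque()    # (address, value) pairs, oldest first

    def write(self, address: int, value: int) -> bool:
        """Queue a write; returns False if the buffer is full (the core would stall)."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append((address, value))
        return True

    def drain_one(self) -> None:
        """Move the oldest buffered write out to memory, if there is one."""
        if self.pending:
            address, value = self.pending.popleft()
            self.memory[address] = value

    def read(self, address: int) -> int:
        """Return the newest value: pending writes first (newest wins), then memory."""
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value
        return self.memory.get(address, 0)  # assumption: unwritten memory reads as 0

mem = {}
wb = WriteBuffer(mem, capacity=2)
wb.write(0x10, 1)
wb.write(0x10, 2)
print(wb.write(0x20, 3))   # False: buffer is full
print(wb.read(0x10), mem)  # 2 {}  (newest value comes from the buffer)
wb.drain_one()
wb.drain_one()
print(wb.read(0x10), mem)  # 2 {16: 2}
```

The processor can keep running as soon as a result is placed in the buffer; the slow transfer to memory happens later, in the background.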
