x86 Translation Execution Steps
x86 translation processors are those that run standard x86 code by translating it into
internal RISC-like micro-instructions. These processors are much more complex than ones
that execute x86 instructions directly, because of the extra work of translating and then
managing these micro-instructions. In a way, they are like tiny multiprocessing computers
within the CPU itself, complete with special logic to handle the allocation of tasks to
the different execution units, much the way a multiprocessing operating system does when
using more than one processor.
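As a rough illustration of what "translating into micro-instructions" means, the sketch
below splits a made-up x86-style instruction with a memory destination into simpler
load, operate and store steps. The tuple format, the `translate` function and the
micro-op names are invented for this example; real processors use their own internal,
undocumented formats.

```python
# Toy translation of x86-style instructions into RISC-like micro-instructions.
# The instruction syntax and micro-op names are hypothetical, for illustration only.

def translate(instruction):
    """Split one x86-style instruction into a list of simple micro-ops."""
    opcode, dest, src = instruction
    if dest.startswith("["):
        # Memory destination: load the value, operate on a temporary, store it back.
        address = dest.strip("[]")
        return [
            ("load", "tmp0", address),
            (opcode, "tmp0", src),
            ("store", address, "tmp0"),
        ]
    # Register destination: simple enough to map to a single micro-op.
    return [(opcode, dest, src)]

program = [("add", "eax", "ebx"), ("add", "[0x1000]", "ecx")]
micro_ops = [uop for instruction in program for uop in translate(instruction)]
for uop in micro_ops:
    print(uop)
```

The register-to-register add stays a single micro-op, while the add to memory becomes
three, which is why more complicated instructions take longer to decode.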
The actual stages used in execution vary by the individual processor, with some
processors using more, smaller steps than others. However, they basically follow the same
general path through the processor. These are the main steps followed; note that they are
always pipelined:
- Fetch: The first step is to bring the instruction into the processor so it
can be executed. Since memory is so slow compared to the processor, this stage doesn't
involve a direct read from memory. Rather, special control circuitry loads larger blocks
(16 or 32 bytes) of instruction data from memory into the primary instruction cache.
This data is then available for rapid feeding to the execution units as needed. In some
processors, a dedicated prefetch unit performs this loading.
- Decode: Translating processors employ multiple decoders, each of which is capable
of taking instructions and decoding them into micro-instructions. Since none of the x86
instructions is executed directly, it makes sense to use multiple decoders to improve
performance. The amount of time to decode an instruction depends on its complexity; simple
instructions can be decoded at the rate of several per clock cycle, with more complicated
ones taking more than a cycle each. Any memory addresses the instruction requires are also
generated at this time.
- Issue/Schedule: The micro-instructions are issued to the instruction pool, where
they await assignment to an execution unit. Internal circuitry is used to optimize this
task and control which instructions go where. This is sometimes called instruction
scheduling, since it amounts to assigning tasks (instructions) to available resources
(execution units).
- Execute: Each micro-instruction is actually executed here. Multiple execution
units are normally used to improve performance, some of which are dedicated only to
certain instructions. For example, complex floating-point operations are typically handled
by the floating-point execution unit.
- Retire: Since micro-instructions are issued to a pool and then sent to execution
units, they execute independently and can execute out of order. (This is often stated as
a feature of the processor: out-of-order execution.) To ensure that there are no
problems, the results from execution are stored in temporary locations at first. The
retirement unit collects the results from the micro-instructions and makes sure that the
output is produced correctly, according to the intention of the original x86
instructions. This is called retiring the instructions.
- Write-Back: In this stage, the results from execution are written back either to
an internal register or the system memory. Because system memory is very slow, the result
isn't written to it directly but rather to a write buffer, where it is held until it can
be written to system memory or the cache.
Note: Don't confuse this term with write-back as it applies to the cache; they are similar
concepts but refer to different things.
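The interplay between out-of-order execution and in-order retirement described in the
steps above can be sketched in a few lines. This is a toy model, not how any particular
processor works: it assumes all micro-instructions are issued at once, have no
dependencies on each other, and take a fixed, positive number of cycles each. The `run`
function, the register names and the latencies are invented for illustration, and
retirement and write-back are merged into a single step for brevity.

```python
# Toy model: micro-ops finish out of order, but results are committed in program order.
# Assumptions (not from any real CPU): all ops issue at cycle 0, are independent,
# and each has a positive integer latency.

def run(micro_ops):
    """micro_ops: list of (dest_register, value, latency_in_cycles)."""
    registers = {}         # architectural state, updated only at retirement
    completed = {}         # temporary results, keyed by program-order index
    completion_order = []  # order in which execution finished
    retire_order = []      # order in which results were committed
    next_to_retire = 0
    cycle = 0
    while next_to_retire < len(micro_ops):
        cycle += 1
        # Execute: a micro-op completes once its latency has elapsed.
        for index, (dest, value, latency) in enumerate(micro_ops):
            if latency == cycle:
                completed[index] = (dest, value)
                completion_order.append(index)
        # Retire: commit finished results strictly in program order.
        while next_to_retire in completed:
            dest, value = completed.pop(next_to_retire)
            registers[dest] = value
            retire_order.append(next_to_retire)
            next_to_retire += 1
    return completion_order, retire_order, registers

ops = [("eax", 1, 3), ("ebx", 2, 1), ("eax", 5, 2)]
completion_order, retire_order, registers = run(ops)
print(completion_order, retire_order, registers)
```

Here the slow first micro-instruction finishes last, yet `eax` still ends up holding the
value written by the third one, because results wait in temporary storage until every
earlier micro-instruction has retired.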