OAK course material
What is Superscalar?
Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently
Equally applicable to RISC & CISC
In practice usually RISC
Why Superscalar?
Most operations are on scalar quantities (see RISC notes)
Improve these operations to get an overall improvement
General Superscalar Organization
Superpipelined
-Many pipeline stages need less than half a clock cycle
-Double internal clock speed gets two tasks per external clock cycle
-Superscalar allows parallel fetch execute
Superscalar v Superpipeline
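The comparison can be made concrete with an idealized timing model. This is a sketch and not part of the original notes: it assumes no stalls or dependencies, and the function names and the `degree` parameter are hypothetical.

```python
import math

def base_pipeline_cycles(n_instr, n_stages):
    """Ordinary pipeline: one instruction issued per clock cycle."""
    return n_stages + (n_instr - 1)

def superpipelined_cycles(n_instr, n_stages, degree):
    """Superpipelined: one instruction issued every 1/degree of a cycle."""
    return n_stages + (n_instr - 1) / degree

def superscalar_cycles(n_instr, n_stages, degree):
    """Superscalar: `degree` instructions issued together each cycle."""
    return n_stages + math.ceil(n_instr / degree) - 1

# Six instructions on a four-stage pipeline, degree 2 for both variants.
print(base_pipeline_cycles(6, 4))      # 9
print(superpipelined_cycles(6, 4, 2))  # 6.5
print(superscalar_cycles(6, 4, 2))     # 6
```

Under these assumptions both approaches finish well ahead of the base pipeline, and the superscalar machine is slightly ahead because its instructions start in pairs at the very first cycle.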
Limitations
-Instruction level parallelism
-Compiler based optimisation
-Hardware techniques
-Limited by
--True data dependency
--Procedural dependency
--Resource conflicts
--Output dependency
--Antidependency
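The three register-based dependencies in the list above can be detected by comparing the registers that two instructions read and write. The following is a minimal sketch; the `Instr` record and the `dependencies` function are hypothetical names for illustration, and procedural dependencies and resource conflicts are not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction

def dependencies(first, second):
    """List the data dependencies of `second` on the earlier `first`."""
    found = []
    if first.dest in second.srcs:
        found.append("true data dependency")   # read after write
    if second.dest in first.srcs:
        found.append("antidependency")         # write after read
    if second.dest == first.dest:
        found.append("output dependency")      # write after write
    return found

i1 = Instr(dest="r1", srcs=("r2", "r3"))  # r1 = r2 + r3
i2 = Instr(dest="r4", srcs=("r1", "r5"))  # r4 = r1 + r5
i3 = Instr(dest="r2", srcs=("r6",))       # r2 = r6
print(dependencies(i1, i2))  # ['true data dependency']
print(dependencies(i1, i3))  # ['antidependency']
```

Only the true data dependency reflects a real flow of data; the other two come from reusing register names, which is why register renaming can remove them.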
IA-64
Background to IA-64
At the time, Pentium 4 appeared to be the last in the x86 line
Intel & Hewlett-Packard (HP) jointly developed
New architecture
--64 bit architecture
--Not extension of x86
--Not adaptation of HP 64bit RISC architecture
Exploits vast circuitry and high speeds
Systematic use of parallelism
Departure from superscalar
Motivation
Instruction level parallelism
--Explicit in machine instruction
--Not determined at run time by processor
Long or very long instruction words (LIW/VLIW)
Branch predication (not the same as branch prediction)
Speculative loading
Intel & HP call this Explicit Parallel Instruction Computing (EPIC)
IA-64 is an instruction set architecture intended for implementation on EPIC
Itanium is first Intel product
Superscalar v IA-64
Why a New Architecture?
Not hardware compatible with x86
Now have tens of millions of transistors available on chip
Could build bigger cache
- Diminishing returns
Add more execution units
- Increase superscaling
- "Complexity wall"
- More units make the processor "wider"
- More logic needed to orchestrate
- Improved branch prediction required
- Longer pipelines required
- Greater penalty for misprediction
- Larger number of renaming registers required
- At most six instructions per cycle
Explicit Parallelism
Instruction parallelism scheduled at compile time
-Included with machine instruction
Processor uses this info to perform parallel execution
Requires less complex circuitry
Compiler has much more time to determine possible parallel operations
Compiler sees whole program
General Organization
Key Features
#Large number of registers
-IA-64 instruction format assumes 256
--128 * 64 bit integer, logical & general purpose
--128 * 82 bit floating point and graphics
-64 * 1 bit predicated execution registers (see later)
-To support high degree of parallelism
#Multiple execution units
-Expected to be 8 or more
-Depends on number of transistors available
-Execution of parallel instructions depends on hardware available
-8 parallel instructions may be split into two lots of four if only four execution units are available
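The last point can be sketched as a simple chunking step. This is only an illustration under strong assumptions: the function name `split_into_lots` is hypothetical, every execution unit is treated as able to run any instruction, and real hardware would also have to match instruction types to unit types.

```python
def split_into_lots(instructions, n_units):
    """Split instructions marked as parallel into lots that fit the hardware."""
    if n_units < 1:
        raise ValueError("need at least one execution unit")
    return [instructions[i:i + n_units]
            for i in range(0, len(instructions), n_units)]

parallel = ["i0", "i1", "i2", "i3", "i4", "i5", "i6", "i7"]
print(split_into_lots(parallel, 8))  # one lot of eight
print(split_into_lots(parallel, 4))  # two lots of four
```

The same compiled code therefore runs on both machines; only the number of cycles needed to issue it changes.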
IA-64 Execution Units
I-Unit
--Integer arithmetic
--Shift and add
--Logical
--Compare
--Integer multimedia ops
M-Unit
--Load and store
---Between register and memory
--Some integer ALU
B-Unit
--Branch instructions
F-Unit
--Floating point instructions
Instruction Format Diagram
Instruction Format
128 bit bundle
--Holds three instructions (syllables) plus template
--Can fetch one or more bundles at a time
--Template contains info on which instructions can be executed in parallel
----Not confined to single bundle
----e.g. a stream of 8 instructions may be executed in parallel
----Compiler will have re-ordered instructions to form contiguous bundles
----Can mix dependent and independent instructions in same bundle
--Instruction is 41 bit long
----More registers than usual RISC
----Predicated execution registers (see later)
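The bundle arithmetic works out as 3 * 41 = 123 instruction bits, leaving 5 bits for the template. The sketch below packs and unpacks a bundle held as a Python integer. The notes give only the field sizes; placing the template in the five low-order bits follows the usual description of the IA-64 bundle and should be checked against the architecture manual. The function names are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1

def pack_bundle(template, slots):
    """Combine a template and three 41-bit instructions into one 128-bit value."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle

def unpack_bundle(bundle):
    """Split a 128-bit bundle into (template, [slot0, slot1, slot2])."""
    template = bundle & TEMPLATE_MASK
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots

bundle = pack_bundle(0x10, [1, 2, 3])
print(unpack_bundle(bundle))  # (16, [1, 2, 3])
```

The instruction values here are placeholders, not real IA-64 encodings.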
CONTROL UNIT OPERATION
Micro-Operation
A computer executes a program
Fetch/execute cycle
Each cycle has a number of steps
--see pipelining
Called micro-operations
Each step does very little
Atomic operation of CPU
Constituent Elements of Program Execution
Fetch - 4 Registers
#Memory Address Register (MAR)
--Connected to address bus
--Specifies address for read or write op
#Memory Buffer Register (MBR)
--Connected to data bus
--Holds data to write or last data read
#Program Counter (PC)
--Holds address of next instruction to be fetched
#Instruction Register (IR)
--Holds last instruction fetched
Fetch Sequence
Address of next instruction is in PC
Address (MAR) is placed on address bus
Control unit issues READ command
Result (data from memory) appears on data bus
Data from data bus copied into MBR
PC incremented by 1 (in parallel with data fetch from memory)
Data (instruction) moved from MBR to IR
MBR is now free for further data fetches
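The fetch sequence above can be simulated with the four registers as plain attributes. This is a minimal sketch: the `CPU` class, the list used as memory, and the t1 to t3 time-unit labels in the comments are assumptions for illustration.

```python
class CPU:
    def __init__(self, memory):
        self.memory = memory  # list of instruction words, indexed by address
        self.pc = 0           # Program Counter
        self.mar = 0          # Memory Address Register
        self.mbr = 0          # Memory Buffer Register
        self.ir = 0           # Instruction Register

    def fetch(self):
        # t1: MAR <- (PC)
        self.mar = self.pc
        # t2: MBR <- (memory), PC <- (PC) + 1 (these two can overlap)
        self.mbr = self.memory[self.mar]
        self.pc += 1
        # t3: IR <- (MBR), which frees the MBR for later data fetches
        self.ir = self.mbr
        return self.ir

cpu = CPU(memory=[0x1A, 0x2B])
print(hex(cpu.fetch()), cpu.pc)  # 0x1a 1
print(hex(cpu.fetch()), cpu.pc)  # 0x2b 2
```

Each commented step corresponds to one or two micro-operations; the simulation runs them in sequence, whereas hardware would perform the two in t2 during the same time unit.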
Flowchart for Instruction Cycle
Functional Requirements
Define basic elements of processor
Describe micro-operations processor performs
Determine functions control unit must perform
Basic Elements of Processor
ALU
Registers
Internal data paths
External data paths
Control Unit
Types of Micro-operation
Transfer data between registers
Transfer data from register to external
Transfer data from external to register
Perform arithmetic or logical ops
Functions of Control Unit
Sequencing
--Causing the CPU to step through a series of micro-operations
Execution
--Causing the performance of each micro-op
This is done using Control Signals
Control Signals
Clock
--One micro-instruction (or set of parallel micro-instructions) per clock cycle
Instruction register
--Op-code for current instruction
--Determines which micro-instructions are performed
Flags
--State of CPU
--Results of previous operations
From control bus
--Interrupts
--Acknowledgements
Model of Control Unit
Control Signals - output
Within CPU
--Cause data movement
--Activate specific functions
Via control bus
--To memory
--To I/O modules
Example Control Signal Sequence - Fetch
MAR <- (PC)
--Control unit activates signal to open gates between PC and MAR
MBR <- (memory)
--Open gates between MAR and address bus
--Memory read control signal
--Open gates between data bus and MBR
Data Paths and Control Signals
Internal Organization
Usually a single internal bus
Gates control movement of data onto and off the bus
Control signals control data transfer to and from external systems bus
Temporary registers needed for proper operation of ALU
Intel 8085 CPU Block Diagram
Intel 8085 Pin Configuration
Intel 8085 OUT Instruction Timing Diagram
rangga bahtera
Problems With Hard Wired Designs
Complex sequencing & micro-operation logic
Difficult to design and test
Inflexible design
Difficult to add new instructions
Micro-programmed Control
Control Unit Organization
Micro-programmed Control
Use sequences of instructions (see earlier notes) to control complex operations
Called micro-programming or firmware
Implementation (1)
All the control unit does is generate a set of control signals
Each control signal is on or off
Represent each control signal by a bit
Have a control word for each micro-operation
Have a sequence of control words for each machine code instruction
Add an address to specify the next micro-instruction, depending on conditions
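The idea of one bit per control signal can be sketched as follows. The signal names, the three-word fetch routine, and the use of `None` to mark the end of the routine are assumptions for illustration; conditional selection of the next address is left out to keep the example short.

```python
SIGNALS = ["PC_TO_MAR", "MEM_READ", "DATA_BUS_TO_MBR", "INCREMENT_PC", "MBR_TO_IR"]

def control_word(*names):
    """Build a control word with one bit set per active signal."""
    word = 0
    for name in names:
        word |= 1 << SIGNALS.index(name)
    return word

def active_signals(word):
    """Return the names of the signals that are on in a control word."""
    return [name for bit, name in enumerate(SIGNALS) if word & (1 << bit)]

# Control memory: (control word, address of next micro-instruction).
CONTROL_MEMORY = [
    (control_word("PC_TO_MAR"), 1),
    (control_word("MEM_READ", "DATA_BUS_TO_MBR", "INCREMENT_PC"), 2),
    (control_word("MBR_TO_IR"), None),  # None: end of this routine
]

def run(start):
    """Step through a micro-routine and record the signals raised at each step."""
    trace = []
    address = start
    while address is not None:
        word, address = CONTROL_MEMORY[address]
        trace.append(active_signals(word))
    return trace

for step in run(0):
    print(step)
```

A machine instruction then corresponds to a sequence of such words, and adding a new instruction means adding words to the control memory instead of redesigning logic.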
Implementation (2)
Today’s large microprocessor
--Many instructions and associated register-level hardware
--Many control points to be manipulated
This results in control memory that
--Contains a large number of words
----Corresponding to the number of instructions to be executed
--Has a wide word width
----Due to the large number of control points to be manipulated
Micro-program Word Length
Based on 3 factors
--Maximum number of simultaneous micro-operations supported
--The way control information is represented or encoded
--The way in which the next micro-instruction address is specified
Vertical Micro-programming
Width is narrow
n control signals encoded into log2 n bits
Limited ability to express parallelism
Considerable encoding of control information requires external memory word decoder to identify the exact control line being manipulated
Horizontal Micro-programming
Wide memory word
High degree of parallel operations possible
Little encoding of control information
Compromise
Divide control signals into disjoint groups
Implement each group as separate field in memory word
Supports reasonable levels of parallelism without too much complexity
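The trade-off between the three schemes can be put into numbers. The sketch below compares word widths for the control-signal part only (the next-address field is ignored). The function names are hypothetical, and reserving one extra code per group to mean "no signal active" is an assumption, not something stated in the notes.

```python
def horizontal_width(n_signals):
    """Horizontal: one bit per control signal, any combination can be active."""
    return n_signals

def vertical_width(n_signals):
    """Vertical: signals encoded into ceil(log2 n) bits, one active at a time."""
    return (n_signals - 1).bit_length()

def grouped_width(group_sizes):
    """Compromise: one encoded field per disjoint group of signals.
    Each field has one extra code meaning 'no signal in this group'."""
    return sum(size.bit_length() for size in group_sizes)

def decode_vertical(code):
    """Decoder needed by the vertical scheme: code -> single active control line."""
    return 1 << code

print(horizontal_width(16))         # 16
print(vertical_width(16))           # 4
print(grouped_width([4, 4, 4, 4]))  # 12
print(bin(decode_vertical(5)))      # 0b100000
```

With four groups, up to four signals (one per group) can be raised in the same micro-instruction, which sits between the single signal of the vertical scheme and the sixteen of the horizontal one.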
Organization of Control Memory
Control Unit
Control Unit Function
Sequencing logic unit issues read command
Word specified in control address register is read into control buffer register
Control buffer register contents generate control signals and next address information
Sequencing logic loads new address into control address register based on next address information from control buffer register and ALU flags
Next Address Decision
-Depending on ALU flags and control buffer register
--Get next instruction
----Add 1 to control address register
--Jump to new routine based on jump microinstruction
----Load address field of control buffer register into control address register
--Jump to machine instruction routine
----Load control address register based on opcode in IR
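The three choices above can be written as a single selection function. This is a sketch under assumptions: the micro-instruction in the control buffer register is modelled as a dictionary with hypothetical `mode`, `condition` and `address` fields, and `routine_start` is a hypothetical table mapping opcodes to the first address of each machine instruction routine.

```python
def next_address(car, cbr, alu_flags, ir_opcode, routine_start):
    """Choose the next control address register (CAR) value."""
    mode = cbr["mode"]
    if mode == "dispatch":
        # Jump to machine instruction routine: CAR loaded from opcode in IR.
        return routine_start[ir_opcode]
    if mode == "jump":
        # Jump to new routine: CAR loaded from address field of CBR,
        # optionally only when the named ALU flag is set.
        condition = cbr.get("condition")
        if condition is None or alu_flags.get(condition, False):
            return cbr["address"]
    # Get next instruction: add 1 to CAR.
    return car + 1

flags = {"zero": True}
print(next_address(10, {"mode": "next"}, flags, 0x3, {0x3: 100}))                                     # 11
print(next_address(10, {"mode": "jump", "condition": "zero", "address": 40}, flags, 0x3, {0x3: 100}))  # 40
print(next_address(10, {"mode": "dispatch"}, flags, 0x3, {0x3: 100}))                                  # 100
```

A conditional jump whose flag is not set falls through to the "add 1" case, so the micro-program simply continues with the following word.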
Functioning of Microprogrammed Control Unit