Common Processor Architectures
- Harvard
  - optimized for parallel fetches: separate program & data memories
  - both memories can be accessed simultaneously → higher throughput (toy comparison sketched after this section)
  - typically found in simple/specialized cores (DSPs, microcontrollers)
- von Neumann
  - unified program + data memory (“stored-program”)
  - simpler silicon, more flexible instruction set
  - shared bus creates the “von Neumann bottleneck” → limits throughput
  - used in general-purpose CPUs
- When to use which?
  - Harvard → real-time, low-latency embedded systems
  - von Neumann → complex OS support, rich ISA
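A toy back-of-the-envelope comparison of the two memory organizations; it assumes a made-up 30% load/store mix and perfect overlap of data accesses with instruction fetches on the Harvard side, so treat it as a sketch rather than a simulation:

```python
# Toy model, not a real simulator: every instruction needs one instruction
# fetch; a fraction of instructions (loads/stores) also needs one data access.
instructions = 1_000_000
load_store_fraction = 0.30      # assumed instruction mix

# von Neumann: one shared memory port, so instruction and data accesses
# queue up on the same bus.
shared_bus_accesses = instructions * (1 + load_store_fraction)

# Harvard: separate instruction and data ports, so a data access can overlap
# the next instruction fetch (perfect overlap assumed here).
split_bus_accesses = instructions * max(1.0, load_store_fraction)

print(f"shared bus:  {shared_bus_accesses:,.0f} serialized accesses")
print(f"split buses: {split_bus_accesses:,.0f} serialized accesses")
print(f"throughput advantage ≈ {shared_bus_accesses / split_bus_accesses:.2f}x")
```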
Integrated Circuit Cost
- Cost per die = cost per wafer / (dies per wafer × yield)
- Dies per wafer ≈ wafer area / die area − π × wafer diameter / √(2 × die area)
- Yield = fraction of good dies after fabrication defects; a common model is 1 / (1 + defects per area × die area / 2)²
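A minimal worked example of these relations, assuming the standard textbook cost model; the wafer diameter, die area, wafer cost, and defect density below are made-up illustration values:

```python
import math

# Made-up example values: 30 cm (300 mm) wafer, 1.5 cm² dies,
# $10,000 per wafer, 0.03 defects per cm².
wafer_diameter_cm = 30.0
die_area_cm2 = 1.5
cost_per_wafer = 10_000.0
defects_per_cm2 = 0.03

# Dies per wafer ≈ wafer area / die area, minus dies lost along the edge.
wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
dies_per_wafer = (wafer_area / die_area_cm2
                  - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

# Common yield model: 1 / (1 + defects per area × die area / 2)².
yield_fraction = 1 / (1 + defects_per_cm2 * die_area_cm2 / 2) ** 2

cost_per_die = cost_per_wafer / (dies_per_wafer * yield_fraction)
print(f"dies/wafer ≈ {dies_per_wafer:.0f}, yield ≈ {yield_fraction:.1%}, "
      f"cost/die ≈ ${cost_per_die:.2f}")
```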
CPU Performance Metrics
- Clock rate vs. cycle time
  - clock cycle time = 1 / clock rate (e.g., a 4 GHz clock → 0.25 ns cycle time)
- CPU execution time
  - via cycles × time: CPU time = CPU clock cycles × clock cycle time
  - via rate: CPU time = CPU clock cycles / clock rate
- Breaking down clock cycles
  - CPU clock cycles = instruction count × CPI (average clock cycles per instruction)
- Unified CPU time formula
  - CPU time = instruction count × CPI × clock cycle time = (instruction count × CPI) / clock rate (worked example below)
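A quick sketch of the unified formula with assumed numbers (the 2-billion-instruction count, CPI of 1.5, and 2.5 GHz clock are illustration values, not from the notes):

```python
# Assumed program/machine parameters, purely for illustration.
instruction_count = 2_000_000_000
cpi = 1.5                        # average clock cycles per instruction
clock_rate_hz = 2.5e9            # 2.5 GHz

clock_cycles = instruction_count * cpi       # breaking down clock cycles
cycle_time_s = 1 / clock_rate_hz             # cycle time = 1 / clock rate

cpu_time_via_cycles = clock_cycles * cycle_time_s     # cycles × time
cpu_time_via_rate = clock_cycles / clock_rate_hz      # cycles / rate

print(cpu_time_via_cycles, cpu_time_via_rate)  # both 1.2 s for these numbers
```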
Performance Trade-offs
- Reduce Instruction count – better algorithms, more powerful ISA
- Reduce CPI – pipelining, more instruction-level parallelism
- Increase Clock rate – faster transistors, shorter cycle time
- Trade-off example: deeper pipelines → higher clock rate but can increase CPI on branch mispredictions (see the what-if sketch below)
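A small what-if, with assumed numbers, showing how a deeper pipeline's clock-rate gain can be partly eaten by a worse CPI:

```python
instruction_count = 1_000_000_000    # same program on both designs (assumed)

# Baseline design vs. a hypothetically deeper pipeline: the deeper design
# clocks 30% faster but pays a higher CPI on branch mispredictions.
base_cpi, base_clock_hz = 1.2, 2.0e9
deep_cpi, deep_clock_hz = 1.4, 2.6e9

base_time = instruction_count * base_cpi / base_clock_hz   # 0.600 s
deep_time = instruction_count * deep_cpi / deep_clock_hz   # ≈ 0.538 s

print(f"net speedup ≈ {base_time / deep_time:.2f}x "
      f"(clock alone would give {deep_clock_hz / base_clock_hz:.2f}x)")
```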
Big Picture & Amdahl’s Law
- Overall speedup when improving a feature that accelerates a fraction f of the computation by a factor s:
  - overall speedup = 1 / ((1 − f) + f / s)
- shows diminishing returns: as s → ∞, the overall speedup approaches its cap of 1 / (1 − f) (sketch below)
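A minimal sketch of the formula; the function name and the 90% / 10× numbers are assumptions for illustration:

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of execution time is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# Speeding up 90% of the work by 10x only gives ~5.3x overall.
print(f"{amdahl_speedup(0.9, 10):.2f}x")

# Diminishing returns: as s grows, the speedup is capped at 1 / (1 - f) = 10x here.
for s in (2, 10, 100, 1_000_000):
    print(f"s = {s:>9}: overall {amdahl_speedup(0.9, s):.2f}x")
```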