# Limitations of Adaptable System Architectures for WCET Reduction

Jack Whitham

April 21st 2008














#### WCET reduction

Usually, system architects aim to:

- Minimize *average case execution time* (ACET) of software: maximizing typical performance.

But hard real-time system architects aim to:

- Minimize *worst-case execution time* (WCET) of software: maximizing *guaranteed* performance.
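One way to state the two goals precisely, as a sketch: assume a hypothetical execution-time function $t(i)$ over the set $I$ of possible inputs and initial hardware states, with $p(i)$ the probability of each one. None of these symbols appear in the slides; they are introduced here only for illustration.

$$
\mathrm{ACET} = \sum_{i \in I} p(i)\, t(i), \qquad \mathrm{WCET} = \max_{i \in I} t(i)
$$

A weighted average never exceeds the maximum, so $\mathrm{ACET} \le \mathrm{WCET}$, and an optimization can lower one without lowering the other.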



#### My topic

- WCET reduction of a program,
- preferably automatic,
- preferably using a conventional programming language.




#### ACET vs WCET

ACET optimizations are relatively easy to implement:

- Performance analysis is simple: use profiling.
- Predictability isn't important provided that average performance is good.
- $\Rightarrow$  Heuristic mechanisms can be used.



#### ACET vs WCET

WCET optimizations are relatively hard to implement:

- WCET analysis is tricky.
  - Must model the program.
  - Must model the CPU and system architecture.
- Predictability is important.
  - Predictability simplifies the models.
  - Predictability reduces *pessimism*.
- $\Rightarrow$  Heuristic mechanisms should be avoided.
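The difference between the two analyses can be sketched on a tiny program model. The control-flow graph, cycle costs and branch probabilities below are made-up illustration values, not taken from the talk: the ACET is a probability-weighted average such as a profiler would estimate, while the WCET is the cost of the longest path through the model.

```python
# Hypothetical acyclic control-flow graph: block -> (cycle cost, successors).
# All costs and probabilities are assumptions chosen for illustration.
CFG = {
    "entry": (2, ["fast", "slow"]),
    "fast": (3, ["exit"]),
    "slow": (40, ["exit"]),
    "exit": (1, []),
}

# Assumed edge probabilities, as profiling might report them.
EDGE_PROB = {
    ("entry", "fast"): 0.95,
    ("entry", "slow"): 0.05,
    ("fast", "exit"): 1.0,
    ("slow", "exit"): 1.0,
}


def wcet(block):
    """Cost of the longest path from `block` to the end of the graph."""
    cost, succs = CFG[block]
    return cost + max((wcet(s) for s in succs), default=0)


def acet(block):
    """Expected cost from `block`, weighting successors by edge probability."""
    cost, succs = CFG[block]
    return cost + sum(EDGE_PROB[(block, s)] * acet(s) for s in succs)


if __name__ == "__main__":
    print("WCET:", wcet("entry"))
    print("ACET:", acet("entry"))
```

In this model the rarely taken `slow` block barely affects the average but dominates the worst case, which is why the two goals call for different optimizations.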



#### CPU designs

Conventional CPU designs are good for ACET reduction, but not WCET reduction, because of:

- caches,
- superscalar out-of-order execution,
- branch prediction,
- generally, *clever but unpredictable techniques*.

All heuristic mechanisms! Analysis is possible but costly and pessimistic.
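The pessimism can be illustrated with a minimal sketch of a direct-mapped cache. The hit and miss latencies, the cache size and the address traces are assumptions chosen for illustration, and addresses are treated as cache-line numbers. An analysis that cannot determine which accesses hit must assume that every access misses, even though a typical trace is far cheaper.

```python
HIT_CYCLES = 1    # assumed hit latency
MISS_CYCLES = 20  # assumed miss latency


def simulate_direct_mapped(trace, num_lines):
    """Cycles actually spent on memory accesses for one concrete trace."""
    lines = [None] * num_lines
    cycles = 0
    for addr in trace:
        index = addr % num_lines
        if lines[index] == addr:
            cycles += HIT_CYCLES
        else:
            cycles += MISS_CYCLES
            lines[index] = addr  # replace whatever was in this line
    return cycles


def all_miss_bound(trace):
    """Safe but pessimistic bound: assume no access can be proven a hit."""
    return MISS_CYCLES * len(trace)


if __name__ == "__main__":
    friendly = [0, 1] * 4     # two lines that never conflict
    conflicting = [0, 4] * 4  # two lines that map to the same index
    print(simulate_direct_mapped(friendly, 4), all_miss_bound(friendly))
    print(simulate_direct_mapped(conflicting, 4), all_miss_bound(conflicting))
```

The conflicting trace shows that the all-miss bound can actually be reached, so the analysis may only tighten it by modelling the cache contents at each access, which is where the cost comes from.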



#### Generalized WCET reduction process




#### Examples

Adaptable and reconfigurable systems could implement predictable mechanisms to minimize WCET.

For example, code can be accelerated by:

- Co-processor modules, loaded by run-time reconfiguration.
- Scratchpad memory, loaded at run-time.
- Custom microprograms, loaded at run-time.




#### Example: custom microprograms




#### Results: custom microprograms




#### General problems

All implementations will be subject to these limits:

1. Instruction level parallelism (ILP) limit.
2. Load cost limit.
3. General purpose architecture limit.



#### ILP limit

Applies if you want to reduce the WCET of code written in a conventional programming language (e.g. C).



April 21st 2008

3

Caused by:

1. Control flow (branches).
   - Addressed by dynamic speculation.
   - Addressed by static speculation.
2. Data dependences introduced by the compiler.
   - Partly addressed by memory speculation and register renaming.
   - Real solution: improved programming languages (e.g. support for vectorisation).
3. Data dependences introduced by the problem requirements.
   - Unavoidable.
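The effect of unavoidable data dependences can be sketched by computing the critical path of a dependence graph. The graph below is a hypothetical example (the expression `((a*b) + (c*d)) * e + f` with unit-latency operations), not one from the talk. However many functional units are available, execution cannot be shorter than the longest dependence chain, so the achievable parallelism is bounded by the operation count divided by the critical path length.

```python
from functools import lru_cache

# Hypothetical dependence graph: operation -> operations it depends on.
# t1 = a*b; t2 = c*d; t3 = t1 + t2; t4 = t3 * e; t5 = t4 + f
DEPS = {
    "t1": [],
    "t2": [],
    "t3": ["t1", "t2"],
    "t4": ["t3"],
    "t5": ["t4"],
}


@lru_cache(maxsize=None)
def depth(op):
    """Length of the longest dependence chain ending at `op` (unit latency)."""
    return 1 + max((depth(d) for d in DEPS[op]), default=0)


def critical_path():
    """Minimum number of steps, even with unlimited functional units."""
    return max(depth(op) for op in DEPS)


def ilp_limit():
    """Upper bound on the average number of operations per step."""
    return len(DEPS) / critical_path()


if __name__ == "__main__":
    print("critical path:", critical_path())
    print("ILP limit:", ilp_limit())
```

Here only `t1` and `t2` can run in parallel, so five operations still need four steps.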



#### Load cost limit

Applies if you want to load instructions (or data) into a scratchpad (or FPGA). Necessary to make best use of limited on-chip memory.





April 21st 2008

15 / 19

Caused by:

1. Limited space in on-chip memory.
   - Addressed by dynamic loading.
   - Addressed by static loading (overlaying).
2. Cost of transferring data.
   - Addressed by burst transfers.
   - Addressed by compression.
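The trade-off behind the load cost limit can be sketched as a break-even calculation. The cost model and all parameter values below (burst setup cycles, bytes transferred per cycle, cycles saved per execution) are assumptions for illustration only: loading a block into the scratchpad pays off only if the WCET it saves over its executions exceeds the cost of the transfer.

```python
def load_cycles(size_bytes, setup_cycles, bytes_per_cycle):
    """Cycles to copy a block on-chip with one burst transfer (assumed model)."""
    transfer = -(-size_bytes // bytes_per_cycle)  # ceiling division
    return setup_cycles + transfer


def net_wcet_saving(size_bytes, executions, saving_per_execution,
                    setup_cycles=10, bytes_per_cycle=4):
    """Cycles saved by running from the scratchpad, minus the load cost."""
    saved = executions * saving_per_execution
    return saved - load_cycles(size_bytes, setup_cycles, bytes_per_cycle)


if __name__ == "__main__":
    # A 256-byte loop body executed 100 times: loading is worthwhile.
    print(net_wcet_saving(256, executions=100, saving_per_execution=5))
    # The same block executed once: the load costs more than it saves.
    print(net_wcet_saving(256, executions=1, saving_per_execution=5))
```

Under this model, burst transfers reduce the per-byte cost and compression reduces `size_bytes`, but neither removes the load cost entirely.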


#### General purpose architecture limit

A choice: either,

- Write programs in a conventional programming language for a general purpose architecture,
  - Limited by ILP.

or,

- Write programs that use application-specific hardware.
  - WCET reduction search is difficult (co-design).
  - Manual hardware design may be required.



1. Conventional languages versus specialist languages.
2. Loading costs versus on-chip memory sizes.
3. General-purpose versus application-specific architectures.

My own work has explored the first two. There is plenty of scope for future work.

Questions?

