CS 201 System Programming and Architecture

#### Cache Cont'd



Slides adapted from CS205@PSU(Prof. Li) / CS15-213 @CMU

#### **Recall: Cache Org. and Read**



- Check if any line in set has
  - matching tag
- Yes + line valid: hit
- Locate data starting at offset

Address of word:



#### **Understanding Cache Parameters**

Question: What different values of S, E, B imply?

- E = 1 (Direct mapped caches) "assigned seating"
- S = 1 (Fully associative caches) "open seating"
  - Cache has a single set; a memory address can map to any cache line.
- General cases are called Set associative caches. "assigned coach"
  - A memory address can map to any cache line within a fixed set.

# Direct Mapped Cache (E=1)

#### • Each memory address maps to a single cache line.

• Advantage: Simplest to implement

Example: Cache block size 8 bytes





# Direct Mapped Cache (E=1, B=8)

• Each memory address maps to a single cache line.

• Advantage: Simplest to implement

Example: Cache block size 8 bytes



No match? Then old line is evicted and replaced

| 0x00 |
|------|
| 0x11 |
| 0x22 |
| 0x33 |
| 0x44 |
| 0x55 |
| 0x66 |
| 0x77 |
| 0x88 |
| 0x99 |
| OxAA |
| OxBB |
| OxCC |
| OxDD |
| OxEE |
| OxFF |
|      |

#### Cache (S,E,B) = (4,1,1)

Index Tag Data

| 00 |  |
|----|--|
| 01 |  |
| 10 |  |
| 11 |  |

m = 4 bits of address = 16 bytes of memory

- S = 4 sets
- E = 1 line per set

B = 1 byte per line/block

 $s = \log S = 2$  Index is 2 LSB of address

 $b = \log B = o$ 

$$t=4 - (s+b) = 2$$
 Tag is 2 MSB of address







| 0000                 | 0x00                 |
|----------------------|----------------------|
| 0001                 | 0x11                 |
| 0010                 | 0x22                 |
| 0011                 | 0x33                 |
| 0100                 | 0x44                 |
| 0101                 | <b>0</b> x55         |
| 0110                 | 0x66                 |
| 0111                 | 0x77                 |
| 1000                 | 0x88                 |
| 1001                 | 000                  |
| TOOT                 | 0x99                 |
| 1010                 | Ox99<br>OxAA         |
|                      |                      |
| 1010                 | OxAA                 |
| 1010<br>1011         | OxAA<br>OxBB         |
| 1010<br>1011<br>1100 | OxAA<br>OxBB<br>OxCC |

#### Cache (S,E,B) = (2,1,2)

Index Tag Dataı Datao



b =

**s** =

t = 4 - (s+b) =

| 0000                         | 0x00                         |
|------------------------------|------------------------------|
| 0001                         | 0x11                         |
| 0010                         | 0x22                         |
| 0011                         | 0x33                         |
| 0100                         | 0x44                         |
| 0101                         | <b>0x55</b>                  |
| 0110                         | 0x66                         |
| 0111                         | 0x77                         |
| 1000                         |                              |
| 1000                         | 0x88                         |
| 1000                         | 0x88<br>0x99                 |
|                              |                              |
| 1001                         | 0x99                         |
| 1001<br>1010                 | 0x99<br>0xAA                 |
| 1001<br>1010<br>1011         | 0x99<br>0xAA<br>0xBB         |
| 1001<br>1010<br>1011<br>1100 | 0x99<br>0xAA<br>0xBB<br>0xCC |

#### Cache (S,E,B) = (2,1,2)

Index Tag Dataı Datao



b = 1

S = 1

t = 4 - (s+b) = 2

| t (2 bits) | s (1 bit) | b (1 bit) |
|------------|-----------|-----------|
|------------|-----------|-----------|

0x00 0000 0001 **O**x1] 0x22 0010 0011 0x33 0100 0x44 0101 **0x55** 0110 0x66 0111 0x77 1000 Ox88 1001 0x99 1010 OxAA 1011 OxBB 1100 OxCC 0xDD 1101 OxEE 1110 1111 OxFF



#### **Direct-Mapped Cache Simulation**



M=16 byte addresses, B=2 bytes/block,

S=4 sets, E=1 Blocks/set

Address trace (reads, one byte per read):

|       | v | Tag | Block  |
|-------|---|-----|--------|
| Set o | 1 | 0   | M[0-1] |
| Set 1 |   |     |        |
| Set 2 |   |     |        |
| Set 3 |   |     |        |



Consider the following code that runs on a system with a cache of the form (S,E,B,m) = (512,1,32,32)

int array[4096];

for (i=0; i<4096; i++) sum += array[i];

Assuming sequential allocation of the integer array.

What is the maximum number of integers from the array that are stored in the cache at any point in time?

# Today

Case study: direct mapped cache

Case study: set associative cache

Case study: fully associative cache

More discussions on Cache

#### **Recall:** Cache Parameters

Question: What different values of S, E, B imply?

- E = 1 (Direct mapped caches) "assigned seating"
- S = 1 (Fully associative caches) "open seating"
  - Cache has a single set; a memory address can map to any cache line.
- General cases are called Set associative caches. "assigned coach"
  - A memory address can map to any cache line within a fixed set.

# **Set Associative Caches**

- Each memory address is assigned to a particular set in the cache, but not to a specific block in the set.
  - Advantage: Reduce conflict misses
- Each set can store multiple distinct blocks/lines
  - If each set has E blocks, the cache is called an E-way associative cache
  - Set sizes range from 1 (direct-mapped) to whole cache (fully associative)





E = 2: Two lines per set

Cache block size 8 bytes

 Address of short int:

 t bits
 0...01
 100

 v
 tag
 01234567
 v
 tag
 01234567

E = 2: Two lines per set



E = 2: Two lines per set



E = 2: Two lines per set



E = 2: Two lines per set



No match? One line in set is selected for eviction and replacement Replacement policies: random, least recently used (LRU), ...

# 2-Way Set Associative Cache Simulation



M=16 byte addresses, B=2 bytes/block,

S=2 sets, E=2 Blocks/set

Address trace (reads, one byte per read):



0 [00002], min  
1 [00012], hit m[1]  
7 [01112], min 
$$(72)$$
  
8 [10002],  $10 \neq 00$  miss  
0 [00002]

Consider a 2-way set associative cache (S,E,B,m) = (8,2,4,13)

• Excluding the overhead of the tags and valid bits, what is the capacity of this cache?  $C = \int x E x B = 8 + 2 \times 4 = 6 \times 5$ 

Consider a 2-way set associative cache (S,E,B,m) = (8,2,4,13)

- Excluding the overhead of the tags and valid bits, what is the capacity of this cache?
- Draw a figure of this cache

Consider a 2-way set associative cache (S,E,B,m) = (8,2,4,13)

- Excluding the overhead of the tags and valid bits, what is the capacity of this cache?
- Draw a figure of this cache



Consider a 2-way set associative cache (S,E,B,m) = (8,2,4,13)

Excluding the overhead of the tags and valid bits, what is the capacity of this cache?



Iraw a diagram that shows the parts of the address that are used to determine the cache tag, cache set index, and cache block offset

Consider a 2-way set associative cache (S,E,B,m) = (8,2,4,13)

- Excluding the overhead of the tags and valid bits, what is the capacity of this cache?
- Draw a figure of this cache



In the cache tag, cache set index, and cache block offset to the set index index index.



Consider a 2-way set associative cache (S,E,B,m) = (8,2,4,13)

Note: Invalid cache lines are left blank

- What is the block offset of this address?  $(0) \rightarrow (0)$
- What is the set index of this address?

Consider an access to 0x0E34.

- What is the cache tag of this address?
- Does this access hit or miss in the cache?
- What value is returned if it is a hit?

| t 1 | ag | Data0-3    |    |    |    |  |  |
|-----|----|------------|----|----|----|--|--|
| D   | 09 | 86         | 30 | 3F | 10 |  |  |
| 1   | 45 | 60         | 4F | Е0 | 23 |  |  |
| 2   | EB |            |    |    |    |  |  |
| 3   | 06 |            |    |    |    |  |  |
| 4   | C7 | 06         | 78 | 07 | C5 |  |  |
| 5   | 71 | UВ         | DE | 18 | 4B |  |  |
| 6   | 91 | <b>A</b> 0 | в7 | 26 | 2D |  |  |
| 7   | 46 |            |    |    |    |  |  |

| Tag | Data0-3 |    |    |    |   |  |
|-----|---------|----|----|----|---|--|
| 00  |         |    |    |    |   |  |
| 38  | 00      | BC | 0в | 37 |   |  |
| 0в  |         |    |    |    |   |  |
| 32  | 12      | 08 | 7B | AD |   |  |
| 05  | 40      | 67 | C2 | 3в | 1 |  |
| 6E  |         |    |    |    |   |  |
| FO  |         |    |    |    |   |  |
| DE  | 12      | C0 | 88 | 37 | 1 |  |

# Today

Case study: direct mapped cache

Case study: set associative cache

• Case study: fully associative cache

More discussions on Cache

#### Fully Associative Cache (S=1)

- In A fully associative cache permits data to be stored in any cache block, instead of forcing each memory address into one particular block.
  - There is only 1 set (i.e. S=1 and s=0) C = E \* B
- When data is fetched from memory, it can be placed in any unused block of the cache.
- Iliminates conflict misses between two or more memory addresses which map to a single cache block.

Consider the following fully associative cache: (S,E,B,m) = (1,4,1,4)

• Derive values for number of address bits used for the tag (t), the index (s) and the block offset (b)

• Draw a diagram of which bits of the address are used for the tag, the set index and the block offset



b=0

Draw a diagram of the cache

Consider the following fully associative cache: (S,E,B,m) = (1,4,1,4)

 Derive values for number of address bits used for the tag (t), the index (s) and the block offset (b)

s = 0 b = 0 t=4

Iraw a diagram of which bits of the address are used for the tag, the set index and the block offset



Draw a diagram of the cache

|     | Tag    | Data   | Tag    | Data   | Tag    | Data   | Tag    | Data   |
|-----|--------|--------|--------|--------|--------|--------|--------|--------|
| s=0 | 4 bits | 1 byte |



Cache (S,E,B,m) = (1,4,1,4)Tag Tag Tag Tag Data Data Data 0000 0110 OXOD 0×0° OX () 330 Access pattern mins niss niss hit  $\bigcirc$ 

0110

0001

0110

0010

0110

•••

Data



#### Main memory

| 0000 | 0x00 |
|------|------|
| 0001 | 0x11 |
| 0010 | 0x22 |
| 0011 | 0x33 |
| 0100 | 0x44 |
| 0101 | 0x55 |
| 0110 | 0x66 |
| 0111 | 0x77 |
| 1000 | 0x88 |
| 1001 | 0x99 |
| 1010 | 0xAA |
| 1011 | 0xBB |
| 1100 | 0xCC |
| 1101 | 0xDD |
| 1110 | OxEE |
| 1111 | OxFF |

| Cache | (S,E,B,m) = (1,4,1,4) |
|-------|-----------------------|
|-------|-----------------------|

| Tag | Data | Tag | Data | Tag | Data | Tag | Data |
|-----|------|-----|------|-----|------|-----|------|
|     |      |     |      |     |      |     |      |

#### Access pattern

| 0000 |
|------|
| 0110 |
| 0001 |
| 0110 |
| 0010 |
| 0110 |

...

Data Miss rate after first two cold misses = 0%



# The Price of Full Associativity

- :( A fully associative cache is expensive to implement.
- No index field in the address anymore
- Entire address used as the tag, increasing the total cache size.
- In the cache, so we must check the tag of every cache block. That's a lot of comparators!



# Today

Case study: direct mapped cache

Case study: set associative cache

Case study: fully associative cache

More discussions on Cache

# Cache Line Replacement

Any empty block in the correct set may be used for storing data. If there are no empty blocks, the cache will attempt to replace one.

#### **Replacement Algorithms**

- Most common: Least Recently Used (LRU)
  - If a block hasn't been used in a while, it's less likely needed again anytime soon.
- First in first out (FIFO): Replace block that has been in cache longest
- Least frequently used: Replace block which has had fewest hits

· Random (arbitvary

#### Data Writes in Memory Hierarchy

On a computer system, multiple copies of data exist, e.g. L1, L2, main, and disk.

Question: What happens when data is written to memory through cache? Answer: Cache needs write policies as well.

- Write-hit: Block being written is in cache
- Write-miss: Block being written is NOT in cache

### Write-Hit Policies

#### Write-through

- Writes go immediately to main memory (as well as cache)
- Can incur lots of traffic, slows down writes

#### Write-back

- Writes initially made in cache only
- Defer write to memory until replacement of line
- Need a dirty bit (line different from memory or not)
- Can cause inconsistency among caches

# Write-Miss Policies

#### Write-allocate

- When write to address misses, load into cache
- Update line in cache
- Good if more writes to the location follow

#### No-write-allocate

- When write to address misses, write straight to memory. Do not load into cache!
- Good if there are no subsequent reads and writes to address

Typically

- Write-through + No-write-allocate
- Write-back + Write-allocate

# **Cache Design Summary**

- Cache size
- Associativity
- Block size
- Replacement algorithms
- Write policies
- Number of caches