Reason

Caches

split: separate instruction cache and data cache
- (+) parallel access possible (instruction fetch and data access at the same time)
- (+) each cache can be placed physically closer to the unit accessing it
unified: one cache for both
inclusive: all blocks in the higher-level cache are also stored in the lower-level cache
exclusive: no block in the higher-level cache is stored in the lower-level cache
non-exclusive: blocks in the higher-level cache may or may not also be stored in the lower-level cache

the index selects the set
compare the tag of every entry in that set with the tag of the address
- tag found -> hit
- tag not found -> miss
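
The lookup above can be sketched as a small set-associative cache model that stores only tags. This is a sketch under assumptions: the class name and the parameters (number of sets, ways, line size, all powers of two) are hypothetical, and the replacement on a full set is plain FIFO just to keep the example short.

```python
class SetAssocCache:
    """Set-associative cache model holding only tags (no data)."""

    def __init__(self, num_sets: int, ways: int, line_size: int) -> None:
        # num_sets and line_size are assumed to be powers of two
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.offset_bits = line_size.bit_length() - 1
        self.index_bits = num_sets.bit_length() - 1
        self.sets: list[list[int]] = [[] for _ in range(num_sets)]

    def split(self, addr: int) -> tuple[int, int, int]:
        """Split an address into (tag, index, offset)."""
        offset = addr & (self.line_size - 1)
        index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def access(self, addr: int) -> bool:
        """Return True on a hit, False on a miss (the line is filled on a miss)."""
        tag, index, _ = self.split(addr)
        entries = self.sets[index]   # the index selects the set
        if tag in entries:           # compare with the tag of every entry in the set
            return True
        if len(entries) == self.ways:
            entries.pop(0)           # set full: evict the oldest fill (FIFO, for simplicity)
        entries.append(tag)
        return False
```

With 4 sets and 16-byte lines, the low 4 address bits are the offset, the next 2 bits the index and the rest the tag, so `0x00` and `0x04` fall into the same line, while `0x00`, `0x40` and `0x80` compete for set 0.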

Physically Indexed Physically Tagged (PIPT): index and tag taken from the physical memory address
- virtual-to-physical address translation must be done beforehand -> slow
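Virtually Indexed Virtually Tagged (VIVT): index and tag taken from the virtual memory address
- (+) faster, no address translation needed before the cache access
- (-) homonyms: a virtual address is only meaningful within one process, so the same virtual address can refer to different physical addresses in different processes
- (-) synonyms: different virtual addresses may point to the same physical address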
Virtually Indexed Physically Tagged (VIPT): index taken from the virtual memory address, tag taken from the physical memory address
- (+) address translation happens in parallel with the cache lookup
- (+) homonyms prevented
- (-) synonyms possible
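
Whether synonyms actually occur in a VIPT cache depends on its geometry. If the index and offset bits all lie inside the page offset (one way of the cache is no larger than a page), the virtual index equals the physical index, so synonyms land in the same set and are told apart by the physical tag. A minimal sketch; the function name and the example sizes are hypothetical.

```python
def vipt_synonyms_possible(cache_size: int, ways: int, page_size: int) -> bool:
    """True if a VIPT cache with this geometry can hold synonyms in different sets."""
    # index + offset bits address exactly one way of the cache
    way_size = cache_size // ways
    # if a way is larger than a page, some index bits come from the virtual
    # page number, which can differ between two synonyms
    return way_size > page_size
```

For example, with 4 KiB pages a 32 KiB 8-way cache (4 KiB per way) is synonym-free, whereas a 32 KiB 4-way cache (8 KiB per way) is not.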

write-back:
- write data only to the cache | set the dirty bit
- upon eviction: write dirty data back to main memory
write-through: write data to the cache and to main memory immediately
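
The difference in memory traffic can be sketched by counting main-memory writes for a sequence of stores. Assumptions: every store hits in an unbounded cache, and all dirty lines are flushed once at the end; the function name is hypothetical.

```python
def memory_writes(store_addrs: list[int], line_size: int, policy: str) -> int:
    """Count main-memory writes caused by a sequence of store addresses."""
    if policy == "write-through":
        return len(store_addrs)          # every store also goes to main memory
    if policy == "write-back":
        dirty = set()
        for addr in store_addrs:
            dirty.add(addr // line_size) # only mark the line as dirty
        return len(dirty)                # each dirty line is written back once
    raise ValueError(f"unknown policy: {policy}")
```

Five stores to addresses 0, 4, 8, 12 and 64 with 64-byte lines cause five memory writes with write-through, but only two (one per dirty line) with write-back.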

write-allocate: write miss -> load the requested line into the cache
no-write-allocate: write miss -> write data directly to memory | don't allocate a cache line
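
The effect on later accesses can be sketched by counting misses in an unbounded cache. The function name and the `("R"/"W", address)` operation format are assumptions made for illustration.

```python
def count_misses(ops: list[tuple[str, int]], line_size: int, write_allocate: bool) -> int:
    """Count misses for ("R" | "W", address) operations in an unbounded cache."""
    cached = set()
    misses = 0
    for kind, addr in ops:
        line = addr // line_size
        if line in cached:
            continue                     # hit
        misses += 1
        if kind == "R" or write_allocate:
            cached.add(line)             # load the line into the cache
        # no-write-allocate write miss: data goes straight to memory, line stays uncached
    return misses
```

For a write to address 0 followed by reads of addresses 4 and 8 (64-byte lines), write-allocate gives one miss, while no-write-allocate gives two, because the first read misses again.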

Least-Recently-Used (LRU): evict the cache line that has not been accessed for the longest time
- non-negligible hardware overhead to store the age of every cache line
Least-Frequently-Used (LFU): evict the cache line used least often
Most-Recently-Used (MRU): evict the cache line that was accessed most recently
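
The three policies can be compared with a small software model of a fully associative cache. This is a sketch, not a hardware design: instead of per-line age counters it keeps lines in an `OrderedDict` (least recently used first) together with a use count since insertion; the function name is hypothetical.

```python
from collections import OrderedDict


def simulate(accesses: list[str], capacity: int, policy: str) -> list[str]:
    """Fully associative cache of `capacity` lines; returns the evicted lines in order."""
    cache: OrderedDict[str, int] = OrderedDict()  # line -> use count; order: LRU ... MRU
    evicted = []
    for line in accesses:
        if line in cache:
            cache[line] += 1            # hit: count the use (for LFU)
            cache.move_to_end(line)     # and mark it as most recently used
            continue
        if len(cache) == capacity:
            if policy == "LRU":
                victim = next(iter(cache))                  # front: least recently used
            elif policy == "MRU":
                victim = next(reversed(cache))              # back: most recently used
            elif policy == "LFU":
                victim = min(cache, key=cache.__getitem__)  # fewest uses; ties -> least recent
            else:
                raise ValueError(f"unknown policy: {policy}")
            del cache[victim]
            evicted.append(victim)
        cache[line] = 1                 # miss: insert as most recently used
    return evicted
```

For the access sequence A, A, B, C with two lines, LRU evicts A, while LFU (A was used twice) and MRU (B was used last) both evict B.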

Memory-Mapped I/O

each device control register is assigned a unique memory address
no real memory is assigned to these addresses
(+) fast access to device inputs / outputs
(+) device registers can be used directly as operands, without first copying them into CPU registers
(+) no special I/O instructions needed, ordinary load/store instructions are used
(+) access protection for device I/O can be enforced via the memory mapping (e.g. by not mapping device pages into a user process's address space)
(+) testing device registers is fast, as the comparison can be done directly on the memory address
(-) caching is problematic | must be disabled for memory-mapped addresses
(-) memory addressing on systems with multiple buses is complicated
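
The idea can be sketched as an address decoder that routes ordinary loads and stores either to RAM or to device registers, plus a naive read cache showing why such addresses must not be cached. Everything here is hypothetical: the device, its register layout and the addresses `0xFF00`/`0xFF04` are assumptions for illustration.

```python
class Device:
    """Hypothetical device with two control registers (layout assumed)."""

    def __init__(self) -> None:
        self.status = 0  # bit 0 = data ready (assumed)
        self.data = 0


class MemoryBus:
    """Routes ordinary loads/stores either to RAM or to device registers."""
    DEV_STATUS = 0xFF00  # hypothetical addresses reserved for the device;
    DEV_DATA = 0xFF04    # no RAM backs these addresses

    def __init__(self, device: Device) -> None:
        self.ram: dict[int, int] = {}
        self.dev = device

    def load(self, addr: int) -> int:
        if addr == self.DEV_STATUS:
            return self.dev.status
        if addr == self.DEV_DATA:
            return self.dev.data
        return self.ram.get(addr, 0)

    def store(self, addr: int, value: int) -> None:
        if addr == self.DEV_STATUS:
            self.dev.status = value
        elif addr == self.DEV_DATA:
            self.dev.data = value
        else:
            self.ram[addr] = value


class CachingBus(MemoryBus):
    """Naive read cache in front of the bus: shows why MMIO must be uncached."""

    def __init__(self, device: Device) -> None:
        super().__init__(device)
        self.cache: dict[int, int] = {}

    def load(self, addr: int) -> int:
        if addr not in self.cache:
            self.cache[addr] = super().load(addr)
        return self.cache[addr]  # may be stale if the device changed the register
```

If the device sets its status register to 1 after the first read, a `CachingBus` keeps returning the cached 0, so a polling loop would never see the device become ready, while the uncached `MemoryBus` returns 1.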

Direct Memory Access (DMA)

a DMA controller performs the data transfer; the CPU only initializes the transfer
saves CPU time
the DMA controller signals a finished transfer with an interrupt
internal device buffer: required to store data while the DMA controller waits for the bus to become available
(-) in embedded systems the added overhead might not be worth it
(-) fast CPU, slow DMA controller -> little to no benefit, the CPU could do the transfer faster itself
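
The division of labour can be sketched as a toy DMA controller that the CPU programs once and that then moves one byte per bus cycle, raising a completion "interrupt" (modelled as a callback) after the last byte. The class, its methods and the one-byte-per-cycle timing are assumptions for illustration, not a real controller interface.

```python
from typing import Callable, Optional


class DMAController:
    """Hypothetical DMA controller model: one byte moved per bus cycle."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.src = self.dst = self.remaining = 0
        self.irq: Optional[Callable[[], None]] = None

    def program(self, src: int, dst: int, count: int, irq: Callable[[], None]) -> None:
        # the CPU's only job: write source, destination, count and the handler
        self.src, self.dst, self.remaining, self.irq = src, dst, count, irq

    def tick(self) -> None:
        """Advance one bus cycle; raise the completion interrupt after the last byte."""
        if self.remaining == 0:
            return
        self.memory[self.dst] = self.memory[self.src]
        self.src += 1
        self.dst += 1
        self.remaining -= 1
        if self.remaining == 0 and self.irq is not None:
            self.irq()


# usage: the CPU programs the transfer, then keeps doing its own work
memory = bytearray(b"hello" + bytes(11))
done: list[bool] = []
dma = DMAController(memory)
dma.program(src=0, dst=8, count=5, irq=lambda: done.append(True))
cpu_work = 0
while not done:
    dma.tick()     # the DMA controller moves data in the background
    cpu_work += 1  # meanwhile the CPU executes other instructions
```

After the loop, `memory[8:13]` holds `b"hello"` and the CPU has done five units of its own work instead of copying bytes.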

word-at-a-time (cycle stealing): the DMA controller occasionally steals the bus from the CPU to perform short transfers
block or burst mode: a series of transfers is executed at once
fly-by mode: the DMA controller instructs the device to access memory directly (no intermediate copy)
DMA access itself: the DMA controller reads/stores the words itself -> can perform device-to-device and memory-to-memory copies
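
The modes can be sketched as bus schedules and bus-cycle counts. Assumptions: the CPU wants the bus in every cycle, each DMA word needs one bus cycle per transfer, and word-at-a-time alternates 1:1 between DMA and CPU; real cycle-stealing ratios vary. The function names are hypothetical.

```python
def bus_schedule(cpu_cycles: int, dma_words: int, mode: str) -> str:
    """Bus owner per cycle: 'C' = CPU, 'D' = DMA controller."""
    if mode == "burst":
        # block/burst mode: DMA keeps the bus for the whole block
        return "D" * dma_words + "C" * cpu_cycles
    if mode == "word":
        # word-at-a-time: DMA steals single cycles, here alternating 1:1 (assumed)
        out = []
        while dma_words or cpu_cycles:
            if dma_words:
                out.append("D")
                dma_words -= 1
            if cpu_cycles:
                out.append("C")
                cpu_cycles -= 1
        return "".join(out)
    raise ValueError(f"unknown mode: {mode}")


def dma_bus_cycles(words: int, fly_by: bool) -> int:
    """Bus cycles needed to move `words` words."""
    # fly-by: device <-> memory directly, one bus cycle per word;
    # via the DMA controller: read into the controller, then write out, two cycles
    return words * (1 if fly_by else 2)
```

With 3 CPU cycles and 2 DMA words, burst mode yields `DDCCC` (the CPU waits until the block is done), while word-at-a-time yields `DCDCC`. Going through the DMA controller costs an extra bus cycle per word compared with fly-by, which is the price for being able to do device-to-device and memory-to-memory copies.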