READ operations dominate processor cache accesses, since all instruction fetches are READs and most instructions do not WRITE to memory. When the address of the block to be READ is available, the tag is read, and if it is a HIT the data is read from the cache.
In case of a miss, the READ policies are:
- Read Through - the block is read directly from main memory to the CPU; the cache is not updated.
- No Read Through - the block is read from main memory into the cache, and then from the cache to the CPU, so the cache is updated as well. (A sketch contrasting the two policies follows this list.)
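As a rough illustration, here is a minimal direct-mapped cache sketch in C contrasting the two read-miss policies. The structure, sizes, and names (`cache_read`, `LINES`, `LINE_SIZE`) are all hypothetical, chosen only to make the policies concrete:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical direct-mapped cache: 4 lines of 16 bytes each.
 * All names and sizes here are illustrative, not a real design. */
#define LINES      4
#define LINE_SIZE  16

typedef struct {
    int      valid;
    uint32_t tag;
    uint8_t  data[LINE_SIZE];
} Line;

static Line    cache[LINES];
static uint8_t main_mem[1024];          /* stand-in for main memory */

/* Read one byte. read_through != 0 selects the Read Through policy. */
uint8_t cache_read(uint32_t addr, int read_through)
{
    uint32_t idx = (addr / LINE_SIZE) % LINES;
    uint32_t tag = addr / (LINE_SIZE * LINES);
    Line *ln = &cache[idx];

    if (ln->valid && ln->tag == tag)    /* HIT: serve from the cache */
        return ln->data[addr % LINE_SIZE];

    if (read_through)                   /* MISS, Read Through:       */
        return main_mem[addr];          /* bypass the cache entirely */

    /* MISS, No Read Through: fill the line first, then serve the
     * CPU from the cache, so the next access to this line hits.   */
    memcpy(ln->data, &main_mem[addr - addr % LINE_SIZE], LINE_SIZE);
    ln->valid = 1;
    ln->tag   = tag;
    return ln->data[addr % LINE_SIZE];
}

int main(void)
{
    main_mem[42] = 0xAB;
    printf("read through:    %02X\n", cache_read(42, 1));
    printf("no read through: %02X\n", cache_read(42, 0)); /* fills line */
    printf("second access:   %02X\n", cache_read(42, 0)); /* now a hit */
    return 0;
}
```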
A miss is comparatively slow, because it requires the data to be transferred from main memory to the CPU, which incurs a delay (main memory is much slower than cache memory) as well as the overhead of recording the new data in the cache before it is delivered to the processor. To take advantage of Locality of Reference, the CPU copies data into the cache whenever it accesses an address not present in the cache. Since the system is likely to access that same location again shortly, having that data in the cache saves wait states. Thus cache memory handles the temporal aspect of memory access, but not the spatial aspect.
Caching individual memory locations will not speed up program execution if the program constantly accesses consecutive memory locations (Spatial Locality of Reference). To solve this problem, most caching systems read several consecutive bytes from memory when a cache miss occurs. 80x86 CPUs, for example, read between 16 and 64 bytes at a shot (depending upon the CPU) on a cache miss. If the program needs 16 bytes, why read them in a block rather than one at a time as needed? As it turns out, most memory chips available today have special modes that let you quickly access several consecutive memory locations on the chip. The cache exploits this capability to reduce the average number of wait states needed to access memory.
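A back-of-the-envelope sketch (sizes and names are illustrative) showing why fetching a whole line helps a sequential scan: with a 16-byte line, only one access in sixteen misses.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: count misses when walking consecutive addresses,
 * assuming the cache fetches a whole 16-byte line per miss (as the
 * 80x86 example above does with 16-64 byte lines). */
#define LINE_SIZE 16

int main(void)
{
    uint32_t last_line = UINT32_MAX;   /* no line cached yet */
    int hits = 0, misses = 0;

    for (uint32_t addr = 0; addr < 256; addr++) {   /* sequential scan */
        uint32_t line = addr / LINE_SIZE;           /* line (block) number */
        if (line == last_line) {
            hits++;                     /* spatial locality pays off */
        } else {
            misses++;                   /* one burst fetch per line */
            last_line = line;
        }
    }
    /* 256 accesses -> 16 misses, 240 hits: one miss per 16-byte line */
    printf("hits=%d misses=%d\n", hits, misses);
    return 0;
}
```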
Cache WRITE operations are not as simple. Modifying a block cannot begin until the tag is checked to see whether the address is a hit. Also, the processor specifies the size of the write, usually between 1 and 8 bytes; only that portion of the block may be changed. In contrast, reads can access more bytes than necessary without a problem.
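A minimal sketch of a write-hit path under these constraints; `cache_write`, the line layout, and the sizes are assumptions made for illustration:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical line: the write touches only `size` bytes (1-8) at
 * `offset`, and nothing is modified until the tag check succeeds. */
#define LINE_SIZE 16

typedef struct {
    int      valid;
    uint32_t tag;
    uint8_t  data[LINE_SIZE];
} Line;

/* Returns 1 on a write hit, 0 on a miss (nothing modified). */
int cache_write(Line *ln, uint32_t tag, uint32_t offset,
                const uint8_t *src, uint32_t size)
{
    if (!ln->valid || ln->tag != tag)      /* tag check must come first  */
        return 0;                          /* miss: defer to miss policy */
    memcpy(&ln->data[offset], src, size);  /* modify only that portion   */
    return 1;
}

int main(void)
{
    Line ln = { .valid = 1, .tag = 7 };
    uint8_t word[4] = { 1, 2, 3, 4 };

    /* 4-byte write at offset 8 of a 16-byte line: bytes 0-7 and 12-15
     * are untouched, unlike a read, which may safely fetch the whole
     * line. */
    if (cache_write(&ln, 7, 8, word, sizeof word))
        printf("hit: byte 8 is now %d\n", ln.data[8]);
    return 0;
}
```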
The cache WRITE policies on a write hit often distinguish cache designs (a sketch contrasting the two follows the lists below):
- Write Through - the modified data is written to both the block in the cache and the block in main memory.
Advantage:
1. READ miss never results in writes to main memory.
2. Easy to implement
3. Main Memory always has the most current copy of the data (consistent)
Disadvantage:
1. WRITE operations are slower, since both main memory and cache memory must be updated.
2. Every write needs a main memory access, and as a result uses more memory bandwidth.
- Write Back - the modified data is first written only to the block in the cache memory. The modified cache block is written to main memory only when it is replaced. To reduce the frequency of writing back blocks on replacement, a dirty bit (a status bit) is commonly used to indicate whether the block is dirty (modified while in the cache) or clean (not modified). If the block is clean, it is not written back to main memory when it is replaced.
Advantage:
1. WRITEs occur at the speed of the cache memory.
2. Multiple WRITEs within a block require only one WRITE to main memory, and as a result use less memory bandwidth.
Disadvantage:
1. Harder to implement
2. Main memory is not always consistent with the cache; reads that cause a replacement may trigger writes of dirty blocks to main memory.
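To make the contrast concrete, here is a minimal sketch of both write-hit policies and the dirty-bit flush on replacement. The layout and function names are illustrative, not a real cache design:

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal sketch contrasting the two write-hit policies; the layout
 * (one line, one backing array) is illustrative, not a real cache. */
#define LINE_SIZE 16

typedef struct {
    uint32_t tag;
    int      valid;
    int      dirty;                 /* used by write back only */
    uint8_t  data[LINE_SIZE];
} Line;

static uint8_t main_mem[1024];

/* Write Through: update the cache line AND main memory on every hit,
 * so main memory always holds the current copy. */
void write_through(Line *ln, uint32_t addr, uint8_t val)
{
    ln->data[addr % LINE_SIZE] = val;
    main_mem[addr] = val;           /* every write costs a memory access */
}

/* Write Back: update only the cache line and mark it dirty; main
 * memory is updated later, when the line is replaced. */
void write_back(Line *ln, uint32_t addr, uint8_t val)
{
    ln->data[addr % LINE_SIZE] = val;
    ln->dirty = 1;                  /* remember: main memory is stale */
}

/* On replacement, a dirty line must be flushed; a clean one is not. */
void replace_line(Line *ln, uint32_t base_addr)
{
    if (ln->dirty) {
        for (int i = 0; i < LINE_SIZE; i++)
            main_mem[base_addr + i] = ln->data[i];
        ln->dirty = 0;
    }
    ln->valid = 0;                  /* line is now free for a new block */
}

int main(void)
{
    Line ln = { .tag = 0, .valid = 1 };

    write_back(&ln, 3, 0x11);       /* many writes, one line...        */
    write_back(&ln, 4, 0x22);
    printf("before flush: mem[3]=%02X (stale)\n", main_mem[3]);
    replace_line(&ln, 0);           /* ...flushed once, on replacement */
    printf("after flush:  mem[3]=%02X mem[4]=%02X\n",
           main_mem[3], main_mem[4]);
    return 0;
}
```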
On a write miss, there are two further options:
- Write Allocate - the memory block is first loaded into the cache from main memory, and then the write-hit action is performed on it.
- No Write Allocate - the block is modified directly in main memory and is not loaded into the cache. (A sketch contrasting the two write-miss policies follows.)
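A small sketch contrasting the two write-miss policies; the single-line cache, the function names, and the choice to write the cached copy in place (write-back style) after allocation are all assumptions made for illustration:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the two write-miss policies; a single direct-mapped line
 * and byte-sized writes keep the example small (illustrative only). */
#define LINE_SIZE 16

typedef struct {
    int      valid;
    uint32_t tag;
    uint8_t  data[LINE_SIZE];
} Line;

static uint8_t main_mem[1024];

void write_miss(Line *ln, uint32_t addr, uint8_t val, int write_allocate)
{
    if (write_allocate) {
        /* Write Allocate: load the block into the cache first... */
        memcpy(ln->data, &main_mem[addr - addr % LINE_SIZE], LINE_SIZE);
        ln->valid = 1;
        ln->tag   = addr / LINE_SIZE;
        /* ...then perform the write-hit action on the cached copy. */
        ln->data[addr % LINE_SIZE] = val;
    } else {
        /* No Write Allocate: modify main memory directly; the
         * cache is left untouched. */
        main_mem[addr] = val;
    }
}

int main(void)
{
    Line ln = { 0 };

    write_miss(&ln, 21, 0xCD, 1);   /* allocate: future reads will hit */
    printf("allocate:    cached=%d byte=%02X\n",
           ln.valid, ln.data[21 % LINE_SIZE]);

    ln.valid = 0;                   /* reset for the second policy */
    write_miss(&ln, 21, 0xEF, 0);   /* no allocate: cache stays cold */
    printf("no allocate: cached=%d mem=%02X\n", ln.valid, main_mem[21]);
    return 0;
}
```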
The data in main memory being cached may be changed by other entities, in which case the copy in the cache becomes out-of-date, or stale. Conversely, when the CPU updates the data in its cache, copies of that data in other caches become stale. Communication protocols between the cache managers that keep the data consistent are known as cache coherence protocols.
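As a toy illustration of the staleness problem and one way a write-invalidate style protocol addresses it (real protocols such as MESI are far more involved; everything below is hypothetical):

```c
#include <stdint.h>
#include <stdio.h>

/* Toy illustration of the staleness problem and a write-invalidate
 * style fix; real coherence protocols (e.g. MESI) are considerably
 * more involved. All structure here is hypothetical. */
typedef struct {
    int     valid;
    uint8_t copy;
} CachedByte;

static uint8_t    main_mem_byte = 5;
static CachedByte cpu[2] = { {1, 5}, {1, 5} };  /* both CPUs cache it */

/* CPU `id` writes the byte; invalidating the other cache's copy
 * prevents it from serving stale data on its next read. */
void coherent_write(int id, uint8_t val)
{
    cpu[id].copy = val;
    main_mem_byte = val;            /* write through, for simplicity */
    cpu[1 - id].valid = 0;          /* invalidate the peer's copy    */
}

uint8_t coherent_read(int id)
{
    if (!cpu[id].valid) {               /* stale copy was invalidated: */
        cpu[id].copy  = main_mem_byte;  /* re-fetch the current value  */
        cpu[id].valid = 1;
    }
    return cpu[id].copy;
}

int main(void)
{
    coherent_write(0, 9);           /* CPU 0 updates the location */
    printf("CPU 1 reads %u (not the stale 5)\n", coherent_read(1));
    return 0;
}
```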