
Cache Memory

1 INTRODUCTION:-

1.1 Cache Memory Systems:-

We can represent a computer's memory and storage systems as a hierarchy, drawn as a triangle with the processor's internal registers at the top and the hard drive at the bottom. The internal registers are the fastest and most expensive memory in the system, and mass storage at the bottom is the slowest and least expensive.

· Here is a representation of that hierarchy:-

· Internal Registers (less than 1 KB, fastest access)

· Level One Cache (inside the processor, 8 KB or 16 KB at present)

· Level Two Cache (on the system board, 64 KB to 1 MB, static RAM with 20 ns access)

· Main Memory (on the system board, 4 MB to 128 MB, uses dynamic RAM with 50 to 70 ns access)

· Mass storage devices (hard drives, CD-ROM)

1.2 What Is Cache?

· Definition:-

To reduce the average memory access time, the active portions of the program and data are placed in a fast, small memory. Such a fast, small memory is referred to as a cache memory.

Cache memory is the intermediate buffer between the CPU and main memory, as shown in the figure:-


The cache memory access time is less than the access time of main memory by a factor of 5 to 10. The cache is the fastest component in the memory hierarchy and approaches the speed of CPU components.

The fundamental idea of cache organization is that by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time will approach the access time of the cache.
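As a rough illustration, the average access time can be written as t_avg = h * t_cache + (1 - h) * t_main, where h is the hit ratio. The C sketch below works this out; the 10 ns, 70 ns and 0.95 figures are assumed values chosen to be consistent with the factor of 5 to 10 above, not measurements from this report.

    #include <stdio.h>

    /* Average memory access time for a single-level cache:
       t_avg = h * t_cache + (1 - h) * t_main
       All three input values below are illustrative assumptions. */
    int main(void) {
        double t_cache = 10.0;  /* cache access time, ns (assumed)       */
        double t_main  = 70.0;  /* main memory access time, ns (assumed) */
        double h       = 0.95;  /* hit ratio (assumed)                   */

        double t_avg = h * t_cache + (1.0 - h) * t_main;
        printf("average access time = %.1f ns\n", t_avg); /* 13.0 ns */
        return 0;
    }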

1.3 PRINCIPLE OF LOCALITY:-

Caches work on the basis of the locality of program behavior. There are three principles involved:

  1. Spatial Locality

Given an access to a particular location in memory, there is a high probability that other accesses will be made either to that location or to neighboring locations within the lifetime of the program.

  2. Temporal Locality

This is complementary to spatial locality. Given a sequence of references to n locations, there is a high probability that references following this sequence will be made within the sequence; that is, elements of the sequence will be referenced again during the lifetime of the program.

  3. Sequentiality

Given that a reference has been made to a particular location s, it is likely that within the next several references a reference to location s + 1 will be made. Sequentiality is a restricted type of spatial locality and can be regarded as a subset of it.
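All three principles can be seen in even the simplest code. The C loop below is a hedged illustration: successive accesses to a[i] and a[i+1] show spatial and sequential locality, while sum, i and the loop instructions themselves are reused on every iteration, showing temporal locality.

    #include <stdio.h>

    #define N 1024

    int main(void) {
        static int a[N];   /* zero-initialized array */
        long sum = 0;
        for (int i = 0; i < N; i++)
            sum += a[i];   /* spatial/sequential: a[i] then a[i+1];
                              temporal: sum and i reused each pass */
        printf("%ld\n", sum);
        return 0;
    }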

1.4 Basic Model Of Cache Memory:-

Figure shows a simplified diagram of a system with cache. In this system, every time the CPU performs a read or write, the cache may intercept the bus transaction, allowing the cache to decrease the response time of the system. Before discussing this cache model, let's define some of the common terms used when talking about cache.

· Cache Hits:-

When the cache contains the information requested, the transaction is said to be a cache hit.

· Cache Miss:-

When the cache does not contain the information requested, the transaction is said to be a cache miss.

· Cache Consistency:-

Since the cache is a copy of a small piece of main memory, it is important that the cache always reflects what is in main memory. Some common terms used to describe the process of maintaining cache consistency are:-

1. Snoop:-

When a cache is watching the address lines for transactions, this is called a snoop. This function allows the cache to see if any transactions are accessing memory it contains within itself.

2. Snarf :-

When a cache takes the information from the data lines, the cache is said to have snarfed the data. This function allows the cache to be updated and to maintain consistency. Snoop and snarf are the mechanisms the cache uses to maintain consistency. Two other terms are commonly used to describe the inconsistencies in the cache data; these terms are:

3. Dirty Data :-

When data is modified within the cache but not modified in main memory, the data in the cache is called "dirty data."

4. Stale Data :-

When data is modified within main memory but not modified in the cache, the data in the cache is called "stale data."

2 HISTORY OF CACHE MEMORY:-

In early PCs, the various components had one thing in common: they were all really slow. The processor was running at 8 MHz or less, and taking many clock cycles to get anything done. It wasn't very often that the processor would be held up waiting for the system memory, because even though the memory was slow, the processor wasn't a speed demon either. In fact, on some machines the memory was faster than the processor.

In the 15 or so years since the invention of the PC, every component has increased in speed a great deal. However, some have increased far faster than others. Memory, and memory subsystems, are now much faster than they were, by a factor of 10 or more. However, a current top-of-the-line processor has performance over 1,000 times that of the original IBM PC!

This disparity in speed growth has left us with processors that run much faster than everything else in the computer. This means that one of the key goals in modern system design is to ensure that to whatever extent possible, the processor is not slowed down by the storage devices it works with. Slowdowns mean wasted processor cycles, where the CPU can't do anything because it is sitting and waiting for information it needs. We want it so that when the processor needs something from memory, it gets it as soon as possible.

The best way to keep the processor from having to wait is to make everything that it uses as fast as it is. Wouldn't it be best just to have memory, system buses, hard disks and CD-ROM drives that just went as fast as the processor? Of course it would, but there's this little problem called "technology" that gets in the way.

Actually, it's technology and cost; a modern 2 GB hard disk costs less than $200 and has a latency (access time) of about 10 milliseconds. You could implement a 2 GB hard disk in such a way that it would access information many times faster; but it would cost thousands, if not tens of thousands of dollars. Similarly, the highest speed SRAM available is much closer to the speed of the processor than the DRAM we use for system memory, but it is cost prohibitive in most cases to put 32 or 64 MB of it in a PC.

There is a good compromise to this however. Instead of trying to make the whole 64 MB out of this faster, expensive memory, you make a smaller piece, say 256 KB. Then you find a smart algorithm (process) that allows you to use this 256 KB in such a way that you get almost as much benefit from it as you would if the whole 64 MB was made from the faster memory. How do you do this? The short answer is by using this small cache of 256 KB to hold the information most recently used by the processor. Computer science shows that in general, a processor is much more likely to need again information it has recently used, compared to a random piece of information in memory. This is the principle behind caching.


3 CACHE ARCHITECTURE:-

Caches have two characteristics: a read architecture and a write policy. The read architecture may be either “Look Aside” or “Look Through.” The write policy may be either “Write-Back” or “Write-Through.” Both types of read architecture may have either type of write policy, depending on the design. Write policies are described in more detail in section 3.3. Let's examine the read architectures now.

3.1 Read Architecture: Look Aside:-


Figure shows a simple diagram of the “look aside” cache architecture. In this diagram, main memory is located opposite the system interface. The distinguishing feature of this cache unit is that it sits in parallel with main memory. It is important to notice that both main memory and the cache see a bus cycle at the same time, hence the name “look aside.”

· Look Aside Read Cycle Example:-

When the processor starts a read cycle, the cache checks to see if that address is a cache hit.

§ HIT:-

If the cache contains the memory location, then the cache will respond to the read cycle and terminate the bus cycle.

§ MISS:-

If the cache does not contain the memory location, then main memory will respond to the processor and terminate the bus cycle. The cache will snarf the data, so the next time the processor requests this data it will be a cache hit. Look aside caches are less complex, which makes them less expensive. This architecture also provides a better response to a cache miss, since both the DRAM and the cache see the bus cycle at the same time. The drawback is that the processor cannot access the cache while another bus master is accessing main memory.
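The C sketch below models a look-aside read cycle on a toy one-line cache. Every name (cache_lookup, dram_read, cache_snarf) is invented for illustration; in real hardware the DRAM cycle is already in flight while the tags are checked, which is what gives look-aside its good miss response.

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy one-line "cache" standing in for the SRAM and tag RAM. */
    static unsigned tag, line_data;
    static bool     valid = false;

    static unsigned dram_read(unsigned addr) { return addr * 2u; } /* fake DRAM */

    static bool cache_lookup(unsigned addr, unsigned *data) {
        if (valid && tag == addr) { *data = line_data; return true; }
        return false;
    }

    static void cache_snarf(unsigned addr, unsigned data) {
        tag = addr; line_data = data; valid = true;
    }

    /* Look-aside read: on a hit the cache terminates the bus cycle;
       on a miss the DRAM responds and the cache snarfs the data. */
    static unsigned read_cycle(unsigned addr) {
        unsigned data;
        if (cache_lookup(addr, &data)) return data;  /* HIT  */
        data = dram_read(addr);                      /* MISS */
        cache_snarf(addr, data);  /* next access to addr will hit */
        return data;
    }

    int main(void) {
        printf("%u\n", read_cycle(5)); /* miss: DRAM responds, cache snarfs */
        printf("%u\n", read_cycle(5)); /* hit: cache responds               */
        return 0;
    }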

3.2 Read Architecture: Look Through :-

Figure shows a simple diagram of the look through cache architecture. Again, main memory is located opposite the system interface. The distinguishing feature of this cache unit is that it sits between the processor and main memory. It is important to notice that the cache sees the processor's bus cycle before allowing it to pass on to the system bus.

· Look Through Read Cycle Example:-

When the processor starts a memory access, the cache checks to see if that address is a cache hit.

§ HIT:-

The cache responds to the processor’s request without starting an access to main memory.

§ MISS:-

The cache passes the bus cycle on to the system bus. Main memory then responds to the processor's request. The cache snarfs the data so that the next time the processor requests this data, it will be a cache hit. This architecture allows the processor to run out of cache while another bus master is accessing main memory, since the processor is isolated from the rest of the system. However, this cache architecture is more complex, because it must be able to control accesses to the rest of the system. The increase in complexity increases the cost. Another downside is that memory accesses on cache misses are slower, because main memory is not accessed until after the cache is checked. This is not an issue if the cache has a high hit rate and there are other bus masters.

3.3 Write Policy:-

A write policy determines how the cache deals with a write cycle. The two common write policies are Write-Back and Write-Through. In the Write-Back policy, the cache acts like a buffer. That is, when the processor starts a write cycle, the cache receives the data and terminates the cycle. The cache then writes the data back to main memory when the system bus is available. This method provides the greatest performance by allowing the processor to continue its tasks while main memory is updated at a later time. However, controlling writes to main memory increases the cache's complexity and cost. The second method is the Write-Through policy. As the name implies, the processor writes through the cache to main memory. The cache may update its contents; however, the write cycle does not end until the data is stored in main memory. This method is less complex and therefore less expensive to implement. The performance with a Write-Through policy is lower, since the processor must wait for main memory to accept the data.
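A minimal C sketch of the two policies on a toy one-line cache follows; dram_write stands in for a slow bus cycle, and all names are invented for illustration. Write-through stalls the processor on every write, while write-back terminates the cycle at once and defers the memory update.

    #include <stdio.h>

    static unsigned cache_data, pending_addr;
    static int has_pending = 0;

    static void dram_write(unsigned addr, unsigned data) {
        printf("slow DRAM write: %u -> addr %u\n", data, addr);
    }

    /* Write-through: the cycle does not end until memory has the data. */
    void cpu_write_through(unsigned addr, unsigned data) {
        cache_data = data;
        dram_write(addr, data);        /* processor waits here */
    }

    /* Write-back: the cache buffers the write and terminates the cycle;
       memory is updated later, when the system bus is available. */
    void cpu_write_back(unsigned addr, unsigned data) {
        cache_data = data;
        pending_addr = addr;
        has_pending = 1;               /* processor continues at once */
    }

    void drain_when_bus_idle(void) {
        if (has_pending) { dram_write(pending_addr, cache_data); has_pending = 0; }
    }

    int main(void) {
        cpu_write_through(1, 10);      /* immediate DRAM traffic  */
        cpu_write_back(2, 20);         /* no DRAM traffic yet     */
        drain_when_bus_idle();         /* deferred write goes out */
        return 0;
    }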

3.4 Cache Components:-

The cache sub-system can be divided into three functional blocks: SRAM, Tag RAM, and the Cache Controller. In actual designs, these blocks may be implemented by multiple chips or all may be combined into a single chip.

· SRAM :-

Static Random Access Memory (SRAM) is the memory block which holds the data. The size of the SRAM determines the size of the cache.

· Tag RAM :-

Tag RAM (TRAM) is a small piece of SRAM that stores the addresses of the data that is stored in the SRAM.

· Cache Controller :-

The cache controller is the brains behind the cache. Its responsibilities include performing the snoops and snarfs, updating the SRAM and TRAM, and implementing the write policy. The cache controller is also responsible for determining if a memory request is cacheable and if a request is a cache hit or miss.


4 Cache Organization:-

In order to fully understand how caches can be organized, two terms need to be defined. These terms are cache page and cache line. Let's start by defining a cache page. Main memory is divided into equal pieces called cache pages. The size of a page is dependent on the size of the cache and how the cache is organized. A cache page is broken into smaller pieces, each called a cache line. The size of a cache line is determined by both the processor and the cache design. Figure shows how main memory can be broken into cache pages and how each cache page is divided into cache lines. We will discuss cache organizations and how to determine the size of a cache page in the following sections.
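As a hedged illustration, the C fragment below splits an address into its page, line and byte-offset components. The 512 KB page and 32-byte line sizes are assumptions borrowed from the report's later running example, not fixed values.

    #include <stdio.h>

    #define LINE_SIZE 32u               /* bytes per cache line (assumed) */
    #define PAGE_SIZE (512u * 1024u)    /* bytes per cache page (assumed) */

    int main(void) {
        unsigned addr   = 0x01234567u;
        unsigned page   = addr / PAGE_SIZE;               /* which cache page */
        unsigned line   = (addr % PAGE_SIZE) / LINE_SIZE; /* line within page */
        unsigned offset = addr % LINE_SIZE;               /* byte within line */
        printf("page %u, line %u, byte %u\n", page, line, offset);
        return 0;
    }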

4.1 Fully-Associative :-


The first cache organization to be discussed is Fully-Associative cache. Figure shows a diagram of a Fully-Associative cache. This organizational scheme allows any line in main memory to be stored at any location in the cache. Fully-Associative cache does not use cache pages, only lines. Main memory and cache memory are both divided into lines of equal size. For example, Figure 2-5 shows that Line 1 of main memory is stored in Line 0 of cache. However, this is not the only possibility; Line 1 could have been stored anywhere within the cache. Any cache line may store any memory line, hence the name Fully Associative. A Fully-Associative scheme provides the best performance, because any memory location can be stored at any cache location. The disadvantage is the complexity of implementing this scheme. The complexity comes from having to determine if the requested data is present in cache. In order to meet the timing requirements, the current address must be compared with all the addresses present in the TRAM. This requires a very large number of comparators, which increases the complexity and cost of implementing large caches. Therefore, this type of cache is usually only used for small caches, typically less than 4K.
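A software sketch of the fully associative lookup is shown below; in hardware the loop body corresponds to one comparator per line, all working in parallel, which is exactly the cost described above. The 128-line size and all names are assumptions for illustration.

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_LINES 128   /* small cache, as the text suggests */

    static unsigned tags[NUM_LINES];   /* the TRAM contents */
    static bool     valid[NUM_LINES];

    /* Any line may hold any memory line, so every tag must be checked. */
    static bool is_hit(unsigned line_addr, int *which) {
        for (int i = 0; i < NUM_LINES; i++) {  /* parallel comparators in hardware */
            if (valid[i] && tags[i] == line_addr) { *which = i; return true; }
        }
        return false;   /* miss: any free or victim line may be used */
    }

    int main(void) {
        tags[17] = 0xABCD; valid[17] = true;
        int w = -1;
        if (is_hit(0xABCD, &w)) printf("hit in cache line %d\n", w);
        else                    printf("miss\n");
        return 0;
    }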

4.2 Direct Mapping :-

Direct Mapped cache is also referred to as 1-Way set associative cache. Figure shows a diagram of a direct map scheme. In this scheme, main memory is divided into cache pages. The size of each page is equal to the size of the cache. Unlike the fully associative cache, the direct map cache may only store a specific line of memory within the same line of cache. For example, Line 0 of any page in memory must be stored in Line 0 of cache memory. Therefore, if Line 0 of Page 0 is stored within the cache and Line 0 of Page 1 is requested, then Line 0 of Page 0 will be replaced with Line 0 of Page 1. This scheme directly maps a memory line into an equivalent cache line, hence the name Direct Mapped cache. A Direct Mapped cache scheme is the least complex of all three caching schemes. Direct Mapped cache only requires that the currently requested address be compared with a single cache address. Since this implementation is less complex, it is far less expensive than the other caching schemes. The disadvantage is that Direct Mapped cache is far less flexible, making the performance much lower, especially when jumping between cache pages.
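The C sketch below illustrates the direct-mapped lookup: the line index comes straight from the address, so a single tag comparison decides hit or miss, and a conflicting page simply evicts the old line. Sizes follow the report's running example (512 KB cache, 32-byte lines); this is an illustration, not a real controller, and valid bits are omitted for brevity.

    #include <stdio.h>

    #define LINE_SIZE 32u
    #define NUM_LINES 16384u   /* 512 KB / 32 bytes */

    static unsigned tag_ram[NUM_LINES];   /* stored page number per line */

    int main(void) {
        unsigned addr  = 0x04000040u;
        unsigned index = (addr / LINE_SIZE) % NUM_LINES; /* which cache line */
        unsigned tag   = (addr / LINE_SIZE) / NUM_LINES; /* which cache page */

        /* one comparator: only the tag stored at this index is checked */
        if (tag_ram[index] == tag)
            printf("hit at line %u\n", index);
        else {
            printf("miss: line %u of page %u replaces line %u of page %u\n",
                   index, tag, index, tag_ram[index]);
            tag_ram[index] = tag; /* Line N of the new page evicts Line N of the old */
        }
        return 0;
    }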

· Merits of direct mapping:

It is the simplest type of mapping. The hardware required is very simple, since the tag of only one cache line is matched with the TAG field in the given memory address. The cache controller quickly signals a hit or a miss.

· Demerits of direct mapping:-

There is no flexibility in the mapping system; a given memory block is tied to a fixed cache line. If two frequently accessed blocks happen to be mapped to the same cache line, the hit ratio is poor, resulting in frequent replacement of cache lines. This slows down program execution.

4.3 Set Associative :-


A Set-Associative cache scheme is a combination of the Fully-Associative and Direct Mapped caching schemes. A set-associative scheme works by dividing the cache SRAM into equal sections (typically 2 or 4) called cache ways. The cache page size is equal to the size of a cache way. Each cache way is treated like a small direct mapped cache. To make the explanation clearer, let's look at a specific example. Figure shows a diagram of a 2-Way Set-Associative cache scheme. In this scheme, two lines of memory that map to the same cache location may be stored at any one time. This helps to reduce the number of times cache line data is written over. This scheme is less complex than a Fully-Associative cache, because the number of comparators is equal to the number of cache ways. A 2-Way Set-Associative cache only requires two comparators, making this scheme less expensive than a fully-associative scheme.
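A hedged C sketch of the 2-way set-associative lookup follows: only the two ways of the addressed set are compared, rather than every line in the cache. The set count assumes the 512 KB / 32-byte-line example used earlier, split across two ways; all names are invented.

    #include <stdbool.h>
    #include <stdio.h>

    #define WAYS      2
    #define LINE_SIZE 32u
    #define SETS      8192u   /* 512 KB / 32-byte lines / 2 ways (assumed) */

    static unsigned tags[SETS][WAYS];
    static bool     valid[SETS][WAYS];

    /* Two comparators (one per way) examine only the addressed set. */
    static bool lookup(unsigned addr, int *way) {
        unsigned set = (addr / LINE_SIZE) % SETS;
        unsigned tag = (addr / LINE_SIZE) / SETS;
        for (int w = 0; w < WAYS; w++) {   /* 2 comparisons, not 16384 */
            if (valid[set][w] && tags[set][w] == tag) { *way = w; return true; }
        }
        return false;   /* miss: either way of this set may take the line */
    }

    int main(void) {
        int w = -1;
        printf("%s\n", lookup(0x1000u, &w) ? "hit" : "miss");
        return 0;
    }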

· Merits of block-set associative cache:-

1. Compared to direct mapping, multiple choices are available for mapping a memory block. The number of options depends on the set size k, so better flexibility is provided.

2. During a read, tag matching is limited to the number of lines in the set. Hence the search is only within a set, unlike in fully associative mapping, where the search covers the entire cache memory.

· Demerits of set associative mapping:-

Its implementation cost is higher than that of direct mapping but lower than that of fully associative mapping.

4.4 Cache Updating :-

With a Cache system, at least two versions of the same data exist in the system: one in Main Memory or on the Hard Disk Drive, and the other in the Cache. If the Processor is reading data then this is not a problem, because both copies are the same. A problem can occur, however, when the Processor writes to memory and only the data in the Cache is updated. Any other system component that has access to the memory will read old data, and in the case of a Hard Disk Drive Cache, a power failure would result in the loss of current data. There are several methods of processing "writes", and many chipsets and BIOS configurations allow the user to choose between several options.

4.5 Write-Through :-

With a Write-Through system, each write to memory that scores a Cache hit causes the Cache controller to update the Cache and then the corresponding memory location immediately. This method is the safest in terms of data security, but it is the slowest, because every write operation is, in effect, as slow as a Cache miss.

4.6 Buffered Write Through or Posted-Write :-

This technique is similar to Write-Through, except the Cache controller releases the bus to the Processor as soon as the Cache has been updated. The Cache controller then proceeds to update the corresponding main memory location. If the Processor follows the write with a read that scores a Cache hit, then that read can proceed simultaneously with the Cache controller's write (the posted write). If the Processor follows the write with another write, or a read that scores a Cache miss, then the Processor will have to wait until the Cache controller has finished updating main memory. This option should provide a slight performance improvement over a write-through operation and is quite safe.

4.7 Write-Back :-

With a Write-Back Cache, writes from the Processor only go to the Cache, and the Cache controller will only update main memory during periods of bus inactivity or when the Cache contents are to be replaced. This technique is fast in terms of letting the Processor get on with what it has to do, but it is also risky in terms of data loss: the main memory does not necessarily match the Cache until an update has occurred.

4.8 Write-Back with Dirty Bit :-

This option is similar to the Write-Back option, except that each location has a bit in the TAG RAM called a Dirty Bit. This bit is set whenever data is written to the Cache. When a Cache line is to be replaced, the Cache controller will only update the main memory locations for that line if the Dirty Bit is set. If the Dirty Bit is not set, then the line in the Cache has not been written to and there is no need to write it back.

If a system offers both Write-Back and Write-Back with Dirty Bit, then the latter technique will be the faster option.
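The dirty-bit bookkeeping can be sketched in C as below; the structure layout and names are invented for illustration. Main memory is written only when a modified (dirty) line is evicted, which is why this variant is faster than plain Write-Back.

    #include <stdbool.h>
    #include <stdio.h>

    struct cache_line {
        bool     valid;
        bool     dirty;   /* set whenever data is written into the cache */
        unsigned tag;
        unsigned data;
    };

    static void dram_write(unsigned addr, unsigned data) {
        printf("write-back: %u -> addr %u\n", data, addr);
    }

    /* On replacement, main memory is updated only if the dirty bit is set;
       a clean line can simply be overwritten. */
    static void replace(struct cache_line *l, unsigned new_tag, unsigned new_data) {
        if (l->valid && l->dirty)
            dram_write(l->tag, l->data);  /* line was modified: flush it */
        l->tag   = new_tag;
        l->data  = new_data;
        l->valid = true;
        l->dirty = false;                 /* fresh copy matches memory   */
    }

    int main(void) {
        struct cache_line l = { true, true, 7, 42 }; /* a dirty line      */
        replace(&l, 9, 100);                         /* one write-back    */
        replace(&l, 3, 200);                         /* clean: no traffic */
        return 0;
    }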


5 Cache Characteristics :-

This section discusses the different features of the level 2 cache. These are the characteristics you will normally need to understand when making a motherboard selection, or when upgrading the cache in your existing system. Some of the descriptions in this section are explained in much more detail in Function and Operation of the System Cache. The focus here is on the higher-level performance aspects of the various cache features.

6 Cache Speed :-

There is no single number that completely dictates the "speed" of the system cache. Instead, we must consider the raw speed of the components used, as well as how the circuitry chooses to use them. These considerations are the same as when looking at the system RAM itself; saying "my RAM is 60 ns" tells only a small part of the story.

The "raw" speed of the cache is the speed of the RAM chips used to make it. Caches are normally made from static RAM chips (SRAM), unlike main system memory which is made from dynamic RAM (DRAM). The short version of the difference between the two, is that static RAM is faster but also more expensive. The access speed of SRAMs are normally rated in the tens of nanoseconds. SRAMs normally have a speed of 7 to 20 ns; DRAMs on the other hand are usually 50 to 70 ns.

The speed of the SRAM chips gives the upper bound on performance. It is up to the motherboard and chipset designer to make full use of that speed. Let's consider a Pentium motherboard with a memory bus running at 66 MHz. This means 66.66 million cycles per second; taking the reciprocal gives the cycle time, which is 15 nanoseconds (1 divided by 66.66 million). In order for the motherboard to be able to read from the cache in one cycle at this speed, the SRAM must be faster than 15 ns (there is some overhead time as well, so exactly 15 ns won't work). If the SRAM is faster than this, there will be no additional benefit; if it is slower, timing problems will occur, which usually manifest themselves as memory errors and system lockups.
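The cycle-time arithmetic above is easy to check directly; the snippet below does so, assuming the 66.66 MHz bus speed from the text.

    #include <stdio.h>

    int main(void) {
        double f_bus = 66.66e6;            /* bus frequency, Hz   */
        double t_ns  = 1.0 / f_bus * 1e9;  /* period: about 15 ns */
        printf("cycle time: %.1f ns; the SRAM must beat this\n", t_ns);
        return 0;
    }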

The tag RAM used as part of the cache must normally be faster than the actual cache data store. This is because the tag RAM must be read first to check for a cache hit; we want to be able to check the tag and still have enough time to read the cache within a single clock cycle if we have a hit. So, for example, you may find that your system's main cache chips are 15 ns while the tag is 12 ns. The more complicated the cache mapping technique, the more important the difference in speed between the tag and the data store. Simple techniques like direct mapping generally don't require much difference at all, and your system may use the same speed for all the memory in this case; for example, if the system needs 15 ns for the tag and 16 ns for the data store, the motherboard may just specify 15 ns for everything, since this is simpler. In any event, if your motherboard doesn't already come with the level 2 cache on it, you should buy for it whatever the motherboard manual or your dealer specifies.

The true speed of any cache, in terms of how quickly it really transfers information to and from the processor so that you get faster speed in your applications, is dependent on the cache controller and other chipset circuits. The capabilities of the chipset determine what kind of transfer technologies your cache can use. This in turn determines your cache's optimal system timing, the number of clock cycles required to move data in and out of the cache. This is discussed in detail in this section.

The performance of the cache obviously also depends greatly on the speed that the cache subsystem is running at. In a typical Pentium machine this is the speed of the memory bus, 66 MHz. However, a Pentium Pro processor has an integrated level 2 cache, which runs at full processor speed, normally 180 or 200 MHz. Obviously, this will yield superior performance! The Intel Pentium II instead uses a daughterboard cache, with the level 2 cache running at half the processor speed, which with a 233 or 266 MHz chip still means much better performance than running the cache at 66 MHz.

7 Cache Size :-

The size of the cache normally refers to the size of the data store, where the memory elements are actually stored. A typical PC level 2 cache is either 256 KB or 512 KB, but can be as small as 64 KB on older machines, or as high as 1 MB or even 2 MB. Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB.

The more cache the system has, the more likely it is to register a hit on a memory access, because fewer memory locations are forced to share the same cache line. Let's use an example to illustrate (the same one used when we discussed cache operation in detail). We have a system with 64 MB of memory and 512 KB of direct-mapped cache, arranged into 32-byte cache lines. This means that we have 16,384 cache lines (512 K divided by 32). Each line is shared by 4,096 memory addresses (64 MB divided by 16,384). Now if we increase the amount of cache to 1 MB, we will have 32,768 cache lines, and each will only be shared by 2,048 addresses. Conversely, if we leave the cache at 512 KB but increase the system memory to 256 MB, each of the 16,384 cache lines will be shared by 16,384 addresses.
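The line-sharing arithmetic can be verified with a few divisions, as in the snippet below, which uses only the figures given in the text.

    #include <stdio.h>

    int main(void) {
        unsigned long long mem   = 64ull << 20;  /* 64 MB system memory  */
        unsigned long long cache = 512ull << 10; /* 512 KB cache         */
        unsigned long long line  = 32;           /* bytes per cache line */

        unsigned long long lines   = cache / line; /* 16,384 cache lines         */
        unsigned long long sharers = mem / lines;  /* 4,096 addresses share each */
        printf("%llu lines, %llu memory addresses per line\n", lines, sharers);
        return 0;
    }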

There are many areas in the computer world where Pareto's Law applies, and cache size is definitely one of them. If you have a 256 KB cache on a system using 32 MB, increasing the cache by 100% to 512 KB will probably result in an increase in the hit ratio of less than 10%. Doubling it again will likely result in an increase of less than 5%. In the real world, this differential is not noticeable to most people. However, if you greatly increase the amount of system memory you use, you will probably want to increase your cache as well, to prevent a degradation in performance. Just make sure you watch the system RAM cacheability issue closely.

8 System RAM Cacheability :-

This is one of the most misunderstood aspects of the caching equation. The amount of RAM that the system can cache is very important if you are going to be using a lot of system memory. Almost all modern fifth-generation systems can cache 64 MB of system memory; however, many systems, even newer ones, cannot cache more than that. Intel's popular 430FX ("Triton I"), 430VX (one of the "Triton II"s, also called "Triton III") and 430TX chipsets do not cache more than 64 MB of main memory, and there are millions and millions of these PCs on the market.

If you put more memory in a system than can be cached, the result is a performance decrease. The speed differential between the cache and memory is significant; that's why we use it. When some of that memory is not cached, the system must go to main memory for every access to that uncached memory, which is much slower. In addition, when using a multitasking operating system (pretty much anything other than DOS these days) you can't really control what ends up in cached memory and what ends up in non-cached memory, unless you really know what you are doing.

The keys to how much memory your system can cache are first, the design of the chipset, and second, the width of the tag RAM. The more memory you have, the more address lines you need to specify an address. This means that you have more address bits to store in the tag RAM to use in order to check for a cache hit. Of course if the chipset isn't designed to cache more than 64 MB, an extra wide tag RAM won't help anyway.

Let's take our standard example again; 64 MB of memory, 512 KB cache, 32-byte cache lines. As we described in detail in this section, 64 MB means 26 address lines (A0 to A25); A0 to A4 specify the byte in the cache line, A5 to A18 specify the cache line, and A19 to A25 go into the tag RAM to specify which memory address is currently using the cache line. That's 7 bits; let's say our tag RAM is 8 bits wide, and we are reserving one bit for the "dirty bit", to allow write-back operation of the cache. So we're fine, we have enough tag memory in the cache. Now, suppose we add another 32 MB of memory. To address 96 MB you need another address line, A26, to be held in the tag RAM. Hmm, we have a problem, because now we need 9 bits in our tag RAM and it only has 8.
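The bit bookkeeping above can be checked with a short calculation; the snippet below uses the widths stated in the text (26 address bits, 5 offset bits, 14 index bits).

    #include <stdio.h>

    int main(void) {
        int addr_bits   = 26;  /* 2^26 bytes = 64 MB        */
        int offset_bits = 5;   /* 2^5 = 32-byte cache line  */
        int index_bits  = 14;  /* 2^14 = 16,384 cache lines */
        int tag_bits    = addr_bits - offset_bits - index_bits;

        printf("tag bits: %d (+1 dirty bit = %d-bit tag RAM)\n",
               tag_bits, tag_bits + 1);
        /* At 96 MB a 27th address line (A26) appears: 8 tag bits
           plus the dirty bit no longer fit in an 8-bit tag RAM.  */
        return 0;
    }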

The only mainstream Pentium chipset to support caching over 64 MB is the 430HX "Triton II" chipset by Intel. In actual fact, caching over 64 MB on this chipset is considered "optional"; the motherboard manufacturer has to make sure to use an 11-bit tag RAM instead of the default 8-bit. The extra 3 bits increase cacheability from 64 MB to 512 MB (2^3=8, and 64*8=512).

Many people confuse the issue of system RAM size and system RAM cacheability. The common thought is that adding more cache will let you cache more RAM, but you can see that really it is the tag RAM and chipset that controls this. Further complicating the matter is that some companies put extra tag RAM on their COASt modules. So a user will insert a 256 KB COASt module, and think that increasing his cache let him cache more system memory, when really it was the extra tag RAM that did it.

Pentium Pro PCs use an integrated level 2 cache that contains the tag RAM within it, so none of this is really a concern for these machines; the Pentium Pro will cache up to 4 GB of main memory, basically anything you can throw at it. The Pentium II uses an SEC daughtercard with the same general architecture as the Pentium Pro, but due to a design limitation it will "only" cache up to 512 MB. This isn't nearly as much of an issue as a 64 MB barrier, but considering that the PII is used in many high-end applications, this might be a concern for some people.

One question that people ask a lot is: "How much will the system slow down if I have more RAM in it than can be cached?" There is no easy answer to this question, because it depends both on the system and what you are doing with it. Somewhere between 5% and 25% is most likely, but you should bear something else in mind: adding real physical memory to the system is one way to avoid the extreme slowdown to the system that occurs when it runs out of real memory and must use virtual memory. If you are doing heavy multitasking and notice that the system is thrashing, you will always be better off to have more memory, even uncached, instead of having the system swap a great deal to disk. Of course having all the memory cached is still preferred.



9 TYPES OF CACHE MEMORY:-


Pronounced "cash," a cache is a special high-speed storage mechanism. It can be either a reserved section of main memory or an independent high-speed storage device. Two types of caching are commonly used in personal computers:

1. memory caching

2. disk caching.

· MEMORY CACHING:-

A memory cache, sometimes called a cache store or RAM cache, is a portion of memory made of high-speed static RAM (SRAM) instead of the slower and cheaper dynamic RAM (DRAM) used for main memory. Memory caching is effective because most programs access the same data or instructions over and over. By keeping as much of this information as possible in SRAM, the computer avoids accessing the slower DRAM.

Some memory caches are built into the architecture of microprocessors. The Intel 80486 microprocessor, for example, contains an 8K memory cache, and the Pentium has a 16K cache. Such internal caches are often called Level 1 (L1) caches. Most modern PCs also come with external cache memory, called Level 2 (L2) caches. These caches sit between the CPU and the DRAM. Like L1 caches, L2 caches are composed of SRAM but they are much larger.

· DISK CACHING:-

Disk caching works under the same principle as memory caching, but instead of using high-speed SRAM, a disk cache uses conventional main memory. The most recently accessed data from the disk (as well as adjacent sectors) is stored in a memory buffer. When a program needs to access data from the disk, it first checks the disk cache to see if the data is there. Disk caching can dramatically improve the performance of applications, because accessing a byte of data in RAM can be thousands of times faster than accessing a byte on a hard disk.

5.1 "Layers" of Cache:-

There are in fact many layers of cache in a modern PC. This does not even include looking at caches included on some peripherals, such as hard disks. Each layer is closer to the processor and faster than the layer below it. Each layer also caches the layers below it, due to its increased speed relative to the lower levels:

Level                    Devices Cached

Level 1 Cache            Level 2 Cache, System RAM, Hard Disk / CD-ROM

Level 2 Cache            System RAM, Hard Disk / CD-ROM

System RAM               Hard Disk / CD-ROM

Hard Disk / CD-ROM       --

What happens in general terms is this: the processor requests a piece of information. The first place it looks is in the level 1 cache, since it is the fastest. If it finds it there (called a hit on the cache), great; it uses it with no performance delay. If not, it's a miss and the level 2 cache is searched. If it finds it there (a level 2 "hit"), it is able to carry on with relatively little delay. Otherwise, it must issue a request to read it from the system RAM. The system RAM may in turn either have the information available or have to get it from the still slower hard disk or CD-ROM. The mechanics of how the processor (really the chipset controlling the cache and memory) "looks" for the information in these various places are discussed later in this section.
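In code form, the layered lookup might be sketched as below. The per-level cycle costs are illustrative assumptions (the 2,000,000-cycle disk figure echoes the 200 MHz / 10 ms example in the next paragraph), and find_level is a stand-in for what the cache and memory controllers actually do.

    #include <stdio.h>

    enum level { L1, L2, RAM, DISK };

    /* Pretend the requested data currently lives only in system RAM. */
    static enum level find_level(unsigned addr) {
        (void)addr;
        return RAM;
    }

    int main(void) {
        static const char *name[] = { "L1 cache", "L2 cache", "system RAM", "disk" };
        static const long  cost[] = { 1, 5, 30, 2000000 };  /* cycles (assumed) */

        enum level where = find_level(0x1234);
        printf("found in %s after ~%ld cycles\n", name[where], cost[where]);
        return 0;
    }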

It is important to realize just how slow some of these devices are compared to the processor. Even the fastest hard disks have an access time of around 10 milliseconds. If it has to wait 10 milliseconds, a 200 MHz processor will waste 2 million clock cycles! And CD-ROMs are generally at least 10 times slower. This is why using caches to avoid accesses to these slow devices is so crucial.

Caching actually goes even beyond the level of the hardware. For example, your web browser uses caching itself, in fact, two levels of caching! Since loading a web page over the Internet is very slow for most people, the browser will hold recently-accessed pages to save it having to re-access them. It checks first in its memory cache and then in its disk cache to see if it already has a copy of the page you want. Only if it does not find the page will it actually go to the Internet to retrieve it.

· Level 1 (Primary) Cache:-

Level 1 or primary cache is the fastest memory in the PC. It is, in fact, built directly into the processor itself. This cache is very small, generally from 8 KB to 64 KB, but it is extremely fast; it runs at the same speed as the processor. If the processor requests information and can find it in the level 1 cache, that is the best case, because the information is there immediately and the system does not have to wait. The level 1 cache is discussed in more detail in the section on processors.

· Level 1 cache is also sometimes called "internal" cache, since it resides within the processor.

· Level 2 (Secondary) Cache:-

The level 2 cache is a secondary cache to the level 1 cache, and is larger and slightly slower. It is used to catch recent accesses that are not caught by the level 1 cache, and is usually 64 KB to 2 MB in size. Level 2 cache is usually found either on the motherboard or on a daughterboard that inserts into the motherboard. Pentium Pro processors actually have the level 2 cache in the same package as the processor itself (though it isn't on the same die as the processor and level 1 cache), which means it runs much faster than level 2 cache that is separate and resides on the motherboard. Pentium II processors are in the middle; their level 2 cache runs at half the speed of the CPU.

· Level 2 cache is also sometimes called "external" cache, since it resides outside the processor.

· Hard Disk Cache:-

· Software Hard Disk Caching:-

A Software Hard Disk Cache is provided by software operating under the Operating System and using some of the System RAM to speed up access to information stored on a Hard Disk (or CD-ROM Drive). It does this by storing frequently used information in fast-access RAM rather than on slow-access Hard Disk Drives. The RAM used by the cache routine is in extended or expanded memory. The most commonly used Disk Cache for DOS and Windows 3.x was Smartdrive, supplied as part of DOS and Windows; aftermarket products were also available.

Although not directly tied to the internal DOS architecture, Software Cache products operate in fast System Memory close to the CPU and access this memory 32 or 64 bits at a time, at the Processor Bus speed.

Operating Systems such as Unix and NetWare also implement intelligent Hard Disk Caches under direct Operating System control. An Operating System might cache directory information or queue disk requests for intelligent re-ordering of physical read/write requests. In addition, some Operating Systems can dynamically adjust the cache size according to current operating requirements (for example, reducing disk cache buffers and increasing communications buffers).

· Hardware Hard Disk Caching

Software Hard Disk Caches have two problems: some of the System RAM must be assigned to the Cache, reducing the amount of RAM available to the Operating System and Applications, and the System CPU must service the Cache routine, reducing the time it can spend carrying out tasks for the Operating System and the Applications. A Hardware Hard Disk Cache Controller is on a card plugged into a Bus Slot and has its own Processor and RAM.

Hardware Hard Disk Controllers are expensive, often difficult to install, and can't be readily disabled like Software Caches. The Disk Cache Controller has its own Processor chip and its own RAM devices. Separate Cache subsystems, unlike Software Caches, help the System CPU in disk management tasks without taking up memory required by applications or the Operating System. Many Hard Disk Caching Controllers allow CPU access to data in the controller's cache memory while the controller performs write or read-ahead operations with the Disk Drive. Depending on the application, an Operating System or third-party software cache system combined with a Hard Disk Caching Controller may provide the ultimate disk performance. This configuration is commonly used in high performance Server computers where the speed of data throughput is most important.

Disk Caching reduces the time required to move data between the Hard Disk Drive and the Processor. When accessing data from a Hard Disk Drive, many delays accumulate and slow performance: regulating disk buffer transfers, command formulation and issuance, head seeks, disk rotation, head settling, reading and writing data, moving data between the Operating System and the Hard Disk, and the overhead of managing and moving the data in and out of the Processor.

Both IDE and SCSI Hard Drive Controller cards are available with a Hardware Cache on board. Most cached Hard Disk Drive controllers can have between 1 MB and 16 MB of RAM installed, and they use common SIMM RAM. The cost of an Enhanced IDE hardware cached controller is of the order of $150 to $200 plus memory, and a SCSI device is in the range of $400 to $1000.

