Page Cache and Buffer Cache
The page cache caches pages of files to optimize file I/O. The buffer cache caches disk blocks to optimize block I/O.
Prior to Linux kernel version 2.4, the two caches were distinct: Files were in the page cache, disk blocks were in the buffer cache. Given that most files are represented by a filesystem on a disk, data was represented twice, once in each of the caches. Many Unix systems follow a similar pattern.
This was simple to implement, but obviously inelegant and inefficient: the same data could occupy memory twice. Starting with Linux kernel version 2.4, the contents of the two caches were unified. The VM subsystem now drives I/O, and it does so out of the page cache. If cached data has both a file and a block representation—as most data does—the buffer cache simply points into the page cache; thus only one instance of the data is cached in memory. The page cache is what you picture when you think of a disk cache: It caches file data from a disk to make subsequent I/O faster.
The buffer cache remains, however, as the kernel still needs to perform block I/O in terms of blocks, not pages. As most blocks represent file data, most of the buffer cache is represented by the page cache. But a small amount of block data isn't file backed—metadata and raw block I/O for example—and thus is solely represented by the buffer cache.
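The page cache behavior described above is easy to observe from user space. Here is a small, hedged sketch (assuming Linux, where `os.posix_fadvise` is available): we write a scratch file, ask the kernel to drop its pages from the page cache, then time a cold read against a warm one. The 16 MiB size is an arbitrary illustration choice, and the exact timings will vary by machine and disk.

```python
import os
import tempfile
import time

def timed_read(path):
    """Read the whole file and return (data, elapsed seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return data, time.perf_counter() - start

# Create a scratch file and flush it to disk so its pages can be evicted.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))  # 16 MiB of random data
    tmp.flush()
    os.fsync(tmp.fileno())
    path = tmp.name

# Ask the kernel to drop the file's pages from the page cache so the
# first read below is cold. posix_fadvise is POSIX/Linux-specific,
# hence the guard.
if hasattr(os, "posix_fadvise"):
    fd = os.open(path, os.O_RDONLY)
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    os.close(fd)

cold_data, cold_time = timed_read(path)  # likely hits the disk
warm_data, warm_time = timed_read(path)  # likely served from the page cache

print(f"cold: {cold_time:.4f}s  warm: {warm_time:.4f}s")
assert cold_data == warm_data  # caching never changes file contents

os.unlink(path)
```

On most systems the warm read is dramatically faster, which is exactly the subsequent-I/O speedup the page cache exists to provide.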
I'm glad that Robert Love provided a typically excellent answer. Instead of trying to cover the same ground that he does, I'll try to go back a bit further—before Linux, even. I know that might seem odd in a question about Linux, but Linux developers do talk to others, and so on matters of history the context around other systems is IMO still relevant.
In the beginning was the buffer cache. It was a very simple concept: behave exactly like a disk, but avoid the actual disk request/transfer at the last moment. This was actually a really powerful idea, which improved performance on practically every workload (except those that were very CPU-bound, and network-bound wasn't an operative concept at the time) by a great deal. Who could ask for anything more?
Later, some people got the idea that moving the cache higher would be even better. Why burn cycles going through all that complicated filesystem code when you could just maintain a simple file+offset hash table? Also, implementing a cache above the filesystem meant that it could be more tightly integrated with the memory-management subsystem, which was an active area of development at the time. Thus, the answer to your first question: they were separate because they were developed at different times to serve different needs, and nobody wanted to muck with the buffer cache at first.
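The "simple file+offset hash table" idea above can be sketched in a few lines. This is purely illustrative, not kernel code: `PAGE_SIZE`, the `PageCache` class, and the `read_from_disk` callback are all invented names, and a real page cache adds eviction, writeback, and locking.

```python
PAGE_SIZE = 4096  # illustrative page size

class PageCache:
    """Toy cache keyed by (file, page index), sitting above the filesystem."""

    def __init__(self, read_from_disk):
        self._pages = {}                    # (file_id, page_index) -> bytes
        self._read_from_disk = read_from_disk

    def read(self, file_id, offset, length):
        """Serve a byte-range read, pulling whole pages through the cache."""
        out = bytearray()
        end = offset + length
        while offset < end:
            key = (file_id, offset // PAGE_SIZE)
            if key not in self._pages:      # cache miss: go to "disk"
                self._pages[key] = self._read_from_disk(*key)
            page = self._pages[key]
            start = offset % PAGE_SIZE
            take = min(PAGE_SIZE - start, end - offset)
            out += page[start:start + take]
            offset += take
        return bytes(out)

# Toy "disk": every page of every file is filled with its page index.
def fake_disk(file_id, page_index):
    return bytes([page_index]) * PAGE_SIZE

cache = PageCache(fake_disk)
data = cache.read(1, PAGE_SIZE - 2, 4)  # a read spanning two pages
print(data)  # b'\x00\x00\x01\x01'
```

The point of the sketch is the lookup key: once the cache is addressed by (file, offset) rather than by disk block, it no longer needs to know anything about the filesystem layout underneath it.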
Still later, people realized that you could unify the caches to avoid the double-caching issue that Robert mentions. I wish I could find a good link to some of the stuff that was written about how Solaris 7 and 8 kept changing the way they dealt with this, because the amount of complexity they added was almost comical. The reasons for having the buffer cache point into the page cache instead of vice versa are two-fold.
- Not every filesystem is backed by disk. Some are backed by memory, or operate over a network. Having the page cache split between stuff that points into the buffer cache and stuff that's handled natively is a bit inelegant.
- Because the page cache is supposed to absorb far more hits, it makes sense for its addressing method and other semantics to be primary.
The buffer cache does still exist for things that aren't in files, but it's now a shadow of its former glory. A system without a buffer cache nowadays would be slower, but that's nothing compared to the crippling effect that removing the page cache would have.
