VM and Pagetables, redux.

Several months ago, I wrote:

I've been learning about virtual memory. I guess I'll explain what I know, and if someone wants to correct me, please let me know.

Okay, so, fundamentally, VM is a mapping of virtual pages to “other things” (like physical memory, devices, disks, etc.). What’s a page? 4k of memory. Well, it’s not always 4k, but that’s a common size. The page size dictates how many entries the page table has, because each page can be mapped to something different and the table has to keep track of each page individually.
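
To make the page-size arithmetic concrete, here is a small sketch in Python. The function names are made up for illustration, and the 4k page size is just the common case mentioned above.

```python
PAGE_SIZE = 4096   # 4k pages, the common size
PAGE_SHIFT = 12    # log2(PAGE_SIZE)

def split_address(vaddr):
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)

def page_count(address_bits):
    """Number of pages, and so flat page table entries, in an address space."""
    return (1 << address_bits) // PAGE_SIZE
```

Assuming a 32-bit address space, page_count(32) comes to 1,048,576 pages, each of which needs its own mapping.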

What’s the point of VM? There are many. Processes shouldn’t be able to access the memory of other processes, and with this scheme each process gets its own VM address space, so you can’t possibly confuse them. Also, these mappings can go to many places: you can have two processes share the same physical pages; you can swap pages to disk and bring them back when somebody needs them; you can (as mentioned before) make some pages actually refer to devices like graphics memory; and so forth.

Now that I’m deep into the Virtual Memory assignment of CS169, I can talk about this more. I was basically right, though. The way we’ve implemented it is with “VM Areas” and page tables. VM Areas specify a range of pages that are mapped to a certain vnode (which can be null). We fill in the page tables from the VM Areas lazily, when we “page fault”: we ask the vnode to give us a page of RAM, or create one out of thin air if the vnode is null. Then we save the physical page in the page table.
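
Here is a rough sketch of that page fault path in Python. It is not the course code: VMArea, Vnode, get_page, SegmentationFault and the tuple "frames" are all hypothetical stand-ins, and the page table is just a dict from virtual page number to frame.

```python
import itertools

class SegmentationFault(Exception):
    """Raised when a page is not covered by any VM area."""

class Vnode:
    """Hypothetical file-like object that can hand out pages of its data."""
    def __init__(self, name):
        self.name = name

    def get_page(self, index):
        return ("vnode", self.name, index)

class VMArea:
    """A range of virtual pages [start, end) backed by a vnode, or by None."""
    def __init__(self, start, end, vnode=None):
        self.start, self.end, self.vnode = start, end, vnode

_anon_ids = itertools.count()

def page_fault(areas, page_table, vpn):
    """Find the area covering vpn, obtain a frame, record it in the page table."""
    for area in areas:
        if area.start <= vpn < area.end:
            if area.vnode is None:
                # No backing vnode: create a fresh page out of thin air.
                frame = ("anon", next(_anon_ids))
            else:
                # Ask the vnode for the page at this offset within the area.
                frame = area.vnode.get_page(vpn - area.start)
            page_table[vpn] = frame
            return frame
    raise SegmentationFault(f"no VM area covers page {vpn}")
```

The point of the design is that VM Areas describe what should be mapped, while the page table only ever holds pages that have actually been touched.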

The hardware doesn’t look at our page table structure, though; we have to fill the TLB with page table entries ourselves. When the hardware gets a TLB miss, it asks us to fill it: we look at the page table, handle a page fault if the page isn’t mapped yet, and copy the entry from the page table into the TLB.
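
The miss handling can be sketched like this, again in Python with invented names. The TLB class, its FIFO eviction, and the page_fault callback are assumptions for illustration; real hardware has its own replacement policy and entry format.

```python
from collections import OrderedDict

class TLB:
    """Tiny software-filled TLB holding a fixed number of vpn -> frame entries."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, vpn):
        """Return the cached frame, or None on a TLB miss."""
        return self.entries.get(vpn)

    def fill(self, vpn, frame):
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry (FIFO)
        self.entries[vpn] = frame

def handle_tlb_miss(tlb, page_table, vpn, page_fault):
    """Consult the page table, fault the page in if needed, then fill the TLB."""
    frame = page_table.get(vpn)
    if frame is None:
        # Not in the page table yet: this is where the page fault happens.
        frame = page_fault(vpn)
        page_table[vpn] = frame
    tlb.fill(vpn, frame)
    return frame
```

Note that a TLB miss only turns into a page fault when the page table has no entry either; otherwise it is just a copy from one structure to the other.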

One detail I missed was the page table size: if a mapping for every virtual page were stored for every process, the tables would be huge. Instead, we take advantage of the fact that there are usually giant swathes of unmapped memory in the address space and use a two-level page table. The top level is called the segment table and the next is the page table. We just shift and mask bits of the virtual address to get the segment and page indices, and only allocate the second-level page tables on demand.
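
A minimal sketch of the two-level lookup in Python. The 10/10/12 bit split assumes 32-bit addresses with 4k pages, which is a guess on my part rather than anything the assignment dictates, and the class and method names are made up.

```python
PAGE_SHIFT = 12  # 4k pages: the low 12 bits are the offset within the page
PAGE_BITS = 10   # assumed: the next 10 bits index the page table
SEG_BITS = 10    # assumed: the top 10 bits index the segment table

class TwoLevelPageTable:
    def __init__(self):
        # Only the segment table exists up front; page tables appear on demand.
        self.segments = [None] * (1 << SEG_BITS)

    @staticmethod
    def indices(vaddr):
        """Shift and mask a virtual address into (segment index, page index)."""
        seg = (vaddr >> (PAGE_SHIFT + PAGE_BITS)) & ((1 << SEG_BITS) - 1)
        page = (vaddr >> PAGE_SHIFT) & ((1 << PAGE_BITS) - 1)
        return seg, page

    def map(self, vaddr, frame):
        seg, page = self.indices(vaddr)
        if self.segments[seg] is None:
            # First mapping in this segment: allocate its page table now.
            self.segments[seg] = [None] * (1 << PAGE_BITS)
        self.segments[seg][page] = frame

    def lookup(self, vaddr):
        seg, page = self.indices(vaddr)
        table = self.segments[seg]
        return None if table is None else table[page]

    def allocated_tables(self):
        """How many second-level page tables actually exist."""
        return sum(table is not None for table in self.segments)
```

With this split, a process that touches only a handful of pages pays for the 1024-entry segment table plus one page table per segment it actually uses, rather than a flat table with about a million entries.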