note: Enabling Efficient OS Paging for Main-Memory OLTP Databases

Paper Link

In this paper, the authors show experimentally that the OS page replacement policy is a poor fit for DBMS workloads. They propose identifying hot and cold data by sampling the access log, then reorganizing the data into the corresponding memory area. The mmapped region is divided into two parts: the first is the Hot Area, which is mlock-ed into memory, and the second holds the cold data.
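To make the hot/cold identification step concrete, here is a minimal Python sketch of classifying tuples from a sampled access log. It is only an illustration, not the paper's actual algorithm: the function name `classify_hot`, the `sample_rate` and `hot_capacity` parameters, and the simple "most frequently sampled wins" rule are all my own assumptions.

```python
import random
from collections import Counter


def classify_hot(access_log, hot_capacity, sample_rate=0.1, seed=0):
    """Pick the tuple ids that should live in the hot area.

    Hypothetical sketch: every name and the top-k rule are assumptions,
    not taken from the paper.
    access_log   -- iterable of tuple ids, one entry per access
    hot_capacity -- how many tuples fit in the mlock-ed hot area
    sample_rate  -- fraction of log entries that are actually counted
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    counts = Counter()
    for tuple_id in access_log:
        # Count only a sample of accesses to keep the logging overhead low.
        if rng.random() < sample_rate:
            counts[tuple_id] += 1
    # The most frequently sampled tuples are treated as hot.
    return {tuple_id for tuple_id, _ in counts.most_common(hot_capacity)}


if __name__ == "__main__":
    log = [1] * 50 + [2] * 30 + [3] * 5 + [4]
    # sample_rate=1.0 counts every access, so the result is deterministic.
    print(classify_hot(log, hot_capacity=2, sample_rate=1.0))
```

Everything not returned by `classify_hot` would be placed in the cold area. A lower `sample_rate` trades classification accuracy for less logging overhead.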

They implemented it on VoltDB. In VoltDB, each object (e.g. a partition's data table or an index) has a contiguous virtual address space. The reorganization is done entirely in userspace. The userspace memory allocator keeps two free lists, one holding the free memory blocks in the Hot Area and one for the Cold Area. To move a tuple, they simply deallocate it and reallocate it in the other area. The abstract data structures are not affected, so the strategy is relatively easy to implement.
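The two-free-list idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not VoltDB's allocator: the class `HotColdAllocator`, its method names, and the use of integer slot numbers in place of real memory addresses are all hypothetical. Slots below `hot_slots` stand in for the mlock-ed Hot Area, and the rest stand in for the Cold Area.

```python
class HotColdAllocator:
    """Toy allocator with one free list per area (hypothetical sketch)."""

    def __init__(self, total_slots, hot_slots):
        # Slots [0, hot_slots) model the hot area, the remainder the cold area.
        self.hot_slots = hot_slots
        self.slots = [None] * total_slots
        # Each free list is a stack; lowest slot numbers are handed out first.
        self.free = {
            "hot": list(range(hot_slots - 1, -1, -1)),
            "cold": list(range(total_slots - 1, hot_slots - 1, -1)),
        }

    def area_of(self, slot):
        return "hot" if slot < self.hot_slots else "cold"

    def allocate(self, value, area):
        if not self.free[area]:
            raise MemoryError(f"no free slot in {area} area")
        slot = self.free[area].pop()
        self.slots[slot] = value
        return slot

    def deallocate(self, slot):
        self.slots[slot] = None
        # Return the slot to the free list of the area it belongs to.
        self.free[self.area_of(slot)].append(slot)

    def move(self, slot, area):
        # Moving a tuple is just deallocate + reallocate in the target area.
        value = self.slots[slot]
        self.deallocate(slot)
        return self.allocate(value, area)


if __name__ == "__main__":
    alloc = HotColdAllocator(total_slots=6, hot_slots=2)
    cold_slot = alloc.allocate("tuple-A", "cold")
    hot_slot = alloc.move(cold_slot, "hot")
    print(cold_slot, hot_slot, alloc.area_of(hot_slot))
```

In this sketch `move` returns the new slot, so the caller has to update whatever reference pointed at the old one. The logical contents of the table do not change, only where the tuple lives.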

The experiments show that performance is almost the same as a pure in-memory implementation, even when the data size is 50x physical memory, as long as the working set fits in memory.

One question I have: how is the working set size calculated for the TPC-C benchmark?

The method also yields a useful by-product: the hot data is now more compact, which reduces the CPU cache miss rate.