note: What’s Really New with NewSQL?

Paper Link

I came across this paper by Andy (LOL) by chance and decided to read it. Reading it is like reading a storybook or a novel. It covers the history of disk-oriented DBMSs and of the “New” DBMSs with distributed architectures. In the paper, Andy divides NewSQL systems into three categories: DBMSs with a new architecture (especially a distributed architecture), middleware, and cloud databases.

In my opinion, cloud databases are traditional databases optimized for the cloud environment (e.g., Amazon Aurora), and middleware acts like a coordinator in a distributed DBMS. The distributed DBMS (especially for short-lived OLTP workloads) is the most appealing to me. I don’t like the idea of adding replication to a traditional DBMS, because it still needs to address the consistency problem. And I don’t like that Andy uses “2PL” as the example for consensus rather than “Paxos”.

There are also some details about distributed DBMSs that I’m curious about. They are not specific to this paper; they apply to distributed DBMSs in general.

  1. How do they guarantee that different queries routed to different nodes see the same results (or the same view) when some modifications have not yet propagated to those nodes? Locking? Even with locking, some nodes may release their locks before others.

  2. How does MVCC+2PL work in a distributed context?
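Not from the paper, but one common way to approach questions 1 and 2 is to combine per-node MVCC with a global timestamp. Writers take exclusive locks (2PL) on every participant, install all their writes under a single commit timestamp, and only then release the locks. Readers take no locks and read every node at the same snapshot timestamp, so visibility is decided by timestamps rather than by when each node happens to unlock. The Python sketch below is a single-process toy under those assumptions: `TimestampOracle`, `Node` and `Cluster` are hypothetical names, the "two phases" are plain function calls rather than a real networked 2PC with durable prepare records, and lock conflicts abort immediately instead of waiting.

```python
import itertools
from dataclasses import dataclass, field


class TimestampOracle:
    """Hypothetical centralized service handing out strictly increasing timestamps."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next(self) -> int:
        return next(self._counter)


@dataclass
class Node:
    """One partition: a multi-version store plus a per-key exclusive lock table."""
    name: str
    versions: dict = field(default_factory=dict)  # key -> [(commit_ts, value), ...] ascending
    locks: dict = field(default_factory=dict)     # key -> txn id holding the write lock

    def lock(self, key, txn_id) -> bool:
        # Growing phase of 2PL; no-wait policy: report failure on conflict.
        return self.locks.setdefault(key, txn_id) == txn_id

    def install(self, key, value, commit_ts) -> None:
        self.versions.setdefault(key, []).append((commit_ts, value))

    def unlock(self, key, txn_id) -> None:
        # Shrinking phase of 2PL.
        if self.locks.get(key) == txn_id:
            del self.locks[key]

    def read(self, key, snapshot_ts):
        # An in-flight write might commit at a timestamp <= snapshot_ts, so a real
        # system would wait here; this sketch conservatively refuses instead.
        if key in self.locks:
            raise RuntimeError(f"{self.name}: {key!r} has an in-flight write; retry")
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


class Cluster:
    def __init__(self, nodes) -> None:
        self.nodes = {n.name: n for n in nodes}
        self.oracle = TimestampOracle()
        self._txn_ids = itertools.count(1)

    def write(self, writes):
        """writes: {(node_name, key): value}. Returns the commit timestamp, or None on abort."""
        txn_id = next(self._txn_ids)
        locked = []
        try:
            # Phase 1 ("prepare"): every participant must grant its locks.
            for node, key in writes:
                if not self.nodes[node].lock(key, txn_id):
                    return None  # abort: another transaction holds the lock
                locked.append((node, key))
            # Phase 2 ("commit"): one commit timestamp shared by all participants.
            commit_ts = self.oracle.next()
            for (node, key), value in writes.items():
                self.nodes[node].install(key, value, commit_ts)
            return commit_ts
        finally:
            # Release locks only after every node has the new version installed.
            for node, key in locked:
                self.nodes[node].unlock(key, txn_id)

    def read(self, reads, snapshot_ts=None):
        """reads: [(node_name, key)]. Every node is read at the same snapshot timestamp."""
        ts = self.oracle.next() if snapshot_ts is None else snapshot_ts
        return {(node, key): self.nodes[node].read(key, ts) for node, key in reads}


cluster = Cluster([Node("A"), Node("B")])
ts1 = cluster.write({("A", "x"): 1, ("B", "y"): 1})
ts2 = cluster.write({("A", "x"): 2, ("B", "y"): 2})
print(cluster.read([("A", "x"), ("B", "y")], snapshot_ts=ts1))  # {('A', 'x'): 1, ('B', 'y'): 1}
print(cluster.read([("A", "x"), ("B", "y")]))                   # {('A', 'x'): 2, ('B', 'y'): 2}
```

The old snapshot still sees `x=1, y=1` on both nodes after the second write commits, because each node filters versions by the same timestamp. A read that hits a key with an in-flight write simply fails in this toy; real systems typically wait or check the writer's status instead, since its pending commit timestamp could land at or below the reader's snapshot.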

By the way, the paper mentions that VoltDB relies on OS virtual memory to support datasets larger than memory. One question is whether mmap(2) is a good fit for an in-memory DBMS.
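To make the mmap(2) question concrete, here is a minimal Python sketch (not VoltDB's actual mechanism) of handing storage to the OS: fixed-size records live in a file that is mapped into the address space with the standard `mmap` module, and the kernel decides which pages stay in memory. `MmapTable` and the 16-byte `(key, value)` record layout are hypothetical choices for illustration.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<qq")  # hypothetical record layout: (key: int64, value: int64)


class MmapTable:
    """Fixed-size records stored in a file and accessed through a shared mapping.

    The OS, not the DBMS, decides which pages are resident; touching a
    non-resident page causes a page fault that blocks the calling thread.
    """

    def __init__(self, path: str, capacity: int) -> None:
        self._file = open(path, "w+b")
        self._file.truncate(capacity * RECORD.size)  # size the file before mapping it
        self._map = mmap.mmap(self._file.fileno(), capacity * RECORD.size)
        self.capacity = capacity

    def put(self, slot: int, key: int, value: int) -> None:
        # Writes go straight into the page cache; no explicit buffer pool.
        RECORD.pack_into(self._map, slot * RECORD.size, key, value)

    def get(self, slot: int) -> tuple:
        return RECORD.unpack_from(self._map, slot * RECORD.size)

    def flush(self) -> None:
        # Without an explicit flush, dirty pages are written back whenever the kernel chooses.
        self._map.flush()

    def close(self) -> None:
        self._map.close()
        self._file.close()


with tempfile.TemporaryDirectory() as d:
    table = MmapTable(os.path.join(d, "table.dat"), capacity=1000)
    table.put(42, key=7, value=99)
    table.flush()
    print(table.get(42))  # (7, 99)
    table.close()
```

The appeal is that no buffer pool has to be written. The cost is that the DBMS gives up control over which pages get evicted, over when dirty pages reach disk (which matters for transactional durability), and over when a thread stalls on a page fault.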