The goal of our research is to investigate how to build efficient shared virtual memory (SVM) on PC- and SMP-based clusters by leveraging simple hardware support, lazy protocols, consistency models, memory-mapped communication, and application studies.
State-of-the-art SVM systems support relaxed consistency protocols over low-latency commodity interconnects, with varying degrees of custom hardware support for SVM. One such mechanism is Automatic Update, a simple memory-mapped communication mechanism supported by the SHRIMP network interface in which local writes are forwarded transparently to a remote node's memory. The mechanism uses hardware that snoops the memory bus for writes to shared data.
Automatic Update is used to implement the Automatic Update Release Consistency (AURC) protocol. Every shared page has a home node, and writes to a page are automatically propagated to the home at fine granularity in hardware. Shared pages are mapped write-through in the caches so that writes appear on the memory bus. When a node incurs a page fault, the page fault handler retrieves the page from the home, where it is guaranteed to be up to date. Data are kept consistent according to a page-based software consistency protocol such as lazy release consistency. Thus, consistency is maintained at page granularity, while the hardware provides some support for fine-grained communication.
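The home-based scheme described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the SHRIMP or AURC implementation: the class names (`Node`, `HomeBasedCluster`), the round-robin home assignment, the 8-word page, and the explicit `invalidate` call (standing in for whatever the consistency protocol does at an acquire) are all hypothetical choices made for illustration.

```python
PAGE_WORDS = 8  # words per page, kept tiny for illustration


class Node:
    """One cluster node with its local copies of shared pages."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.pages = {}     # page number -> list of words (local copy)
        self.valid = set()  # pages that can be accessed without faulting


class HomeBasedCluster:
    def __init__(self, num_nodes, num_pages):
        self.nodes = [Node(i) for i in range(num_nodes)]
        # Assumption: homes are assigned round-robin by page number.
        self.home_of = {p: p % num_nodes for p in range(num_pages)}
        for page, home_id in self.home_of.items():
            home = self.nodes[home_id]
            home.pages[page] = [0] * PAGE_WORDS
            home.valid.add(page)  # the home copy is always valid

    def _fault(self, node, page):
        """Page fault handler: fetch the whole page from its home."""
        home = self.nodes[self.home_of[page]]
        node.pages[page] = list(home.pages[page])
        node.valid.add(page)

    def read(self, node_id, page, offset):
        node = self.nodes[node_id]
        if page not in node.valid:
            self._fault(node, page)
        return node.pages[page][offset]

    def write(self, node_id, page, offset, value):
        node = self.nodes[node_id]
        if page not in node.valid:
            self._fault(node, page)
        node.pages[page][offset] = value
        home = self.nodes[self.home_of[page]]
        if home is not node:
            # Stands in for the snooped write-through write that the
            # network interface forwards to the home, one word at a time.
            home.pages[page][offset] = value

    def invalidate(self, node_id, page):
        """Stand-in for the consistency protocol invalidating a stale copy."""
        if self.home_of[page] != node_id:
            self.nodes[node_id].valid.discard(page)


cluster = HomeBasedCluster(num_nodes=3, num_pages=3)
first = cluster.read(1, 0, 1)    # node 1 faults and caches page 0 (home: node 0)
cluster.write(2, 0, 1, 9)        # node 2 writes; the word is forwarded to node 0
at_home = cluster.read(0, 0, 1)  # the home copy is already up to date
stale = cluster.read(1, 0, 1)    # node 1 still sees its old copy
cluster.invalidate(1, 0)         # e.g. at node 1's next acquire
fresh = cluster.read(1, 0, 1)    # re-fetched from the home
print(first, at_home, stale, fresh)  # prints: 0 9 0 9
```

The run shows the two granularities at work: the write reaches the home one word at a time, while node 1 only observes it after its copy of the whole page is invalidated and fetched again.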
All-software versions of the home-based protocols have also been developed and shown to perform well. The idea is to compute diffs as in previous all-software protocols, but to propagate the diffs to the home at a release point and apply them eagerly, keeping the home up to date according to lazy release consistency. The result is a protocol very much like the hardware-supported AURC, except that changes are propagated to the home at release or acquire time rather than at the time of the writes themselves. In a preliminary evaluation on the Intel Paragon multiprocessor, this software home-based protocol (called HLRC) was also found to outperform earlier distributed all-software protocols.
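The diff step can be sketched in a few lines. The sketch assumes the twin-and-compare approach commonly used by all-software protocols: a writer keeps an unmodified copy (a "twin") of the page, and at release compares it word by word with the modified page. The function names (`make_twin`, `compute_diff`, `apply_diff`) and the 8-word page are hypothetical and are not taken from the HLRC implementation.

```python
def make_twin(page):
    """Save an unmodified copy of the page before the first write."""
    return list(page)


def compute_diff(twin, page):
    """Return (offset, new_value) pairs for every word that changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(twin, page)) if old != new]


def apply_diff(home_page, diff):
    """Apply a diff to the home copy; done eagerly when the diff arrives."""
    for offset, value in diff:
        home_page[offset] = value


home_page = [0] * 8

# Two writers hold copies of the same page and modify different words.
copy_a, copy_b = list(home_page), list(home_page)
twin_a, twin_b = make_twin(copy_a), make_twin(copy_b)
copy_a[1] = 5
copy_b[6] = 9

# At release, each writer sends only its changed words to the home.
diff_a = compute_diff(twin_a, copy_a)
diff_b = compute_diff(twin_b, copy_b)
apply_diff(home_page, diff_a)
apply_diff(home_page, diff_b)

print(diff_a, diff_b)  # prints: [(1, 5)] [(6, 9)]
print(home_page)       # prints: [0, 5, 0, 0, 0, 0, 9, 0]
```

Because each diff carries only the words its writer changed, the home can merge updates from several writers of the same page without one overwriting the other, provided they wrote different words.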