Global memory management in a workstation cluster

global memory management in a cluster of machines connected by a high-speed LAN
GMS (global memory service)
- local and global page
- algorithm
- how to determine what to replace
  - global information
  - epoch (timeout and M pages to replace)
    - flush TLB
    - assign probabilities (lottery scheduling)
compared to today’s system
- In contrast, each operating system in today’s clusters acts as an autonomous agent, exporting services to other nodes, but not acting in a coordinated way.
For example, global memory management allows the operating system to use cluster-wide memory to avoid many disk accesses

minimize average data reference time
- go to remote memory instead of disks
Why can global memory management benefit application?
- many nodes in LAN are often idle
- use those idle resources on behalf of nodes with active application
- GMS: use idle memory resources

Model:
- local pages and global pages
- global pages are pages stroed on behalf of other nodes
- global pages together constitute “global memory”
- global memory adds another level of memory hierarchy

Three main aspects to implementation
- maintaining page ages
- finding pages in cluster
- making replacement and eviction (逐出) decisions

Various local and distributed data structures are described that GMS uses to locate the cluster node that is currently caching a memory page.
The page ownership directory (POD) is a globally replicated data structure that maps page unique ids to the GCD section containing cache information about the page.
The global cache directory (GCD) stores network location information about cached pages in the cluster. It is a distributed hash table. Each node in the cluster maintains a portion of the table.
The page frame directory(PFD) is a local data structure storing physical page location and age information for pages in a node’s local and global memories

Figure four in the paper demonstrates the three step page lookup process for locating a cached page in the cluster.

Given a page’s unique id, consult the POD to determine what section of the GCD corresponds to it.
Consult the GCD entry (potentially over the network) to determine if the page is cached and where.
The GCD node contacts the node currently caching the page (potentially over the network) and requests that it send the page to the faulting node. The GCD is updated with the new location of the page in the cluster (i.e. the faulting node).

global page ages?
- As times goes on
local page ages?
- File pages easy (app does explicit read/write I/O syscalls)
- Need to track accesses to mapped pages
Use similar trick as in Ozalp’s paper
- Invalidate TLB entries
- TLB misses determine when pages are accessed

There is too much overhead to maintain a global age. Instead, a probabilistic algorithm is used to decide which one to page out.

Common case is replacing oldest page in global memory. There is too much overhead to maintain a global age. Instead, a probabilistic algorithm is used to decide which one to page out.
w = \frac { T} { M}
M pages to evict, T is on each node, how many fall into the oldest M.
w1,w2,...wn are used as weights to choose page out place.
The node with more old pages have greater chance to be the destination. -
The algorithm implemented approximates the ideal, how?
- divide time into epochs
- in an epoch, expect to replace the M oldest pages
- at beginning of epoch, decide how old pages are distributed around cluster (age might change during communication, no need to be accurate)
- send all info to a central node
- each node was W_i fraction of oldest pages
- when a node needs to decide where to evict
  - choose a node randomly weighted by W_i
- an entirely local decision
- after M evictions, on average will evict M oldest pages in cluster

The objective: Convince others and yourself about the idea and the system
Question you should ask yourself
- Given your idea and your system, what kind of would others have about the idea and the system?
- Pick the Grapevine system as an example to design experimental evaluation for them

Global Memory Management