06/01 Haystack

Overview

Network File System (NFS) is a distributed file system protocol, allowing a user on a client computer to access files over a computer network much like local storage is accessed.

  • Features of FB users
    • upload a huge number of photos each week
    • visit often
  • Long Tail Issue
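    • a small number of popular photos are accessed frequently
    • a very large number of photos are accessed rarely
      • these rarely accessed photos usually miss in CDN caches, so their requests fall through to the backing storage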

Goals of Haystack

  • high throughput, low latency
    • provide a good user experience
  • fault tolerance
    • handle server crashes and hard drive failures
  • cost-effective
    • save money over traditional approaches (reduce reliance on CDNs!)
  • simplicity
    • make it easy to implement and maintain

Features of Old Design

  • each image is stored in its own file
  • enormous amount of metadata (namespace directories and file inodes)
  • the amount of metadata far exceeds the caching abilities of the NFS storage tier, resulting in multiple I/O operations per photo upload or read request
  • heavy reliance on CDNs, which is expensive
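
To make "multiple I/O operations per read" concrete, here is a toy cost model. It assumes (a simplification, not a measurement) that each piece of filesystem metadata not already in memory costs exactly one disk I/O, and the photo bytes cost one more. The function name is hypothetical.

```python
def disk_ios_per_read(dir_entry_cached: bool, inode_cached: bool) -> int:
    """Toy count of disk I/Os for one photo read when each photo is its own file.

    Assumption: every piece of metadata that is not already in memory costs
    exactly one extra disk I/O, and reading the photo bytes costs one more.
    """
    ios = 0
    if not dir_entry_cached:
        ios += 1  # read directory data to map the filename to an inode number
    if not inode_cached:
        ios += 1  # read the inode to find where the file's blocks are
    ios += 1      # read the photo bytes themselves
    return ios


# A long-tail photo: neither its directory entry nor its inode is cached.
print(disk_ios_per_read(dir_entry_cached=False, inode_cached=False))  # 3
# Best case: all metadata in memory, only the data read hits the disk.
print(disk_ios_per_read(dir_entry_cached=True, inode_cached=True))    # 1
```

Because long-tail photos are rarely hot, their metadata is rarely cached, so the old design tends toward the worst case for exactly the requests the CDN cannot absorb.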

Haystack


Haystack Directory

Main functions

  • provides a mapping from logical volumes to physical volumes.
    • Web servers use this mapping when uploading photos and also when constructing the image URLs for a page request.
  • load-balances writes across logical volumes and reads across physical volumes.
  • determines whether a photo request should be handled by the CDN or by the Cache.
    • This functionality lets us adjust our dependence on CDNs.
  • identifies those logical volumes that are read-only either because of operational reasons or because those volumes have reached their storage capacity. We mark volumes as read-only at the granularity of machines for operational ease.
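
The Directory's responsibilities can be sketched as a small in-memory object. This is a minimal illustration, not Haystack's implementation: the class name, the "machine:volume" replica strings, uniform random load balancing, and the `cdn_fraction` knob are all hypothetical choices made for the example.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class DirectorySketch:
    # logical volume id -> physical replicas (hypothetical "machine:volume" names)
    replicas: dict[int, list[str]] = field(default_factory=dict)
    read_only: set[int] = field(default_factory=set)
    # Hypothetical knob: fraction of reads routed through the CDN.
    cdn_fraction: float = 1.0

    def pick_write_volume(self) -> int:
        """Spread uploads over logical volumes that are still write-enabled."""
        writable = [lv for lv in self.replicas if lv not in self.read_only]
        if not writable:
            raise RuntimeError("no write-enabled logical volume")
        return random.choice(writable)

    def write_targets(self, logical_volume: int) -> list[str]:
        """An upload is written to every physical replica of the logical volume."""
        return list(self.replicas[logical_volume])

    def pick_read_replica(self, logical_volume: int) -> str:
        """Spread reads over the physical replicas of one logical volume."""
        return random.choice(self.replicas[logical_volume])

    def route_via_cdn(self) -> bool:
        """Decide whether a read goes to the CDN or straight to the Cache."""
        return random.random() < self.cdn_fraction

    def mark_read_only(self, logical_volume: int) -> None:
        """Called when a volume is full or its machine needs maintenance."""
        self.read_only.add(logical_volume)
```

For example, after `d.mark_read_only(1)` on a Directory holding volumes 1 and 2, every call to `d.pick_write_volume()` returns 2, while reads of volume 1 can still be served by either of its replicas. Lowering `cdn_fraction` is one simple way to model "adjusting our dependence on CDNs".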

Haystack Cache

  • a distributed hash table that uses the photo's id as the key to locate cached data
  • receives HTTP requests for photos from CDNs and also directly from users’ browsers.
    • If the photo is in the Cache, return it
    • If the photo is not in the Cache, fetch it from the Haystack Store and return it
  • Add a photo to the Cache only if both conditions are met:
    • The request comes directly from a user (browser), not from a CDN
      • if the request comes from a CDN, the CDN can cache the photo itself
    • The photo is fetched from a write-enabled Store machine
      • which indicates that the photo was uploaded recently, and photos are read most heavily soon after upload
  • with this policy the Cache achieves about an 80% hit ratio
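
A minimal sketch of the lookup path and the admission policy above. Assumptions: the cache nodes are plain dicts, node selection is simple modulo hashing of the photo id (a real DHT may use a different scheme), and `fetch` is a hypothetical Store call that also reports whether the machine is write-enabled.

```python
from __future__ import annotations

from typing import Callable


class CacheSketch:
    """Photo cache spread over several nodes, keyed by photo id."""

    def __init__(self, num_nodes: int, fetch: Callable[[int], tuple[bytes, bool]]):
        # Each dict stands in for one cache node of the distributed hash table.
        self.nodes: list[dict[int, bytes]] = [{} for _ in range(num_nodes)]
        # Hypothetical Store call: photo id -> (photo bytes, machine write-enabled?)
        self.fetch = fetch

    def node_for(self, photo_id: int) -> dict[int, bytes]:
        # Simplification: plain modulo hashing instead of a real DHT scheme.
        return self.nodes[photo_id % len(self.nodes)]

    def get(self, photo_id: int, from_cdn: bool) -> bytes:
        node = self.node_for(photo_id)
        if photo_id in node:
            return node[photo_id]  # hit
        photo, write_enabled = self.fetch(photo_id)  # miss: read from the Store
        # Admission policy: cache only direct browser requests (a CDN caches
        # for itself) for photos on write-enabled machines (recent uploads).
        if not from_cdn and write_enabled:
            node[photo_id] = photo
        return photo
```

Note that a miss always returns the photo; the two conditions only decide whether the photo is kept for later requests.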

Haystack Store

  • Read
  • Write
  • Delete
    • the Store machine sets the delete flag in both the in-memory mapping and the volume file
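
A minimal sketch of read, write, and delete on one physical volume, assuming an append-only byte stream (`io.BytesIO` standing in for the volume file) and a dict as the in-memory mapping. The record layout here is deliberately tiny (one flag byte followed by the photo bytes) and is not the real needle format; class and field names are hypothetical.

```python
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    flag_offset: int   # where this photo's delete flag lives in the volume file
    data_offset: int   # where the photo bytes start
    size: int
    deleted: bool = False


class StoreVolumeSketch:
    """One physical volume: an append-only file plus an in-memory index."""

    def __init__(self) -> None:
        self.volume = io.BytesIO()  # stands in for the large on-disk volume file
        self.index: dict = {}       # photo key -> IndexEntry

    def write(self, key: int, data: bytes) -> None:
        # Append a minimal record (1 flag byte + photo bytes) at the end of the file.
        offset = self.volume.seek(0, io.SEEK_END)
        self.volume.write(b"\x00")  # 0 = live
        self.volume.write(data)
        # Rewriting an existing key just points the index at the newest record.
        self.index[key] = IndexEntry(offset, offset + 1, len(data))

    def read(self, key: int) -> Optional[bytes]:
        entry = self.index.get(key)
        if entry is None or entry.deleted:
            return None
        # One seek + one read, with no filesystem metadata lookups.
        self.volume.seek(entry.data_offset)
        return self.volume.read(entry.size)

    def delete(self, key: int) -> None:
        entry = self.index.get(key)
        if entry is None:
            return
        entry.deleted = True                # flag in the in-memory mapping
        self.volume.seek(entry.flag_offset)
        self.volume.write(b"\x01")          # flag in the volume file; bytes are not reclaimed here
```

Setting the flag in both places means the in-memory check rejects reads immediately, while the on-disk flag keeps the deletion durable if the in-memory mapping has to be rebuilt from the volume file.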

Needle

  • A Store machine represents a physical volume as a large file consisting of a superblock followed by a sequence of needles.
  • Each needle represents a photo stored in Haystack.
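  • cookie: a security cookie, a random number assigned when the photo is uploaded and embedded in the photo's URL; reads whose cookie does not match are rejected, which mitigates brute-force attempts to guess valid photo URLs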
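
To show what "superblock followed by a sequence of needles" means at the byte level, here is a sketch of packing and verifying one needle with Python's `struct` and `zlib` modules. The field list (magic numbers, cookie, key, alternate key, flags, size, data, checksum, padding) is a simplified assumption, and every field width and magic value below is hypothetical; the point is that a single contiguous read yields everything needed to check the cookie, the deleted flag, and data integrity.

```python
import struct
import zlib

# All constants and field widths below are illustrative assumptions.
HEADER_MAGIC = 0x4E45444C   # hypothetical
FOOTER_MAGIC = 0x454C4445   # hypothetical
FLAG_DELETED = 0x01
# magic, cookie, key, alternate key, flags, data size (little-endian, no padding)
HEADER = struct.Struct("<IQQIBI")
# magic, CRC32 checksum of the data
FOOTER = struct.Struct("<II")


def pack_needle(cookie: int, key: int, alt_key: int, data: bytes, flags: int = 0) -> bytes:
    header = HEADER.pack(HEADER_MAGIC, cookie, key, alt_key, flags, len(data))
    footer = FOOTER.pack(FOOTER_MAGIC, zlib.crc32(data))
    needle = header + data + footer
    return needle + b"\x00" * (-len(needle) % 8)  # pad to an 8-byte boundary


def read_needle(buf: bytes, expected_cookie: int) -> bytes:
    magic, cookie, _key, _alt_key, flags, size = HEADER.unpack_from(buf, 0)
    if magic != HEADER_MAGIC:
        raise ValueError("bad header magic")
    if cookie != expected_cookie:
        # The cookie carried in the request URL must match the stored one.
        raise PermissionError("cookie mismatch")
    if flags & FLAG_DELETED:
        raise KeyError("photo deleted")
    data = buf[HEADER.size:HEADER.size + size]
    footer_magic, checksum = FOOTER.unpack_from(buf, HEADER.size + size)
    if footer_magic != FOOTER_MAGIC or zlib.crc32(data) != checksum:
        raise ValueError("corrupt needle")
    return data
```

At upload time a cookie could be drawn with `secrets.randbits(64)`; a request that guesses the photo key but not the cookie then fails the `PermissionError` check.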


Question & Discussion

  • Album level abstraction
    • it would be better if photos from the same album were placed sequentially, or at least close together
  • Privacy concerns
    • Are cookies sufficient protection? Is there a better way?
    • Security level of Facebook?
  • How is consistency maintained between the Haystack and the CDN?
