BlowFish is a distributed data store that builds upon Succinct and enables a smooth storage-performance tradeoff between two extremes: uncompressed data with high performance, and compressed data with low performance, allowing fine-grained changes in storage and performance. What makes BlowFish unique is that applications can navigate from one operating point to another along this tradeoff curve dynamically over fine-grained time scales. In many cases, navigating this smooth tradeoff has higher system-wide utility (e.g., throughput per unit of storage) than existing techniques. Intuitively, this is because BlowFish allows applications to increase/decrease the storage “fractionally”, just enough to meet the performance goals.
To achieve this, BlowFish introduces Layered Sampled Array (LSA), a new data structure that stores multiple layers of sampled values. Each combination of layers in LSA correspond to a static configuration of Succinct. Note that, unlike Succinct, BlowFish does not necessarily enforce compression — some combination of layers may end up having storage comparable to systems that store indexes along with input data. BlowFish also allows applications to dynamically adapt to changing loads by navigating along the smooth storage-performance trade-off curve at fine-grained time scales. The navigation is performed by adding and deleting layers in LSA transparently, independent of existing layers and query execution.
Finally, BlowFish adopts techniques from scheduling theory, namely back-pressure style Join-the-shortest-queue mechanism, to resolve several challenges surrounding load balancing and shard management (within and across servers): How should shards on a single server share the available cache? How should shard replicas share requests?