Querying Persistent Graphs using Solid State Storage

Abstract: Recent advances in solid state technology have led to the introduction of Solid State Drives (SSDs). Today's SSDs store data persistently using NAND flash memory. While SSDs are more expensive than hard disks when measured in dollars per gigabyte, they are significantly cheaper when measured in dollars per random I/O per second. Additional storage technologies are under development, with Phase Change Memory (PCM) being the next one to enter the marketplace. PCM is nonvolatile, it can be byte-addressable, and in future Multi Level Cell (MLC) devices it is expected to be denser than DRAM. PCM has lower read and write latency than NAND flash memory, and it can endure orders of magnitude more write cycles before wearing out. Recent research has shown that solid state devices can be particularly beneficial for latency-bound applications involving dependent reads. Path processing in the context of graph processing or Resource Description Framework (RDF) data processing is a typical example of such applications. We demonstrate via a custom graph benchmark that even an early prototype Phase Change Memory device can offer significant improvements over mature flash devices (1.5x to 2.5x speedup in response times). We take this observation further by building Pythia, a prototype RDF repository tailor-made for solid state storage, to investigate the predicted benefits that a properly designed RDF repository can achieve for these types of workloads. We compare the performance of our repository against the state-of-the-art RDF-3X repository in a limited set of tests and discuss the results. Finally, we compare the performance of our repository running on a PCM-based device against a state-of-the-art flash device, showing that there is indeed a significant gain to be achieved by using PCM for RDF processing.

Bio: Bishwaranjan Bhattacharjee (aka Bhatta) is a Senior Technical Staff Member working in the Database Research Group at the IBM T.J. Watson Research Center in Yorktown Heights, New York. He is also a Member of the IBM Academy of Technology. His research interests include data management and its applications. In particular, he is interested in scalable database processing, clustering and indexing techniques, query processing and optimization, compression, data management on new hardware, and variable schema stores.