Research
I’m interested in doing computer architecture research where I apply techniques from classical algorithms and data structures. I’m not yet quite sure what this entails, but it will probably have something to do with accelerators, compilers, and/or hardware-software co-design. Ideally, I would be able to focus on maximizing efficiency, free from the limitations of existing architectures.
Previously, I worked with Nikola Samardzic on accelerating Fully Homomorphic Encryption (FHE), a form of encryption that allows running programs on secret data without decrypting it.
A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption
Aleksandar Krastev*, Nikola Samardzic*, Simon Langowski, Srinivas Devadas, Daniel Sanchez
Fhelipe is an FHE compiler exposing an easy-to-use tensor programming interface. Fhelipe simplifies programming by abstracting data layouts and noise management, while achieving great performance via two key contributions: a novel bit-permutation tensor layout representation and a novel bootstrap placement algorithm. Fhelipe is the first compiler to match the performance of large hand-optimized FHE applications, outperforming prior compilers by a geometric mean of 18.5×.
I designed Fhelipe’s layout representation, helped design all algorithms in the compiler, and implemented the compiler frontend.
CraterLake: A Hardware Accelerator for Efficient Unbounded Computation on Encrypted Data
Nikola Samardzic, Axel Feldmann, Aleksandar Krastev, Nathan Manohar, Nicholas Genise, Srinivas Devadas, Karim Eldefrawy, Chris Peikert, Daniel Sanchez
CraterLake is a state-of-the-art hardware accelerator for FHE, providing speedups of 5,000× over a CPU on a broad range of applications. Building upon F1’s functional units, CraterLake introduces a novel architecture that significantly reduces on- and off-chip data movement.
I came up with the way computation is distributed across the chip (Sec. 4), designed the on-chip network (Sec. 5.3), and designed the KeySwitch hint generator (Sec. 5.2).
F1: A Fast and Programmable Accelerator for Fully Homomorphic Encryption
Axel Feldmann*, Nikola Samardzic*, Aleksandar Krastev, Srini Devadas, Ron Dreslinski, Christopher Peikert, Daniel Sanchez
F1 was our initial proposal for an FHE accelerator. F1 proposes novel high-throughput FHE functional units, but suffers from excessive on-chip data movement that prevents it from scaling to large FHE applications. As a result, F1 is about 5,000× faster than a CPU on small applications (similar to CraterLake), but only 400× faster on large ones (11× slower than CraterLake).
I designed the first SRAM-only, fully pipelined transpose unit (Sec. 5.1), which is a crucial component of F1’s novel FFT and automorphism units.