I’m interested in doing computer architecture research where I apply techniques from classical algorithms and data structures. I’m not yet quite sure what this entails, but it will probably have something to do with accelerators, compilers, and/or hardware-software co-design. Ideally, I would be able to focus on maximizing efficiency, free from the limitations of existing architectures.

Previously, I worked with Nikola Samardzic on accelerating Fully Homomorphic Encryption—encryption that allows running programs on secret data without decrypting it.

CraterLake: A Hardware Accelerator for Efficient Unbounded Computation on Encrypted Data

Nikola Samardzic, Axel Feldmann, Aleksandar Krastev, Nathan Manohar, Nicholas Genise, Srinivas Devadas, Karim Eldefrawy, Chris Peikert, Daniel Sanchez

CraterLake is the state-of-the-art hardware accelerator for FHE, providing speedups of 5,000× over CPU on a broad range of applications. Building upon F1’s functional units, CraterLake introduces a novel architecture that significantly reduces on- and off-chip data movement.

I came up with the way computation is distributed across the chip (Sec. 4), designed the on-chip network (Sec. 5.3), and designed the KeySwitch hint generator (Sec. 5.2).

F1: A Fast and Programmable Accelerator for Fully Homomorphic Encryption

Axel Feldmann*, Nikola Samardzic*, Aleksandar Krastev, Srini Devadas, Ron Dreslinski, Christopher Peikert, Daniel Sanchez

F1 was our initial proposal for an FHE accelerator. F1 proposes novel high-throughput FHE functional units, but suffers from excessive on-chip data movement that prevents it from scaling to large FHE application. As a result, F1 is about 5,000× faster than a CPU on small applications (similar to CraterLake), but only 400× faster on large ones (11× slower than CraterLake).

I designed the first SRAM-only, fully-pipelined transpose unit (Sec. 5.1), which is a crucial component of F1’s novel FFT and automorphism units.