Kevin Zakka

On a quest to make household robots a reality 🧠🤖🧹

I'm currently a student researcher at Google Brain, fortunate to be advised by Debidatta Dwibedi.
2019 - 2021
I'm also a second-year Computer Science Master's student at Stanford University, working at the intersection of machine learning and robotics. My research centers on robotic perception and manipulation. In particular, I'm exploring ways in which robots can self-acquire generalizable representations that are useful and efficient for manipulation.

I'm passionate about teaching and communicating things clearly. I've TA'd one of the largest and most popular deep learning classes at Stanford, CS 231n: Convolutional Neural Networks for Visual Recognition. As part of an initiative to make the course "compute-free" and accessible to students, I led the development effort of porting all course assignments to Google Colaboratory (think Jupyter notebook with free GPUs).
AI resident at X - the Moonshot Factory, working on the Everyday Robot Project.
I interned in Johnny Lee's group in the Brain Robotics team at Google. I was mentored by Andy Zeng and Shuran Song, and spent the summer building Form2Fit, a robo-kitting solution that can generalize to new objects and kits. This was my first foray into research, and I'm deeply grateful to Shuran, Andy and Johnny for being the best mentors one could ask for.

Form2Fit was featured on the Google AI Blog and was a finalist for Best Paper Award in Automation at ICRA 2020.
2014 - 2018
B.Eng. at the American University of Beirut, in my home country of Lebanon ❤️, majoring in Electrical Engineering.

I completed two internships during my undergrad. In the summer of 2017, I was a visiting researcher in the Khuri-Yakub Ultrasonics Group at Stanford, developing machine learning solutions for various applications involving capacitive micromachined ultrasonic transducers, also known as CMUTs. And in 2018, I interned at Nimble AI, where I built the infrastructure for training and deploying real-time grasping algorithms for suction and parallel-jaw grasping.
Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly
Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song
ICRA 2020, Best Paper Award in Automation Finalist
project page / blog post / arXiv / code / slides

We leverage visual geometric shape descriptors for the kit assembly task, together with a nifty self-supervised data collection pipeline based on time-reversed disassembly, to create Form2Fit, a robotic system that can assemble novel objects and kits.

side stuff
torchkit is a lightweight library of PyTorch utilities useful for day-to-day research. Its main goal is to abstract away the redundant boilerplate that accompanies research projects, such as experiment configuration, logging, and model checkpointing.
walle is a general-purpose robotics library I wrote and use in my day-to-day research. It features a unified API for dealing with position and orientation in 3D space, an extendable API for streaming 3D data from Intel RealSense RGB-D sensors, and an API for transforming to and from different 3D representations like point clouds and orthographic heightmaps.
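To give a flavor of the last of those transforms, here's a minimal sketch of projecting a point cloud onto a top-down orthographic heightmap in plain NumPy. The function name, bounds format, and resolution parameter are illustrative, not walle's actual API:

```python
import numpy as np

def pointcloud_to_heightmap(points, bounds, resolution):
    """Project a point cloud onto a top-down orthographic heightmap.

    points: (N, 3) array of xyz coordinates.
    bounds: ((xmin, xmax), (ymin, ymax)) workspace limits in meters.
    resolution: meters per pixel.
    """
    (xmin, xmax), (ymin, ymax) = bounds
    w = int(round((xmax - xmin) / resolution))
    h = int(round((ymax - ymin) / resolution))
    heightmap = np.zeros((h, w), dtype=np.float32)
    # Keep only points that fall inside the workspace.
    mask = ((points[:, 0] >= xmin) & (points[:, 0] < xmax) &
            (points[:, 1] >= ymin) & (points[:, 1] < ymax))
    pts = points[mask]
    cols = ((pts[:, 0] - xmin) / resolution).astype(int)
    rows = ((pts[:, 1] - ymin) / resolution).astype(int)
    # Keep the max height per cell, i.e. the surface seen from above.
    np.maximum.at(heightmap, (rows, cols), pts[:, 2])
    return heightmap

# Two points land in the same cell; the taller one wins.
points = np.array([[0.05, 0.05, 0.1], [0.05, 0.05, 0.3], [0.95, 0.95, 0.2]])
hm = pointcloud_to_heightmap(points, ((0.0, 1.0), (0.0, 1.0)), 0.1)
```

Note the use of `np.maximum.at` rather than plain fancy-index assignment: it handles multiple points mapping to the same cell correctly, which buffered assignment would not.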
learn-linalg is a numerical linear algebra library I wrote from scratch as a learning exercise. It implements useful decompositions (LU, Cholesky, QR, SVD) as well as eigenvalue algorithms (power, inverse, projected, QR). It is written entirely in Python and tested against tried-and-true NumPy equivalents.
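As a taste of the simplest of those eigenvalue algorithms, here's a power-iteration sketch in plain NumPy (an illustration, not learn-linalg's actual code or API):

```python
import numpy as np

def power_iteration(A, num_iters=1000, tol=1e-10):
    """Estimate the dominant eigenvalue/eigenvector of a square matrix A."""
    n = A.shape[0]
    v = np.random.default_rng(0).standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        # Repeatedly applying A amplifies the dominant eigendirection.
        w = A @ v
        v_new = w / np.linalg.norm(w)
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    eigval = v @ A @ v  # Rayleigh quotient
    return eigval, v

A = np.array([[2.0, 0.0], [0.0, 1.0]])
val, vec = power_iteration(A)  # val ≈ 2.0, the dominant eigenvalue
```

Convergence is geometric in the ratio of the two largest eigenvalue magnitudes, which is exactly the sort of behavior that's easy to verify experimentally against `np.linalg.eig`.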
torchnca is a PyTorch-based (i.e. GPU-accelerated) dimensionality reduction package that implements the Neighbourhood Components Analysis algorithm of Goldberger et al., with a few tweaks to make it more stable. Since I couldn't find any tutorial or reference for it outside academic papers, and because I find it an extremely elegant algorithm, I wrote about it in a detailed blog post.
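The elegance is easiest to see in the objective itself: NCA learns a linear projection A that maximizes the expected number of points correctly classified by a stochastic nearest neighbor in the projected space. A minimal NumPy sketch of that objective (illustrative only, not torchnca's API, and without its stability tweaks):

```python
import numpy as np

def nca_objective(A, X, y):
    """NCA objective: expected count of correct stochastic-neighbor picks.

    Each point i picks a neighbor j with probability proportional to
    exp(-||A x_i - A x_j||^2); we sum the probability mass that lands
    on same-class neighbors.
    """
    Z = X @ A.T                                 # project the points
    d = ((Z[:, None] - Z[None]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d, np.inf)                 # a point can't pick itself
    p = np.exp(-d)
    p /= p.sum(axis=1, keepdims=True)           # p[i, j] = prob. i picks j
    same = y[:, None] == y[None]                # same-class pair mask
    return (p * same).sum()

# Toy check: identity projection, two well-separated classes.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
score = nca_objective(np.eye(2), X, y)
# score is close to 4: each point almost surely picks a same-class neighbor
```

Because the objective is smooth in A, it can be maximized by plain gradient ascent, which is what makes a GPU autograd framework like PyTorch such a natural fit.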
pyphoxi is a lightweight Python API for streaming RGB-D data from the PhoXi 3D structured light sensor.
hypersearch is a hyperparameter optimization library for PyTorch. It is based on the Hyperband algorithm by Li et al., and uses random search coupled with adaptive resource allocation and early-stopping to select the best-performing configuration.
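The resource-allocation idea at the core of Hyperband, successive halving, fits in a few lines: evaluate many configurations cheaply, discard the worst half, and double the budget for the survivors. A toy sketch (hypothetical function names and setup, not hypersearch's actual API):

```python
import random

def successive_halving(configs, evaluate, num_rounds=3, keep_frac=0.5):
    """Keep the best-scoring fraction of configs each round,
    doubling the evaluation budget for the survivors."""
    budget = 1
    survivors = list(configs)
    for _ in range(num_rounds):
        # Score every surviving config under the current budget
        # (lower score = better, e.g. validation loss).
        survivors = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = survivors[:max(1, int(len(survivors) * keep_frac))]
        budget *= 2  # spend more on the configs that remain
    return survivors[0]

# Toy example: "configs" are learning rates sampled log-uniformly,
# and the "loss" is simply the distance from a made-up optimum of 3e-3.
random.seed(0)
candidates = [10 ** random.uniform(-5, -1) for _ in range(16)]
best = successive_halving(candidates, lambda lr, budget: abs(lr - 3e-3))
```

Hyperband proper goes one step further, running several such brackets with different trade-offs between the number of configurations and the budget per configuration, which hedges against evaluations that are unreliable at small budgets.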
I've re-implemented some of my favorite papers in the deep learning literature over the last few years. Implementations like Spatial Transformer Networks and Recurrent Models of Visual Attention have garnered over 690 and 350 GitHub ★ respectively. I think implementing algorithms from scratch is a great way of building intuition for why things work, so I genuinely recommend it as an exercise for students.

Big thanks to Andrej Karpathy for letting me use his website template!
Thanks to Jon Barron for making his website's source code free to use - I incorporated a chunk to create the publication section above.
And thanks to Jimmy Wu for tidying my HTML loose ends.