Kevin Zakka

On a quest to make household robots a reality 🧠🤖🧹

I'm a first-year CS PhD student at UC Berkeley, advised by Prof. Pieter Abbeel. My research centers on robotic perception and manipulation. In particular, I'm interested in endowing robots with the flexible ability to learn from rich sources of data, like video demonstrations and natural language instructions.
Student researcher in the Robotics team at Google Brain, advised by Debidatta Dwibedi. I worked on teaching robots from video demonstrations and specifically tackled the challenging case where demonstrator and learner have significant embodiment differences.
I graduated with an MS degree in Computer Science from Stanford University, with a distinction in research for my thesis, Self-supervised Visual Learning for Robot Manipulation.

I'm very passionate about teaching and communicating things clearly. For my contributions as head TA for CS 231n, one of the largest and most popular deep learning classes at Stanford, I was awarded the Centennial Teaching Assistant award.
AI resident at X - the Moonshot Factory, working on the Everyday Robot Project.
I interned in Johnny Lee's group in the Brain Robotics team at Google. I was mentored by Andy Zeng and Shuran Song, and spent the summer building Form2Fit, a robo-kitting solution that can generalize to new objects and kits. This was my first foray into research and I'm deeply grateful to Shuran, Andy and Johnny for being the best mentors one could ask for.

Form2Fit was featured on the Google AI Blog and was a finalist for Best Paper Award in Automation at ICRA 2020.
B.Eng. at the American University of Beirut, in my home country of Lebanon ❤️, majoring in Electrical Engineering.

I completed two internships during my undergraduate studies. In the summer of 2017, I was a visiting researcher in the Khuri-Yakub Ultrasonics Group at Stanford, developing machine learning solutions for various applications involving capacitive micromachined ultrasound transducers (CMUTs). In 2018, I interned at Nimble AI, where I built the infrastructure for training and deploying various real-time grasping algorithms for suction and parallel-jaw grasping.
XIRL: Cross-embodiment Inverse Reinforcement Learning
Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, Debidatta Dwibedi
CoRL 2021, Oral Presentation
project page / arXiv / openreview / code

To leverage the vast quantity of tutorial videos on the web, we need robots that can learn from expert demonstrators with a vastly different embodiment. We tackle this cross-embodiment visual imitation problem by learning self-supervised reward functions that encode task progress and can be maximized with downstream reinforcement learning.

Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly
Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song
ICRA 2020, Best Paper Award in Automation Finalist
project page / blog post / arXiv / code / slides

We leverage visual geometric shape descriptors in the kit assembly task, with a nifty self-supervised data collection pipeline based on time-reversed disassembly, to create Form2Fit, a robotic system that can assemble novel objects and kits.

side stuff
torchkit is a lightweight library containing PyTorch utilities useful for day-to-day research. Its main goal is to abstract away the redundant boilerplate associated with research projects, such as experiment configuration, logging, and model checkpointing.
walle is a general-purpose robotics library I wrote and use in my day to day research. It features a unified API for dealing with position and orientation in 3D space, an extendable API for streaming 3D data from Intel Realsense RGB-D sensors, and an API for transforming to and from different 3D representations like point clouds and orthographic heightmaps.
learn-linalg is a numerical linear algebra library I wrote from scratch as a learning exercise. It implements useful decompositions (LU, Cholesky, QR, SVD) as well as eigen algorithms (power, inverse, projected, qr). It is written entirely in Python and tested against tried and true numpy equivalents.
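To give a flavor of the from-scratch, test-against-NumPy approach, here is a minimal power iteration sketch (an illustrative standalone snippet, not learn-linalg's actual API) that recovers the dominant eigenpair of a symmetric matrix:

```python
import numpy as np

def power_iteration(A, num_iters=500):
    """Estimate the dominant eigenvalue/eigenvector of a symmetric matrix A."""
    v = np.random.default_rng(0).standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v
        v = w / np.linalg.norm(w)  # renormalize to avoid overflow/underflow
    # The Rayleigh quotient gives the eigenvalue estimate.
    return v @ A @ v, v

A = np.array([[2.0, 1.0], [1.0, 3.0]])
eigval, eigvec = power_iteration(A)
```

The result can be checked against `np.linalg.eigvalsh`, which is exactly the kind of verification the library's test suite relies on.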
torchnca is a PyTorch-based (i.e. GPU-accelerated) dimensionality reduction package that implements the Neighbourhood Components Analysis algorithm of Goldberger et al. with a few tweaks to make it more stable. Since I couldn't find any tutorial or reference outside academic papers, and because I find it to be an extremely elegant algorithm, I wrote about it in a detailed blog post.
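The quantity NCA maximizes is compact enough to sketch directly: each point stochastically picks a neighbor with probability given by a softmax over negative squared distances in the learned projection, and the objective is the expected number of same-class picks. A NumPy sketch (for brevity; torchnca itself is PyTorch-based, and this is not its API):

```python
import numpy as np

def nca_objective(A, X, y):
    """Expected number of correctly classified points under stochastic
    nearest-neighbor selection -- the objective NCA maximizes over A."""
    Z = X @ A.T                                           # project into learned space
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # a point never picks itself
    logits = -d2
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)                          # p[i, j]: prob. i picks j
    same = y[:, None] == y[None, :]                       # same-class pair mask
    return (p * same).sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.integers(0, 2, size=20)
A = np.eye(2, 5)                                          # project 5-D down to 2-D
score = nca_objective(A, X, y)
```

Gradient ascent on this score with respect to A yields the low-dimensional projection.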
pyphoxi is a lightweight Python API for streaming RGB-D data from the PhoXi 3D structured light sensor.
hypersearch is a hyperparameter optimization library for PyTorch. It is based on the Hyperband algorithm by Li et al., and uses random search coupled with adaptive resource allocation and early-stopping to select the best-performing configuration.
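The core routine underlying Hyperband, successive halving, is simple to sketch: sample configurations at random, evaluate each at a small budget, keep the best fraction, and repeat with a larger budget. A toy version with a stand-in evaluation function (hypothetical, not hypersearch's actual API):

```python
import random

def successive_halving(sample_config, evaluate, n=27, eta=3, min_budget=1):
    """Keep the top 1/eta configs each round, multiplying the budget by eta."""
    configs = [sample_config() for _ in range(n)]
    budget = min_budget
    while len(configs) > 1:
        # Score every surviving config at the current budget (lower is better).
        scores = sorted((evaluate(c, budget), c) for c in configs)
        configs = [c for _, c in scores[: max(1, len(configs) // eta)]]
        budget *= eta
    return configs[0]

random.seed(0)
# Toy problem: a "config" is just a learning rate; loss shrinks with budget.
best = successive_halving(
    sample_config=lambda: random.uniform(1e-4, 1e-1),
    evaluate=lambda lr, b: abs(lr - 0.01) + 1.0 / b,
)
```

With n=27 and eta=3, this runs three rounds (27 → 9 → 3 → 1), spending most of the total budget on the most promising configurations.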
I've re-implemented some of my favorite papers in the deep learning literature over the last few years. Implementations like Spatial Transformer Networks and Recurrent Models of Visual Attention have garnered over 690 and 350 GitHub ★ respectively. I think implementing algorithms from scratch is a great way of building intuition for why things work, so I genuinely recommend it as an exercise for students.

Big thanks to Andrej Karpathy for letting me use his website template!
Thanks to Jon Barron for making his website's source code free to use - I incorporated a chunk to create the publication section above.
And thanks to Jimmy Wu for tidying my HTML loose ends.