This summer, while living with 7 other people in a biotech hacker house in Cambridge, MA, I interned on the Machine Learning Research team at Dyno Therapeutics! Dyno is a startup that designs gene therapy vectors using machine learning. By combining machine learning with high-throughput in vitro (cell) and in vivo (animal) experimentation, they aim to develop capsids, the protein shells of viral vectors, that can safely deliver a genetic payload (a gene therapy) to the correct cell. Dyno specifically works on designing adeno-associated viruses (AAVs), the most widely used vectors for gene therapies.
Gene delivery has emerged as a significant problem, primarily because biologics, or therapeutics made from living organisms, are an increasingly exciting opportunity to improve human health. Gene-editing systems, base editing, and even the recent success of messenger RNA systems in the COVID-19 vaccine race have made a strong case that biologics are the future of how we lengthen the human lifespan. However, success in gene therapies relies on the viral vector’s ability to safely and precisely deliver a gene payload to the intended target cells and tissues. Ensuring that gene therapies are not blocked by the immune system (immune evasion), that foreign DNA is introduced into a cell (transduction), and that a genetic payload can fit into a viral vector (packaging) is integral to this.
Current gene therapies are limited by what we know about naturally occurring vectors, especially adenoviruses. Dyno is tackling this problem by applying machine learning and protein engineering strategies to naturally occurring (wildtype) AAVs, with the goal of building its own suite of targeted AAV vectors for various tissues.
I was initially drawn to Dyno because of its extensive list of scientific contributions to machine learning, protein engineering, and viral vector development. Before Dyno, I had worked on machine learning and computational molecular design in both an academic lab and a large biotech company. Given those experiences, I wanted to spend my summer working on biological problems in a machine learning role at a startup. Above all, I was looking for a company innovating both in developing computational methods and in applying them to protein engineering and biological discovery.
Dyno, in particular, stood out to me because of its significant scientific contributions to the field of ML-guided protein engineering. Dyno published two breakthrough papers in Science and Nature Biotechnology, “Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design” and “Deep diversification of an AAV capsid protein by machine learning,” where they pioneered their four-step approach. Reading their papers and the founder’s work describing the biological sequence design problem got me excited about the prospect of training under a leading team of researchers in the field.
At Dyno, I was mentored by some fantastic scientists, some of whom came from Berkeley. My manager, Jeffrey Chan, did his Ph.D. in EECS advised by Prof. Yun Song, and another scientist on my team, David Brookes, was advised by Prof. Jennifer Listgarten. Working with Berkeley grads at the cutting edge of protein biology and machine learning was exciting: I got to learn how to do good science while receiving invaluable advice about labs and classes at Berkeley. My team, Machine Learning Research (MLR), works primarily on developing and evaluating new methods for designing AAV capsid libraries. It was great to see how different teams collaborate, as Dyno has built an environment where wet-lab biological scientists work hand-in-hand with ML scientists on designing new experiments. The other big draw of Dyno, from the perspective of a computational intern, is the ability to play with large-scale, diverse datasets spanning dozens of different sequence-to-function relationships for numerous targets. I got to probe interesting research questions and try to contribute to Dyno’s R&D efforts within a few months, which wouldn’t have been possible without the sheer amount of biological data collected internally.
Out of everything, one of the things I valued most at Dyno was the high expectations and scientific rigour people have. Dyno was spun out of Prof. George Church’s lab at Harvard, and that research culture lives on here. For example, journal clubs covering interesting literature in biology, statistics, and computer science are a regular occurrence. In addition, work-in-progress research talks were an opportunity for me to explain my projects to teams of scientists and get invaluable feedback on new directions to probe and methods to try.
Within my first month, I read a lot of the literature on epistasis to understand the concept and how it relates to the fitness of proteins. The mapping from protein sequence to biological phenotype largely determines the course of evolution. Living systems evolve one mutation at a time, but a single mutation can alter the effect of subsequent mutations. This phenomenon is known as epistasis, and its mechanistic determinants are still pretty unclear. These nonadditive interactions between amino acid sites in a sequence can either accelerate or severely constrain the pace of adaptation.
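To make "nonadditive" concrete, here is a minimal sketch of one common way to quantify pairwise epistasis: compare the measured fitness of a double mutant to what you would expect if the two single-mutant effects simply added up. The function name and the fitness values are hypothetical and purely illustrative (I am assuming fitness is on a log scale, where independent effects add); this is not Dyno’s method or data.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Return the deviation of a double mutant from the additive expectation.

    All arguments are fitness values on a log scale (an assumption):
    wildtype, single mutant A, single mutant B, and double mutant AB.
    """
    # Additive expectation: wildtype plus the effect of each single mutation.
    additive_expectation = f_wt + (f_a - f_wt) + (f_b - f_wt)
    return f_ab - additive_expectation


# Hypothetical log-fitness values, chosen only for illustration.
f_wt, f_a, f_b = 0.0, -0.5, -0.25

# Double mutant matches the sum of single effects: no epistasis.
print(pairwise_epistasis(f_wt, f_a, f_b, f_ab=-0.75))  # 0.0

# Double mutant is fitter than the additive model predicts: positive epistasis.
print(pairwise_epistasis(f_wt, f_a, f_b, f_ab=0.5))  # 1.25
```

A value of zero means the two mutations behave independently, while a nonzero value means the effect of one mutation depends on whether the other is present.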
To understand epistasis in Dyno’s datasets, I started by understanding and implementing basic additive (linear) models and worked up to more complex non-linear ones. This involved a lot of paper reading, which led to me presenting a journal club on the paper “Physical Constraints on Epistasis” by Husain et al. To prepare for the journal club, I had to pick up domain knowledge through lectures, notes and papers on topics far from my background as a computer science student. I explored fields like physical dynamics, evolution and structural biology, trying to better understand the paper and relate it to the problems Dyno was tackling. In the process, I expanded my research interests to areas I would’ve never considered before. This led to me focusing heavily on the intersection of machine learning and structural biology, reading papers that developed geometric deep learning models for protein design, and using my domain knowledge to better understand a new discipline. I’ll hopefully be able to share some concrete work from my summer at Dyno in the coming months! 🙂
From these experiences, I grew convinced that I’m broadly interested in pursuing research in machine learning and biological sequence design, using techniques from evolutionary biology and structural biology to guide my work. Using the skills and frameworks I learned at Dyno, I’m excited to do research during my sophomore year. Beyond research, Dyno has taught me a lot about working with real-world ML systems and problems. Dealing with data imbalances, validating approaches from the literature on actual data, and designing new experiments to interpret the features models are learning have all taught me things I couldn’t get from reading papers. To top it off, doing so alongside incredible people who cared deeply about their work and were always willing to hop on calls made for a fantastic learning environment.
I’m back in Berkeley for year 2 of my undergrad! I’m excited to get back to contributing to student organizations like ML@Berkeley (and join a couple of new ones), sit in on research seminars and lab meetings, and take exciting undergrad + grad classes in CS, math and bioengineering. I’m currently diving into classes on optimization theory, probability & discrete math, and the history of science, while taking grad classes in areas like computational functional genomics. While my first year of classes at Berkeley was admittedly full of packed lecture halls, oftentimes with hundreds of undergrads, I’m excited to be taking smaller, higher-level classes in my sophomore year!
In terms of concrete next steps: I’m thrilled to be joining the Broad Institute of MIT and Harvard as a visiting undergrad student working with Eeshit Dhaival Vaishnav and Prof. Eric Lander. Prof. Lander has been a longstanding research idol of mine ever since I first read about the Human Genome Project, so I can’t wait to work with him and Eeshit at the Broad. We’ll be tackling some exciting questions relating to ML, single-cell bio, and gene expression!
I’m also starting to look for ML/research internships for next summer. If you’re interested, let’s chat! I can be reached at [first_name][last_name_initial]@berkeley.edu.