For this Humans of Machine Learning (#humansofml) interview, I’m super excited to share my conversation with Emil Wallner. Emil is living, breathing proof that it’s possible to pursue serious AI research as a self-taught creator. He is currently doing machine learning research at Google Arts & Culture and is an independent researcher in reasoning.
This episode of Humans of ML is special because Emil got his start in AI at FloydHub. In 2018, Emil created Screenshot-to-code, a popular open-source project that translates design mock-ups into HTML/CSS. In early 2019, he was the subject of a short film made by Google about his work on automated colorization. He previously worked for the University of Oxford and co-founded a seed investment firm that focuses on education technology.
In this conversation, we’ll cover Emil’s AI journey and his advice to pursue a career as a self-taught research scientist. He has had an inspiring and adventurous personal journey – we talk about this as well. Emil is super kind, humble & full of passion for his work. He was such a pleasure to talk to – I hope you enjoy our chat.
*Cover image source: https://blog.google/technology/ai/creative-coder-adding-color-machine-learning/
[Alessio]: You don’t have what we could consider a “standard education” in either AI or CS, despite your deep domain expertise. This is very unconventional in a field where academic pedigree has been considered to carry all the weight. I’d love to walk through your journey in AI.
[Emil]: Where would you like to start?
Looking at your past experiences, it’s fascinating to see such diversity in what you’ve pursued. By the way, I really love your CV – the quirks section was especially fun to read. Could you tell us a little about your pre-AI life?
In my early teens, I was more focused on developing a theory about personal development than studying for exams. When I finished high school, I put that theory to the test.
I moved from Sweden to Ghana, West Africa. I started working as a teacher in the countryside, but after invoking the spirit of their dead chief, they later anointed me king of their village.
After I left Ghana, I went back to Sweden to work as a truck driver. I then joined a band and toured the US, Mexico, and Europe. Tl;dr, I spent a few years planning and embarking on personal development adventures. They were loosely modeled after the Jungian hero’s journey with the influences of Buddhism and Stoicism.
From my travels, I was exposed to a lot of social issues that led me into social entrepreneurship in my mid-twenties. I started working with The Skoll Centre for Social Entrepreneurship at the University of Oxford. One thing led to another, and I ended up co-founding an investment firm to fund education initiatives.
When did you start your journey at Ecole42? What motivated you to pursue this path?
I realized I prefer being a maker, and programming had become a bottleneck to many of my long-term goals. I started studying three years ago.
I chose 42 because it’s one of the few educational institutions based on learning best practices. 42 is a peer-to-peer university created from first principles to account for what we know about learning science.
Makes sense. How did studying programming lead you to ML/DL?
I spent six months programming in C and then did a deep learning internship at FloydHub.
During my internship, I spent my first two months playing with models and implemented the core deep learning algorithms from scratch. I then spent two months colorizing images with neural networks, and finished my internship with a project to translate design mock-ups into HTML.
You can read about what I did on the FloydHub blog. That was the launch of the AI phase of my career.
You clearly have strong values associated with education and ideas as to how it’s best done. It shows in your personal choices and your work with investments in educational initiatives. Do you think self-education is the future?
Many are realizing that education is a zero-sum credential game. It serves people with a certain type of motivation from roughly the same socio-economic background.
I believe most ambitious people have been searching for alternative routes, but there haven’t been any good options. Recently, we’ve seen an increase in universities for autodidacts, 40-50 or so. They are using software systems to shift from a teacher and exam-centric system to a peer-reviewed and portfolio-centric system.
These peer-to-peer universities are in the early stages and many still prefer exam-based motivation. However, they are becoming better by the day and I’m confident that they will become mainstream within the coming decade.
Can you elaborate more on the signaling in the self-taught process? In other words, how can we recognize when we are on the right track or pursuing the right learning experience?
Creating value with your knowledge is evidence of learning. I see learning as a by-product of trying to achieve an intrinsic goal, rather than an isolated activity to become educated.
Early evidence of practical knowledge often comes from usage metrics on GitHub, or reader metrics from writing about your work. Progress in theoretical work starts when researchers you consider interesting engage with your work.
Taste has more to do with character development than knowledge. You need taste to form an independent opinion of a field, to have the courage to pursue unconventional areas, and to avoid getting caught up in self-admiration.
Taste is related to the impact your work has.
Would you say there’s an ideal curriculum for a self-taught AI student?
When you are intrinsically motivated, you want to avoid the concept of a curriculum as much as possible.
The key is to learn the bare minimum and then start exploring.
Knowing how to code is a prerequisite. Then, I’d spend 1-2 months completing Fast.ai course V3, and spend another 4-5 months completing personal projects or participating in machine learning competitions. It’s important to collect objective evidence that you can apply machine learning.
Here’s what I outlined as a rough guideline:
After six months, I’d recommend doing an internship. Then you’ll be ready to take a job in industry or do consulting to self-fund your research.
As fascinating and practical as it sounds, are you convinced self-education is for everyone? Are there people you’d recommend it for versus not? Any advice on where to start for those who want to follow your same journey?
Support systems within self-education are still in early development, so it’s best suited for early adopters. The main requirements are being self-driven and being able to handle external pressure, e.g. friends and family will question your decision. It’s also worth noting that student loans, visas, and research grants are harder to obtain.
On the plus side, peer-to-peer universities are often tuition-free and don’t require high-school diplomas. I’d check the 42 school network, which I’m part of, but also Holberton School, and Lambda School.
We’ve talked about how the interview process for AI positions is broken. I imagine that’s especially true for self-taught people entering this field. How do you think that that problem can be fixed?
Many small and medium enterprises prefer portfolios over degrees. When it comes to larger companies, it becomes more of an art than a science.
At large companies, less than a few percent are self-taught in ML. Of those, most don’t come through the classic hiring channels. Due to the volume of university applicants a large company faces, it’s harder for them to adjust to portfolio-centric hiring.
It’s not an easy problem; here are the rough guidelines I shared earlier:
I don’t know what it would look like in practice, but I’d imagine clearly communicating that you have a separate track for portfolio-based hiring, and how you quantify the quality of a portfolio.
Think of it as assessing a process rather than skill-specific questions. Focus the initial phase on discussing their portfolio in-depth. It can also be useful to ask how they solve a problem step by step, not a brain-teaser with a specific answer, but a more open-ended problem related to their area of expertise.
Depending on the bandwidth of the applicant, it can also be worth doing a take-home exam, followed by a shorter paid contracting assignment.
That sounds so much more efficient than the typical hiring process. Assuming they can master the art of finding their way into a hiring channel, how can a self-taught applicant increase their chances of getting an offer to work for a big company?
To have a high chance of getting an offer, you need to understand most of Ian Goodfellow’s Deep Learning book and Cracking the Coding Interview, and find a dozen people from big companies to do mock interviews with. If you self-study full-time, it will take around two years.
In the end, hiring pipelines at large companies assess your extrinsic motivation: your ability to learn a given body of knowledge. However, you are self-taught because you have strong intrinsic motivation. Forcing yourself to learn a body of knowledge is dreadful. In my case, I think the opportunity cost of studying for interviews is too high.
I started working with Google because I reproduced an ML paper, wrote a blog post about it, and promoted it. Google’s brand department was looking for case studies of their products, TensorFlow in this case. They made a video about my project. Someone at Google saw the video, thought my skill set could be useful, and pinged me on Twitter.
What I’ve seen work is getting good at a niche and letting the world know about it. Have a blog, be active on Twitter, and engage with researchers via email.
Any advice for creating an AI portfolio?
Once an employer checks your portfolio, you have 10 seconds to pique their interest and another 20 seconds to convince them you are a fit.
The credentialism-value of a portfolio is proportional to how clear the evidence of your work is and how relevant it is to the employer.
That’s why online course certificates are weak: it’s hard for an employer to know how they were assessed, and they assume most people copy and paste the assignments. The same is true for common portfolio items. Group projects are weaker because employers don’t know what you contributed.
Novelty has high credentialism-value because it’s evidence that you have unique knowledge and it’s clear that it came from you. Reproducing a paper that has no public code is evidence that you can understand machine learning papers. And writing an in-depth blog post about your work creates further evidence that you made a genuine contribution.
To create additional evidence, you can engage in an objective process to assess your work, in the form of machine learning competitions, publishing papers, or sharing it online to see what the broader public thinks of your work. Formal research is often measured by publishing first-author papers in high-quality conferences or journals.
That’s the context that led to this thread:
How can someone bootstrap into AI research as a self-educated practitioner?
I made this tweet as a rough outline:
It’s important to have enough practical machine learning skills to self-fund your early research exploration with contracting, to have enough skill for fast experimentation, and to try different areas until you find a specialization you want to spend the next few years on.
Then you want to decide if you want to pursue formal research or independent research. For many of the same reasons I’m self-educated, I’m also inclined to do indie research.
There is a recent research trend of big AI companies benefiting from working at scale. How can small- and medium-sized research labs stay in the game? Rich Sutton discussed this in an essay as well. What type of strategy can smaller players adopt to escape that trap?
What Sutton pointed out was that the best models tend to have few priors. But most if not all of today’s building blocks in AI could have been invented with small compute. The industry labs have a significant advantage when it comes to applying these in the real world, but that is mostly a concern for the industry as of now.
Because big labs tend to do large scale projects with significant PR efforts, that’s what most end up talking about. As a result, that’s what becomes trendy.
However, if you end up working on problems outside of your compute budget, you start procrastinating by tinkering with hyperparameters. You don’t have enough resources to build an efficient hyperparameter sweep, so you end up doing ad-hoc architecture changes. You are not likely to contribute, you learn slowly, and you end up with huge compute bills.
Fast experiment cycles are crucial for learning efficiently.
When you are learning, aim for experiments that take less than 10 minutes. If you are building an applied portfolio, it’s fine if an experiment takes a day since most of the hyperparameters are already known, and if you are doing research, I’d aim to sweep 20-50 experiments within a few hours. This can often be done by using a smaller benchmark or narrowing the problem scope.
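The spirit of this advice can be sketched in code: pick a benchmark small enough that each run finishes in seconds, then sweep a hyperparameter systematically instead of tinkering ad hoc. This is a minimal illustrative sketch, not Emil’s setup; the task (fitting a 1-D linear model by gradient descent) and the learning-rate sweep are assumptions chosen only to keep the experiment loop fast.

```python
# Minimal sketch of a fast experiment loop: a tiny synthetic benchmark
# keeps each run under a second, so a whole sweep finishes in moments.
import random
import time

random.seed(0)
# Synthetic data: y = 3x plus a little noise, x in [0, 2).
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in [i / 50 for i in range(100)]]

def run_experiment(lr, steps=500):
    """Fit y = w * x by gradient descent; return the final mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

results = {}
for lr in (0.01, 0.1, 0.5):  # a small, systematic sweep beats ad-hoc tweaks
    start = time.time()
    results[lr] = (run_experiment(lr), time.time() - start)

best_lr = min(results, key=lambda lr: results[lr][0])
print(f"best lr={best_lr}, mse={results[best_lr][0]:.4f}")
```

The same loop structure carries over to real projects: swap in a small subset of your dataset and the hyperparameter you actually care about, and only scale up once the sweep tells you where to look.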
Great advice. What are interesting AI research areas that don’t require too much compute?
I agree with François Chollet’s ‘The Measure of Intelligence.’ We should shift from solving tasks to models that are concerned with skill-acquisition efficiency, i.e. systems that can solve a broad set of tasks that involve uncertainty. Today’s systems achieve local generalization, and there is little evidence that they are useful for human-like intelligence.
A lot of this has to do with areas related to reasoning, here’s what I outlined earlier:
In addition, I’d like to add Routing Networks and CRL (Composing representation transformations). Networks that break a problem into intermediate logic and create specialized models for each step. Unsupervised and curriculum learning will be important to develop Chollet’s idea of ‘Core Knowledge’, a core set of priors to enable reasoning.
What are you hoping to learn or tackle next in your career?
Similar to finding my self-educated path, I want to find my research style. I’m interested in both the macro and the micro: I’m hoping to contribute to AI in relation to creativity and to improve neural networks’ reasoning capabilities.
Just for kicks, let’s stray more into hypothetical territory before we conclude. How far would you say we are from AGI?
Today’s most sophisticated sequential models can’t generalize to solve addition; however, we are making a lot of improvements in scaling local-generalization models. Since the development of deep learning, we’ve only made marginal improvements in general intelligence.
The most significant combined increase in the number of AI researchers, education accessibility, and compute resources will likely happen in the coming decade. Our collective learning curve in AI will flatten out after that point. Hence, by the end of the 2020s, we’ll have a better understanding of whether there are more general approaches to machine learning.
What is your opinion about the debate between Symbolic & Connectionist AI?
It’s too high-level to be useful.
I prefer to discuss implementations that have data to support a claim. For example, do the MAC network, Neural Turing Machines, and Routing networks create intermediate logic, or are they large hash tables with locality-sensitive hash functions?
Many of the interesting debates center around Chollet’s call to shift from task-specific networks to skill-acquisition efficiency.
Can AI be conscious?
Yes. I think consciousness is a spectrum and is commonly thought of as the point when we become self-aware, which happens roughly at the age of two for humans. It’s a point when we have enough general and high-level abstractions of the world to start forming a coherent identity.
That leads to another question, how do you develop an identity when you don’t have constraints such as nature and nurture? You can artificially create constraints to create the illusion of a human-like identity, but artificial identities are probably going to evolve from information and energy constraints.
I guess we’ll have to wait and see what evolves, then.
Emil, thanks so much for taking the time to chat with me today. Where can people go to learn more about you and your work?