Nidhi Parthasarathy, Tuesday – June 28th, 2022
Day 2 started with a camp-wide lecture on machine learning from one of the research mentors (Stanford). We went over the definition of machine learning: a specific class of artificial intelligence that allows computers to learn from data to make decisions and predictions without being explicitly programmed. We discussed examples of tasks that are too complex to fully describe on our own (like recognizing different kinds of handwriting, translating across languages, or digit recognition) and how ML tells the computer how to learn without describing what to learn. We also contrasted the fully-learned models in ML with the hand-crafted rule systems of broader AI: the latter need expert knowledge but behave predictably and safely, while the former automatically detect patterns in massive datasets, enabling good decisions even in new, unseen scenarios.
We then learned about how data is at the heart of machine learning, the difference between training and testing data sets, features (a.k.a. inputs) and labels (a.k.a. outputs) in data, and public datasets on kaggle.com. We got an introduction to different types of machine learning: supervised learning, where the dataset contains both examples and labels, as in regression or classification; unsupervised learning, where we attempt to make inferences about unlabelled data for grouping, pattern detection, or identifying anomalies, with approaches like k-means clustering or principal component analysis (PCA); and reinforcement learning, which learns by trial and error through rewards and penalties. We learned about training, memorization versus generalization (adapting to data that we have not seen), and the challenges of overfitting and underfitting.
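The unsupervised grouping idea above can be sketched with a tiny k-means run. This is a minimal illustration, not the lecture's material: the data points, initial centroids, and function name are all made up for the example.

```python
# Minimal 1-D k-means sketch (hypothetical data, for illustration only).
def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids

# Two obvious groups: low values near 2 and high values near 11.
print(kmeans_1d([1, 2, 3, 10, 11, 12], [1.0, 12.0]))  # → [2.0, 11.0]
```

Note that no labels are used anywhere: the algorithm discovers the two groups purely from the structure of the data, which is exactly what makes it unsupervised.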
We also learned about model accuracies, false positives and false negatives, and confusion matrices, as well as metrics like precision (how likely a positive prediction is correct), recall (how many of the actual positives are correctly predicted), specificity (how many of the actual negatives are correctly predicted), and F1 score (a combined precision/recall metric).
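All four metrics fall straight out of the confusion-matrix counts. The counts below are made-up numbers just to show the arithmetic:

```python
# Hypothetical confusion-matrix counts (illustrative only).
tp, fp, fn, tn = 8, 2, 4, 6

precision   = tp / (tp + fp)  # of predicted positives, how many were right
recall      = tp / (tp + fn)  # of actual positives, how many were caught
specificity = tn / (tn + fp)  # of actual negatives, how many were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

print(precision, round(recall, 3), specificity, round(f1, 3))
# → 0.8 0.667 0.75 0.727
```

The F1 score is useful precisely because precision and recall trade off against each other; a model can trivially max out one at the expense of the other.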
The last section of the lecture talked about how we get labeled data and the importance of good, unbiased datasets (for example, what happens if you are classifying pictures as dogs or cats but have no labeled pictures of dogs?). We also learned about important issues around privacy and anonymity and recent laws like the EU General Data Protection Regulation (GDPR). We looked at some odd solutions from AI (https://www.aiweirdness.com/) and at societal biases in data, which made me realize the importance of not blindly trusting your AI model.
In the discussion part of the lecture, I also had a chance to reflect on the assortment of ways one could contribute to progress in AI (across business, design, ethics, education, engineering, research, and more), and it made me realize how broadly AI impacts various disciplines.
Our next session was a deep dive on medical AI. Alaa took us on a whirlwind tour of applications of AI in the medical space, highlighting dozens of examples from recent work. One thing that stood out for me was how recent AI models can identify a patient's race from X-rays, something that even doctors cannot do today, yet we don't quite understand how the models do it. This highlighted to me how AI is capable of exceeding human capacity, but also that we still need to be careful about how we use these models. An important theme was the role of big data in building effective AI models, and the societal issues and ethical concerns involved in collecting and processing that data.
The afternoon session featured Mykel Kochenderfer (professor at Stanford, Aeronautics and Astronautics). After giving an overview of his background and his interests in aviation and AI, Prof. Kochenderfer talked about the importance of optimization in AI and in training neural networks (to minimize incorrectness) and the challenges around irregularity, dimensionality, and modeling.
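Minimizing "incorrectness" by optimization can be sketched with gradient descent on a one-parameter line fit. The data points and learning rate below are made up for illustration; this is the general technique, not anything specific from the talk:

```python
# Fit y ≈ w * x by gradient descent on the squared error (toy data).
xs, ys = [1, 2, 3], [2, 4, 6]  # made-up points lying exactly on y = 2x

w, lr = 0.0, 0.01
for _ in range(100):
    # Gradient of sum((w*x - y)^2) with respect to w.
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys))
    w -= lr * grad  # step downhill to reduce the error

print(round(w, 3))  # → 2.0, the true slope
```

Training a neural network is this same loop scaled up to millions of parameters, which is where the challenges of irregularity and dimensionality he mentioned come in.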
He talked about optimization concepts like curve fitting, classification, and probability estimation, and gave examples of how they are used in AI for different applications (e.g., home pricing estimation and COVID detection, spam detection, investment returns and treatment outcomes). He talked about decision making and walked us through an interesting example of choosing between visiting the Computer History Museum or Disneyland by systematically working through terms like outcome utility and probability. He also discussed sequential decision making with uncertainty at the outcome, state, and model levels, and how each level maps to a different mathematical approach (Markov processes or reinforcement learning).
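The museum-vs-Disneyland choice can be sketched as an expected-utility calculation. The probabilities and utilities below are hypothetical numbers I picked for the example, not the ones from the lecture:

```python
# Expected utility: sum of probability * utility over possible outcomes.
def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

# Hypothetical outcomes as (probability, utility) pairs.
disneyland = [(0.7, 10), (0.3, 2)]  # great if sunny, poor if rainy
museum     = [(1.0, 6)]             # indoors, so weather-independent

eu_d = expected_utility(disneyland)  # 0.7*10 + 0.3*2 = 7.6
eu_m = expected_utility(museum)      # 6.0
print("Disneyland" if eu_d > eu_m else "Museum")  # → Disneyland
```

The rational choice is whichever option maximizes expected utility; with these numbers the gamble on Disneyland wins, but shift the rain probability high enough and the museum takes over.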
He then talked about applications in the airplane industry around airplane collisions. He gave us good pointers for future reading in this area including free PDFs of his books. I particularly liked his ending advice about how we should not be intimidated by some of the more complicated math models when they are not explained well and his comments about how AI is not just fun but can have a real impact, and how major advances could come from working on old problems in new ways.
This was followed by a fun lecture/demo on design thinking from the Stanford design school (d.school) by Ariam Mogos. She talked about how AI technology could be a double-edged sword. She began by focusing on face recognition and both its benefits and biases. She highlighted how understanding these implications could be uncomfortable for people and how the only way to address this is to have a lot of diversity and representativeness in the making of these technologies. She mentioned Rep, a student-centric machine that engages young people in emerging technologies.
She highlighted the importance of design principles (what can the code do, how can it represent everyone, and so on) and illustrated this for voice assistants. She then ran a very fun interactive activity where she gave us a picture as a prompt and asked us to come up with the first health-related topic it triggered. What was really interesting was that when we all put our topics in the chat, everyone's response was different, and together they formed a very diverse input data set. It gave me a nice way to think about how to address diversity and biases in ML data sets. I also liked her last exercise of having us write down our reflections and take-aways from the talk.
As always, we ended the day with a great one hour of virtual socializing — more icebreakers and some games that helped form deeper connections with the other students.
Read on for day 3.