OpenAI released DALL-E 2. If you haven't seen the examples, check them out. DALL-E 2 is very impressive at generating realistic images from short text descriptions. It is the second version of DALL-E from OpenAI. The code has not been open sourced yet; if you want a deeper preview and a chance to try it, you have to sign up for the waitlist.
The DALL-E Instagram page (@openaidalle) offers more clues about what the algorithm can generate. See the images for "a head of broccoli complaining about the weather," or a more literal (and easier) example such as a "cubed basketball."
As with other generative algorithms, we don't know exactly what the outputs will be until we see them. A variation of the baseline image could be more violent, more sexualized, more mundane, or more creative. This unpredictability is one key risk of generative models.
Another risk is that the output can become good enough to mimic or replace authentic content, to the point where neither machines nor humans can tell what was generated by a human and what was generated by a machine. Some would argue that computer-generated images can be authentic or real too; I am not going down that rabbit hole here, but the debate is worth thinking about.
The continued advancement of synthetic content will not stop. DALL-E 2 shows how the technology can be used for positive and creative purposes without harmful consequences. At the same time, given the proliferation of deepfakes and the extremely rapid spread of information via social media, we have to be careful about who has access to this code and what it is being used for. Organizations that practice AI should consider business- and process-level guardrails to prevent state-of-the-art AI algorithms from being weaponized. Media organizations should consider a collective governance system and disincentives to help reduce deepfakes. Organizations such as OpenAI that do state-of-the-art AI research should run a societal risk-benefit analysis before releasing the code behind new algorithms. So far, OpenAI has been cautious about releasing code to the public and "into the wild," which I believe is a good thing.
Identifying deepfakes is hard. Many academic and media groups are studying misinformation, but given how quickly deepfakes are proliferating, our society does not yet have a panacea.
UC Berkeley professor Hany Farid and others provided their expert perspective on deepfakes in a recent Scientific American article, which highlights an experiment in which humans found AI-generated faces more trustworthy than real ones.
You can read more about the techniques that learn joint representations of text and images, as well as the evaluation behind DALL-E 2, here.
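To make "joint representation" concrete: DALL-E 2 builds on CLIP, OpenAI's model that embeds text and images into a shared space so that matching pairs score high similarity. Here is a minimal sketch of querying the open-source CLIP checkpoint through the Hugging Face transformers library; the checkpoint name, the placeholder image, and the captions are my own illustrative choices, not anything prescribed by DALL-E 2 itself.

```python
# Minimal sketch: scoring text-image similarity with CLIP, the joint
# text/image representation model that DALL-E 2 builds on.
# Assumes the `transformers`, `torch`, and `Pillow` packages are installed.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# A solid orange square stands in for a real photo so the snippet is
# self-contained; in practice you would load an actual image.
image = Image.new("RGB", (224, 224), color="orange")
captions = ["a cubed basketball", "a head of broccoli", "an orange square"]

# The processor tokenizes the text and preprocesses the image; the model
# embeds both into the shared space and returns pairwise similarity logits.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Softmax over the captions: a higher probability means that caption better
# matches the image in CLIP's joint embedding space.
probs = outputs.logits_per_image.softmax(dim=1)
for caption, p in zip(captions, probs[0]):
    print(f"{caption}: {p.item():.3f}")
```

It is this shared embedding space that lets a text prompt steer image generation: the system can measure how well a candidate image matches the words.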
For those interested in the theory, which is grounded in statistics, here you go:
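For a flavor of that statistical core: DALL-E 2's image decoder is a diffusion model, and the standard denoising-diffusion training objective (following Ho et al., 2020, which the DALL-E 2 paper builds on) can be written as below. This is a sketch of the simplified objective, not the exact loss used in production.

```latex
% Simplified denoising-diffusion objective (Ho et al., 2020).
% x_0 is a training image, t a random timestep, \epsilon Gaussian noise,
% \bar{\alpha}_t the noise schedule, and \epsilon_\theta the network
% trained to predict the added noise.
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L}_{\text{simple}}(\theta)
  = \mathbb{E}_{x_0,\, t,\, \epsilon}
    \left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2 \right]
```

Intuitively, the model learns to recover the noise that was added to an image at a random corruption level; generating a new image then amounts to running that denoising process in reverse from pure noise, guided by the text (via CLIP embeddings, in DALL-E 2's case).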