Early in the pandemic, an agent—literary, not software—suggested Fei-Fei Li write a book. The approach made sense. She has made an indelible mark on the field of artificial intelligence by heading a project started in 2006 called ImageNet. It classified millions of digital images to form what became a seminal training ground for the AI systems that rock our world today. Li is currently the founding codirector of Stanford’s Institute of Human-Centered AI (HAI), whose very name is a plea for cooperation, if not coevolution, between people and intelligent machines. Accepting the agent’s challenge, Li spent the lockdown year churning out a draft. But when her cofounder at HAI, philosopher John Etchemendy, read it, he told her to start over—this time including her own journey in the field. “He said there’s plenty of technical people who can read an AI book,” says Li. “But I was missing an opportunity to tell all the young immigrants, women, and people of diverse backgrounds to understand that they can actually do AI, too.”
Li is a private person who is uncomfortable talking about herself. But she gamely figured out how to integrate her experience as an immigrant who came to the United States when she was 16, with no command of the language, and overcame obstacles to become a key figure in this pivotal technology. On the way to her current position, she’s also been director of the Stanford AI Lab and chief scientist of AI and machine learning at Google Cloud. Li says that her book, The Worlds I See, is structured like a double helix, with her personal quest and the trajectory of AI intertwined into a spiraling whole. “We continue to see ourselves through the reflection of who we are,” says Li. “Part of the reflection is technology itself. The hardest world to see is ourselves.”
The strands come together most dramatically in her narrative of ImageNet’s creation and implementation. Li recounts her determination to defy those, including her colleagues, who doubted it was possible to label and categorize millions of images, with at least 1,000 examples for every one of a sprawling list of categories, from throw pillows to violins. The effort required not only technical fortitude but the sweat of literally thousands of people (spoiler: Amazon’s Mechanical Turk helped turn the trick). The project is comprehensible only when we understand her personal journey. The fearlessness in taking on such a risky project came from the support of her parents, who despite financial struggles insisted she turn down a lucrative job in the business world to pursue her dream of becoming a scientist. Executing this moonshot would be the ultimate validation of their sacrifice.
The payoff was profound. Li describes how building ImageNet required her to look at the world the way an artificial neural network algorithm might. When she encountered dogs, trees, furniture, and other objects in the real world, her mind now saw past its instinctual categorization of what she perceived, and came to sense what aspects of an object might reveal its essence to software. What visual clues would lead a digital intelligence to identify those things, and further be able to determine the various subcategories—beagles versus greyhounds, oak versus bamboo, Eames chair versus Mission rocker? There’s a fascinating section on how her team tried to gather the images of every possible car model. When ImageNet was completed in 2009, Li launched a contest in which researchers used the dataset to train their machine learning algorithms, to see whether computers could reach new heights identifying objects. In 2012, the winner, AlexNet, came out of Geoffrey Hinton’s lab at the University of Toronto and posted a huge leap over previous winners. One might argue that the combination of ImageNet and AlexNet kicked off the deep learning boom that still obsesses us today—and powers ChatGPT.
What Li and her team did not understand was that this new way of seeing could also become linked to humanity’s tragic propensity to allow bias to taint what we see. In her book, she reports a “twinge of culpability” when news broke that Google had mislabeled Black people as gorillas. Other appalling examples followed. “When the internet presents a predominantly white, Western, and often male picture of everyday life, we’re left with technology that struggles to make sense of everyone,” Li writes, belatedly recognizing the flaw. She was prompted to launch a program called AI4All to bring women and people of color into the field. “When we were pioneering ImageNet, we didn’t know nearly as much as we know today,” Li says, making it clear that she was using “we” in the collective sense, not just to refer to her small team. “We have massively evolved since. But if there are things we didn’t do well, we have to fix them.”
On the day I spoke to Li, The Washington Post ran a long feature about how bias in machine learning remains a serious problem. Today’s AI image generators like Dall-E and Stable Diffusion still deliver stereotypes when interpreting neutral prompts. When asked to picture “a productive person,” the systems generally show white men, but a request for “a person at social services” will often show people of color. Is the key inventor of ImageNet, ground zero for inculcating human bias into AI, confident that the problem can be solved? “Confident would be too simple a word,” she says. “I’m cautiously optimistic that there are both technical solutions and governance solutions, as well as market demands to be better and better.” That cautious optimism also extends to the way she talks about dire predictions that AI might lead to human extinction. “I don’t want to deliver a false sense that it’s all going to be fine,” she says. “But I also do not want to deliver a sense of gloom and doom, because humans need hope.”