I am interested in artificial intelligence and theoretical computer science, and especially topics that allow me to leverage algorithmic and combinatorial insights to solve problems with real-world relevance. Areas I am particularly active in include:
Democratizing Large Language Models
Transformer-based language models are rapidly becoming one of the most promising technologies for scalable artificial cognition. Unfortunately, a lack of investment in public and open-source research has left a handful of large companies with de facto monopolies. As a member of EleutherAI and the BigScience Research Workshop, I work to democratize access to and knowledge about large language models. I have helped develop several of the (at the time) largest publicly available GPT-3-style language models in the world, including GPT-Neo, GPT-NeoX-20B, and BLOOM.
Interpretability of Foundation Models
Transformers are exceptionally powerful models that have quickly gone from smashing NLP benchmarks to being one of, if not the, premier ML technologies in a wide array of fields. Given their growing role in technological pipelines and in society writ large, understanding how and why they work is a pressing issue. My work on this topic falls into two broad categories: mechanistic interpretability and training dynamics. Mechanistic interpretability research aims to pick apart transformers to understand the algorithms that trained models use to reason. This work is complemented by research on the training dynamics of large language models, which studies how model behaviors evolve over the course of training and how an engineer can intervene to instill desirable properties in such models.
AI Ethics and Alignment
One of the most exciting recent developments in ML is the emergence of large, self-supervised, pretrained language models, most notably OpenAI's GPT-3. Unfortunately, many studies have shown that — despite their apparent cognitive powers — these models can easily be trained to act immorally, and that even well-trained models can become corrupted. If a model becomes dangerous through mere exposure to unethical content, it is unacceptably dangerous and broken at its core. These models are fundamentally not doing what we as humans want them to do, which is to act in useful, aligned ways, not merely regurgitate an accurate distribution of the text they were trained on. We need AI that is, like humans, capable of reading all kinds of content, understanding it, and then deciding to act in an ethical manner anyway. See my essay "The Hard Problem of Aligning AI to Human Values: A New Paradigm for ML" in the Montreal AI Ethics Institute's The State of AI Ethics Report to learn more.