My work focuses on understanding the representations and algorithms underlying human and machine cognition. For the first time, we have access to models that can flexibly accomplish complex cognitive tasks across a wide range of domains. My research uses insights from cognitive science to uncover the mechanisms supporting this seemingly intelligent behavior and, reciprocally, uses techniques from mechanistic interpretability to better characterize the similarities and differences between minds and machines. Ultimately, this research program aims to transform black-box neural networks into explicit and useful cognitive models of linguistic and visual processing, while also driving the development of more human-like artificial intelligence systems.
Here are some specific questions that I like to think about:
Can language models distinguish the possible from the impossible?
Language models are usually great at incorporating context, but not always. What causes these contextualization errors?
How do vision transformers solve a simple symbolic visual reasoning task?
Do neural networks self-organize into modular components when solving compositional tasks?