Self-supervised alignment with mutual information: Learning to follow principles without preference labels

When prompting a language model (LM), users frequently expect the model to adhere to a set of behavioral principles across diverse tasks, such as producing insightful content while avoiding harmful or biased language. Instilling such principles into …

MARPLE: A Benchmark for Long-Horizon Inference

Reconstructing past events requires reasoning across long time horizons. To figure out what happened, we need to use our prior knowledge about the world and human behavior and draw inferences from various sources of evidence including visual, …

Whodunnit? Inferring what happened from multimodal evidence

Humans are remarkably adept at inferring the causes of events in their environment; doing so often requires incorporating information from multiple sensory modalities. For instance, if a car slows down in front of us, inferences about why it did so …

Towards a computational model of responsibility judgments in sequential human-AI collaboration

When a human and an AI agent collaborate to complete a task and something goes wrong, who is responsible? Prior work has developed theories to describe how people assign responsibility to individuals in teams. However, there has been little work …

Without his cookies, he's just a monster: A counterfactual simulation model of social explanation

Everyday reasoning about others involves accounting for why they act the way they do. With many explanations for someone's behavior, how do observers choose the best one? A large body of work in social psychology suggests that people's explanations …

Chain versus common cause: Biased causal strength judgments in humans and large language models

Causal reasoning is important for humans and artificial intelligence (AI). Causal Bayesian Networks (CBNs) model causal relationships using directed links between nodes in a network. Deviations from their edicts result in biased judgments. This study …

Do as I explain: Explanations communicate optimal interventions

People often select only a few events when explaining what happened. What drives people's explanation selection? Prior research argued that people's explanation choices are affected by event normality and causal structure. Here, we propose a new …

Biased causal strength judgments in humans and large language models

Causal reasoning is a critical aspect of both human cognition and artificial intelligence (AI), playing a prominent role in understanding the relationships between events. Causal Bayesian Networks (CBNs) have been instrumental in modeling such …

Resource-rational moral judgment

There is wide agreement that the mind has different mechanisms it can use to make moral judgments. But how does it decide which one to use when? Recent theoretical work has suggested that people select mechanisms of moral judgment in a way that is …

Procedural dilemma generation for evaluating moral reasoning in humans and language models

As AI systems like language models are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic …