
CLIP Talk: Shi Feng (GWU)

Maryland Language Science Center | Computational Linguistics and Information Processing Lab
Wednesday, October 9, 2024, 11:00 am - 12:00 pm
Brendan Iribe Center, Room 4105

Challenges in AI-assisted AI evaluation

Abstract: As AIs are deployed to solve increasingly complex problems, reliable human oversight becomes a major challenge: AIs are also getting better at producing outputs that look correct to humans but are in fact subtly flawed. To support effective oversight, approaches like debate, constitutional AI, and reward modeling all involve using AIs to assist human evaluators. Although promising, these approaches create new risks, as AIs are being used to evaluate themselves. In this talk, I will discuss three failure modes in AI-assisted evaluation and in training with flawed supervision. I will also discuss preliminary work on mitigating these risks.

Bio: Shi is an assistant professor at the George Washington University. He received his PhD at UMD, supervised by Jordan Boyd-Graber, and did postdocs at UChicago with Chenhao Tan and at NYU in the Alignment Research Group with Sam Bowman and He He. Shi works on AI safety, in particular scalable oversight, as an extension of his work on human-AI collaboration, interpretability evaluation, and adversarial robustness. His most recent work focuses on a meta-evaluation of risks in scalable oversight methods and evaluations.


Organization: CLIP Talks