Francis Rhys Ward — AI Control Project

Open Philanthropy recommended a grant of $20,000 to Francis Rhys Ward to support a study on whether AI models can deceptively “sandbag” (deliberately underperform) on the MLE-bench evaluation, or otherwise sabotage evaluation results. Understanding AI models’ ability to deceive or mislead under these conditions could help researchers properly interpret the results of future AI benchmarks and experiments.

This falls within Open Philanthropy’s focus area of potential risks from advanced artificial intelligence.

