Francis Rhys Ward — AI Control Project

Organization:
Francis Rhys Ward
Award Date:
05/2025
Amount:
$20,000
Purpose:
To support research expenses for the Sandbagging Project

Open Philanthropy recommended a grant of $20,000 to Francis Rhys Ward to support a study on whether AI models can deceptively “sandbag” (deliberately underperform) on the MLE-bench evaluation, or otherwise sabotage evaluation results. Understanding AI models’ ability to deceive or mislead under these conditions could help researchers properly interpret the results of future AI benchmarks and experiments.

This falls within Open Philanthropy’s focus area of potential risks from advanced artificial intelligence.
