Open Philanthropy recommended a grant of $244,614 to Meridian to support research on maintaining faithfulness in the chains of thought of large language models (LLMs).
Many researchers are concerned about scenarios in which LLMs hide intermediate reasoning steps by encoding them in their chains of thought in ways that human readers cannot detect. This study will evaluate several methods for preventing such encoded reasoning.
This grant was funded via a request for proposals for projects related to technical AI safety research. It falls within Open Philanthropy’s focus area of potential risks from advanced artificial intelligence.