GiveWell’s Top Charities Are (Increasingly) Hard to Beat

In this post, “we” refers to Good Ventures and the Open Philanthropy Project, who work as partners.

Our thinking on prioritizing across different causes has evolved as we’ve made more grants. This post explores one aspect of that: the high bar set by the best global health and development interventions, and what we’re learning about the relative performance of some of our other grantmaking areas that seek to help people today. To summarize:

Cash transfers to people in extreme poverty

In 2015, when we were still part of GiveWell, we wrote:

By default, we feel that any given grant of $X should look significantly better than making direct cash transfers (totaling $X) to people who are extremely low-income by global standards – abbreviated as “direct cash transfers.” We believe it will be possible to give away very large amounts, at any point in the next couple of decades, via direct cash transfers, so any grant that doesn’t meet this bar seems unlikely to be worth making….

It’s possible that this standard is too lax, since we might find plenty of giving opportunities in the future that are much stronger than direct cash transfers. However, at this early stage, it isn’t obvious how we will find several billion dollars’ worth of such opportunities, and so – as long as total giving remains within the ... budget – we prefer to err on the side of recommending grants when we’ve completed an investigation and when they look substantially better than direct cash transfers.

It is, of course, often extremely unclear how to compare the good accomplished by a given grant to the good accomplished by direct cash transfers. Sometimes we will be able to do a rough quantitative estimate to determine whether a given grant looks much better, much worse, or within the margin of error. (In the case of our top charities, we think that donations to AMF, SCI, and Deworm the World look substantially better.) Other times we may have little to go on for making the comparison other than intuition. Still, thinking about the comparison can be informative.

For example, when considering grants that will primarily benefit people in the U.S. (such as supporting work on criminal justice reform), benchmarking to direct cash transfers can be a fairly high standard. Based on the idea that the utility of money is roughly proportional to the logarithm of income (so that the value of a marginal dollar is inversely proportional to income),[1] and the fact that mean American income is around 100x annual consumption for GiveDirectly recipients, we assume that a given dollar is worth ~100x as much to a GiveDirectly recipient as to the average American. Thus, in considering grants that primarily benefit Americans, we look for a better than “100x return” in financial terms (e.g. increased income). Of course, there are always huge amounts of uncertainty in these comparisons, and we try not to take them too literally.

To walk through the logic of how this generates a “100x” bar a bit more clearly:
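Using the figures from footnotes 3-5, and assuming utility is logarithmic in income (so the value of a marginal dollar is inversely proportional to the recipient’s income), the arithmetic can be sketched as follows:

```python
# Sketch of the "100x" arithmetic under a log-utility assumption:
# the value of a marginal dollar is proportional to 1 / income.
us_mean_income = 34_000         # ~2017 U.S. per capita income (footnote 4)
recipient_consumption = 288.35  # GiveDirectly recipients' annual consumption (footnote 3)
transfer_efficiency = 0.9       # share of a donated dollar reaching recipients (footnote 5)

# Value of a donated dollar to a recipient, relative to a dollar
# of consumption for the average American:
relative_value = transfer_efficiency * (us_mean_income / recipient_consumption)
print(f"~{relative_value:.0f}x")  # ~106x, i.e., roughly the "100x" bar
```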

Obviously, calculations like this remain deeply uncertain and vulnerable to large mistakes, so we try not to put too much weight on them in any one case. But the general reality that they reflect – of vast global inequalities, and the relative ease of moving money from people who have a lot of it to people who have little – seems quite robust.

Although we stopped formally using this 100x benchmark across all of our giving a couple of years ago because of considerations relating to animals and future generations, we have continued to find it a useful benchmark against which “near-termist, human-centric” grants – those that aim to improve the lives of humans on a relatively short time horizon, including a mix of direct aid, policy work, and scientific research – can be measured.

The best programs are even harder to beat

In 2015, when we first wrote about adopting the cash transfer benchmark, it looked like GiveWell could plausibly “run out” of their more-cost-effective-than-cash giving opportunities. At the time, they had three non-cash-transfer top charities they estimated to be in the 5-10x cash range (i.e., 5 to 10 times more cost-effective than cash transfers),[8] with ~$145 million of estimated short-term room for more funding. That, plus uncertainty about the amount of weight to put on these figures, led us to adopt the cash transfer benchmark. (In the remainder of this post, I occasionally shorten “cash transfer” to just “cash.”) But by the end of 2018, GiveWell had expanded to seven non-cash-transfer top charities estimated to be in the ~5-15x cash range, with $290 million of estimated short-term room for more funding, and with the top recommended unfilled gaps at ~8x cash transfers.[9] If we combine cash transfers at “100x” with large unfilled opportunities at ~5-15x cash transfers, the relevant “bar to beat” going forward may be more like 500-1,500x (i.e., 100 × 5 to 100 × 15).[10] And earlier this year, GiveWell suggested that they expect to find even more cost-effective opportunities in the future, and they are staffing up to do so.

Another approach to this question is to ask: how much better than direct cash transfers should we expect the best underfunded interventions to be? I find scalable interventions worth ~5-15x cash a bit surprising, but not wildly so. It’s not obvious where to look for a prior on this point, and the answer seems to depend strongly on one’s views about how efficient the broad “market for doing good” is: if you think that market is efficient, finding a scalable ~5-15x intervention might be especially surprising; conversely, if you think it is riddled with inefficiencies, you might expect to find many even more cost-effective opportunities.

One place to look for priors on this point is compilations of the cost-effectiveness of various evidence-based interventions. I know of five compilations of the cost-effectiveness of different interventions within a given domain that contain easily available tabulations of the interventions reviewed:[11]

For this purpose, I was interested only in the general distribution of the estimates: I didn’t attempt to verify any of them, and I was very rough in discarding estimates that were negative or lacked numerical answers, which may bias my conclusions. In general, we regard the calculations included in these compilations as challenging and error-prone, and we would caution against over-reliance on them.[12]

I made a sheet summarizing the sources’ estimates here. All five distributions appear to be (very roughly) log-normal, with standard deviations of ~0.7-1 in log10 terms, implying that a one-standard-deviation increase in cost-effectiveness equates to a 5-10x improvement (since 10^0.7 ≈ 5). However, any errors in these calculations would typically inflate that figure, and we think they are structurally highly error-prone, so these standard deviations likely substantially overstate the true ones.[13]

We don’t know what the mean of the true distribution of cost-effectiveness of global development opportunities might be, but assuming it’s not more than a few times different from cash transfers (in either direction), and that measurement error doesn’t account for more than half of the variance in the cost-effectiveness compilations reviewed above (a non-trivial assumption), these figures imply we shouldn’t be too surprised to see top opportunities at ~5-15x cash: a normal distribution implies that an opportunity two standard deviations above the mean is at the ~98th percentile. The same figures would support more skepticism towards an opportunity from the same rough distribution (evidence-based global health interventions) that is claimed to be far more cost-effective (e.g., 100x or 1,000x cash rather than 10x).
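As a rough illustration of this logic (a sketch, not a claim about the true distribution: the 0.85 standard deviation is the midpoint of the ~0.7-1 range above, and centering the distribution exactly on cash transfers is the simplifying assumption just described):

```python
import math
from statistics import NormalDist

# Assume log10(cost-effectiveness / cash) ~ Normal(0, 0.85): cost-effectiveness
# is log-normally distributed and centered on cash transfers.
sigma = 0.85  # midpoint of the ~0.7-1 log10 standard deviations above
dist = NormalDist(mu=0.0, sigma=sigma)

for multiple_of_cash in (10, 100, 1000):
    z = math.log10(multiple_of_cash) / sigma           # SDs above the mean
    tail = 1 - dist.cdf(math.log10(multiple_of_cash))  # share at least this good
    print(f"{multiple_of_cash:>5}x cash: {z:.1f} SDs above the mean; top {tail:.2%}")

# 10x cash sits ~1.2 SDs out (top ~12%): not too surprising for a top charity.
# 100x and 1,000x sit ~2.4 and ~3.5 SDs out (top ~0.9% and ~0.02%), which is
# why claims that far out of the distribution warrant much more skepticism.
```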

Stepping back from the modeling, given the vast differences in treatment costs per person across interventions (~$5 for bednets, ~$0.33-$1 for deworming, ~$250 for cash transfers), it does seem plausible for there to be large (~10x) differences in cost-effectiveness.
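To make the scale of those cost gaps concrete, here is a back-of-the-envelope comparison using the per-person costs just quoted (treatment costs only; per-person benefits, which of course also differ, are deliberately left out):

```python
# Rough per-person treatment costs from the text above.
cost_per_person = {
    "deworming (per child-year)": 0.66,  # midpoint of the ~$0.33-$1 range
    "bednets": 5.00,
}
cash_cost = 250.00  # approximate cost per GiveDirectly-style cash transfer recipient

for name, cost in cost_per_person.items():
    print(f"{name}: ~{cash_cost / cost:,.0f} people reached "
          f"for the cost of one cash transfer")

# ~379x for deworming and ~50x for bednets: even if each does far less good
# per person than a ~$250 transfer, there is ample room for ~10x differences
# in overall cost-effectiveness.
```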

Even if scalable global health interventions turn out to be much worse than we currently think – say, only ~3x as cost-effective as cash transfers – I would expect GiveWell’s foray into more leveraged interventions to yield substantial opportunities that are at least several times more cost-effective, pushing the relevant future benchmark for unfunded opportunities back towards ~10x cash.

Overall, given that GiveWell’s numbers imply something more like “1,000x” than “100x” for their current unfunded opportunities, that those numbers seem plausible (though by no means ironclad), and that they may find yet-more-cost-effective opportunities in the future, it looks like the relevant “bar to beat” going forward may be more like 1,000x than 100x.

Our other grantmaking aimed at helping people today

While we think a lot of our “near-termist, human-centric” grantmaking clears the 100x bar, we see less evidence that it will clear a ~1,000x bar.

Since we initially adopted the cash transfer benchmark in 2015, we’ve made roughly 300 grants totaling almost $200 million in our near-termist, human-centric focus areas of criminal justice reform, immigration policy, land use reform, macroeconomic stabilization policy, and scientific research. To get a sense of our estimated returns for these grants, we looked at the largest grants and found 33, totaling $73 million, for which the grant investigator had conducted an ex ante “back-of-the-envelope calculation” (“BOTEC”) to roughly estimate the expected cost-effectiveness of the potential grant for Open Philanthropy decision-makers’ consideration.

All 33 of these grants were estimated by their investigators to have an expected cost-effectiveness of at least 100x – unsurprising, given our “100x bar.” Of those 33, only eight grants, representing approximately $32 million, had BOTECs of 1,000x or greater; our large grant to Target Malaria accounts for more than half of that.

Although we don’t typically make our internal BOTECs public, we compiled a set here (redacted somewhat to protect some grantees’ confidentiality) to give a flavor of what they look like. As you can see, they are exceedingly rough, and take at face value many controversial and uncertain claims (e.g., the cost of a prison-year, the benefit of a new housing unit in a supply-constrained area, the impact of monetary policy on wages, the likely impacts of various other policy changes, stated probabilities of our grantees’ work causing a policy change). We would guess that these uncertainties would generally lead our BOTECs to be over-optimistic (rather than merely adding unbiased noise) for a variety of reasons:

We think it’s notable that, even though our BOTECs are likely systematically over-optimistic in this way, it’s still rare for us to find grant opportunities in U.S. policy and scientific research that appear to score better than GiveWell’s top charities. Of course, compared to GiveWell, we make many more grants, to more diverse activities, and with an explicit policy of relying more on program officer judgment than on these BOTECs. So the fact that our models look less robust than GiveWell’s is not a surprise – we’ve always expected that to be the case – but combining it with GiveWell’s rising bar is a more substantive update.
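For readers who haven’t seen one, here is a minimal sketch of the general shape of such a BOTEC for an advocacy grant. To be clear, this is not one of our actual calculations: the structure mirrors the kinds of inputs described above (probability of a policy change, dollar-denominated benefits, distributional weights), but every number is invented for illustration.

```python
# Hypothetical advocacy-grant BOTEC (illustrative only; all inputs invented).
grant_size = 2_000_000        # dollars granted
p_policy_change = 0.05        # assumed probability the grant causes the policy change
annual_benefit = 500_000_000  # assumed dollar benefit per year if the policy passes
years_of_benefit = 5          # assumed duration of the policy's effect
share_to_low_income = 0.5     # assumed share of benefits going to lower-income people
low_income_weight = 3.0       # assumed utility weight vs. the average American

# Expected benefit, expressed in "average-American dollar" terms.
weighted_benefit = (p_policy_change * annual_benefit * years_of_benefit
                    * (share_to_low_income * low_income_weight
                       + (1 - share_to_low_income) * 1.0))
roi = weighted_benefit / grant_size
print(f"~{roi:.0f}x")  # ~125x here: above the 100x bar, well short of 1,000x
```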

Some counter-considerations in favor of our work

As we’re grappling with the considerations above, we don’t want to give short shrift to the arguments in favor of our work. We see two broad categories of arguments in this vein: (a) this work may be substantively better than the BOTECs imply; and (b) it’s a worthwhile experiment.

This work may be better than the BOTECs imply

There are a couple of big reasons why Open Phil’s near-termist, human-centric work could turn out to be better than the figures above imply:

It’s a worthwhile experiment

Our near-termist, human-centric giving since adopting the cash benchmark can be broken into roughly three groups: ~$100M for U.S. policy, ~$100M for scientific research, and ~$300M based on GiveWell recommendations in global health and development. We think that, given the amount of giving we anticipate doing in the future, an experimental effort of that scale is worth running. As we’ve discussed before, we see many benefits from giving in multiple different kinds of causes that are not fully captured by the impact of the grants themselves, including:

We see a number of other practical benefits to working in a broad variety of causes, including presenting an accurate public-facing picture of our values and making our organization a more appealing place to work.

Finally, it is worth noting that while we think GiveWell’s cost-effectiveness estimates are (far) more reliable than the very rough BOTECs we have done, we do not think their estimates (or any cost-effectiveness estimates we’ve ever seen) can be taken literally, or even used with much confidence.

It should be possible to outperform the GiveWell top charities

Although this post describes some doubts about how some of our giving to date may compare to the GiveWell top charities, we continue to think it should be possible to achieve more cost-effective results than the current GiveWell top charities via advocacy or scientific research funding rather than direct services. To the extent that there is a single overarching update here — which we are uncertain about — we think it is likely to be against the possibility of achieving sufficient leverage via advocacy or scientific research aimed at benefiting people in the U.S. or other wealthy countries alone. We have only explored a small portion of the space of possible causes in this broad area, and continue to expect that advocacy or scientific research, perhaps more squarely aimed at the global poor, could have outsized impacts. Indeed, GiveWell seems to agree this is possible, with their expansion into considering advocacy opportunities within global health and development.

As we look more closely at our returns to date and going forward, we’re also interested in exploring other causes that may have especially high returns. One hypothesis we’re interested in exploring is combining multiple sources of philanthropic leverage (e.g., advocacy, scientific research, helping the global poor) to get more humanitarian impact per dollar – for instance, advocacy around scientific research funding or policies, scientific research on global health interventions, or policy work on global health and development. Additionally, on the advocacy side, we’re interested in exploring opportunities outside the U.S.; we initially focused on U.S. policy for epistemic rather than moral reasons, and expect many of the most promising opportunities to be elsewhere.

If this sounds interesting, you should consider applying: we’re hiring researchers to help.

Conclusion

We are still in the process of thinking through the implications of these claims, and we are not planning any rapid changes to our grantmaking at this time. We currently plan to continue making grants in our current focus areas at approximately the same level as we have for the last few years while we try to come to more confident conclusions about the balance of considerations above. As Holden outlined in a recent blog post, a major priority for the next couple of years is building out our impact evaluation function. We expect that this will help us develop a more confident read on our impact in our most mature portfolio areas, and accordingly will put us in a better position to approach big programmatic decisions. We hope to improve the overall quality of our BOTECs in other ways as well.

If, after building out this impact evaluation function and applying it to our work to date, we decided to substantially reduce or wind down our giving in any of our current focus areas, we’d do so gradually and responsibly, with ample warning and a year or more of additional funding (as much as we feel is necessary for a responsible transition) for our key partner organizations. We have no current plans to do this, and we know that funders communicating openly about this kind of uncertainty is unusual and can be unnerving, but we hope that sharing our latest thinking will be useful to others.

Finally, we’re planning to write more at a later date about the cost-effectiveness of our “long-termist” and animal-inclusive grantmaking and the implications for our future resource allocation.

Footnotes
[1] See e.g. Subjective Well‐Being and Income: Is There Any Evidence of Satiation? (archive)

For instance, Deaton (2008) and Stevenson and Wolfers (2008) find that the relationship between well-being and income is roughly linear-log: each additional dollar of income yields a greater increment to measured happiness for the poor than for the rich, but there is no satiation point.

[2] We’re eliding a huge amount of complexity here in terms of modeling the domestic welfare impacts of various policy changes, which we recognize. In practice, our calculations are often very crude, though we try to be roughly consistent in considering distributional issues and in weighing whether incomes are increasing due to productivity changes, prevented waste, or other causes.

[3] See footnote 33 in GiveWell’s writeup on GiveDirectly.

[4] Average U.S. per capita income in 2017 was $34,489, per the U.S. Census. (archive)

[5] $34,000 / ($288.35 / 0.9) = ~106. Using median U.S. income rather than mean would reduce this by ~20%, but the median seems less apt as a comparison, since we’re partially modeling foregone spending and taxes are moderately progressive.

[6] See Economic Growth and Subjective Well-Being: Reassessing the Easterlin Paradox. (archive)

[7] Too much: there is some evidence of satiation (archive) in terms of self-reported wellbeing, even in log terms, as incomes get very high by global standards. Additionally, if you think very high incomes carry net negative externalities (e.g., through carbon emissions or excess political influence), you may even think additional income at the high end should be treated as negative. Finally, placing high moral weight on marginal consumption for high-income people seems to imply that their lives “have a lot more value in them” or “are worth a lot more,” which seems problematic.

Too little: people continue to exercise substantial effort to increase their own income, even at high levels, and there seem to be obvious benefits beyond subjective wellbeing that accrue to them from doing so (such as increased lifespan or educational access). Additionally, if you’re discounting income or consumption logarithmically or more steeply, even very small positive spillovers from high-income people to others (e.g., through employment, charity, or bequests) could swamp the first-order effects in a utility calculation.

[8] This 5-10x cash range translated to roughly ~$2,000-4,000 per “life saved equivalent” in the 2015 cost-effectiveness calculation (XLSX).

[9] Based on the median results from GiveWell’s final 2018 cost-effectiveness calculation, 8x cash implies a “cost per outcome as good as saving an under-5 life” of ~$1,500. This is not directly comparable to the figures from 2015 because GiveWell made some changes to the values and framework used in their cost-effectiveness calculation, which affect both the outcome measures and the comparisons between them.

[10] Another way to get similarly high overall ROI figures is to compare GiveWell’s top charity “cost per life saved equivalent” figures to rich-world “value of a statistical life” figures: for instance, GiveWell’s ~$1,500 figure from footnote 9, set against the roughly $10 million value of a statistical life used by U.S. regulatory agencies, implies a ratio on the order of several thousand.

To be clear, this calculation violates the standard assumptions of the value-of-a-statistical-life framework – one of which is that the value of a life depends on the income of the person who lives it – and is not endorsed by GiveWell (which has a more complicated moral weights system for comparing outcomes).

[11] Since this post was first written, we came across Five-Hundred Life-Saving Interventions and Their Cost-Effectiveness (archive).

[12] When we looked closely at one of the calculations in the DCP2, we found serious errors. We haven’t looked closely at the other sources at all. Overall, we expect the project of trying to estimate the cost-effectiveness of many different interventions in uniform terms to be extremely difficult and error-prone, so we don’t mean to endorse these specific estimates.

[13] There is some discussion of this in the comments on GiveWell’s 2011 post on errors in the DCP2.