We have an evaluation problem in the social sector. We want evaluations to be easy more than we want them to be right. Designing good surveys and collecting client data is hard. Rating how we feel about a particular program on a scale from one to five is easy. As a sector, we need to guide funding towards programs that work, and abandon ones that don’t. If we are to reliably move resources towards the highest achieving organizations, we have to define what high achieving means.
To me, the answer to what makes an organization high achieving is clear. The social service sector exists to reduce social ills like poverty, homelessness, and food insecurity. Organizations that have a greater impact on improving the lives of their clients are better than those that have less. Any evaluative framework that is not centered on measuring changes in client indicators is irrelevant. Despite this obvious point, I am dismayed by how celebrated efforts like the Alliance for Social Investing and Greatnonprofits.org fail to base their evaluative criteria on client outcomes.
There is a lot at stake in getting a rating system right (or wrong). The potential harm a poor rating system can cause was illustrated last week in a partnership between Greatnonprofits.org and Guidestar.org. These two rating organizations teamed up to compile a list of the “Top Ten Relief Organizations Working In Haiti.” The list was compiled based on a handful of donor reviews, and as non-profit consultant Gayle Gifford pointed out, those organizations “that were listed in the Top 10, had ONLY 1 or 2 Reviews. That’s it.”
Greatnonprofits.org and Guidestar.org responded to Ms. Gifford’s criticism by dropping the top ten list altogether. While these rating organizations certainly did the right thing by retracting their list, it is amazing to me that two supposed evaluation leaders in our industry could have compiled such a hasty, pointless agency ranking in the first place. There is so much that is problematic here, least of all the paltry number of reviews the top ten list was based on.
If we are to ever develop a meaningful top ten list of the most effective social programs, we have to embrace the social scientific complexities of evaluating clients’ social outcomes. This means taking the collection and analysis of client data, in its quantitative and qualitative forms, seriously. Simplistic rating systems that ask donors how they feel about a particular organization may seem seductive, but they could not be more beside the point in determining which organizations are best able to improve the lives of hurting people. So long as we fail to move towards an evaluative framework that is centered on sound social outcomes practices, the only top ten list we can reliably compile is the “Top Ten Worst Ways to Rank Non-Profits.”
Originally posted on inforumusa.org
(Photo by surfspirit)

Hi David,
I absolutely agree: the effort to evaluate nonprofit effectiveness is complicated and fraught with potential pitfalls. This is why we all must work together to develop the necessary pieces to provide the public with a balanced, rich resource of information about nonprofits. We are building GreatNonprofits to be one part of this resource. After all, who would say that the stories of those who have experienced the work are NOT an important part of nonprofit evaluation?
Last week when Guidestar posted about our new page to help people find organizations working for Haiti (greatnonprofits.org/haiti), the headline was wrong- a product of an old template. Gayle was kind enough to inform us and Guidestar changed it immediately. GreatNonprofits and Guidestar (which is not itself a rating agency) are not trying to produce a top ten list of Haiti nonprofits, nor are we striving to promote one nonprofit over any other. We just want to give people who want to help as rich a resource as possible, including the stories and opinions of those who have previously experienced the work of these organizations.
Building such a resource takes time. We’re gathering as many stories as we can from those who have them to tell. We’re looking to everyone to help us spread the word so we can get more reviews of these organizations. We’re hoping that when rebuilding starts in Haiti- when the media has left and the real need increases a thousand-fold- we’ll have our tool to offer, with stories from volunteers, donors, and others who were on the ground during this critical period.
Thank you for your post. This is such an important debate, and my hope is that by working together, we can develop tools that will improve the nonprofit sector and help those who need it most.
Best,
Shari Ilsen
Director of Marketing and Outreach
GreatNonprofits
Hi Shari,
First I would like to thank you for your response. It says a lot about Greatnonprofits.org that you are willing to engage criticism, and that your organization and Guidestar responded as you did to Gayle’s criticism. My point is less to dwell on the Haiti rating incident and is more about the way we go about evaluating agencies.
You write “who would say that the stories of those who have experienced the work are NOT an important part of nonprofit evaluation?” I certainly believe information about how a social intervention impacts an individual is important. As I write, it is really the only thing that matters in assessing agency effectiveness. My point is about the way we collect information. Anecdotes and qualitative data are not the same thing. We cannot confuse user reviews on a website with qualitative data. Leaving a review requires people to be net savvy, to know about the website, and to be interested enough to write one.
This model is so fraught with response bias that I worry it cannot have much meaning. I also worry about the mixing of donor, volunteer, and service recipient reviews. Does Greatnonprofits.org make public, or even collect, data on how an individual reviewer interacted with an agency? My guess would be that Greatnonprofits.org is decidedly better at attracting donor, rather than recipient, reviews. I’d be happy to be wrong on this point; is there data you can share on this?
Let me conclude by saying I do not dislike Greatnonprofits.org, and trust me, there are orgs I don’t like, so I say this sincerely. That said, I think this model has serious deficiencies in assessing social impact, and if you can’t determine which organizations are impactful, how can you claim to know which non-profits are great?
Hi David,
Thanks for continuing this discussion! It is, indeed, the ultimate challenge of our mission to come up with the best possible model, given the premise of what we do, to present the public with a picture of each nonprofit’s effectiveness. Of course, the model is not yet perfect, and we are constantly looking for ideas and feedback about how to make our review template better suited for this.
In answer to your question about our reviews, of the more than 20,000 reviews currently posted on our site, client reviews make up about 18% of them currently, while donor reviews make up only 8%. Actually, the large majority of our reviews are posted by volunteers- almost 44%. This also doesn’t take into account the reviews left by board members, who you could argue are also volunteers (note: we have a total of 10 different personas people can choose from to categorize themselves when posting a review).
Granted, these numbers do still prove your point- many of our reviews are written by those who helped the organization, as opposed to those who were helped. Part of this problem is created by the digital divide- an issue we are addressing as we move into the 3rd year of our existence. But in the larger picture of nonprofit evaluation, impactfulness is only one of the many factors we need to look at. It is arguably the most important, but there are still other things we need to pay attention to, including how an organization uses its volunteers and how it treats its donors.
Finally, the mission of GreatNonprofits is to present information about nonprofits to the public via the personal stories of those who have experienced their work in some way. But that’s actually only one half of our mission. The other half is to help nonprofits spread the word about their impact- especially smaller, local nonprofits that have no marketing budget and no money for direct outreach. We hope that using our site they will be able to gain support that they desperately need. Testimonials from donors and volunteers can make a huge difference as these organizations try to attract this support.
Given the Web 2.0 nature of our model, we’re able to take a double-pronged approach to helping the nonprofit sector. But as I’ve mentioned, our content can only be one piece of the entire nonprofit evaluation picture. What other elements would you say are key to evaluating nonprofits (for example, what are some ways we could measure impact beyond what we’re doing on our site?)
It’s a fascinating topic- I look forward to hearing more about your views!
Shari
Hey Shari,
It is a fascinating topic, and while I am resolute in my belief that outcomes evaluation is the critical factor in determining organizational effectiveness, I understand that there are several complicating factors in evaluating outcomes, and there may well be other organizational indicators that are important. I think Gayle puts it nicely in her comments below on this post, that Great Non-Profits can better add value in providing a forum for donor and volunteer (as your data suggests) feedback.
I think where I get uncomfortable is with the idea that donor and volunteer, or even client, reviews will be used to determine whether an organization is effective at meeting social needs. An organization can provide a great donor or volunteer experience, and may not add much social value. Indeed, an organization could get excellent client reviews and still fail to add much social value (if, for example, changes in client outcomes are actually attributable to macro-economic changes rather than an agency's own efforts).
So, my point is not to say that Great Non-Profits should not be collecting donor, volunteer, and client reviews. Instead, ideally I would want GNP to do more to be up-front about what the reviews can reveal, and what they cannot (not to say GNP is trying to be misleading, I don't believe you are). Ultimately, my concern is that I don't want stories to become a substitute, or a proxy, for evaluation.
I've read through how GNP got started and the mission, as stated on the site. Is there any insight you can offer beyond what is on the about section of the site as to what GNP's vision for evaluation in the non-profit sector is? Do you see GNP becoming a clearing-house for more than just donor, volunteer, and client reviews? Do you invite evaluators, like GiveWell, to upload reports they have compiled on particular organizations?
If GNP can become a hub for more than stories, but for consolidating indicators, that could be pretty powerful. However, even if that was GNP's vision, the reality is few organizations have the necessary metrics to report in the first place.
Thanks, David, for not pulling punches about an important topic for the nonprofit sector. And thanks, Shari and Perla at GreatNonprofits for going farther than just about everyone else in creating a platform where good people can evaluate good causes to help our communities grow stronger.
We’ve added a recommendations and review system at VolunteerMatch.org, too, and today there are more than 3,500 reviews of organizations from people who (hopefully) actually volunteered there. Perhaps in contrast to GN, we set our sights decidedly low in developing the system. We even scaled back some of the tools we were going to implement. Why? Because in the end we thought it much more important that the barriers that might keep volunteers from sharing their experiences be kept low. And, frankly, volunteers are looking at different criteria to evaluate their commitment than the donors and foundations that comprise GN’s audience. Some, for example, are more interested in the kind of work they might be able to do, the people they might be able to volunteer with, and the skills they might pick up.
So now it’s easy to review and recommend an organization, but it’s a far cry from the impact-focused evaluative framework David mentions above. Will we get there? In our case, it honestly depends on what the market wants and needs. We’re watching GN’s progress carefully.
David: can you recommend models you think we’d be better off emulating?
Robert Rosenthal
VolunteerMatch
Hey Robert,
Thanks for your comment. I think the difference in what VolunteerMatch is doing, as you point out, is you are soliciting volunteer reviews for the consumption of other volunteers. This is not an assessment of an agency’s ability to impact social outcomes, rather it is a review of a volunteer’s experience. That seems like a pretty reasonable approach, and scope, to me. I’d imagine as a volunteer that is pretty helpful information.
An organization that does not have much of a social impact can provide an excellent volunteer experience, and vice versa (or any other combination). My concern is when we take donor and volunteer reviews and imply that those reviews say something about an agency’s ability to produce social outcomes. Even if the reviews are from service recipients, I think there is too much response bias, and there are too many other barriers, for them to count as serious data.
Rating organizational effectiveness is critical for our sector. I think the first step in that direction is developing a consensus on what matters, and what doesn’t, in rating agencies. My professional focus is in the social service sector; for my sub-industry I think we have to focus on social outcomes indicators like changes in poverty level, educational attainment, incidences of violence, etc. Anything else falls short.
I think GreatNonprofits could be helpful to donors who want to evaluate the donor stewardship of an organization they are giving to. Why not know how well donors are treated and valued? If I were considering two equally “effective” organizations, I’d certainly want to end up with the one who values me the most.
Though even that level of consumer rating can be fraught with peril. I just finished reading today a very cautionary article in INC Magazine about the consumer rating site YELP that pointed out concerns about potential conflicts of interest when such sites are dependent on sponsorship or advertising for their financing. Many of the reviewed businesses said they felt like they were being shaken down to buy advertising to influence which reviews were highlighted on the site.
But I have another question, David, for rating effectiveness even of social service organizations. I have to wonder how you are considering the critical network of social service organizations, all of which potentially have impact on client outcomes. For example, a community mental health center might have great client outcomes, but what happens if that client loses his/her job or apartment, and the surrounding network of NGOs doesn’t have the resources needed to help that individual regain their footing? How will effectiveness ratings take into consideration community wide dislocations (the major employer moving to Vietnam) or economic meltdowns such as this one?
You are right that no one social service agency meets the holistic needs of clients (nor should it). You are also right that organizations work with clients that are subject to greater macro factors, not just the work of other service providers but changes in economic conditions and other exogenous variables. A thorough investigation of an agency’s social outcomes will include clearly defined outcomes variables and control groups.
In your example of the community mental health center, let’s say they believe their mental health services help workable clients re-enter the workforce and increase their earnings. A proper evaluation should include collecting client work and income data, but also should include a control group that does not receive the mental health services but who are otherwise similar to the agency’s clients. Now, as you suggest, let’s say the economy in that area goes south. If the mental health center does have an impact on income, that effect should be visible in comparison to the control group, even though incomes of both groups would likely decrease.
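To make the control group comparison concrete, here is a minimal sketch in Python. Every number in it is invented for illustration, and the function names are my own rather than anything from a real evaluation. It compares the average change in income for clients against the average change for a similar control group, which is a simple difference-in-differences calculation.

```python
from statistics import mean


def mean_change(before, after):
    """Average per-person change in income between two periods."""
    return mean([a - b for b, a in zip(before, after)])


def difference_in_differences(clients_before, clients_after,
                              control_before, control_after):
    """Client change minus control change.

    A positive value suggests clients fared better than the control
    group, even if both groups lost income overall.
    """
    return (mean_change(clients_before, clients_after)
            - mean_change(control_before, control_after))


# Hypothetical monthly incomes (in dollars) before and after a downturn.
clients_before = [1200, 1500, 900, 1100]
clients_after = [1100, 1450, 850, 1000]
control_before = [1250, 1400, 950, 1000]
control_after = [950, 1100, 700, 750]

effect = difference_in_differences(
    clients_before, clients_after, control_before, control_after
)
print(effect)  # 200
```

In this made-up case both groups lose income, but clients lose $75 a month on average while the control group loses $275, so the comparison still shows a $200 difference in the clients' favor. A real evaluation would also need enough people in each group, and a test of whether the difference could be due to chance.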
While control groups are the ideal, few organizations use them in evaluation, and for good reason: they are expensive and complicated. Even without using control groups though, you could argue that in an economic meltdown an organization that tries to impact the income level of its clients should still do comparatively better than the general regional populace, even if net income decreases.
You said
“you could argue that in an economic meltdown an organization that tries to impact the income level of its clients should still do comparatively better than the general regional populace, even if net income decreases.”
Could you give a more specific example that you think would describe this?
Let’s say an organization aimed to increase the employment rate of a particular demographic in a specific geographic region. An imperfect comparative measure could be to use the regional employment figures for your target demographic as a baseline measure, then compare the employment statistics of the organization’s clients against that. Ideally, an organization would use a control group, a random sampling of people in the same targeted geographic area who do not receive an organization’s services, but otherwise are comparable to the people the organization serves.
The general point is that if macroeconomic factors are the sole determinant of whether people get employed or not (extending the employment service provider example) then how could we say this social service agency is in any way effective? An effective organization should provide better outcomes, on average, for its clients than for an otherwise identical grouping of people experiencing the same market conditions in the same geographic region who do not receive the services of the agency in question.
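The imperfect baseline comparison described above can be sketched in a few lines of Python. The client records and the regional rate below are hypothetical, as are the function names; in practice the regional figure would come from published employment statistics for the target demographic.

```python
def employment_rate(employed_flags):
    """Share of people in the list who are employed."""
    return sum(employed_flags) / len(employed_flags)


def gap_vs_baseline(client_flags, regional_rate):
    """Client employment rate minus the regional baseline rate."""
    return employment_rate(client_flags) - regional_rate


# Hypothetical data: True means the client is employed at follow-up.
clients = [True, True, False, True, False, True, True, False, True, True]

# Hypothetical regional employment rate for the same demographic.
regional_rate = 0.55

gap = gap_vs_baseline(clients, regional_rate)
print(f"Client rate: {employment_rate(clients):.0%}")
print(f"Gap vs. regional baseline: {gap:+.0%}")
```

A positive gap is consistent with the organization having an effect, but it does not prove one. People who seek out services may differ from the regional population in ways that affect employment, which is why a comparable control group remains the stronger design.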
Your interesting article is highlighted on the D3 blog by noted Iraq guru Robert J. Swope. You can read it at:
http://www.robertswope.com/home/2010/2/12/d3-week...
Keep up the fine work.