Published online 28 July 2008 | Nature | doi:10.1038/news.2008.988

News

Stats reveal bias in NIH grant review

Alternative system could make 'fairer' funding decisions for a quarter of awards.

The system used by the US National Institutes of Health (NIH) to evaluate grant proposals does not adequately compensate for reviewer bias, a new study concludes.

The assessment of grant reviews generated by more than 14,000 reviewers suggests that the NIH needs to overhaul the peer-review system it uses to rank proposals, according to biostatistician Valen Johnson of the University of Texas M.

Comments

  • Important work, but still tinkering which fails to get at the key conceptual issue of how X dollars can be distributed among Y applicants to advance knowledge optimally. For this, the approach must be historical. For more see http://post.queensu.ca/~forsdyke/peerrev.htm

    • 29 Jul, 2008
    • Posted by: Donald Forsdyke
  • Typically, when resources diminish, more of the population struggles to meet their needs. Further, those in need begin to spend relatively more time and energy pursuing resources. Competition often begets secrecy and, worse, encourages the formation of alliances designed to compete better by a variety of dubious means not considered in times of sufficiency. The biomedical research community is undergoing a period of scarcity in funding, in one simple view because the growth of NIH's budget led to an increase in the number of researchers, and the recent budget stagnation has resulted in resource stress. The coincidental end of the Whitaker Foundation, which created many new biomedical centers with a burst of startup funding as it gave away all of its money, left many hungry mouths for NIH to feed in the longer term. All of this stress exposes weaknesses in the grant review process which are not significant when resources are sufficient to assure funding of those grants which are closer to the gray area between meritorious and undeserving. Errors in judgement, whether from bias or lack of expertise or character flaws in the reviewers, have minimal impact when there is enough to go around for those who are essentially 'good enough', and program managers also have more leeway to address exceptions. These days, with funding cut lines at 10% (for scored applications), the times are bad enough to write home about ... between grant applications, that is. There will be blood, as they say, and attrition is a certainty. Just how bad it will get until the funding situation is recognized and repaired (by the executive and legislative branches) is unknown. But arguing about the review process now is to fight over crumbs: rather, the community needs to decide how much to support medical research, and stabilize that support over the long term so that the population of researchers better matches the available resources. Then, we can afford to review the review process itself.

    • 29 Jul, 2008
    • Posted by: Peter Kaczkowski
  • I agree with Peter that "There Will Be Blood". Unfortunately, a better system is unlikely to result from the resource wars. This is at least in part because those who are likely to administer the system are likely to be those who in some way rely on the funding provided by the same system. Furthermore, as though it weren't already difficult to fund projects that deviate in any way from the common wisdom, it will become far harder to fund novel and competitive approaches to problems. At the very least, it will be quite tricky to balance incentives with objectivity in choosing what to fund.

    • 29 Jul, 2008
    • Posted by: John J. Peloquin
  • When I served on a review panel, I read every proposal -- not just the ones for which I was a designated reader. How did the study ascertain the number of actual as opposed to nominal readers?

    • 29 Jul, 2008
    • Posted by: David Kaye
  • I am a "veteran" NIH reviewer and can say with some certainty that there can be no way to ascertain how many people have read a proposal before the vote. The number of written critiques does not reflect the extent of reading, as there is no rule requiring every reader to provide written input. In principle, people who have read the grant offer comments that are reflected, without attribution, in the summary written by the SRO. Moreover, people with background or research interests aligned with the reviewed application are very likely to read the application voluntarily and offer important insight during the discussion. It is sad that some statistical evaluation, as complex as it may be, but clearly flawed, is used to throw mud on a process that has produced stellar science, and call it "primitive", thus encouraging those who would continue to hold back support from one of the nation's most important activities - research and the furthering of beneficial knowledge. We all are struggling to improve the process, and constructive insight based on solid information is helpful (and was used in the current re-evaluation). The study in question does not appear to serve this purpose, however.

    • 29 Jul, 2008
    • Posted by: Harel Weinstein
  • Johnson's analysis may be correct, but the solution he proposes is inane. All it would accomplish is to encourage some bright folks to further game a faulty system. Besides, the assumption that science can be divided up by costs of projects has no obvious basis. I would like to suggest attacking the problems at another level entirely: 1. SAR ... these jobs are greatly undervalued and all too often now attract careerists who have not succeeded in the very system they are charged with reviewing. As a result, SAR often lack the knowledge to assist the Chair in "leading" a meeting. Propose: SAR jobs could be transitory, served by interns or some sort of rotation system. Propose: Trained admins can provide the needed administrative backup better than folks trained as scientists. 2. Location of the review process in Bethesda results in a need to have career NIH staff in SAR positions. Why not take advantage of modern tools? Propose 3. Have review done at diverse locations. This might also allow a stronger system of rotation of SAR, since assistant profs could take on this role as part of their research training (remember, the admin part would be done by pros). 3. Chairs vary immensely in leadership skills and knowledge. Also, other than convening the sessions, there is little specific charge or training that would allow a good chair to moderate some of the preference issues raised by Johnson. Propose 4. Add a vice chair, and have chair and vice chair serve in succession. Propose 5. Either the chair or the vice chair should take NAMED responsibility for each grant and either write a 3 sentence summary of the action or use a check box system to assure that she or he has considered the issues raised by Johnson. 4. The current process is NOT based on any strategic thinking. Institute staffs lack the expertise to provide this and the review process discourages strategy that covers more than a few grants. Propose 6. Depoliticize the Councils. The councils should serve a real job in the review process. This means they need to be mainly made up of scientists. A formal and difficult appeal system, based on issues like those addressed by Johnson, would put pressure for reform on the primary review as well.

    • 29 Jul, 2008
    • Posted by: Stephen Schwartz
  • As a current NIH reviewer sitting as a regular ad hoc on two study sections, I would like to comment. First, there is a lot of old-boy network cronyism going on. Second, there is a huge variation between study sections in how they behave. For example, some Chairs allow proper discussion; some strictly limit it. Some panels have an unwritten rule that first round applications always get streamlined. It is true that three people read the grant and few if any people read all the proposals. The majority of reviewers in my experience work hard to be fair and to protect the interests of applicants, especially new investigators and those for whom the application represents their sole source of funding, but some do not, preferring to give money to those who already have it on the basis that these are "successful" and "productive" investigators. I've seen, often, new investigators righteously being denied while older people with mediocre applications get good scores. On the other hand, I've seen people fight hard for a grant and sway the committee in its favor. The point is that this IS an imperfect process and it certainly could be better, but by and large people do their best and even though there are clear abuses, its strengths outweigh its weaknesses. Some things could be improved - I'd scrap the rule whereby any proposal with one score of 1.4 or better (even if the other two are 5.0s) must be discussed - people abuse this too often. I'd decide the streamline on the median, not the mean, score. I'd streamline 75% of the applications to allow real discussion on those with even half a hope of achieving a fundable score (currently, reviewers are obliged to select 60% of the proposals not to be discussed). I'd limit the number of grants any one investigator can hold in order to allow more investigators overall to be funded (I've seen an investigator get a fundable score on his 7th grant - 7 grants running concurrently when new investigators are losing their labs is just Not Right). I'd limit the indirect costs to 40% - I wonder how many people know that some institutions get their total awards almost doubled because of their indirect cost rates - if NSF can limit indirects and provide the same amount to every investigator, so can NIH - again, the result would be to increase the number of grants awarded overall. Peer review will always be flawed because of the human component, but things could be improved. What is needed is an acknowledgement by CSR and NIH that there isn't enough money to go around and that the best science doesn't always come from the biggest, fattest groups. On the contrary - these often write the most complacent applications. Finally, they need to scrap the insane 'chatroom' review experiment. It's insulting to both the reviewers and the applicants, who deserve to have their proposals actually discussed in a forum where the participants are focused and engaged, not flitting in and out. Oh yes - to the person who said they read every application? I simply do not believe ANYONE who says that for a regular NIH panel - no one can read 70 or so proposals in the month one usually gets between the CD arriving and the date of the meeting. It's hard enough to read the 8 - 10 one is actually asked to review.

    • 29 Jul, 2008
    • Posted by: Anon Ymous
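
The median-versus-mean point in the comment above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `streamlined`, the four applications and their scores are invented, and the only details taken from the comment are the 1.0 (best) to 5.0 (worst) scale and the 1.4 / 5.0 / 5.0 pattern of one enthusiast and two detractors.

```python
from statistics import mean, median

def streamlined(prelim, fraction, summarize=median):
    """Return the IDs of applications that would NOT be discussed.

    prelim:    dict mapping application ID -> list of preliminary scores
               (1.0 is best, 5.0 is worst).
    fraction:  share of applications to streamline (e.g. 0.6 or 0.75).
    summarize: statistic used to rank applications (mean or median).
    """
    ranked = sorted(prelim, key=lambda app: summarize(prelim[app]))  # best first
    n_cut = round(len(ranked) * fraction)
    return set(ranked[len(ranked) - n_cut:]) if n_cut else set()

# Hypothetical scores, invented purely for illustration.
prelim = {
    "A": [1.4, 5.0, 5.0],  # one enthusiast, two strong detractors
    "B": [3.8, 4.0, 4.2],  # uniformly lukewarm
    "C": [1.5, 1.6, 1.8],
    "D": [2.0, 2.2, 2.4],
}

by_mean = streamlined(prelim, 0.25, summarize=mean)
by_median = streamlined(prelim, 0.25, summarize=median)
print("streamlined by mean:  ", by_mean)
print("streamlined by median:", by_median)
```

With these made-up numbers, ranking by the mean keeps application A in the discussion because its single 1.4 pulls the average down, whereas ranking by the median streamlines A and keeps the uniformly lukewarm B instead.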
  • The problem with government funding of research is that it takes money by force from people (taxpayers), some of whom do not want to fund research. This is either immoral or criminal, depending on your point of view. Those people from whom the money has been taken then demand control over the research itself. Thus, government funding of research has enabled animal rights groups to restrict animal research and has given them the means to eventually end it.

    • 30 Jul, 2008
    • Posted by: rickye heffner
  • I am an ad hoc NIH grant reviewer. I like the comments posted by Anon Ymous. Limiting the number of grants per investigator is a very fair approach. It is simply not right that certain individuals can have unlimited resources to pursue science (all provided by the taxpayers), whereas some bright new investigators would not get a chance. Unequal distribution of indirect costs is also not fair. If certain regions of the country are more expensive to do business in (because of higher real estate prices or higher costs of living), then indirect costs should be indexed to the region of the country, not to the particular institution. Again, I agree that grant review is and will always be an imperfect process, since human reviewers (who are always imperfect) are doing the work. Uniform approaches across all study sections will make it fairer.

    • 30 Jul, 2008
    • Posted by: Larry Chan
  • The comments by Anon Ymous are right on target. Limit grants to two or three per investigator. A person can't really supervise more than that. These people hire Asst. Prof. level scientists to run the grants. Just let the new people apply for the grants directly since they are doing the work. Capping overhead at 40 percent is also a great idea. The institutions with overhead numbers larger than that will figure out how to do the research if they want the grants. Why not fund more research projects? And it will save all the time that the institutions and NIH put into evaluating what the overhead number should be, through the generation of documents that are hundreds of pages long for each NIH-receiving place.

    • 30 Jul, 2008
    • Posted by: William Chaney
  • As a long-time veteran reviewer (service on 4 panels since the mid-80s), I have a few historical perspectives to add. First, reading every one of the ~100 applications received by the various study sections on which I served is simply not possible. However, my impression is that the reviewers who read the application do a conscientious and competent job. There are, however, some clear areas for improvement. That comment aside, the system could obviously be improved to "spread the wealth around" more. I agree completely with Dr. Anon Ymous; indirect costs must be capped. It is incredible that institutions receive 100% (or more!) indirect costs, especially when those institutions typically have huge endowments to begin with. It simply doesn't cost that much to "administer" a grant. A 40% cap would go a long way towards increasing the number of funded applications. Second, salary support should be capped as well: not the salary level, per se, but the amount of effort permitted to be charged to grants overall. Every university, including medical schools, expects that faculty teach as well as do research. These are expected responsibilities, and thus duties for which the faculty member should be paid by the institution. Simply expecting NIH to cover all the salary for a particular researcher directly contradicts this obligation. This problem is not restricted to private institutions. There are many state institutions which also expect the faculty to raise their own salary. Capping the % salary which can be supported overall would again reduce grant budgets. Third, Centers (P30, P50 awards) should be examined very closely, as they are huge financial "sinks", and in some instances really don't accomplish their espoused missions. Moving funds from these "flagships" of NIH institutes again permits a greater number of R-type grants to be awarded. Fourth, there should be a limit on either the number of grants, or the overall dollar amount, that an individual can hold or be a Co-I on. As with Dr. Anon Ymous, I too have seen instances of applicants with 5+ R-type awards seeking yet another grant. Putting some caps on the amounts provided as above would begin to limit these practices, although as with any other system, it would be open to "gaming" by simply having junior members of a large lab group submit applications in lieu of the mentor. I agree completely that the "chat room reviews" are a bad idea. All they end up doing is protracting "discussion" of a proposal, and in my experience the longer one talks about an application, the worse the score becomes, so this practice is a serious disservice to the applicant. Bottom line: it's imperfect, but is still the "gold standard" around the world. No one emulates a review system used in any other country, and for many of the European and Asian countries, the support system is much more "caste-driven" than in the U.S. There are, however, some simple changes that NIH could institute that would go a long way in improving the overall distribution of the available resources. These would ensure that more applicants get some level of support with which to establish research programs.

    • 30 Jul, 2008
    • Posted by: Bill Atchison
  • The NIH review system has another source of bias that is unfair, affects funding decisions for many grants, and could be fixed with no radical changes and at no cost. Over the course of ~2 days of intensive review discussion, the scoring drifts. That is, a grant that might be voted 1.6 at the beginning of the meeting is likely to get closer to 1.5 or even 1.4 later in the meeting. In my experience there is sometimes discussion in the meeting to try to stop this kind of drift in the scores. Such discussions help but don’t solve the problem entirely. I propose the following solution, which should be effective even without knowing the magnitude or direction of the drift. Drift only really matters for grants of similar merit. If the grant discussion is organized based on the preliminary scores posted by reviewers prior to the meeting, then grants of similar merit will be discussed at about the same time during the meeting, and, in this case, drift will have little effect on the final rank order of grants. Drift will either expand or compress the overall scoring range depending on whether discussion is organized from worst score to best or vice versa. The cost of this solution is the loss of flexibility in organizing review meetings. I believe that drift introduces a significant unfairness into the review process and a quantitative assessment of the effect should clarify whether or not steps should be taken to correct it.

    • 30 Jul, 2008
    • Posted by: Janet Leatherwood
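
The ordering argument in the comment above can be sketched as a toy model. This is a rough illustration under stated assumptions, not a description of how NIH scores behave: drift is assumed to be linear (a fixed improvement per discussion slot), preliminary scores are assumed to equal underlying merit, scores are written as integers in hundredths (150 means 1.50), and the function names and all numbers are invented.

```python
from itertools import combinations

def final_scores(order, merit, drift_per_slot=2):
    """Apply an assumed linear drift: each later slot scores slightly better (lower)."""
    return {app: merit[app] - drift_per_slot * slot for slot, app in enumerate(order)}

def inversions(merit, scored):
    """Count pairs whose final order disagrees with their underlying merit."""
    return sum(
        1
        for a, b in combinations(merit, 2)
        if (merit[a] - merit[b]) * (scored[a] - scored[b]) < 0
    )

# Hypothetical merit in hundredths of a point: two clusters of similar grants.
merit = {"A": 150, "B": 153, "C": 156, "D": 250, "E": 253, "F": 256}

scattered = ["A", "D", "E", "F", "C", "B"]   # similar grants discussed far apart
by_prelim = sorted(merit, key=merit.get)     # best preliminary score first

print("scattered order:", inversions(merit, final_scores(scattered, merit)))
print("sorted order:   ", inversions(merit, final_scores(by_prelim, merit)))
```

In this toy model the scattered schedule reverses two pairs of closely matched grants, while the sorted schedule reverses none and only compresses the overall range. One caveat of the model: a best-first schedule can still reverse neighbours if the drift per slot exceeds the gap between them, whereas a worst-first schedule only stretches the range.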
  • Dr. Johnson is right, there is a "bias": his. It appears that he had an axe to grind from the beginning. I believe this because he tipped his hand by using the word "primitive." I have been on many study section reviews and it didn't take me long to realize that reviewers are participating in a system that is embedded in a democracy. Democracy allows all to have their say; this slows progress in making systems, such as the NIH review panels, more efficient and fair. The high administration of the NIH should be congratulated for recruiting Dr. Scarpa to change and hopefully improve the system, not fix it: fix is impossible in a democracy, even when 51% of the people believe it is fixed. I suggest that the formula for the indirect rate include the size of the endowment of the institution. This factor should have considerable weight because this would give the NIH a partnership role with the institution in how they allocate the interest from the endowment for infrastructure. I have always thought that the review system could be improved because a number of years ago I submitted an R01 and received a score of 112. At the time they were funding at the 111 level. I was advised by the administrator of the study section that I should not change a word and resubmit because the study section really liked my proposal. I did not change a word and in the next review, by the same study section, I received a 345 score. I believe this could only happen in a democracy; a small price to pay for living in a democracy. Alfred A. Rimm PhD

    • 30 Jul, 2008
    • Posted by: Alfred Rimm
  • I feel very sorry for Dr Rimm and the poor advice he received. To return a proposal to an NIH study section without the requested changes is to be deemed 'non-responsive' and dinged. There would also have likely been a question of why there had been no progress in the interim. I can only imagine what the "response to reviewers" looked like. Tragic. People work hard on study sections, but they are most definitely not democracies.

    • 30 Jul, 2008
    • Posted by: Grant Gallagher
  • In response to Dr. Leatherwood, one study section SRO did precisely what you suggested, placing the applications with the best preliminary scores first. I thought that it worked very well. This process had two benefits. First, it "set the standard" for considering other applications. Second, it permitted more discussion relative to applications that were scored in the "gray area", which depending on funding institute and study section, can range from 1.3-1.6. Adequate discussion is always needed for applications in this area, especially for A2 apps. I felt that this was a very useful approach. Of course, as with any other imperfect system, it too can be "gamed" if a reviewer "sandbags", namely giving a poorer initial score, only to espouse a radically better score on the final vote in order to move a preferred application into the F-range. Still, the idea of organizing the discussion to begin with the truly meritorious applications works, and is an excellent option for SROs to incorporate into their study sections.

    • 31 Jul, 2008
    • Posted by: Bill Atchison
  • Dr. Anon Ymous is right on target. I would add that I don't know why it wouldn't be possible for NIH grant proposals to be anonymous. Already there are built-in barriers and requirements to apply for NIH funds, so, if the NIH review system is truly democratic, it could tolerate such an anonymous system. Personally, I would find it very interesting to know how many established labs would be weeded out by such a system. (Certainly it would weed out the complacent and those sending proposals to study sections with "friends" on board.) Also, more rigorous controls must be established on grant double-dipping. It is no longer tolerable to let grant holders submit substantially overlapping grants, i.e. holders of 7 grants should be a red flag! Who is checking? Even allowing one "head" to administer multiple grants is inherently an abuse of the system: why shouldn't the individual who is actually doing the work be charged with administration of the grant? That way the individual grant holder is directly accountable for the failure of the research, and also credited with the success of the research. What accountability is there for the head of department who applies 14% of effort? The system as it stands perpetuates the intellectual sweatshop that the universities have become ... or if you prefer, the mini-fiefdoms that heads of departments run.

    • 31 Jul, 2008
    • Posted by: Wendy Gombert
  • Is the article being reported saying that there is a problem with assigned readers being more stringent than non-readers? Perhaps this is a good thing: readers are presumably assigned according to how close the project is to their own expertise. They therefore see flaws that others might miss. (Of course, ironing out this "bias" would increase the number of "riskier" projects being funded, so perhaps the bias is a bad thing because it makes the funding too conservative. However, it shouldn't then be dressed up as increasing fairness.) As for spreading the NIH budget more fairly, how about making all the grant budgets smaller along with the applications? Most PIs would rather have a smaller grant than none at all.

    • 01 Aug, 2008
    • Posted by: Jeremy Green
  • As a senior investigator who has been continuously funded for nearly three decades, and who has served on many study sections and review groups, I have just received nonfundable scores on a competing renewal of a grant that has run for 27 years. Without going into all the gory details, suffice it to say that it became abundantly clear that the primary and tertiary reviewers of my application lacked appropriate expertise and knowledge to provide an adequate review. The responsibility for matching reviewers with grant applications lies with the Scientific Review Officers, who are Government employees, usually with a PhD in some area of science. In my case, an appropriate match did not occur, and I believe it was because the SRO did not know the types of work I did in my laboratory, and thus was unable to select appropriate individuals who could review it. One possibility to remedy such a situation might be for NIH to consider asking each applicant to provide a list of the techniques and knowledge that a reviewer should possess in order adequately to review their proposal. Those lists could then be provided to potential reviewers who would then be asked to check off the items where they had specific expertise, thus enabling the SRO to make better assignments. At the present time SROs make these assignments based on reputations, word of mouth, and apparently a sort of seat of the pants approach; some are much better than others. Scientific journals ask for lists of expertise from potential manuscript reviewers, so why doesn’t NIH? It would be an easy process to implement, and could be done anonymously early on as the SRO was deciding how to make assignments. If none of the potential reviewers had relevant expertise, the SRO could seek ad hoc reviewers. To illustrate this problem, on a study section where I served many years ago, I was given a grant to review (not as a primary reviewer, fortunately) that proposed experiments to create a transgenic knock-in mouse.
Although I could assess that the transgenic animal would be very important to construct and study, at that time I was primarily a synthetic medicinal chemist. I protested that I should not have been assigned this grant, but was told, “Do the best you can.” I took my charge very seriously, but it required an inordinate amount of time to learn about this relatively new technology, and to poll colleagues who knew much more about the field of transgenic mice than did I, and learn from them. All of that for only one of the applications I was assigned to review. Fortunately, the applicant was a senior investigator with a solid record in this area, and that also gave me confidence that this person could construct the proposed transgenic mouse. All reviewers are not so thorough, however, and they often make decisions about science where they really don’t have enough knowledge. Instead of saying, “I don’t know this well enough,” they go ahead and try to make some kind of comment, typically seeking some flaw in the details of the methodology. Thus, I would suggest that lack of relevant expertise on the part of some (but not all) reviewers is a problem that could be remedied rather easily.

    • 01 Aug, 2008
    • Posted by: David Nichols
  • I am concerned about applications that do not get scored because they are deemed too risky. Applications that raise more questions than provide guaranteed clean answers. Applications that do not nicely fit into the popular models. Applications that address a complex problem. I think that these applications also represent good science and are deserving of funding because in the long run they might have as good a chance, if not a higher one, of opening up new avenues for research. These applications are generally submitted by ‘fringe’ PIs in a field, those who are threatened with extinction when their grants are rejected in the worst possible manner, “not scored”. Do these PIs deserve support in the R system, perhaps the only system that currently offers real hope for them? If the majority answer is yes, then resources could be invested to develop tools or devices to identify such grants from the pile of unscored applications. Possibly, a small number of applications submitted to each study section could be evaluated by the same study section but with a different set of criteria and scoring method. Although other mechanisms such as R21 or R03 exist, they do not provide the resources of an R01 to rigorously address a hypothesis, particularly a very novel or complex one. If additional funds are required to support this approach, maybe it is possible to divert funds from the pioneer program. After all, it is quite possible that true pioneers lurk within the masses of rejected R01 applicants.

    • 01 Aug, 2008
    • Posted by: Cedric Wesley