Taxonomy is “the practice and science of classification”. In this post, four subtypes of articles in PubMed Central (PMC) are identified: 1) Author manuscripts that are publicly accessible; 2) Articles that are embargoed (still under both price and permission barriers); 3) Articles that are Libre OA (all price barriers, and at least some permission barriers, have been removed); 4) Other articles that are publicly accessible, via Gratis OA (price barriers removed, but not permission barriers).
For a definition of “author manuscripts”, see: Author Manuscripts in PMC (webpage last updated: June 30, 2005). An excerpt:
Many of the scientists who receive research funding from NIH publish the results of this research in journals that are not available in PubMed Central (PMC). In order to improve access to these research articles, NIH’s Public Access policy asks these authors to give PMC the final, peer reviewed manuscripts of such articles once they have been accepted for publication.
Get a list of author manuscripts available in PMC.
As of today (August 12, 2009), there were 50704 author manuscripts in PMC. Use of the “Limits” option in a PMC search indicated that none of them were classified as “embargoed”.
The “Limits” option can be used to do a PMC search to find out how many author manuscripts had a publication date within the four months between April 7, 2008 and August 7, 2008. The result of such a PMC search: 7346 (none embargoed).
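A search of this kind can also be expressed as a URL for NCBI's E-utilities ESearch service. The sketch below only builds such a URL and does not send it. It is an illustration under assumptions, not the search actually used for this post: the helper name `build_pmc_count_url` is hypothetical, and the filter term `author manuscript[filter]` is assumed to correspond to the "Limits" setting for author manuscripts.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pmc_count_url(term, mindate="2008/04/07", maxdate="2008/08/07"):
    """Build an ESearch URL that requests only the number of PMC hits
    for `term`, limited to a publication-date range.

    Hypothetical helper; the default dates are the 4-month interval
    discussed in the post.
    """
    params = {
        "db": "pmc",          # search PubMed Central
        "term": term,         # assumed filter term, see lead-in
        "datetype": "pdat",   # limit by publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "rettype": "count",   # ask for the hit count only
    }
    return ESEARCH + "?" + urlencode(params)


url = build_pmc_count_url("author manuscript[filter]")
print(url)
```

The same helper could be called with a different term for the other searches described below, again on the assumption that the term matches the corresponding "Limits" setting.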
The initial date for the 4-month interval was chosen because the NIH Public Access Policy is applicable to any NIH-supported manuscript “accepted for publication in a journal on or after April 7, 2008”. The final date for the 4-month interval was chosen because it is more than a year ago. The NIH Policy requires NIH-supported manuscripts to be “accessible to the public on PubMed Central no later than 12 months after publication”. So, after a year, no NIH-supported articles should still be embargoed.
Another PMC search was done to find out how many articles in the PMC Open Access subset were published in the same 4-month interval in 2008. The result of such a PMC search: 3635. This number of (Libre) OA articles (“made available under a Creative Commons or similar license”) is about half the number of author manuscripts (7346) contributed to PMC during the same 4-month interval.
What was the total number of articles publicly (no price barrier) accessible via PMC during this same 4-month interval? The results of such a PMC search: 23582 (plus 378 embargoed). The total (publicly accessible plus embargoed): 23582+378=23960.
The number of articles classified as “not (Libre) OA” and “not author manuscript” can be obtained via another PMC search. The result: 12601 (plus 378 embargoed). The total of “author manuscripts” (7346) plus “Libre OA” (3635) plus “embargoed” (378) plus “not any of these subtypes” (12601) is 23960 (the same as “publicly accessible plus embargoed”, see above).
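The consistency check in the preceding paragraph can be reproduced in a few lines. This is only a sketch using the counts quoted in the post; the variable names are illustrative.

```python
# Counts reported in the post for April 7 to August 7, 2008.
subtype_counts = {
    "author manuscripts": 7346,
    "Libre OA": 3635,
    "embargoed": 378,
    "not any of these subtypes": 12601,
}
publicly_accessible = 23582  # PMC search result, excluding embargoed

# The four subtypes should add up to "publicly accessible plus embargoed".
total_from_subtypes = sum(subtype_counts.values())
total_from_search = publicly_accessible + subtype_counts["embargoed"]

print(total_from_subtypes, total_from_search)
```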
What was the total number of articles publicly accessible via PubMed during the same 4-month interval? (These include articles that are free at the journal site, in addition to those that are available from PMC). The result of such a PubMed search: 59258. Of these, how many were supported by NIH (either by Extramural or by Intramural research support)? The result of such a PubMed search: 16500 (28% of the total).
What was the total number of NIH-supported articles identified by PubMed during the same time interval? The result of such a PubMed search: 32504.
So, 16500/32504=51% of the NIH-supported articles contributed during this 4-month interval were publicly accessible via PubMed (either via articles submitted to PMC, or via the journal site, or both).
What percentage of the 16500 NIH-supported, publicly-accessible articles were in PMC (omitting those articles that were accessible only via the journal site)? Inspection of a 6% sample (1000 of the 16500 articles) indicated that, at present, about 17% of the articles for this particular 4-month interval (about 2800 articles) were accessible only via the journal site. The other 83% (about 13700 NIH-supported articles) were publicly accessible in PMC.
Because the total number of articles publicly accessible in PMC during this same 4-month interval was 23582 (see above), a rough estimate of the proportion of the articles published during this 4-month interval, and publicly accessible in PMC, that were NIH-supported is about 13700/23582=58%. This estimate is somewhat greater than the percentage (51%) of NIH-supported articles, contributed during this 4-month interval, that were publicly accessible via PubMed (either via articles submitted to PMC, or via the journal site, or both). Perhaps the proportion of NIH-supported articles that are publicly accessible in PMC is somewhat greater than the proportion, indexed in PubMed, that are accessible only via the journal site?
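The percentages for NIH-supported articles follow from the search counts and the 83% sample estimate. A minimal sketch of that arithmetic, with illustrative variable names, is:

```python
# Counts quoted in the post for the 4-month interval.
free_via_pubmed = 59258       # publicly accessible articles found via PubMed
nih_free_via_pubmed = 16500   # of those, NIH-supported
nih_total_in_pubmed = 32504   # all NIH-supported articles in PubMed
pmc_public_total = 23582      # publicly accessible articles in PMC
share_in_pmc = 0.83           # estimate from the 1000-article sample

# NIH-supported share of the free articles, and free share of NIH articles.
nih_share_of_free = 100 * nih_free_via_pubmed / free_via_pubmed
free_share_of_nih = 100 * nih_free_via_pubmed / nih_total_in_pubmed

# Extrapolate the sample to all 16500 articles (the post rounds to 13700).
nih_in_pmc = share_in_pmc * nih_free_via_pubmed
nih_share_of_pmc = 100 * nih_in_pmc / pmc_public_total

print(round(nih_share_of_free), round(free_share_of_nih))
print(round(nih_in_pmc, -2), round(nih_share_of_pmc))
```

Because the 83% figure comes from a sample of 1000 articles, the last two values are estimates rather than exact counts.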
Summary: The total number of articles published in the 4-month interval (April 7 to August 7, 2008) and contributed to PMC was 23960. The four subtypes of articles in PMC, and their estimated proportions during this 4-month interval, are: 1) Author manuscripts that are publicly accessible (7346/23960=30.7%); 2) Articles that are embargoed (378/23960=1.6%); 3) Articles that are Libre OA (3635/23960=15.2%); 4) Other articles that are publicly accessible, via Gratis OA (12601/23960=52.6%). These proportions are probably not very different for the subset of NIH-supported articles, if it’s assumed that, during this 4-month interval, about 50-60% of the articles contributed to PMC were NIH-supported.
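The subtype proportions can be recomputed from the counts as a quick check. In this sketch (variable names are illustrative) each share is rounded to one decimal place; the fourth share is 52.59% before rounding.

```python
# Subtype counts for April 7 to August 7, 2008, as quoted in the post.
subtypes = {
    "author manuscripts": 7346,
    "embargoed": 378,
    "Libre OA": 3635,
    "other Gratis OA": 12601,
}
total = sum(subtypes.values())

# Percentage share of each subtype, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in subtypes.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```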
Comment: It will be of interest to monitor any changes in these proportions the longer the NIH Policy remains in effect. Monthly manuscript submissions more than doubled between April 2008 and April 2009.