Collecting public information from diverse sources

A very interesting article by Gary Bader, entitled Open Access and Open Source Speed Computational Network Biology Research, was posted on 9 April 2007 at the University of Toronto’s Project Open Source|Open Access website. Excerpts:

A major challenge for studying the cellular network is collecting all known public information from very diverse sources, such as the biomedical literature, raw experimental data and the hundreds of existing pathway databases. Open access content and open source software systems are critical for overcoming this challenge. Once information is freely shared in open, standard formats, it can be aggregated, integrated, searched, visualized and analyzed.

Pathway Commons will be a convenient point of access to biological pathway information collected from public pathway databases, which you can browse or search. Pathways include biochemical reactions, complex assembly, transport and catalysis events, and physical interactions involving proteins, DNA, RNA, small molecules and complexes.

As one example of why such a database might be accessed, consider the OA article e-published on 22 March 2007 by Bowie MB, Kent DG, Dykstra B, McKnight KD, McCaffrey L, Hoodless PA, Eaves CJ, Identification of a new intrinsically timed developmental checkpoint that reprograms key hematopoietic stem cell properties, Proc Natl Acad Sci USA. 2007(Apr 3);104(14):5878-82. An excerpt:

Preliminary analysis of the molecular mechanism(s) involved (18) indicates that it affects a pathway affected by c-kit, a key receptor in mouse HSC self-renewal control and one whose activation differentially regulates fetal and adult HSC self-renewal both in vivo and in vitro (12, 28).

Entering c-kit into the Search Pathway Commons box at the Pathway Commons site returns the pathway “Signaling events mediated by Stem cell factor receptor (c-Kit).”

Pathguide, a directory of biological pathway resources, currently lists 224 of them, many of which are freely available.

These examples illustrate the urgent need for agencies that support research to put in place policies of the kind being developed by the Canadian Institutes of Health Research (CIHR) on Access to Research Outputs.



  1. tillje said

    There’s a relevant article by Piwowar HA, Day RS, Fridsma DB, Sharing detailed research data is associated with increased citation rate, in PLoS ONE 2007(Mar 21);2:e308.

    The abstract:

    BACKGROUND: Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. PRINCIPAL FINDINGS: We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. SIGNIFICANCE: This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.

    From the Discussion:

    Indeed, there are many personal difficulties for those who undertake to share their data[1]. A major cost is time: the data have to be formatted, documented, and released. Unfortunately this investment is often larger than one might guess: in the realm of microarray and particularly clinical information, it is nontrivial to decide what data to release, how to de-identify it, how to format it, and how to document it. Further, it is sometimes complicated to decide where to best publish data, since supplementary information and laboratory sites are transient[23], [24]. Beyond a time investment, releasing data can induce fear. There is a possibility that the original conclusions may be challenged by a re-analysis, whether due to possible errors in the original study[25], a misunderstanding or misinterpretation of the data[26], or simply more refined analysis methods. Future data miners might discover additional relationships in the data, some of which could disrupt the planned research agenda of the original investigators. Investigators may fear they will be deluged with requests for assistance, or need to spend time reviewing and possibly rebutting future re-analyses. They might feel that sharing data decreases their own competitive advantage, whether future publishing opportunities, information trade-in-kind offers with other labs, or potentially profit-making intellectual property. Finally, it can be complicated to release data. If not well-managed, data can become disorganized and lost. Some informed consent agreements may not obviously cover subsequent uses of data. De-identification can be complex. Study sponsors, particularly from industry, may not agree to release raw detailed information. Data sources may be copyrighted such that the data subsets can not be freely shared, though it is always worth asking.

    Although several of these difficulties are challenging to overcome, many are being addressed by a variety of initiatives, thereby decreasing the barriers to data sharing. For example, within the area of microarray clinical trials, several public microarray databases (SMD[27], GEO[9], ArrayExpress[10], CIBEX[28], GEDP) offer an obvious, centralized, free, and permanent data storage solution. …

  2. tillje said

    Another relevant blog item: Peter Murray-Rust, Copyrighted data, A Scientist and the Web, April 12, 2007.

    Excerpt: “So, funders and academia, your acquiescence to non-Open Data is destroying large areas of potential data-driven science.”

    See also: ACS claims copyright on author data files, OA News, April 13, 2007.

  3. tillje said

    See also the Wikipedia entry for Open Data.
