Trustworthiness of Wikipedia entries

A recent commentary by Brock Read, Software Weighs Wikipedians’ Trustworthiness (Chronicle of Higher Education blog, 3 August 2007), is interesting. An excerpt:

A Wikipedian with a distinguished record of unchanged edits is declared trustworthy, and his or her contributions are left untouched on the Santa Cruz team’s color-coded pages. But a contributor whose posts have frequently been changed or deleted is considered suspect, and his or her content is highlighted in orange.

One of Peter Suber’s comments (see Color-coding Wikipedia entries by trustworthiness, Open Access News blog, 4 August 2007):

Some bad entries go uncorrected because few people read them. Hence, I’d trust an entry more if it had a low rate of overwrites and a high rate of readership. Could the algorithm take the extra variable into account?
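
To make Suber's suggestion concrete, here is a minimal sketch of how an edit-survival rate and a readership count might be combined into a single score. This is not the Santa Cruz team's algorithm: the function name `entry_trust`, its parameters, and the saturation constant of 1,000 views are all hypothetical, chosen only for illustration.

```python
def entry_trust(surviving_edits: int, total_edits: int, page_views: int) -> float:
    """Hypothetical trust score in [0, 1): edit survival rate weighted by readership."""
    if total_edits == 0:
        return 0.0  # no edit history, so nothing to base trust on
    # Share of edits that were left unchanged (a low rate of overwrites gives a high value).
    survival_rate = surviving_edits / total_edits
    # Readership weight: rises with page views and levels off below 1.
    # The constant 1000 is an arbitrary assumption, not a measured value.
    readership_weight = page_views / (page_views + 1000)
    return survival_rate * readership_weight


# Same survival rate, very different readership.
widely_read = entry_trust(surviving_edits=9, total_edits=10, page_views=100_000)
rarely_read = entry_trust(surviving_edits=9, total_edits=10, page_views=10)
print(widely_read > rarely_read)
```

With a score of this kind, an entry that nobody has overwritten because nobody has read it would rank well below an equally stable entry with many readers, which is the distinction Suber asks about.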

I wondered: might Google and Google Scholar be helpful?

A webpage at the UCSC Wiki Lab provides details about the software. The demo currently contains only a few hundred pages from Wikipedia. Once there, one can click on ‘Random page’ on the left-hand side to get to other pages.

So, yesterday (Aug 5) I selected a ‘random page’. Then I used the title of the page obtained as key words in Google Scholar (GS) and Google (G) searches, and noted the number of results for each search. Finally, I examined the results of the Google search for links to entries in Wikipedia and in Encyclopaedia Britannica. My results for the first series of 10 consecutive ‘random pages’ were:

Page 1 was: Corrado Gini (little of the text was highlighted in orange).
GS: about 725 results for “Corrado Gini”.
G: about 22,300 results for “Corrado Gini”.
G result #1: Corrado Gini – Wikipedia.
G result #3: Corrado Gini – Britannica.

Page 2 was: Clark Ashton Smith (little orange).
GS: about 65 results for “Clark Ashton Smith”.
G: about 172,000 results for “Clark Ashton Smith”.
G result #3: Clark Ashton Smith – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 3 was: Corcovado (little orange).
GS: about 2,600 results for Corcovado.
G: about 1,550,000 results for Corcovado.
G result #1: Corcovado – Wikipedia.
G result #16: Mount Corcovado – Britannica.

Page 4 was: Chaparral (little orange – noted that the word has multiple uses).
GS: about 17,300 results for Chaparral.
G: about 4,680,000 results for Chaparral.
G result #5: Chaparral – Wikipedia.
G result #73: Chaparral – Britannica.

Page 5 was: Donegal fiddle tradition (some orange).
GS: 3 results for “Donegal fiddle tradition”.
G: about 2,470 results for “Donegal fiddle tradition”.
G result #1: Donegal fiddle tradition – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 6 was: Caribbean Sea (little orange).
GS: about 17,900 results for Caribbean Sea.
G: about 1,890,000 results for Caribbean Sea.
G result #1: Caribbean Sea – Wikipedia.
G result #8: Caribbean Sea – Britannica.

Page 7 was: CCC (mostly orange – the term has various meanings).
GS: about 478,000 results for CCC (e.g. initials of an author).
G: about 28,300,000 results for CCC (e.g. stock symbol for a company).
G result #24: CCC – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 8 was: Canton (mostly orange – the word has various meanings).
GS: about 131,000 results for Canton (e.g. surname of an author).
G: about 38,500,000 results for Canton.
G result #4: Canton – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 9 was: Commodore 64 (little orange).
GS: about 1,350 results for Commodore 64.
G: about 2,220,000 results for Commodore 64.
G result #1: Commodore 64 – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 10 was: Collection (mostly orange – the word has various meanings).
GS: about 7,570,000 results for Collection.
G: about 471,000,000 results for Collection.
G result #2: Collection – Wikipedia.
[No Britannica entry found in the first 100 results]

Although only a very small sample of 10 pages was examined, some interesting findings were:

1) Nine of the 10 pages were about entries beginning with the letter “C”.

2) The pages that showed the most obvious orange highlighting were those for key words that had multiple meanings.

3) Google Scholar yielded results for all 10 searches. However, #7 (“CCC”) and #8 (“Canton”) were identified by Google Scholar as authors’ initials (for “CCC”) or surnames (for “Canton”).

4) For all 10 key words, Google ranked the Wikipedia entry higher than the corresponding entry in Encyclopaedia Britannica. For six of the 10, no Britannica entry appeared in the first 100 results.
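
For anyone who wants to check the tallies, the ranks recorded above can be put into a short script. This is only a sketch: the variable names are my own, and `None` marks a search where no Britannica entry was found in the first 100 Google results.

```python
# (page title, Google rank of the Wikipedia entry, Google rank of the Britannica entry)
# None = no Britannica entry found in the first 100 results.
RESULTS = [
    ("Corrado Gini", 1, 3),
    ("Clark Ashton Smith", 3, None),
    ("Corcovado", 1, 16),
    ("Chaparral", 5, 73),
    ("Donegal fiddle tradition", 1, None),
    ("Caribbean Sea", 1, 8),
    ("CCC", 24, None),
    ("Canton", 4, None),
    ("Commodore 64", 1, None),
    ("Collection", 2, None),
]

# Finding 1: how many page titles begin with the letter "C".
starts_with_c = sum(title.startswith("C") for title, _, _ in RESULTS)

# Finding 4: does Wikipedia outrank Britannica for every key word?
# A missing Britannica entry counts as ranked below Wikipedia.
wikipedia_always_higher = all(b is None or w < b for _, w, b in RESULTS)

# How often a Britannica entry appeared at all in the first 100 results.
britannica_found = sum(b is not None for _, _, b in RESULTS)

print(f"Titles beginning with C: {starts_with_c} of {len(RESULTS)}")
print(f"Wikipedia ranked higher in every search: {wikipedia_always_higher}")
print(f"Britannica found in first 100 results: {britannica_found} of {len(RESULTS)}")
```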

Hence, it appears that searches using Google and Google Scholar may serve as useful adjuncts to any assessments of the trustworthiness of entries in Wikipedia.

And, I agree (as noted by Matthew Cockerill, BioMed Central blog, 16 March 2007) that Wikipedia and OA are a “natural match”. Just as the OA movement can do much to contribute to the further development of Wikipedia, so can Wikipedia help greatly to foster awareness of the benefits of OA. See also: John Willinsky, What open access research can do for Wikipedia, First Monday 2007(Mar); 12(3).

On the basis of this sample of 10 entries, Wikipedia and Google are also a “natural match”.

5 Comments »

  1. realitycheck101 said

    The comparisons between Encyclopaedia Britannica and Wikipedia are very interesting.

    From a corporate perspective, Encyclopaedia Britannica is in serious trouble. Britannica never thought that an open source product like Wikipedia would seriously challenge the credibility of its brand. They were wrong and Encyclopaedia Britannica’s staff seriously misread the global market. They are now very concerned about the widespread use of a free Wikipedia vs their paid subscription model. Industry analysis shows that the accuracy of both encyclopedic databases is similar.

    It is interesting that Wikipedia founder Jimmy Wales is developing a new search engine. It is the combination of a) improved search engines and b) the success of Wikipedia that has put financial pressure on Encyclopaedia Britannica over recent years. Many libraries and schools are questioning the need to pay to subscribe to Encyclopaedia Britannica when the content is free on the internet. Google even has free direct links to Encyclopaedia Britannica’s main database!

  2. tillje said

    For some information about the new search engine that’s under development, see, for example:

    Wikiasari.

    Wikiasari: An Answer for Search?: Greg Sterling, Screenwerk, 27 December 2006.

    Jimmy Wales and Wikia Release Open Source Distributed Web Crawler Tool: Press release, PRWeb, 27 July 2007.

    Jimmy Wales’ Search Project Gets a Grub Stake: Rob Hof, BusinessWeek, 27 July 2007. Excerpt:

    Wikipedia founder Jimmy Wales is moving another step closer to creating an open alternative to Google’s search engine. Announced last December, the Search Wikia project is committed to “fix Internet search by working to free the judgment of information from invisible rules inside an algorithmic black box.”

  3. tillje said

    The Wikipedia entry about Encyclopædia Britannica includes a section on Criticisms. An excerpt:

    Various authorities ranging from Virginia Woolf to academic professors have criticised the Britannica for having bourgeois and old-fashioned opinions on art, literature and social sciences.

    The online Encyclopædia Britannica entry about Wikipedia (by Michael Aaron Dennis) includes the comment:

    The English-language version of Wikipedia began in 2001, and by March 2006 it had more than one million articles and was growing at a rate of millions of words per month. Much of its content treats popular culture topics not covered by traditional encyclopaedias.

    But, how to define “popular culture”? For an attempt to answer this question, see: popular culture: introduction (The Applied History Research Group, University of Calgary / Red Deer College, 21 January 2001). Excerpts:

    It is only recently that attention has been turned towards establishing the value of studying popular culture from the condemnation of it. The consequence of this is that as yet no universally accepted definition of popular culture exists…

    The idea of popular culture, as we know it, only came about in the second half of the nineteenth century and for the first fifty years or so was viewed very negatively by those who dared to acknowledge its existence. The idea that “culture” was divisible into different types – high, popular, and folk are the most common distinctions – in the way that society was divisible into classes came primarily from the writings of Matthew Arnold, particularly his book Culture and Anarchy. …

    I agree that Wikipedia contains much material that’s not available in more conventional encyclopedias such as Encyclopædia Britannica. An example (from a Canadian perspective) is the Wikipedia entry for First Nations. There’s a subsection about Culture areas (but, unsurprisingly, not one about “popular culture”).

  4. tillje said

    There’s an interesting article, Scientific citations in Wikipedia, by Finn Årup Nielsen, in First Monday 2007(Aug); 12(8). Abstract:

    The Internet-based encyclopædia Wikipedia has grown to become one of the most visited Web sites on the Internet, but critics have questioned the quality of entries. An empirical study of Wikipedia found errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the “Wikipedia risks.” This paper describes a simple assessment of these aspects by examining the outbound links from Wikipedia articles to articles in scientific journals with a comparison against journal statistics from Journal Citation Reports such as impact factors. The results show an increasing use of structured citation markup and good agreement with citation patterns seen in the scientific literature though with a slight tendency to cite articles in high-impact journals such as Nature and Science. These results increase confidence in Wikipedia as a reliable information resource for science in general.

    Excerpts from the section Outlook:

    The present number of structured outbound citations from Wikipedia is quite small compared to the total number of scientific citations found in current scientific literature. …

    However, the use of the cite journal template has grown from zero in February 2005 when first introduced, to 19,066 in November 2006, 24,656 in February 2007, to a total of 30,368 citations in April 2007. ….

    Thus use of structured scientific citations in Wikipedia will very likely continue to grow and increasingly benefit researchers that look for well–organized pointers to original research.

  5. tillje said

    See also: Judging the Reliability of Wikipedia Through Colour by Dean Giustini (Open Medicine Blog, 22 August 2007), and my Comment (28 August 2007).
