
Alpha version of Freebase

Freebase is “an open, shared database of the world’s knowledge”.

From the Freebase FAQ:

1. What is Freebase?

Freebase is a uniquely structured database that you can easily search, add to and edit; you can also use the data in it to power your own projects. It’s a data commons in the way that a public square is a land commons—available to anyone to use.

Freebase covers millions of topics in hundreds of categories. It’s been seeded with a few million topics from open sources, including Wikipedia and Musicbrainz.

Freebase Policies:
* Licensing Policy
* Copyright Policy
* Privacy Policy
* Terms of Service

Example: Search Freebase.com for “open access”.

The first two search results: Open access and Open access publishing.


Trustworthiness of Wikipedia entries

A recent commentary by Brock Read, Software Weighs Wikipedians’ Trustworthiness (Chronicle of Higher Education blog, 3 August 2007) is interesting. An excerpt:

A Wikipedian with a distinguished record of unchanged edits is declared trustworthy, and his or her contributions are left untouched on the Santa Cruz team’s color-coded pages. But a contributor whose posts have frequently been changed or deleted is considered suspect, and his or her content is highlighted in orange.

One of Peter Suber’s comments (see Color-coding Wikipedia entries by trustworthiness, Open Access News blog, 4 August 2007):

Some bad entries go uncorrected because few people read them. Hence, I’d trust an entry more if it had a low rate of overwrites and a high rate of readership. Could the algorithm take the extra variable into account?
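
To make Suber's suggestion concrete, here is a minimal sketch of how the two variables might be combined into a single score. It is purely illustrative: the function name, the `half_views` parameter and the formula are my own assumptions, and this is not the algorithm used by the Santa Cruz team.

```python
def entry_trust(overwrite_rate: float, views: int, half_views: int = 1000) -> float:
    """Hypothetical trust score in [0, 1] for a Wikipedia entry.

    overwrite_rate: fraction of the entry's edits that were later changed
                    or deleted (0.0 = none, 1.0 = all).
    views:          number of times the entry has been read.
    half_views:     assumed readership at which the readership weight is 0.5.
    """
    if not 0.0 <= overwrite_rate <= 1.0:
        raise ValueError("overwrite_rate must be between 0 and 1")
    if views < 0 or half_views <= 0:
        raise ValueError("views must be >= 0 and half_views must be > 0")

    # Readership weight rises from 0 towards 1 as more people read the entry.
    readership = views / (views + half_views)

    # A low overwrite rate only counts for much if the entry is widely read.
    return (1.0 - overwrite_rate) * readership


# Same overwrite rate, very different readership.
print(entry_trust(0.1, 10))
print(entry_trust(0.1, 9000))
```

With the same overwrite rate, the rarely read entry scores much lower than the widely read one, which is the effect Suber asks for: an unchanged entry earns trust only if enough readers have had the chance to correct it.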

I wondered: might Google and Google Scholar be helpful?

A webpage at the UCSC Wiki Lab provides details about the software. The demo currently contains only a few hundred pages from Wikipedia. Once there, one can click on ‘Random page’ on the left-hand side to get to other pages.

So, yesterday (Aug 5) I selected a series of ‘random pages’. For each one, I used the title of the page as key words in Google Scholar (GS) and Google (G) searches, and noted the number of results for each search. Finally, I examined the first 100 results of the Google search for links to entries in Wikipedia and in Encyclopaedia Britannica. My results for the first series of 10 consecutive ‘random pages’ were:

Page 1 was: Corrado Gini (little of the text was highlighted in orange).
GS: about 725 results for “Corrado Gini”.
G: about 22,300 results for “Corrado Gini”.
G result #1: Corrado Gini – Wikipedia.
G result #3: Corrado Gini – Britannica.

Page 2 was: Clark Ashton Smith (little orange).
GS: about 65 results for “Clark Ashton Smith”.
G: about 172,000 results for “Clark Ashton Smith”.
G result #3: Clark Ashton Smith – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 3 was: Corcovado (little orange).
GS: about 2,600 results for Corcovado.
G: about 1,550,000 results for Corcovado.
G result #1: Corcovado – Wikipedia.
G result #16: Mount Corcovado – Britannica.

Page 4 was: Chaparral (little orange – noted that the word has multiple uses).
GS: about 17,300 results for Chaparral.
G: about 4,680,000 results for Chaparral.
G result #5: Chaparral – Wikipedia.
G result #73: Chaparral – Britannica.

Page 5 was: Donegal fiddle tradition (some orange).
GS: 3 results for “Donegal fiddle tradition”.
G: about 2,470 results for “Donegal fiddle tradition”.
G result #1: Donegal fiddle tradition – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 6 was: Caribbean Sea (little orange).
GS: about 17,900 results for Caribbean Sea.
G: about 1,890,000 results for Caribbean Sea.
G result #1: Caribbean Sea – Wikipedia.
G result #8: Caribbean Sea – Britannica.

Page 7 was: CCC (mostly orange – the term has various meanings).
GS: about 478,000 results for CCC (e.g. initials of an author).
G: about 28,300,000 results for CCC (e.g. stock symbol for a company).
G result #24: CCC – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 8 was: Canton (mostly orange – the word has various meanings).
GS: about 131,000 results for Canton (e.g. surname of an author).
G: about 38,500,000 results for Canton.
G result #4: Canton – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 9 was: Commodore 64 (little orange).
GS: about 1,350 results for Commodore 64.
G: about 2,220,000 results for Commodore 64.
G result #1: Commodore 64 – Wikipedia.
[No Britannica entry found in the first 100 results]

Page 10 was: Collection (mostly orange – the word has various meanings).
GS: about 7,570,000 results for Collection.
G: about 471,000,000 results for Collection.
G result #2: Collection – Wikipedia.
[No Britannica entry found in the first 100 results]

Although only a very small sample of 10 pages was examined, some interesting findings were:

1) Nine of the 10 pages were about entries beginning with the letter “C”.

2) The pages that showed the most obvious orange highlighting were those for key words that had multiple meanings.

3) Google Scholar yielded results for all 10 searches. However, #7 (“CCC”) and #8 (“Canton”) were identified by Google Scholar as authors’ initials (for “CCC”) or surnames (for “Canton”).

4) For all 10 key words, Wikipedia entries were ranked higher by Google’s ranking algorithm than were entries in Encyclopaedia Britannica. For six of the 10, no Britannica entry appeared in the first 100 results.
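
The tallies behind these findings can be re-checked with a few lines of Python. The ranks are copied from the 10 page summaries above. Using `None` to mean "no Britannica entry found in the first 100 results" is my own convention, and the variable names are only illustrative.

```python
# (page title, Wikipedia rank in Google, Britannica rank in Google or None)
pages = [
    ("Corrado Gini", 1, 3),
    ("Clark Ashton Smith", 3, None),
    ("Corcovado", 1, 16),
    ("Chaparral", 5, 73),
    ("Donegal fiddle tradition", 1, None),
    ("Caribbean Sea", 1, 8),
    ("CCC", 24, None),
    ("Canton", 4, None),
    ("Commodore 64", 1, None),
    ("Collection", 2, None),
]

# Finding 1: how many page titles begin with the letter "C"?
starts_with_c = sum(title.startswith("C") for title, _, _ in pages)

# Finding 4: Wikipedia counts as ranked higher when its rank number is
# smaller, or when Britannica did not appear in the first 100 results.
wiki_ahead = sum(brit is None or wiki < brit for _, wiki, brit in pages)
no_britannica = sum(brit is None for _, _, brit in pages)
wiki_first = sum(wiki == 1 for _, wiki, _ in pages)

print(starts_with_c)   # 9
print(wiki_ahead)      # 10
print(no_britannica)   # 6
print(wiki_first)      # 5
```

The last count is not one of the numbered findings, but it is worth noting: the Wikipedia entry was the very first Google result for five of the 10 key words.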

Hence, it appears that searches using Google and Google Scholar may serve as useful adjuncts to any assessments of the trustworthiness of entries in Wikipedia.

And, I agree (as noted by Matthew Cockerill, BioMed Central blog, 16 March 2007) that Wikipedia and OA are a “natural match”. Just as the OA movement can do much to contribute to the further development of Wikipedia, so can Wikipedia help greatly to foster awareness of the benefits of OA. See also: John Willinsky, What open access research can do for Wikipedia, First Monday 2007(Mar); 12(3).

On the basis of this sample of 10 entries, Wikipedia and Google are also a “natural match”.
