Wednesday, January 4, 2012

Words and phrases: frequency, genres, collocates, concordances, synonyms, and WordNet

For those of you who like working with corpora, Mark Davies has just released a new interface for the COCA (Corpus of Contemporary American English). Mouthwatering stuff, especially the upcoming resource that will let you input a text you've written and get an analysis, with suggestions for words you could use to improve it. My prediction of more corpus-assisted language production tools just came true. God bless Mr Davies!

Here is a brief introduction to how the site works:

Read on for a copy of the body of the email message circulated by Mark Davies (thanks to Gulşen Musayeva-Vefali and Nilgun Hancioğlu for this):

We have just released an important new interface for the Corpus of Contemporary American English (COCA):

Even more so than the standard COCA interface (which will continue to be available), the new website is designed to provide information on nearly everything that you might want to know about a word and its usage -- all on one screen. Users can look for specific words or browse through the entire frequency listing (words 1-60,000). And then for any matching words, they can see:

  • the definition(s) of the word
  • the overall frequency in the 425 million word corpus, and its rank (1-60,000)
  • the frequency in each of the five main genres -- spoken, fiction, magazines, newspapers, and academic
  • 20-30 collocates (nearby words), which provide useful insight into meaning and usage
  • 200 concordance lines (re-sortable), which provide insight into the patterns in which the word occurs
  • synonyms (grouped by meaning and sorted by frequency); can click to see the entries for related words
  • WordNet entries, showing related words with a more specific or a more general meaning

As noted, all of this information is displayed together on one screen, with extensive links from one word to another (which allow you to compare words in many useful ways). If you are interested in English words, their frequency, their meaning, the relationship to related words, and the patterns in which a word occurs, we believe that this new resource will be invaluable for you in your teaching, learning, and research. And as always, it is available for free.

Finally, we might note that in the next month or two we'll be releasing two more related resources. The first will allow you to input a text (e.g. a newspaper article or a paper that you've written) and then it will analyze the text by frequency and suggest other alternatives for highlighted words and phrases (based on COCA data). The second resource is a special version of -- oriented to English for Academic Purposes (EAP) and based on the 85 million words of academic texts in COCA. We'll let you know about these as they become available.


Mark Davies
Brigham Young University
