Monday, February 13, 2012

One-click corpora heaven...

Below is a copy of a message circulated to users of Mark Davies' CORPORA sites. I mentioned this feature last month when describing the new site, and, true to his word, the 'upcoming' feature is now live.

Here is a really quick SCREENRcast introducing the nuts and bolts of the feature. At the time of writing, the feature still has some bugs, so please be patient while they are ironed out. It will be well worth it!

God bless you Mr Davies!


We've added a new feature at -- the alternative interface for COCA. You can now input an entire text -- maybe a newspaper article that you've copied from a website, or something you've written -- and it will then give you detailed information about the words and phrases in the text. There's now no need to copy and paste individual words and phrases into the regular COCA interface -- just work seamlessly from your original text.

First, it will highlight all of the medium- and lower-frequency words in your text (based on frequency data from COCA) and create lists of these words that you can use offline. This frequency data can help language learners focus on new words, and it can allow you to see "what the text is about" (i.e. text-specific words). You can also have it show you the "academic" words in your text (again, based on COCA data).

Second, you can click on any word in your text to get detailed information about the word (all on one screen) -- its overall frequency in COCA, its frequency in each genre (spoken, fiction, magazine, newspaper, and academic), the 20-30 most frequent collocates (nearby words), up to 200 sample concordance lines, synonyms, and related words from WordNet. There's no need to consult other dictionaries, thesauruses, or online resources -- it's all right there, with just one click for each and every word in your text.

Finally, you can also see detailed information about phrases in your text. Just click on a phrase in the text, and it will show you related phrases from COCA. For example, if you're writing a paper and have used the phrase "potent argument", you could click on that phrase and have it suggest related phrases based on COCA data -- in this case, phrases where a synonym of "potent" is followed by "argument". It would list, for example, "strong / persuasive / convincing argument" (all of which are more common in COCA). It will show you the frequency of each phrase in COCA, and you can click on any of these to see them in context in the corpus. In this way, it serves as a sort of "grammatical thesaurus" for finding just the right phrase in English.

All of this is now available at, along with the features that were there before, including the ability to browse through and search a huge frequency dictionary of English and see detailed information about any word. If you are interested in English words and phrases, their meaning, their frequency, and their distribution in different genres, we believe that this will be an exciting new resource. And as with all of our corpora, it is available for free.


Mark Davies
Brigham Young University


  1. This is truly a fantastic site Steve...
    However, we seem to have lost your voice on the screenr above after the first few sentences??

  2. Hmmm...will check. It could be a streaming and bandwidth issue. I need to do another one as there were a few bugs when I tried it for the SCREENRcast.

  3. Looks wonderful, Steve... will definitely be using it, and will let you know.