New York Times Chronicle

Counting Names and Phrases in the Times

The New York Times Chronicle is a new resource for “visualizing language usage in New York Times news coverage throughout its history,” which began in 1851. Enter a word or phrase and it will appear as a colored line on a graph showing the percentage of articles it appeared in from 1851 to the present. You can search several words or phrases sequentially and each one will appear in a different color so you can compare them.

Or you can ask for the number of articles. “Obama” first appeared in 2004 in 0.1% of the articles. In 2009, the name peaked at 6.63%. In 2012 he appeared in the highest number of articles: 19,675. A peak percentage in 2009 and a peak count in 2012 probably indicate that they published more articles in 2012 than in 2009, since a smaller share of a larger total can still be a larger count. Numbers are not always as clear as we are led to believe they are.

What Are They Measuring?

Jackson Pollock appeared in 1 article in 1944, 2 in 1957, 6 in 1964, and peaked at 8 in 1980. He died in August 1956, yet he did not appear at all that year. I think there is a problem with this data. His market soared in 1961 but there were only 3 articles. He continued to be mentioned 1 to 5 times a year until 1985, when he dropped off the graph again. He appears a few more times, but not as often as I would have expected. Low percentages, yes. Low numbers, no.

Not sure what they are measuring. If someone quotes this data, are they quoting the actual number of appearances or the number that the NYT Labs counted? Either way it is still interesting and I’m sure it will be quoted often.

Export the Data

You can also export the data. Not sure the export will work for you. This is a small excerpt of the export on abolition:

{"article_matches":289,"year":1856,"total_articles_published":18162},
{"article_matches":168,"year":1857,"total_articles_published":18168},
{"article_matches":187,"year":1858,"total_articles_published":17388},
{"article_matches":177,"year":1859,"total_articles_published":14595},
{"article_matches":447,"year":1860,"total_articles_published":19288},
{"article_matches":372,"year":1861,"total_articles_published":27009},
{"article_matches":424,"year":1862,"total_articles_published":25180},
{"article_matches":277,"year":1863,"total_articles_published":22563},
{"article_matches":220,"year":1864,"total_articles_published":20458},
{"article_matches":328,"year":1865,"total_articles_published":22727},
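If you want to check the percentage-versus-count question yourself, here is a minimal Python sketch that works on the ten records in the excerpt. It assumes the export is a JSON array of records shaped like these (I wrapped the excerpt in square brackets and used straight quotes to make it valid JSON); the helper name `share` is mine, not something from the Chronicle.

```python
import json

# The ten "abolition" records from the export excerpt in this post.
# Assumption: the full export is a JSON array of records like these,
# so the excerpt is wrapped in [ ] here to make it parseable.
excerpt = """[
{"article_matches":289,"year":1856,"total_articles_published":18162},
{"article_matches":168,"year":1857,"total_articles_published":18168},
{"article_matches":187,"year":1858,"total_articles_published":17388},
{"article_matches":177,"year":1859,"total_articles_published":14595},
{"article_matches":447,"year":1860,"total_articles_published":19288},
{"article_matches":372,"year":1861,"total_articles_published":27009},
{"article_matches":424,"year":1862,"total_articles_published":25180},
{"article_matches":277,"year":1863,"total_articles_published":22563},
{"article_matches":220,"year":1864,"total_articles_published":20458},
{"article_matches":328,"year":1865,"total_articles_published":22727}
]"""

records = json.loads(excerpt)


def share(record):
    """Percentage of that year's articles that matched the search term."""
    return 100.0 * record["article_matches"] / record["total_articles_published"]


# Index the records by year so two years can be compared directly.
by_year = {r["year"]: r for r in records}

# Print the raw count next to the percentage for every year.
for r in records:
    print(f'{r["year"]}: {r["article_matches"]:>4} articles, {share(r):.2f}%')

# The year with the most matching articles and the year with the highest
# percentage are computed separately because they need not coincide.
peak_count = max(records, key=lambda r: r["article_matches"])
peak_share = max(records, key=share)
print("Most articles:", peak_count["year"])
print("Highest percentage:", peak_share["year"])
```

In this excerpt both peaks land on 1860, but compare 1856 with 1861: 1861 has more matching articles (372 against 289) and a lower percentage, because the paper published far more articles that year. That is the same effect as the Obama numbers above.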

It Also Produces a List of the Articles

To see the actual numbers and dates, you put the cursor over the line on the graph. If you click on the line, you are taken to a list of the articles with an excerpt. That is really fine.

A very nice resource. One to remember.

WordCount

Another fabulous gem from the UK. WordCount is a ranking of the 86,800 most used words in the English language by frequency of use. It is presented in the same format as a timeline, and a beautiful one. Very minimalist and elegant. Perfectly simple. The design itself is worth the effort. You can also use it to analyze the vocabulary on your site: are the words you are using common, if understanding is your goal, or rare, if sounding obscure is your goal?

From the site:

WordCount data currently comes from the British National Corpus (BNC), a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent an accurate cross-section of current English usage. WordCount includes all words that occur at least twice in the BNC. In the future, WordCount will be modified to track word usage within any desired text, website, and eventually the entire Internet.

You can scroll the horizontal line of words or search for a specific word. Very interesting results. Then you can go to QueryCount, which tracks the words that people search for. Note: A screenshot of those words would be R-rated.

Warning: Doesn’t work with all browsers. Try another one.