
Wednesday, January 25, 2012

Word Clouds of all the State of the Union addresses ever given by a U.S. President

State of the Union



State of the Union (SOTU) provides access to the corpus of all the State of the Union addresses from 1790 to 2012. SOTU allows you to explore how specific words gain and lose prominence over time, and to link to information on the historical context of their use. SOTU focuses on the relationship between each individual address and the entire collection of addresses, highlighting what is distinctive about the selected document. You are invited to use this information to understand the connection between politics and language: between the state we are in and the language that names it and calls it into being.

The Words
SOTU maps the significant content of each State of the Union address so that users can appreciate its key terms and their relative importance.

The horizontal axis shows the average position of a word in the document. The vertical axis displays the word’s relative frequency, determined by comparing how frequently the word occurs in the document to how frequently it appears throughout the entire body of SOTU addresses (see appendix for details).
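The two axes can be sketched in a few lines of Python. This is a minimal illustration, not SOTU's code: it assumes that "average position" means the mean token index of the word scaled to the range 0 to 1, and that relative frequency is the word's rate in the address divided by its rate across all addresses. The appendix describes the site's actual method, which may differ. The function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, optional internal apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def average_position(tokens, word):
    """Mean position of `word`, scaled from 0.0 (first token) to 1.0 (last token).

    Returns None if the word does not occur in the document.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    if not positions:
        return None
    last_index = max(len(tokens) - 1, 1)  # avoid dividing by zero for one-word texts
    return (sum(positions) / len(positions)) / last_index

def relative_frequency(doc_tokens, corpus_counts, corpus_total, word):
    """How many times more often `word` occurs in this address than in the corpus overall.

    Assumes the corpus counts include this address, so corpus_counts[word] > 0
    whenever the word appears in doc_tokens.
    """
    doc_rate = doc_tokens.count(word) / len(doc_tokens)
    corpus_rate = corpus_counts[word] / corpus_total
    return doc_rate / corpus_rate

if __name__ == "__main__":
    # Two toy "addresses" standing in for the real corpus.
    addresses = [tokenize("War, peace, peace."), tokenize("Peace, peace, trade, trade.")]
    corpus_counts = Counter(t for doc in addresses for t in doc)
    corpus_total = sum(corpus_counts.values())
    first = addresses[0]
    print(average_position(first, "peace"))  # 0.75
    print(round(relative_frequency(first, corpus_counts, corpus_total, "war"), 2))  # 2.33
```

A ratio above 1 means the word is more prominent in this address than in the corpus as a whole, which is what pushes it up the vertical axis.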

Common words (“and,” “the,” etc.) and words that occur frequently in the entire corpus (“states”) are largely filtered out; what remains are the words that are especially characteristic of a given address. The size of each word indicates how many times it was used in the document. Click a word to view the full text of the address with that word highlighted, or roll over a word to see detailed frequency data.
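The filtering and sizing could be approximated as follows. This sketch assumes a small hand-picked stop list, a hypothetical cap on a word's share of the whole corpus (to drop words like "states"), and a hypothetical minimum ratio of in-address rate to corpus-wide rate; none of these values come from the SOTU site. Font size is scaled linearly with the word's count in the address, which is one simple reading of "the size of the word indicates how many times it was used."

```python
from collections import Counter

# A tiny illustrative stop list (assumption); SOTU's actual list is not published here.
STOPWORDS = {"and", "the", "of", "to", "a", "in", "that", "is"}

def characteristic_words(doc_tokens, corpus_counts, corpus_total,
                         max_corpus_share=0.001, min_ratio=2.0):
    """Return {word: count} for words that are characteristic of this address.

    Hypothetical thresholds:
      max_corpus_share: drop words making up more than this share of the whole corpus
      min_ratio: keep words at least this many times more frequent here than overall
    """
    doc_counts = Counter(doc_tokens)
    doc_total = len(doc_tokens)
    kept = {}
    for word, count in doc_counts.items():
        if word in STOPWORDS or corpus_counts[word] == 0:
            continue
        corpus_share = corpus_counts[word] / corpus_total
        if corpus_share > max_corpus_share:
            continue  # too common across all addresses, e.g. "states"
        if (count / doc_total) / corpus_share >= min_ratio:
            kept[word] = count
    return kept

def font_sizes(word_counts, min_pt=10, max_pt=48):
    """Scale each word's font size linearly with its count in the address."""
    if not word_counts:
        return {}
    lo, hi = min(word_counts.values()), max(word_counts.values())
    span = (hi - lo) or 1  # all counts equal: every word gets min_pt
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span
            for w, c in word_counts.items()}
```

Note that the corpus-share cap and the ratio test work together: the first removes words that are everywhere, the second keeps only words that stand out in this particular address.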

The Data
The data underneath the map of significant words shows trends in the language of the State of the Union addresses. On the graph, white bars indicate the length of each address in words. Red dots indicate readability as measured by the address’s Flesch-Kincaid grade level, which estimates the U.S. school grade at which the text can be understood. The actual scores are displayed in the bottom right corner of the interface (for more information on Flesch-Kincaid, see the appendix).
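The standard Flesch-Kincaid grade-level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The post does not say how SOTU counts sentences and syllables, so the vowel-group syllable counter and punctuation-based sentence splitter below are rough, assumed heuristics; only the formula itself is standard.

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # "state" -> 1, but "table" keeps 2
    return max(count, 1)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def grade_for_text(text):
    """Estimate the grade level of a text using the heuristics above."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)  # runs of end punctuation
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), sentences, syllables)

if __name__ == "__main__":
    print(round(grade_for_text("The state of the Union is strong."), 2))  # 0.63
```

Long sentences and long words both raise the score, which is why early addresses, with their lengthy periodic sentences, tend to score higher than modern ones.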

The current corpus contains:

  • 226 documents
  • 1,734,587 words
  • 27,308 unique words
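Counts like these depend heavily on tokenization (how punctuation, hyphens, numbers, and capitalization are treated), and the post does not say how SOTU tokenizes. The sketch below shows one plausible way to compute the three figures; running it on the addresses will not necessarily reproduce the totals above.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens (letters, optional internal apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def corpus_stats(documents):
    """Return (document count, total word tokens, distinct word types)."""
    vocabulary = set()
    total = 0
    for text in documents:
        tokens = tokenize(text)
        total += len(tokens)
        vocabulary.update(tokens)
    return len(documents), total, len(vocabulary)

if __name__ == "__main__":
    docs = ["The state of the Union is strong.", "The Union endures."]
    print(corpus_stats(docs))  # (2, 10, 7)
```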


