Post #4: Data Visualization and Organization

For my data visualization project, I analyzed two texts that are instrumental to my research area: George Kennan’s “Long Telegram” from 1946 and his 1947 article “The Sources of Soviet Conduct.” My purpose in comparing the two texts is to see how Kennan’s emphasis differs between them. Kennan wrote the “Long Telegram” as a dispatch to Secretary of State James Byrnes intended only for use within the US government. In 1947, Kennan penned “The Sources of Soviet Conduct” as an internal report for the State Department, but it was published later that same year under the pseudonym “X” in Foreign Affairs magazine. The two documents are not identical – the “Long Telegram” is 5,336 words while “The Sources of Soviet Conduct” comes in at just over 6,850 – but their tone and purpose are similar enough that a data visualization tool like Voyant can help reveal shifts in Kennan’s policy concerns.

Once stop words are eliminated, Voyant’s word cloud and its list of word frequencies across the corpus highlight the major continuities and discontinuities between the two documents. “Soviet” is the most frequently used unique word and accounts for an almost identical share of the total words in each document, slightly over 1%. The words “power” and “world” are both among the top 5 unique words in the two documents, though they do not account for similar percentages of the total words. Beyond these three key words, however, there is greater variation in both word choice and frequency. Given the subject matter, comparing the words “communist” and “capitalist” demonstrates differences in the focus of each document. In the “Long Telegram,” Kennan uses “capitalist” 16 times and “communist” only 9 times. In “The Sources of Soviet Conduct,” however, this trend is reversed, with “communist” appearing 18 times compared to only 12 uses of “capitalism.”
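For anyone who wants to check counts like these outside of Voyant, the same idea can be sketched in a few lines of Python. This is only a rough illustration, not how Voyant itself works: the stop-word list is a tiny hypothetical one (Voyant's built-in list is much longer), the function names are my own, the sample sentence is a placeholder rather than a quotation from Kennan, and I am assuming that percentages are measured against all words in the document, stop words included.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it", "as"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_stats(text, stop_words=STOP_WORDS):
    """Return (counts of non-stop words, total number of tokens)."""
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts, len(tokens)


def share(word, counts, total):
    """Percentage of all tokens in the text accounted for by `word`."""
    if total == 0:
        return 0.0
    return 100.0 * counts[word.lower()] / total


# Placeholder sentence, not a quotation from either document.
sample = "The power of the state and the power of the party shape the world."
counts, total = word_stats(sample)

print(counts.most_common(3))            # most frequent non-stop words
print(share("power", counts, total))    # share of all words, in percent
```

Running `word_stats` on the full text of each document and comparing `share("soviet", ...)` or the raw counts for “communist” and “capitalist” would give a quick cross-check on the figures Voyant reports, though small differences in tokenizing and stop-word lists should be expected.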

By itself, this type of data does not provide a definitive interpretation of Kennan’s writings, but it does provide a new method of accessing the material. Voyant is particularly helpful in accomplishing this task because it creates multiple visualizations, including a word cloud, frequency list, and trend graph. While manipulating these tools is not an entirely straightforward process, once the visualizations are generated they are fairly transparent and do not require a great deal of specialized knowledge to interpret, thereby avoiding one of the major pitfalls of statistical analysis noted by Theibault.

“Long Telegram” visualization.

“The Sources of Soviet Conduct” visualization.

Unfortunately, I could not figure out how to get the URLs to link to the data set with the stop words already locked in place. Individual tools (e.g. the Word Cloud) were capable of providing URLs for manipulated data sets, but I could not figure out how to do it for the corpus as a whole.