Content analysis

Content analysis is a research method for studying documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner.[1] One of the key advantages of using content analysis to analyse social phenomena is its non-invasive nature, in contrast to simulating social experiences or collecting survey answers.

Practices and philosophies of content analysis vary between academic disciplines. They all involve systematic reading or observation of texts or artifacts which are assigned labels (sometimes called codes) to indicate the presence of interesting, meaningful pieces of content.[2][3] By systematically labeling the content of a set of texts, researchers can analyse patterns of content quantitatively using statistical methods, or use qualitative methods to analyse meanings of content within texts.

Computers are increasingly used in content analysis to automate the labeling (or coding) of documents. Simple computational techniques can provide descriptive data such as word frequencies and document lengths. Machine learning classifiers can greatly increase the number of texts that can be labeled, but the scientific utility of doing so is a matter of debate.


Content analysis is best understood as a broad family of techniques. Effective researchers choose techniques that best help them answer their substantive questions. That said, according to Klaus Krippendorff, six questions must be addressed in every content analysis:[4]

  1. Which data are analyzed?
  2. How are the data defined?
  3. From what population are data drawn?
  4. What is the relevant context?
  5. What are the boundaries of the analysis?
  6. What is to be measured?

The simplest and most objective form of content analysis considers unambiguous characteristics of the text such as word frequencies, the page area taken by a newspaper column, or the duration of a radio or television program. Analysis of simple word frequencies is limited because the meaning of a word depends on surrounding text. Keyword In Context routines address this by placing words in their textual context. This helps resolve ambiguities such as those introduced by synonyms and homonyms.

A further step in analysis is the distinction between dictionary-based (quantitative) approaches and qualitative approaches. Dictionary-based approaches set up a list of categories derived from the frequency list of words and control the distribution of words and their respective categories over the texts. While methods in quantitative content analysis in this way transform observations of found categories into quantitative statistical data, the qualitative content analysis focuses more on the intentionality and its implications. There are strong parallels between qualitative content analysis and thematic analysis.[5]