Measuring Quality in Online Text

As the social web grows and people become increasingly socially aware, news sites are turning into ever larger discussion communities where users comment on and debate the issues raised by news articles. One of the key features behind the success of these online communities is large-scale user engagement in the form of rating, tagging and commenting on content. User-contributed comments offer a much richer source of contextual information than ratings or tags, albeit a “messy” one: comments vary widely in quality, substance, relevance and style.

Websites like Digg.com have successfully deployed systems to moderate their comments. These systems often use regression to predict a rating for each comment as it comes in, and the comment is then ranked within its thread accordingly. To improve the quality of commentary on articles algorithmically through moderation, it is therefore valuable to be able to capture the quality of a comment. Various features can be constructed from different measures of quality and then used to train regression algorithms for rating prediction. Below I discuss several interesting features that can be extracted from comment text.


When measuring informativeness, one attempts to capture how unique a comment is relative to its thread. A popular measure of informativeness is Term Frequency – Inverse Document Frequency (TF-IDF).
The informativeness of a word is the product of its term frequency and its inverse document frequency. Term frequency is usually taken as the number of times the word appears in the comment or, alternatively, as the fraction of the comment’s words that it accounts for. The inverse document frequency measures how much information the word provides, that is, whether it is common or rare across all the comments in the thread. It is taken as the total number of comments divided by the number of comments in which the word appears; in practice the logarithm of this ratio is often used.

The informativeness of an entire comment is then taken as the sum of the informativeness of its individual words. It is a powerful measure that may capture how distinctly an author expresses themselves compared with the other authors in the comment thread.
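The calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used by any particular site: the function names `tokenize` and `informativeness` are hypothetical, the tokenizer is a simple regular expression, term frequency is the raw count, and the inverse document frequency is the plain ratio described in the text rather than its logarithm.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def informativeness(thread):
    """Return one TF-IDF-based informativeness score per comment in the thread."""
    tokenized = [tokenize(comment) for comment in thread]
    n_comments = len(tokenized)

    # Document frequency: the number of comments in which each word appears.
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    scores = []
    for words in tokenized:
        term_freq = Counter(words)  # raw count of each word in this comment
        # Assumption: IDF is the plain ratio (total comments / comments containing the word).
        score = sum(tf * (n_comments / doc_freq[word]) for word, tf in term_freq.items())
        scores.append(score)
    return scores


thread = ["the cat sat", "the dog sat", "the quantum dog"]
print(informativeness(thread))
```

In this toy thread the word "the" appears in every comment and so contributes the least, while words that occur in only one comment, such as "cat" and "quantum", contribute the most to their comment's score.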


The readability of a comment is the ease with which a reader can read it. This can be quantified by the Flesch Reading Ease Score (FRES). The test was the basis for the Flesch-Kincaid readability measure that was used on military documentation in the United States Navy in 1975. A high score (above 90) indicates that the text can be understood by an average 11-year-old, whereas a low score (between 0 and 30) indicates that the text will probably only be understood by university graduates. For reference, Reader’s Digest magazine has a readability index of about 65, Time magazine scores about 52, and an average 12-year-old’s written assignments should score 60–70.

The formula for the FRES is:

FRES(Cj) = 206.835 – 1.015 (Wj / Sj) – 84.6 (Bj / Wj)

where Cj is the j-th comment in the thread, Wj the number of words in it, Sj the number of sentences, and Bj the number of syllables.
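A rough Python sketch of the formula follows. Counting syllables exactly requires a pronunciation dictionary, so this example assumes a crude heuristic (one syllable per group of consecutive vowels) and a simple punctuation-based sentence splitter; the function names `count_syllables` and `flesch_reading_ease` are hypothetical, and the resulting scores should be read as approximations.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels (heuristic, not exact)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(comment):
    """Compute FRES = 206.835 - 1.015 * (W / S) - 84.6 * (B / W) for one comment."""
    words = re.findall(r"[A-Za-z']+", comment)
    # Assumption: each run of '.', '!' or '?' marks the end of one sentence.
    sentences = [s for s in re.split(r"[.!?]+", comment) if s.strip()]
    if not words or not sentences:
        return None  # the score is undefined for empty text
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


print(flesch_reading_ease("The cat sat. The dog ran."))
print(flesch_reading_ease("Incomprehensible multisyllabic terminology obfuscates communication."))
```

Short sentences built from one-syllable words score very high, while a single sentence of long words scores far lower. The formula is not bounded, so scores above 100 or below 0 are possible on short texts like these.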


The relevance of a comment can be measured relative to the article or relative to the comment thread in which it appears. To calculate relevance within the thread, the overlap between the words in the comment and the words in the thread is quantified. A bag of words (BoW) is generated from all the comments in the thread, with the words sorted by frequency. The relevance of a single comment within its thread is then measured as the number of words that the comment’s text shares with the thread’s bag of words.
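The overlap count might look like the following Python sketch. It makes one assumption that the description above leaves open: the bag of words is built from the other comments in the thread, since a comment would otherwise trivially overlap with itself. It also counts each distinct word once. The names `tokenize` and `thread_relevance` are hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def thread_relevance(thread, index):
    """Count the distinct words of thread[index] that also occur in the rest of the thread."""
    # Bag of words with frequencies, built from every comment except the one being scored
    # (assumption: the comment itself is excluded so it cannot match its own words).
    bag = Counter()
    for i, comment in enumerate(thread):
        if i != index:
            bag.update(tokenize(comment))

    comment_words = set(tokenize(thread[index]))
    return sum(1 for word in comment_words if word in bag)


thread = ["the cat sat", "the dog sat", "quantum physics rocks"]
print([thread_relevance(thread, i) for i in range(len(thread))])
```

Because the bag is a `Counter`, its frequencies remain available (for example via `bag.most_common()`) if one wants to weight the overlap by how common each shared word is in the thread.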

I am currently doing research to determine whether these and other features correlate with actual quality measures, such as the number of up-votes and down-votes a comment receives in its lifetime. Extracting accurate features from the comment has been shown to be of great importance when trying to show correlation with real comment quality. Whether the predicted quality of a new comment can serve as an indicator of how people will respond to it remains to be seen!
