Dabbling in text mining to study ‘othering’ on Bonfiire Stellenbosch


For my Honours research project this year, I analysed ways of increasing user participation in blog-based social networks. My case study was Bonfiire, a multi-community network for discussion and debate that I co-founded with a friend in 2012. Since the public launch of Bonfiire Stellenbosch (our first community, targeted at Stellenbosch University students and alumni) in January 2013, we’ve seen the platform evolve into a vibrant virtual space for the discussion of campus issues. In my research project, I focused specifically on modelling Bonfiire using a system dynamics approach. Along the way, however, I’ve stumbled across a number of interesting phenomena that I am very curious to examine further.

One of the things I’ve been wanting to do is to use text mining to gain insight into the focus and nature of discussion on Bonfiire, as well as the way in which users express themselves. In recent years, there has been a growing interest in large-scale text mining as a means of gaining rich insights from unstructured (i.e. natural language) data sources. Bonfiire presents an interesting case in this regard, because blog posts by users are long (usually three to six paragraphs) and rich (packed with the writer’s sentiment towards and opinion on specific topics).

In this blog post, I’m going to use rudimentary text mining to answer the question: What were the main discussion topics in the months that saw the highest frequency of ‘othering’ words (“they”, “their”, “them”)? The time frame will be January 2013 to October 2014.

The concept of ‘othering’, and measuring it in Bonfiire blog posts

In Wiktionary’s definition, ‘othering’ refers to “the process of perceiving or portraying someone or something as fundamentally different or alien”. ‘Othering’ focuses on emphasising how one, or one’s group, differs from others, through using exclusionary language (“our culture”, “we condone such practices”, “they reject our idea”, “the leader is one of them”) and, in a broader sense, implicitly identifying the ‘other’ in a negative light. This article describes the language of ‘othering’ in greater detail.

Arguably the most rudimentary level at which ‘the other’ is indicated in the English language is through the use of the words “they”, “their” and “them”, referring to out-groups in the third person. These words can easily be mined in large heaps of text. That is precisely what I intend to do, in order to measure to what extent a post may be said to use ‘othering’ language. (Of course, my approach has a number of important limitations. Fear not; I discuss these later on.)

My approach

To do the above, I followed a number of steps:

  1. I parsed the content of all Bonfiire Stellenbosch blog posts in each of the months from January 2013 to October 2014.
  2. I tallied the number of times the above-mentioned ‘othering’ words were used in each blog post, summed these tallies for each month, and divided the sum by the number of blog posts in that month to obtain the average “othering words per post” number for each month.
  3. I graphed this over time to see trends in the use of ‘othering’ words.
  4. I identified four months of interest, where the number of ‘othering’ words per post was particularly high.
  5. For each of these months, I extracted the most-used tags (short words added to blog posts as metadata, by the writer) and visualised these in tag clouds.
  6. Finally, I showed each of the tag clouds with the average number of ‘othering’ words per post for the given month, to see if anything interesting pops out.
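For readers who want to try something similar, steps 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration and not the code used for this post: the `(month, text, tags)` tuple layout and the function names `count_othering`, `monthly_average` and `top_tags` are assumptions made for the example, and whole-word, case-insensitive matching is just one reasonable way of counting.

```python
import re
from collections import Counter, defaultdict

# The three 'othering' words named in this post. Whole-word matching means
# "themselves" and "theirs" are not counted, while "they're" counts as "they".
OTHERING_RE = re.compile(r"\b(?:they|their|them)\b", re.IGNORECASE)


def count_othering(text):
    """Return the number of 'othering' words in one post's text."""
    return len(OTHERING_RE.findall(text))


def monthly_average(posts):
    """Map each month to its average number of 'othering' words per post.

    `posts` is a list of (month, text, tags) tuples, where month is a
    string such as "2013-02" (a hypothetical layout for this sketch).
    """
    totals = defaultdict(int)   # month -> total 'othering' words
    n_posts = defaultdict(int)  # month -> number of posts
    for month, text, _tags in posts:
        totals[month] += count_othering(text)
        n_posts[month] += 1
    return {month: totals[month] / n_posts[month] for month in totals}


def top_tags(posts, month, n=5):
    """Return the n most-used tags in the given month as (tag, count) pairs."""
    tag_counts = Counter(
        tag for m, _text, tags in posts if m == month for tag in tags
    )
    return tag_counts.most_common(n)
```

Calling `monthly_average(posts)` gives the series to graph in step 3, and `top_tags(posts, "2013-02")` gives the tag counts that would feed a tag cloud for a month of interest.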

The results

I’ve created a short slideshow to show the results of the above. Watch it below, but first read these notes:

  • At 00:13, I’ve graphed the number of posts per month over the 22-month period, to serve as context. I’ve also indicated University holidays (December-January; June-July), where the posts per month understandably dip quite a bit, so that we can ignore those in the results.
  • At 00:24, you’ll see the average number of ‘othering’ words per post graphed over time. I’ve indicated the four months of interest. Note that we can ignore July 2014 (“2014-07”), because the number of posts in that (holiday) month was too low to be meaningful.
  • From 00:39 onwards, I show the tag clouds for the four months of interest.

What’s interesting

  • February 2013 had an average of 5.625 ‘othering’ words per post. The biggest discussion topics (according to the number of tags) were “transformation” (discussing transformation at the university) and “src” (discussing the role of the Student Representative Council).
  • October 2013 had an average of 6.625 ‘othering’ words per post. The biggest discussion topics were “#matiesdiversity” (discussing diversity at Stellenbosch, also on the basis of a blogging competition that asked “What does it mean to be a born free?”) and “#i-dont-have-sex” (discussing views on sexuality, on the basis of a related blogging competition).
  • March 2014 had an average of just over 6 ‘othering’ words per post, with big topics being “Human Rights”, “Critical thinking”, “apartheid”, “transformation”, “opinions” and “born free”.
  • October 2014 had nearly 9 ‘othering’ words per post, with the four big topics being “language”, “diversity”, “DOOKOOM” (referring to the artist behind this controversial video) and “Culture”, with “Sex” and “blackface” as further interesting points of discussion.

The catch, and what can be learnt

At this point, before we get too excited about the insights above, I need to add a few disclaimers.

  1. The words I’ve chosen to represent ‘othering’ are naturally very limited; one could expand the choice of words (and perhaps even the types of expressions) that may be classified as ‘othering’ language.
  2. “They”, “their” and “them” aren’t always used to refer to out-groups; they can also be used to refer to abstract concepts and other objects (for example, in this sentence: “human rights form the foundation of society; they preserve constructive human relations and without them we would be lost.”).
  3. Due to time constraints, I haven’t included Afrikaans translations for the above words in my searches. Since a sizeable number of Bonfiire Stellenbosch’s blog posts are in Afrikaans, it would be interesting to see how the picture changes if we include Afrikaans ‘othering’ words.

Nevertheless, if we take the relationship between months with high numbers of ‘othering’ words and the topics discussed in that month (as depicted through the tags) at face value, it would seem that the topics mentioned above necessitate, or at least promote, the use of ‘othering’ language. Anecdotally, this seems to make perfect sense for polarising issues where identity is the focus (e.g. transformation, diversity, views of sexuality and language) and where people will need to write in terms of ‘us’ and ‘them’. It could be very interesting to supplement these quantitative indicators with a qualitative examination of blog post content in the months of interest.

Holistically speaking, I am surprised at how such a rudimentary indicator as the average number of ‘othering’ words per post, when viewed alongside the actual topics of discussion, can yield relatively interesting results. If anything, I think this illustrates, albeit at a very superficial level, how text mining can serve as a point of departure for further qualitative analyses.
