Stephan Gouws (PhD-II).

Deep Unsupervised Feature Learning for Natural Language Processing

The map is not the territory

In a previous life I had dreams of becoming a rock star. When that idea tanked, I set my sights on becoming a stage hypnotist, with the plan of working as an entertainer on cruise ships, where I could travel the world and hypnotise people into paying for it. Instead, I fell in love with truly understanding the beautiful and intricate interactions between people, the ebb and flow that is human communication. That quest led me to discover the work of one of the most elegant communicators of all time, Milton H. Erickson. It was especially excruciating to realise, in retrospect, that during the years I sold books door-to-door for the Southwestern Co., I had unknowingly lived two streets from his home in Phoenix, Arizona.

Realising that my aspirations of changing the world through my questionable talent for song and dance were quickly drawing to a close, I tried to marry my background in Electronic Engineering with my passion for semantics and brains, a process that resulted in my Master's thesis on “Measuring Conceptual Similarity by Spreading Activation over Wikipedia’s Hyperlink Structure”.

Recently I have been spending way too much time trying to understand computational approaches for extracting some of the inherent structure that exists in all data, especially in interactions between humans and (other) machines.

I develop techniques for analysing (especially large) collections of text to understand the main entities being discussed, how they relate to one another, and the overall structure of those relationships. To do this, I am working on unsupervised and lightly supervised probabilistic methods for extracting the content structure of large volumes of text, such as discussions found on the Web and on social media sites like Twitter. These methods should cope with the various types of noise and redundancy found in these media and give users a good idea of what is being said, in what way, and by whom.

I find joy in meeting people whose capabilities vastly overshadow mine, love drinking red wine, enjoy jogging in the wonderful outdoors of Stellenbosch, and I have finally made peace with the fact that I am congenitally incapable of remembering to return my library books on time.


Unsupervised Mining of Lexical Variants from Noisy Text, Stephan Gouws, Dirk Hovy and Donald Metzler, Proceedings of the Unsupervised Methods in NLP Workshop at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), Edinburgh, Scotland

Contextual Bearing on Linguistic Variation in Social Media, Stephan Gouws, Donald Metzler, CongXing Cai and Eduard Hovy, Proceedings of the Workshop on “Language in Social Media” at the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), Portland, Oregon, USA

Measuring Conceptual Similarity by Spreading Activation over Wikipedia’s Hyperlink Structure, Stephan Gouws, GJ van Rooyen and Herman A Engelbrecht, Proceedings of the Workshop on “Collaboratively Constructed Semantic Resources” at the International Conference on Computational Linguistics (COLING 2010), 23–28 Aug 2010, Beijing, China

Contact Details:

Twitter: @sgouws

LinkedIn: stephangouws

E-mail: stephan “at” ml “dot” sun . ac . za

Join Stephan at the MIH Media Lab 
