Dirko Coetsee (MScEng-I).

Normalisation of noisy web text by using Machine Learning techniques

Internet users generate massive amounts of textual content on a daily basis. While one would like to extract useful information from this text, its informal and unstructured language presents a challenge.

“Noise” is caused by the ubiquitous use of abbreviations and acronyms, homophones, creative punctuation and emoticons, as well as ordinary typing errors, spelling mistakes and slang.

These factors suggest that text should first be “normalised” before it can be mined effectively. My research will apply machine learning techniques so that a computer can perform this normalisation step automatically.

Although I have a background in electrical and electronic engineering, I am interested in NLP, machine learning, and AI.

Read Dirko’s blog posts
Join Dirko in 2012 at the MIH Media Lab 

My contact details: dirko@ml.sun.ac.za