Michael P. Oakes

Personal Details

Research Interests

Computational Linguistics, Corpus Linguistics, Information Retrieval

Ph.D. Supervision

As Director of Studies, I have successfully supervised the following seven Ph.D. students: Fadi Yamout (Query Reformulation in Search Engines, 2008), Chufeng Chen (Time and Location-based Clustering of Personal Photographs, 2007), George Ke (Email Classification, 2008), Vic Lin (Boosting for Image Classification, 2010), Ahmad AbuSukhon (Query Partitioning for Large-Scale IR, 2009), Naveed Anwar (Data Mining of Audiology Records, 2012) and Nandita Tripathi (Automatic Classification of Newspaper Articles, 2012). I am currently supervising Victor Thompson (Detection of Near-Duplicate Text Records). I have also co-supervised Mustafa Abusalah (Multilingual Ontologies for the Travel Domain, 2007) and Chris Stokoe (Word Sense Disambiguation, 2003) to completion.

Previous Experience

From 2001 to 2013 I was a Senior Lecturer in Computing at the University of Sunderland, England.

From 2007 to 2010 I was the Principal Investigator at Sunderland for the EU-funded VITALAS project, mainly responsible for the text processing aspects of a multimedia search engine.

From April 2010 to April 2012 I was employed part-time as a Senior Researcher in the Computational Linguistics Group at Uni Research, a company mainly owned by the University of Bergen.


Oakes, M.P. (1998). Statistics for Corpus Linguistics, in the series "Edinburgh Textbooks in Empirical Linguistics", ed. T. McEnery & A. Wilson, Edinburgh University Press, 287 pp.

With Meng Ji at the University of Tokyo, I have edited a collection on Quantitative Methods in Corpus-Based Translation Studies, published by John Benjamins in April 2012 in the series "Studies in Corpus Linguistics" 51.

I co-edited an e-book called The Many Facets of Corpus Linguistics in Bergen, with Lidun Hareide and Christer Johansson at the University of Bergen. The book is in honour of Knut Hofland, and is downloadable free of charge. BeLLs (Bergen Language and Linguistics) Series, 2013.

I have just completed a monograph called Literary Detective Work on the Computer, to be published in June 2014 by John Benjamins in the series "Natural Language Processing".

Recent Publications

Oakes, M.P. and Pichler, A (2013). Computational Stylometry of Wittgenstein's "Diktat fuer Schlick". In L. Hareide et al. (eds.), "The Many Facets of Corpus Linguistics in Bergen", BeLLs Vol 3 no. 1, pp 221-240. PDF

Panchev, C., Anwar, M N and Oakes, M.P. (2013). "Hearing Aid Classification Based on Audiology Data", ICANN (International Conference on Artificial Neural Networks), pp 375-380.

Oakes, M P, Anwar, M N and Panchev, C. (2013). "Data Mining for Gender Differences in Tinnitus". Proceedings of the World Congress on Engineering (WCE), International Conference on Data Mining and Knowledge Engineering (ICDMKE), pp 1504-1509. PDF

Oakes, M.P. (2012). Describing a Translational Corpus. In M.P. Oakes and Meng Ji (eds.), "Quantitative Methods in Corpus-Based Translation Studies", John Benjamins, Series "Studies in Corpus Linguistics" 51.

Ji, M. and Oakes, M.P. (2012). A Corpus Study of Early English Translations of Cao Xueqin's "Hongloumeng". In M.P. Oakes and Meng Ji (eds.), "Quantitative Methods in Corpus-Based Translation Studies", John Benjamins, Series "Studies in Corpus Linguistics" 51.

Erwin, H. and Oakes, M. (2012). "Correspondence Analysis of the New Testament", Language Resources and Evaluation of Religious Texts (LRE-Rel), Istanbul, 22nd May, 2012. PDF

Tripathi, N., Oakes, M. and Wermter, S. (2011). Semantic Subspace Learning for Text Classification. International Journal of Hybrid Intelligent Systems (IJHIS) 8(2): 99-114. PDF

Oakes, M.P. and Hall, L. (2011). Search Engine Metrics to Discover Terms Characteristic of a Database of Images with Captions. Terminologie et Intelligence Artificielle, Paris, 7-8th November, 2011. PDF

Anwar, M. N. and Oakes, M. P. (2011). Data Mining of Audiology Patient Records: Factors Influencing the Choice of Hearing Aid Type. Presented at the ACM 5th International Workshop on Data and Text Mining in Biomedical Informatics, Glasgow, October 24th, 2011. PDF

Tripathi, N., Oakes, M. and Wermter, S. (2011). Hybrid Parallel Classifiers for Semantic Subspace Learning. International Conference on Artificial Neural Networks (ICANN), Espoo, Finland, June 14th – 17th, 2011. PDF

Anwar, M N, Oakes, M P and McGarry, K. (2011). Chi-squared, Yule's Q and Likelihood Ratios in Tabular Audiology Data. Springer LNEE Series. Vol. 90, pp. 365-376. PDF

Xu, Y. and Oakes, M.P. (2010). "An initial study on text summarisation in film stories", Procesamiento del Lenguaje Natural (SEPLN), 44. PDF

Anwar, M.N., Oakes, M.P. and McGarry, K. (2010). "Chi-squared and associations in tabular audiology data". Lecture Notes in Engineering and Computer Science: Proceedings of the World Congress on Engineering 2010 (WCE 2010), 30 June – 2 July, London, UK, pp. 346-351. PDF

Anwar, M.N., Oakes, M.P., Wermter, S. and Hinrich, S. (2010). "Clustering audiology data", The Annual Machine Learning Conference of Belgium and the Netherlands, (Benelearn 2010), May 27th – 28th, Leuven, Belgium. PDF

Lin, W-C., Oakes, M.P. and Tait, J. I. (2010). "Improving image annotation via representative feature vector selection". Neurocomputing, 73(10-12), pp. 1774-1782. PDF

Lin, W-C, Oakes, M.P. & Tait, J. (2009). "Using information gain to select representative image features". Cognitive Processing 10(3):233-242. PDF

Oakes, M.P. (2009) "Corpus linguistics and language variation", In "Contemporary Approaches to Corpus Linguistics", edited by Paul Baker, Continuum, pp. 283-328. PDF

Oakes, M.P. (2009) "Javanese" in "The World's Major Languages", edited by Bernard Comrie, Routledge, pp. 819-832. PDF

Oakes, M.P. (2009) "Preprocessing Multilingual Corpora", In "Corpus Linguistics: An International Handbook", edited by A Lüdeling and M. Kytö, Mouton de Gruyter, pp. 685-705. PDF

Oakes, M.P. (2009) "Corpus Linguistics and Stylometry". In "Corpus Linguistics: An International Handbook", edited by A Lüdeling and M. Kytö, Mouton de Gruyter, pp. 1070-1090. PDF

