Michael P. Oakes

Personal Details

Name: Michael Philip Oakes (Dr).

Affiliation: University of Wolverhampton

Position: Reader in Computational Linguistics

Address: Research Institute of Information and Language Processing, University of Wolverhampton, Stafford Street, Wolverhampton WV1 1NA, United Kingdom.

Telephone: 01902 322967

Fax: 01902 323543

Email: Michael.Oakes at wlv dot ac dot uk

Research Interests

Computational Linguistics, Corpus Linguistics, Information Retrieval

Ph.D. Supervision

I have successfully supervised as Director of Studies the following seven Ph.D. students: Fadi Yamout (Query Reformulation in Search Engines, 2008), Chufeng Chen (Time and Location-based Clustering of Personal Photographs, 2007), George Ke (Email Classification, 2008), Vic Lin (Boosting for Image Classification, 2010), Ahmad AbuSukhon (Query Partitioning for Large-Scale IR, 2009), Naveed Anwar (Data Mining of Audiology Records, 2012) and Nandita Tripathi (Automatic Classification of Newspaper Articles, 2012). I am currently supervising Victor Thompson (Detection of Near-Duplicate Text Records). I have also co-supervised Mustafa Abusalah (Multilingual Ontologies for the Travel Domain, 2007) and Chris Stokoe (Word Sense Disambiguation, 2003) to completion. I am currently supervising three Ph.D. students: Mireille Makary (Automatic Generation of Relevance Judgements), Najah Albaqawi (Gulf Pidgin Arabic) and Ahmed Omer (Disputed Authorship Studies). Please contact me if you are interested in studying with me for a Ph.D.

Previous Experience

From 2001 to 2013 I was a Senior Lecturer in Computing at the University of Sunderland, England.

From 2007 to 2010 I was the Principal Investigator at Sunderland for the EU-funded VITALAS project, mainly responsible for the text processing aspects of a multimedia search engine.

From April 2010 to April 2012 I was employed part-time as a Senior Researcher in the Computational Linguistics Group at Uni Research, a company mainly owned by the University of Bergen.


Books

Oakes, M.P. (1998). Statistics for Corpus Linguistics. In the series "Edinburgh Textbooks in Empirical Linguistics", ed. T. McEnery & A. Wilson, Edinburgh University Press, 287 pp.

With Meng Ji, then at the University of Tokyo, I edited a collection, Quantitative Methods in Corpus-Based Translation Studies, published by John Benjamins in April 2012 as volume 51 of the series "Studies in Corpus Linguistics".

I co-edited an e-book called The Many Facets of Corpus Linguistics in Bergen, with Lidun Hareide and Christer Johansson at the University of Bergen. The book is in honour of Knut Hofland, and is downloadable free of charge. BeLLs (Bergen Language and Linguistics) Series, 2013.

I have completed a monograph called Literary Detective Work on the Computer, published in June 2014 by John Benjamins in the series "Natural Language Processing".

Recent Publications

Oakes, M.P. (2017). Statistical Analysis of the Texts in Mahadevan's Concordance of the Indus Valley Script. Journal of Quantitative Linguistics. PDF

Oakes, M.P. (2017). Computer Stylometry of C.S. Lewis's "The Dark Tower" and Related Texts. Digital Scholarship in the Humanities. PDF

Franklin, Emma and Oakes, M.P. (2016). Ngrams and Engrams: The Use of Structural and Conceptual Features to Discriminate Between English Translations of Religious Texts. Corpora 11(3): 299-341. PDF

Oakes, M.P. (2016). Computers and the Study of Lost Languages. Proceedings of the 13th International Conference on Statistical Analysis of Textual Data (JADT), 7-10 June 2016, Nice: 49-64. PDF

Makary, Mireille, Oakes, M.P. and Yamout, Fadi (2016). Using Key Phrases as New Queries in Building Relevance Judgements Automatically. Lernen, Wissen, Daten, Analysen Conference (LWDA 2016), Hasso-Plattner Institute, Potsdam, Germany, Sept. 12-14. PDF

Makary, Mireille, Oakes, M.P. and Yamout, Fadi (2016). Towards Automatic Generation of Relevance Judgements for a Test Collection. 11th International Conference on Digital Information Management (ICDIM), Porto, Portugal, Sept. 19-21. PDF

Vilares, Jesus, Vilares, Manuel, Alonso, Miguel A. and Oakes, M.P. (2016). On the Feasibility of Character N-grams Pseudo-Translation for Cross-Language Information Retrieval Tasks. Computer Speech and Language 36 (March): 136-164. PDF

Bobicev, Victoria, Sokolova, Marina and Oakes, M.P. (2015). What Goes Around Comes Around: Learning Sentiments in Online Medical Forums. Cognitive Computation 7(5):609-621. PDF

Tripathi, Nandita, Oakes, M.P. and Wermter, Stefan (2015). A Scalable Meta-Classifier for Combining Search and Classification Techniques for Multi-Level Text Categorization. International Journal of Computational Intelligence and Applications 14(4). PDF

Shahsepandy, Homayun, Oakes, M.P. and Van Heesvijck, A. (2014). The Isle of Wight Suicide Study: A Case of Suicide in a Limited Geographic Area. Irish Journal of Psychological Medicine, 31(2):133-141. PDF

Oakes, M.P. and Pichler, A (2013). Computational Stylometry of Wittgenstein's "Diktat fuer Schlick". In L. Hareide et al. (eds.), "The Many Facets of Corpus Linguistics in Bergen", BeLLs Vol 3 no. 1, pp 221-240. PDF

Panchev, C., Anwar, M.N. and Oakes, M.P. (2013). "Hearing Aid Classification Based on Audiology Data", ICANN (International Conference on Artificial Neural Networks), pp 375-380.

Oakes, M P, Anwar, M N and Panchev, C. (2013). "Data Mining for Gender Differences in Tinnitus". Proceedings of the World Congress on Engineering (WCE), International Conference on Data Mining and Knowledge Engineering (ICDMKE), pp 1504-1509. PDF

Oakes, M.P. (2012). Describing a Translational Corpus. In M.P. Oakes and Meng Ji (eds.), "Quantitative Methods in Corpus-Based Translation Studies", John Benjamins, Series "Studies in Corpus Linguistics" 51.

Ji, M. and Oakes, M.P. (2012). A Corpus Study of Early English Translations of Cao Xueqin's "Hongloumeng". In M.P. Oakes and Meng Ji (eds.), "Quantitative Methods in Corpus-Based Translation Studies", John Benjamins, Series "Studies in Corpus Linguistics" 51.

Erwin, H. and Oakes, M. (2012). "Correspondence Analysis of the New Testament", Language Resources and Evaluation of Religious Texts (LRE-Rel), Istanbul, 22nd May, 2012. PDF

Tripathi, N., Oakes, M. and Wermter, S. (2011). Semantic Subspace Learning for Text Classification. International Journal of Hybrid Intelligent Systems (IJHIS) 8(2): 99-114. PDF

Oakes, M.P. and Hall, L. (2011). Search Engine Metrics to Discover Terms Characteristic of a Database of Images with Captions. Terminologie et Intelligence Artificielle, Paris, 7-8th November, 2011. PDF

Anwar, M. N. and Oakes, M. P. (2011). Data Mining of Audiology Patient Records: Factors Influencing the Choice of Hearing Aid Type. Presented at the ACM 5th International Workshop on Data and Text Mining in Biomedical Informatics, Glasgow, October 24th, 2011. PDF

Tripathi N, Oakes M and Wermter S (2011). Hybrid Parallel Classifiers for Semantic Subspace Learning. International Conference on Artificial Neural Networks (ICANN), Espoo, Finland, June 14th – 17th, 2011. PDF

Anwar, M N, Oakes, M P and McGarry, K. (2011). Chi-squared, Yule's Q and Likelihood Ratios in Tabular Audiology Data. Springer LNEE Series. Vol. 90, pp. 365-376. PDF

Xu, Y. and Oakes, M.P. (2010). "An initial study on text summarisation in film stories", Procesamiento del Lenguaje Natural (SEPLN), 44. PDF

Anwar, M.N., Oakes, M.P. and McGarry, K. (2010). "Chi-squared and associations in tabular audiology data". Lecture Notes in Engineering and Computer Science: Proceedings of the World Congress on Engineering 2010 (WCE 2010), 30 June – 2 July, London, UK, pp. 346-351. PDF

Anwar, M.N., Oakes, M.P., Wermter, S. and Hinrich, S. (2010). "Clustering audiology data", The Annual Machine Learning Conference of Belgium and the Netherlands, (Benelearn 2010), May 27th – 28th, Leuven, Belgium. PDF

Lin, W-C., Oakes, M.P. and Tait, J.I. (2010). "Improving image annotation via representative feature vector selection". Neurocomputing, 73(10-12), pp. 1774-1782. PDF

Lin, W-C, Oakes, M.P. & Tait, J. (2009). "Using information gain to select representative image features". Cognitive Processing 10(3):233-242. PDF

Oakes, M.P. (2009) "Corpus linguistics and language variation". In "Contemporary Approaches to Corpus Linguistics", edited by Paul Baker, Continuum, pp. 283-328. PDF

Oakes, M.P. (2009) "Javanese" in "The World's Major Languages", edited by Bernard Comrie, Routledge, pp. 819-832. PDF

Oakes, M.P. (2009) "Preprocessing Multilingual Corpora". In "Corpus Linguistics: An International Handbook", edited by A. Lüdeling and M. Kytö, Mouton de Gruyter, pp. 685-705. PDF

Oakes, M.P. (2009) "Corpus Linguistics and Stylometry". In "Corpus Linguistics: An International Handbook", edited by A. Lüdeling and M. Kytö, Mouton de Gruyter, pp. 1070-1090. PDF

Earlier publications may be found here

Dublin slides here

Authorship slides here

Plagiarism slides here

Shakespeare slides here

IRSG AGM Agenda 2014 here

IRSG AGM Minutes 2013 here