Publications

Here is a complete list of my publications. You can also visit my profiles at Google Scholar, Semantic Scholar, DBLP, and ResearchGate.

2017

Marcos Zampieri, Shervin Malmasi, Gustavo Paetzold, Lucia Specia (2017) Complex Word Identification: Challenges in Data Annotation and System Performance. Proceedings of the 4th Workshop on NLP Techniques for Educational Applications (NLPTEA). Taipei, Taiwan. [pdf] [url]

Ekaterina Lapshinova-Koltunski, Marcos Zampieri (2017) Linguistic Features of Genre and Method Variation in Translation: A Computational Perspective. Grammar of Genres and Styles. Mouton de Gruyter. [pdf]

Marcos Zampieri, Alina Maria Ciobanu, Liviu P. Dinu (2017) Native Language Identification on Text and Speech. Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications (BEA). Copenhagen, Denmark. [pdf] [url]

Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu (2017) Including Dialects and Language Varieties in Author Profiling. Working Notes of CLEF - Conference and Labs of the Evaluation Forum. Dublin, Ireland. [pdf] [url]

Octavia-Maria Sulea, Marcos Zampieri, Mihaela Vela, Josef van Genabith (2017) Predicting the Law Area and Decisions of French Supreme Court Cases. Proceedings of Recent Advances in Natural Language Processing (RANLP). pp. 716-722. Varna, Bulgaria. [pdf] [url]

Preslav Nakov, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali (2017) Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Valencia, Spain. Association for Computational Linguistics. [pdf] [url]

Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli (2017) Findings of the VarDial Evaluation Campaign 2017. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 1-15. Valencia, Spain. [pdf] [url]

Shervin Malmasi, Marcos Zampieri (2017) German Dialect Identification in Interview Transcriptions. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 164-169. Valencia, Spain. [pdf] [url]

Shervin Malmasi, Marcos Zampieri (2017) Arabic Dialect Identification Using iVectors and ASR Transcripts. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 178-183. Valencia, Spain. [pdf] [url]

2016

Rohit Gupta, Constantin Orasan, Marcos Zampieri, Mihaela Vela, Josef van Genabith, Ruslan Mitkov (2016) Improving Translation Memory Matching and Retrieval Using Paraphrases. Machine Translation. Springer. Volume 30, Issue 1–2, pp. 19–40. [url]

Santanu Pal, Sudip Kumar Naskar, Marcos Zampieri, Tapas Nayak, Josef van Genabith (2016) CATaLog Online: A Web-based CAT Tool for Distributed Translation with Data Capture for APE and Translation Process Research. Proceedings of the 26th International Conference on Computational Linguistics (COLING). pp. 98-102. Osaka, Japan. [pdf] [url]

Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi (2016) Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). Osaka, Japan. Association for Computational Linguistics. [pdf] [url]

Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann (2016) Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 1-14. Osaka, Japan. [pdf] [url]

Shervin Malmasi, Marcos Zampieri (2016) Arabic Dialect Identification in Speech Transcripts. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial). pp. 106-113. Osaka, Japan. [pdf] [url]

Marcos Zampieri, Shervin Malmasi, Octavia-Maria Sulea, Liviu P. Dinu (2016) A Computational Approach to the Study of Portuguese Newspapers Published in Macau. Proceedings of the Workshop on Natural Language Processing Meets Journalism (NLPMJ). pp. 47-51. New York, United States. [pdf] [url]

Eckhard Bick, Marcos Zampieri (2016) Grammatical Annotation of Historical Portuguese: Generating a Corpus-based Diachronic Dictionary. Proceedings of the 19th International Conference on Text, Speech and Dialogue (TSD), Lecture Notes in Artificial Intelligence - LNAI 9924. Springer. pp. 3-11. [pdf] [url]

Ondrej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri (2016) Findings of the 2016 Conference on Machine Translation (WMT16). Proceedings of the First Conference on Machine Translation (WMT). pp. 131-198. Berlin, Germany. [pdf] [url]

Santanu Pal, Marcos Zampieri, Josef van Genabith (2016) USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing. Proceedings of the First Conference on Machine Translation (WMT). pp. 759-763. Berlin, Germany. [pdf] [url]

Marcos Zampieri, Shervin Malmasi, Mark Dras (2016) Modeling Language Change in Historical Corpora: The Case of Portuguese. Proceedings of Language Resources and Evaluation (LREC). pp. 4098-4104. Portoroz, Slovenia. [pdf] [url]

Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri (2016) Discriminating Similar Languages: Evaluations and Explorations. Proceedings of Language Resources and Evaluation (LREC). pp. 1800-1807. Portoroz, Slovenia. [pdf] [url]

Santanu Pal, Marcos Zampieri, Mihaela Vela, Tapas Nayak, Sudip Kumar Naskar, Josef van Genabith (2016) CATaLog Online: Porting a Post-editing Tool to the Web. Proceedings of Language Resources and Evaluation (LREC). pp. 599-604. Portoroz, Slovenia. [pdf] [url]

Shervin Malmasi, Marcos Zampieri, Mark Dras (2016) Predicting Post Severity in Mental Health Forums. Proceedings of the Workshop on Computational Linguistics and Clinical Psychology (CLPsych). pp. 133-137. San Diego, United States. [pdf] [url]

Marcos Zampieri, Liling Tan, Josef van Genabith (2016) MacSaar at SemEval-2016 Task 11: Zipfian and Character Features for Complex Word Identification. Proceedings of the 10th Workshop on Semantic Evaluation (SemEval). pp. 1001-1005. San Diego, United States. [pdf] [url]

Shervin Malmasi, Mark Dras, Marcos Zampieri (2016) LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles. Proceedings of the 10th Workshop on Semantic Evaluation (SemEval). pp. 996-1000. San Diego, United States. [pdf] [url]

Shervin Malmasi, Marcos Zampieri (2016) MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier. Proceedings of the 10th Workshop on Semantic Evaluation (SemEval). pp. 991-995. San Diego, United States. [pdf] [url]

Marcos Zampieri (2016) Automatic Language Identification. Working with Text: Tools, Techniques and Approaches for Text Mining. pp. 189-205. Chandos Publishing, Elsevier. [url]

2015

Preslav Nakov, Marcos Zampieri, Petya Osenova, Liling Tan, Cristina Vertan, Nikola Ljubešić, Jörg Tiedemann (2015) Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial). Hissar, Bulgaria. Association for Computational Linguistics. [pdf] [url]

Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Preslav Nakov (2015) Overview of the DSL Shared Task 2015. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial). pp. 1-9. Hissar, Bulgaria. [pdf] [url]

Marcos Zampieri, Binyam Gebrekidan Gebre, Hernani Costa, Josef van Genabith (2015) Comparing Approaches to the Identification of Similar Languages. Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial). pp. 66-72. Hissar, Bulgaria. [pdf] [url]

Tapas Nayek, Sudip Kumar Naskar, Santanu Pal, Marcos Zampieri, Mihaela Vela, Josef van Genabith (2015) CATaLog: New Approaches to TM and Post Editing Interfaces. Proceedings of the Workshop on Natural Language Processing for Translation Memories (NLP4TM). pp. 36-42. Hissar, Bulgaria. [pdf] [url]

Marcos Zampieri, Ekaterina Lapshinova-Koltunski (2015) Investigating Genre and Method Variation in Translation Using Text Classification. Proceedings of the 18th International Conference on Text, Speech and Dialogue (TSD), Lecture Notes in Computer Science - LNCS 9302. Springer. pp. 41-50. [pdf] [url]

Carolina Scarton, Marcos Zampieri, Mihaela Vela, Josef van Genabith, Lucia Specia (2015) Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation. Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT). pp. 121-128. Antalya, Turkey. [pdf] [url]

Rohit Gupta, Constantin Orasan, Marcos Zampieri, Mihaela Vela, Josef van Genabith (2015) Can Translation Memories Afford Not to Use Paraphrasing? Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT). pp. 35-42. Antalya, Turkey. [pdf] [url]

Marcos Zampieri, Alina Maria Ciobanu, Vlad Niculae, Liviu P. Dinu (2015) AMBRA: A Ranking Approach to Temporal Text Classification. Proceedings of the 9th Workshop on Semantic Evaluation (SemEval). pp. 851-855. Denver, United States. [pdf] [url]

2014

Marcos Zampieri, Liling Tan (2014) Grammatical Error Detection with Limited Training Data: The Case of Chinese. Proceedings of the 22nd International Conference on Computers in Education (ICCE). pp. 69-74. Nara, Japan. [pdf] [url]

Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann (2014) Proceedings of the 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin, Ireland. Association for Computational Linguistics. [pdf] [url]

Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann (2014) A Report on the DSL Shared Task 2014. Proceedings of the 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). pp. 58-67. Dublin, Ireland. [pdf] [url]

Marcos Zampieri, Renato Cordeiro de Amorim (2014) Between Sound and Spelling: Combining Phonetics and Clustering Algorithms to Improve Target Word Recovery. Proceedings of the 9th International Conference on Natural Language Processing (PolTAL). Lecture Notes in Computer Science - LNCS 8686. Springer. pp. 438-449. [pdf] [url]

Marcos Zampieri, Binyam Gebrekidan Gebre (2014) VarClass: An Open Source Language Identification Tool for Language Varieties. Proceedings of Language Resources and Evaluation (LREC). pp. 3305-3308. Reykjavik, Iceland. [pdf] [url]

Liling Tan, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann (2014) Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection. Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC). pp. 6-10. Reykjavik, Iceland. [pdf] [url]

Vlad Niculae, Marcos Zampieri, Liviu P. Dinu, Alina Maria Ciobanu (2014) Temporal Text Ranking and Automatic Dating of Texts. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL). pp. 17-21. Gothenburg, Sweden. [pdf] [url]

Marcos Zampieri, Mihaela Vela (2014) Quantifying the Influence of MT Output in the Translators' Performance: A Case Study in Technical Translation. Proceedings of the EACL Workshop on Humans and Computer-assisted Translation (HaCat). pp. 93-98. Gothenburg, Sweden. [pdf] [url]

2013

Marcos Zampieri (2013) Using Bag-of-words to Distinguish Similar Languages: How Efficient are They? Proceedings of the 14th IEEE International Symposium on Computational Intelligence and Informatics (CINTI). pp. 37-41. Budapest, Hungary. [pdf] [url]

Renato Cordeiro de Amorim, Marcos Zampieri (2013) Effective Spell Checking Methods Using Clustering Algorithms. Proceedings of Recent Advances in Natural Language Processing (RANLP). pp. 172-178. Hissar, Bulgaria. [pdf] [url]

Sanja Štajner, Marcos Zampieri (2013) Stylistic Changes for Temporal Text Classification. Proceedings of the 16th International Conference on Text, Speech and Dialogue (TSD), Lecture Notes in Artificial Intelligence - LNAI 8082, Springer. pp. 519-526. [pdf] [url]

Marcos Zampieri, Sascha Diwersy (2013) Non-Standard Data Sources in Corpus-based Research. ZSM-Studien Series - Vol. 5. Shaker. [url]

Marcos Zampieri, Jürgen Hermes, Stephan Schwiebert (2013) Identification of Patterns and Document Ranking of Internet Texts: A Frequency-based Approach. Non-Standard Data Sources in Corpus-based Research. ZSM-Studien Series - Vol. 5. Shaker. pp. 25-39. [pdf] [url]

Marcos Zampieri, Martin Becker (2013) Colonia: Corpus of Historical Portuguese. Non-Standard Data Sources in Corpus-based Research. ZSM-Studien Series - Vol. 5. Shaker. pp. 77-84. [pdf] [url]

Marcos Zampieri, Binyam Gebrekidan Gebre, Sascha Diwersy (2013) N-Gram Language Models and POS Distribution for the Identification of Spanish Varieties. Proceedings of TALN. pp. 580-587. Sables d'Olonne, France. [pdf] [url]

Binyam Gebrekidan Gebre, Marcos Zampieri, Peter Wittenburg, Tom Heskes (2013) Improving Native Language Identification with TF-IDF Weighting. Proceedings of the 8th NAACL Workshop on Innovative Use of NLP for Building Educational Applications (BEA). pp. 216-223. Atlanta, United States. [pdf] [url]

2012

Marcos Zampieri (2012) Evaluating Knowledge-poor and Knowledge-rich Features in Automatic Classification: A Case Study in WSD. Proceedings of the 13th IEEE International Symposium on Computational Intelligence and Informatics (CINTI). pp. 359-363. Budapest, Hungary. [pdf] [url]

Marcos Zampieri, Binyam Gebrekidan Gebre, Sascha Diwersy (2012) Classifying Pluricentric Languages: Extending the Monolingual Model. Proceedings of the Fourth Swedish Language Technology Conference (SLTC). pp. 79-80. Lund, Sweden. [pdf] [url]

Marcos Zampieri, Binyam Gebrekidan Gebre (2012) Automatic Identification of Language Varieties: The Case of Portuguese. Proceedings of KONVENS. pp. 233-237. Vienna, Austria. [pdf] [url]

2010

Marcos Zampieri (2010) A Supervised Machine Learning Method for Word Sense Disambiguation of Portuguese Nouns. Bulletin de Linguistique Appliquée et Générale - BULAG 34. pp. 187-203. [pdf] [url]

Jorge Baptista, Neusa Costa, Joaquim Guerra, Marcos Zampieri, Maria Cabral, Nuno Mamede (2010) P-AWL: Academic Word List for Portuguese. Proceedings of PROPOR, Lecture Notes in Artificial Intelligence - LNAI 6001, Springer. pp. 120-123. [pdf] [url]

Last Updated: October 2017