CheNER: Chemical Named Entity Recognizer

Anabel Usié1,2, Rui Alves2, Francesc Solsona1, Miguel Vázquez3* and Alfonso Valencia3*

Corresponding authors: Miguel Vázquez, Alfonso Valencia

1 Department d'Informàtica i Enginyeria Industrial, Universitat de Lleida, Av. Jaume II n°69, 25001 Lleida, Spain
2 Department de Ciències Mèdiques Bàsiques & IRBLleida, Universitat de Lleida, Montserrat Roig n°2, 25008 Lleida, Spain
3 Structural Biology and Biocomputing Programme, Spanish National Cancer Research Center (CNIO), Madrid, Spain

Bioinformatics 2013, doi:10.1093/bioinformatics/btt639

The electronic version of this article is the complete one and can be found online at: http://bioinformatics.oxfordjournals.org/content/early/2013/11/30/bioinformatics.btt639#aff-1

© 2013 Usié et al.

Abstract

_____________________________________________________________________________________

Motivation

Chemical named entity recognition is used to automatically identify mentions of chemical compounds in text and is the basis for more elaborate information extraction. However, only a small number of applications are freely available to identify such mentions. Particularly challenging and useful is the identification of International Union of Pure and Applied Chemistry (IUPAC) chemical compounds, which, because of the complex morphology of IUPAC names, requires more advanced techniques than the identification of brand names.

Results

We present CheNER, a tool for automated identification of systematic IUPAC chemical mentions. We evaluated different systems using an established literature corpus to show that CheNER performs better at identifying IUPAC names specifically and makes better use of computational resources.