Extracting Multiword Expressions using Enumerations of Noun Phrases in Specialized Domains: first experiences

Abstract

We present an algorithm for recognizing enumerations of noun phrases (NPEs), with the objective of detecting and extracting multiword expressions (MWEs). The algorithm uses syntactic rules elaborated from linguistic information to recognize NPEs. This information corresponds to morphological categories (noun, adjective, feminine, masculine, etc.). The evaluation takes into account only bigrams found in two corpora from different specialized domains: medical and legal texts. The results are encouraging because, despite the low recall of MWEs, many significant terminological units from the two specialized domains were detected and extracted.
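
To make the idea concrete, the following is a minimal sketch of rule-based NPE recognition over part-of-speech-tagged tokens, followed by extraction of two-word noun phrases as MWE candidates. It is not the rule set used in this work: the tag names (`NOUN`, `ADJ`, `COMMA`, `CONJ`), the single rule "noun phrases separated by commas or a conjunction", the function names `find_enumerations` and `bigram_candidates`, and the English sample sentence are all assumptions made for illustration. Gender features are omitted.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag); the tag set here is hypothetical

NP_TAGS = {"NOUN", "ADJ"}       # tags allowed inside a noun phrase
SEPARATORS = {"COMMA", "CONJ"}  # tags that link items of an enumeration


def find_enumerations(tagged: List[Token]) -> List[List[List[str]]]:
    """Return each enumeration as a list of noun phrases (lists of words)."""
    enumerations: List[List[List[str]]] = []
    current: List[List[str]] = []  # noun phrases of the enumeration in progress
    phrase: List[Token] = []       # tokens of the noun phrase in progress

    def flush_phrase() -> None:
        # A token run counts as a noun phrase only if it contains a noun.
        if any(tag == "NOUN" for _, tag in phrase):
            current.append([word for word, _ in phrase])
        phrase.clear()

    def flush_enumeration() -> None:
        # An enumeration needs at least two noun phrases.
        if len(current) >= 2:
            enumerations.append(list(current))
        current.clear()

    for word, tag in tagged:
        if tag in NP_TAGS:
            phrase.append((word, tag))
        elif tag in SEPARATORS:
            flush_phrase()           # the enumeration may continue
        else:
            flush_phrase()
            flush_enumeration()      # any other tag ends the enumeration
    flush_phrase()
    flush_enumeration()
    return enumerations


def bigram_candidates(enumerations: List[List[List[str]]]) -> List[str]:
    """Keep only two-word noun phrases as MWE candidates."""
    return [" ".join(np) for enum in enumerations for np in enum if len(np) == 2]


# Hand-tagged sample sentence (an assumption, not taken from the corpora).
sample: List[Token] = [
    ("The", "DET"), ("patient", "NOUN"), ("reported", "VERB"),
    ("chest", "NOUN"), ("pain", "NOUN"), (",", "COMMA"),
    ("high", "ADJ"), ("fever", "NOUN"), ("and", "CONJ"),
    ("muscle", "NOUN"), ("weakness", "NOUN"), (".", "PUNCT"),
]

print(bigram_candidates(find_enumerations(sample)))
```

In this sketch the lone noun "patient" is discarded because it is not part of an enumeration, while the three coordinated noun phrases are kept and filtered down to bigrams, mirroring the bigram-only evaluation described above.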