Dr. Christian M. Meyer

UBY – A Large-Scale Unified
Lexical-Semantic Resource

Abstract. We present UBY, a large-scale lexical-semantic resource combining a wide range of information from expert-constructed and collaboratively created resources for English and German. It currently contains nine resources in two languages: English WordNet, Wiktionary, Wikipedia, FrameNet, and VerbNet; German Wikipedia, Wiktionary, and GermaNet; and the multilingual OmegaWiki.

The main contributions of our work can be summarised as follows. First, we define a standardised format for modelling the heterogeneous information coming from the various lexical-semantic resources (LSRs) and languages included in UBY. For this purpose, we employ the ISO standard Lexical Markup Framework and Data Categories selected from ISOCat. In this way, all types of information provided by the LSRs in UBY are easily accessible on a fine-grained level. Further, this standardised format facilitates the extension of UBY with new languages and resources. This differs from previous efforts to combine LSRs, which usually targeted particular applications and thus aligned only specific types of information.

Second, UBY contains nine pairwise sense alignments between resources. Through these alignments, we provide access to the complementary information for a word sense in different resources. For example, if one looks up a particular verb sense in UBY, one has simultaneous access to the sense in WordNet and to the corresponding sense in FrameNet.

Third, UBY is freely available, and we have developed an easy-to-use Java API which provides unified access to all types of information contained in UBY. This facilitates the use of UBY for a variety of NLP tasks.

Submitted: 14.11.2012 | Published: 17.01.2013
Poster presented at the conference.
