Petya Osenova
Sofia University "St.Kliment Ohridsky", Bulgarian Language, Faculty Member
- Open Access Books in Linguistics, Morphology, Corpus Linguistics, Syntax, Functional Linguistics, Slavic Linguistics, Bulgarian Language, Languages and Linguistics, Information Retrieval, Ontology (Computer Science), Semantic Web Technologies, Theoretical Linguistics, Slavic Languages, Language Typology, Valency, Linguistic Typology, Grammaticalization, Morphology and Syntax, Morphosyntax, Case and Agreement, Agreement, and Czech
The paper focuses on the modelling of multiword expressions (MWE) in Bulgarian-English parallel news corpora (SETimes; CSLI dataset and PennTreebank dataset). Observations were made on alignments in which at least one multiword expression was used per language. The multiword expressions were classified with respect to the PARSEME lexicon-based (WG1) and treebank-based (WG4) classifications. The non-MWE counterparts of MWEs are also considered. Our approach is data-driven because the data of this study was retrieved from parallel corpora and not from bilingual dictionaries. The survey shows that the predominant translation relation between Bulgarian and English is MWE-to-word, and that this relation does not exclude other translation options. To formalize our observations, a catenae-based modelling of the parallel pairs is proposed.
The paper focuses on the traditional understanding of the syntactic relations that are relevant for Bulgarian: agreement, government, prepositional linking and apposition. Although similar observations exist in the Bulgarian linguistic literature, they have not been discussed consistently. This survey proposes a structured typology of the syntactic relations. It shows more systematically that the surface syntactic relation may differ from the underlying one, and that two syntactic relations can hold between two lexical elements that form a constituent.
In this paper we outline the joint use of two nominal grammars, a named-entity grammar and a chunk grammar, in the process of building a treebank. We stress their contribution towards a unified and effective NP shallow parser. Taking into account their specific underlying principles, we discuss the points of interrelation and point out related problems.
In this paper we report on an ongoing project, LT4eL (Language Technology for eLearning), which aims to improve the effectiveness of retrieval and the accessibility of learning objects within a learning management system. We elaborate on the process of building the domain ontology and present the multilingual support offered to the application.
This paper addresses the problem of efficient resource compilation for less-processed languages. It presents a strategy for creating a morpho-syntactically tagged corpus for such languages. Because human languages are morphologically non-homogeneous, we focus mainly on inflecting ones. With certain modifications, the model can be applied to the other types as well. The strategy is described within a specific implementation environment, the CLaRK System. First, the general architecture of the software is described. Then, the usual steps towards the creation of the language resource are outlined. After that, the concrete implementation properties of the processing steps within CLaRK are discussed: text archive compilation, tokenization, frequency word list creation, morphological lexicon creation, morphological analyzer, semi-automatic disambiguation.
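Two of the steps listed above, tokenization and frequency word list creation, can be illustrated with a minimal sketch. It is plain Python and does not use or reproduce the CLaRK System, which is not shown in the abstract; the token pattern and the names `tokenize` and `frequency_list` are assumptions made only for this illustration.

```python
import re
from collections import Counter

# Assumed token pattern: runs of word characters, optionally joined by hyphens.
# In Python 3, \w matches Unicode letters, so Cyrillic text is covered.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*")


def tokenize(text):
    """Split raw text into lowercased word tokens (punctuation is dropped)."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def frequency_list(texts):
    """Count tokens over a collection of texts.

    Returns (token, count) pairs, most frequent first; ties are broken
    alphabetically so that the output is stable.
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    archive = ["Котката спи.", "Котката яде, кучето спи."]
    for token, count in frequency_list(archive):
        print(count, token)
```

A list of this kind is typically what a lexicon builder would work through from the top, since the most frequent word forms cover the largest share of running text.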
The paper outlines a hybrid architecture for a partial parser based on regular grammars over XML documents. The parser is used to support the annotation process in the BulTreeBank project, so it annotates only the 'sure' cases. To maximize the number of analyzed phrases, the parser applies a set of grammars in a dynamic fashion. Each grammar determines not only the constituent structure (plus some syntactic dependencies internal to the structure), but also a description of the local and global context of the recognized phrase. The grammars available to the parser are arranged in a network. The order in which the grammars are applied depends on the initial ordering in the network and on the descriptions associated with the grammars. Thus the traversal is not deterministic. Additionally, the application of the grammars can be interleaved with the application of other XML tools, such as remove, insert and transform operations. This architecture provides a flexible means for g...
Reliable automatic semantic annotation systems do not exist for many languages. Their creation depends in many respects on construction of gold standard corpora. In this paper we present a system for supporting the semi-automatic construction of such corpora. The ...
@Book{AEPC:2011, editor = {Kiril Simov and Petya Osenova and Jörg Tiedemann and Radovan Garabik}, title = {Proceedings of The Second Workshop on Annotation and Exploitation of Parallel Corpora}, month = {September}, year = {2011}, address = {Hissar, Bulgaria}, url ...
The paper presents the term head from various points of view. On the one hand, the perspectives on the head have been discussed in the context of different theoretical approaches. On the other hand, the content of the term has been presented with a view to the criteria for its detection and also to the complexity of the language data.
The notion of catena was introduced originally to represent the syntactic structure of multiword expressions with idiosyncratic semantics and non-constituent structure. Later on, several other phenomena (such as ellipsis, verbal complexes, etc.) were formalized as catenae. This naturally led to the suggestion that a catena can be considered a basic unit of syntax. In this paper we present a formalization of catenae and the main operations over them for modelling the combinatorial potential of units in dependency grammar.
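The notion can be made concrete with a small sketch. It assumes the usual reading of a catena as a set of words that is connected through the dependency edges of a tree, and it is not the formalization proposed in the paper. The head-map encoding, the example sentence and the name `is_catena` are assumptions made only for this illustration.

```python
def is_catena(heads, words):
    """Return True if `words` is connected via dependency edges.

    `heads` maps each word position to the position of its head; the root
    maps to 0. In a tree, a non-empty subset is connected exactly when one
    and only one of its members has its head outside the subset.
    """
    words = set(words)
    if not words:
        return False
    tops = [w for w in words if heads[w] not in words]
    return len(tops) == 1


# Hypothetical analysis of "They spilled the beans yesterday":
# 1 They -> spilled, 2 spilled = root, 3 the -> beans,
# 4 beans -> spilled, 5 yesterday -> spilled
HEADS = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2}

if __name__ == "__main__":
    print(is_catena(HEADS, {2, 3, 4}))  # spilled the beans
    print(is_catena(HEADS, {2, 5}))     # spilled ... yesterday
    print(is_catena(HEADS, {1, 5}))     # They ... yesterday
```

Under this encoding, "spilled ... yesterday" counts as a catena even though it is not a constituent, which is the property that makes the notion useful for multiword expressions with non-constituent structure.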
The paper focuses on the sense annotation of BulTreeBank. It discusses three levels of annotation: valency frames, lexical senses and DBPedia URIs. The lexical sense annotation is considered in more detail and in relation to the other two processes. Special attention is paid to the quality validation with respect to two aspects: inter-annotator agreement and cross-resource control.
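The abstract does not say which agreement measure was used. As an illustration only, the sketch below computes Cohen's kappa, a common chance-corrected measure for two annotators; the function name `cohen_kappa` and the sense labels in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    dist_a = Counter(labels_a)
    dist_b = Counter(labels_b)
    p_e = sum(dist_a[label] * dist_b[label] for label in dist_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    annotator_1 = ["sense1", "sense1", "sense2", "sense2"]
    annotator_2 = ["sense1", "sense1", "sense2", "sense1"]
    print(cohen_kappa(annotator_1, annotator_2))
```

Raw percentage agreement would be 0.75 for this toy example; kappa is lower because part of that agreement is expected by chance.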
The paper discusses the syntactic relations within Bulgarian complex words that have a verbal root as head and a nominal or adverbial root as dependant. The parts of speech considered are verbs and verbal nouns. The corpus-based survey shows that verbal nouns allow word-internal arguments more often than verbs do. The data covers both cases: when all word-external arguments are blocked, and when such arguments are allowed.
The paper presents the semantic modeling of Bulgarian parts of speech within the theory of Head-driven Phrase Structure Grammar (HPSG). The parts of speech are divided into two main groups: referents and events. The referents are indicated by nouns, while the events are expressed through verbs, prepositions, adjectives, numerals and adverbs. Since HPSG is a monostratal linguistic theory, the morphosyntactic and semantic information is presented at one level and in close relation to each other. In such a context, the challenge is that Bulgarian is a language with rich morphology, and the model has to balance grammar and semantics.
Having been proposed for the fourth time, the QA at CLEF track has confirmed a still rising interest from the research community, recording a constant increase in both the number of participants and the number of submissions. In 2006, two pilot tasks, WiQA and AVE, were proposed alongside the main tasks, representing two promising experiments for the future of QA. In the main task, too, some significant innovations were introduced, namely list questions and the requirement that text snippet(s) support the exact answers. Although this increased the workload of the organizers, both in preparing the question sets and especially in evaluating the submitted runs, it had no significant influence on the performance of the systems, which registered a higher best accuracy than in the previous campaign, in both monolingual and bilingual tasks. In this paper the preparation of the test set and the evaluation process are described, together with a detailed presentation of the results for each of the languages. The pilot tasks WiQA and AVE will be presented in dedicated articles.
(with a view to the automatic processing of natural language)
