

Charles University

Faculty of Arts

Institute of Theoretical and Computational Linguistics




Czech syntactic lexicon

Hana Skoumalova



2001


Supervisor: Prof. PhDr. Jarmila Panevova, DrSc.



The whole thesis can be downloaded as .ps.gz or .pdf.gz; individual parts can be downloaded from the Table of Contents below.

You can look at the slides from my lecture on the lexicon (also available here in PDF; both are in Czech).

Users who know the password can browse the finished lexicons.


Abstract

This work presents an electronic lexicon of Czech verbs. The lexicon contains the valency frames of approximately 15,000 Czech verbs, and its purpose is to enrich the information contained in other electronic dictionaries. The trend of recent years has been to build large-scale reusable resources that can be combined with other resources. This work shows how the lexicon interoperates with an existing morphological lexicon and how it can be used in various NLP systems.

Chapter 2 compares several theoretical approaches with Functional Generative Description (FGD), the framework used for the dictionary. The discussion focuses especially on the structure of the lexicon in the individual theories. A lexicon usually conforms to certain preconditions that follow from the theoretical framework it is built in, so the chapter also explores the possibility of creating a lexicon that would be transferable to another theoretical framework.

Chapter 3 discusses the possibility of using existing resources, with respect to the desired result and the theoretical framework adopted for this work. Several Czech syntactic lexicons have been created in the past, but unfortunately reusing them would be rather difficult. The chapter mentions several such attempts and describes in detail the lexicon that is used in this work.

Chapter 4 describes the verb frame. First, the format of the lexical entry is described; then the various types of reflexive constructions in Czech and their encoding in the lexicon are discussed. The next section shows the possible diatheses of the basic (active) frame and discusses which of them can be added to the dictionary on a regular basis and which have to be treated as exceptions. The last section describes the so-called equi and raising verbs.
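To illustrate what adding a diathesis "on a regular basis" can mean, the following Python sketch derives a periphrastic passive frame from an active one, as in Petr napsal dopis ('Peter wrote a letter') → Dopis byl napsán Petrem ('The letter was written by Peter'). It is a simplified sketch, not the encoding used in the lexicon: the `Member` record, the function name `periphrastic_passive`, and the use of the traditional Czech case numbers (1 = nominative, 3 = dative, 4 = accusative, 7 = instrumental) as form codes are all hypothetical. It also assumes that only frames with an accusative Patient have a regular periphrastic passive.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Member:
    """One member of a valency frame (hypothetical representation)."""
    functor: str            # FGD functor, e.g. "ACT", "PAT", "ADDR"
    form: str               # assumed form code: Czech case number, e.g. "4" = accusative
    obligatory: bool = True


def periphrastic_passive(active: list[Member]) -> Optional[list[Member]]:
    """Derive a periphrastic passive frame from an active frame.

    Simplified rule (an assumption of this sketch): the accusative Patient
    becomes the nominative subject and the nominative Actor becomes an
    optional instrumental. Frames without an accusative Patient return None,
    i.e. they have no regular derivation and would need special treatment.
    """
    if not any(m.functor == "PAT" and m.form == "4" for m in active):
        return None
    passive = []
    for m in active:
        if m.functor == "ACT" and m.form == "1":
            passive.append(replace(m, form="7", obligatory=False))  # Actor -> optional instrumental
        elif m.functor == "PAT" and m.form == "4":
            passive.append(replace(m, form="1"))  # Patient -> nominative subject
        else:
            passive.append(m)  # other members (e.g. a dative Addressee) are unchanged
    return passive
```

For a three-actant verb such as dát ('give'), the dative Addressee is kept as it is, while a frame whose Patient is in the dative yields `None`.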

Chapter 5 presents the procedure for automatically converting the source dictionary into the proposed format. For this conversion, an algorithm was created that assigns functors (semantic roles) to the individual members of a frame. The output of this procedure will serve as input for an editor. The chapter discusses how much of the source data can be completed by this procedure and how much needs post-editing, and shows how the resulting lexicon can be used in NLP systems.
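As a rough illustration of the kind of decision such an algorithm makes, the following Python sketch assigns functors to the members of a surface frame using a simplified version of the FGD principle of shifting of cognitive roles: the nominative subject is the Actor; a single bare-case object is the Patient whatever its case; a dative plus accusative pair is Addressee plus Patient. Everything else is left undecided for post-editing. This is not the algorithm of Chapter 5 and Appendix D; the function name `assign_functors` and the form codes (traditional Czech case numbers, with prepositional forms written as `preposition+case`, e.g. `v+6`) are hypothetical.

```python
from typing import Optional

# Hypothetical encoding of a surface frame: one form code per member,
# "1".."7" for bare cases (1 = nominative, 3 = dative, 4 = accusative, ...),
# "prep+case" (e.g. "v+6") for prepositional groups.
Frame = list[str]


def assign_functors(frame: Frame) -> list[Optional[str]]:
    """Assign FGD functors to frame members by a simplified prototypical rule.

    Returns one functor per member, or None where the rule cannot decide
    and the member would be left for manual post-editing.
    """
    result: list[Optional[str]] = [None] * len(frame)
    if "1" not in frame:
        return result  # no nominative subject: leave the whole frame undecided

    bare = [i for i, form in enumerate(frame) if "+" not in form]
    objects = [i for i in bare if frame[i] != "1"]

    for i in bare:
        if frame[i] == "1":
            result[i] = "ACT"  # nominative subject -> Actor

    if len(objects) == 1:
        # Only one bare-case object: by shifting it is the Patient.
        result[objects[0]] = "PAT"
    elif sorted(frame[i] for i in objects) == ["3", "4"]:
        # Dative + accusative: prototypical Addressee + Patient.
        for i in objects:
            result[i] = "ADDR" if frame[i] == "3" else "PAT"
    # Prepositional members and other combinations stay None.
    return result
```

For example, dát komu co ('give someone something') would be encoded as `["1", "3", "4"]` and receive ACT, ADDR and PAT, while a prepositional member such as `v+6` stays `None`, which mirrors the split between automatically completed data and data that needs post-editing.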

Chapter 6 sums up the work. In Section 6.1, verbs are sorted into groups according to their frames, and the results are compared with those of other researchers. Section 6.2 discusses the perspectives of language processing based on symbolic methods and the possible uses of the lexicon in corpus linguistics.




Contents (.ps, .pdf)

Acknowledgments . . . ii

1. Introduction . . . 1 (.ps, .pdf)
1.1. Terminological remarks . . . 2


2. Theoretical background . . . 3
2.1. An overview of FGD . . . 3
2.2. Comparing FGD with other theories . . . 6
2.2.1. Government and Binding Theory . . . 6
2.2.2. Lexical Functional Grammar . . . 7
2.2.3. Head-Driven Phrase Structure Grammar . . . 7
2.2.4. Comparison with FGD . . . 9


3. Using existing sources . . . 10 (.ps, .pdf)
3.1. Source data . . . 11
3.1.1. The attributes used in the lexicon and their values . . . 11


4. Content of the lexicon . . . 14 (.ps, .pdf)
4.1. Format of a lexical entry . . . 14
4.1.1. Voice . . . 15
4.1.2. Reflexivity . . . 16
4.1.3. Subject . . . 16
4.1.4. Functor . . . 17
4.1.5. Grammatemes . . . 17
4.1.6. Diatheses . . . 18

4.2. Reflexivity . . . 21
4.2.1. True reflexive with se. . . 21
4.2.2. True reflexive with si. . . 23
4.2.3. Reciprocal verbs with se. . . 23
4.2.4. Reciprocal verbs with si. . . 27
4.2.5. Reflexive tantum with se. . . 28
4.2.6. Derived reflexive verbs with se. . . 28
4.2.7. Reflexive tantum with si. . . 28
4.2.8. Derived reflexive verbs with si. . . 29
4.2.9. Reflexive with optional se. . . 29
4.2.10. Reflexive with optional si. . . 30
4.2.11. Reflexive passive . . . 31
4.2.12. Mediopassive . . . 31
4.2.13. Homonymy of reflexive verbs . . . 31

4.3. Diatheses . . . 33
4.3.1. Diatheses encoded in the lexicon . . . 40
4.3.2. Periphrastic passive . . . 41
4.3.3. Reflexive passive . . . 44
4.3.4. Mediopassive . . . 46
4.3.5. Constructions with mít and dostat . . . 47
4.3.6. Resultative construction with mít . . . 49

4.4. Verbs with the infinitive in their frames . . . 49
4.4.1. Raising verbs . . . 55
4.4.2. Equi verbs . . . 59


5. Algorithm for processing the surface frames . . . 66 (.ps, .pdf)
5.1. Identifying and merging frames, marking the obligatority . . . 66
5.2. Assigning functors . . . 68
5.3. Marking diatheses . . . 73
5.4. Usage of the final lexicon . . . 73
5.4.1. Generating frame instances from frames . . . 74
5.4.2. Extracting subcat lists . . . 76


6. Conclusions . . . 78 (.ps, .pdf)
6.1. Verb grouping . . . 78
6.2. Further perspectives . . . 80

Bibliography . . . 81

Subject index . . . 86

Verbs used in examples . . . 88

A. Abbreviations . . . 90 (.ps, .pdf)

B. Symbols used in the dictionary . . . 92
B.1. Voice . . . 92
B.2. Reflexivity . . . 92
B.3. Subject . . . 93
B.4. Functors . . . 93
B.5. Grammatemes . . . 94
B.6. Obligatority . . . 96
B.7. Passive and other diatheses . . . 96


C. Possible functors assigned to grammatemes . . . 97
C.1. Abbreviations used in lists of possible functors . . . 97
C.2. Lists of functors attached to every surface realization . . . 98


D. Algorithm for assigning functors . . . 102
D.1. Prototypical and less typical surface forms . . . 102
D.2. Assigning functors to non-prototypical frames . . . 103
D.3. Results . . . 103
D.3.1. Verbs processed fully automatically . . . 103
D.3.2. Verbs with ambiguous frames . . . 108


E. Classification of Czech frames . . . 115
E.1. Automatically processed frames . . . 115
E.2. Ambiguous frames . . . 116


F. Experiment with LFG . . . 121 (.ps, .pdf)
F.1. Verb lexicon . . . 121
F.2. Templates . . . 122
F.3. Lexical rules . . . 123
F.4. Grammar . . . 125
F.5. Test sentences . . . 126


G. Web interface to the lexicon . . . 132 (.ps, .pdf)
2nd part (.ps, .pdf)
3rd part (.ps, .pdf)



List of Tables

4.1. Taxonomy of reflexive verbs . . . 21
4.2. Three types of reciprocal verbs . . . 24
4.3. Reciprocal verbs with si. . . 27
4.4. Subject diatheses . . . 39
4.5. Subject diatheses revisited . . . 40

5.1. Identifying single frames . . . 67
5.2. Merging frame variants . . . 67
5.3. Prototypical frames . . . 70
5.4. Non-prototypical frames . . . 70
5.5. Merging frame of the verb čertit se (be angry) . . . 71

6.1. Classification of verbs . . . 78
6.2. Classification of verbs with adjuncts simplified . . . 79



List of Figures

4.1. Three level system . . . 36
4.2. Three level system revisited . . . 37

5.1. Mapping between TL and ML in active voice . . . 69
5.2. Mapping between TL and ML for verbs with at least three actants . . . 69

D.1. The algorithm for assigning functors to non-prototypical frames . . . 104

F.1. Simple grammar in LFG . . . 125
F.2. Testing sentences . . . 126
F.3. C-structure of sentence 140a . . . 127
F.4. F-structure of sentence 140a . . . 127
F.5. C-structure of sentence 140b . . . 128
F.6. F-structure of sentence 140b . . . 128
F.7. C-structure of sentence 140c . . . 129
F.8. F-structure of sentence 140c . . . 129
F.9. C-structure of sentence 140d . . . 130
F.10. F-structure of sentence 140d . . . 131

G.1. Main window of the web interface . . . 133
G.2. File with all frames containing hPTc2 . . . 134
G.3. Frames processed fully automatically, with ambiguous free modifications . . . 135