
Derived frames and the lexicon*

Hana Skoumalová

1 Introduction

In this paper I want to show one aspect of creating an electronic lexicon, namely how to store all the frames derived from the `base' verb frame. I use the term derived frames, as none of the terms used in the literature (passive constructions, agentless constructions, impersonal constructions) expresses all the varieties of derivations.

The first version of the electronic lexicon has been implemented in DATR (see [EG90]), and is described in [Sko97]. The verb frames contained in the electronic lexicon are immediately valid for verbs in active voice. The problem is how to store all their derived variants. They can either be listed in the lexicon, or it should be possible to derive them from the active frames by lexical rules. The derivations of the verb frames show many regularities that can be generalized, so the lexical rules approach seems to be the better one.

In this paper, I will examine possible variants of verb frames and suggest algorithms for the necessary lexical rules. I will discuss several theoretical approaches to verb frame classification and examine which of them is the most suitable for my purposes.

2 Theoretical background

In the lexicon, I utilized the theory developed by Sgall, Hajičová and Panevová -- Functional Generative Description (FGD) [SHP86], and especially the part dealing with the verb frames [Pan74, Pan75, Pan80]. Two levels of syntactic description -- the underlying (i.e. deep) structure and the surface structure -- are distinguished. In the underlying structure we work with inner participants (Tesnière's actants) and free modifications. A verb can have up to five inner participants: Actor, Patient, Addressee, Origin and Effect. These inner participants are members of the verb frame and they are realized as subject and objects in the surface structure. Some of them can be optional (facultative), which means that they do not need to be present in the sentence -- in the deep as well as in the surface structure. Other participants are always obligatory in the deep structure; however, they do not need to be obligatory in the surface structure. Those participants that can be omitted on the surface, because they are known from the context, are called (obligatory) deletable participants. The so-called general participants are classed with the deletable participants. Whether a participant is optional, obligatory or deletable can be tested by a question test [Pan80, pp. 29-32].

In other theoretical models [DHG87, KNR95], the repertory of participants is wider: instead of Actor one speaks about Agent, Causer, Experiencer, etc. Patient is more or less a counterpart of the direct object and Recipient of the indirect object. In FGD, Actor and Patient are determined by syntactic criteria rather than by semantic ones (cf. Tesnière's approach [Tes59]), and other participants are determined semantically:

  1. If the verb frame contains only one participant, this participant is Actor.
  2. If the frame contains two participants, one of them is Actor and the other is Patient. In most cases, Actor is the subject of the active constructions, but there are some exceptions to this rule, which will be discussed later.
  3. If the verb frame has more than two participants, the positions of Actor and Patient must be occupied, and the other participants are Addressee, Effect or Origin. The decision about which participant bears which role is based on the semantics of the participants.
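The three rules above can be sketched in Python. This is only an illustration of the syntactic criteria, not part of the DATR lexicon; the function name and the return shape are assumptions made for the example.

```python
from typing import List, Tuple


def syntactically_fixed_roles(n_participants: int) -> Tuple[List[str], int]:
    """Return the roles fixed by the syntactic criteria and the number of
    remaining slots whose role (Addressee, Effect or Origin) must be
    decided semantically."""
    if not 0 <= n_participants <= 5:
        raise ValueError("a verb can have up to five inner participants")
    # One participant: Actor. Two or more: Actor and Patient are occupied.
    fixed = ["Actor", "Patient"][:n_participants]
    return fixed, n_participants - len(fixed)


one = syntactically_fixed_roles(1)
three = syntactically_fixed_roles(3)
```

For a frame with three participants the sketch reports Actor and Patient as fixed and leaves one slot for the semantic decision.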

In this work, I will only consider such derived constructions in which the surface syntactic structure is different from the primary shape. Such constructions as
(1)a. Bolest probudila Pavla.
Pain woke Pavel.
b. Marie probudila Pavla.
Marie woke Pavel.
differ in the semantics of the subject. In (1a), the subject has the role of Causer (according to [DHG87, DH87]), while in (1b), the subject is Agent. In the FGD approach, however, both subjects have the role of Actor. The two constructions are identical on the surface level and differ only in the lexical setting of the subject. If we wanted to make this fine distinction, we would have to work with semantic features in the lexicon. In the literature ([GK89, DH87]), some semantic features are used, but their exact description is missing. After a thorough revision they may be used in the next stage of our work.

3 Verb frames and their surface realization

In what follows I will deal with the following types of derived constructions: the periphrastic passive, the reflexive passive, the mediopassive, and constructions with mít and dostat. In this section, I will discuss the conditions under which the individual types of derived constructions can be formed, and the lexical rules that can be employed for their construction.

3.1 Periphrastic passive

The periphrastic passive uses the auxiliary verb být (to be) and a passive participle; the whole predicate agrees with the subject in person, gender and number:
(2)Kniha je čtena.
Book is read.
This construction is usually formed from transitive verbs (i.e. verbs with an object in accusative), but there are exceptions. The subject slot of the passive construction is either filled by the original accusative object (typically Patient), or it is empty. When the subject slot is empty, the verb shows neuter singular agreement. The original subject (Actor) becomes an optional member of the surface structure in instrumental:
(4)Kniha byla napsána slavným autorem.
Book was written by famous author.
The periphrastic passive is felt as bookish or obsolete in modern Czech, especially the passive with expressed Actor. Unlike its English counterpart, the Czech passive is very rarely used for changing the Topic-Focus articulation -- for this purpose word order is employed. The passive construction is mainly used if the speaker wants to avoid saying who/what Actor is, or if Actor is general; in these cases, the reflexive passive is used more often.

There is another possible surface form of Actor: the prepositional phrase od `from' + genitive, but it seems that this form cannot be used with all verbs -- here, again, the semantics of the verb and its participants plays a role:
(5)a. Pepík je bit od otce.
Pepík is beaten from father.
b.*Kniha byla napsána od slavného autora.
*Book was written from famous author.
The conditions under which this construction can be used will be examined in future work. In this paper, I assume that Actor can only occur in instrumental.

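The algorithm for deriving the frame of the periphrastic passive follows from the description above: the original accusative object, if there is one, becomes the subject, and otherwise the subject slot stays empty and the verb shows neuter singular agreement; Actor, the original subject, becomes an optional member of the frame in instrumental.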

The only verbs which could be exceptional are the ditransitive verbs (verbs with two accusatives in the frame). There are only two such verbs in Czech: stát `to cost' and učit `to teach'.
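The derivation described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Slot` record, the surface labels such as `NPacc` and `NPins`, and the function name are hypothetical and do not reproduce the DATR implementation. Frames with two accusatives are simply refused here, since they need the special treatment mentioned above.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Slot:
    role: str               # deep role, e.g. "Actor", "Patient"
    surface: str            # surface form, e.g. "NPnom", "NPacc" (assumed labels)
    function: str           # "subj" or "obj"
    optional: bool = False  # optional on the surface


def periphrastic_passive(frame: List[Slot]) -> Optional[List[Slot]]:
    """Derive the periphrastic passive frame, or None for ditransitive frames."""
    if sum(s.surface == "NPacc" for s in frame) > 1:
        return None  # two accusatives: handled as an exception, not by this rule
    derived = []
    for slot in frame:
        if slot.function == "subj":
            # The original subject (Actor) becomes optional, in instrumental.
            derived.append(replace(slot, surface="NPins", function="obj", optional=True))
        elif slot.surface == "NPacc":
            # The accusative object fills the subject slot.
            derived.append(replace(slot, surface="NPnom", function="subj"))
        else:
            derived.append(slot)
    return derived


# napsat `to write': Actor as subject, Patient as accusative object
napsat = [Slot("Actor", "NPnom", "subj"), Slot("Patient", "NPacc", "obj")]
passive = periphrastic_passive(napsat)
```

If the base frame has no accusative object, the derived frame contains no subject slot, which corresponds to the impersonal passive with neuter singular agreement.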

3.2 Reflexive passive

In this construction, the verb stays in the active voice, but the reflexive particle se (rendered as SE in the literal English translations) is added to the sentence, the participant in accusative (if present) becomes the subject, and Actor disappears.
(7)a.Bábovka se peče.
Cake SE bakes.
`The cake is being baked.'
b.Do města se jde tudy.
To town SE goes this way.
`This is the way to the town.'
The example in (7a) is a real reflexive passive, derived from a transitive verb, while the sentence in (7b) is an impersonal active construction, derived from an intransitive verb. I mark both these constructions as reflexive passive, as the algorithms for deriving them are very similar; the two constructions differ in how the subject position is filled, as with the periphrastic passive.

The reflexive passive is sometimes indistinguishable from the intrinsic or true reflexive. The sentence
(8)Děti se učí dobře.
Children SE teach well.
`Children are easy to teach.' or `The children learn well.'
has two readings, as the verb učit `to teach' in reflexive passive has the same form as the reflexive verb učit se `to learn'. This ambiguity is inherent in the language and we will not try to solve this problem in the lexicon.

The algorithm for deriving the reflexive passive frame is nearly identical with the algorithm for the periphrastic passive. The only difference is that Actor is deleted.
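The difference from the periphrastic passive can be sketched in Python. As before, this is only an illustration: the `Slot` record, the surface labels and the function name are assumptions, and the sketch takes the subject of the base frame to be Actor, which does not hold for every Czech verb.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Slot:
    role: str               # deep role, e.g. "Actor", "Patient"
    surface: str            # surface form, e.g. "NPnom", "NPacc" (assumed labels)
    function: str           # "subj" or "obj"
    optional: bool = False  # optional on the surface


def reflexive_passive(frame: List[Slot]) -> Dict[str, object]:
    """Derive the reflexive passive frame: se is added, Actor is deleted."""
    slots = []
    for slot in frame:
        if slot.function == "subj":
            continue  # Actor disappears from the surface frame
        if slot.surface == "NPacc":
            # The accusative participant becomes the subject.
            slots.append(replace(slot, surface="NPnom", function="subj"))
        else:
            slots.append(slot)
    return {"refl": "se", "slots": slots}


# péct `to bake' (transitive) and jít `to go' (intransitive)
pect = reflexive_passive([Slot("Actor", "NPnom", "subj"), Slot("Patient", "NPacc", "obj")])
jit = reflexive_passive([Slot("Actor", "NPnom", "subj")])
```

The transitive frame yields a frame with Patient as subject, as in (7a); the intransitive frame yields a subjectless frame, as in (7b).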

The rules for handling the ditransitive verbs stát `to cost' and učit `to teach' are the same as for the periphrastic passive: stát cannot be passivized, and with the verb učit, the frame to be passivized can contain only one accusative.
(9)a. Děti se učí (matematice_dat).
Children SE teach (to mathematics).
b. *Děti se učí matematiku_acc.
*Children SE teach mathematics.
c. Matematika se učí od první třídy.
Mathematics SE teaches from first grade.
d. *Matematika se učí děti.
*Mathematics SE teaches children.
The reflexive passive of učit, however, is homonymous with the reflexive verb učit se `to learn', and thus it is difficult for a Czech speaker to understand the examples in (9a) and (9b) in the passive meaning. As an active sentence with the verb učit se, (9b) is correct.

3.3 Mediopassive

This construction is very similar to the previous one -- some linguistic books actually do not distinguish between them. In mediopassive, an adverb like dobře `well', špatně `badly', snadno `easily', etc., is an obligatory member of the frame, and Actor in dative becomes an optional member of the frame. If Actor is missing on the surface level, there is a general Actor in the deep structure. Examples:
(10)a. Matematika se mi učí snadno.
Mathematics SE me_dat teaches easily.
`It's easy for me to learn/teach mathematics.'
b.Z této látky se šije dobře.
From this fabric SE sews well.
`It's easy (for anyone) to make clothes from this fabric.'
This construction can also be ambiguous -- either with a reflexive passive or with an intrinsic reflexive. The dative member is then understood as Benefactor:
(11)a. Děti se mi učí dobře.
Children SE me_dat teach well.
`It's easy for me to teach children.' or `My children learn well.'
b. Teď už se mi píše potvrzení dobře.
Now already SE me_dat writes receipt well.
`Now, the receipt is finally being written correctly for me.'
or `Now, it's already easy for me to write the receipt.'
The mediopassive can also be derived from an intransitive verb:
(12)a. S kopce dolů se mi jde dobře.
From hill down SE me_dat goes well.
`It's easy for me to walk down-hill.'
The mediopassive can only be used with imperfective verbs; this construction describes a permanent quality of someone/something, which is expressed by the non-terminated nature of the verb.

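The algorithm for deriving the mediopassive frame is nearly identical with the algorithm for the periphrastic passive, with these exceptions: Actor becomes an optional member of the frame in dative rather than in instrumental, an adverb of manner is added as an obligatory member of the frame, and the base verb must be imperfective.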
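The mediopassive derivation, as described in this subsection, can be sketched in Python. This is an illustration only: the `Slot` record, the surface labels, the `"Manner"` label for the obligatory adverb and the `aspect` argument are assumptions introduced for the example, not the notation of the lexicon.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Slot:
    role: str               # deep role, e.g. "Actor", "Patient"
    surface: str            # surface form, e.g. "NPnom", "Adv" (assumed labels)
    function: str           # "subj", "obj" or "adv"
    optional: bool = False  # optional on the surface


def mediopassive(frame: List[Slot], aspect: str) -> Optional[List[Slot]]:
    """Derive the mediopassive frame; only imperfective verbs allow it."""
    if aspect != "imperfective":
        return None
    derived = []
    for slot in frame:
        if slot.function == "subj":
            # Actor becomes an optional member of the frame in dative.
            derived.append(replace(slot, surface="NPdat", function="obj", optional=True))
        elif slot.surface == "NPacc":
            # The accusative object becomes the subject.
            derived.append(replace(slot, surface="NPnom", function="subj"))
        else:
            derived.append(slot)
    # An adverb such as dobře `well' is an obligatory member of the frame.
    derived.append(Slot("Manner", "Adv", "adv"))
    return derived


ucit = [Slot("Actor", "NPnom", "subj"), Slot("Patient", "NPacc", "obj")]
medio = mediopassive(ucit, "imperfective")
```

Applied to a frame like that of učit, the sketch yields an optional dative Actor, a nominative subject and an obligatory adverb, matching the shape of (10a).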

3.4 Constructions with mít and dostat

In this type of construction, a dative member of the frame (typically Addressee) becomes the subject of a construction with the copular verb mít or dostat, and the main verb occurs in the predicate as a passive participle in accusative. If the main verb has an accusative object (typically Patient), the participle agrees with it in gender and number. If the accusative object is missing, the participle has the form of singular neuter. Actor (the original subject) becomes an optional member of the frame in the form of od + genitive:
(13)a. Matka slíbila Petrovi hračku.
Mother promised Petr_dat toy.
b. Petr_Addr má/dostal (od matky) slíbenu hračku.
Petr has/got (from mother) promised toy.
`Petr was promised a toy (by the mother).'
c. Otec vynadá Pepíkovi.
Father will scold Pepík_dat.
d. Pepík_dat dostane vynadáno (od otce).
Pepík will get scolded (from father).
`Pepík will be scolded (by the father).'
Some verbs allow either of the two copular verbs, while others allow only the verb dostat (mít/dostat slíbeno, dostat/*mít vynadáno). It seems that the semantic role of the dative object is important here: if Addressee is moved to the subject position, both copular verbs can be used, while with Patient only the verb dostat is admissible.

Instead of the (short) passive participle we can use the long form of adjective (long passive participle), especially in the spoken language. In such a case, however, the sentence can become ambiguous:
(14)a.Petr dostal (od matky) slíbenu hračku.
Petr got (from mother) promised toy.
`Petr was promised a toy (by the mother).'
b.Petr dostal (od matky) slíbenou hračku.
Petr got (from mother) promised toy.
`Petr was promised a toy (by the mother).'
or `Petr got the promised toy (from the mother).'
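The algorithm for deriving the verb frame of this construction follows from the description above: the dative member becomes the subject, the main verb appears as a passive participle that agrees with the accusative object or, if there is none, takes the neuter singular form, and Actor becomes an optional member of the frame in the form of od + genitive.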

There is one more construction with the copular verb mít. This is not really a passive construction, as Actor remains the subject. It is rather a sort of resultative tense. It corresponds to the English perfect constructions:
(15)a.Upeču bábovku.
I will bake cake.
b.Bábovku už mám upečenu/upečenou.
Cake already I have baked.
In this derivation, the frame remains the same as in the base form. The only operation in forming this construction is changing the predicate.

All the above constructions can only be derived from perfective verbs, as they express a result.
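The passive construction with mít and dostat described earlier in this subsection can be sketched in Python. The sketch is illustrative only: the `Slot` record, the surface labels, the returned dictionary and the `aspect` argument are assumptions, and the choice of copular verb encodes the tentative observation about Addressee and Patient made above, not a confirmed rule.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Slot:
    role: str               # deep role, e.g. "Actor", "Addressee"
    surface: str            # surface form, e.g. "NPnom", "NPdat" (assumed labels)
    function: str           # "subj" or "obj"
    optional: bool = False  # optional on the surface


def mit_dostat_passive(frame: List[Slot], aspect: str) -> Optional[Dict[str, object]]:
    """Promote the dative member to subject; None if the rule does not apply."""
    if aspect != "perfective":
        return None  # the construction expresses a result
    dative = next((s for s in frame if s.surface == "NPdat"), None)
    if dative is None:
        return None
    has_accusative = any(s.surface == "NPacc" for s in frame)
    slots = []
    for slot in frame:
        if slot is dative:
            slots.append(replace(slot, surface="NPnom", function="subj"))
        elif slot.function == "subj":
            # Actor becomes optional, in the form od + genitive.
            slots.append(replace(slot, surface="od+NPgen", function="obj", optional=True))
        else:
            slots.append(slot)
    return {
        # Tentative: Addressee allows both copular verbs, Patient only dostat.
        "copulas": ["mít", "dostat"] if dative.role == "Addressee" else ["dostat"],
        "participle": "agrees with accusative object" if has_accusative else "neuter singular",
        "slots": slots,
    }


slibit = [Slot("Actor", "NPnom", "subj"), Slot("Addressee", "NPdat", "obj"),
          Slot("Patient", "NPacc", "obj")]
vynadat = [Slot("Actor", "NPnom", "subj"), Slot("Patient", "NPdat", "obj")]
promised = mit_dostat_passive(slibit, "perfective")
scolded = mit_dostat_passive(vynadat, "perfective")
```

The two sample frames correspond to (13b) and (13d): the first allows both copular verbs with an agreeing participle, the second only dostat with a neuter singular participle.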

4 Infinitive and derived frames

In this section, I want to examine the conditions for derived constructions in frames containing an infinitive. In such frames, potentially both verbs (the governing verb and the dependent) can occur in a derived construction. First I will examine the raising verbs. This term means that the subject of the infinitive becomes (is raised as) the subject or an object of the governor; in the deep structure, this participant is present only once. Next, I will examine the equi verbs. This term means that a certain participant of the governor is coindexed with a participant of the dependent. On the surface level, such a participant is present only once, but in the deep structure, it is present twice -- as a member of the governor's frame as well as of the dependent's frame.

4.1 Raising verbs

First, we will examine the subject-raising verbs. This group of verbs contains modal and aspectual verbs. Examples of various active and passive constructions:
(16)a.Petr smí odejít.
Petr may to-leave.
b.Začalo pršet.
Started to-rain.
`It started raining.'
c.Petr musí být pochválen.
Petr must to-be praised.
d.Musí se zabít dvě mouchy jednou ranou.
Must SE to-kill two flies by one hit.
`Two flies must be killed by one hit.'
e.Bábovka se začala péci.
Cake SE started to-bake.
`The cake started to be baked.'
f.Únosce musí dostat slíbeno výkupné.
Kidnapper must to-get promised ransom.
`The kidnapper must be promised the ransom.'
g.Matka už musí mít uvařeno.
Mother already must to-have cooked.
`Mother must have already cooked (everything).'
h.Tady se ti musí sedět nepohodlně.
Here SE you_dat must to-sit uncomfortably.
`This must be an uncomfortable seat for you.'
We can see in the examples that the subject is shared by the two verbs, no matter which voice is used in the infinitive construction. The infinitive can occur in both periphrastic and reflexive passive and in the construction with the verb dostat; the mediopassive and the active construction with the verb mít are only possible with the verb muset `must' in the meaning of high probability. It seems that the governor can only occur in active voice, but we will come back to this issue later.

Subject-to-object raising verbs are such verbs that have an infinitive in the frame and the subject of this infinitive becomes an object of the higher verb. This group contains the verbs of perception:
(17)a.Vidím ho přicházet.
I see him to-come.
`I see him coming.'
b.?Vidím ho být tázána.
?I see him to-be asked.
`I see him being asked.'
c.?Cítím bábovku péct se.
?I smell cake to-bake SE.
`I can smell that a cake is being baked.'
The passive constructions are questionable with this group of verbs; further research on a text corpus will be necessary.

4.2 Equi verbs

With equi verbs, the subject and possibly some objects of the infinitive are coindexed with members of the frame of the control verb, but in the deep structure, these participants are present twice. First we will examine the possibilities of passivization of the governor. I will show the possible derivations on the verb slíbit, which is syntactically ambiguous -- either Actor of this verb or Addressee is coindexed with the subject of the infinitive.
(18)a. ?Rodiče_i Petrovi_j slíbili 0_j svézt se na poníkovi.
?Parents Petr_dat promised to-ride on pony.
b. Petrovi_j bylo (rodiči_i) slíbeno 0_j svézt se na poníkovi.
Petr_dat was (by parents) promised to-ride on pony.
c. Petrovi_j se slíbilo 0_j svézt se na poníkovi.
Petr_dat SE promised to-ride on pony.
d. Petr_j má/dostal (od rodičů_i) slíbeno 0_j svézt se na poníkovi.
Petr has/got (from parents) promised to-ride on pony.
e. Rodiče_i Petrovi_j slíbili 0_i přestat kouřit.
Parents Petr_dat promised to-stop to-smoke.
f. *Petrovi_j bylo (rodiči_i) slíbeno 0_i přestat kouřit.
*Petr_dat was (by parents) promised to-stop to-smoke.
g. *Petrovi_j se slíbilo 0_i přestat kouřit.
*Petr_dat SE promised to-stop to-smoke.
h. *Petr_j má/dostal (od rodičů_i) slíbeno 0_i přestat kouřit.
*Petr has/got (from parents) promised to-stop to-smoke.
In the first four sentences, Addressee of the main verb is coindexed with the subject of the infinitive. The construction (18a) is rejected by some speakers, but it can be converted into the passive constructions (18b)-(18d), which are accepted by all speakers. The sentence (18e) is perfectly correct, but the governor cannot be passivized, as (18f)-(18h) show. A possible reason why (18b)-(18d) are correct and (18f)-(18h) are not is that in (18f)-(18h) the subject of the infinitive is coindexed with a member that can be missing even in the deep structure.

Now, we will explore the possibilities of passive construction of the infinitive. We will use the verb chtít `to want' as the governor, which is a verb that has Actor coindexed with the subject of the infinitive.
(19)a. Petr se chce svézt na poníkovi.
Petr SE wants to-ride on pony.
b. Petr chce být pochválen.
Petr wants to-be praised.
c. Bábovka_nom se nechce péct.
Cake SE does not want to-bake.
`The cake refuses to get baked.'
d. Bábovku_acc se mi nechce péct.
Cake SE me_dat does not want to-bake.
`I don't want to bake a cake'.
e. (*)Bábovka_nom se mi nechce péct.
(*)Cake SE me_dat does not want to-bake.
`The cake refuses to get baked by me.'
or `I don't want to bake a cake.'
f. Dort_nom/acc se mi nechce péct.
Cake SE me_dat does not want to-bake.
g. *Ten pán_nom se mi nechce zdravit.
*That man SE me_dat does not want to-greet.
`I don't want to greet that man.'
h. Toho pána_acc se mi nechce zdravit.
That man SE me_dat does not want to-greet.
The sentence in (19a) contains two active voices, and in (19b) the infinitive is in periphrastic passive. The sentence in (19c) is a construction with reflexive passive of the infinitive. (19d) looks similar on the surface, but its syntactic structure is different. Here, the word bábovka is a direct object of the infinitive, and the whole infinitive clause is the subject of the verb chtít. Now, the question is what the syntactic roles of the reflexive particle se and of the dative participant mi `to me' are. The whole construction in the main clause could be mediopassive of the verb chtít, but then an adverb is missing in the construction. A more satisfying explanation is that here we have an intrinsic reflexive chce se, where the infinitive is the subject of the frame (with the role of Patient) and the dative member has the role of Actor. There are several such verbs in Czech whose Actor is not the subject of the construction but an object in dative case (e.g. líbit se `to like', zdát se `to seem' or `to dream').

The sentence in (19e) differs from the previous sentence in the case of the word bábovka, and from (19c) in the additional dative member mi `to me'. We can understand the sentence as a variation of (19c), with Benefactor expressed by the dative case. In colloquial speech, however, this construction is sometimes used in the meaning of (19d), although some speakers reject it. The problem with this sentence is that we have two candidates for the subject of the main clause. The first candidate is the word bábovka, which is in nominative, and the second candidate is the infinitive péct, as in (19d). My conclusion is that this construction is incorrect; it may be inspired by sentences like (19f), where the form of the masculine inanimate noun dort is homonymous. The incorrectness of this construction is fully shown in (19g), where the position of the `nominal subject' is lexically occupied by a masculine animate noun. This sentence is out for all speakers.

4.3 Passive of raising verbs

The last issue that I want to discuss in this section is the possible reflexivization of modal and aspectual verbs. As I have said above, raising verbs do not seem to allow passivization, but let us consider the following conversation:
(20)a. Honza_nom/*Honzu_acc se musí požádat o povolení.
Honza SE must to-ask for permission.
`Honza must be asked for permission.'
b. Co že se musí udělat?
What that SE must to-do?
`What did you say that must be done?'
c. Požádat Honzu_acc o povolení.
To-ask Honza for permission.
d. Žádat Honzu_acc o povolení se mi nechce.
To-ask Honza for permission SE me_dat does not want.
`I don't want to ask Honza for permission.'
e. Požádat Honzu_acc/*Honza_nom o povolení se musí!
`Honza MUST be asked for permission!'
In the sentence (20a) the embedded infinitive is in reflexive passive and its subject (Addressee in the deep structure) is raised as the subject of the modal verb muset. In (20e) the infinitive is in active voice, with Addressee in accusative. The whole infinitive clause is the subject of reflexive passive of the verb muset. (20e) is in fact an impersonal variant of the sentence
(21) Každý_nom musí požádat Honzu_acc o povolení.
Everybody must ask Honza for permission.
It is interesting that both the active voice and the reflexive passive of the modal verbs can only occur in certain word orders -- the sentences (20a) with Honza in accusative and (20e) with Honza in nominative are ungrammatical.

As the conditions for forming the reflexive passive are rather complex, a set of special grammar rules will be needed to handle exclusively the raising verbs.

5 Conclusion

In the previous sections I showed that the main types of passive constructions in Czech are derived regularly from the active voice, and thus we can formulate an algorithm (set of lexical rules) that performs the derivation. A lexical entry should contain the information about the members of the verb frame, their roles in the deep structure and their surface form, and a list of types of derived constructions. I will present two entries of the lexicon: the verb přát si `to wish' and the verb poručit `to order'.
L_přát_si:
	<syn refl> = si
	<syn subj surf> = NPnom
	<syn subj deep> = Actor
	<syn subj oblig> = oblig_deletable 
	<syn 1_obj surf> = NPacc, VPinf [subj = ^Actor; refl = yes;
                                         pass = perif, mít_2, dostat].
	<syn 1_obj deep> = Patient
	<syn 1_obj oblig> = obligatory
	<syn pass> = no.

L_poručit:
	<syn refl> = no
	<syn subj surf> = NPnom
	<syn subj deep> = Actor
	<syn subj oblig> = oblig_deletable
	<syn 1_obj surf> = VPinf [subj = ^Addr; refl = yes; pass = no].
	<syn 1_obj deep> = Patient
	<syn 1_obj oblig> = obligatory
	<syn 2_obj surf> = NPdat
	<syn 2_obj deep> = Addr
	<syn 2_obj oblig> = oblig_deletable
	<syn pass> = perif, refl, medio, mít_1, dostat.
A detailed description of the notation can be found in [Sko97]; here I will only explain the attributes and values concerning the derived constructions. The attribute <syn pass> contains information about possible derived forms of the main verb. The values perif, refl, medio, mít_1, dostat, and mít_2 correspond to the possibilities of forming the periphrastic passive, the reflexive passive, the mediopassive, the passive constructions with mít and with dostat, and the active construction with mít, respectively. The possible derivations of the infinitive are stored in the description of its surface form.
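How the <syn pass> values might be consulted can be sketched in Python. The dictionary below only mirrors the pass values of the two entries above; the keys `pass` and `inf_pass`, the dictionary layout and the function name are assumptions made for the illustration and are not the DATR encoding.

```python
from typing import List

# Values copied from the two entries; "inf_pass" holds the pass values
# stored in the VPinf description of the infinitive object.
LEXICON = {
    "přát si": {"pass": [], "inf_pass": ["perif", "mít_2", "dostat"]},
    "poručit": {"pass": ["perif", "refl", "medio", "mít_1", "dostat"], "inf_pass": []},
}

CONSTRUCTIONS = {
    "perif": "periphrastic passive",
    "refl": "reflexive passive",
    "medio": "mediopassive",
    "mít_1": "passive construction with mít",
    "dostat": "passive construction with dostat",
    "mít_2": "active construction with mít",
}


def derived_forms(verb: str, of_infinitive: bool = False) -> List[str]:
    """List the derived constructions allowed for the verb or its infinitive."""
    key = "inf_pass" if of_infinitive else "pass"
    return [CONSTRUCTIONS[value] for value in LEXICON[verb][key]]


main_verb_forms = derived_forms("poručit")
infinitive_forms = derived_forms("přát si", of_infinitive=True)
```

For přát si the main verb has no derived forms, while its infinitive object allows three; for poručit the situation is the reverse.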
* I would like to thank my colleagues Vladimír Petkevič, Alexandr Rosen and Milena Hnátková for judging the examples and fruitful discussion on the draft of this paper. This work has been partially supported by the grant GAČR 405/96/K214.

Bibliography

[DH87]
František Daneš, Zdeněk Hlavsa, et al. Větné vzorce v češtině (Sentential paradigms in Czech). Studie a práce lingvistické 23. Academia, Prague, 1987.
[DHG87]
František Daneš, Zdeněk Hlavsa, Miroslav Grepl, et al. Mluvnice češtiny 3 -- Skladba (Grammar of Czech 3 -- Syntax). Academia, Prague, 1987.
[EG90]
Roger Evans and Gerald Gazdar, editors. The DATR Papers, Volume 1. Number 139 in CSRP. University of Sussex, Brighton, 1990.
[GK89]
Miroslav Grepl and Petr Karlík. Skladba spisovné češtiny (Syntax of Standard Czech). SPN, Prague, 2nd edition, 1989.
[KNR95]
Petr Karlík, Marek Nekula, and Zdenka Rusínová, editors. Příruční mluvnice češtiny (Handbook of Czech Grammar). Nakladatelství Lidové Noviny, Prague, 1995.
[Pan74]
Jarmila Panevová. On verbal frames in functional generative description, Part I. Prague Bulletin of Mathematical Linguistics, 22:3-40, 1974.
[Pan75]
Jarmila Panevová. On verbal frames in functional generative description, Part II. Prague Bulletin of Mathematical Linguistics, 23:17-52, 1975.
[Pan80]
Jarmila Panevová. Formy a funkce ve stavbě české věty (Forms and Functions in Syntax of Czech Sentence). Studie a práce lingvistické 13. Academia, Prague, 1980.
[PBS71]
Jarmila Panevová, Eva Benešová, Petr Sgall. Čas a modalita v češtině (Tense and Modality in Czech). Universita Karlova, Prague, 1971.
[SHP86]
Petr Sgall, Eva Hajičová, and Jarmila Panevová. The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. D. Reidel Publishing Company, Dordrecht, 1986.
[Sko97]
Hana Skoumalová. Verb frames in the Czech hierarchical lexicon. TELRI Newsletter, 6:18-32, August 1997.
[Šmi67]
Vladimír Šmilauer. Novočeská skladba (Syntax of Modern Czech). Academia, Prague, 1967. 3rd edition.
[Svo62]
Karel Svoboda. Infinitiv v současné spisovné češtině (Infinitive in Contemporary Standard Czech). Rozpravy ČSAV. Academia, Prague, 1962.
[Tes59]
Lucien Tesnière. Éléments de syntaxe structurale. Klincksieck, Paris, 1959.