HPSG: An Overview and Some Work in Progress

Head-Driven Phrase Structure Grammar

Carl Pollard

Pacific Asia Conference on Language, Information, and Computation
Kyung Hee University, Seoul, Korea
Dec. 21, 1996

In an earlier talk, I tried to convey something of the nature of constraint-based grammar, a general approach to theoretical and computational linguistics that has been emerging since the mid-to-late 1970s. In doing so I identified a set of characteristics that are common to constraint-based theories and systems, repeated here in (1). The key thing to keep in mind in understanding these characteristics is that the grammar is a set of constraints, and that the relationship between the constraints and linguistic structures is essentially the same as the relationship between a logical theory and a model-theoretic interpretation of that theory.

(1) Characteristics of Constraint-Based Grammar
    A. Generativity: it is determinate what the candidate structures and the constraints are; and it is decidable whether a candidate structure satisfies the constraints.
    B. Expressivity: the language in which the theory is formulated is richly expressive, since constraints are imposed BY the theory, not UPON the theory.
    C. Empirical Adequacy: the theory must be able to capture empirical generalizations accurately (get the facts right).
    D. Psycholinguistic Responsibility: theories should lend themselves to interfacing with plausible processing models.
    E. Nondestructiveness: grammars do not make reference to operations that destructively modify existing structure.
    F. Locality: satisfaction of constraints by a structure depends only on that structure, not on "competitors".
    G. Parallelism: the levels of representation are not sequentially derived; they are mutually constrained parallel structures.
    H. Radical Nonautonomy: syntax has no distinguished status; instead syntactic representations are just one (or more) of the parallel structures.

In this talk I will try to convey something of the flavor of work in constraint-based grammar by briefly sketching the outlines of the constraint-based framework most familiar to me, head-driven phrase structure grammar (HPSG), and then describing some work in progress.

The development of HPSG began at Stanford University and Hewlett-Packard Laboratories in the early to mid 1980s. It started out as an attempt to revise an earlier theory, GPSG, but it also incorporated many insights and analytic techniques from other frameworks and research traditions, including categorial grammar, LFG, GB, situation semantics, datatype theory, and knowledge representation. Most of the effort in HPSG so far has been devoted to syntax and semantics, but in recent years HPSG work has gradually extended into other areas, such as morphology, phonology, and pragmatics. In addition to theoretical linguistic work in HPSG, there has also been a great deal of activity connected with establishing correct formal foundations and developing efficient computer implementations, but today I'll limit my attention mostly to syntax, semantics, and the interface between them.

Since its inception, HPSG theory has undergone continual revision. Just to give myself a stationary target, I'm going to take as my point of reference a version of the theory sometimes called HPSG3, which is essentially the version of HPSG sketched in the final chapter of the book HPSG (Pollard and Sag 1994).
Since HPSG is constraint-based, the first things you need to know about in order to understand it are what the candidate structures look like, and what the constraints look like. In HPSG, the structures are TYPED FEATURE STRUCTURES, which are a kind of rooted, connected, directed graph. The edges are labelled with symbols called FEATURES, and the nodes are labelled with symbols called TYPES. The type tells what kind of linguistic object the node represents, and the features tell what parts the object has. Since feature structures are hard to display, usually what we show instead is a certain kind of description of a feature structure called an ATTRIBUTE-VALUE MATRIX or AVM. For example, the AVM shown in (2) is the lexical entry for the verb WALKS.

(2) attribute-value matrix description of a feature structure representing (a token of) the word WALKS (slightly simplified)

    [ word
      PHON   <walks>
      SYNSEM [ synsem
               CATEGORY [ category
                          HEAD    [ verb
                                    VFORM finite
                                    AUX   -
                                    INV   -
                                    PRD   - ]
                          VALENCE [ val
                                    SUBJECT     <NP[nom]::[1]>
                                    COMPLEMENTS <>
                                    SPECIFIER   <> ] ]
               CONTENT  [ walk
                          WALKER [1][ index
                                      PERSON 3rd
                                      NUMBER singular ] ] ] ]

Here, as in all AVM descriptions, type symbols are given in lower-case and feature symbols in upper-case. The symbol "NP[nom]::[1]" given as the value of the SUBJECT feature is actually shorthand for another AVM description, shown as the first description in (3). (I'll come back to this shortly.) The lexical entry in (2) tells us, among other things, that WALKS is pronounced /wɔks/, that it is a finite nonauxiliary verb, that it selects as its subject a 3rd-singular nominative NP, and that it refers to the semantic walk relation. Also note the two occurrences of the tag [1], which indicate structure sharing. This means that two paths in the feature structure lead to the same node. In this case the structure sharing means that the referential index of the subject NP is assigned to the semantic role of WALKER in the walk relation. This is a typical HPSG lexical entry. Any instance of the word WALKS will be a feature structure that satisfies this description.
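Since a typed feature structure is, formally, just a typed rooted directed graph, structure sharing can be modeled directly as two edges pointing at one and the same node. The following sketch (mine, in Python; the class and attribute names are illustrative, not taken from any actual HPSG implementation) encodes the WALKS entry in (2) in that spirit:

    # A minimal sketch of typed feature structures: a node carries a type
    # and a mapping from features to value nodes (or atomic values).
    class FS:
        def __init__(self, type_, **features):
            self.type = type_
            self.features = features

    # The shared node tagged [1] in (2): both the subject NP's INDEX and
    # the WALKER role point at this very object.
    index_1 = FS("index", PERSON="3rd", NUMBER="singular")

    subject_np = FS("synsem",  # the NP[nom]::[1] shorthand, spelled out
                    CATEGORY=FS("category", HEAD=FS("noun", CASE="nom")),
                    CONTENT=FS("nom-obj", INDEX=index_1))

    walks = FS("word",
               PHON=["walks"],
               SYNSEM=FS("synsem",
                         CATEGORY=FS("category",
                                     HEAD=FS("verb", VFORM="finite",
                                             AUX="-", INV="-", PRD="-"),
                                     VALENCE=FS("val",
                                                SUBJECT=[subject_np],
                                                COMPLEMENTS=[],
                                                SPECIFIER=[])),
                         CONTENT=FS("walk", WALKER=index_1)))

    # Structure sharing is token identity of nodes, not mere equality:
    assert walks.features["SYNSEM"].features["CONTENT"].features["WALKER"] \
           is subject_np.features["CONTENT"].features["INDEX"]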
As I just mentioned, in HPSG syntactic category symbols are just abbreviations for certain feature descriptions, as shown in (3).

(3) Some category symbols in HPSG

    NP::[1] =  [ synsem
                 LOCAL [ local
                         CATEGORY [ category
                                    HEAD noun
                                    VALENCE [ valence
                                              SUBJ  <>
                                              COMPS <>
                                              SPR   <> ] ]
                         CONTENT [ nom-obj
                                   INDEX [1] ] ] ]

    S =        [ synsem
                 LOCAL [ local
                         CATEGORY [ category
                                    HEAD verb
                                    VALENCE [ valence
                                              SUBJ  <>
                                              COMPS <>
                                              SPR   <> ] ] ] ]

    VP =       [ synsem
                 LOCAL [ local
                         CATEGORY [ category
                                    HEAD verb
                                    VALENCE [ valence
                                              SUBJ  <[synsem]>
                                              COMPS <>
                                              SPR   <> ] ] ] ]

    N' =       [ synsem
                 LOCAL [ local
                         CATEGORY [ category
                                    HEAD noun
                                    VALENCE [ valence
                                              SUBJ  <>
                                              COMPS <>
                                              SPR   <[synsem]> ] ] ] ]

Thus, an NP is an expression whose head feature -- essentially its part of speech -- is nominal and all of whose valence requirements are satisfied, whereas an S is a verbal expression whose valence requirements are satisfied. But a VP is a verbal expression whose subject valence is unsatisfied, while an N' is a nominal expression whose specifier valence is unsatisfied. In HPSG, as in many constraint-based theories, a lot of the action is in the lexicon. This can be seen by considering a few simplified lexical entries (here many of the features and type symbols are omitted for simplicity):

(4) Some HPSG lexical entries (simplified)

    WALK  [ CATEGORY [ HEAD verb
                       VALENCE [ SUBJ  <NP::[1]>
                                 COMPS <> ] ]
            CONTENT [ walk
                      WALKER [1] ] ]

    SEE   [ CATEGORY [ HEAD verb
                       VALENCE [ SUBJ  <NP::[1]>
                                 COMPS <NP::[2]> ] ]
            CONTENT [ see
                      SEER [1]
                      SEEN [2] ] ]

    GIVE  [ CATEGORY [ HEAD verb
                       VALENCE [ SUBJ  <NP::[1]>
                                 COMPS <NP::[2], PP[to]::[3]> ] ]
            CONTENT [ give
                      GIVER [1]
                      GIVEN [2]
                      RECIPIENT [3] ] ]

    TRY   [ CATEGORY [ HEAD verb
                       VALENCE [ SUBJ  <NP::[1]>
                                 COMPS <VP[inf, SUBJ <NP::[1]>]:[2]> ] ]
            CONTENT [ try
                      TRYER [1]
                      TRIED [2] ] ]

    SEEM  [ CATEGORY [ HEAD verb
                       VALENCE [ SUBJ  <[1]>
                                 COMPS <VP[inf, SUBJ <[1]>]:[2]> ] ]
            CONTENT [ seem
                      ARGUMENT [2] ] ]

Here the WALK entry is just a simplification of the entry given in (2); the SEE entry is a typical transitive verb entry, with the subject index assigned to the SEER role and the object index to the SEEN role; and the GIVE entry is a typical ditransitive verb entry, which assigns three roles. The TRY entry is typical for a subject-control verb: the verbal complement is treated as a VP (not an S), whose unrealized subject is required to be coindexed with the matrix subject. And finally, the SEEM lexical entry typifies raising-to-subject verbs: the entire complex of syntactic and semantic features of the matrix subject is constrained to be identical with those of the unrealized complement subject.
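Since category symbols and lexical entries alike are just descriptions, checking a candidate word against the lexicon comes down to testing whether a feature structure satisfies a partial description. Here is a rough sketch of that satisfaction test (again illustrative Python; real systems use typed unification and check subtyping rather than type equality):

    # Descriptions and feature structures both as nested dicts; a structure
    # satisfies a description iff it carries at least the information the
    # description demands.
    def satisfies(fs, desc):
        for attr, want in desc.items():
            if attr == "TYPE":
                if fs.get("TYPE") != want:   # real systems: subtype check
                    return False
            elif attr not in fs:
                return False
            elif isinstance(want, dict):
                if not satisfies(fs[attr], want):
                    return False
            elif fs[attr] != want:           # atoms and (here) empty lists
                return False
        return True

    # The abbreviation S from (3): a verbal expression, all valence satisfied.
    S = {"CATEGORY": {"HEAD": {"TYPE": "verb"},
                      "VALENCE": {"SUBJ": [], "COMPS": [], "SPR": []}}}

    sentence = {"TYPE": "phrase",
                "CATEGORY": {"HEAD": {"TYPE": "verb", "VFORM": "fin"},
                             "VALENCE": {"SUBJ": [], "COMPS": [], "SPR": []}}}

    print(satisfies(sentence, S))   # True: this phrase is an S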
In HPSG the representation of phrases is similar to the representation of words, except that phrases have an additional feature that words lack, namely the DAUGHTERS feature, whose value is a feature structure called a constituent-structure that encodes information about the phrase's immediate constituents. For example, a very partial AVM description of the phrase KIM SAW SANDY has the form shown in (5):

(5) AVM description of a feature structure representing the sentence KIM SAW SANDY

    [ phrase
      PHON   <Kim, saw, Sandy>
      SYNSEM S[fin]
      DTRS [ SUBJ-DTR <[ phrase
                         PHON   <Kim>
                         SYNSEM NP[nom] ]>
             HEAD-DTR [ phrase
                        PHON   <saw, Sandy>
                        SYNSEM VP[fin]
                        DTRS [ HEAD-DTR  [ word
                                           PHON   <saw>
                                           SYNSEM V[fin] ]
                               COMP-DTRS <[ phrase
                                            PHON   <Sandy>
                                            SYNSEM NP[acc] ]> ] ] ] ]

To make phrase descriptions easier to read, we usually use a tree-like notation, so that (5) comes out looking like (6):

(6)            S[fin]
               /    \
          SUBJ/      \HEAD
             /        \
       NP[nom]       VP[fin]
          |           /   \
         Kim     HEAD/     \COMP
                    /       \
               V[fin]      NP[acc]
                  |           |
                 saw        Sandy

However, it is important to keep in mind that the structure described by (6) is actually a feature structure, not a tree. Of course, not just any old feature structure is a legal linguistic object. To be well-formed, a feature structure has to satisfy the constraints imposed by the grammar. Now technically speaking, constraints in HPSG are expressed in a special formal language called feature constraint logic, but since I don't have time to explain that, I'll just express whatever constraints we need to discuss in a combination of pictures and (I hope) plain English. To start with, we need a set of constraints that are usually called FEATURE GEOMETRY CONSTRAINTS, as described in (7):

(7) Feature geometry constraints specify:
    a. what the types are;
    b. which types are subtypes of other types;
    c. for each type, what features are appropriate for that type;
    d. for each type and each feature appropriate for that type, what types are appropriate for the value of that feature.

One convenient way to present the feature geometry constraints is by using TYPE DECLARATIONS, like the ones shown in (9). The way to read these is explained in (8):

(8) How to read HPSG type declarations

    type0
      type1     "The type type0 has the subtypes type1 and type2."
      type2

    type0: [FEAT1: type1, FEAT2: type2]
                "The type type0 has the features FEAT1 and FEAT2, with
                 values of type type1 and type2 respectively."

(9) Some HPSG type declarations

    sign: [PHON: list[phonstring], SYNSEM: synsem]
      word
      phrase: [DTRS: constit-struc]

    synsem: [CAT: category, CONTENT: content]

    category: [HEAD: head, VALENCE: valence]

    valence: [SUBJ: list[synsem], COMPS: list[synsem], SPR: list[synsem]]

    content
      state-of-affairs
        walk: [WALKER: index]
        see:  [SEER: index, SEEN: index]
      nom-obj: [INDEX: index]

    index: [PER: per, NUM: num, GEND: gend]

    constit-struc
      coord-struc
      headed-struc

    head
      noun: [CASE: case]
      verb: [VFORM: vform, AUX: boolean, INV: boolean]
      prep: [PFORM: pform]
      adjective
      marker
      determiner

    vform
      fin inf ger psp prp pas

    case
      nom acc

    gend
      masc fem neut

    num
      sing plur

    per
      1st 2nd 3rd

    boolean
      plus minus

Very often, an important part of a new analysis in HPSG is revising one or more feature geometry constraints. For example, in his recent research on relative clauses, Ivan Sag eliminates the DTRS feature for phrases. This sounds trivial, but one of the results is that subtypes of phrase can now be used to cross-classify phrases along the lines of construction grammar. (Jong-Bok Kim and Byung-Soo Park's paper on free relatives at this conference adopts this approach.)
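The feature geometry constraints in (9) are themselves just data, and it may help to see them rendered as such. Below is a sketch (illustrative Python, not any published system) of an abridged version of (9), together with the two operations a grammar needs from it: the subtype test and the computation of which features a type carries, including inherited ones:

    # Subtype declarations and feature appropriateness, abridged from (9).
    SUBTYPES = {
        "sign":             {"word", "phrase"},
        "head":             {"noun", "verb", "prep", "adjective",
                             "marker", "determiner"},
        "content":          {"state-of-affairs", "nom-obj"},
        "state-of-affairs": {"walk", "see"},
    }

    APPROPRIATE = {                   # type -> {feature: value type}
        "sign":     {"PHON": "list[phonstring]", "SYNSEM": "synsem"},
        "phrase":   {"DTRS": "constit-struc"},
        "synsem":   {"CAT": "category", "CONTENT": "content"},
        "category": {"HEAD": "head", "VALENCE": "valence"},
        "walk":     {"WALKER": "index"},
    }

    def is_subtype(t, super_t):
        """Reflexive-transitive closure of the declared subtype relation."""
        if t == super_t:
            return True
        return any(is_subtype(t, s) for s in SUBTYPES.get(super_t, ()))

    def appropriate_features(t):
        """Features appropriate for t: inherited ones plus t's own."""
        feats = {}
        for super_t, subs in SUBTYPES.items():
            if t in subs:
                feats.update(appropriate_features(super_t))
        feats.update(APPROPRIATE.get(t, {}))  # a subtype may refine these
        return feats

    print(is_subtype("walk", "content"))            # True
    print(sorted(appropriate_features("phrase")))   # ['DTRS', 'PHON', 'SYNSEM']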
In addition to the feature geometry constraints, we also need constraints to tell us what the well-formed words and phrases are. For words, this is simple (10):

(10) Wellformedness constraint for words:
     Every word must satisfy one of the lexical entries.

For phrases, things are more complicated, since there are several different constraints that must be satisfied. Some of the most important ones are given in (11-13):

(11) Immediate Dominance Constraint
     Every phrase has to satisfy one of the following descriptions (order irrelevant):

     a. head-subject schema

            [SUBJ <>]
             /     \
        SUBJ/       \HEAD
           /         \
       phrase      phrase
                   [COMPS <>, SPR <>]

     b. head-specifier schema

            [SPR <>]
             /     \
         SPR/       \HEAD
           /         \
       phrase      phrase
                   [COMPS <>]

     c. head-complement schema

            [COMPS <>]
             /      \
        HEAD/        \COMPS
           /          \
        word      list of phrases

     d. ...

Basically this constraint offers a handful of options for what a local tree can look like. Thus it is roughly analogous to X-bar theory.

(12) Head Feature Constraint
     For every headed phrase, the head features must be the same as those of the head daughter.

This guarantees that the head features of any word also show up on all the phrasal projections of that word.

(13) Valence Constraint
     For each of the valence features (SUBJECT, COMPLEMENTS, SPECIFIER), the value of that feature on a headed phrase must be the value on the head daughter minus those valence requirements satisfied by one of the nonhead daughters.

In essence, this constraint is analogous to functional application in categorial grammar: it serves to check off valence requirements of lexical heads as they are satisfied. Together, all these constraints on phrasal wellformedness will have the effect that the sentence KIM SAW SANDY in (6) actually looks like the picture in (14):

(14) KIM SAW SANDY revisited

     [ HEAD [3]
       VALENCE [ SUBJ  <>
                 COMPS <>
                 SPR   <> ] ]
           /         \
      SUBJ/           \HEAD
         /             \
    [2]NP[nom]    [ HEAD [3]
         |          VALENCE [ SUBJ  <[2]>
        Kim                   COMPS <>
                              SPR   <> ] ]
                        /        \
                   HEAD/          \COMP
                      /            \
        [ HEAD [3]verb[fin]      [1]NP[acc]
          VALENCE [ SUBJ  <[2]>       |
                    COMPS <[1]>     Sandy
                    SPR   <> ] ]
              |
             saw

We can summarize all of this as in (15):

(15) Wellformedness in HPSG
     A feature structure is well-formed just in case:
     a. every node satisfies the feature geometry constraints;
     b. every word node satisfies one of the lexical entries;
     c. every phrase node satisfies the immediate dominance constraint, the head feature constraint, the valence constraint, etc.

That is the end of the overview. Now let's take a quick look at a couple of representative works in progress. First, a look at some important work by Abeille and Godard on French auxiliaries. This is part of a larger project on French syntax, which also involves Ivan Sag, Philip Miller, and Jong-Bok Kim. Most work in HPSG, as in other syntactic theories, has assumed that the complement of an auxiliary is a phrase, usually a verb phrase, as shown in (16):

(16) Chris has seen Dana. (Fr. Jean a vu Dominique)

               S[fin,+AUX]
              /           \
             /             \
       NP[nom]         VP[fin,+AUX]
          |              /      \
        Chris           /        \
        Jean    V[fin,+AUX]     VP[psp]
                     |           /    \
                    has         /      \
                    a       V[psp]   NP[acc]
                               |        |
                             seen     Dana
                             vu       Dominique

In HPSG, auxiliaries have standardly been analyzed as verbs that are positively specified for the feature AUX, and which have the same valence as a typical raising-to-subject verb:

(17) category of auxiliary HAS

    [ HEAD verb[fin,+AUX]
      VALENCE [ SUBJ  <[1]>
                COMPS <VP[psp, SUBJ <[1]>]> ] ]

Here the VALENCE value is essentially the same as for the raising verb SEEM given in (4).
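Before turning to the French facts, the combined effect of the Head Feature Constraint (12) and the Valence Constraint (13) can be made concrete with a small sketch (illustrative Python only; a real grammar would unify descriptions rather than compare symbols):

    # Head-complement combination under schema (11)c: the mother keeps the
    # head daughter's HEAD value, and its COMPS list is the head daughter's
    # COMPS list minus the requirements the complement daughters satisfy.
    def head_complement(head, comp_daughters):
        need = head["VALENCE"]["COMPS"]
        if len(comp_daughters) != len(need):
            raise ValueError("head's COMPS requirements not all realized")
        for want, dtr in zip(need, comp_daughters):
            if want != dtr:               # real grammars unify here
                raise ValueError("daughter does not match requirement")
        return {"HEAD": head["HEAD"],     # Head Feature Constraint (12)
                "VALENCE": {"SUBJ": head["VALENCE"]["SUBJ"],
                            "COMPS": [],  # checked off, as in (14)
                            "SPR":  head["VALENCE"]["SPR"]}}

    saw = {"HEAD": "verb[fin]",
           "VALENCE": {"SUBJ": ["NP[nom]"], "COMPS": ["NP[acc]"], "SPR": []}}

    vp = head_complement(saw, ["NP[acc]"])
    print(vp["HEAD"], vp["VALENCE"])
    # verb[fin] {'SUBJ': ['NP[nom]'], 'COMPS': [], 'SPR': []}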
Roughly the same kind of auxiliary analysis (a single verbal phrasal complement) has been standardly assumed for French, for example in the influential work on French negation and adverb placement by Pollock in the GB framework. However, there are some mysterious facts about the so-called tense auxiliaries AVOIR `have' and ETRE `be' in French that are not explained by this analysis. I'll mention just two kinds of relevant facts here. First, consider the facts in (18):

(18)a. Paul a bruyamment ri et chante.
       Paul has loudly laughed and sung
       [Adverb can scope over both verbs]
    b. Jean a immediatement pense a une reponse et contre-attaque.
       Jean has immediately thought of an answer and counter-attacked
       [Adverb can only scope over the first verb]

These facts seem difficult to explain if the conjoined complements of the auxiliary are verb phrases. Next, consider the facts in (19), adapted from a recent paper on French and English negation by Jong-Bok Kim and Ivan Sag:

(19)a. Nous n'allons pas ne pas [aller en vacances cette ete].
       We NE go not NE not go on vacation this summer
       `We're not going to not go on vacation this summer.'
    b. *Nous ne sommes pas ne pas alles en vacances cette ete.
       We NE are not NE not gone on vacation this summer
       `We didn't not go on vacation this summer.'

As Kim and Sag argue on the basis of other facts unrelated to these, the French negative word PAS can be either an adjunct to a nonfinite VP, or else a complement to a finite verb. Thus in (19)a, we have two different cases of PAS adjoining to the VP complement ALLER EN VACANCES CETTE ETE. Then why is (19)b bad? Well, the first PAS is a complement to the finite auxiliary SOMMES. Since complements cannot repeat, the only option for the second PAS would be to adjoin to the VP complement. The fact that this is impossible strongly suggests that the auxiliary has no VP complement for the negation to adjoin to. In order to explain these and other facts, Abeille and Godard assume that the correct structure for French auxiliary complementation is NOT as in (16), but rather as shown in (20):

(20)          S[fin,+AUX]
             /           \
            /             \
      NP[nom]         VP[fin,+AUX]
         |             /    |    \
       Jean           /     |     \
           V[fin,+AUX]   V[psp]  NP[acc]
                |           |       |
                a          vu   Dominique

But how can a verb, even an auxiliary, take a lexical verb, not a VP, as its complement? And how does DOMINIQUE, which is intuitively the object of the participle VU, become a complement of the auxiliary? In order to explain this, Abeille and Godard make use of an idea first proposed by Hinrichs and Nakazawa for the analysis of German, and also employed in the analysis of many other languages, such as the account of Korean complex predicates in Chan Chung's recent dissertation. This idea, variously known as argument composition or argument attraction, is embodied in the lexical entry for the French tense auxiliary given in (21):

(21) tense auxiliary AVOIR (simplified from Abeille & Godard)

    [ HEAD verb[VAUX avoir]
      VALENCE [ SUBJ  <[1]>
                COMPS < [ HEAD verb[psp, VAUX avoir]
                          VALENCE [ SUBJ  <[1]>
                                    COMPS [4] ] ] > + [4] ] ]

Here the + sign should be read as list concatenation. What this says in essence is that the complements of AVOIR are a past-participle verb plus whatever complements the participle subcategorizes for. Given this lexical entry, the structure in (20) is generated by already-existing constraints.
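To see what the + in (21) buys us, here is the composition itself in miniature (an illustrative sketch, not Abeille and Godard's formalization): the auxiliary's COMPS list is built by concatenating the participle itself with whatever the participle subcategorizes for.

    # Argument composition (Hinrichs & Nakazawa): the tag [4] in (21) is the
    # participle's own COMPS list, spliced into the auxiliary's COMPS list.
    def compose_comps(participle):
        return [participle] + participle["VALENCE"]["COMPS"]

    vu = {"HEAD": "verb[psp, VAUX avoir]",
          "VALENCE": {"SUBJ": ["NP[nom]"], "COMPS": ["NP[acc]"]}}

    # AVOIR takes VU and VU's object DOMINIQUE as sister complements,
    # yielding the flat VP of (20) rather than the nested VP of (16):
    print(compose_comps(vu))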
The only thing we have to change is the head-complement schema in (11)c: we have to eliminate the requirement that complements always be phrasal.

Next, let's take a quick look at some work in the area of binding theory. Binding theory is concerned with the question of which NPs in a sentence may or may not be the antecedents of different kinds of pronouns. In a 1992 paper, Ivan Sag and I proposed a binding theory for English along the following lines. First of all, we divide referential NPs into three basic types as in (22):

(22) Referential types for NPs (adapted from Pollard & Sag 1992)

     nom-obj
       nonpronoun (npro)
       pronoun (pron)
         personal-pronoun (ppro)  [e.g., SHE, HIM]
         anaphor (ana)            [e.g., HIMSELF, EACH OTHER]

The theory depends crucially on a relationship called OBLIQUENESS, defined in (23):

(23) The OBLIQUENESS relation is defined on the valents of a head as follows:
     1. The subject is less oblique than the specifier and complements.
     2. The specifier is less oblique than the complements.
     3. A complement is less oblique than another complement that occurs later on the COMPS list.

Notice that according to this definition, obliqueness is defined purely in terms of grammatical relations, not on the basis of tree configurations or linear order. Thus, for example, order of obliqueness is determined as in (24):

(24) Order of obliqueness
     a. Kim_1 gave the books_2 to Sandy_3.
     b. Kim's_1 picture of Sandy_2

Next, we define the relationship called O-COMMAND as follows:

(25) O-command
     Given two referential phrases XP and YP,
     a. XP LOCALLY O-COMMANDS YP provided they are valents of the same head and XP is less oblique than YP.
     b. XP O-COMMANDS YP provided XP locally o-commands YP, or a phrase containing YP.

In essence this says that you bind things which are, or are contained in things which are, more oblique than you. Next, binding is defined as in (26):

(26) Binding
     Given two referential phrases XP and YP, XP (LOCALLY) BINDS YP provided:
     a. XP (LOCALLY) o-commands YP, and
     b. XP and YP are coindexed.

Finally, the binding theory is as stated in (27):

(27) Binding theory (for American English)
     a. An anaphor (ana) which is locally o-commanded must be locally bound.
     b. A personal pronoun (ppro) must not be locally bound.

This theory explains the basic facts about English pronoun binding, as in (28), as well as a great many other facts that are unexplained by the so-called standard binding theory of Chomsky 1985, such as the ones in (29):

(28) Basic predictions of Pollard & Sag's binding theory
     a. John_i admires himself_i.
     b. *John_i admires him_i.
     c. John_i thinks Bill_j admires him_i.
     d. *John_i thinks Bill_j admires himself_i.

(29) Predictions of Pollard and Sag's binding theory unexplained by Chomsky 1985
     a. [John and Mary]_i hoped that the journal would reject [each other's]_i papers.
     b. John suggested that tiny portraits of [each other]_i would make ideal gifts for [the twins]_i.
     c. The agreement that [Iran and Iraq]_i reached guaranteed [each other's]_i fishing rights until the year 2010.
     d. John's_i campaign requires that pictures of himself_i be placed all over town.
     e. Mary couldn't decide on birthday presents for [the twins]_i. Maybe tiny portraits of [each other]_i would do.
     f. Iran_i agreed with Iraq_j that [each other's]_k fishing rights must be respected. (k = i+j)
     g. John_i asked Mary_j to send reminders to everyone except themselves_k. (k = i+j)

The basic idea in explaining the facts in (29) is that these are all anaphors that are not locally o-commanded. Therefore, condition a of the binding theory in (27) does not apply to them, and they are not required to have locally o-commanding binders. For such "exempt" anaphors, we assume that the antecedent is determined by nonsyntactic factors, such as narrative point of view.
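Since obliqueness in (23) is defined purely on the valence lists, local o-command is easy to state over them. Here is a rough sketch (illustrative Python; phrases stand in as plain strings):

    # Obliqueness order over a head's valents: SUBJ < SPR < COMPS, with
    # earlier complements less oblique than later ones, per (23).
    def obliqueness_order(head):
        return head["SUBJ"] + head["SPR"] + head["COMPS"]

    def locally_o_commands(head, xp, yp):
        """XP locally o-commands YP iff both are valents of the same head
        and XP is less oblique than YP, per (25)a."""
        order = obliqueness_order(head)
        return (xp in order and yp in order
                and order.index(xp) < order.index(yp))

    # (24)a: Kim_1 gave the books_2 to Sandy_3
    gave = {"SUBJ": ["Kim"], "SPR": [], "COMPS": ["the books", "to Sandy"]}
    print(locally_o_commands(gave, "Kim", "the books"))       # True
    print(locally_o_commands(gave, "the books", "to Sandy"))  # True
    print(locally_o_commands(gave, "to Sandy", "Kim"))        # False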
However, this theory does not work very well when we look at certain British and literary dialects, where we commonly find examples like the ones in (30):

(30) "Lit/Brit" binding facts unexplained by Pollard & Sag's binding theory (after Zribi-Hertz 1989)
     a. because Desiree ... had undoubtedly explained to them the precise nature of her relationship with himself [himself = Philip]
     b. his_i wife ... suspected himself_i to be the cause of her distress
     c. But Rupert_i was not unduly worried about Peter's opinion of himself_i

These facts show that under appropriate discourse conditions, involving such factors as logophoricity, point of view, and emphasis, in these dialects even a locally o-commanded reflexive pronoun need not be locally bound. Turning next to Chinese, we find yet another situation. First of all, in the absence of special discourse conditions, the Chinese reflexive ZIJI is bound by a subject, either the local one or a higher one:

(31) Basic Chinese binding facts: subject binding
     a. Zhangsan_i zhidao Lisi_j xihuan ziji_i/j.
        Zhangsan know Lisi like self
     b. Ta_i zhidao tamen_j dui ziji_i/j mei xinxin.
        s/he know they toward self lack confidence
     c. Zhangsan_i gei-le Lisi_j yizhang ziji_i/*j de xiangpian.
        Zhangsan gave-ASP Lisi one-CL self DE photo

However, there are some unexplained exceptions to this, as shown in (32):

(32) ZIJI contained in an adjunct clause (after Xue & Pollard ms.)
     a. Zhangsan_i shuo [Wangwu_j bu hui qu], [yinwei Lisi_k mei yaoqing ziji_i/*j/k].
        Zhangsan say Wangwu not will go because Lisi have-not invite self
     b. Zhangsan_i shuo [ruguo Lisi_j piping ziji_i/j/*k], [Wangwu_k jiu bu hui qu].
        Zhangsan say if Lisi criticize self Wangwu then not will go
     c. Lisi_i zhidao [Zhangsan_j bu xihuan [neixie [e_k piping ziji_i/j/k] de ren_k]].
        Lisi know Zhangsan not like those criticize self DE person

Here the mystery is why the ZIJI in the relative clause in (32)c can be bound by the next subject up, but not the ZIJI in the adjunct clauses in (32)a-b. We also have to consider facts like those in (33), which show that sometimes ZIJI can have an antecedent which is not a superordinate subject:

(33) a. [Zhangsan_i de jiaoao]_j hai-le ziji_i/*j.
        Zhangsan DE pride harm-ASP self
     b. [Zhangsan_i de xin]_j biaoming Lisi_k hai-le ziji_*i/*j/k.
        Zhangsan DE letter indicate Lisi harm-ASP self
     c. Ziji_i de xiaohai mei de jiang de xiaoxi shi Lisi_i hen nanguo.
        self DE child have-not get prize DE news make Lisi very sad

Here (33)a-b are examples of so-called "subcommanding" antecedents, while in (33)c the antecedent is an experiencer object of a causative. Now let me sketch an analysis of these facts, which is based partly on some ongoing work on Chinese with Ping Xue, and partly on some work in progress on English by Karin Golde. The first step is to eliminate the classification of NP types in (22) and replace it with the new one in (34):

(34) Referential types for NPs (after Golde in progress)

     nom-obj
       nonpronoun (npro)
       pronoun (pron)
         personal-pronoun (ppro)  [e.g., SHE, HIM, TA]
         self-pronoun (self-pro)  [e.g., HIMSELF, ZIJI]

As before, pronouns are divided into two subtypes, but the new classification is based on morphological properties, not binding properties.
The binding theory for English, both American and Lit/Brit, now takes the form in (35):

(35) Binding theory for English (American and Lit/Brit)
     a. Self-pronouns which are not locally bound are subject to certain discourse constraints [which remain to be specified].
     b. A personal pronoun (ppro) must not be locally bound.

This permits examples like the ones in (30), as long as the appropriate discourse conditions are satisfied. But American English has the further constraint in (36):

(36) Additional constraint for American English
     Self-pronouns which are locally o-commanded must be locally bound.

Because of (36), examples like (30) are excluded in American English. Thus in American English, the only self-pronouns which are REQUIRED to satisfy discourse constraints are the ones that are NOT locally o-commanded. And finally, the binding theory we propose for Chinese is as shown in (37):

(37) Binding theory for Chinese
     a. Self-pronouns which are not bound by a subject are subject to certain discourse constraints [which remain to be specified].
     b. [same as for English]

This reanalysis reveals a striking similarity between Chinese and English, especially "Lit/Brit" English. The key difference is that in English, the only way a self-pronoun can escape the discourse constraints is to be locally bound. In Chinese, by contrast, it is sufficient to be bound, but the binder must be a subject; otherwise, as with examples (33a,c), further discourse conditions must be present. The only thing left to explain is the mysterious contrast in (32): why can't WANGWU be the antecedent of ZIJI in (32)a? The answer is that WANGWU fails to o-command ZIJI, because the subordinate clause is an adjunct, not a valent. By contrast, in (32)c, ZHANGSAN can be the antecedent of ZIJI because it o-commands ZIJI. This is because ZIJI is contained in the direct object of the lower clause, which IS a valent.
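The parallel between (27) and (35)-(37) can be brought out by treating the three dialects as settings of one predicate. The sketch below is only illustrative (the boolean discourse_ok is a stand-in for the discourse constraints left unspecified above):

    # Well-formedness of a self-pronoun under the American English theory
    # (27)/(36), the Lit/Brit theory (35), and the Chinese theory (37).
    def self_pronoun_ok(p, dialect):
        if dialect == "American":
            # (36): locally o-commanded self-pronouns must be locally bound
            if p["locally_o_commanded"] and not p["locally_bound"]:
                return False
            # exempt anaphors answer to discourse factors instead
            return p["locally_bound"] or p["discourse_ok"]
        if dialect == "LitBrit":
            # (35)a: locally bound, or else licensed by discourse
            return p["locally_bound"] or p["discourse_ok"]
        if dialect == "Chinese":
            # (37)a: bound by a subject, or else licensed by discourse
            return p["bound_by_subject"] or p["discourse_ok"]

    # (30)b: a locally o-commanded HIMSELF that is not locally bound --
    # out in American English, possible in Lit/Brit given discourse support:
    himself = {"locally_o_commanded": True, "locally_bound": False,
               "bound_by_subject": False, "discourse_ok": True}
    print(self_pronoun_ok(himself, "American"))  # False
    print(self_pronoun_ok(himself, "LitBrit"))   # True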