Exercise 4, Grammar Engineering

 

Coordination

Exercise: Expand grammar4.lfg to include coordination. (Make sure to also download english.infl.patch.full.fst, which this grammar requires.) Read up on how to do this in the section on the Starter Grammar in the XLE documentation and in the documentation on Regular Expression Macros.
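A minimal sketch of one way to do this follows. Several names are assumptions for illustration: the macro name SCCOORD, the categories D and CONJ, the NP expansion (D) N, and the feature COORD-FORM. Adapt them to what grammar4.lfg actually uses. Check the documentation on Regular Expression Macros for where macro definitions belong. In XLE grammar files, text in double quotes is a comment.

"Regular-expression macro (hypothetical name): two constituents of
 category _CAT around a conjunction; each conjunct's f-structure (!)
 is an element ($) of the set-valued f-structure of the mother (^)."
SCCOORD(_CAT) = _CAT: ! $ ^;
                CONJ
                _CAT: ! $ ^.

"NP rule: the original expansion (assumed here to be (D) N) or a
 coordination of NPs."
NP --> { (D) N
       | @(SCCOORD NP) }.

"Lexicon entry for the conjunction (hypothetical): COORD-FORM is
 assumed to be declared under NONDISTRIBUTIVES in the CONFIG section,
 so that it stays on the set instead of distributing to the conjuncts."
and  CONJ * (^ COORD-FORM) = and.

Other categories can be made coordinable in the same way, by adding a disjunct such as @(SCCOORD VP) to their rules.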

The following sentences should work:

 

METARULEMACRO

XLE has a built-in facility called the METARULEMACRO: a special macro that lets the grammar writer express generalizations that hold for all the rules of a grammar. In the ParGram grammars, it is generally used at least for coordination.

Exercise: Modify your treatment of coordination. Comment out the calls to the regular-expression macros that you added to the individual c-structure rules, and have the METARULEMACRO call those regular-expression macros instead.
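The following is a sketch of what the METARULEMACRO might look like. It assumes a same-category coordination macro called SCCOORD; this name is hypothetical, so substitute the macro you wrote for the previous exercise. XLE passes the METARULEMACRO three things: the category of the rule, its base category, and its right-hand side. The parameter names _CAT, _BASECAT and _RHS are illustrative, so check their order against the XLE documentation.

"Every rule's right-hand side (_RHS) becomes a disjunction between the
 original expansion and a coordination of the rule's own category."
METARULEMACRO(_CAT _BASECAT _RHS) =
   { _RHS
   | @(SCCOORD _CAT) }.

"Individual rules drop their own coordination calls (hypothetical NP
 rule; the old call is kept as a comment):"
NP --> (D) N.   "was: NP --> { (D) N | @(SCCOORD NP) }."

Because the METARULEMACRO applies to every rule, this version lets every category coordinate, which may include ROOT and the categories built by sublexical rules. You may want to restrict it once the basic version works.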

 

Adding a Finite-State Morphological Analyzer

Preliminaries

In this exercise, you will practice integrating a morphological analyzer into your grammar. So far, you have been working with a full-form lexicon. This gives you full control over the lexical entries in your grammar, but it is also very tedious, because you have to write a separate lexical entry for each inflected (or derived) form.

In this exercise we will work with a version of the finite-state morphological analyzer called english.infl.patch.full.fst, which is part of the English ParGram grammar.

Alternatively, you can build your own finite-state morphological analyzer and hook it up, or use a different type of analyzer altogether. Details on the morphology-grammar interface can be found in the Starter Notes and in the XLE Morphology Section.

You can use grammar4.lfg as a starting point. It already contains a sublexical rule for verbs as well as several lexical entries for morphological tags.

Extending the Grammar: Nouns and Adjectives

grammar4.lfg already interacts with the morphological analyzer with respect to verbs.

Now expand the grammar so that nouns and adjectives also come from the morphological analyzer.

In general, proceed as follows:

  1. In this exercise, the morphological analyzer is a "black box" for us: we know what goes into it, but not how it works internally. To see the output of the morphological analyzer from within XLE, type "morphemes some-word". For example:

     % morphemes bananas
     analyzing {bananas}
     {bananas "+Token"|banana "+Noun"  "+Pl"}

     If this works, the morphological analyzer has been loaded as part of the grammar.
  2. The "morphemes" command shows you the output of the morphological analyzer. Use this knowledge to integrate the relevant information into the grammar:
    1. Make sure that the MORPH ENGLISH LEXICON section contains an entry for every tag the analyzer produces ("+Noun" and "+Pl" in the example above), and add any that are missing.
    2. Decide what functional information you want associated with each tag. Where possible, use existing templates from your grammar.
    3. You can also decide to associate no f-annotation with a tag, for example: "+Verb V-POS XLE ."
    4. Write sublexical rules in the MORPH ENGLISH RULES section that parse the stem and all its tags in the right order.
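For "bananas" (output banana "+Noun" "+Pl"), the steps above might lead to entries and a rule like the following. This is a sketch only, and several parts are assumptions:

- The tag categories N-POS and N-NUM are hypothetical names, modelled on V-POS in the +Verb example.
- The stem category N-S is the one assigned by the -unknown entry.
- The +Sg tag for singular nouns is an assumption; check it with "morphemes".
- Compare the _BASE suffix on the sublexical categories with the verb rule already in grammar4.lfg.

"MORPH ENGLISH LEXICON (hypothetical tag categories N-POS, N-NUM)"
+Noun  N-POS XLE .
+Sg    N-NUM XLE (^ NUM) = sg.
+Pl    N-NUM XLE (^ NUM) = pl.

"MORPH ENGLISH RULES: stem, then part-of-speech tag, then number tag,
 in the order in which the analyzer outputs them"
N --> N-S_BASE
      N-POS_BASE
      N-NUM_BASE.

Adjectives work the same way. Run "morphemes" on an adjective such as "lazy" to see which tags follow the ADJ-S stem, then add entries and a rule for them.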

Make sure that the following sentences work, with the nouns, adjectives, and verbs all coming from the morphological analyzer:

    1. Sophie educated the bright robot.
    2. lazy leopards sleep.
    3. the curious leprechaun stole beer in Ireland.

Note:

    1. For verbs, you need to specify the head word (lemma) and the relevant subcategorization information in the lexicon.
    2. For nouns and adjectives, this is not necessary, because the "-unknown" entry in the morph-lex.lfg file treats any word that is not in the lexicon as either a noun or an adjective. So unless you want to specify extra information for a particular lemma, you do not need entries for nouns and adjectives in your lexicon. Try deleting (or commenting out) all the ones you have entered and see whether your testsuite still works.
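For the verbs in the test sentences above, the headwords are the lemmas the analyzer returns, not the inflected forms: for example, steal rather than stole (check with "morphemes"). The following is a sketch, assuming the verb stem category V-S and the TRANS template that appear in the "dog" entry in this document, plus a hypothetical INTRANS template. Use whatever subcategorization templates grammar4.lfg actually defines.

"Hypothetical verb entries keyed on the lemma; each template is
 assumed to supply the PRED value and the subcategorization frame."
educate  V-S XLE @(TRANS educate).
steal    V-S XLE @(TRANS steal).
sleep    V-S XLE @(INTRANS sleep).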

The unknown entry

In the previous exercise, you added a finite-state morphological analyzer to your grammar. The MORPH ENGLISH LEXICON section contains an "-unknown" entry, which allows the morphological analyzer to pass its knowledge about lexical items into the grammar:

 
-unknown  ADJ-S XLE (^ PRED) = '%stem';
          N-S XLE (^ PRED) = '%stem'.

This entry causes any word not present in the lexicon sections of grammar4.lfg to be guessed as either a noun or an adjective, with the stem returned by the analyzer ('%stem') as its PRED value. If the analyzer's output for the word then matches the sublexical rules in the MORPH ENGLISH RULES section, the grammar can parse it. All count nouns and adjectives should therefore now be parseable by your grammar without further ado. Delete all your existing count noun and adjective entries and try it out.

Lexicon Edit Entries

Your grammar contains several words that can belong to more than one part of speech (for example, "dog" has been encoded both as a verb and as a noun). When you delete the noun entry for "dog", you need to make sure that the remaining verb entry interacts properly with the -unknown entry in the morph-lex-10.lfg file. To do this, specify with ETC that the verb entry is not the only entry:

 
dog       +V-S XLE @(TRANS dog);
          ETC.

Read more about the interactions between lexical entries and lexicons in XLE Lexicon Entries and Lookup Model.


Please submit your exercises and your testsuite to Martin Forst by 8 pm.


Relevant Reading Material

The Grammar Writer's Cookbook, Ch. 12

Kaplan, Ron, John T. Maxwell III, Tracy Holloway King and Richard Crouch. 2004. Integrating Finite-state Technology with Deep LFG Grammars. In Proceedings of the ESSLLI04 Workshop on Combining Shallow and Deep Processing for NLP.

Starter Notes

XLE Morphology Section

XLE Lexicon Entries and Lookup Model