Exercise 7, Grammar Development

Towards South Asian Grammars

Solving the Script Problem

The XLE Documentation contains sections that tell you how to set up XLE and Emacs for different types of scripts.

In our experience, what works best is to set the character encoding in the configuration file of your grammar, as in the line below, and to create your testsuites with a UTF-8-friendly editor.

CHARACTERENCODING utf-8.
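As a rough illustration of where this line goes, here is a minimal sketch of a configuration section with the encoding entry added. The grammar name (TOY ENGLISH) and the other fields are assumptions modelled on a typical toy grammar; keep whatever your own configuration already contains and only add the CHARACTERENCODING line.

```
"Sketch only: grammar name and field values are assumptions"

TOY   ENGLISH   CONFIG (1.0)
  ROOTCAT   S.
  FILES  .
  LEXENTRIES   (TOY ENGLISH).
  RULES   (TOY ENGLISH).
  TEMPLATES   (TOY ENGLISH).
  GOVERNABLERELATIONS    SUBJ OBJ OBJ2 COMP XCOMP OBL.
  SEMANTICFUNCTIONS    ADJUNCT TOPIC.
  EPSILON   e.
  CHARACTERENCODING utf-8.

----
```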

You should also tell XLE that it will be receiving UTF-8 input, as shown below. You can set this in XLE every time you start a process, or you can set up an XLE configuration file and specify it there. The current configuration file for the Urdu grammar is provided in xlerc as a model. You need to name the file "xlerc" for XLE to find it. When an xlerc file exists in the directory from which you start XLE (the directory containing your grammar files), XLE reads it in and configures itself accordingly; cf. the documentation on the Tcl Shell Interface for a description.

set-character-encoding stdio utf-8
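A minimal xlerc might look like the sketch below. It is not a copy of the Urdu grammar's xlerc; the grammar file name is a placeholder, and loading the grammar from xlerc with create-parser is optional (you can equally type that command in XLE yourself).

```
# xlerc: read by XLE at startup from the current directory
# tell XLE to expect UTF-8 on standard input/output
set-character-encoding stdio utf-8

# optional: load the grammar right away (placeholder file name)
create-parser your-grammar.lfg
```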

Rather than opening the testsuite within Emacs, what then works well is to use the parse-testfile command. To get information on this command, type the following while in XLE:

help parse-testfile

The version we want takes two parameters: the name of a testsuite and the number of the sentence you want to parse.

parse-testfile your-testsuite.lfg 3

Keep working on re-engineering the toy English grammar into the language you are interested in. Add some lexical items in your language and see if you can parse a simple sentence.

Implementing Case Marking

The grammar in grammar7.lfg contains some case markers. The approach taken in this grammar is the less elegant but conceptually simpler one, in which the possible range of case options is specified within the GF template.

The grammar in grammar7-iofu.lfg contains the same case markers, but they now specify which grammatical function they need to appear with via inside-out functional application. The GF template is kept simple.
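To give an idea of what the inside-out approach looks like, here is a sketch of two case-marker entries. It is not copied from grammar7-iofu.lfg: the category name K and the case values erg and acc are assumptions, so check the file for the names actually used. The first line of each entry contributes the case value; the inside-out designator on the second line requires that the f-structure of the case-marked phrase be the value of the named grammatical function in some containing f-structure.

```
"Hypothetical entries: category K and the values erg/acc are assumptions"

ERG  K * (^ CASE) = erg
         (SUBJ ^).      "must appear within a SUBJ"

ACC  K * (^ CASE) = acc
         (OBJ ^).       "must appear within an OBJ"
```

With entries of this kind, the case marker itself restricts which grammatical function it can appear with, which is why the GF template no longer has to list the case options.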

The case markers in each file have been represented abstractly in that the lexical entries are ERG and ACC. Substitute the actual forms your language uses for these in your grammar. Following either of the two examples, introduce case markers into your grammar for your language. If your language uses morphological case, then add the information to the full form of your lexical entry (if you had a morphological analyzer, the information could instead be introduced by a relevant tag).

Extend the analysis by including ditransitive verbs, i.e., include a treatment of dative OBJ2.

If you dare, work on dative subjects as well (if not, add other types of clauses and case markers to your grammar).

Please submit your grammar and your testsuite to Miriam Butt ( at uni-konstanz dot de) by the 10th of September at 9 am.