Course Materials

General Reading

Grammar Engineering

Grammar Writer’s Cookbook (Butt et al. 1999)

Lexical-Functional Grammar (Dalrymple 2001)

XFST – The Finite-State Morphology Book (Beesley and Karttunen 2003)

Day 01

Grammar Engineering

Lecture slides: colombo2014_01

Exercises: ex1-grammardev grammar1

Information Extraction

Lecture slides: 01_introduction_motivation

Sarawagi, Information Extraction: sarawagi (table of contents)

Manning, Raghavan, and Schütze, Introduction to Information Retrieval: website

Day 02

Grammar Engineering

Lecture slides: colombo2014_02

Exercises: ex2-grammardev grammar2

Information Extraction

Reading assignment: Chapter 1 of Sunita Sarawagi's Information Extraction paper. Read Chapter 2 as well if time permits.

Day 03

Grammar Engineering

Lecture slides: colombo2014_03

Exercises: ex3-grammardev grammar3

Information Extraction

Lecture slides: 02_IE_scenario_source_regular_classes

Day 04

Grammar Engineering

Lecture slides: colombo2014_04

Exercises: ex4-grammardev grammar4

Information Extraction

Lecture slides (two sets): 03_rule_based_NER, 04_NER_lecture_2

Exercise:

civil_rights_assign

aacrm_text

civil_rights_notes (look here for sort examples)

civil_rights_multiple_dates_py

civil_rights_solution_pl

Day 05

Grammar Engineering

Lecture slides: colombo2014_05

Exercises: ex5-grammardev grammar5

Information Extraction

Lecture slides: 05_NER_lecture_3

Day 06

Grammar Engineering

Lecture slides: colombo2014_06

Exercises: ex6-grammardev grammar6

Information Extraction

Lecture slides: 06_decision_trees

Extractor: extract_001_pl

Shell scripts:

get_minitest_sh

get_minitrain_sh

get_smalltrain_sh

label_small_simple_sh

train_mini_simple_sh

train_small_simple_sh

(Wapiti CRF and Andrew McCallum’s data page, CMU Seminar Announcements)

Day 07

Grammar Engineering

Lecture slides: colombo2014_07

Exercises: ex7-grammardev grammar7 grammar7-iofu

xlerc file: xlerc

Information Extraction

Lecture slides: 07_relation_extraction

Tar file with new extractor, splits, and experiment script: export_20140909.tar

New files: unigram_bigram_pattern 

extract_003.pl

label_dev_sh

Command line: sh run_train_test.sh >& run_train_test.sh.log

Do not forget to change run_train_test.sh to use the new extractor and the new pattern file!

showlines script (used to create the split; rename the downloaded file to “showlines”): showlines

Day 08

Grammar Engineering

Lecture slides: colombo2014_08

Exercises: ex8-grammardev grammar8

Information Extraction

Lecture slides: 08_ontological_and_open_IE

Day 09

Grammar Engineering

Lecture slides: colombo2014_09

Exercises: ex9-grammardev grammar9

Information Extraction

Lecture slides: 09_multilingual_extraction

Sequence labeling: export_20140911.tar

Day 10

Grammar Engineering

Lecture slides: colombo2014_10 urdu-syntax

Grammar solution for Exercise 9: grammar10

Information Extraction

Lecture slides: SMT_morphologically_rich

Description of machine learning assignments: machine_learning_assignments

Learning Perl:

http://qntm.org/files/perl/perl.html (Learn Perl in about 2 hours 30 minutes)

www.mawode.com/~waltman/talks/nlp_ppw.pdf (Natural language processing in Perl)

http://learn.perl.org/tutorials/

http://perl-tutorial.org/

Machine learning:

Tom Mitchell, “Machine Learning” (textbook)

http://www.meta-net.eu/meta-research/training/machine-learning-tutorial/

Day 11

Optical Character Recognition

Lecture-1 Slides: OCR Module Lecture -1 0

Day 12

Optical Character Recognition

Lecture-2 Slides: OCR Module Lecture-2 1

Day 13

Optical Character Recognition

Lecture-3 Slides: OCR Module Lecture-3

Day 14

Optical Character Recognition

Lecture-4 Slides: OCR Module Lecture-4