Course Materials B

General Reading

Corpus-Based Lexical Semantics

R

  • Obtaining R and related software
  • A selection of textbooks on R programming
    • Wickham, Hadley (2014). Advanced R. The R Series. Chapman & Hall/CRC. Free online version at http://adv-r.had.co.nz/. [textbook for programmers]
    • Chambers, John M. (2008). Software for Data Analysis: Programming with R. Statistics and Computing. Springer, 1st edition.
    • Field, Andy; Miles, Jeremy; Field, Zoë (2012). Discovering Statistics Using R. SAGE Publications, Thousand Oaks. [focus on statistics, but seems to be very popular]
    • Kabacoff, Robert I. (2011). R in Action: Data analysis and graphics with R. Manning, Shelter Island, NY.
    • Teetor, Paul (2011). R Cookbook. O’Reilly Media, Sebastopol, CA.
    • Bloomfield, Victor A. (2014). Using R for Numerical Analysis in Science and Engineering. Chapman & Hall/CRC.
    • Wickham, Hadley (2009). ggplot2: Elegant Graphics for Data Analysis. Springer, Heidelberg, New York.
    • Chang, Winston (2012). R Graphics Cookbook. O’Reilly Media, Sebastopol, CA.
    • Murray, Steven (2013). Learn R in a Day. SJ Murray. [reasonably cheap Kindle ebook]
    • Knell, Robert J. (2013). Introductory R: A Beginner’s Guide to Data Visualisation and Analysis using R. Book homepage: http://www.introductoryr.co.uk/. [reasonably cheap ebook]
  • Free online tutorials and guides
    • Kuhnert, Petra and Venables, Bill (2005). An introduction to R: Software for statistical modelling & computing. Lecture notes, CSIRO Mathematical and Information Sciences. Script & Data (ZIP archive)
    • Andrew Robinson’s IcebreakeR is a very compact introduction to R: IcebreakeR (PDF)
    • The R Inferno by Patrick Burns is a must-read for any serious R programmer: R Inferno (PDF)
    • Tom Short’s R Reference Card is a bit old, but very useful during the first weeks: Reference Card (PDF)
    • Reference card “Data Mining” by Yanchang Zhao: Reference Card (PDF)

Distributional Semantics

  • Review papers
    • Turney, Peter D. and Pantel, Patrick (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188. (PDF)
    • Clark, Stephen (in press). Vector space models of lexical meaning. In S. Lappin and C. Fox (eds.), Handbook of Contemporary Semantics. Wiley-Blackwell, 2nd edition. (PDF)
    • Erk, Katrin (2012). Vector space models of word meaning and phrase meaning: A survey. Language and Linguistics Compass, 6(10), 635–653.
  • Some well-known distributional semantic models
    • Landauer, Thomas K. and Dumais, Susan T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2), 211–240.
    • Schütze, Hinrich (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123. (PDF)
    • Baroni, Marco and Lenci, Alessandro (2010). Distributional Memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4), 673–712. (PDF)

Other Software Packages

We will install the software together in the second week. Corpora and other large data sets will be provided through the server or on a USB memory stick.

Day 01

Optical Character Recognition

 

Day 02

Corpus-Based Lexical Semantics

  • Distributional semantics tutorial – part 1: Slides / Handout (PDF)

Optical Character Recognition

Day 03

Corpus-Based Lexical Semantics

  • First steps in R: first_steps_in_R (ZIP)
  • Exercise: exercise_1 (PDF)
    – install the corpora package from CRAN first
    – additional R commands (plots, data analysis) are described in SIGIL Unit #1 (PDF)

Optical Character Recognition

Day 04

Corpus-Based Lexical Semantics

  • Using the wordspace package: using_wordspace (ZIP)
  • Exercise: try to solve the questions (Q:) and exercises (Exercise:, Homework:) included in the R scripts

Optical Character Recognition

 

Day 05

Corpus-Based Lexical Semantics

  • Distributional semantics tutorial – part 3: Slides / Handout (PDF)
  • Additional evaluation experiments on paradigmatic vs. syntagmatic relations: Lapesa et al. (2014) (PDF)
  • Homework: carry out your own DSM evaluation experiments using the tasks included in the wordspace and wordspaceEval packages and the BNC co-occurrence data provided on the server.

Optical Character Recognition

Lecture-5 Slides: OCR Module Lecture-5

Day 06

Corpus-Based Lexical Semantics

 

Optical Character Recognition

Lecture-6 Slides: OCR Module Lecture-6

Day 07

Corpus-Based Lexical Semantics

(continuation of the Day 06 topic: WordNet and WSD)

Optical Character Recognition

Lecture-7 Slides: OCR Module Lecture-7

Day 08

Corpus-Based Lexical Semantics

 

Optical Character Recognition

Lecture-8 Slides: OCR Module Lecture-8 0.1

Day 09

Corpus-Based Lexical Semantics

Optical Character Recognition

Nepali Group OCR: npl_Group – Nepali OCR

Sinhala Group OCR: sin_Group – Sinhala OCR

Sinhala Group-2 OCR: sin_Group2 – Sinhala OCR

Sindhi Group OCR: snd_Group – Sindhi OCR

Tamil Group OCR: tam_Group – Tamil OCR

Urdu Group OCR: urd_Group – Urdu OCR