Advances in Methods and Evaluations for Distributional Semantic Models using Computational Lexicons

Hahn, Meera Satya (2016)
Honors Thesis (59 pages)
Committee Chair / Thesis Adviser: Choi, Jinho
Committee Members: Xiong, Li ; Seitaridou, Effrosyni
Research Fields: Computer science; Artificial intelligence; Engineering
Keywords: Natural Language Processing; Word Embedding; Word Vector; Artificial Intelligence; Distributional Semantics; Computational Linguistics; Sentiment Analysis
Program: College Honors Program, Computer Science
Permanent url: http://pid.emory.edu/ark:/25593/rj67f

Abstract

Word embedding has drastically changed the field of natural language processing and has become the norm for distributional semantic models. Previous methods for generating word embeddings did not take advantage of the semantic information in sentence structures. In this work, we create a new approach to word embedding that leverages structural data from sentences to produce higher-quality word embeddings. We also introduce a framework to evaluate word embeddings for any part of speech. We use this framework to assess the quality of word embeddings produced with different semantic contexts and show that sentence structure is rich with semantic information. Our evaluations show that our new word embeddings far outperform the original word embeddings in all parts of speech. Furthermore, we examine the task of sentiment analysis to demonstrate the superiority of our system's word embeddings.

Table of Contents

1 Introduction
1.1 Thesis Statement
2 Background
2.1 Natural Language Structures
2.1.1 Parts of Speech
2.1.2 Dependency Structure
2.1.3 Predicate Argument Structure
2.2 Lexicon Databases
2.2.1 WordNet
2.2.2 VerbNet
2.3 Distributional Semantics
2.4 Related Work
2.4.1 Skip Gram and Continuous Bag of Words
2.4.2 Dependency Based Word Embeddings
3 Approach
3.1 System Overview
4 Experiments
4.1 WordNet Evaluations
4.1.1 WordNet Similarity
4.1.2 Evaluation Methodology
4.1.3 Ranking Correlation
4.1.4 Result Analysis
4.1.5 Context Window Size Evaluation
4.2 VerbNet Evaluations
4.2.1 Evaluation Methodology
4.2.2 Result Analysis
4.3 Extrinsic Evaluations on Sentiment Analysis
4.3.1 Background on Convolutional Neural Networks
4.3.2 System Overview
4.3.3 Result Analysis
5 Conclusions and Future Work
Appendix A - Tables of Evaluation Results
Appendix B - Semantic Roles and Their Functions

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.