Tutorials on getting started with PyTorch and TorchText for sentiment analysis.

PyTorch Sentiment Analysis

This repo contains tutorials covering how to do sentiment analysis with PyTorch 0.4 and TorchText 0.3 on Python 3.6.

The first 2 tutorials cover getting started with the de facto approach to sentiment analysis: recurrent neural networks (RNNs). The third notebook covers the FastText model, the fourth covers a convolutional neural network (CNN) model, and the fifth applies the CNN model to a dataset with more than 2 classes.

There are also 2 bonus "appendix" notebooks. The first covers loading your own datasets with TorchText, while the second contains a brief look at the pre-trained word embeddings provided by TorchText.

If you find any mistakes or disagree with any of the explanations, please do not hesitate to submit an issue. I welcome any feedback, positive or negative!

Getting Started

To install PyTorch, see installation instructions on the PyTorch website.

To install TorchText:

pip install torchtext

We'll also use spaCy to tokenize our data. To install spaCy, follow the instructions on the spaCy website, making sure to install the English models with:

python -m spacy download en

Tutorials

  • 1 - Simple Sentiment Analysis

    This tutorial covers the workflow of a PyTorch with TorchText project. We'll learn how to: load data, create train/test/validation splits, build a vocabulary, create data iterators, define a model and implement the train/evaluate/test loop. The model will be simple and achieve poor performance, but this will be improved in the subsequent tutorials.

  • 2 - Upgraded Sentiment Analysis

    Now that we have the basic workflow covered, this tutorial will focus on improving our results. We'll cover: using packed padded sequences, loading and using pre-trained word embeddings, different optimizers, different RNN architectures, bi-directional RNNs, multi-layer (aka deep) RNNs and regularization.

  • 3 - Faster Sentiment Analysis

    After we've covered all the fancy upgrades to RNNs, we'll look at a different approach that does not use RNNs. More specifically, we'll implement the model from Bag of Tricks for Efficient Text Classification. This simple model achieves performance comparable to the Upgraded Sentiment Analysis model, but trains much faster.

  • 4 - Convolutional Sentiment Analysis

    Next, we'll cover convolutional neural networks (CNNs) for sentiment analysis. This model will be an implementation of Convolutional Neural Networks for Sentence Classification.

  • 5 - Multi-class Sentiment Analysis

    Finally, we'll cover the case where we have more than 2 classes, as is common in NLP. We'll be using the CNN model from the previous notebook and a new dataset which has 6 classes.
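
The data-handling steps named in tutorial 1 (creating splits, building a vocabulary, turning tokens into indices) can be sketched in plain Python. This is only an illustration of the idea, not the TorchText API and not the code used in the notebooks. The function names `split_examples`, `build_vocab`, `numericalize` and `pad_batch`, the toy reviews, and the `<unk>`/`<pad>` token strings are all assumptions made for this example.

```python
import random
from collections import Counter

UNK, PAD = "<unk>", "<pad>"  # assumed names for the special tokens


def split_examples(examples, valid_ratio=0.2, seed=1234):
    """Shuffle a copy of the examples and split it into (train, valid)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


def build_vocab(token_lists, max_size=None, min_freq=1):
    """Map each sufficiently frequent token to an integer index."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    itos = [UNK, PAD]  # special tokens take indices 0 and 1
    for tok, freq in counts.most_common(max_size):
        if freq >= min_freq:
            itos.append(tok)
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    """Replace tokens with indices; unknown tokens map to UNK."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


def pad_batch(sequences, pad_idx):
    """Right-pad every sequence to the length of the longest one."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [pad_idx] * (longest - len(seq)) for seq in sequences]


# Toy data standing in for a real dataset (made up for this sketch).
reviews = [
    ("this film is great".split(), "pos"),
    ("this film is terrible".split(), "neg"),
    ("great acting and a great story".split(), "pos"),
    ("a boring story".split(), "neg"),
    ("i loved it".split(), "pos"),
]

train, valid = split_examples(reviews)
# The vocabulary is built from the training split only.
stoi = build_vocab(tokens for tokens, _ in train)
batch = pad_batch([numericalize(tokens, stoi) for tokens, _ in train], stoi[PAD])
```

Building the vocabulary from the training split alone means any word that appears only in the validation or test data is mapped to the unknown token, which mirrors what the model would face on unseen text.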

Appendices

  • A - Using TorchText with Your Own Datasets

    The tutorials use TorchText's built-in datasets. This first appendix notebook covers how to load your own datasets using TorchText.

  • B - A Closer Look at Word Embeddings

    This appendix notebook takes a brief look at the pre-trained word embeddings provided by TorchText, using them to find similar words and to implement a basic spelling error corrector based entirely on word embeddings.
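
As a rough illustration of what "looking at similar words" means, the sketch below ranks words by the cosine similarity of their vectors. It assumes cosine similarity as the measure (the notebook may use a different one), and the 3-dimensional vectors in `toy_embeddings` are made-up values, not real pre-trained embeddings. The names `cosine_similarity` and `closest_words` are hypothetical helpers for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_words(word, embeddings, n=2):
    """Return the n words whose vectors are most similar to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up 3-dimensional vectors; real embeddings have far more dimensions.
toy_embeddings = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "film": [0.0, 0.1, 0.9],
}

neighbours = closest_words("good", toy_embeddings, n=2)
```

With these toy values, "great" ranks as the nearest neighbour of "good", because the two vectors point in almost the same direction.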