Differentiable Neural Computers, Sparse Access Memory and Sparse Differentiable Neural Computers, for Pytorch
Differentiable Neural Computer, for Pytorch
This is an implementation of Differentiable Neural Computers, described in the paper Hybrid computing using a neural network with dynamic external memory, Graves et al.
Install

```bash
pip install dnc
```
Architecture
Usage
Parameters:
The constructor takes the following parameters:

| Argument | Default | Description |
| --- | --- | --- |
| `input_size` | `None` | Size of the input vectors |
| `hidden_size` | `None` | Size of the hidden units |
| `rnn_type` | `'lstm'` | Type of recurrent cells used in the controller |
| `num_layers` | `1` | Number of layers of recurrent units in the controller |
| `num_hidden_layers` | `2` | Number of hidden layers per layer of the controller |
| `bias` | `True` | Whether the controller layers use bias weights |
| `batch_first` | `True` | Whether data is fed batch first |
| `dropout` | `0` | Dropout between layers in the controller |
| `bidirectional` | `False` | Whether the controller is bidirectional (not yet implemented) |
| `nr_cells` | `5` | Number of memory cells |
| `read_heads` | `2` | Number of read heads |
| `cell_size` | `10` | Size of each memory cell |
| `nonlinearity` | `'tanh'` | Non-linearity of the RNNs when `rnn_type` is `'rnn'` |
| `gpu_id` | `-1` | ID of the GPU, `-1` for CPU |
| `independent_linears` | `False` | Whether to use independent linear units to derive the interface vector |
| `share_memory` | `True` | Whether to share memory between controller layers |
The forward pass takes the following parameters:

| Argument | Default | Description |
| --- | --- | --- |
| `input` | - | The input vector, `(B, T, X)` or `(T, B, X)` depending on `batch_first` |
| `hidden` | `(None, None, None)` | Hidden states `(controller hidden, memory hidden, read vectors)` |
| `reset_experience` | `False` | Whether to reset the memory |
| `pass_through_memory` | `True` | Whether to pass through memory |
Example usage:

```python
import torch
from dnc import DNC

rnn = DNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  gpu_id=0
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors) = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
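The state tuple returned by the forward pass can be fed back into the next call, so the memory persists across consecutive segments of a long sequence. A minimal sketch of that pattern (tensor sizes are arbitrary, and this uses the default CPU model, `gpu_id=-1`):

```python
import torch
from dnc import DNC

rnn = DNC(input_size=64, hidden_size=128, rnn_type='lstm',
          nr_cells=100, cell_size=32, read_heads=4, batch_first=True)

# Feed one long sequence in chunks; the state tuple carries the controller
# state, memory state and read vectors from one chunk to the next.
hidden = (None, None, None)
chunks = torch.randn(10, 12, 64).chunk(3, dim=1)   # three chunks of 4 time steps each
outputs = []
for i, chunk in enumerate(chunks):
    # reset the memory only before the first chunk of the sequence
    output, hidden = rnn(chunk, hidden, reset_experience=(i == 0))
    outputs.append(output)
outputs = torch.cat(outputs, dim=1)                 # outputs for all 12 time steps
```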
Example copy task

The copy task, as described in the original paper, is included in the repo.

From the project root:

```bash
python ./tasks/copy_task.py -cuda 0
```
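For reference, the task pairs a random bit sequence plus a delimiter channel with a target that repeats the sequence after the delimiter. A rough sketch of such a generator (an illustration of the task format, not the repo's exact implementation; the sizes are arbitrary):

```python
import torch

def generate_copy_batch(batch_size=10, seq_len=4, bits=63):
    # random bit sequence; the last channel is reserved for the delimiter flag
    seq = torch.bernoulli(torch.full((batch_size, seq_len, bits + 1), 0.5))
    seq[:, :, -1] = 0

    inp = torch.zeros(batch_size, 2 * seq_len + 1, bits + 1)
    target = torch.zeros(batch_size, 2 * seq_len + 1, bits + 1)
    inp[:, :seq_len] = seq
    inp[:, seq_len, -1] = 1           # delimiter marks the end of the input
    target[:, seq_len + 1:] = seq     # the network must reproduce the sequence
    return inp, target
```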
The copy task can be used to debug memory using Visdom.

Additional steps required:

```bash
pip install visdom
python -m visdom.server
```

Open http://localhost:8097/ in your browser and execute the copy task:

```bash
python ./tasks/copy_task.py -cuda 0
```

The visdom dashboard shows the memory as a heatmap for batch 0 every `-summarize_freq` iterations.
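To log the memory yourself outside the copy task, something along these lines can work; this assumes the memory part of the hidden state is a dict exposing a `memory` tensor of shape `(batch, nr_cells, cell_size)`, which may differ between versions:

```python
import visdom

viz = visdom.Visdom()   # assumes a visdom server is already running on localhost:8097

# `memory` is the memory hidden state returned by the forward pass above
mem = memory['memory'][0]              # memory matrix of batch element 0
viz.heatmap(X=mem.detach().cpu(),
            opts=dict(title='DNC memory, batch 0',
                      xlabel='cell_size', ylabel='nr_cells'))
```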
General noteworthy stuff

- DNCs converge with the Adam and RMSProp optimizers; SGD generally causes them to diverge (see the sketch below).
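For example (the learning rate is just a placeholder):

```python
import torch.optim as optim

optimizer = optim.Adam(rnn.parameters(), lr=1e-4)
# optimizer = optim.RMSprop(rnn.parameters(), lr=1e-4, momentum=0.9)  # also converges
# optimizer = optim.SGD(rnn.parameters(), lr=1e-4)                    # tends to diverge
```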
Repos referred to for the creation of this repo: