Differentiable Neural Computers, Sparse Access Memory and Sparse Differentiable Neural Computers, for Pytorch

Differentiable Neural Computer, for Pytorch

This is an implementation of Differentiable Neural Computers, described in the paper Hybrid computing using a neural network with dynamic external memory, Graves et al.

Install

pip install dnc
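
If the install worked, the package should be importable and the DNC class constructible; a quick sanity check (the tiny sizes here are arbitrary, only input_size and hidden_size are set, everything else falls back to the defaults listed below):

from dnc import DNC

# build a small DNC on the CPU just to confirm the install
rnn = DNC(input_size=8, hidden_size=16)
print(rnn)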

Usage

Parameters:

| Argument | Default | Description |
| --- | --- | --- |
| input_size | None | Size of the input vectors |
| hidden_size | None | Size of hidden units |
| rnn_type | 'lstm' | Type of recurrent cells used in the controller |
| num_layers | 1 | Number of layers of recurrent units in the controller |
| bias | True | Whether the controller's recurrent layers use bias weights |
| batch_first | True | Whether data is fed batch first |
| dropout | 0 | Dropout between layers in the controller (Not yet implemented) |
| bidirectional | False | If the controller is bidirectional (Not yet implemented) |
| nr_cells | 5 | Number of memory cells |
| read_heads | 2 | Number of read heads |
| cell_size | 10 | Size of each memory cell |
| nonlinearity | 'tanh' | If using 'rnn' as rnn_type, non-linearity of the RNNs |
| gpu_id | -1 | ID of the GPU, -1 for CPU |
| independent_linears | False | Whether to use independent linear units to derive the interface vector |
| share_memory | True | Whether to share memory between controller layers |
| reset_experience | False | Whether to reset memory (this is a parameter of the forward pass) |

Example usage:

import torch
from dnc import DNC

rnn = DNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  gpu_id=0
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors) = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
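
The state tuple returned by the forward pass can be fed back in on the next call to continue the same sequence. Below is a minimal training-step sketch built on the call above; the data, the zero-target objective and the Adam settings are purely illustrative and not part of the library:

import torch
import torch.nn.functional as F
from dnc import DNC

rnn = DNC(input_size=64, hidden_size=128, nr_cells=100, cell_size=32,
          read_heads=4, batch_first=True, gpu_id=-1)
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-4)  # Adam, per the notes at the end

for step in range(100):
    x = torch.randn(10, 4, 64)  # batch of 10 sequences, 4 timesteps, 64 features
    optimizer.zero_grad()

    # treat every batch as a fresh sequence: start from an empty state and wipe the memory
    output, (controller_hidden, memory, read_vectors) = \
        rnn(x, (None, None, None), reset_experience=True)

    # dummy objective so the sketch is self-contained; replace with a real loss
    loss = F.mse_loss(output, torch.zeros_like(output))
    loss.backward()
    optimizer.step()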

Example copy task

The copy task, as described in the original paper, is included in the repo.

python ./copy_task.py -cuda 0
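
For intuition, a copy task presents the network with a random bit pattern followed by a delimiter and asks it to reproduce the pattern afterwards. A rough sketch of how such input/target batches could be generated; the generate_copy_batch helper here is hypothetical and is not the repo's copy_task.py:

import torch

def generate_copy_batch(batch_size, seq_len, bits):
    # hypothetical helper: random binary pattern, plus one extra channel used as a delimiter
    pattern = torch.bernoulli(torch.full((batch_size, seq_len, bits), 0.5))
    inputs = torch.zeros(batch_size, 2 * seq_len + 1, bits + 1)
    targets = torch.zeros(batch_size, 2 * seq_len + 1, bits + 1)
    inputs[:, :seq_len, :bits] = pattern       # present the pattern
    inputs[:, seq_len, bits] = 1               # delimiter flag marks the end of the input
    targets[:, seq_len + 1:, :bits] = pattern  # the model should echo the pattern back
    return inputs, targets

x, y = generate_copy_batch(batch_size=32, seq_len=5, bits=8)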

General noteworthy stuff

  1. DNCs converge with the Adam and RMSprop optimizers; SGD generally causes them to diverge.
  2. Using a large batch size (> 100, 1000 recommended) helps prevent gradients from becoming NaN; one way to check for this is sketched below.
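
A small sketch of spotting NaN gradients before an optimizer step, assuming DNC behaves like a standard torch.nn.Module (the sizes and the stand-in loss are arbitrary):

import torch
from dnc import DNC

rnn = DNC(input_size=8, hidden_size=16)
output, _ = rnn(torch.randn(128, 4, 8), (None, None, None))  # batch of 128, per the advice above
output.sum().backward()  # stand-in for a real loss

# assuming DNC is a torch.nn.Module, inspect gradients for NaNs before optimizer.step()
bad = [name for name, p in rnn.named_parameters()
       if p.grad is not None and torch.isnan(p.grad).any()]
print('parameters with NaN gradients:', bad or 'none')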