# Differentiable Neural Computers and Sparse Differentiable Neural Computers, for Pytorch

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Differentiable Neural Computers and Sparse Differentiable Neural Computers, for Pytorch](#differentiable-neural-computers-and-sparse-differentiable-neural-computers-for-pytorch)
  - [Install](#install)
    - [From source](#from-source)
  - [Architecture](#architecture)
  - [Usage](#usage)
    - [DNC](#dnc)
      - [Example usage:](#example-usage)
      - [Debugging:](#debugging)
    - [SDNC](#sdnc)
      - [Example usage:](#example-usage-1)
      - [Debugging:](#debugging-1)
  - [Example copy task](#example-copy-task)
  - [General noteworthy stuff](#general-noteworthy-stuff)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

[![Build Status](https://travis-ci.org/ixaxaar/pytorch-dnc.svg?branch=master)](https://travis-ci.org/ixaxaar/pytorch-dnc) [![PyPI version](https://badge.fury.io/py/dnc.svg)](https://badge.fury.io/py/dnc)

This is an implementation of [Differentiable Neural Computers](http://people.idsia.ch/~rupesh/rnnsymposium2016/slides/graves.pdf), described in the paper [Hybrid computing using a neural network with dynamic external memory, Graves et al.](https://www.nature.com/articles/nature20101),
and of the sparse version of the DNC (the SDNC), described in [Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes](http://papers.nips.cc/paper/6298-scaling-memory-augmented-neural-networks-with-sparse-reads-and-writes.pdf).

## Install

```bash
pip install dnc
```

### From source

```bash
git clone https://github.com/ixaxaar/pytorch-dnc
cd pytorch-dnc
pip install -r ./requirements.txt
pip install -e .
```

`pytest` is required to run the tests.

## Architecture

<img src="./docs/dnc.png" height="600" />

## Usage

### DNC

**Constructor Parameters**:

Following are the constructor parameters:

| Argument | Default | Description |
| --- | --- | --- |
| input_size | `None` | Size of the input vectors |
| hidden_size | `None` | Size of hidden units |
| rnn_type | `'lstm'` | Type of recurrent cells used in the controller |
| num_layers | `1` | Number of layers of recurrent units in the controller |
| num_hidden_layers | `2` | Number of hidden layers per layer of the controller |
| bias | `True` | Whether to use bias terms |
| batch_first | `True` | Whether data is fed batch first |
| dropout | `0` | Dropout between layers in the controller |
| bidirectional | `False` | If the controller is bidirectional (not yet implemented) |
| nr_cells | `5` | Number of memory cells |
| read_heads | `2` | Number of read heads |
| cell_size | `10` | Size of each memory cell |
| nonlinearity | `'tanh'` | Non-linearity of the RNN cells when `rnn_type` is `'rnn'` |
| gpu_id | `-1` | ID of the GPU, -1 for CPU |
| independent_linears | `False` | Whether to use independent linear units to derive interface vector |
| share_memory | `True` | Whether to share memory between controller layers |
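The `independent_linears` row refers to the DNC's interface vector, which the controller emits at every step to drive the memory. As a sizing aid, here is a sketch that counts its width from `cell_size` (W) and `read_heads` (R) using the layout in Graves et al. (2016), which adds up to R·W + 3W + 5R + 3. The helper `interface_size` is hypothetical, not part of the `dnc` package, and this repo may organize the vector differently.

```python
def interface_size(cell_size: int, read_heads: int) -> int:
    """Width of the DNC interface vector, following Graves et al. (2016).

    Hypothetical helper for illustration; not part of the `dnc` package.
    """
    W, R = cell_size, read_heads
    parts = {
        "read_keys": R * W,       # one key of size W per read head
        "read_strengths": R,      # one scalar per read head
        "write_key": W,
        "write_strength": 1,
        "erase_vector": W,
        "write_vector": W,
        "free_gates": R,          # one per read head
        "allocation_gate": 1,
        "write_gate": 1,
        "read_modes": 3 * R,      # backward / content / forward, per read head
    }
    return sum(parts.values())


# Settings used in the example usage section: cell_size=32, read_heads=4
print(interface_size(32, 4))  # 247
```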

Following are the forward pass parameters:

| Argument | Default | Description |
| --- | --- | --- |
| input | - | The input tensor, `(B*T*X)` if `batch_first` is `True`, else `(T*B*X)` |
| hidden | `(None,None,None)` | Hidden states `(controller hidden, memory hidden, read vectors)` |
| reset_experience | `False` | Whether to reset memory |
| pass_through_memory | `True` | Whether to pass through memory |
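With `batch_first=True` the input is laid out as `(batch, time, input_size)`; otherwise it is `(time, batch, input_size)`. If your data is in the other layout, swapping the first two axes converts it. A minimal NumPy sketch of that conversion (for a torch tensor the equivalent is `x.transpose(0, 1)`):

```python
import numpy as np

batch, time, input_size = 4, 10, 64

# Time-major data (T, B, X): the layout expected when batch_first=False
time_major = np.random.randn(time, batch, input_size).astype(np.float32)

# Swap the first two axes to get batch-major data (B, T, X) for batch_first=True
batch_major = np.swapaxes(time_major, 0, 1)

print(time_major.shape, batch_major.shape)  # (10, 4, 64) (4, 10, 64)
```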


#### Example usage:

```python
import torch
from dnc import DNC

rnn = DNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  gpu_id=0  # use -1 to run on the CPU
)

(controller_hidden, memory, read_vectors) = (None, None, None)

# batch_first=True, so this input is 10 sequences of 4 steps, each of size 64
output, (controller_hidden, memory, read_vectors) = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
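Each read head locates memory cells by content: it compares a key with every one of the `nr_cells` rows of the `nr_cells × cell_size` memory using cosine similarity, sharpens the scores with a strength β and normalizes them with a softmax; the read vector is then the weighted sum of memory rows. Below is a NumPy sketch of this lookup as described by Graves et al. (2016). The function name is illustrative, and the repo's implementation may differ in details such as the epsilon.

```python
import numpy as np

def content_weighting(memory: np.ndarray, key: np.ndarray, strength: float,
                      eps: float = 1e-6) -> np.ndarray:
    """Content-based addressing: softmax(strength * cosine(key, memory rows)).

    memory: (nr_cells, cell_size), key: (cell_size,); returns (nr_cells,).
    """
    # Cosine similarity between the key and each memory row
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps
    similarity = memory @ key / norms
    # A strength > 1 sharpens the distribution, < 1 flattens it
    scores = strength * similarity
    scores -= scores.max()  # for numerical stability of exp
    weights = np.exp(scores)
    return weights / weights.sum()


rng = np.random.default_rng(0)
memory = rng.standard_normal((100, 32))            # nr_cells=100, cell_size=32
key = memory[7] + 0.01 * rng.standard_normal(32)   # a noisy copy of cell 7
w = content_weighting(memory, key, strength=10.0)
print(w.argmax())  # 7
read_vector = w @ memory  # weighted sum of memory rows, shape (32,)
```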


#### Debugging:

The `debug` option makes the network additionally return its memory state (as numpy `ndarray`s) for batch item 0 at every forward step.
These arrays can be analyzed or visualized, for example with visdom.

```python
import torch
from dnc import DNC

rnn = DNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  gpu_id=0,
  debug=True
)

(controller_hidden, memory, read_vectors) = (None, None, None)

# debug=True adds a third return value: the memory state for batch item 0
output, (controller_hidden, memory, read_vectors), debug_memory = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
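The `usage_vector` and `write_weights` entries reflect the DNC's dynamic memory allocation. In Graves et al. (2016), a cell's usage rises when it is written and falls when the read heads free it, and the allocation weighting sends new writes to the least-used cells. A NumPy sketch of those two equations for a single batch item; the function names are illustrative and this is not the repo's code:

```python
import numpy as np

def update_usage(usage, write_weights, read_weights, free_gates):
    """u_t = (u + w_w - u * w_w) * psi, with psi = prod_i (1 - f_i * w_r_i).

    usage, write_weights (previous step): (nr_cells,)
    read_weights (previous step): (read_heads, nr_cells); free_gates: (read_heads,)
    """
    retention = np.prod(1 - free_gates[:, None] * read_weights, axis=0)
    return (usage + write_weights - usage * write_weights) * retention


def allocation_weighting(usage):
    """Give most weight to the least-used cells (Graves et al., 2016)."""
    order = np.argsort(usage)  # the "free list": least used first
    sorted_usage = usage[order]
    # Product of the usages of all cells that come earlier in the free list
    cumprod = np.concatenate(([1.0], np.cumprod(sorted_usage)[:-1]))
    alloc = np.zeros_like(usage)
    alloc[order] = (1 - sorted_usage) * cumprod
    return alloc


usage = np.array([0.9, 0.1, 0.5, 1.0])
print(allocation_weighting(usage))  # approximately [0.005, 0.9, 0.05, 0.0]

# A read head on cell 3 with its free gate fully open releases that cell
print(update_usage(usage, np.zeros(4), np.array([[0., 0., 0., 1.]]), np.array([1.0])))
```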

Memory vectors returned by forward pass (`np.ndarray`):

| Key | Y axis (dimensions) | X axis (dimensions) |
| --- | --- | --- |
| `debug_memory['memory']` | layer * time | nr_cells * cell_size
| `debug_memory['link_matrix']` | layer * time | nr_cells * nr_cells
| `debug_memory['precedence']` | layer * time | nr_cells
| `debug_memory['read_weights']` | layer * time | read_heads * nr_cells
| `debug_memory['write_weights']` | layer * time | nr_cells
| `debug_memory['usage_vector']` | layer * time | nr_cells
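The `precedence` and `link_matrix` entries record the order in which cells were written, which lets read heads step forwards or backwards through a stored sequence. A NumPy sketch of the temporal link update and the forward/backward weightings from Graves et al. (2016), for one batch item and one step; `update_link` is an illustrative name, not the repo's API:

```python
import numpy as np

def update_link(link, precedence, write_weights):
    """One step of the temporal link update (Graves et al., 2016).

    link: (nr_cells, nr_cells); precedence, write_weights: (nr_cells,).
    Returns the new link matrix and the new precedence vector.
    """
    w = write_weights
    # L[i, j] <- (1 - w[i] - w[j]) * L[i, j] + w[i] * p_prev[j]
    new_link = (1 - w[:, None] - w[None, :]) * link + np.outer(w, precedence)
    np.fill_diagonal(new_link, 0.0)  # a cell never links to itself
    # p <- (1 - sum(w)) * p + w
    new_precedence = (1 - w.sum()) * precedence + w
    return new_link, new_precedence


nr_cells = 4
link = np.zeros((nr_cells, nr_cells))
precedence = np.zeros(nr_cells)
# Write to cell 2, then to cell 0
for cell in (2, 0):
    link, precedence = update_link(link, precedence, np.eye(nr_cells)[cell])

# Forward and backward weightings for a read head sitting on cell 2
read = np.eye(nr_cells)[2]
forward = link @ read     # points at cell 0, written right after cell 2
backward = link.T @ read  # all zeros: nothing was written before cell 2
```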


### SDNC

**Constructor Parameters**:

Following are the constructor parameters:

| Argument | Default | Description |
| --- | --- | --- |
| input_size | `None` | Size of the input vectors |
| hidden_size | `None` | Size of hidden units |
| rnn_type | `'lstm'` | Type of recurrent cells used in the controller |
| num_layers | `1` | Number of layers of recurrent units in the controller |
| num_hidden_layers | `2` | Number of hidden layers per layer of the controller |
| bias | `True` | Bias |
| batch_first | `True` | Whether data is fed batch first |
| dropout | `0` | Dropout between layers in the controller |
| bidirectional | `False` | If the controller is bidirectional (Not yet implemented |
| nr_cells | `5000` | Number of memory cells |
| read_heads | `4` | Number of read heads |
| sparse_reads | `4` | Number of sparse memory reads per read head |
| temporal_reads | `4` | Number of temporal reads |
| cell_size | `10` | Size of each memory cell |
| nonlinearity | `'tanh'` | Non-linearity of the RNN cells when `rnn_type` is `'rnn'` |
| gpu_id | `-1` | ID of the GPU, -1 for CPU |
| independent_linears | `False` | Whether to use independent linear units to derive interface vector |
| share_memory | `True` | Whether to share memory between controller layers |
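With `nr_cells` in the thousands, the SDNC restricts each read head to `sparse_reads` cells instead of weighting every cell. The sketch below shows the idea with an exact top-K search in NumPy. The SDNC paper uses an approximate nearest-neighbour index for this search instead, so treat this as an illustration of the sparse read, not of how this repo finds the neighbours; `sparse_read` is a hypothetical helper.

```python
import numpy as np

def sparse_read(memory, key, strength, k):
    """Attend to only the k most similar cells (exact search, for illustration).

    Returns (positions, weights): indices into memory and their softmax weights.
    """
    # Cosine similarity between the key and each memory row
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-6
    similarity = memory @ key / norms
    # Indices of the k largest similarities (in no particular order)
    positions = np.argpartition(-similarity, k - 1)[:k]
    # Softmax over the k selected cells only; all other cells get zero weight
    scores = strength * similarity[positions]
    weights = np.exp(scores - scores.max())
    return positions, weights / weights.sum()


rng = np.random.default_rng(0)
memory = rng.standard_normal((5000, 10))  # nr_cells=5000, cell_size=10
key = memory[123]
positions, weights = sparse_read(memory, key, strength=5.0, k=4)  # sparse_reads=4
read_vector = weights @ memory[positions]  # only 4 rows of memory are touched
```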

Following are the forward pass parameters:

| Argument | Default | Description |
| --- | --- | --- |
| input | - | The input tensor, `(B*T*X)` if `batch_first` is `True`, else `(T*B*X)` |
| hidden | `(None,None,None)` | Hidden states `(controller hidden, memory hidden, read vectors)` |
| reset_experience | `False` | Whether to reset memory |
| pass_through_memory | `True` | Whether to pass through memory |


#### Example usage:

```python
import torch
from dnc import SDNC

rnn = SDNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  sparse_reads=4,
  batch_first=True,
  gpu_id=0
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors) = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```


#### Debugging:

The `debug` option makes the network additionally return its memory state (as numpy `ndarray`s) for batch item 0 at every forward step.
These arrays can be analyzed or visualized, for example with visdom.

```python
import torch
from dnc import SDNC

rnn = SDNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  sparse_reads=4,
  temporal_reads=4,
  gpu_id=0,
  debug=True
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors), debug_memory = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```

Memory vectors returned by forward pass (`np.ndarray`):

| Key | Y axis (dimensions) | X axis (dimensions) |
| --- | --- | --- |
| `debug_memory['memory']` | layer * time | nr_cells * cell_size
| `debug_memory['visible_memory']` | layer * time | (sparse_reads+2*temporal_reads+1) * nr_cells
| `debug_memory['read_positions']` | layer * time | sparse_reads+2*temporal_reads+1
| `debug_memory['link_matrix']` | layer * time | (sparse_reads+2*temporal_reads+1) * (sparse_reads+2*temporal_reads+1)
| `debug_memory['rev_link_matrix']` | layer * time | (sparse_reads+2*temporal_reads+1) * (sparse_reads+2*temporal_reads+1)
| `debug_memory['precedence']` | layer * time | nr_cells
| `debug_memory['read_weights']` | layer * time | read_heads * nr_cells
| `debug_memory['write_weights']` | layer * time | nr_cells
| `debug_memory['usage']` | layer * time | nr_cells
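Several SDNC debug entries are defined over the small set of visible memory positions, `sparse_reads+2*temporal_reads+1` per step, rather than over all `nr_cells`. To line such a vector up with the full memory in a heatmap, it can be scattered back using the matching row of `read_positions`. A NumPy sketch, assuming the positions are integer indices into memory (check this against your own debug output); `to_dense` is a hypothetical helper:

```python
import numpy as np

def to_dense(positions, values, nr_cells):
    """Scatter values defined on visible positions into a dense (nr_cells,) vector.

    Hypothetical helper; assumes `positions` are integer indices into memory.
    Values at repeated positions are summed.
    """
    dense = np.zeros(nr_cells)
    np.add.at(dense, np.asarray(positions, dtype=int), values)
    return dense


sparse_reads, temporal_reads = 4, 4
visible = sparse_reads + 2 * temporal_reads + 1  # 13 visible positions per step
rng = np.random.default_rng(0)
positions = rng.choice(100, size=visible, replace=False)  # nr_cells=100
weights = rng.random(visible)
dense = to_dense(positions, weights, nr_cells=100)  # one heatmap row
```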

## Example copy task

The copy task, as described in the original paper, is included in the repo.

From the project root:
```bash
# Settings similar to the original implementation
python ./tasks/copy_task.py -cuda 0 -optim rmsprop -batch_size 32 -mem_slot 64

# Faster convergence
python3 ./tasks/copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -nlayer 1 -nhlayer 2 -dropout 0 -mem_slot 32 -batch_size 1000 -optim adam -sequence_max_length 8

# SDNC
python3 -B ./tasks/copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -memory_type sdnc -nlayer 1 -nhlayer 2 -dropout 0 -mem_slot 100 -mem_size 10 -read_heads 1 -sparse_reads 10 -batch_size 20 -optim adam -sequence_max_length 10

# SDNC with curriculum learning
python3 -B ./tasks/copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -memory_type sdnc -nlayer 1 -nhlayer 2 -dropout 0 -mem_slot 100 -mem_size 10 -read_heads 1 -sparse_reads 4 -temporal_reads 4 -batch_size 20 -optim adam -sequence_max_length 4 -curriculum_increment 2 -curriculum_freq 10000
```
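For orientation, here is a sketch of what a copy-task batch looks like, following the setup in the DNC paper: a sequence of random bit vectors, an end-of-sequence marker on an extra channel, then silent steps during which the network must reproduce the sequence. The names and exact layout are assumptions; `tasks/copy_task.py` may arrange its data differently (its sequence length is controlled by `-sequence_max_length`).

```python
import numpy as np

def copy_task_batch(batch_size, length, bits, rng):
    """Hypothetical copy-task batch generator (not the repo's function).

    Inputs: `length` random bit vectors, an end marker, then silence.
    Targets: silence until after the marker, then the same bit vectors.
    Both arrays have shape (batch_size, 2 * length + 1, bits + 1);
    the last channel carries the end marker.
    """
    steps, width = 2 * length + 1, bits + 1
    inputs = np.zeros((batch_size, steps, width), dtype=np.float32)
    targets = np.zeros((batch_size, steps, width), dtype=np.float32)
    sequence = rng.integers(0, 2, size=(batch_size, length, bits))
    inputs[:, :length, :bits] = sequence
    inputs[:, length, bits] = 1.0               # end-of-sequence marker
    targets[:, length + 1:, :bits] = sequence   # the network must reproduce it
    return inputs, targets


rng = np.random.default_rng(0)
x, y = copy_task_batch(batch_size=32, length=8, bits=6, rng=rng)
print(x.shape, y.shape)  # (32, 17, 7) (32, 17, 7)
```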

For the full set of options, see:
```bash
python ./tasks/copy_task.py --help
```

The copy task can be used to debug memory using [Visdom](https://github.com/facebookresearch/visdom).

This requires visdom to be installed and its server to be running:

```bash
pip install visdom
python -m visdom.server
```

Open http://localhost:8097/ in your browser and run the copy task:

```bash
python ./tasks/copy_task.py -cuda 0
```

The visdom dashboard shows memory as a heatmap for batch 0 every `-summarize_freq` iterations:

![Visdom dashboard](./docs/dnc-mem-debug.png)


## General noteworthy stuff

1. DNCs converge faster with the Adam and RMSProp learning rules; SGD generally converges extremely slowly.
The copy task, for example, takes 25k iterations with SGD at lr 1, compared with 3.5k for Adam at lr 0.01.
2. `nan`s in the gradients are common; try different batch sizes.

Repositories referred to while creating this one:

- [deepmind/dnc](https://github.com/deepmind/dnc)
- [ypxie/pytorch-NeuCom](https://github.com/ypxie/pytorch-NeuCom)
- [jingweiz/pytorch-dnc](https://github.com/jingweiz/pytorch-dnc)