Advanced Differentiable Neural Computer


THIS REPOSITORY IS UNDER CONSTRUCTION; NOT EVERYTHING WORKS YET

This repository contains an implementation of a Differentiable Neural Computer (DNC) with advancements for more robust and scalable use in question answering. It is applied to the bAbI tasks and the CNN reading comprehension (RC) task.

This repository is the groundwork for the MRQA 2018 paper submission "Robust and Scalable Differentiable Neural Computer for Question Answering". It contains a modular and fully configurable DNC with the following advancements:

Bypass Dropout
  • Dropout to reduce the bypass connectivity
  • Forces an earlier memory usage during training

DNC Normalization
  • Normalizes the memory unit's input, similar to layer normalization
  • Increases the model's stability during training

Content-Based Memory Unit
  • Memory unit without the temporal linkage mechanism
  • Reduces memory consumption by up to 70%

Bidirectional Controller
  • Bidirectional DNC architecture
  • Allows handling variable requests and rich information extraction
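As a rough illustration of the first advancement, here is a minimal NumPy sketch of bypass dropout. The function name and shapes are illustrative, not the repository's API: dropout is applied only to the direct controller-to-output (bypass) path, while the memory unit still receives the full controller signal, which pushes the model to store information in memory early in training.

```python
import numpy as np

rng = np.random.default_rng(0)

def bypass_dropout(controller_out, keep_prob=0.9, training=True):
    """Inverted dropout applied only to the bypass connection.

    The bypass is the direct controller -> output path that skips the
    memory unit; weakening it forces earlier memory usage in training.
    """
    if not training:
        return controller_out
    mask = rng.binomial(1, keep_prob, size=controller_out.shape)
    return controller_out * mask / keep_prob  # rescale to keep expectation

# The memory unit keeps the full controller signal; only the
# bypass path to the output layer is thinned out.
controller_out = rng.standard_normal((32, 128))   # (batch, hidden)
memory_input = controller_out                     # untouched
bypass_output = bypass_dropout(controller_out)    # dropped and rescaled
```

At inference time the function is the identity, as in standard inverted dropout.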

The plot below shows the impact of the different advancements on the word error rate for bAbI task 1.

[Plot: word error rate per advancement on bAbI task 1]

Furthermore, it contains a set of rich analysis tools that give deeper insight into the functioning of the ADNC. They show, for example, that the advancements lead to a more meaningful gate usage of the memory cell.

[Figures: memory-unit gate usage, DNC (left) vs. ADNC (right)]
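As a hand-wavy sketch of the kind of statistic behind such gate-usage plots (the function, names, and shapes are assumptions, not the project's analyzer API), recorded gate activations can be summarized like this:

```python
import numpy as np

def gate_usage(gates):
    """Summarize gate activations recorded during inference.

    `gates` is assumed to have shape (time_steps, batch, num_heads)
    with values in [0, 1]. Returns the mean activation and the
    fraction of steps in which the gate is decisively open (> 0.5).
    """
    gates = np.asarray(gates)
    return {
        "mean_activation": float(gates.mean()),
        "open_fraction": float((gates > 0.5).mean()),
    }

rng = np.random.default_rng(1)
free_gates = rng.uniform(size=(50, 4, 1))  # stand-in for recorded activations
stats = gate_usage(free_gates)
```

A model that actually exploits its memory shows a clearly higher open fraction than one that routes everything through the bypass connection.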

How to use:

Setup ADNC

To install the ADNC and set up a virtual environment:

git clone https://github.com/joergfranke/ADNC.git
cd ADNC/
python3 -m venv venv
source venv/bin/activate
pip install -e .

Inference

For bAbI inference, choose a pre-trained model (DNC, ADNC, BiADNC) in scripts/inference_babi_task.py and run:

python scripts/inference_babi_task.py

For CNN inference, choose a pre-trained model (ADNC, BiADNC) in scripts/inference_cnn_task.py and run:

python scripts/inference_cnn_task.py

Training

t.b.a.

Plots

t.b.a.

Repository Structure

t.b.a.

bAbI Results

Word error rate in % (mean ± standard deviation):

| Task | DNC | EntNet | SDNC | ADNC | BiADNC | BiADNC+aug16 |
| --- | --- | --- | --- | --- | --- | --- |
| 1: 1 supporting fact | 9.0 ± 12.6 | 0.0 ± 0.1 | 0.0 ± 0.0 | 0.1 ± 0.0 | 0.1 ± 0.1 | 0.1 ± 0.0 |
| 2: 2 supporting facts | 39.2 ± 20.5 | 15.3 ± 15.7 | 7.1 ± 14.6 | 0.8 ± 0.5 | 0.8 ± 0.2 | 0.5 ± 0.2 |
| 3: 3 supporting facts | 39.6 ± 16.4 | 29.3 ± 26.3 | 9.4 ± 16.7 | 6.5 ± 4.6 | 2.4 ± 0.6 | 1.6 ± 0.8 |
| 4: 2 argument relations | 0.4 ± 0.7 | 0.1 ± 0.1 | 0.1 ± 0.1 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| 5: 3 argument relations | 1.5 ± 1.0 | 0.4 ± 0.3 | 0.9 ± 0.3 | 1.0 ± 0.4 | 0.7 ± 0.1 | 0.8 ± 0.4 |
| 6: yes/no questions | 6.9 ± 7.5 | 0.6 ± 0.8 | 0.1 ± 0.2 | 0.0 ± 0.1 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| 7: counting | 9.8 ± 7.0 | 1.8 ± 1.1 | 1.6 ± 0.9 | 1.0 ± 0.7 | 1.0 ± 0.5 | 1.0 ± 0.7 |
| 8: lists/sets | 5.5 ± 5.9 | 1.5 ± 1.2 | 0.5 ± 0.4 | 0.2 ± 0.2 | 0.5 ± 0.3 | 0.6 ± 0.3 |
| 9: simple negation | 7.7 ± 8.3 | 0.0 ± 0.1 | 0.0 ± 0.1 | 0.0 ± 0.0 | 0.1 ± 0.2 | 0.0 ± 0.0 |
| 10: indefinite knowledge | 9.6 ± 11.4 | 0.1 ± 0.2 | 0.3 ± 0.2 | 0.1 ± 0.2 | 0.0 ± 0.0 | 0.0 ± 0.1 |
| 11: basic coreference | 3.3 ± 5.7 | 0.2 ± 0.2 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| 12: conjunction | 5.0 ± 6.3 | 0.0 ± 0.0 | 0.2 ± 0.3 | 0.0 ± 0.0 | 0.0 ± 0.1 | 0.0 ± 0.0 |
| 13: compound coreference | 3.1 ± 3.6 | 0.0 ± 0.1 | 0.1 ± 0.1 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 |
| 14: time reasoning | 11.0 ± 7.5 | 7.3 ± 4.5 | 5.6 ± 2.9 | 0.2 ± 0.1 | 0.8 ± 0.7 | 0.3 ± 0.1 |
| 15: basic deduction | 27.2 ± 20.1 | 3.6 ± 8.1 | 3.6 ± 10.3 | 0.1 ± 0.1 | 0.1 ± 0.1 | 0.1 ± 0.1 |
| 16: basic induction | 53.6 ± 1.9 | 53.3 ± 1.2 | 53.0 ± 1.3 | 52.1 ± 0.9 | 52.6 ± 1.6 | 0.0 ± 0.0 |
| 17: positional reasoning | 32.4 ± 8.0 | 8.8 ± 3.8 | 12.4 ± 5.9 | 18.5 ± 8.8 | 4.8 ± 4.8 | 1.5 ± 1.8 |
| 18: size reasoning | 4.2 ± 1.8 | 1.3 ± 0.9 | 1.6 ± 1.1 | 1.1 ± 0.5 | 0.4 ± 0.4 | 0.9 ± 0.5 |
| 19: path finding | 64.6 ± 37.4 | 70.4 ± 6.1 | 30.8 ± 24.2 | 43.3 ± 36.7 | 0.0 ± 0.0 | 0.1 ± 0.1 |
| 20: agents motivation | 0.0 ± 0.1 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.1 ± 0.1 | 0.1 ± 0.1 | 0.1 ± 0.1 |
| Mean WER | 16.7 ± 7.6 | 9.7 ± 2.6 | 6.4 ± 2.5 | 6.3 ± 2.7 | 3.2 ± 0.5 | 0.4 ± 0.3 |
| Failed tasks (err. > 5%) | 11.2 ± 5.4 | 5.0 ± 1.2 | 4.1 ± 1.6 | 3.2 ± 0.8 | 1.4 ± 0.5 | 0.0 ± 0.0 |
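As a sanity check, the mean-WER row can be recomputed from the per-task means, for example for the DNC column (values copied from the table above; the failed-task row averages counts over individual training runs and therefore cannot be recovered from the mean columns alone):

```python
# Per-task mean WER (%) of the plain DNC, copied from the table above.
dnc_wer = [9.0, 39.2, 39.6, 0.4, 1.5, 6.9, 9.8, 5.5, 7.7, 9.6,
           3.3, 5.0, 3.1, 11.0, 27.2, 53.6, 32.4, 4.2, 64.6, 0.0]

mean_wer = round(sum(dnc_wer) / len(dnc_wer), 1)
print(mean_wer)  # 16.7, matching the "Mean WER" row
```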