init_v0.0
parent 0634d25c6a
commit 857859f2c4
LICENSE (new file, 21 lines)
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2017 THUNLP

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md (242 lines changed)
@@ -1,2 +1,240 @@
# OpenANE
Attributed Network Embedding

# OpenNE: An open source toolkit for Network Embedding

This repository provides a standard NE/NRL (Network Representation Learning) training and testing framework. In this framework, we unify the input and output interfaces of different NE models and provide scalable options for each model. Moreover, we implement typical NE models under this framework based on TensorFlow, which enables these models to be trained with GPUs.

We developed this toolkit according to the settings of DeepWalk. The implemented or modified models include [DeepWalk](https://github.com/phanein/deepwalk), [LINE](https://github.com/tangjianpku/LINE), [node2vec](https://github.com/aditya-grover/node2vec), [GraRep](https://github.com/ShelsonCao/GraRep), [TADW](https://github.com/thunlp/TADW) and [GCN](https://github.com/tkipf/gcn). We will continue to implement more representative NE models according to our released [NRL paper list](https://github.com/thunlp/nrlpapers). We also welcome other researchers to contribute NE models to this toolkit based on our framework; contributions will be announced in this project.

## Requirements

- numpy==1.13.1
- networkx==2.0
- scipy==0.19.1
- tensorflow==1.3.0
- gensim==3.0.1
- scikit-learn==0.19.0

## Usage

#### General Options

You can check out the options available in *OpenNE* using:

    python src/main.py --help

- --input, the input file of a network;
- --graph-format, the format of the input graph, adjlist or edgelist;
- --output, the output file of the representations (GCN doesn't need it);
- --representation-size, the number of latent dimensions to learn for each node; the default is 128;
- --method, the NE model to learn, including deepwalk, line, node2vec, grarep, tadw and gcn;
- --directed, treat the graph as directed; this is an action;
- --weighted, treat the graph as weighted; this is an action;
- --label-file, the file of node labels; ignore this option if not testing;
- --clf-ratio, the ratio of training data for node classification; the default is 0.5;
- --epochs, the number of training epochs of LINE and GCN; the default is 5;

#### Example

To run "node2vec" on the BlogCatalog network and evaluate the learned representations on the multi-label node classification task, run the following command in the home directory of this project:

    python src/main.py --method node2vec --label-file data/blogCatalog/bc_labels.txt --input data/blogCatalog/bc_adjlist.txt --graph-format adjlist --output vec_all.txt --q 0.25 --p 0.25

To run "gcn" on the Cora network and evaluate the learned representations on the multi-label node classification task, run the following command in the home directory of this project:

    python src/main.py --method gcn --label-file data/cora/cora_labels.txt --input data/cora/cora_edgelist.txt --graph-format edgelist --feature-file data/cora/cora.features --epochs 200 --output vec_all.txt --clf-ratio 0.1

#### Specific Options

DeepWalk and node2vec:

- --number-walks, the number of random walks to start at each node; the default is 10;
- --walk-length, the length of the random walks started at each node; the default is 80;
- --workers, the number of parallel processes; the default is 8;
- --window-size, the window size of the skip-gram model; the default is 10;
- --q, only for node2vec; the default is 1.0;
- --p, only for node2vec; the default is 1.0;

LINE:

- --negative-ratio, the default is 5;
- --order, 1 for the 1st-order, 2 for the 2nd-order and 3 for 1st + 2nd; the default is 3;
- --no-auto-save, no early save when training LINE; this is an action; when training LINE, we calculate F1 scores every epoch, and if the current F1 is the best so far, the embeddings are saved.

GraRep:

- --kstep, use the k-step transition probability matrix (make sure representation-size % kstep == 0).

TADW:

- --lamb, lamb is a hyperparameter in TADW that controls the weight of the regularization terms.

GCN:

- --feature-file, the file of node features;
- --epochs, the number of training epochs of GCN; the default is 5;
- --dropout, dropout rate;
- --weight-decay, weight for the l2-loss of the embedding matrix;
- --hidden, number of units in the first hidden layer.
#### Input

The supported input format is an edgelist or an adjlist:

    edgelist: node1 node2 <weight_float, optional>
    adjlist: node n1 n2 n3 ... nk

The graph is assumed to be undirected and unweighted by default. These options can be changed by setting the appropriate flags.

If the model needs additional features, the supported feature input format is as follows (**feature_i** should be a float number):

    node feature_1 feature_2 ... feature_n
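For a quick sanity check of these formats outside the toolkit, the sketch below (illustrative only, not part of OpenNE; the file path in the comment is a placeholder) shows how the two graph formats map onto the standard networkx readers:

    import networkx as nx

    def read_graph(path, fmt='adjlist', weighted=False, directed=False):
        # adjlist line:  node n1 n2 ... nk
        # edgelist line: node1 node2 <weight_float, optional>
        create_using = nx.DiGraph() if directed else nx.Graph()
        if fmt == 'adjlist':
            return nx.read_adjlist(path, create_using=create_using)
        data = [('weight', float)] if weighted else False
        return nx.read_edgelist(path, create_using=create_using, data=data)

    # g = read_graph('data/blogCatalog/bc_adjlist.txt', fmt='adjlist')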
#### Output

The output file has *n+1* lines for a graph with *n* nodes. The first line has the following format:

    num_of_nodes dim_of_representation

The next *n* lines are as follows:

    node_id dim1 dim2 ... dimd

where dim1, ..., dimd is the *d*-dimensional representation learned by *OpenNE*.
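To load an embedding file in this format back into Python, something like the following minimal sketch is enough (`vec_all.txt` is just the output name used in the examples above):

    def load_embeddings(path):
        vectors = {}
        with open(path) as f:
            num_nodes, dim = map(int, f.readline().split())  # first line: num_of_nodes dim_of_representation
            for line in f:                                    # node_id dim1 dim2 ... dimd
                parts = line.strip().split()
                vectors[parts[0]] = [float(x) for x in parts[1:]]
        assert len(vectors) == num_nodes
        return vectors

    # emb = load_embeddings('vec_all.txt')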
#### Evaluation

If you want to evaluate the learned node representations, you can input the node labels. The toolkit will use a portion (default: 50%) of the nodes to train a classifier and calculate the F1-score on the remaining nodes.

The supported input label format is

    node label1 label2 label3...
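This evaluation is roughly equivalent to the scikit-learn sketch below (simplified and only for single-label nodes; the toolkit itself also supports multi-label classification):

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    def evaluate_nc(vectors, labels, train_ratio=0.5):
        # vectors: {node_id: [dim1, ...]}, labels: {node_id: label}
        nodes = [n for n in labels if n in vectors]
        X = [vectors[n] for n in nodes]
        y = [labels[n] for n in nodes]
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_ratio)
        y_pred = LogisticRegression().fit(X_tr, y_tr).predict(X_te)
        return f1_score(y_te, y_pred, average='micro'), f1_score(y_te, y_pred, average='macro')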
## Comparisons with other implementations

Running environment: <br />
BlogCatalog: CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz. <br />
Wiki, Cora: CPU: Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz. <br />

We show the node classification results of the various methods on different datasets. We set the representation dimension to 128 and **kstep=4** in GraRep.

Note that both GCN (a semi-supervised NE model) and TADW need additional text features as input. Thus, we evaluate these two models on Cora, in which each node has text information. We use 10% labeled data to train GCN.

[BlogCatalog](http://leitang.net/social_dimension.html): 10312 nodes, 333983 edges, 39 labels, undirected:

- data/blogCatalog/bc_adjlist.txt
- data/blogCatalog/bc_edgelist.txt
- data/blogCatalog/bc_labels.txt

|Algorithm | Time| Micro-F1 | Macro-F1|
|:------------|-------------:|------------:|-------:|
|[DeepWalk](https://github.com/phanein/deepwalk) | 271s | 0.385 | 0.238|
|[LINE 1st+2nd](https://github.com/tangjianpku/LINE) | 2008s | 0.398 | 0.235|
|[node2vec](https://github.com/aditya-grover/node2vec) | 2623s | 0.404| 0.264|
|[GraRep](https://github.com/ShelsonCao/GraRep) | - | - | - |
|OpenNE(DeepWalk) | 986s | 0.394 | 0.249|
|OpenNE(LINE 1st+2nd) | 1555s | 0.390 | 0.253|
|OpenNE(node2vec) | 3501s | 0.405 | 0.275|
|OpenNE(GraRep) | 4178s | 0.393 | 0.230 |

[Wiki](https://github.com/thunlp/MMDW/tree/master/data) (the Wiki dataset is provided by the [LBC project](http://www.cs.umd.edu/~sen/lbc-proj/LBC.html), but the original link is no longer available): 2405 nodes, 17981 edges, 19 labels, directed:

- data/wiki/Wiki_edgelist.txt
- data/wiki/Wiki_category.txt

|Algorithm | Time| Micro-F1 | Macro-F1|
|:------------|-------------:|------------:|-------:|
|[DeepWalk](https://github.com/phanein/deepwalk) | 52s | 0.669 | 0.560|
|[LINE 2nd](https://github.com/tangjianpku/LINE) | 70s | 0.576 | 0.387|
|[node2vec](https://github.com/aditya-grover/node2vec) | 32s | 0.651 | 0.541|
|[GraRep](https://github.com/ShelsonCao/GraRep) | 19.6s | 0.633 | 0.476|
|OpenNE(DeepWalk) | 42s | 0.658 | 0.570|
|OpenNE(LINE 2nd) | 90s | 0.661 | 0.521|
|OpenNE(node2vec) | 33s | 0.655 | 0.538|
|OpenNE(GraRep) | 23.7s | 0.649 | 0.507 |

[Cora](https://linqs.soe.ucsc.edu/data): 2708 nodes, 5429 edges, 7 labels, directed:

- data/cora/cora_edgelist.txt
- data/cora/cora.features
- data/cora/cora_labels.txt

|Algorithm | Dropout | Weight_decay | Hidden | Dimension | Time| Accuracy |
|:------------|-------------:|-------:|-------:|-------:|-------:|-------:|
| [TADW](https://github.com/thunlp/TADW) | - | - | - | 80*2 | 13.9s | 0.780 |
| [GCN](https://github.com/tkipf/gcn) | 0.5 | 5e-4 | 16 | - | 4.0s | 0.790 |
| OpenNE(TADW) | - | - | - | 80*2 | 20.8s | 0.791 |
| OpenNE(GCN) | 0.5 | 5e-4 | 16 | - | 5.5s | 0.789 |
| OpenNE(GCN) | 0 | 5e-4 | 16 | - | 6.1s | 0.779 |
| OpenNE(GCN) | 0.5 | 1e-4 | 16 | - | 5.4s | 0.783 |
| OpenNE(GCN) | 0.5 | 5e-4 | 64 | - | 6.5s | 0.779 |

## Citing

If you find *OpenNE* useful for your research, please consider citing the following papers:

    @InProceedings{perozzi2014deepwalk,
      Title = {Deepwalk: Online learning of social representations},
      Author = {Perozzi, Bryan and Al-Rfou, Rami and Skiena, Steven},
      Booktitle = {Proceedings of KDD},
      Year = {2014},
      Pages = {701--710}
    }

    @InProceedings{tang2015line,
      Title = {Line: Large-scale information network embedding},
      Author = {Tang, Jian and Qu, Meng and Wang, Mingzhe and Zhang, Ming and Yan, Jun and Mei, Qiaozhu},
      Booktitle = {Proceedings of WWW},
      Year = {2015},
      Pages = {1067--1077}
    }

    @InProceedings{grover2016node2vec,
      Title = {node2vec: Scalable feature learning for networks},
      Author = {Grover, Aditya and Leskovec, Jure},
      Booktitle = {Proceedings of KDD},
      Year = {2016},
      Pages = {855--864}
    }

    @Article{kipf2016semi,
      Title = {Semi-Supervised Classification with Graph Convolutional Networks},
      Author = {Kipf, Thomas N and Welling, Max},
      Journal = {arXiv preprint arXiv:1609.02907},
      Year = {2016}
    }

    @InProceedings{cao2015grarep,
      Title = {Grarep: Learning graph representations with global structural information},
      Author = {Cao, Shaosheng and Lu, Wei and Xu, Qiongkai},
      Booktitle = {Proceedings of CIKM},
      Year = {2015},
      Pages = {891--900}
    }

    @InProceedings{yang2015network,
      Title = {Network representation learning with rich text information},
      Author = {Yang, Cheng and Liu, Zhiyuan and Zhao, Deli and Sun, Maosong and Chang, Edward},
      Booktitle = {Proceedings of IJCAI},
      Year = {2015}
    }

    @Article{tu2017network,
      Title = {Network representation learning: an overview},
      Author = {TU, Cunchao and YANG, Cheng and LIU, Zhiyuan and SUN, Maosong},
      Journal = {SCIENTIA SINICA Informationis},
      Volume = {47},
      Number = {8},
      Pages = {980--996},
      Year = {2017}
    }

## Sponsor

This research is supported by Tencent, MSRA and NSFC.

<img src="http://logonoid.com/images/tencent-logo.png" width = "300" height = "30" alt="tencent" align=center />

<img src="http://net.pku.edu.cn/~xjl/images/msra.png" width = "200" height = "100" alt="MSRA" align=center />

<img src="http://www.dragon-star.eu/wp-content/uploads/2014/04/NSFC_logo.jpg" width = "100" height = "80" alt="NSFC" align=center />
data/cora/cora_adjlist.txt (new file, 2708 lines): diff suppressed because it is too large
data/cora/cora_attr.txt (new file, 2708 lines): diff suppressed because one or more lines are too long
data/cora/cora_label.txt (new file, 2708 lines): diff suppressed because it is too large
requirements.txt (new file, 38 lines)
@@ -0,0 +1,38 @@
setuptools==39.1.0 # tensorflow 1.10.0 requires setuptools<=39.1.0; setuptools 39.2.0 is incompatible
absl-py==0.2.2
astor==0.6.2
backports.weakref==1.0.post1
bleach==1.5.0
decorator==4.3.0
#enum34==1.1.6 # enum34 is not necessary for python > 3.4
funcsigs==1.0.2
gast==0.2.0
grpcio==1.12.1
html5lib==0.9999999
Markdown==2.6.11
mock==2.0.0
numpy==1.14.5
pbr==4.0.4
protobuf==3.6.0
scipy==1.1.0
six==1.11.0
#sklearn==0.0
termcolor==1.1.0
Werkzeug==0.14.1
# we updated the following packages to their latest versions @ 18 Oct 2018
# the original list: https://github.com/williamleif/GraphSAGE
#futures==3.2.0
networkx==2.2
tensorflow==1.10.0
tensorboard==1.10.0
gensim==3.0.1
scikit-learn==0.19.0 # 0.20.0 is OK but may give some warnings
# if you want to utilize your GPU for speed-up, simply try the following conda commands
# tested with python==3.6.6

# either -> conda install tensorflow-gpu==1.10.0  # this version will also install cuda and cudnn
# for cuda driver compatibility: https://docs.nvidia.com/deploy/cuda-compatibility/index.html
# e.g. if driver 384.xx -> conda install tensorflow-gpu=1.10.0 cudatoolkit=9.0

# or -> simply build from the docker image: docker pull tensorflow/tensorflow:1.10.0-gpu-py3
# ref: https://www.tensorflow.org/install/docker#gpu_support
src/libnrl/__init__.py (new file, 2 lines)
@@ -0,0 +1,2 @@
from __future__ import print_function
from __future__ import division
src/libnrl/aane.py (new file, 153 lines)
@@ -0,0 +1,153 @@
# -*- coding: utf-8 -*-
import numpy as np
from scipy import sparse
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import svds
from math import ceil

'''
#-----------------------------------------------------------------------------
# modified by Chengbin Hou 2018
# part of code was originally forked from https://github.com/xhuang31/AANE_Python
#-----------------------------------------------------------------------------
'''


class AANE:
    """Jointly embed Net and Attri into embedding representation H
       H = AANE(Net,Attri,d).function()
       H = AANE(Net,Attri,d,lambd,rho).function()
       H = AANE(Net,Attri,d,lambd,rho,maxiter).function()
       H = AANE(Net,Attri,d,lambd,rho,maxiter,'Att').function()
       H = AANE(Net,Attri,d,lambd,rho,maxiter,'Att',splitnum).function()
    :param Net: the weighted adjacency matrix
    :param Attri: the attribute information matrix with rows denoting nodes
    :param d: the dimension of the embedding representation
    :param lambd: the regularization parameter
    :param rho: the penalty parameter
    :param maxiter: the maximum number of iterations
    :param 'Att': conduct the initialization from the SVD of Attri
    :param splitnum: the number of pieces we split the SA into for limited cache
    :return: the embedding representation H
    Copyright 2017 & 2018, Xiao Huang and Jundong Li.
    $Revision: 1.0.2 $ $Date: 2018/02/19 00:00:00 $
    """

    def __init__(self, graph, dim=100, lambd=0.05, rho=5, mode='comb', *varargs):  # paper says lambd should not be too large; suggested [0, 0.1]; lambd=0 -> attrpure
        self.d = dim
        self.look_back_list = graph.look_back_list  # look-back node ids for A and X
        if mode == 'comb':
            print('==============AANE-comb mode: jointly learn emb from both structure and attribute info========')
            Net = sparse.csr_matrix(graph.getA())
            Attri = sparse.csr_matrix(graph.getX())
        elif mode == 'pure':
            print('======================AANE-pure mode: learn emb from structure info purely====================')
            Net = graph.getA()
            Attri = Net
        else:
            exit(0)

        self.maxiter = 2  # max num of iterations
        [self.n, m] = Attri.shape  # n = total num of nodes, m = attribute category num
        Net = sparse.lil_matrix(Net)
        Net.setdiag(np.zeros(self.n))
        Net = csc_matrix(Net)
        Attri = csc_matrix(Attri)
        self.lambd = lambd  # regularization parameter
        self.rho = rho  # penalty parameter
        splitnum = 1  # number of pieces we split the SA into for limited cache
        if len(varargs) >= 4 and varargs[3] == 'Att':
            sumcol = np.arange(m)
            np.random.shuffle(sumcol)
            self.H = svds(Attri[:, sumcol[0:min(10 * self.d, m)]], self.d)[0]
        else:
            sumcol = Net.sum(0)
            self.H = svds(Net[:, sorted(range(self.n), key=lambda k: sumcol[0, k], reverse=True)[0:min(10 * self.d, self.n)]], self.d)[0]

        if len(varargs) > 0:
            self.lambd = varargs[0]
            self.rho = varargs[1]
            if len(varargs) >= 3:
                self.maxiter = varargs[2]
                if len(varargs) >= 5:
                    splitnum = varargs[4]
        self.block = min(int(ceil(float(self.n) / splitnum)), 7575)  # treat at least each 7575 nodes as a block
        self.splitnum = int(ceil(float(self.n) / self.block))
        with np.errstate(divide='ignore'):  # inf will be ignored
            self.Attri = Attri.transpose() * sparse.diags(np.ravel(np.power(Attri.power(2).sum(1), -0.5)))
        self.Z = self.H.copy()
        self.affi = -1  # index for affinity matrix sa
        self.U = np.zeros((self.n, self.d))
        self.nexidx = np.split(Net.indices, Net.indptr[1:-1])
        self.Net = np.split(Net.data, Net.indptr[1:-1])

        self.vectors = {}
        self.function()  # run aane

    '''################# Update functions #################'''

    def updateH(self):
        xtx = np.dot(self.Z.transpose(), self.Z) * 2 + self.rho * np.eye(self.d)
        for blocki in range(self.splitnum):  # split nodes into different blocks
            indexblock = self.block * blocki  # index for splitting blocks
            if self.affi != blocki:
                self.sa = self.Attri[:, range(indexblock, indexblock + min(self.n - indexblock, self.block))].transpose() * self.Attri
                self.affi = blocki
            sums = self.sa.dot(self.Z) * 2
            for i in range(indexblock, indexblock + min(self.n - indexblock, self.block)):
                neighbor = self.Z[self.nexidx[i], :]  # the set of adjacent nodes of node i
                for j in range(1):
                    normi_j = np.linalg.norm(neighbor - self.H[i, :], axis=1)  # norm of h_i^k - z_j^k
                    nzidx = normi_j != 0  # non-zero index
                    if np.any(nzidx):
                        normi_j = (self.lambd * self.Net[i][nzidx]) / normi_j[nzidx]
                        self.H[i, :] = np.linalg.solve(xtx + normi_j.sum() * np.eye(self.d), sums[i - indexblock, :] + (
                            neighbor[nzidx, :] * normi_j.reshape((-1, 1))).sum(0) + self.rho * (
                            self.Z[i, :] - self.U[i, :]))
                    else:
                        self.H[i, :] = np.linalg.solve(xtx, sums[i - indexblock, :] + self.rho * (
                            self.Z[i, :] - self.U[i, :]))

    def updateZ(self):
        xtx = np.dot(self.H.transpose(), self.H) * 2 + self.rho * np.eye(self.d)
        for blocki in range(self.splitnum):  # split nodes into different blocks
            indexblock = self.block * blocki  # index for splitting blocks
            if self.affi != blocki:
                self.sa = self.Attri[:, range(indexblock, indexblock + min(self.n - indexblock, self.block))].transpose() * self.Attri
                self.affi = blocki
            sums = self.sa.dot(self.H) * 2
            for i in range(indexblock, indexblock + min(self.n - indexblock, self.block)):
                neighbor = self.H[self.nexidx[i], :]  # the set of adjacent nodes of node i
                for j in range(1):
                    normi_j = np.linalg.norm(neighbor - self.Z[i, :], axis=1)  # norm of h_i^k - z_j^k
                    nzidx = normi_j != 0  # non-zero index
                    if np.any(nzidx):
                        normi_j = (self.lambd * self.Net[i][nzidx]) / normi_j[nzidx]
                        self.Z[i, :] = np.linalg.solve(xtx + normi_j.sum() * np.eye(self.d), sums[i - indexblock, :] + (
                            neighbor[nzidx, :] * normi_j.reshape((-1, 1))).sum(0) + self.rho * (
                            self.H[i, :] + self.U[i, :]))
                    else:
                        self.Z[i, :] = np.linalg.solve(xtx, sums[i - indexblock, :] + self.rho * (
                            self.H[i, :] + self.U[i, :]))

    def function(self):
        self.updateH()
        '''################# Iterations #################'''
        for __ in range(self.maxiter - 1):
            self.updateZ()
            self.U = self.U + self.H - self.Z
            self.updateH()
        # -------save emb to self.vectors and return
        ind = 0
        for id in self.look_back_list:
            self.vectors[id] = self.H[ind]
            ind += 1
        return self.vectors

    def save_embeddings(self, filename):
        '''
        save embeddings to file
        '''
        fout = open(filename, 'w')
        node_num = len(self.vectors.keys())
        fout.write("{} {}\n".format(node_num, self.d))
        for node, vec in self.vectors.items():
            fout.write("{} {}\n".format(node, ' '.join([str(x) for x in vec])))
        fout.close()
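For orientation, a minimal usage sketch of the class above. The ToyGraph stub is hypothetical; it only mimics the parts of the OpenANE graph interface that AANE actually touches (look_back_list, getA, getX) and assumes src/ is on the Python path.

# hypothetical usage sketch, not shipped with the package
import numpy as np
from libnrl.aane import AANE

class ToyGraph:  # stand-in for the OpenANE graph object (interface assumption)
    def __init__(self, A, X):
        self._A, self._X = A, X
        self.look_back_list = list(range(A.shape[0]))  # node ids by row index
    def getA(self):
        return self._A
    def getX(self):
        return self._X

n, m = 30, 8
A = np.random.rand(n, n); A = (A + A.T) / 2.0   # toy weighted adjacency matrix
X = np.random.rand(n, m)                        # toy node attribute matrix
model = AANE(graph=ToyGraph(A, X), dim=16, lambd=0.05, rho=5, mode='comb')
print(len(model.vectors), 'embeddings of dimension', model.d)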
src/libnrl/abrw.py (new file, 131 lines)
@@ -0,0 +1,131 @@
# -*- coding: utf-8 -*-
import numpy as np
import time
from numpy import linalg as la
import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
import gensim
from gensim.models import Word2Vec
from . import walker
import networkx as nx
from libnrl.utils import *
import multiprocessing

'''
#-----------------------------------------------------------------------------
# author: Chengbin Hou @ SUSTech 2018
# Email: Chengbin.Hou10@foxmail.com
#-----------------------------------------------------------------------------
'''


def multiprocessor_argpartition(vec):
    topk = 20
    print('len of vec...', len(vec))
    return np.argpartition(vec, -topk)[-topk:]


class ABRW(object):

    def __init__(self, graph, dim, alpha, topk, path_length, num_paths, **kwargs):
        self.g = graph
        self.alpha = float(alpha)
        self.topk = int(topk)
        kwargs["workers"] = kwargs.get("workers", 1)

        self.P = self.biasedTransProb()  # obtain biased transition probs mat
        weighted_walker = walker.BiasedWalker(g=self.g, P=self.P, workers=kwargs["workers"])  # instantiate weighted walker
        # generate sentences according to biased transition probs mat P
        sentences = weighted_walker.simulate_walks(num_walks=num_paths, walk_length=path_length)

        # skip-gram parameters
        kwargs["sentences"] = sentences
        kwargs["min_count"] = kwargs.get("min_count", 0)
        kwargs["size"] = kwargs.get("size", dim)
        kwargs["sg"] = 1  # use skip-gram; but see deepwalk which uses 'hs' = 1
        self.size = kwargs["size"]
        # learn embedding by skip-gram model
        print("Learning representation...")
        word2vec = Word2Vec(**kwargs)
        # save emb for later eval
        self.vectors = {}
        for word in self.g.G.nodes():
            self.vectors[word] = word2vec.wv[word]  # save emb
        del word2vec

    # ----------------------------------------key of our method---------------------------------------------
    def biasedTransProb(self):
        '''
        given: A and X --> P_A and P_X
        research question: how to combine A and X in a more principled way
        general idea: Attribute Biased Random Walk
                      i.e. a walker based on a mixed transition matrix P = alpha*P_A + (1-alpha)*P_X
        result: the ABRW transition matrix P
        *** open questions: 1) what if we have some single nodes, i.e. some rows of P_A are all 0s
                            2) the similarity/distance metric used to obtain P_X
                            3) alias sampling as used in node2vec for speed-up, but this only pays off
                               if each row of P has many 0s
                               --> how to make each row of P a pdf and, at the same time, sparse
        '''

        print("obtaining biased transition probs mat...")
        t1 = time.time()

        A = self.g.get_adj_mat()  # adj/structure info mat
        P_A = row_as_probdist(A)  # if single node, returns [0, 0, 0, ...]; we will fix this later

        X = self.g.get_attr_mat()  # attr info mat
        X_compressed = X  # if speed-up is needed, try svd or pca for compression, but some accuracy will be lost
        # X_compressed = self.g.preprocessAttrInfo(X=X, dim=200, method='pca')  # svd or pca for dim reduction; follow TADW setting using svd with dim=200
        from sklearn.metrics.pairwise import linear_kernel, cosine_similarity, cosine_distances, euclidean_distances  # we may try diff metrics
        # ref http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.pairwise
        # t1 = time.time()
        X_sim = cosine_similarity(X_compressed, X_compressed)
        # t2 = time.time()
        # print('======no need pre proce', t2-t1)

        # way5: a faster implementation of way5 by Zeyu Dong
        topk = self.topk
        print('way5 remain self---------topk = ', topk)
        t1 = time.time()
        cutoff = np.partition(X_sim, -topk, axis=1)[:, -topk:].min(axis=1)
        X_sim[(X_sim < cutoff)] = 0
        t2 = time.time()

        P_X = row_as_probdist(X_sim)
        t3 = time.time()
        for i in range(P_X.shape[0]):
            sum_row = P_X[i].sum()
            if sum_row != 1.0:  # to avoid some numerical issue...
                delta = 1.0 - sum_row  # delta is a very small number, say 1e-10 or even less...
                P_X[i][i] = P_X[i][i] + delta  # add delta to the diagonal of that row --> almost no effect
        t4 = time.time()
        print('topk time: ', t2-t1, 'row normalize time: ', t3-t2, 'dealing numerical issue time: ', t4-t3)
        del A, X, X_compressed, X_sim

        # =====================================core of our idea========================================
        print('------alpha for P = alpha * P_A + (1-alpha) * P_X----: ', self.alpha)
        n = self.g.get_num_nodes()
        P = np.zeros((n, n), dtype=float)
        for i in range(n):
            if (P_A[i] == 0).all():  # single node case, i.e. the whole row is 0s
                # if P_A[i].sum() == 0:
                P[i] = P_X[i]  # use 100% attr info to compensate
            else:  # non-single node case; use (1.0-self.alpha) attr info to compensate
                P[i] = self.alpha * P_A[i] + (1.0-self.alpha) * P_X[i]
        print('# of single nodes for P_A: ', n - P_A.sum(axis=1).sum(), ' # of non-zero entries of P_A: ', np.count_nonzero(P_A))
        print('# of single nodes for P_X: ', n - P_X.sum(axis=1).sum(), ' # of non-zero entries of P_X: ', np.count_nonzero(P_X))
        t5 = time.time()
        print('ABRW biased transition prob preprocessing time: {:.2f}s'.format(t5-t4))
        return P

    def save_embeddings(self, filename):
        fout = open(filename, 'w')
        node_num = len(self.vectors.keys())
        fout.write("{} {}\n".format(node_num, self.size))
        for node, vec in self.vectors.items():
            fout.write("{} {}\n".format(node, ' '.join([str(x) for x in vec])))
        fout.close()
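The core of biasedTransProb, stripped of the OpenANE graph object and the timing/printing code, is the mixing step below. This is a standalone sketch with dense NumPy arrays; the simple row normalization stands in for row_as_probdist from libnrl.utils.

# standalone sketch of the ABRW transition-matrix construction
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def mixed_transition_matrix(A, X, alpha=0.8, topk=5):
    def row_normalize(M):
        s = M.sum(axis=1, keepdims=True)
        return np.divide(M, s, out=np.zeros_like(M, dtype=float), where=s != 0)
    P_A = row_normalize(A.astype(float))                      # structure-based transition probs
    S = cosine_similarity(X, X)                               # attribute similarity
    cutoff = np.partition(S, -topk, axis=1)[:, -topk:].min(axis=1)
    S[S < cutoff[:, None]] = 0                                # keep only the top-k similar nodes per row
    P_X = row_normalize(S)
    P = alpha * P_A + (1.0 - alpha) * P_X                     # P = alpha*P_A + (1-alpha)*P_X
    isolated = P_A.sum(axis=1) == 0
    P[isolated] = P_X[isolated]                               # single nodes fall back to attribute info only
    return P

# A = np.random.randint(0, 2, (20, 20)); X = np.random.rand(20, 16)
# P = mixed_transition_matrix(A, X, alpha=0.8, topk=5)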
src/libnrl/asne.py (new file, 244 lines)
@@ -0,0 +1,244 @@
# -*- coding: utf-8 -*-
'''
Tensorflow implementation of the Social Network Embedding framework (SNE)
@author: Lizi Liao (liaolizi.llz@gmail.com)
part of code was originally forked from https://github.com/lizi-git/ASNE

modified by Chengbin Hou 2018
1) convert OpenANE data format to ASNE data format
2) compatible with latest tensorflow 1.2
3) add more comments
4) support evaluating the testing set every xx epochs
5) as the ASNE paper stated, we add two hidden layers with softsign activation func
'''

import math
import numpy as np
import tensorflow as tf
from sklearn.base import BaseEstimator, TransformerMixin
from .classify import ncClassifier, lpClassifier, read_node_label
from sklearn.linear_model import LogisticRegression


def format_data_from_OpenANE_to_ASNE(g, dim):
    '''
    convert OpenANE data format to ASNE data format
    g: OpenANE graph data structure
    dim: final embedding dim
    '''
    attr_Matrix = g.getX()
    # attr_Matrix = g.preprocessAttrInfo(attr_Matrix, dim=200, method='svd')  # similar to aane, the same preprocessing
    # print('with this preprocessing, ASNE can get better results, as well as faster speed----------------')
    id_N = attr_Matrix.shape[0]  # n nodes
    attr_M = attr_Matrix.shape[1]  # m features

    edge_num = len(g.G.edges)  # total edges for training
    X = {}  # one-to-one correspondence
    X['data_id_list'] = np.zeros(edge_num)  # start node list for training
    X['data_label_list'] = np.zeros(edge_num)  # end node list for training
    X['data_attr_list'] = np.zeros([edge_num, attr_M])  # attr corresponding to start node
    edgelist = [edge for edge in g.G.edges]
    i = 0
    for edge in edgelist:  # training sample = start node, end node, start node attr
        X['data_id_list'][i] = edge[0]
        X['data_label_list'][i] = edge[1]
        X['data_attr_list'][i] = attr_Matrix[g.look_up_dict[edge[0]]][:]
        i += 1
    X['data_id_list'] = X['data_id_list'].reshape(-1).astype(int)
    X['data_label_list'] = X['data_label_list'].reshape(-1, 1).astype(int)

    nodes = {}  # one-to-one correspondence
    nodes['node_id'] = g.look_back_list  # n nodes
    nodes['node_attr'] = list(attr_Matrix)  # m features -> n*m

    id_embedding_size = int(dim/2)
    attr_embedding_size = int(dim/2)
    print('id_embedding_size', id_embedding_size, 'attr_embedding_size', attr_embedding_size)
    return X, nodes, id_N, attr_M, id_embedding_size, attr_embedding_size


def add_layer(inputs, in_size, out_size, activation_function=None):
    # add one more layer and return the output of this layer
    Weights = tf.Variable(tf.random_uniform([in_size, out_size], -1.0, 1.0))  # init as paper stated
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs


class ASNE(BaseEstimator, TransformerMixin):
    def __init__(self, graph, dim, alpha=1.0, batch_size=128, learning_rate=0.001,
                 n_neg_samples=10, epoch=100, random_seed=2018, X_test=0, Y_test=0, task='nc', nc_ratio=0.5, lp_ratio=0.9, label_file=''):
        # bind params to class
        X, nodes, id_N, attr_M, id_embedding_size, attr_embedding_size = format_data_from_OpenANE_to_ASNE(g=graph, dim=dim)
        self.node_N = id_N  # n
        self.attr_M = attr_M  # m
        self.X_train = X  # {'data_id_list': [], 'data_label_list': [], 'data_attr_list': []}
        self.nodes = nodes  # {'node_id': [], 'node_attr': []}
        self.id_embedding_size = id_embedding_size  # set to dim/2
        self.attr_embedding_size = attr_embedding_size  # set to dim/2
        self.vectors = {}
        self.dim = dim
        self.look_back_list = graph.look_back_list  # from OpenANE data structure

        self.alpha = alpha  # set to 1.0 by default
        self.n_neg_samples = n_neg_samples  # set to 10 by default
        self.batch_size = batch_size  # set to 128 by default
        self.learning_rate = learning_rate
        self.epoch = epoch  # set to 100 by default
        self.random_seed = random_seed
        self._init_graph()  # init all variables in a tensorflow graph

        self.task = task
        self.nc_ratio = nc_ratio
        self.lp_ratio = lp_ratio
        if self.task == 'lp':  # if not lp task, we do not need to keep testing edges
            self.X_test = X_test
            self.Y_test = Y_test
            self.train()  # train our tf asne model-----------------
        elif self.task == 'nc' or self.task == 'nclp':
            self.X_nc_label, self.Y_nc_label = read_node_label(label_file)
            self.train()  # train our tf asne model-----------------

    def _init_graph(self):
        '''
        Init a tensorflow Graph containing: input data, variables, model, loss, optimizer
        '''
        self.graph = tf.Graph()
        # with self.graph.as_default(), tf.device('/gpu:0'):
        with self.graph.as_default():
            # Set graph level random seed
            tf.set_random_seed(self.random_seed)
            # Input data.
            self.train_data_id = tf.placeholder(tf.int32, shape=[None])  # batch_size * 1
            self.train_data_attr = tf.placeholder(tf.float32, shape=[None, self.attr_M])  # batch_size * attr_M
            self.train_labels = tf.placeholder(tf.int32, shape=[None, 1])  # batch_size * 1

            # Variables.
            network_weights = self._initialize_weights()
            self.weights = network_weights

            # Model.
            # Look up embeddings for node_id.
            self.id_embed = tf.nn.embedding_lookup(self.weights['in_embeddings'], self.train_data_id)  # batch_size * id_dim
            self.attr_embed = tf.matmul(self.train_data_attr, self.weights['attr_embeddings'])  # batch_size * attr_dim
            self.embed_layer = tf.concat([self.id_embed, self.alpha * self.attr_embed], 1)  # batch_size * (id_dim + attr_dim); note: old tf used a different tf.concat argument order!

            ## can add hidden_layers component here!
            # 0) no hidden layer
            # 1) 128
            # 2) 256+128  ##--------paper stated it used two hidden layers with activation function softsign....
            # 3) 512+256+128
            len_h1_in = self.id_embedding_size + self.attr_embedding_size
            len_h1_out = 256
            len_h2_in = len_h1_out
            len_h2_out = 128
            self.h1 = add_layer(inputs=self.embed_layer, in_size=len_h1_in, out_size=len_h1_out, activation_function=tf.nn.softsign)
            self.h2 = add_layer(inputs=self.h1, in_size=len_h2_in, out_size=len_h2_out, activation_function=tf.nn.softsign)

            # Compute the loss, using a sample of the negative labels each time.
            self.loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(weights=self.weights['out_embeddings'], biases=self.weights['biases'],
                                                                  inputs=self.h2, labels=self.train_labels, num_sampled=self.n_neg_samples, num_classes=self.node_N))
            # Optimizer.
            self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=0.9, beta2=0.999, epsilon=1e-8).minimize(self.loss)  # tune these parameters?
            # print("AdamOptimizer")

            # init
            init = tf.global_variables_initializer()  # replaces the deprecated initialize_all_variables
            self.sess = tf.Session(config=tf.ConfigProto(log_device_placement=False))
            self.sess.run(init)

    def _initialize_weights(self):
        all_weights = dict()
        all_weights['in_embeddings'] = tf.Variable(tf.random_uniform([self.node_N, self.id_embedding_size], -1.0, 1.0))  # id_N * id_dim
        all_weights['attr_embeddings'] = tf.Variable(tf.random_uniform([self.attr_M, self.attr_embedding_size], -1.0, 1.0))  # attr_M * attr_dim
        all_weights['out_embeddings'] = tf.Variable(tf.truncated_normal([self.node_N, self.id_embedding_size + self.attr_embedding_size],
                                                                        stddev=1.0 / math.sqrt(self.id_embedding_size + self.attr_embedding_size)))
        all_weights['biases'] = tf.Variable(tf.zeros([self.node_N]))
        return all_weights

    def partial_fit(self, X):  # fit a batch
        feed_dict = {self.train_data_id: X['batch_data_id'], self.train_data_attr: X['batch_data_attr'],
                     self.train_labels: X['batch_data_label']}
        loss, opt = self.sess.run((self.loss, self.optimizer), feed_dict=feed_dict)
        return loss

    def get_random_block_from_data(self, data, batch_size):  # unused for the moment...
        start_index = np.random.randint(0, len(data) - batch_size)
        return data[start_index:(start_index + batch_size)]

    def train(self):  # fit a dataset
        self.Embeddings = []
        print('Using in + out embedding')

        for epoch in range(self.epoch):
            total_batch = int(len(self.X_train['data_id_list']) / self.batch_size)  # total_batch*batch_size = numOFlinks??
            # print('total_batch in 1 epoch: ', total_batch)
            # Loop over all batches
            for i in range(total_batch):
                # generate a batch of data
                batch_xs = {}
                start_index = np.random.randint(0, len(self.X_train['data_id_list']) - self.batch_size)
                batch_xs['batch_data_id'] = self.X_train['data_id_list'][start_index:(start_index + self.batch_size)]  # generate batch data
                batch_xs['batch_data_attr'] = self.X_train['data_attr_list'][start_index:(start_index + self.batch_size)]
                batch_xs['batch_data_label'] = self.X_train['data_label_list'][start_index:(start_index + self.batch_size)]

                # Fit training using batch data
                cost = self.partial_fit(batch_xs)

            # Display logs per epoch
            Embeddings_out = self.getEmbedding('out_embedding', self.nodes)
            Embeddings_in = self.getEmbedding('embed_layer', self.nodes)
            self.Embeddings = Embeddings_out + Embeddings_in  # simply add them up as the final embedding; try concat? to do...
            # print('training tensorflow asne model, epoch: ', epoch+1, ' / ', self.epoch)
            # to save training time, we delete eval testing data @ each epoch

            # -----------every xx epochs, save embeddings {node_id1: [], node_id2: [], ...}----------
            if (epoch+1) % 1 == 0 and epoch != 0:  # every xx epochs, try eval
                print('@@@ epoch ------- ', epoch+1, ' / ', self.epoch)
                ind = 0
                for id in self.nodes['node_id']:  # self.nodes['node_id'] = self.look_back_list
                    self.vectors[id] = self.Embeddings[ind]
                    ind += 1
                # self.eval(vectors=self.vectors)
        print('please note that: the final embedding returned and its output file are not the best embedding!')
        print('for the best embeddings, please check which epoch got the best eval metric(s)......')

    def getEmbedding(self, type, nodes):
        if type == 'embed_layer':
            feed_dict = {self.train_data_id: nodes['node_id'], self.train_data_attr: nodes['node_attr']}
            Embedding = self.sess.run(self.embed_layer, feed_dict=feed_dict)
            return Embedding
        if type == 'out_embedding':
            Embedding = self.sess.run(self.weights['out_embeddings'])  # sess.run to get embeddings from tf
            return Embedding  # nodes_number * (id_dim + attr_dim)

    def save_embeddings(self, filename):
        '''
        save embeddings to file
        '''
        fout = open(filename, 'w')
        node_num = len(self.vectors.keys())
        fout.write("{} {}\n".format(node_num, self.dim))
        for node, vec in self.vectors.items():
            fout.write("{} {}\n".format(node, ' '.join([str(x) for x in vec])))
        fout.close()

    def eval(self, vectors):
        # ------nc task
        if self.task == 'nc' or self.task == 'nclp':
            print("Training nc classifier using {:.2f}% node labels...".format(self.nc_ratio*100))
            clf = ncClassifier(vectors=vectors, clf=LogisticRegression())  # use Logistic Regression as clf; we may choose SVM or more advanced ones
            clf.split_train_evaluate(self.X_nc_label, self.Y_nc_label, self.nc_ratio)
        # ------lp task
        if self.task == 'lp':
            # X_test, Y_test = read_edge_label(args.label_file)  # enable this if you want to load your own lp testing data, see classify.py
            print("During embedding we have used {:.2f}% links and the remaining will be left for lp evaluation...".format(self.lp_ratio*100))
            clf = lpClassifier(vectors=vectors)  # similarity/distance metric as clf; basically, lp is a binary clf problem
            clf.evaluate(self.X_test, self.Y_test)
src/libnrl/attrcomb.py (new file, 96 lines)
@@ -0,0 +1,96 @@
# -*- coding: utf-8 -*-
import numpy as np
import time
import networkx as nx
from . import node2vec, line, grarep

'''
#-----------------------------------------------------------------------------
# author: Chengbin Hou 2018
# Email: Chengbin.Hou10@foxmail.com
#-----------------------------------------------------------------------------
'''


class ATTRCOMB(object):

    def __init__(self, graph, dim, comb_method='concat', num_paths=10, comb_with='deepWalk'):
        self.g = graph
        self.dim = dim
        self.num_paths = num_paths

        print("Learning representation...")
        self.vectors = {}

        print('attr naively combined method ', comb_method, '=====================')
        if comb_method == 'concat':
            print('comb_method == concat by default; dim/2 from attr and dim/2 from nrl.............')
            attr_embeddings = self.train_attr(dim=int(self.dim/2))
            nrl_embeddings = self.train_nrl(dim=int(self.dim/2), comb_with='deepWalk')
            embeddings = np.concatenate((attr_embeddings, nrl_embeddings), axis=1)
            print('shape of embeddings', embeddings.shape)

        elif comb_method == 'elementwise-mean':
            print('comb_method == elementwise-mean.............')
            attr_embeddings = self.train_attr(dim=self.dim)
            nrl_embeddings = self.train_nrl(dim=self.dim, comb_with='deepWalk')  # we may try deepWalk, node2vec, line, etc...
            embeddings = np.add(attr_embeddings, nrl_embeddings)/2.0
            print('shape of embeddings', embeddings.shape)

        elif comb_method == 'elementwise-max':
            print('comb_method == elementwise-max.............')
            attr_embeddings = self.train_attr(dim=self.dim)
            nrl_embeddings = self.train_nrl(dim=self.dim, comb_with='deepWalk')  # we may try deepWalk, node2vec, line, etc...
            embeddings = np.zeros(shape=(attr_embeddings.shape[0], attr_embeddings.shape[1]))
            for i in range(attr_embeddings.shape[0]):  # size(attr_embeddings) = size(nrl_embeddings)
                for j in range(attr_embeddings.shape[1]):
                    if attr_embeddings[i][j] > nrl_embeddings[i][j]:
                        embeddings[i][j] = attr_embeddings[i][j]
                    else:
                        embeddings[i][j] = nrl_embeddings[i][j]
            print('shape of embeddings', embeddings.shape)

        else:
            print('error, no comb_method was found....')
            exit(0)

        for key, ind in self.g.look_up_dict.items():
            self.vectors[key] = embeddings[ind]

    def train_attr(self, dim):
        X = self.g.getX()
        X_compressed = self.g.preprocessAttrInfo(X=X, dim=dim, method='svd')  # svd or pca for dim reduction
        print('X_compressed shape: ', X_compressed.shape)
        return np.array(X_compressed)  # n*dim matrix, each row corresponding to the node ID stored in graph.look_back_list

    def train_nrl(self, dim, comb_with):
        print('attr naively combined with ', comb_with, '=====================')
        if comb_with == 'deepWalk':
            model = node2vec.Node2vec(graph=self.g, path_length=80, num_paths=self.num_paths, dim=dim, workers=4, window=10, dw=True)
            nrl_embeddings = []
            for key in self.g.look_back_list:
                nrl_embeddings.append(model.vectors[key])
            return np.array(nrl_embeddings)

        elif comb_with == 'node2vec':  # was "args.method", which is undefined in this scope
            model = node2vec.Node2vec(graph=self.g, path_length=80, num_paths=self.num_paths, dim=dim, workers=4, p=0.8, q=0.8, window=10)
            nrl_embeddings = []
            for key in self.g.look_back_list:
                nrl_embeddings.append(model.vectors[key])
            return np.array(nrl_embeddings)

        else:
            print('error, no comb_with was found....')
            print('to do.... line, grarep, etc...')
            exit(0)

    def save_embeddings(self, filename):
        fout = open(filename, 'w')
        node_num = len(self.vectors.keys())
        fout.write("{} {}\n".format(node_num, self.dim))
        for node, vec in self.vectors.items():
            fout.write("{} {}\n".format(node, ' '.join([str(x) for x in vec])))
        fout.close()
src/libnrl/attrpure.py (new file, 38 lines)
@@ -0,0 +1,38 @@
# -*- coding: utf-8 -*-
import numpy as np
import time
import networkx as nx

'''
#-----------------------------------------------------------------------------
# author: Chengbin Hou 2018
# Email: Chengbin.Hou10@foxmail.com
#-----------------------------------------------------------------------------
'''


class ATTRPURE(object):

    def __init__(self, graph, dim):
        self.g = graph
        self.dim = dim

        print("Learning representation...")
        self.vectors = {}
        embeddings = self.train()
        for key, ind in self.g.look_up_dict.items():
            self.vectors[key] = embeddings[ind]

    def train(self):
        X = self.g.getX()
        X_compressed = self.g.preprocessAttrInfo(X=X, dim=self.dim, method='svd')  # svd or pca for dim reduction
        return X_compressed  # n*dim matrix, each row corresponding to the node ID stored in graph.look_back_list

    def save_embeddings(self, filename):
        fout = open(filename, 'w')
        node_num = len(self.vectors.keys())
        fout.write("{} {}\n".format(node_num, self.dim))
        for node, vec in self.vectors.items():
            fout.write("{} {}\n".format(node, ' '.join([str(x) for x in vec])))
        fout.close()
235
src/libnrl/classify.py
Normal file
235
src/libnrl/classify.py
Normal file
@ -0,0 +1,235 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
from __future__ import print_function
|
||||
import numpy as np
|
||||
import math
|
||||
import random
|
||||
import networkx as nx
|
||||
import warnings
|
||||
warnings.filterwarnings(action='ignore', category=UserWarning, module='sklearn')
|
||||
from sklearn.multiclass import OneVsRestClassifier
|
||||
from sklearn.metrics import f1_score, accuracy_score, roc_auc_score, classification_report, roc_curve, auc
|
||||
from sklearn.preprocessing import MultiLabelBinarizer
|
||||
|
||||
'''
|
||||
#-----------------------------------------------------------------------------
|
||||
# part of code was originally forked from https://github.com/thunlp/OpenNE
|
||||
|
||||
# modified by Chengbin Hou 2018
|
||||
# Email: Chengbin.Hou10@foxmail.com
|
||||
#-----------------------------------------------------------------------------
|
||||
'''
|
||||
|
||||
# node classification classifier
|
||||
class ncClassifier(object):
|
||||
|
||||
def __init__(self, vectors, clf):
|
||||
self.embeddings = vectors
|
||||
self.clf = TopKRanker(clf) #here clf is LR
|
||||
self.binarizer = MultiLabelBinarizer(sparse_output=True)
|
||||
|
||||
def split_train_evaluate(self, X, Y, train_precent, seed=0):
|
||||
state = np.random.get_state()
|
||||
training_size = int(train_precent * len(X))
|
||||
#np.random.seed(seed)
|
||||
shuffle_indices = np.random.permutation(np.arange(len(X)))
|
||||
X_train = [X[shuffle_indices[i]] for i in range(training_size)]
|
||||
Y_train = [Y[shuffle_indices[i]] for i in range(training_size)]
|
||||
X_test = [X[shuffle_indices[i]] for i in range(training_size, len(X))]
|
||||
Y_test = [Y[shuffle_indices[i]] for i in range(training_size, len(X))]
|
||||
|
||||
self.train(X_train, Y_train, Y)
|
||||
np.random.set_state(state) #why??? for binarizer.transform??
|
||||
return self.evaluate(X_test, Y_test)
|
||||
|
||||
def train(self, X, Y, Y_all):
|
||||
self.binarizer.fit(Y_all) #to support multi-labels, fit means dict mapping {orig cat: binarized vec}
|
||||
X_train = [self.embeddings[x] for x in X]
|
||||
Y = self.binarizer.transform(Y) #since we have use Y_all fitted, then we simply transform
|
||||
self.clf.fit(X_train, Y)
|
||||
|
||||
def predict(self, X, top_k_list):
|
||||
X_ = np.asarray([self.embeddings[x] for x in X])
|
||||
# see TopKRanker(OneVsRestClassifier)
|
||||
Y = self.clf.predict(X_, top_k_list=top_k_list) # the top k probs to be output...
|
||||
return Y
|
||||
|
||||
def evaluate(self, X, Y):
|
||||
top_k_list = [len(l) for l in Y] #multi-labels, diff len of labels of each node
|
||||
Y_ = self.predict(X, top_k_list) #pred val of X_test i.e. Y_pred
|
||||
Y = self.binarizer.transform(Y) #true val i.e. Y_test
|
||||
averages = ["micro", "macro", "samples", "weighted"]
|
||||
results = {}
|
||||
for average in averages:
|
||||
results[average] = f1_score(Y, Y_, average=average)
|
||||
# print('Results, using embeddings of dimensionality', len(self.embeddings[X[0]]))
|
||||
print(results)
|
||||
return results
|
||||
|
||||
class TopKRanker(OneVsRestClassifier): #orignal LR or SVM is for binary clf
|
||||
def predict(self, X, top_k_list): #re-define predict func of OneVsRestClassifier
|
||||
probs = np.asarray(super(TopKRanker, self).predict_proba(X))
|
||||
all_labels = []
|
||||
for i, k in enumerate(top_k_list):
|
||||
probs_ = probs[i, :]
|
||||
labels = self.classes_[probs_.argsort()[-k:]].tolist() #denote labels
|
||||
probs_[:] = 0 #reset probs_ to all 0
|
||||
probs_[labels] = 1 #reset probs_ to 1 if labels denoted...
|
||||
all_labels.append(probs_)
|
||||
return np.asarray(all_labels)
|
||||
|
||||
'''
|
||||
#note: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no true labels
|
||||
#see: https://stackoverflow.com/questions/43162506/undefinedmetricwarning-f-score-is-ill-defined-and-being-set-to-0-0-in-labels-wi
|
||||
'''
|
||||
|
||||
'''
|
||||
import matplotlib.pyplot as plt
|
||||
def plt_roc(y_test, y_score):
|
||||
"""
|
||||
calculate AUC value and plot the ROC curve
|
||||
"""
|
||||
fpr, tpr, threshold = roc_curve(y_test, y_score)
|
||||
roc_auc = auc(fpr, tpr)
|
||||
plt.figure()
|
||||
plt.stackplot(fpr, tpr, color='steelblue', alpha = 0.5, edgecolor = 'black')
|
||||
plt.plot(fpr, tpr, color='black', lw = 1)
|
||||
plt.plot([0,1],[0,1], color = 'red', linestyle = '--')
|
||||
plt.text(0.5,0.3,'ROC curve (area = %0.3f)' % roc_auc)
|
||||
plt.xlabel('False Positive Rate')
|
||||
plt.ylabel('True Positive Rate')
|
||||
plt.show()
|
||||
return roc_auc
|
||||
'''
|
||||
|
||||
# link prediction binary classifier
|
||||
class lpClassifier(object):
|
||||
|
||||
def __init__(self, vectors):
|
||||
self.embeddings = vectors
|
||||
|
||||
def evaluate(self, X_test, Y_test, seed=0): #clf here is simply a similarity/distance metric
|
||||
state = np.random.get_state()
|
||||
#np.random.seed(seed)
|
||||
test_size = len(X_test)
|
||||
#shuffle_indices = np.random.permutation(np.arange(test_size))
|
||||
#X_test = [X_test[shuffle_indices[i]] for i in range(test_size)]
|
||||
#Y_test = [Y_test[shuffle_indices[i]] for i in range(test_size)]
|
||||
|
||||
Y_true = [int(i) for i in Y_test]
|
||||
Y_probs = []
|
||||
for i in range(test_size):
|
||||
start_node_emb = np.array(self.embeddings[X_test[i][0]]).reshape(-1,1)
|
||||
end_node_emb = np.array(self.embeddings[X_test[i][1]]).reshape(-1,1)
|
||||
score = cosine_similarity(start_node_emb, end_node_emb) #ranging from [-1, +1]
|
||||
Y_probs.append( (score+1)/2.0 ) #switch to prob... however, we may also directly y_score = score
|
||||
#in sklearn roc... which yields the same reasult
|
||||
roc = roc_auc_score(y_true = Y_true, y_score = Y_probs)
|
||||
if roc < 0.5:
|
||||
roc = 1.0 - roc #since lp is binary clf task, just predict the opposite if<0.5
|
||||
print("roc=", "{:.9f}".format(roc))
|
||||
#plt_roc(Y_true, Y_probs) #enable to plot roc curve and return auc value
|
||||
|
||||
def norm(a):
|
||||
sum = 0.0
|
||||
for i in range(len(a)):
|
||||
sum = sum + a[i] * a[i]
|
||||
return math.sqrt(sum)
|
||||
|
||||
def cosine_similarity(a, b):
|
||||
sum = 0.0
|
||||
for i in range(len(a)):
|
||||
sum = sum + a[i] * b[i]
|
||||
#return sum/(norm(a) * norm(b))
|
||||
return sum/(norm(a) * norm(b) + 1e-20) #fix numerical issue 1e-20 almost = 0!
|
||||
|
||||
'''
|
||||
#cosine_similarity realized by use...
|
||||
#or try sklearn....
|
||||
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity, cosine_distances, euclidean_distances # we may try diff metrics
|
||||
#ref http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.pairwise
|
||||
'''
|
||||
|
||||
def lp_train_test_split(graph, ratio=0.5, neg_pos_link_ratio=1.0, test_pos_links_ratio=0.1):
|
||||
#randomly split links/edges into training set and testing set
|
||||
#*** note: we do not assume every node must be connected after removing links
|
||||
#*** hence, the resulting graph might have few single nodes --> more realistic scenario
|
||||
#*** e.g. a user just sign in a website has no link to others
|
||||
|
||||
#graph: OpenANE graph data strcture
|
||||
#ratio: perc of links for training; ranging [0, 1]
|
||||
#neg_pos_link_ratio: 1.0 means neg-links/pos-links = 1.0 i.e. balance case; raning [0, +inf)
|
||||
g = graph
|
||||
test_pos_links = int(nx.number_of_edges(g.G) * test_pos_links_ratio)
|
||||
|
||||
print("test_pos_links_ratio {:.2f}, test_pos_links {:.2f}, neg_pos_link_ratio is {:.2f}, links for training {:.2f}%,".format(test_pos_links_ratio, test_pos_links, neg_pos_link_ratio, ratio*100))
|
||||
test_pos_sample = []
|
||||
test_neg_sample = []
|
||||
|
||||
#random.seed(2018) #generate testing set that contains both pos and neg samples
|
||||
test_pos_sample = random.sample(g.G.edges(), test_pos_links)
|
||||
#test_neg_sample = random.sample(list(nx.classes.function.non_edges(g.G)), int(test_size * neg_pos_link_ratio)) #using nx build-in func, not efficient, to do...
|
||||
#more efficient way:
|
||||
test_neg_sample = []
|
||||
num_neg_sample = int(test_pos_links * neg_pos_link_ratio)
|
||||
num = 0
|
||||
while num < num_neg_sample:
|
||||
pair_nodes = np.random.choice(g.look_back_list, size=2, replace=False)
|
||||
if pair_nodes not in g.G.edges():
|
||||
num += 1
|
||||
test_neg_sample.append(list(pair_nodes))
|
||||
|
||||
test_edge_pair = test_pos_sample + test_neg_sample
|
||||
test_edge_label = list(np.ones(len(test_pos_sample))) + list(np.zeros(len(test_neg_sample)))
|
||||
|
||||
print('before removing, the # of links: ', nx.number_of_edges(g.G), '; the # of single nodes: ', g.numSingleNodes())
|
||||
g.G.remove_edges_from(test_pos_sample) #training set should NOT contain testing set i.e. delete testing pos samples
|
||||
g.simulate_sparsely_linked_net(link_reserved = ratio) #simulate sparse net
|
||||
print('after removing, the # of links: ', nx.number_of_edges(g.G), '; the # of single nodes: ', g.numSingleNodes())
|
||||
print("# training links {0}; # positive testing links {1}; # negative testing links {2},".format(nx.number_of_edges(g.G), len(test_pos_sample), len(test_neg_sample)))
|
||||
return g.G, test_edge_pair, test_edge_label
|
||||
|
||||
#---------------------------------ulits for downstream tasks--------------------------------
|
||||
def load_embeddings(filename):
|
||||
fin = open(filename, 'r')
|
||||
node_num, size = [int(x) for x in fin.readline().strip().split()]
|
||||
vectors = {}
|
||||
while 1:
|
||||
l = fin.readline()
|
||||
if l == '':
|
||||
break
|
||||
vec = l.strip().split(' ')
|
||||
assert len(vec) == size+1
|
||||
vectors[vec[0]] = [float(x) for x in vec[1:]]
|
||||
fin.close()
|
||||
assert len(vectors) == node_num
|
||||
return vectors
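
For reference, this parser expects the word2vec-style text format written by the embedding models: a header line `<num_nodes> <dim>`, followed by one node id and its vector per line. A tiny, purely hypothetical example:

    3 4
    n1 0.12 -0.34 0.56 0.78
    n2 0.01 0.22 -0.48 0.90
    n3 -0.77 0.10 0.33 -0.25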
|
||||
|
||||
def read_node_label(filename):
|
||||
fin = open(filename, 'r')
|
||||
X = []
|
||||
Y = []
|
||||
while 1:
|
||||
l = fin.readline()
|
||||
if l == '':
|
||||
break
|
||||
vec = l.strip().split(' ')
|
||||
X.append(vec[0])
|
||||
Y.append(vec[1:])
|
||||
fin.close()
|
||||
return X, Y
|
||||
|
||||
|
||||
def read_edge_label(filename):
|
||||
fin = open(filename, 'r')
|
||||
X = []
|
||||
Y = []
|
||||
while 1:
|
||||
l = fin.readline()
|
||||
if l == '':
|
||||
break
|
||||
vec = l.strip().split(' ')
|
||||
X.append(vec[:2])
|
||||
Y.append(vec[2])
|
||||
fin.close()
|
||||
return X, Y
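
Correspondingly, read_node_label expects one `node_id label1 [label2 ...]` record per line, and read_edge_label one `node1 node2 label` record per line. Hypothetical examples (the `#` lines are annotations, not part of the files):

    # node label file, consumed by read_node_label
    n1 0 2
    n2 1

    # edge label file, consumed by read_edge_label
    n1 n2 1
    n1 n3 0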
|
||||
|
193
src/libnrl/downstream.py
Normal file
193
src/libnrl/downstream.py
Normal file
@ -0,0 +1,193 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
from __future__ import print_function
|
||||
|
||||
import math
|
||||
import random
|
||||
import warnings
|
||||
|
||||
import networkx as nx
|
||||
import numpy as np
|
||||
from sklearn.metrics import (accuracy_score, auc, classification_report,
|
||||
f1_score, roc_auc_score, roc_curve)
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.multiclass import OneVsRestClassifier
|
||||
from sklearn.preprocessing import MultiLabelBinarizer
|
||||
|
||||
warnings.filterwarnings(
|
||||
action='ignore', category=UserWarning, module='sklearn')
|
||||
|
||||
'''
|
||||
#-----------------------------------------------------------------------------
|
||||
# by Chengbin Hou 2018
|
||||
# Email: Chengbin.Hou10@foxmail.com
|
||||
#-----------------------------------------------------------------------------
|
||||
'''
|
||||
|
||||
# node classification classifier
|
||||
|
||||
|
||||
class ncClassifier(object):
|
||||
|
||||
def __init__(self, vectors, clf):
|
||||
self.embeddings = vectors
|
||||
self.clf = TopKRanker(clf) # here clf is LR
|
||||
self.binarizer = MultiLabelBinarizer(sparse_output=True)
|
||||
|
||||
def split_train_evaluate(self, X, Y, train_precent, seed=0):
|
||||
state = np.random.get_state()
|
||||
training_size = int(train_precent * len(X))
|
||||
# np.random.seed(seed)
|
||||
shuffle_indices = np.random.permutation(np.arange(len(X)))
|
||||
X_train = [X[shuffle_indices[i]] for i in range(training_size)]
|
||||
Y_train = [Y[shuffle_indices[i]] for i in range(training_size)]
|
||||
X_test = [X[shuffle_indices[i]] for i in range(training_size, len(X))]
|
||||
Y_test = [Y[shuffle_indices[i]] for i in range(training_size, len(X))]
|
||||
|
||||
self.train(X_train, Y_train, Y)
|
||||
np.random.set_state(state)  # restore the RNG state saved above so the shuffle does not affect later randomness
|
||||
return self.evaluate(X_test, Y_test)
|
||||
|
||||
def train(self, X, Y, Y_all):
|
||||
# to support multi-label data, fit learns the mapping {original category: binarized vector}
|
||||
self.binarizer.fit(Y_all)
|
||||
X_train = [self.embeddings[x] for x in X]
|
||||
# since the binarizer was fitted on Y_all, we can simply transform here
|
||||
Y = self.binarizer.transform(Y)
|
||||
self.clf.fit(X_train, Y)
|
||||
|
||||
def predict(self, X, top_k_list):
|
||||
X_ = np.asarray([self.embeddings[x] for x in X])
|
||||
# see TopKRanker(OneVsRestClassifier)
|
||||
# the top k probs to be output...
|
||||
Y = self.clf.predict(X_, top_k_list=top_k_list)
|
||||
return Y
|
||||
|
||||
def evaluate(self, X, Y):
|
||||
# multi-label setting: each node may have a different number of labels
|
||||
top_k_list = [len(l) for l in Y]
|
||||
Y_ = self.predict(X, top_k_list) # pred val of X_test i.e. Y_pred
|
||||
Y = self.binarizer.transform(Y) # true val i.e. Y_test
|
||||
averages = ["micro", "macro", "samples", "weighted"]
|
||||
results = {}
|
||||
for average in averages:
|
||||
results[average] = f1_score(Y, Y_, average=average)
|
||||
# print('Results, using embeddings of dimensionality', len(self.embeddings[X[0]]))
|
||||
print(results)
|
||||
return results
|
||||
|
||||
|
||||
class TopKRanker(OneVsRestClassifier):  # original LR or SVM is for binary classification
|
||||
def predict(self, X, top_k_list): # re-define predict func of OneVsRestClassifier
|
||||
probs = np.asarray(super(TopKRanker, self).predict_proba(X))
|
||||
all_labels = []
|
||||
for i, k in enumerate(top_k_list):
|
||||
probs_ = probs[i, :]
|
||||
labels = self.classes_[
|
||||
probs_.argsort()[-k:]].tolist()  # take the k highest-probability labels
|
||||
probs_[:] = 0 # reset probs_ to all 0
|
||||
probs_[labels] = 1  # set the selected labels to 1
|
||||
all_labels.append(probs_)
|
||||
return np.asarray(all_labels)
|
||||
|
||||
|
||||
# link prediction binary classifier
|
||||
class lpClassifier(object):
|
||||
|
||||
def __init__(self, vectors):
|
||||
self.embeddings = vectors
|
||||
|
||||
# clf here is simply a similarity/distance metric
|
||||
def evaluate(self, X_test, Y_test, seed=0):
|
||||
state = np.random.get_state()
|
||||
# np.random.seed(seed)
|
||||
test_size = len(X_test)
|
||||
# shuffle_indices = np.random.permutation(np.arange(test_size))
|
||||
# X_test = [X_test[shuffle_indices[i]] for i in range(test_size)]
|
||||
# Y_test = [Y_test[shuffle_indices[i]] for i in range(test_size)]
|
||||
|
||||
Y_true = [int(i) for i in Y_test]
|
||||
Y_probs = []
|
||||
for i in range(test_size):
|
||||
start_node_emb = np.array(
|
||||
self.embeddings[X_test[i][0]]).reshape(-1, 1)
|
||||
end_node_emb = np.array(
|
||||
self.embeddings[X_test[i][1]]).reshape(-1, 1)
|
||||
# ranging from [-1, +1]
|
||||
score = cosine_similarity(start_node_emb, end_node_emb)
|
||||
# map the score to [0, 1] as a probability; alternatively, the raw score could be used directly
|
||||
Y_probs.append((score + 1) / 2.0)
|
||||
# sklearn's roc_auc_score yields the same result either way
|
||||
roc = roc_auc_score(y_true=Y_true, y_score=Y_probs)
|
||||
if roc < 0.5:
|
||||
roc = 1.0 - roc  # since link prediction is a binary classification task, predict the opposite if AUC < 0.5
|
||||
print("roc=", "{:.9f}".format(roc))
|
||||
# plt_roc(Y_true, Y_probs)  # uncomment to plot the ROC curve and return the AUC value
|
||||
|
||||
|
||||
def norm(a):
|
||||
sum = 0.0
|
||||
for i in range(len(a)):
|
||||
sum = sum + a[i] * a[i]
|
||||
return math.sqrt(sum)
|
||||
|
||||
|
||||
def cosine_similarity(a, b):
|
||||
sum = 0.0
|
||||
for i in range(len(a)):
|
||||
sum = sum + a[i] * b[i]
|
||||
# return sum/(norm(a) * norm(b))
|
||||
# add a tiny epsilon (1e-100, effectively 0) to avoid division by zero
|
||||
return sum / (norm(a) * norm(b) + 1e-100)
|
||||
|
||||
|
||||
'''
|
||||
#cosine_similarity realized by use...
|
||||
#or try sklearn....
|
||||
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity, cosine_distances, euclidean_distances # we may try diff metrics
|
||||
#ref http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.pairwise
|
||||
'''
|
||||
|
||||
|
||||
def lp_train_test_split(graph, ratio=0.8, neg_pos_link_ratio=1.0):
|
||||
# randomly split links/edges into training set and testing set
|
||||
# *** note: we do not assume every node must remain connected after removing links
|
||||
# *** hence, the resulting graph may contain a few isolated nodes --> a more realistic scenario
|
||||
# *** e.g. a user who has just signed up on a website has no links to others
|
||||
|
||||
# graph: OpenANE graph data structure
|
||||
# ratio: percentage of links kept for training; range [0, 1]
|
||||
# neg_pos_link_ratio: 1.0 means neg-links/pos-links = 1.0, i.e. the balanced case; range [0, +inf)
|
||||
g = graph
|
||||
print("links for training {:.2f}%, and links for testing {:.2f}%, neg_pos_link_ratio is {:.2f}".format(
|
||||
ratio * 100, (1 - ratio) * 100, neg_pos_link_ratio))
|
||||
test_pos_sample = []
|
||||
test_neg_sample = []
|
||||
train_size = int(ratio * len(g.G.edges))
|
||||
test_size = len(g.G.edges) - train_size
|
||||
|
||||
# random.seed(2018) #generate testing set that contains both pos and neg samples
|
||||
test_pos_sample = random.sample(g.G.edges(), int(test_size))
|
||||
# test_neg_sample = random.sample(list(nx.classes.function.non_edges(g.G)), int(test_size * neg_pos_link_ratio)) #using nx built-in func, not efficient, to do...
|
||||
# more efficient way:
|
||||
test_neg_sample = []
|
||||
num_neg_sample = int(test_size * neg_pos_link_ratio)
|
||||
num = 0
|
||||
while num < num_neg_sample:
|
||||
pair_nodes = np.random.choice(g.look_back_list, size=2, replace=False)
|
||||
if pair_nodes not in g.G.edges():
|
||||
num += 1
|
||||
test_neg_sample.append(list(pair_nodes))
|
||||
|
||||
test_edge_pair = test_pos_sample + test_neg_sample
|
||||
test_edge_label = list(np.ones(len(test_pos_sample))) + \
|
||||
list(np.zeros(len(test_neg_sample)))
|
||||
|
||||
print('before removing, the # of links: ', g.numDiEdges(),
|
||||
'; the # of single nodes: ', g.numSingleNodes())
|
||||
# training set should NOT contain testing set i.e. delete testing pos samples
|
||||
g.G.remove_edges_from(test_pos_sample)
|
||||
print('after removing, the # of links: ', g.numDiEdges(),
|
||||
'; the # of single nodes: ', g.numSingleNodes())
|
||||
print("# training links {0}; # positive testing links {1}; # negative testing links {2},".format(
|
||||
g.numDiEdges(), len(test_pos_sample), len(test_neg_sample)))
|
||||
return g.G, test_edge_pair, test_edge_label
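
A minimal, hypothetical sketch of the node-classification path above, assuming node embeddings are already available as a dict {node_id: vector} (e.g. loaded elsewhere in the library) and labels come as parallel lists of node ids and label lists:

    # sketch only: toy data, not from the repository
    from sklearn.linear_model import LogisticRegression

    vectors = {'n1': [0.1, 0.2], 'n2': [0.9, 0.8], 'n3': [0.2, 0.1], 'n4': [0.8, 0.9]}
    X = ['n1', 'n2', 'n3', 'n4']            # node ids
    Y = [['a'], ['b'], ['a'], ['b']]        # one label list per node (multi-label allowed)
    clf = ncClassifier(vectors=vectors, clf=LogisticRegression())
    clf.split_train_evaluate(X, Y, 0.5)     # prints micro/macro/samples/weighted F1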
|
2
src/libnrl/gcn/__init__.py
Normal file
2
src/libnrl/gcn/__init__.py
Normal file
@ -0,0 +1,2 @@
|
||||
from __future__ import print_function
|
||||
from __future__ import division
|
168
src/libnrl/gcn/gcnAPI.py
Normal file
168
src/libnrl/gcn/gcnAPI.py
Normal file
@ -0,0 +1,168 @@
|
||||
import numpy as np
|
||||
from .utils import *
|
||||
from . import models
|
||||
import time
|
||||
import scipy.sparse as sp
|
||||
import tensorflow as tf
|
||||
|
||||
class GCN(object):
|
||||
|
||||
def __init__(self, graph, learning_rate=0.01, epochs=200,
|
||||
hidden1=16, dropout=0.5, weight_decay=5e-4, early_stopping=10,
|
||||
max_degree=3, clf_ratio=0.1):
|
||||
"""
|
||||
learning_rate: Initial learning rate
|
||||
epochs: Number of epochs to train
|
||||
hidden1: Number of units in hidden layer 1
|
||||
dropout: Dropout rate (1 - keep probability)
|
||||
weight_decay: Weight for L2 loss on embedding matrix
|
||||
early_stopping: Tolerance for early stopping (# of epochs)
|
||||
max_degree: Maximum Chebyshev polynomial degree
|
||||
"""
|
||||
self.graph = graph
|
||||
self.clf_ratio = clf_ratio
|
||||
self.learning_rate = learning_rate
|
||||
self.epochs = epochs
|
||||
self.hidden1 = hidden1
|
||||
self.dropout = dropout
|
||||
self.weight_decay = weight_decay
|
||||
self.early_stopping = early_stopping
|
||||
self.max_degree = max_degree
|
||||
|
||||
self.preprocess_data()
|
||||
self.build_placeholders()
|
||||
# Create model
|
||||
self.model = models.GCN(self.placeholders, input_dim=self.features[2][1], hidden1=self.hidden1, weight_decay=self.weight_decay, logging=True)
|
||||
# Initialize session
|
||||
self.sess = tf.Session()
|
||||
# Init variables
|
||||
self.sess.run(tf.global_variables_initializer())
|
||||
|
||||
cost_val = []
|
||||
|
||||
# Train model
|
||||
for epoch in range(self.epochs):
|
||||
|
||||
t = time.time()
|
||||
# Construct feed dictionary
|
||||
feed_dict = self.construct_feed_dict(self.train_mask)
|
||||
feed_dict.update({self.placeholders['dropout']: self.dropout})
|
||||
|
||||
# Training step
|
||||
outs = self.sess.run([self.model.opt_op, self.model.loss, self.model.accuracy], feed_dict=feed_dict)
|
||||
|
||||
# Validation
|
||||
cost, acc, duration = self.evaluate(self.val_mask)
|
||||
cost_val.append(cost)
|
||||
|
||||
# Print results
|
||||
print("Epoch:", '%04d' % (epoch + 1), "train_loss=", "{:.5f}".format(outs[1]),
|
||||
"train_acc=", "{:.5f}".format(outs[2]), "val_loss=", "{:.5f}".format(cost),
|
||||
"val_acc=", "{:.5f}".format(acc), "time=", "{:.5f}".format(time.time() - t))
|
||||
''' #early stopping seems problematic here, disabled for now; to do...
|
||||
if epoch > self.early_stopping and cost_val[-1] > np.mean(cost_val[-(self.early_stopping+1):-1]):
|
||||
print("Early stopping...")
|
||||
break
|
||||
'''
|
||||
print("Optimization Finished!")
|
||||
|
||||
# Testing
|
||||
test_cost, test_acc, test_duration = self.evaluate(self.test_mask)
|
||||
print("Test set results:", "cost=", "{:.5f}".format(test_cost),
|
||||
"accuracy=", "{:.5f}".format(test_acc), "time=", "{:.5f}".format(test_duration))
|
||||
|
||||
|
||||
|
||||
# Define model evaluation function
|
||||
def evaluate(self, mask):
|
||||
t_test = time.time()
|
||||
feed_dict_val = self.construct_feed_dict(mask)
|
||||
outs_val = self.sess.run([self.model.loss, self.model.accuracy], feed_dict=feed_dict_val)
|
||||
return outs_val[0], outs_val[1], (time.time() - t_test)
|
||||
|
||||
def build_placeholders(self):
|
||||
num_supports = 1
|
||||
self.placeholders = {
|
||||
'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],
|
||||
'features': tf.sparse_placeholder(tf.float32, shape=tf.constant(self.features[2], dtype=tf.int64)),
|
||||
'labels': tf.placeholder(tf.float32, shape=(None, self.labels.shape[1])),
|
||||
'labels_mask': tf.placeholder(tf.int32),
|
||||
'dropout': tf.placeholder_with_default(0., shape=()),
|
||||
# helper variable for sparse dropout
|
||||
'num_features_nonzero': tf.placeholder(tf.int32)
|
||||
}
|
||||
|
||||
def build_label(self):
|
||||
g = self.graph.G
|
||||
look_up = self.graph.look_up_dict
|
||||
labels = []
|
||||
label_dict = {}
|
||||
label_id = 0
|
||||
for node in g.nodes():
|
||||
labels.append((node, g.nodes[node]['label']))
|
||||
for l in g.nodes[node]['label']:
|
||||
if l not in label_dict:
|
||||
label_dict[l] = label_id
|
||||
label_id += 1
|
||||
self.labels = np.zeros((len(labels), label_id))
|
||||
self.label_dict = label_dict
|
||||
for node, l in labels:
|
||||
node_id = look_up[node]
|
||||
for ll in l:
|
||||
l_id = label_dict[ll]
|
||||
self.labels[node_id][l_id] = 1
|
||||
|
||||
def build_train_val_test(self):
|
||||
"""
|
||||
build train_mask test_mask val_mask
|
||||
"""
|
||||
train_precent = self.clf_ratio
|
||||
training_size = int(train_precent * self.graph.G.number_of_nodes())
|
||||
state = np.random.get_state()
|
||||
np.random.seed(0)
|
||||
shuffle_indices = np.random.permutation(np.arange(self.graph.G.number_of_nodes()))
|
||||
np.random.set_state(state)
|
||||
|
||||
look_up = self.graph.look_up_dict
|
||||
g = self.graph.G
|
||||
def sample_mask(begin, end):
|
||||
mask = np.zeros(g.number_of_nodes())
|
||||
for i in range(begin, end):
|
||||
mask[shuffle_indices[i]] = 1
|
||||
return mask
|
||||
|
||||
# nodes_num = len(self.labels)
|
||||
# self.train_mask = sample_mask('train', nodes_num)
|
||||
# self.val_mask = sample_mask('valid', nodes_num)
|
||||
# self.test_mask = sample_mask('test', nodes_num)
|
||||
self.train_mask = sample_mask(0, training_size-100)
|
||||
self.val_mask = sample_mask(training_size-100, training_size)
|
||||
self.test_mask = sample_mask(training_size, g.number_of_nodes())
|
||||
|
||||
def preprocess_data(self):
|
||||
"""
|
||||
adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask
|
||||
y_train, y_val, y_test can merge to y
|
||||
"""
|
||||
g = self.graph.G
|
||||
look_back = self.graph.look_back_list
|
||||
self.features = np.vstack([g.nodes[look_back[i]]['feature']
|
||||
for i in range(g.number_of_nodes())])
|
||||
self.features = preprocess_features(self.features)
|
||||
self.build_label()
|
||||
self.build_train_val_test()
|
||||
adj = nx.adjacency_matrix(g)  # scipy sparse adjacency matrix; node order follows g.nodes()
|
||||
self.support = [preprocess_adj(adj)]
|
||||
|
||||
|
||||
def construct_feed_dict(self, labels_mask):
|
||||
"""Construct feed dictionary."""
|
||||
feed_dict = dict()
|
||||
feed_dict.update({self.placeholders['labels']: self.labels})
|
||||
feed_dict.update({self.placeholders['labels_mask']: labels_mask})
|
||||
feed_dict.update({self.placeholders['features']: self.features})
|
||||
feed_dict.update({self.placeholders['support'][i]: self.support[i] for i in range(len(self.support))})
|
||||
feed_dict.update({self.placeholders['num_features_nonzero']: self.features[1].shape})
|
||||
return feed_dict
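
A hypothetical sketch of driving this wrapper end to end; it assumes the graph's nodes already carry 'feature' and 'label' attributes (attaching them is handled elsewhere in the library), that src/ is on the Python path, and the file path is made up:

    # sketch only: semi-supervised node classification with the GCN wrapper above
    from libnrl.graph import Graph
    from libnrl.gcn.gcnAPI import GCN

    g = Graph()
    g.read_adjlist('data/cora/cora_adjlist.txt')   # hypothetical path
    # assumption: g.G.nodes[n]['feature'] and g.G.nodes[n]['label'] are populated at this point
    GCN(graph=g, epochs=200, clf_ratio=0.5)        # the constructor trains and prints test accuracy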
|
||||
|
||||
|
27
src/libnrl/gcn/inits.py
Normal file
27
src/libnrl/gcn/inits.py
Normal file
@ -0,0 +1,27 @@
|
||||
import tensorflow as tf
|
||||
import numpy as np
|
||||
|
||||
|
||||
def uniform(shape, scale=0.05, name=None):
|
||||
"""Uniform init."""
|
||||
initial = tf.random_uniform(shape, minval=-scale, maxval=scale, dtype=tf.float32)
|
||||
return tf.Variable(initial, name=name)
|
||||
|
||||
|
||||
def glorot(shape, name=None):
|
||||
"""Glorot & Bengio (AISTATS 2010) init."""
|
||||
init_range = np.sqrt(6.0/(shape[0]+shape[1]))
|
||||
initial = tf.random_uniform(shape, minval=-init_range, maxval=init_range, dtype=tf.float32)
|
||||
return tf.Variable(initial, name=name)
|
||||
|
||||
|
||||
def zeros(shape, name=None):
|
||||
"""All zeros."""
|
||||
initial = tf.zeros(shape, dtype=tf.float32)
|
||||
return tf.Variable(initial, name=name)
|
||||
|
||||
|
||||
def ones(shape, name=None):
|
||||
"""All ones."""
|
||||
initial = tf.ones(shape, dtype=tf.float32)
|
||||
return tf.Variable(initial, name=name)
|
188
src/libnrl/gcn/layers.py
Normal file
188
src/libnrl/gcn/layers.py
Normal file
@ -0,0 +1,188 @@
|
||||
from .inits import *
|
||||
import tensorflow as tf
|
||||
|
||||
flags = tf.app.flags
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
# global unique layer ID dictionary for layer name assignment
|
||||
_LAYER_UIDS = {}
|
||||
|
||||
|
||||
def get_layer_uid(layer_name=''):
|
||||
"""Helper function, assigns unique layer IDs."""
|
||||
if layer_name not in _LAYER_UIDS:
|
||||
_LAYER_UIDS[layer_name] = 1
|
||||
return 1
|
||||
else:
|
||||
_LAYER_UIDS[layer_name] += 1
|
||||
return _LAYER_UIDS[layer_name]
|
||||
|
||||
|
||||
def sparse_dropout(x, keep_prob, noise_shape):
|
||||
"""Dropout for sparse tensors."""
|
||||
random_tensor = keep_prob
|
||||
random_tensor += tf.random_uniform(noise_shape)
|
||||
dropout_mask = tf.cast(tf.floor(random_tensor), dtype=tf.bool)
|
||||
pre_out = tf.sparse_retain(x, dropout_mask)
|
||||
return pre_out * (1./keep_prob)
|
||||
|
||||
|
||||
def dot(x, y, sparse=False):
|
||||
"""Wrapper for tf.matmul (sparse vs dense)."""
|
||||
if sparse:
|
||||
res = tf.sparse_tensor_dense_matmul(x, y)
|
||||
else:
|
||||
res = tf.matmul(x, y)
|
||||
return res
|
||||
|
||||
|
||||
class Layer(object):
|
||||
"""Base layer class. Defines basic API for all layer objects.
|
||||
Implementation inspired by keras (http://keras.io).
|
||||
|
||||
# Properties
|
||||
name: String, defines the variable scope of the layer.
|
||||
logging: Boolean, switches Tensorflow histogram logging on/off
|
||||
|
||||
# Methods
|
||||
_call(inputs): Defines computation graph of layer
|
||||
(i.e. takes input, returns output)
|
||||
__call__(inputs): Wrapper for _call()
|
||||
_log_vars(): Log all variables
|
||||
"""
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
allowed_kwargs = {'name', 'logging'}
|
||||
for kwarg in kwargs.keys():
|
||||
assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg
|
||||
name = kwargs.get('name')
|
||||
if not name:
|
||||
layer = self.__class__.__name__.lower()
|
||||
name = layer + '_' + str(get_layer_uid(layer))
|
||||
self.name = name
|
||||
self.vars = {}
|
||||
logging = kwargs.get('logging', False)
|
||||
self.logging = logging
|
||||
self.sparse_inputs = False
|
||||
|
||||
def _call(self, inputs):
|
||||
return inputs
|
||||
|
||||
def __call__(self, inputs):
|
||||
with tf.name_scope(self.name):
|
||||
if self.logging and not self.sparse_inputs:
|
||||
tf.summary.histogram(self.name + '/inputs', inputs)
|
||||
outputs = self._call(inputs)
|
||||
if self.logging:
|
||||
tf.summary.histogram(self.name + '/outputs', outputs)
|
||||
return outputs
|
||||
|
||||
def _log_vars(self):
|
||||
for var in self.vars:
|
||||
tf.summary.histogram(self.name + '/vars/' + var, self.vars[var])
|
||||
|
||||
|
||||
class Dense(Layer):
|
||||
"""Dense layer."""
|
||||
def __init__(self, input_dim, output_dim, placeholders, dropout=0., sparse_inputs=False,
|
||||
act=tf.nn.relu, bias=False, featureless=False, **kwargs):
|
||||
super(Dense, self).__init__(**kwargs)
|
||||
|
||||
if dropout:
|
||||
self.dropout = placeholders['dropout']
|
||||
else:
|
||||
self.dropout = 0.
|
||||
|
||||
self.act = act
|
||||
self.sparse_inputs = sparse_inputs
|
||||
self.featureless = featureless
|
||||
self.bias = bias
|
||||
|
||||
# helper variable for sparse dropout
|
||||
self.num_features_nonzero = placeholders['num_features_nonzero']
|
||||
|
||||
with tf.variable_scope(self.name + '_vars'):
|
||||
self.vars['weights'] = glorot([input_dim, output_dim],
|
||||
name='weights')
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
def _call(self, inputs):
|
||||
x = inputs
|
||||
|
||||
# dropout
|
||||
if self.sparse_inputs:
|
||||
x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
|
||||
else:
|
||||
x = tf.nn.dropout(x, 1-self.dropout)
|
||||
|
||||
# transform
|
||||
output = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
||||
|
||||
|
||||
class GraphConvolution(Layer):
|
||||
"""Graph convolution layer."""
|
||||
def __init__(self, input_dim, output_dim, placeholders, dropout=0.,
|
||||
sparse_inputs=False, act=tf.nn.relu, bias=False,
|
||||
featureless=False, **kwargs):
|
||||
super(GraphConvolution, self).__init__(**kwargs)
|
||||
|
||||
if dropout:
|
||||
self.dropout = placeholders['dropout']
|
||||
else:
|
||||
self.dropout = 0.
|
||||
|
||||
self.act = act
|
||||
self.support = placeholders['support']
|
||||
self.sparse_inputs = sparse_inputs
|
||||
self.featureless = featureless
|
||||
self.bias = bias
|
||||
|
||||
# helper variable for sparse dropout
|
||||
self.num_features_nonzero = placeholders['num_features_nonzero']
|
||||
|
||||
with tf.variable_scope(self.name + '_vars'):
|
||||
for i in range(len(self.support)):
|
||||
self.vars['weights_' + str(i)] = glorot([input_dim, output_dim],
|
||||
name='weights_' + str(i))
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
def _call(self, inputs):
|
||||
x = inputs
|
||||
|
||||
# dropout
|
||||
if self.sparse_inputs:
|
||||
x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
|
||||
else:
|
||||
x = tf.nn.dropout(x, 1-self.dropout)
|
||||
|
||||
# convolve
|
||||
supports = list()
|
||||
for i in range(len(self.support)):
|
||||
if not self.featureless:
|
||||
pre_sup = dot(x, self.vars['weights_' + str(i)],
|
||||
sparse=self.sparse_inputs)
|
||||
else:
|
||||
pre_sup = self.vars['weights_' + str(i)]
|
||||
support = dot(self.support[i], pre_sup, sparse=True)
|
||||
supports.append(support)
|
||||
output = tf.add_n(supports)
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
20
src/libnrl/gcn/metrics.py
Normal file
20
src/libnrl/gcn/metrics.py
Normal file
@ -0,0 +1,20 @@
|
||||
import tensorflow as tf
|
||||
|
||||
|
||||
def masked_softmax_cross_entropy(preds, labels, mask):
|
||||
"""Softmax cross-entropy loss with masking."""
|
||||
loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)
|
||||
mask = tf.cast(mask, dtype=tf.float32)
|
||||
mask /= tf.reduce_mean(mask)
|
||||
loss *= mask
|
||||
return tf.reduce_mean(loss)
|
||||
|
||||
|
||||
def masked_accuracy(preds, labels, mask):
|
||||
"""Accuracy with masking."""
|
||||
correct_prediction = tf.equal(tf.argmax(preds, 1), tf.argmax(labels, 1))
|
||||
accuracy_all = tf.cast(correct_prediction, tf.float32)
|
||||
mask = tf.cast(mask, dtype=tf.float32)
|
||||
mask /= tf.reduce_mean(mask)
|
||||
accuracy_all *= mask
|
||||
return tf.reduce_mean(accuracy_all)
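
The masking trick above works because dividing the 0/1 mask by its mean rescales it so that averaging over all positions equals averaging over only the masked positions. A small numpy sketch (illustration only, not part of the library):

    import numpy as np
    loss = np.array([0.2, 0.7, 0.1, 0.4])
    mask = np.array([1.0, 0.0, 1.0, 1.0])
    scaled = loss * (mask / mask.mean())
    assert np.isclose(scaled.mean(), loss[mask == 1.0].mean())  # same as averaging the masked entries only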
|
179
src/libnrl/gcn/models.py
Normal file
179
src/libnrl/gcn/models.py
Normal file
@ -0,0 +1,179 @@
|
||||
from .layers import *
|
||||
from .metrics import *
|
||||
|
||||
flags = tf.app.flags
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
|
||||
class Model(object):
|
||||
def __init__(self, **kwargs):
|
||||
allowed_kwargs = {'name', 'logging'}
|
||||
for kwarg in kwargs.keys():
|
||||
assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg
|
||||
name = kwargs.get('name')
|
||||
if not name:
|
||||
name = self.__class__.__name__.lower()
|
||||
self.name = name
|
||||
|
||||
logging = kwargs.get('logging', False)
|
||||
self.logging = logging
|
||||
|
||||
self.vars = {}
|
||||
self.placeholders = {}
|
||||
|
||||
self.layers = []
|
||||
self.activations = []
|
||||
|
||||
self.inputs = None
|
||||
self.outputs = None
|
||||
|
||||
self.loss = 0
|
||||
self.accuracy = 0
|
||||
self.optimizer = None
|
||||
self.opt_op = None
|
||||
|
||||
def _build(self):
|
||||
raise NotImplementedError
|
||||
|
||||
def build(self):
|
||||
""" Wrapper for _build() """
|
||||
with tf.variable_scope(self.name):
|
||||
self._build()
|
||||
|
||||
# Build sequential layer model
|
||||
self.activations.append(self.inputs)
|
||||
for layer in self.layers:
|
||||
hidden = layer(self.activations[-1])
|
||||
self.activations.append(hidden)
|
||||
self.outputs = self.activations[-1]
|
||||
|
||||
# Store model variables for easy access
|
||||
variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
|
||||
self.vars = {var.name: var for var in variables}
|
||||
|
||||
# Build metrics
|
||||
self._loss()
|
||||
self._accuracy()
|
||||
|
||||
self.opt_op = self.optimizer.minimize(self.loss)
|
||||
|
||||
def predict(self):
|
||||
pass
|
||||
|
||||
def _loss(self):
|
||||
raise NotImplementedError
|
||||
|
||||
def _accuracy(self):
|
||||
raise NotImplementedError
|
||||
|
||||
def save(self, sess=None):
|
||||
if not sess:
|
||||
raise AttributeError("TensorFlow session not provided.")
|
||||
saver = tf.train.Saver(self.vars)
|
||||
save_path = saver.save(sess, "tmp/%s.ckpt" % self.name)
|
||||
print("Model saved in file: %s" % save_path)
|
||||
|
||||
def load(self, sess=None):
|
||||
if not sess:
|
||||
raise AttributeError("TensorFlow session not provided.")
|
||||
saver = tf.train.Saver(self.vars)
|
||||
save_path = "tmp/%s.ckpt" % self.name
|
||||
saver.restore(sess, save_path)
|
||||
print("Model restored from file: %s" % save_path)
|
||||
|
||||
|
||||
class MLP(Model):
|
||||
def __init__(self, placeholders, input_dim, **kwargs):
|
||||
super(MLP, self).__init__(**kwargs)
|
||||
|
||||
self.inputs = placeholders['features']
|
||||
self.input_dim = input_dim
|
||||
# self.input_dim = self.inputs.get_shape().as_list()[1] # To be supported in future Tensorflow versions
|
||||
self.output_dim = placeholders['labels'].get_shape().as_list()[1]
|
||||
self.placeholders = placeholders
|
||||
|
||||
self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
|
||||
|
||||
self.build()
|
||||
|
||||
def _loss(self):
|
||||
# Weight decay loss
|
||||
for var in self.layers[0].vars.values():
|
||||
self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)
|
||||
|
||||
# Cross entropy error
|
||||
self.loss += masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],
|
||||
self.placeholders['labels_mask'])
|
||||
|
||||
def _accuracy(self):
|
||||
self.accuracy = masked_accuracy(self.outputs, self.placeholders['labels'],
|
||||
self.placeholders['labels_mask'])
|
||||
|
||||
def _build(self):
|
||||
self.layers.append(Dense(input_dim=self.input_dim,
|
||||
output_dim=FLAGS.hidden1,
|
||||
placeholders=self.placeholders,
|
||||
act=tf.nn.relu,
|
||||
dropout=True,
|
||||
sparse_inputs=True,
|
||||
logging=self.logging))
|
||||
|
||||
self.layers.append(Dense(input_dim=FLAGS.hidden1,
|
||||
output_dim=self.output_dim,
|
||||
placeholders=self.placeholders,
|
||||
act=lambda x: x,
|
||||
dropout=True,
|
||||
logging=self.logging))
|
||||
|
||||
def predict(self):
|
||||
return tf.nn.softmax(self.outputs)
|
||||
|
||||
|
||||
class GCN(Model):
|
||||
def __init__(self, placeholders, input_dim, hidden1, weight_decay, **kwargs):
|
||||
super(GCN, self).__init__(**kwargs)
|
||||
|
||||
self.inputs = placeholders['features']
|
||||
self.hidden1 = hidden1
|
||||
self.weight_decay = weight_decay
|
||||
self.input_dim = input_dim
|
||||
# self.input_dim = self.inputs.get_shape().as_list()[1] # To be supported in future Tensorflow versions
|
||||
self.output_dim = placeholders['labels'].get_shape().as_list()[1]
|
||||
self.placeholders = placeholders
|
||||
|
||||
self.optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
|
||||
|
||||
self.build()
|
||||
|
||||
def _loss(self):
|
||||
# Weight decay loss
|
||||
for var in self.layers[0].vars.values():
|
||||
self.loss += self.weight_decay * tf.nn.l2_loss(var)
|
||||
|
||||
# Cross entropy error
|
||||
self.loss += masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],
|
||||
self.placeholders['labels_mask'])
|
||||
|
||||
def _accuracy(self):
|
||||
self.accuracy = masked_accuracy(self.outputs, self.placeholders['labels'],
|
||||
self.placeholders['labels_mask'])
|
||||
|
||||
def _build(self):
|
||||
|
||||
self.layers.append(GraphConvolution(input_dim=self.input_dim,
|
||||
output_dim=self.hidden1,
|
||||
placeholders=self.placeholders,
|
||||
act=tf.nn.relu,
|
||||
dropout=True,
|
||||
sparse_inputs=True,
|
||||
logging=self.logging))
|
||||
|
||||
self.layers.append(GraphConvolution(input_dim=self.hidden1,
|
||||
output_dim=self.output_dim,
|
||||
placeholders=self.placeholders,
|
||||
act=lambda x: x,
|
||||
dropout=True,
|
||||
logging=self.logging))
|
||||
|
||||
def predict(self):
|
||||
return tf.nn.softmax(self.outputs)
|
107
src/libnrl/gcn/train.py
Normal file
107
src/libnrl/gcn/train.py
Normal file
@ -0,0 +1,107 @@
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import time
|
||||
import tensorflow as tf
|
||||
|
||||
from gcn.utils import *
|
||||
from gcn.models import GCN, MLP
|
||||
|
||||
# Set random seed
|
||||
seed = 123
|
||||
np.random.seed(seed)
|
||||
tf.set_random_seed(seed)
|
||||
|
||||
# Settings
|
||||
flags = tf.app.flags
|
||||
FLAGS = flags.FLAGS
|
||||
flags.DEFINE_string('dataset', 'cora', 'Dataset string.') # 'cora', 'citeseer', 'pubmed'
|
||||
flags.DEFINE_string('model', 'gcn', 'Model string.') # 'gcn', 'gcn_cheby', 'dense'
|
||||
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
|
||||
flags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')
|
||||
flags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')
|
||||
flags.DEFINE_float('dropout', 0.5, 'Dropout rate (1 - keep probability).')
|
||||
flags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')
|
||||
flags.DEFINE_integer('early_stopping', 10, 'Tolerance for early stopping (# of epochs).')
|
||||
flags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')
|
||||
|
||||
# Load data
|
||||
adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)
|
||||
|
||||
# Some preprocessing
|
||||
features = preprocess_features(features)
|
||||
if FLAGS.model == 'gcn':
|
||||
support = [preprocess_adj(adj)]
|
||||
num_supports = 1
|
||||
model_func = GCN
|
||||
elif FLAGS.model == 'gcn_cheby':
|
||||
support = chebyshev_polynomials(adj, FLAGS.max_degree)
|
||||
num_supports = 1 + FLAGS.max_degree
|
||||
model_func = GCN
|
||||
elif FLAGS.model == 'dense':
|
||||
support = [preprocess_adj(adj)] # Not used
|
||||
num_supports = 1
|
||||
model_func = MLP
|
||||
else:
|
||||
raise ValueError('Invalid argument for model: ' + str(FLAGS.model))
|
||||
|
||||
# Define placeholders
|
||||
placeholders = {
|
||||
'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],
|
||||
'features': tf.sparse_placeholder(tf.float32, shape=tf.constant(features[2], dtype=tf.int64)),
|
||||
'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),
|
||||
'labels_mask': tf.placeholder(tf.int32),
|
||||
'dropout': tf.placeholder_with_default(0., shape=()),
|
||||
'num_features_nonzero': tf.placeholder(tf.int32) # helper variable for sparse dropout
|
||||
}
|
||||
|
||||
# Create model
|
||||
model = model_func(placeholders, input_dim=features[2][1], logging=True)
|
||||
|
||||
# Initialize session
|
||||
sess = tf.Session()
|
||||
|
||||
|
||||
# Define model evaluation function
|
||||
def evaluate(features, support, labels, mask, placeholders):
|
||||
t_test = time.time()
|
||||
feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)
|
||||
outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)
|
||||
return outs_val[0], outs_val[1], (time.time() - t_test)
|
||||
|
||||
|
||||
# Init variables
|
||||
sess.run(tf.global_variables_initializer())
|
||||
|
||||
cost_val = []
|
||||
|
||||
# Train model
|
||||
for epoch in range(FLAGS.epochs):
|
||||
|
||||
t = time.time()
|
||||
# Construct feed dictionary
|
||||
feed_dict = construct_feed_dict(features, support, y_train, train_mask, placeholders)
|
||||
feed_dict.update({placeholders['dropout']: FLAGS.dropout})
|
||||
|
||||
# Training step
|
||||
outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)
|
||||
|
||||
# Validation
|
||||
cost, acc, duration = evaluate(features, support, y_val, val_mask, placeholders)
|
||||
cost_val.append(cost)
|
||||
|
||||
# Print results
|
||||
print("Epoch:", '%04d' % (epoch + 1), "train_loss=", "{:.5f}".format(outs[1]),
|
||||
"train_acc=", "{:.5f}".format(outs[2]), "val_loss=", "{:.5f}".format(cost),
|
||||
"val_acc=", "{:.5f}".format(acc), "time=", "{:.5f}".format(time.time() - t))
|
||||
|
||||
if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping+1):-1]):
|
||||
print("Early stopping...")
|
||||
break
|
||||
|
||||
print("Optimization Finished!")
|
||||
|
||||
# Testing
|
||||
test_cost, test_acc, test_duration = evaluate(features, support, y_test, test_mask, placeholders)
|
||||
print("Test set results:", "cost=", "{:.5f}".format(test_cost),
|
||||
"accuracy=", "{:.5f}".format(test_acc), "time=", "{:.5f}".format(test_duration))
|
152
src/libnrl/gcn/utils.py
Normal file
152
src/libnrl/gcn/utils.py
Normal file
@ -0,0 +1,152 @@
|
||||
import numpy as np
|
||||
import pickle as pkl
|
||||
import networkx as nx
|
||||
import scipy.sparse as sp
|
||||
from scipy.sparse.linalg.eigen.arpack import eigsh
|
||||
import sys
|
||||
|
||||
|
||||
def parse_index_file(filename):
|
||||
"""Parse index file."""
|
||||
index = []
|
||||
for line in open(filename):
|
||||
index.append(int(line.strip()))
|
||||
return index
|
||||
|
||||
|
||||
def sample_mask(idx, l):
|
||||
"""Create mask."""
|
||||
mask = np.zeros(l)
|
||||
mask[idx] = 1
|
||||
return np.array(mask, dtype=np.bool)
|
||||
|
||||
|
||||
def load_data(dataset_str):
|
||||
"""Load data."""
|
||||
names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
|
||||
objects = []
|
||||
for i in range(len(names)):
|
||||
with open("data/ind.{}.{}".format(dataset_str, names[i]), 'rb') as f:
|
||||
if sys.version_info > (3, 0):
|
||||
objects.append(pkl.load(f, encoding='latin1'))
|
||||
else:
|
||||
objects.append(pkl.load(f))
|
||||
|
||||
x, y, tx, ty, allx, ally, graph = tuple(objects)
|
||||
test_idx_reorder = parse_index_file("data/ind.{}.test.index".format(dataset_str))
|
||||
test_idx_range = np.sort(test_idx_reorder)
|
||||
|
||||
if dataset_str == 'citeseer':
|
||||
# Fix citeseer dataset (there are some isolated nodes in the graph)
|
||||
# Find isolated nodes, add them as zero-vecs into the right position
|
||||
test_idx_range_full = range(min(test_idx_reorder), max(test_idx_reorder)+1)
|
||||
tx_extended = sp.lil_matrix((len(test_idx_range_full), x.shape[1]))
|
||||
tx_extended[test_idx_range-min(test_idx_range), :] = tx
|
||||
tx = tx_extended
|
||||
ty_extended = np.zeros((len(test_idx_range_full), y.shape[1]))
|
||||
ty_extended[test_idx_range-min(test_idx_range), :] = ty
|
||||
ty = ty_extended
|
||||
|
||||
features = sp.vstack((allx, tx)).tolil()
|
||||
features[test_idx_reorder, :] = features[test_idx_range, :]
|
||||
adj = nx.adjacency_matrix(nx.from_dict_of_lists(graph))
|
||||
|
||||
labels = np.vstack((ally, ty))
|
||||
labels[test_idx_reorder, :] = labels[test_idx_range, :]
|
||||
|
||||
idx_test = test_idx_range.tolist()
|
||||
idx_train = range(len(y))
|
||||
idx_val = range(len(y), len(y)+500)
|
||||
|
||||
train_mask = sample_mask(idx_train, labels.shape[0])
|
||||
val_mask = sample_mask(idx_val, labels.shape[0])
|
||||
test_mask = sample_mask(idx_test, labels.shape[0])
|
||||
|
||||
y_train = np.zeros(labels.shape)
|
||||
y_val = np.zeros(labels.shape)
|
||||
y_test = np.zeros(labels.shape)
|
||||
y_train[train_mask, :] = labels[train_mask, :]
|
||||
y_val[val_mask, :] = labels[val_mask, :]
|
||||
y_test[test_mask, :] = labels[test_mask, :]
|
||||
|
||||
return adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask
|
||||
|
||||
|
||||
def sparse_to_tuple(sparse_mx):
|
||||
"""Convert sparse matrix to tuple representation."""
|
||||
def to_tuple(mx):
|
||||
if not sp.isspmatrix_coo(mx):
|
||||
mx = mx.tocoo()
|
||||
coords = np.vstack((mx.row, mx.col)).transpose()
|
||||
values = mx.data
|
||||
shape = mx.shape
|
||||
return coords, values, shape
|
||||
|
||||
if isinstance(sparse_mx, list):
|
||||
for i in range(len(sparse_mx)):
|
||||
sparse_mx[i] = to_tuple(sparse_mx[i])
|
||||
else:
|
||||
sparse_mx = to_tuple(sparse_mx)
|
||||
|
||||
return sparse_mx
|
||||
|
||||
|
||||
def preprocess_features(features):
|
||||
"""Row-normalize feature matrix and convert to tuple representation"""
|
||||
rowsum = np.array(features.sum(1))
|
||||
r_inv = np.power(rowsum, -1).flatten()
|
||||
r_inv[np.isinf(r_inv)] = 0.
|
||||
r_mat_inv = sp.diags(r_inv)
|
||||
features = sp.coo_matrix(features)
|
||||
features = r_mat_inv.dot(features)
|
||||
return sparse_to_tuple(features)
|
||||
|
||||
|
||||
def normalize_adj(adj):
|
||||
"""Symmetrically normalize adjacency matrix."""
|
||||
adj = sp.coo_matrix(adj)
|
||||
rowsum = np.array(adj.sum(1))
|
||||
d_inv_sqrt = np.power(rowsum, -0.5).flatten()
|
||||
d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
|
||||
d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
|
||||
return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()
|
||||
|
||||
|
||||
def preprocess_adj(adj):
|
||||
"""Preprocessing of adjacency matrix for simple GCN model and conversion to tuple representation."""
|
||||
adj_normalized = normalize_adj(adj + sp.eye(adj.shape[0]))
|
||||
return sparse_to_tuple(adj_normalized)
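
For reference, preprocess_adj implements the GCN renormalization trick: with A_hat = A + I and D_hat its degree matrix, it returns D_hat^(-1/2) A_hat D_hat^(-1/2) in tuple form. A small hypothetical check on a two-node graph:

    # sketch only: for a single-edge graph every renormalized entry should be 1/2
    import numpy as np
    import scipy.sparse as sp
    A = sp.csr_matrix(np.array([[0., 1.], [1., 0.]]))
    coords, values, shape = preprocess_adj(A)
    assert np.allclose(values, 0.5) and shape == (2, 2)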
|
||||
|
||||
|
||||
def construct_feed_dict(features, support, labels, labels_mask, placeholders):
|
||||
"""Construct feed dictionary."""
|
||||
feed_dict = dict()
|
||||
feed_dict.update({placeholders['labels']: labels})
|
||||
feed_dict.update({placeholders['labels_mask']: labels_mask})
|
||||
feed_dict.update({placeholders['features']: features})
|
||||
feed_dict.update({placeholders['support'][i]: support[i] for i in range(len(support))})
|
||||
feed_dict.update({placeholders['num_features_nonzero']: features[1].shape})
|
||||
return feed_dict
|
||||
|
||||
|
||||
def chebyshev_polynomials(adj, k):
|
||||
"""Calculate Chebyshev polynomials up to order k. Return a list of sparse matrices (tuple representation)."""
|
||||
print("Calculating Chebyshev polynomials up to order {}...".format(k))
|
||||
|
||||
adj_normalized = normalize_adj(adj)
|
||||
laplacian = sp.eye(adj.shape[0]) - adj_normalized
|
||||
largest_eigval, _ = eigsh(laplacian, 1, which='LM')
|
||||
scaled_laplacian = (2. / largest_eigval[0]) * laplacian - sp.eye(adj.shape[0])
|
||||
|
||||
t_k = list()
|
||||
t_k.append(sp.eye(adj.shape[0]))
|
||||
t_k.append(scaled_laplacian)
|
||||
|
||||
def chebyshev_recurrence(t_k_minus_one, t_k_minus_two, scaled_lap):
|
||||
s_lap = sp.csr_matrix(scaled_lap, copy=True)
|
||||
return 2 * s_lap.dot(t_k_minus_one) - t_k_minus_two
|
||||
|
||||
for i in range(2, k+1):
|
||||
t_k.append(chebyshev_recurrence(t_k[-1], t_k[-2], scaled_laplacian))
|
||||
|
||||
return sparse_to_tuple(t_k)
|
160
src/libnrl/graph.py
Normal file
160
src/libnrl/graph.py
Normal file
@ -0,0 +1,160 @@
|
||||
"""
|
||||
commonly used graph APIs based on NetworkX;
|
||||
use g.xxx to access the commonly used APIs offered by us;
|
||||
use g.G.xxx to access NetworkX APIs;
|
||||
|
||||
by Chengbin Hou 2018 <chengbin.hou10@foxmail.com>
|
||||
"""
|
||||
|
||||
import time
|
||||
import random
|
||||
import numpy as np
|
||||
import scipy.sparse as sp
|
||||
import networkx as nx
|
||||
|
||||
class Graph(object):
|
||||
def __init__(self):
|
||||
self.G = None #to access NetworkX graph data structure
|
||||
self.look_up_dict = {} #use node ID to find index via g.look_up_dict['0']
|
||||
self.look_back_list = [] #use index to find node ID via g.look_back_list[0]
|
||||
|
||||
#--------------------------------------------------------------------------------------
|
||||
#--------------------commonly used APIs that will modify graph-------------------------
|
||||
#--------------------------------------------------------------------------------------
|
||||
def node_mapping(self):
|
||||
""" node id and index mapping;
|
||||
based on the order given by networkx G.nodes();
|
||||
NB: updating is needed if any node is added/removed;
|
||||
"""
|
||||
i = 0 #node index
|
||||
self.look_up_dict = {} #init
|
||||
self.look_back_list = [] #init
|
||||
for node_id in self.G.nodes(): #node id
|
||||
self.look_up_dict[node_id] = i
|
||||
self.look_back_list.append(node_id)
|
||||
i += 1
|
||||
|
||||
def read_adjlist(self, path, directed=False):
|
||||
""" read adjacency list format graph;
|
||||
support unweighted and (un)directed graph;
|
||||
format: see https://networkx.github.io/documentation/stable/reference/readwrite/adjlist.html
|
||||
NB: weighted graphs are not supported
|
||||
"""
|
||||
if directed:
|
||||
self.G = nx.read_adjlist(path, create_using=nx.DiGraph())
|
||||
else:
|
||||
self.G = nx.read_adjlist(path, create_using=nx.Graph())
|
||||
self.node_mapping() #update node id index mapping
|
||||
|
||||
def read_edgelist(self, path, weighted=False, directed=False):
|
||||
""" read edge list format graph;
|
||||
support (un)weighted and (un)directed graph;
|
||||
format: see https://networkx.github.io/documentation/stable/reference/readwrite/edgelist.html
|
||||
"""
|
||||
if directed:
|
||||
self.G = nx.read_edgelist(path, create_using=nx.DiGraph())
|
||||
else:
|
||||
self.G = nx.read_edgelist(path, create_using=nx.Graph())
|
||||
self.node_mapping() #update node id index mapping
|
||||
|
||||
def read_node_attr(self, path):
|
||||
""" read node attributes and store as NetworkX graph {'node_id': {'attr': values}}
|
||||
input file format: node_id1 attr1 attr2 ... attrM
|
||||
node_id2 attr1 attr2 ... attrM
|
||||
"""
|
||||
with open(path, 'r') as fin:
|
||||
for l in fin.readlines():
|
||||
vec = l.split()
|
||||
self.G.nodes[vec[0]]['attr'] = np.array([float(x) for x in vec[1:]])
|
||||
|
||||
def read_node_label(self, path):
|
||||
""" todo... read node labels and store as NetworkX graph {'node_id': {'label': values}}
|
||||
input file format: node_id1 labels
|
||||
node_id2 labels
|
||||
with open(path, 'r') as fin:
|
||||
for l in fin.readlines():
|
||||
vec = l.split()
|
||||
self.G.nodes[vec[0]]['label'] = np.array([float(x) for x in vec[1:]])
|
||||
"""
|
||||
pass #to do...
|
||||
|
||||
def remove_edge(self, ratio=0.0):
|
||||
""" randomly remove edges/links
|
||||
ratio: the percentage of edges to be removed
|
||||
edges_removed: return removed edges, each of which is a pair of nodes
|
||||
"""
|
||||
num_edges_removed = int( ratio * self.G.number_of_edges() )
|
||||
#random.seed(2018)
|
||||
edges_removed = random.sample(self.G.edges(), int(num_edges_removed))
|
||||
print('before removing, the # of edges: ', self.G.number_of_edges())
|
||||
self.G.remove_edges_from(edges_removed)
|
||||
print('after removing, the # of edges: ', self.G.number_of_edges())
|
||||
return edges_removed
|
||||
|
||||
def remove_node_attr(self, ratio):
|
||||
""" todo... randomly remove node attributes;
|
||||
"""
|
||||
pass #to do...
|
||||
|
||||
def remove_node(self, ratio):
|
||||
""" todo... randomly remove nodes;
|
||||
#self.node_mapping() #update node id index mapping is needed
|
||||
"""
|
||||
pass #to do...
|
||||
|
||||
#------------------------------------------------------------------------------------------
|
||||
#--------------------commonly used APIs that will not modify graph-------------------------
|
||||
#------------------------------------------------------------------------------------------
|
||||
def get_adj_mat(self, is_sparse=True):
|
||||
""" return adjacency matrix;
|
||||
use 'csr' format for sparse matrix
|
||||
"""
|
||||
if is_sparse:
|
||||
return nx.to_scipy_sparse_matrix(self.G, nodelist=self.look_back_list, format='csr', dtype='float64')
|
||||
else:
|
||||
return nx.to_numpy_matrix(self.G, nodelist=self.look_back_list, dtype='float64')
|
||||
|
||||
def get_attr_mat(self, is_sparse=True):
|
||||
""" return attribute matrix;
|
||||
use 'csr' format for sparse matrix
|
||||
"""
|
||||
attr_dense_narray = np.vstack([self.G.nodes[self.look_back_list[i]]['attr'] for i in range(self.get_num_nodes())])
|
||||
if is_sparse:
|
||||
return sp.csr_matrix(attr_dense_narray, dtype='float64')
|
||||
else:
|
||||
return np.matrix(attr_dense_narray, dtype='float64')
|
||||
|
||||
def get_num_nodes(self):
|
||||
""" return the number of nodes """
|
||||
return nx.number_of_nodes(self.G)
|
||||
|
||||
def get_num_edges(self):
|
||||
""" return the number of edges """
|
||||
return nx.number_of_edges(self.G)
|
||||
|
||||
def get_num_isolates(self):
|
||||
""" return the number of isolated nodes """
|
||||
return len(list(nx.isolates(self.G)))
|
||||
|
||||
def get_isdirected(self):
|
||||
""" return True if it is directed graph """
|
||||
return nx.is_directed(self.G)
|
||||
|
||||
def get_isweighted(self):
|
||||
""" return True if it is weighted graph """
|
||||
return nx.is_weighted(self.G)
|
||||
|
||||
def get_neighbors(self, node):
|
||||
""" return neighbors connected to a node """
|
||||
return list(nx.neighbors(self.G, node))
|
||||
|
||||
def get_common_neighbors(self, node1, node2):
|
||||
""" return common neighbors of two nodes """
|
||||
return list(nx.common_neighbors(self.G, node1, node2))
|
||||
|
||||
def get_centrality(self, centrality_type='degree'):
|
||||
""" todo... return specified type of centrality
|
||||
see https://networkx.github.io/documentation/stable/reference/algorithms/centrality.html
|
||||
"""
|
||||
pass #to do...
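
A hedged usage sketch of this wrapper (file paths are hypothetical, the attribute file must follow the format documented in read_node_attr, and src/ is assumed to be on the Python path):

    # sketch only
    from libnrl.graph import Graph

    g = Graph()
    g.read_edgelist('data/cora/cora_edgelist.txt', weighted=False, directed=False)
    g.read_node_attr('data/cora/cora_attr.txt')
    A = g.get_adj_mat(is_sparse=True)    # csr adjacency, rows ordered by g.look_back_list
    X = g.get_attr_mat(is_sparse=True)   # csr attribute matrix in the same node order
    print(g.get_num_nodes(), g.get_num_edges(), g.get_num_isolates())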
|
||||
|
117
src/libnrl/graphsage/README.md
Normal file
117
src/libnrl/graphsage/README.md
Normal file
@ -0,0 +1,117 @@
|
||||
## GraphSage: Representation Learning on Large Graphs
|
||||
|
||||
#### Authors: [William L. Hamilton](http://stanford.edu/~wleif) (wleif@stanford.edu), [Rex Ying](http://joy-of-thinking.weebly.com/) (rexying@stanford.edu)
|
||||
#### [Project Website](http://snap.stanford.edu/graphsage/)
|
||||
|
||||
#### [Alternative reference PyTorch implementation](https://github.com/williamleif/graphsage-simple/)
|
||||
|
||||
### Overview
|
||||
|
||||
This directory contains code necessary to run the GraphSage algorithm.
|
||||
GraphSage can be viewed as a stochastic generalization of graph convolutions, and it is especially useful for massive, dynamic graphs that contain rich feature information.
|
||||
See our [paper](https://arxiv.org/pdf/1706.02216.pdf) for details on the algorithm.
|
||||
|
||||
*Note:* GraphSage now also has better support for training on smaller, static graphs and graphs that don't have node features.
|
||||
The original algorithm and paper are focused on the task of inductive generalization (i.e., generating embeddings for nodes that were not present during training),
|
||||
but many benchmarks/tasks use simple static graphs that do not necessarily have features.
|
||||
To support this use case, GraphSage now includes optional "identity features" that can be used with or without other node attributes.
|
||||
Including identity features will increase the runtime, but also potentially increase performance (at the usual risk of overfitting).
|
||||
See the section on "Running the code" below.
|
||||
|
||||
*Note:* GraphSage is intended for use on large graphs (>100,000 nodes). The overhead of subsampling will start to outweigh its benefits on smaller graphs.
|
||||
|
||||
The example_data subdirectory contains a small example of the protein-protein interaction data,
|
||||
which includes 3 training graphs + one validation graph and one test graph.
|
||||
The full Reddit and PPI datasets (described in the paper) are available on the [project website](http://snap.stanford.edu/graphsage/).
|
||||
|
||||
If you make use of this code or the GraphSage algorithm in your work, please cite the following paper:
|
||||
|
||||
@inproceedings{hamilton2017inductive,
|
||||
author = {Hamilton, William L. and Ying, Rex and Leskovec, Jure},
|
||||
title = {Inductive Representation Learning on Large Graphs},
|
||||
booktitle = {NIPS},
|
||||
year = {2017}
|
||||
}
|
||||
|
||||
### Requirements
|
||||
|
||||
Recent versions of TensorFlow, numpy, scipy, sklearn, and networkx are required (but networkx must be <=1.11). You can install all the required packages using the following command:
|
||||
|
||||
$ pip install -r requirements.txt
|
||||
|
||||
To guarantee that you have the right package versions, you can use [docker](https://docs.docker.com/) to easily set up a virtual environment. See the Docker subsection below for more info.
|
||||
|
||||
#### Docker
|
||||
|
||||
If you do not have [docker](https://docs.docker.com/) installed, you will need to do so. (Just click on the preceding link, the installation is pretty painless).
|
||||
|
||||
You can run GraphSage inside a [docker](https://docs.docker.com/) image. After cloning the project, build and run the image as following:
|
||||
|
||||
$ docker build -t graphsage .
|
||||
$ docker run -it graphsage bash
|
||||
|
||||
or start a Jupyter Notebook instead of bash:
|
||||
|
||||
$ docker run -it -p 8888:8888 graphsage
|
||||
|
||||
You can also run the GPU image using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker):
|
||||
|
||||
$ docker build -t graphsage:gpu -f Dockerfile.gpu .
|
||||
$ nvidia-docker run -it graphsage:gpu bash
|
||||
|
||||
### Running the code
|
||||
|
||||
The example_unsupervised.sh and example_supervised.sh files contain example usages of the code, which use the unsupervised and supervised variants of GraphSage, respectively.
|
||||
|
||||
If your benchmark/task does not require generalizing to unseen data, we recommend you try setting the "--identity_dim" flag to a value in the range [64,256].
|
||||
This flag will make the model embed unique node ids as attributes, which will increase the runtime and number of parameters but also potentially increase the performance.
|
||||
Note that you should set this flag and *not* try to pass dense one-hot vectors as features (due to sparsity).
|
||||
The "dimension" of identity features specifies how many parameters there are per node in the sparse identity-feature lookup table.
|
||||
|
||||
Note that example_unsupervised.sh sets a very small max iteration number, which can be increased to improve performance.
|
||||
We generally found that performance continued to improve even after the loss was very near convergence (i.e., even when the loss was decreasing at a very slow rate).
|
||||
|
||||
*Note:* For the PPI data, and any other multi-output dataset that allows individual nodes to belong to multiple classes, it is necessary to set the `--sigmoid` flag during supervised training. By default the model assumes that the dataset is in the "one-hot" categorical setting.
|
||||
|
||||
|
||||
#### Input format
|
||||
As input, at minimum the code requires that a --train_prefix option is specified which specifies the following data files:
|
||||
|
||||
* <train_prefix>-G.json -- A networkx-specified json file describing the input graph. Nodes have 'val' and 'test' attributes specifying if they are a part of the validation and test sets, respectively.
|
||||
* <train_prefix>-id_map.json -- A json-stored dictionary mapping the graph node ids to consecutive integers.
|
||||
* <train_prefix>-class_map.json -- A json-stored dictionary mapping the graph node ids to classes.
|
||||
* <train_prefix>-feats.npy [optional] --- A numpy-stored array of node features; ordering given by id_map.json. Can be omitted and only identity features will be used.
|
||||
* <train_prefix>-walks.txt [optional] --- A text file specifying random walk co-occurrences (one pair per line) (*only for unsupervised version of graphsage)
|
||||
|
||||
To run the model on a new dataset, you need to make data files in the format described above.
|
||||
To run random walks for the unsupervised model and to generate the <prefix>-walks.txt file,
|
||||
you can use the `run_walks` function in `graphsage.utils`.
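
A hypothetical sketch of producing the minimal input files for a toy graph (the `toy-` prefix and the class encoding are made up; feature and walk files are omitted):

    # sketch only: write <prefix>-G.json, <prefix>-id_map.json and <prefix>-class_map.json
    import json
    import networkx as nx
    from networkx.readwrite import json_graph

    G = nx.Graph()
    G.add_node('a', val=False, test=False)
    G.add_node('b', val=False, test=True)
    G.add_edge('a', 'b')
    json.dump(json_graph.node_link_data(G), open('toy-G.json', 'w'))
    json.dump({'a': 0, 'b': 1}, open('toy-id_map.json', 'w'))               # node id -> consecutive integer
    json.dump({'a': [1, 0], 'b': [0, 1]}, open('toy-class_map.json', 'w'))  # node id -> class encoding (task-dependent)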
|
||||
|
||||
#### Model variants
|
||||
The user must also specify a --model, the variants of which are described in detail in the paper:
|
||||
* graphsage_mean -- GraphSage with mean-based aggregator
|
||||
* graphsage_seq -- GraphSage with LSTM-based aggregator
|
||||
* graphsage_maxpool -- GraphSage with max-pooling aggregator (as described in the NIPS 2017 paper)
|
||||
* graphsage_meanpool -- GraphSage with mean-pooling aggregator (a variant of the pooling aggregator, where the element-wise mean replaces the element-wise max).
|
||||
* gcn -- GraphSage with GCN-based aggregator
|
||||
* n2v -- an implementation of [DeepWalk](https://arxiv.org/abs/1403.6652) (called n2v for short in the code.)
|
||||
|
||||
#### Logging directory
|
||||
Finally, a --base_log_dir should be specified (it defaults to the current directory).
|
||||
The output of the model and log files will be stored in a subdirectory of the base_log_dir.
|
||||
The path to the logged data will be of the form `<sup/unsup>-<data_prefix>/graphsage-<model_description>/`.
|
||||
The supervised model will output F1 scores, while the unsupervised model will train embeddings and store them.
|
||||
The unsupervised embeddings will be stored in a numpy-formatted file named val.npy, with val.txt specifying the order of embeddings as a per-line list of node ids.
|
||||
Note that the full log outputs and stored embeddings can be 5-10Gb in size (on the full data when running with the unsupervised variant).
|
||||
|
||||
#### Using the output of the unsupervised models
|
||||
|
||||
The unsupervised variants of GraphSage will output embeddings to the logging directory as described above.
|
||||
These embeddings can then be used in downstream machine learning applications.
|
||||
The `eval_scripts` directory contains examples of feeding the embeddings into simple logistic classifiers.
|
||||
|
||||
#### Acknowledgements
|
||||
|
||||
The original version of this code base was originally forked from https://github.com/tkipf/gcn/, and we owe many thanks to Thomas Kipf for making his code available.
|
||||
We also thank Yuanfang Li and Xin Li who contributed to a course project that was based on this work.
|
||||
Please see the [paper](https://arxiv.org/pdf/1706.02216.pdf) for funding details and additional (non-code related) acknowledgements.
|
40
src/libnrl/graphsage/__init__.py
Normal file
40
src/libnrl/graphsage/__init__.py
Normal file
@ -0,0 +1,40 @@
|
||||
from __future__ import print_function
|
||||
from __future__ import division
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
#default parameters
|
||||
#seed = 2018
|
||||
#np.random.seed(seed)
|
||||
#tf.set_random_seed(seed)
|
||||
log_device_placement = False
|
||||
|
||||
# follow the original code by the paper authors https://github.com/williamleif/GraphSAGE
|
||||
# we follow the opt parameters given by papers GCN and graphSAGE
|
||||
# note: citeseer+pubmed follow the same parameters as cora (see their papers)
|
||||
# tensorflow + Adam optimizer + Random weight init + row norm of attr
|
||||
|
||||
epochs = 100
|
||||
dim_1 = 64 #dim = dim1+dim2 = 128
|
||||
dim_2 = 64
|
||||
samples_1 = 25
|
||||
samples_2 = 10
|
||||
dropout = 0.5
|
||||
weight_decay = 0.0001
|
||||
learning_rate = 0.0001
|
||||
batch_size = 128 #reduce this if running out of memory (upstream default is 512)
|
||||
normalize = True #row norm of node attributes/features
|
||||
|
||||
#other parameters that paper did not mentioned, but we also follow the defaults https://github.com/williamleif/GraphSAGE
|
||||
model_size = 'small'
|
||||
max_degree = 100
|
||||
neg_sample_size = 20
|
||||
|
||||
random_context= True
|
||||
validate_batch_size = 64 #if run out of memory, try to reduce them, but we use the default e.g. 64, default=256
|
||||
validate_iter = 5000
|
||||
max_total_steps = 10**10
|
||||
n2v_test_epochs = 1
|
||||
identity_dim = 0
|
||||
train_prefix = ''
|
||||
base_log_dir = ''
|
||||
#print_every = 50
|
450 src/libnrl/graphsage/aggregators.py Normal file
@@ -0,0 +1,450 @@
|
||||
import tensorflow as tf
|
||||
|
||||
from libnrl.graphsage.layers import Layer, Dense
|
||||
from libnrl.graphsage.inits import glorot, zeros
|
||||
|
||||
class MeanAggregator(Layer):
|
||||
"""
|
||||
Aggregates via mean followed by matmul and non-linearity.
|
||||
"""
|
||||
|
||||
def __init__(self, input_dim, output_dim, neigh_input_dim=None,
|
||||
dropout=0., bias=False, act=tf.nn.relu,
|
||||
name=None, concat=False, **kwargs):
|
||||
super(MeanAggregator, self).__init__(**kwargs)
|
||||
|
||||
self.dropout = dropout
|
||||
self.bias = bias
|
||||
self.act = act
|
||||
self.concat = concat
|
||||
|
||||
if neigh_input_dim is None:
|
||||
neigh_input_dim = input_dim
|
||||
|
||||
if name is not None:
|
||||
name = '/' + name
|
||||
else:
|
||||
name = ''
|
||||
|
||||
with tf.variable_scope(self.name + name + '_vars'):
|
||||
self.vars['neigh_weights'] = glorot([neigh_input_dim, output_dim],
|
||||
name='neigh_weights')
|
||||
self.vars['self_weights'] = glorot([input_dim, output_dim],
|
||||
name='self_weights')
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([self.output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
|
||||
def _call(self, inputs):
|
||||
self_vecs, neigh_vecs = inputs
|
||||
|
||||
neigh_vecs = tf.nn.dropout(neigh_vecs, 1-self.dropout)
|
||||
self_vecs = tf.nn.dropout(self_vecs, 1-self.dropout)
|
||||
neigh_means = tf.reduce_mean(neigh_vecs, axis=1)
|
||||
|
||||
# [nodes] x [out_dim]
|
||||
from_neighs = tf.matmul(neigh_means, self.vars['neigh_weights'])
|
||||
|
||||
from_self = tf.matmul(self_vecs, self.vars["self_weights"])
|
||||
|
||||
if not self.concat:
|
||||
output = tf.add_n([from_self, from_neighs])
|
||||
else:
|
||||
output = tf.concat([from_self, from_neighs], axis=1)
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
||||
|
||||
class GCNAggregator(Layer):
|
||||
"""
|
||||
Aggregates via mean followed by matmul and non-linearity.
|
||||
Same matmul parameters are used for the self vector and neighbor vectors.
|
||||
"""
|
||||
|
||||
def __init__(self, input_dim, output_dim, neigh_input_dim=None,
|
||||
dropout=0., bias=False, act=tf.nn.relu, name=None, concat=False, **kwargs):
|
||||
super(GCNAggregator, self).__init__(**kwargs)
|
||||
|
||||
self.dropout = dropout
|
||||
self.bias = bias
|
||||
self.act = act
|
||||
self.concat = concat
|
||||
|
||||
if neigh_input_dim is None:
|
||||
neigh_input_dim = input_dim
|
||||
|
||||
if name is not None:
|
||||
name = '/' + name
|
||||
else:
|
||||
name = ''
|
||||
|
||||
with tf.variable_scope(self.name + name + '_vars'):
|
||||
self.vars['weights'] = glorot([neigh_input_dim, output_dim],
|
||||
name='neigh_weights')
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([self.output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
|
||||
def _call(self, inputs):
|
||||
self_vecs, neigh_vecs = inputs
|
||||
|
||||
neigh_vecs = tf.nn.dropout(neigh_vecs, 1-self.dropout)
|
||||
self_vecs = tf.nn.dropout(self_vecs, 1-self.dropout)
|
||||
means = tf.reduce_mean(tf.concat([neigh_vecs,
|
||||
tf.expand_dims(self_vecs, axis=1)], axis=1), axis=1)
|
||||
|
||||
# [nodes] x [out_dim]
|
||||
output = tf.matmul(means, self.vars['weights'])
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
||||
|
||||
|
||||
class MaxPoolingAggregator(Layer):
|
||||
""" Aggregates via max-pooling over MLP functions.
|
||||
"""
|
||||
def __init__(self, input_dim, output_dim, model_size="small", neigh_input_dim=None,
|
||||
dropout=0., bias=False, act=tf.nn.relu, name=None, concat=False, **kwargs):
|
||||
super(MaxPoolingAggregator, self).__init__(**kwargs)
|
||||
|
||||
self.dropout = dropout
|
||||
self.bias = bias
|
||||
self.act = act
|
||||
self.concat = concat
|
||||
|
||||
if neigh_input_dim is None:
|
||||
neigh_input_dim = input_dim
|
||||
|
||||
if name is not None:
|
||||
name = '/' + name
|
||||
else:
|
||||
name = ''
|
||||
|
||||
if model_size == "small":
|
||||
hidden_dim = self.hidden_dim = 512
|
||||
elif model_size == "big":
|
||||
hidden_dim = self.hidden_dim = 1024
|
||||
|
||||
self.mlp_layers = []
|
||||
self.mlp_layers.append(Dense(input_dim=neigh_input_dim,
|
||||
output_dim=hidden_dim,
|
||||
act=tf.nn.relu,
|
||||
dropout=dropout,
|
||||
sparse_inputs=False,
|
||||
logging=self.logging))
|
||||
|
||||
with tf.variable_scope(self.name + name + '_vars'):
|
||||
self.vars['neigh_weights'] = glorot([hidden_dim, output_dim],
|
||||
name='neigh_weights')
|
||||
|
||||
self.vars['self_weights'] = glorot([input_dim, output_dim],
|
||||
name='self_weights')
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([self.output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
self.neigh_input_dim = neigh_input_dim
|
||||
|
||||
def _call(self, inputs):
|
||||
self_vecs, neigh_vecs = inputs
|
||||
neigh_h = neigh_vecs
|
||||
|
||||
dims = tf.shape(neigh_h)
|
||||
batch_size = dims[0]
|
||||
num_neighbors = dims[1]
|
||||
# [nodes * sampled neighbors] x [hidden_dim]
|
||||
h_reshaped = tf.reshape(neigh_h, (batch_size * num_neighbors, self.neigh_input_dim))
|
||||
|
||||
for l in self.mlp_layers:
|
||||
h_reshaped = l(h_reshaped)
|
||||
neigh_h = tf.reshape(h_reshaped, (batch_size, num_neighbors, self.hidden_dim))
|
||||
neigh_h = tf.reduce_max(neigh_h, axis=1)
|
||||
|
||||
from_neighs = tf.matmul(neigh_h, self.vars['neigh_weights'])
|
||||
from_self = tf.matmul(self_vecs, self.vars["self_weights"])
|
||||
|
||||
if not self.concat:
|
||||
output = tf.add_n([from_self, from_neighs])
|
||||
else:
|
||||
output = tf.concat([from_self, from_neighs], axis=1)
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
||||
|
||||
class MeanPoolingAggregator(Layer):
|
||||
""" Aggregates via mean-pooling over MLP functions.
|
||||
"""
|
||||
def __init__(self, input_dim, output_dim, model_size="small", neigh_input_dim=None,
|
||||
dropout=0., bias=False, act=tf.nn.relu, name=None, concat=False, **kwargs):
|
||||
super(MeanPoolingAggregator, self).__init__(**kwargs)
|
||||
|
||||
self.dropout = dropout
|
||||
self.bias = bias
|
||||
self.act = act
|
||||
self.concat = concat
|
||||
|
||||
if neigh_input_dim is None:
|
||||
neigh_input_dim = input_dim
|
||||
|
||||
if name is not None:
|
||||
name = '/' + name
|
||||
else:
|
||||
name = ''
|
||||
|
||||
if model_size == "small":
|
||||
hidden_dim = self.hidden_dim = 512
|
||||
elif model_size == "big":
|
||||
hidden_dim = self.hidden_dim = 1024
|
||||
|
||||
self.mlp_layers = []
|
||||
self.mlp_layers.append(Dense(input_dim=neigh_input_dim,
|
||||
output_dim=hidden_dim,
|
||||
act=tf.nn.relu,
|
||||
dropout=dropout,
|
||||
sparse_inputs=False,
|
||||
logging=self.logging))
|
||||
|
||||
with tf.variable_scope(self.name + name + '_vars'):
|
||||
self.vars['neigh_weights'] = glorot([hidden_dim, output_dim],
|
||||
name='neigh_weights')
|
||||
|
||||
self.vars['self_weights'] = glorot([input_dim, output_dim],
|
||||
name='self_weights')
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([self.output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
self.neigh_input_dim = neigh_input_dim
|
||||
|
||||
def _call(self, inputs):
|
||||
self_vecs, neigh_vecs = inputs
|
||||
neigh_h = neigh_vecs
|
||||
|
||||
dims = tf.shape(neigh_h)
|
||||
batch_size = dims[0]
|
||||
num_neighbors = dims[1]
|
||||
# [nodes * sampled neighbors] x [hidden_dim]
|
||||
h_reshaped = tf.reshape(neigh_h, (batch_size * num_neighbors, self.neigh_input_dim))
|
||||
|
||||
for l in self.mlp_layers:
|
||||
h_reshaped = l(h_reshaped)
|
||||
neigh_h = tf.reshape(h_reshaped, (batch_size, num_neighbors, self.hidden_dim))
|
||||
neigh_h = tf.reduce_mean(neigh_h, axis=1)
|
||||
|
||||
from_neighs = tf.matmul(neigh_h, self.vars['neigh_weights'])
|
||||
from_self = tf.matmul(self_vecs, self.vars["self_weights"])
|
||||
|
||||
if not self.concat:
|
||||
output = tf.add_n([from_self, from_neighs])
|
||||
else:
|
||||
output = tf.concat([from_self, from_neighs], axis=1)
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
||||
|
||||
|
||||
class TwoMaxLayerPoolingAggregator(Layer):
|
||||
""" Aggregates via pooling over two MLP functions.
|
||||
"""
|
||||
def __init__(self, input_dim, output_dim, model_size="small", neigh_input_dim=None,
|
||||
dropout=0., bias=False, act=tf.nn.relu, name=None, concat=False, **kwargs):
|
||||
super(TwoMaxLayerPoolingAggregator, self).__init__(**kwargs)
|
||||
|
||||
self.dropout = dropout
|
||||
self.bias = bias
|
||||
self.act = act
|
||||
self.concat = concat
|
||||
|
||||
if neigh_input_dim is None:
|
||||
neigh_input_dim = input_dim
|
||||
|
||||
if name is not None:
|
||||
name = '/' + name
|
||||
else:
|
||||
name = ''
|
||||
|
||||
if model_size == "small":
|
||||
hidden_dim_1 = self.hidden_dim_1 = 512
|
||||
hidden_dim_2 = self.hidden_dim_2 = 256
|
||||
elif model_size == "big":
|
||||
hidden_dim_1 = self.hidden_dim_1 = 1024
|
||||
hidden_dim_2 = self.hidden_dim_2 = 512
|
||||
|
||||
self.mlp_layers = []
|
||||
self.mlp_layers.append(Dense(input_dim=neigh_input_dim,
|
||||
output_dim=hidden_dim_1,
|
||||
act=tf.nn.relu,
|
||||
dropout=dropout,
|
||||
sparse_inputs=False,
|
||||
logging=self.logging))
|
||||
self.mlp_layers.append(Dense(input_dim=hidden_dim_1,
|
||||
output_dim=hidden_dim_2,
|
||||
act=tf.nn.relu,
|
||||
dropout=dropout,
|
||||
sparse_inputs=False,
|
||||
logging=self.logging))
|
||||
|
||||
|
||||
with tf.variable_scope(self.name + name + '_vars'):
|
||||
self.vars['neigh_weights'] = glorot([hidden_dim_2, output_dim],
|
||||
name='neigh_weights')
|
||||
|
||||
self.vars['self_weights'] = glorot([input_dim, output_dim],
|
||||
name='self_weights')
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([self.output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
self.neigh_input_dim = neigh_input_dim
|
||||
|
||||
def _call(self, inputs):
|
||||
self_vecs, neigh_vecs = inputs
|
||||
neigh_h = neigh_vecs
|
||||
|
||||
dims = tf.shape(neigh_h)
|
||||
batch_size = dims[0]
|
||||
num_neighbors = dims[1]
|
||||
# [nodes * sampled neighbors] x [hidden_dim]
|
||||
h_reshaped = tf.reshape(neigh_h, (batch_size * num_neighbors, self.neigh_input_dim))
|
||||
|
||||
for l in self.mlp_layers:
|
||||
h_reshaped = l(h_reshaped)
|
||||
neigh_h = tf.reshape(h_reshaped, (batch_size, num_neighbors, self.hidden_dim_2))
|
||||
neigh_h = tf.reduce_max(neigh_h, axis=1)
|
||||
|
||||
from_neighs = tf.matmul(neigh_h, self.vars['neigh_weights'])
|
||||
from_self = tf.matmul(self_vecs, self.vars["self_weights"])
|
||||
|
||||
if not self.concat:
|
||||
output = tf.add_n([from_self, from_neighs])
|
||||
else:
|
||||
output = tf.concat([from_self, from_neighs], axis=1)
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
||||
|
||||
class SeqAggregator(Layer):
|
||||
""" Aggregates via a standard LSTM.
|
||||
"""
|
||||
def __init__(self, input_dim, output_dim, model_size="small", neigh_input_dim=None,
|
||||
dropout=0., bias=False, act=tf.nn.relu, name=None, concat=False, **kwargs):
|
||||
super(SeqAggregator, self).__init__(**kwargs)
|
||||
|
||||
self.dropout = dropout
|
||||
self.bias = bias
|
||||
self.act = act
|
||||
self.concat = concat
|
||||
|
||||
if neigh_input_dim is None:
|
||||
neigh_input_dim = input_dim
|
||||
|
||||
if name is not None:
|
||||
name = '/' + name
|
||||
else:
|
||||
name = ''
|
||||
|
||||
if model_size == "small":
|
||||
hidden_dim = self.hidden_dim = 128
|
||||
elif model_size == "big":
|
||||
hidden_dim = self.hidden_dim = 256
|
||||
|
||||
with tf.variable_scope(self.name + name + '_vars'):
|
||||
self.vars['neigh_weights'] = glorot([hidden_dim, output_dim],
|
||||
name='neigh_weights')
|
||||
|
||||
self.vars['self_weights'] = glorot([input_dim, output_dim],
|
||||
name='self_weights')
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([self.output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
self.neigh_input_dim = neigh_input_dim
|
||||
self.cell = tf.contrib.rnn.BasicLSTMCell(self.hidden_dim)
|
||||
|
||||
def _call(self, inputs):
|
||||
self_vecs, neigh_vecs = inputs
|
||||
|
||||
dims = tf.shape(neigh_vecs)
|
||||
batch_size = dims[0]
|
||||
initial_state = self.cell.zero_state(batch_size, tf.float32)
|
||||
used = tf.sign(tf.reduce_max(tf.abs(neigh_vecs), axis=2))
|
||||
length = tf.reduce_sum(used, axis=1)
|
||||
length = tf.maximum(length, tf.constant(1.))
|
||||
length = tf.cast(length, tf.int32)
|
||||
|
||||
with tf.variable_scope(self.name) as scope:
|
||||
try:
|
||||
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
|
||||
self.cell, neigh_vecs,
|
||||
initial_state=initial_state, dtype=tf.float32, time_major=False,
|
||||
sequence_length=length)
|
||||
except ValueError:
|
||||
scope.reuse_variables()
|
||||
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
|
||||
self.cell, neigh_vecs,
|
||||
initial_state=initial_state, dtype=tf.float32, time_major=False,
|
||||
sequence_length=length)
|
||||
batch_size = tf.shape(rnn_outputs)[0]
|
||||
max_len = tf.shape(rnn_outputs)[1]
|
||||
out_size = int(rnn_outputs.get_shape()[2])
|
||||
index = tf.range(0, batch_size) * max_len + (length - 1)
|
||||
flat = tf.reshape(rnn_outputs, [-1, out_size])
|
||||
neigh_h = tf.gather(flat, index)
|
||||
|
||||
from_neighs = tf.matmul(neigh_h, self.vars['neigh_weights'])
|
||||
from_self = tf.matmul(self_vecs, self.vars["self_weights"])
|
||||
|
||||
|
||||
|
||||
if not self.concat:
|
||||
output = tf.add_n([from_self, from_neighs])
|
||||
else:
|
||||
output = tf.concat([from_self, from_neighs], axis=1)
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
||||
|
112 src/libnrl/graphsage/graphsageAPI.py Normal file
@@ -0,0 +1,112 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
'''
|
||||
#-----------------------------------------------------------------------------
|
||||
# author: Chengbin Hou @ SUSTech 2018
|
||||
# Email: Chengbin.Hou10@foxmail.com
|
||||
# we provide utils to transform the original data into GraphSAGE format
|
||||
# you may easily use these APIs as demonstrated in main.py of OpenANE
|
||||
# the APIs are designed for the unsupervised setting; for the supervised setting, please complete the 'label' handling (to do)
|
||||
#-----------------------------------------------------------------------------
|
||||
'''
|
||||
from networkx.readwrite import json_graph
|
||||
import json
|
||||
import random
|
||||
import networkx as nx
|
||||
import numpy as np
|
||||
from libnrl.graphsage import unsupervised_train
|
||||
|
||||
def add_train_val_test_to_G(graph, test_perc=0.0, val_perc=0.1):  # unsupervised training does not need test data
|
||||
G = graph.G #take out nx G
|
||||
random.seed(2018)
|
||||
num_nodes = nx.number_of_nodes(G)
|
||||
test_ind = random.sample(range(0, num_nodes), int(num_nodes*test_perc))
|
||||
val_ind = random.sample(range(0, num_nodes), int(num_nodes*val_perc))
|
||||
for ind in range(0, num_nodes):
|
||||
id = graph.look_back_list[ind]
|
||||
if ind in test_ind:
|
||||
G.nodes[id]['test'] = True
|
||||
G.nodes[id]['val'] = False
|
||||
elif ind in val_ind:
|
||||
G.nodes[id]['test'] = False
|
||||
G.nodes[id]['val'] = True
|
||||
else:
|
||||
G.nodes[id]['test'] = False
|
||||
G.nodes[id]['val'] = False
|
||||
|
||||
## Make sure the graph has edge train_removed annotations
|
||||
## (some datasets might already have this..)
|
||||
print("Loaded data.. now preprocessing..")
|
||||
for edge in G.edges():
|
||||
if (G.node[edge[0]]['val'] or G.node[edge[1]]['val'] or
|
||||
G.node[edge[0]]['test'] or G.node[edge[1]]['test']):
|
||||
G[edge[0]][edge[1]]['train_removed'] = True
|
||||
else:
|
||||
G[edge[0]][edge[1]]['train_removed'] = False
|
||||
return G
|
||||
|
||||
def run_random_walks(G, num_walks=50, walk_len=5):
|
||||
nodes = [n for n in G.nodes() if not G.node[n]["val"] and not G.node[n]["test"]]
|
||||
G = G.subgraph(nodes)
|
||||
pairs = []
|
||||
for count, node in enumerate(nodes):
|
||||
if G.degree(node) == 0:
|
||||
continue
|
||||
for i in range(num_walks):
|
||||
curr_node = node
|
||||
for j in range(walk_len):
|
||||
if len(list(G.neighbors(curr_node))) == 0:  # isolated nodes often appear in real-world graphs
|
||||
break
|
||||
next_node = random.choice(list(G.neighbors(curr_node))) #changed due to compatibility
|
||||
#next_node = random.choice(G.neighbors(curr_node))
|
||||
# self co-occurrences are useless
|
||||
if curr_node != node:
|
||||
pairs.append((node,curr_node))
|
||||
curr_node = next_node
|
||||
if count % 1000 == 0:
|
||||
print("Done walks for", count, "nodes")
|
||||
return pairs
|
||||
|
||||
def tranform_data_for_graphsage(graph):
|
||||
G = add_train_val_test_to_G(graph) #given OpenANE graph --> obtain graphSAGE graph
|
||||
#G_json = json_graph.node_link_data(G) #train_data[0] in unsupervised_train.py
|
||||
|
||||
id_map = graph.look_up_dict
|
||||
#conversion = lambda n : int(n) # compatible with networkx >2.0
|
||||
#id_map = {conversion(k):int(v) for k,v in id_map.items()} # due to graphSAGE requirement
|
||||
|
||||
feats = np.array([G.nodes[id]['feature'] for id in id_map.keys()])
|
||||
normalize = True  # already declared in __init__.py
|
||||
if normalize and not feats is None:
|
||||
print("-------------row norm of node attributes/features------------------")
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
train_inds = [id_map[n] for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']]
|
||||
train_feats = feats[train_inds]
|
||||
scaler = StandardScaler()
|
||||
scaler.fit(train_feats)
|
||||
feats = scaler.transform(feats)
|
||||
#feats1 = nx.get_node_attributes(G,'test')
|
||||
#feats2 = nx.get_node_attributes(G,'val')
|
||||
|
||||
walks = []
|
||||
walks = run_random_walks(G, num_walks=50, walk_len=5)  # use the default parameters from GraphSAGE
|
||||
|
||||
class_map = 0  # to do: use sklearn to binarize classes; not needed for the unsupervised setting
|
||||
return G, feats, id_map, walks, class_map
|
||||
|
||||
def graphsage_unsupervised_train(graph, graphsage_model = 'graphsage_mean'):
|
||||
train_data = tranform_data_for_graphsage(graph)
|
||||
#from unsupervised_train.py
|
||||
vectors = unsupervised_train.train(train_data, test_data=None, model = graphsage_model)
|
||||
return vectors
|
||||
|
||||
'''
|
||||
def save_embeddings(self, filename):
|
||||
fout = open(filename, 'w')
|
||||
node_num = len(self.vectors.keys())
|
||||
fout.write("{} {}\n".format(node_num, self.size))
|
||||
for node, vec in self.vectors.items():
|
||||
fout.write("{} {}\n".format(node,
|
||||
' '.join([str(x) for x in vec])))
|
||||
fout.close()
|
||||
'''
|
30 src/libnrl/graphsage/inits.py Normal file
@@ -0,0 +1,30 @@
|
||||
import tensorflow as tf
|
||||
import numpy as np
|
||||
|
||||
# DISCLAIMER:
|
||||
# Parts of this code file are derived from
|
||||
# https://github.com/tkipf/gcn
|
||||
# which is under an identical MIT license as GraphSAGE
|
||||
|
||||
def uniform(shape, scale=0.05, name=None):
|
||||
"""Uniform init."""
|
||||
initial = tf.random_uniform(shape, minval=-scale, maxval=scale, dtype=tf.float32)
|
||||
return tf.Variable(initial, name=name)
|
||||
|
||||
|
||||
def glorot(shape, name=None):
|
||||
"""Glorot & Bengio (AISTATS 2010) init."""
|
||||
init_range = np.sqrt(6.0/(shape[0]+shape[1]))
|
||||
initial = tf.random_uniform(shape, minval=-init_range, maxval=init_range, dtype=tf.float32)
|
||||
return tf.Variable(initial, name=name)
|
||||
|
||||
|
||||
def zeros(shape, name=None):
|
||||
"""All zeros."""
|
||||
initial = tf.zeros(shape, dtype=tf.float32)
|
||||
return tf.Variable(initial, name=name)
|
||||
|
||||
def ones(shape, name=None):
|
||||
"""All ones."""
|
||||
initial = tf.ones(shape, dtype=tf.float32)
|
||||
return tf.Variable(initial, name=name)
|
116 src/libnrl/graphsage/layers.py Normal file
@@ -0,0 +1,116 @@
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import tensorflow as tf
|
||||
|
||||
from libnrl.graphsage.inits import zeros
|
||||
|
||||
flags = tf.app.flags
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
# DISCLAIMER:
|
||||
# Boilerplate parts of this code file were originally forked from
|
||||
# https://github.com/tkipf/gcn
|
||||
# which itself was very inspired by the keras package
|
||||
|
||||
# global unique layer ID dictionary for layer name assignment
|
||||
_LAYER_UIDS = {}
|
||||
|
||||
def get_layer_uid(layer_name=''):
|
||||
"""Helper function, assigns unique layer IDs."""
|
||||
if layer_name not in _LAYER_UIDS:
|
||||
_LAYER_UIDS[layer_name] = 1
|
||||
return 1
|
||||
else:
|
||||
_LAYER_UIDS[layer_name] += 1
|
||||
return _LAYER_UIDS[layer_name]
|
||||
|
||||
class Layer(object):
|
||||
"""Base layer class. Defines basic API for all layer objects.
|
||||
Implementation inspired by keras (http://keras.io).
|
||||
# Properties
|
||||
name: String, defines the variable scope of the layer.
|
||||
logging: Boolean, switches Tensorflow histogram logging on/off
|
||||
|
||||
# Methods
|
||||
_call(inputs): Defines computation graph of layer
|
||||
(i.e. takes input, returns output)
|
||||
__call__(inputs): Wrapper for _call()
|
||||
_log_vars(): Log all variables
|
||||
"""
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
allowed_kwargs = {'name', 'logging', 'model_size'}
|
||||
for kwarg in kwargs.keys():
|
||||
assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg
|
||||
name = kwargs.get('name')
|
||||
if not name:
|
||||
layer = self.__class__.__name__.lower()
|
||||
name = layer + '_' + str(get_layer_uid(layer))
|
||||
self.name = name
|
||||
self.vars = {}
|
||||
logging = kwargs.get('logging', False)
|
||||
self.logging = logging
|
||||
self.sparse_inputs = False
|
||||
|
||||
def _call(self, inputs):
|
||||
return inputs
|
||||
|
||||
def __call__(self, inputs):
|
||||
with tf.name_scope(self.name):
|
||||
if self.logging and not self.sparse_inputs:
|
||||
tf.summary.histogram(self.name + '/inputs', inputs)
|
||||
outputs = self._call(inputs)
|
||||
if self.logging:
|
||||
tf.summary.histogram(self.name + '/outputs', outputs)
|
||||
return outputs
|
||||
|
||||
def _log_vars(self):
|
||||
for var in self.vars:
|
||||
tf.summary.histogram(self.name + '/vars/' + var, self.vars[var])
|
||||
|
||||
|
||||
class Dense(Layer):
|
||||
"""Dense layer."""
|
||||
def __init__(self, input_dim, output_dim, dropout=0.,
|
||||
act=tf.nn.relu, placeholders=None, bias=True, featureless=False,
|
||||
sparse_inputs=False, **kwargs):
|
||||
super(Dense, self).__init__(**kwargs)
|
||||
|
||||
self.dropout = dropout
|
||||
|
||||
self.act = act
|
||||
self.featureless = featureless
|
||||
self.bias = bias
|
||||
self.input_dim = input_dim
|
||||
self.output_dim = output_dim
|
||||
|
||||
# helper variable for sparse dropout
|
||||
self.sparse_inputs = sparse_inputs
|
||||
if sparse_inputs:
|
||||
self.num_features_nonzero = placeholders['num_features_nonzero']
|
||||
|
||||
with tf.variable_scope(self.name + '_vars'):
|
||||
self.vars['weights'] = tf.get_variable('weights', shape=(input_dim, output_dim),
|
||||
dtype=tf.float32,
|
||||
initializer=tf.contrib.layers.xavier_initializer(),
|
||||
regularizer=tf.contrib.layers.l2_regularizer(FLAGS.weight_decay))
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([output_dim], name='bias')
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
def _call(self, inputs):
|
||||
x = inputs
|
||||
|
||||
x = tf.nn.dropout(x, 1-self.dropout)
|
||||
|
||||
# transform
|
||||
output = tf.matmul(x, self.vars['weights'])
|
||||
|
||||
# bias
|
||||
if self.bias:
|
||||
output += self.vars['bias']
|
||||
|
||||
return self.act(output)
|
40 src/libnrl/graphsage/metrics.py Normal file
@@ -0,0 +1,40 @@
|
||||
import tensorflow as tf
|
||||
|
||||
# DISCLAIMER:
|
||||
# Parts of this code file were originally forked from
|
||||
# https://github.com/tkipf/gcn
|
||||
# which itself was very inspired by the keras package
|
||||
def masked_logit_cross_entropy(preds, labels, mask):
|
||||
"""Logit cross-entropy loss with masking."""
|
||||
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=preds, labels=labels)
|
||||
loss = tf.reduce_sum(loss, axis=1)
|
||||
mask = tf.cast(mask, dtype=tf.float32)
|
||||
mask /= tf.maximum(tf.reduce_sum(mask), tf.constant([1.]))
|
||||
loss *= mask
|
||||
return tf.reduce_mean(loss)
|
||||
|
||||
def masked_softmax_cross_entropy(preds, labels, mask):
|
||||
"""Softmax cross-entropy loss with masking."""
|
||||
loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)
|
||||
mask = tf.cast(mask, dtype=tf.float32)
|
||||
mask /= tf.maximum(tf.reduce_sum(mask), tf.constant([1.]))
|
||||
loss *= mask
|
||||
return tf.reduce_mean(loss)
|
||||
|
||||
|
||||
def masked_l2(preds, actuals, mask):
|
||||
"""L2 loss with masking."""
|
||||
loss = tf.reduce_sum(tf.square(preds - actuals), axis=1)  # tf.nn.l2 does not exist; use per-example squared error
|
||||
mask = tf.cast(mask, dtype=tf.float32)
|
||||
mask /= tf.reduce_mean(mask)
|
||||
loss *= mask
|
||||
return tf.reduce_mean(loss)
|
||||
|
||||
def masked_accuracy(preds, labels, mask):
|
||||
"""Accuracy with masking."""
|
||||
correct_prediction = tf.equal(tf.argmax(preds, 1), tf.argmax(labels, 1))
|
||||
accuracy_all = tf.cast(correct_prediction, tf.float32)
|
||||
mask = tf.cast(mask, dtype=tf.float32)
|
||||
mask /= tf.reduce_mean(mask)
|
||||
accuracy_all *= mask
|
||||
return tf.reduce_mean(accuracy_all)
|
320 src/libnrl/graphsage/minibatch.py Normal file
@@ -0,0 +1,320 @@
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import numpy as np
|
||||
|
||||
np.random.seed(123)
|
||||
|
||||
class EdgeMinibatchIterator(object):
|
||||
|
||||
""" This minibatch iterator iterates over batches of sampled edges or
|
||||
random pairs of co-occurring nodes.
|
||||
|
||||
G -- networkx graph
|
||||
id2idx -- dict mapping node ids to index in feature tensor
|
||||
placeholders -- tensorflow placeholders object
|
||||
context_pairs -- if not none, then a list of co-occurring node pairs (from random walks)
|
||||
batch_size -- size of the minibatches
|
||||
max_degree -- maximum size of the downsampled adjacency lists
|
||||
n2v_retrain -- signals that the iterator is being used to add new embeddings to a n2v model
|
||||
fixed_n2v -- signals that the iterator is being used to retrain n2v with only existing nodes as context
|
||||
"""
|
||||
def __init__(self, G, id2idx,
|
||||
placeholders, context_pairs=None, batch_size=100, max_degree=25,
|
||||
n2v_retrain=False, fixed_n2v=False,
|
||||
**kwargs):
|
||||
|
||||
self.G = G
|
||||
self.nodes = G.nodes()
|
||||
self.id2idx = id2idx
|
||||
self.placeholders = placeholders
|
||||
self.batch_size = batch_size
|
||||
self.max_degree = max_degree
|
||||
self.batch_num = 0
|
||||
|
||||
self.nodes = np.random.permutation(G.nodes())
|
||||
self.adj, self.deg = self.construct_adj()
|
||||
self.test_adj = self.construct_test_adj()
|
||||
if context_pairs is None:
|
||||
edges = G.edges()
|
||||
else:
|
||||
edges = context_pairs
|
||||
self.train_edges = self.edges = np.random.permutation(edges)
|
||||
if not n2v_retrain:
|
||||
self.train_edges = self._remove_isolated(self.train_edges)
|
||||
self.val_edges = [e for e in G.edges() if G[e[0]][e[1]]['train_removed']]
|
||||
else:
|
||||
if fixed_n2v:
|
||||
self.train_edges = self.val_edges = self._n2v_prune(self.edges)
|
||||
else:
|
||||
self.train_edges = self.val_edges = self.edges
|
||||
|
||||
print(len([n for n in G.nodes() if not G.node[n]['test'] and not G.node[n]['val']]), 'train nodes')
|
||||
print(len([n for n in G.nodes() if G.node[n]['test'] or G.node[n]['val']]), 'test nodes')
|
||||
self.val_set_size = len(self.val_edges)
|
||||
|
||||
def _n2v_prune(self, edges):
|
||||
is_val = lambda n : self.G.node[n]["val"] or self.G.node[n]["test"]
|
||||
return [e for e in edges if not is_val(e[1])]
|
||||
|
||||
def _remove_isolated(self, edge_list):
|
||||
new_edge_list = []
|
||||
missing = 0
|
||||
for n1, n2 in edge_list:
|
||||
if not n1 in self.G.node or not n2 in self.G.node:
|
||||
missing += 1
|
||||
continue
|
||||
if (self.deg[self.id2idx[n1]] == 0 or self.deg[self.id2idx[n2]] == 0) \
|
||||
and (not self.G.node[n1]['test'] or self.G.node[n1]['val']) \
|
||||
and (not self.G.node[n2]['test'] or self.G.node[n2]['val']):
|
||||
continue
|
||||
else:
|
||||
new_edge_list.append((n1,n2))
|
||||
print("Unexpected missing:", missing)
|
||||
return new_edge_list
|
||||
|
||||
def construct_adj(self):
|
||||
adj = len(self.id2idx)*np.ones((len(self.id2idx)+1, self.max_degree))
|
||||
deg = np.zeros((len(self.id2idx),))
|
||||
|
||||
for nodeid in self.G.nodes():
|
||||
if self.G.node[nodeid]['test'] or self.G.node[nodeid]['val']:
|
||||
continue
|
||||
neighbors = np.array([self.id2idx[neighbor]
|
||||
for neighbor in self.G.neighbors(nodeid)
|
||||
if (not self.G[nodeid][neighbor]['train_removed'])])
|
||||
deg[self.id2idx[nodeid]] = len(neighbors)
|
||||
if len(neighbors) == 0:
|
||||
continue
|
||||
if len(neighbors) > self.max_degree:
|
||||
neighbors = np.random.choice(neighbors, self.max_degree, replace=False)
|
||||
elif len(neighbors) < self.max_degree:
|
||||
neighbors = np.random.choice(neighbors, self.max_degree, replace=True)
|
||||
adj[self.id2idx[nodeid], :] = neighbors
|
||||
return adj, deg
|
||||
|
||||
def construct_test_adj(self):
|
||||
adj = len(self.id2idx)*np.ones((len(self.id2idx)+1, self.max_degree))
|
||||
for nodeid in self.G.nodes():
|
||||
neighbors = np.array([self.id2idx[neighbor]
|
||||
for neighbor in self.G.neighbors(nodeid)])
|
||||
if len(neighbors) == 0:
|
||||
continue
|
||||
if len(neighbors) > self.max_degree:
|
||||
neighbors = np.random.choice(neighbors, self.max_degree, replace=False)
|
||||
elif len(neighbors) < self.max_degree:
|
||||
neighbors = np.random.choice(neighbors, self.max_degree, replace=True)
|
||||
adj[self.id2idx[nodeid], :] = neighbors
|
||||
return adj
|
||||
|
||||
def end(self):
|
||||
return self.batch_num * self.batch_size >= len(self.train_edges)
|
||||
|
||||
def batch_feed_dict(self, batch_edges):
|
||||
batch1 = []
|
||||
batch2 = []
|
||||
for node1, node2 in batch_edges:
|
||||
batch1.append(self.id2idx[node1])
|
||||
batch2.append(self.id2idx[node2])
|
||||
|
||||
feed_dict = dict()
|
||||
feed_dict.update({self.placeholders['batch_size'] : len(batch_edges)})
|
||||
feed_dict.update({self.placeholders['batch1']: batch1})
|
||||
feed_dict.update({self.placeholders['batch2']: batch2})
|
||||
|
||||
return feed_dict
|
||||
|
||||
def next_minibatch_feed_dict(self):
|
||||
start_idx = self.batch_num * self.batch_size
|
||||
self.batch_num += 1
|
||||
end_idx = min(start_idx + self.batch_size, len(self.train_edges))
|
||||
batch_edges = self.train_edges[start_idx : end_idx]
|
||||
return self.batch_feed_dict(batch_edges)
|
||||
|
||||
def num_training_batches(self):
|
||||
return len(self.train_edges) // self.batch_size + 1
|
||||
|
||||
def val_feed_dict(self, size=None):
|
||||
edge_list = self.val_edges
|
||||
if size is None:
|
||||
return self.batch_feed_dict(edge_list)
|
||||
else:
|
||||
ind = np.random.permutation(len(edge_list))
|
||||
val_edges = [edge_list[i] for i in ind[:min(size, len(ind))]]
|
||||
return self.batch_feed_dict(val_edges)
|
||||
|
||||
def incremental_val_feed_dict(self, size, iter_num):
|
||||
edge_list = self.val_edges
|
||||
val_edges = edge_list[iter_num*size:min((iter_num+1)*size,
|
||||
len(edge_list))]
|
||||
return self.batch_feed_dict(val_edges), (iter_num+1)*size >= len(self.val_edges), val_edges
|
||||
|
||||
def incremental_embed_feed_dict(self, size, iter_num):
|
||||
node_list = self.nodes
|
||||
val_nodes = node_list[iter_num*size:min((iter_num+1)*size,
|
||||
len(node_list))]
|
||||
val_edges = [(n,n) for n in val_nodes]
|
||||
return self.batch_feed_dict(val_edges), (iter_num+1)*size >= len(node_list), val_edges
|
||||
|
||||
def label_val(self):
|
||||
train_edges = []
|
||||
val_edges = []
|
||||
for n1, n2 in self.G.edges():
|
||||
if (self.G.node[n1]['val'] or self.G.node[n1]['test']
|
||||
or self.G.node[n2]['val'] or self.G.node[n2]['test']):
|
||||
val_edges.append((n1,n2))
|
||||
else:
|
||||
train_edges.append((n1,n2))
|
||||
return train_edges, val_edges
|
||||
|
||||
def shuffle(self):
|
||||
""" Re-shuffle the training set.
|
||||
Also reset the batch number.
|
||||
"""
|
||||
self.train_edges = np.random.permutation(self.train_edges)
|
||||
self.nodes = np.random.permutation(self.nodes)
|
||||
self.batch_num = 0
|
||||
|
||||
class NodeMinibatchIterator(object):
|
||||
|
||||
"""
|
||||
This minibatch iterator iterates over nodes for supervised learning.
|
||||
|
||||
G -- networkx graph
|
||||
id2idx -- dict mapping node ids to integer values indexing feature tensor
|
||||
placeholders -- standard tensorflow placeholders object for feeding
|
||||
label_map -- map from node ids to class values (integer or list)
|
||||
num_classes -- number of output classes
|
||||
batch_size -- size of the minibatches
|
||||
max_degree -- maximum size of the downsampled adjacency lists
|
||||
"""
|
||||
def __init__(self, G, id2idx,
|
||||
placeholders, label_map, num_classes,
|
||||
batch_size=100, max_degree=25,
|
||||
**kwargs):
|
||||
|
||||
self.G = G
|
||||
self.nodes = G.nodes()
|
||||
self.id2idx = id2idx
|
||||
self.placeholders = placeholders
|
||||
self.batch_size = batch_size
|
||||
self.max_degree = max_degree
|
||||
self.batch_num = 0
|
||||
self.label_map = label_map
|
||||
self.num_classes = num_classes
|
||||
|
||||
self.adj, self.deg = self.construct_adj()
|
||||
self.test_adj = self.construct_test_adj()
|
||||
|
||||
self.val_nodes = [n for n in self.G.nodes() if self.G.node[n]['val']]
|
||||
self.test_nodes = [n for n in self.G.nodes() if self.G.node[n]['test']]
|
||||
|
||||
self.no_train_nodes_set = set(self.val_nodes + self.test_nodes)
|
||||
self.train_nodes = set(G.nodes()).difference(self.no_train_nodes_set)
|
||||
# don't train on nodes that only have edges to test set
|
||||
self.train_nodes = [n for n in self.train_nodes if self.deg[id2idx[n]] > 0]
|
||||
|
||||
def _make_label_vec(self, node):
|
||||
label = self.label_map[node]
|
||||
if isinstance(label, list):
|
||||
label_vec = np.array(label)
|
||||
else:
|
||||
label_vec = np.zeros((self.num_classes))
|
||||
class_ind = self.label_map[node]
|
||||
label_vec[class_ind] = 1
|
||||
return label_vec
|
||||
|
||||
def construct_adj(self):
|
||||
adj = len(self.id2idx)*np.ones((len(self.id2idx)+1, self.max_degree))
|
||||
deg = np.zeros((len(self.id2idx),))
|
||||
|
||||
for nodeid in self.G.nodes():
|
||||
if self.G.node[nodeid]['test'] or self.G.node[nodeid]['val']:
|
||||
continue
|
||||
neighbors = np.array([self.id2idx[neighbor]
|
||||
for neighbor in self.G.neighbors(nodeid)
|
||||
if (not self.G[nodeid][neighbor]['train_removed'])])
|
||||
deg[self.id2idx[nodeid]] = len(neighbors)
|
||||
if len(neighbors) == 0:
|
||||
continue
|
||||
if len(neighbors) > self.max_degree:
|
||||
neighbors = np.random.choice(neighbors, self.max_degree, replace=False)
|
||||
elif len(neighbors) < self.max_degree:
|
||||
neighbors = np.random.choice(neighbors, self.max_degree, replace=True)
|
||||
adj[self.id2idx[nodeid], :] = neighbors
|
||||
return adj, deg
|
||||
|
||||
def construct_test_adj(self):
|
||||
adj = len(self.id2idx)*np.ones((len(self.id2idx)+1, self.max_degree))
|
||||
for nodeid in self.G.nodes():
|
||||
neighbors = np.array([self.id2idx[neighbor]
|
||||
for neighbor in self.G.neighbors(nodeid)])
|
||||
if len(neighbors) == 0:
|
||||
continue
|
||||
if len(neighbors) > self.max_degree:
|
||||
neighbors = np.random.choice(neighbors, self.max_degree, replace=False)
|
||||
elif len(neighbors) < self.max_degree:
|
||||
neighbors = np.random.choice(neighbors, self.max_degree, replace=True)
|
||||
adj[self.id2idx[nodeid], :] = neighbors
|
||||
return adj
|
||||
|
||||
def end(self):
|
||||
return self.batch_num * self.batch_size >= len(self.train_nodes)
|
||||
|
||||
def batch_feed_dict(self, batch_nodes, val=False):
|
||||
batch1id = batch_nodes
|
||||
batch1 = [self.id2idx[n] for n in batch1id]
|
||||
|
||||
labels = np.vstack([self._make_label_vec(node) for node in batch1id])
|
||||
feed_dict = dict()
|
||||
feed_dict.update({self.placeholders['batch_size'] : len(batch1)})
|
||||
feed_dict.update({self.placeholders['batch']: batch1})
|
||||
feed_dict.update({self.placeholders['labels']: labels})
|
||||
|
||||
return feed_dict, labels
|
||||
|
||||
def node_val_feed_dict(self, size=None, test=False):
|
||||
if test:
|
||||
val_nodes = self.test_nodes
|
||||
else:
|
||||
val_nodes = self.val_nodes
|
||||
if not size is None:
|
||||
val_nodes = np.random.choice(val_nodes, size, replace=True)
|
||||
# add a dummy neighbor
|
||||
ret_val = self.batch_feed_dict(val_nodes)
|
||||
return ret_val[0], ret_val[1]
|
||||
|
||||
def incremental_node_val_feed_dict(self, size, iter_num, test=False):
|
||||
if test:
|
||||
val_nodes = self.test_nodes
|
||||
else:
|
||||
val_nodes = self.val_nodes
|
||||
val_node_subset = val_nodes[iter_num*size:min((iter_num+1)*size,
|
||||
len(val_nodes))]
|
||||
|
||||
# add a dummy neighbor
|
||||
ret_val = self.batch_feed_dict(val_node_subset)
|
||||
return ret_val[0], ret_val[1], (iter_num+1)*size >= len(val_nodes), val_node_subset
|
||||
|
||||
def num_training_batches(self):
|
||||
return len(self.train_nodes) // self.batch_size + 1
|
||||
|
||||
def next_minibatch_feed_dict(self):
|
||||
start_idx = self.batch_num * self.batch_size
|
||||
self.batch_num += 1
|
||||
end_idx = min(start_idx + self.batch_size, len(self.train_nodes))
|
||||
batch_nodes = self.train_nodes[start_idx : end_idx]
|
||||
return self.batch_feed_dict(batch_nodes)
|
||||
|
||||
def incremental_embed_feed_dict(self, size, iter_num):
|
||||
node_list = self.nodes
|
||||
val_nodes = node_list[iter_num*size:min((iter_num+1)*size,
|
||||
len(node_list))]
|
||||
return self.batch_feed_dict(val_nodes), (iter_num+1)*size >= len(node_list), val_nodes
|
||||
|
||||
def shuffle(self):
|
||||
""" Re-shuffle the training set.
|
||||
Also reset the batch number.
|
||||
"""
|
||||
self.train_nodes = np.random.permutation(self.train_nodes)
|
||||
self.batch_num = 0
|
504 src/libnrl/graphsage/models.py Normal file
@@ -0,0 +1,504 @@
|
||||
from collections import namedtuple
|
||||
|
||||
import tensorflow as tf
|
||||
import math
|
||||
|
||||
import libnrl.graphsage.layers as layers
|
||||
import libnrl.graphsage.metrics as metrics
|
||||
|
||||
from libnrl.graphsage.prediction import BipartiteEdgePredLayer
|
||||
from libnrl.graphsage.aggregators import MeanAggregator, MaxPoolingAggregator, MeanPoolingAggregator, SeqAggregator, GCNAggregator
|
||||
from libnrl.graphsage.__init__ import * #import default parameters
|
||||
|
||||
'''
|
||||
flags = tf.app.flags
|
||||
FLAGS = FLAGS
|
||||
'''
|
||||
|
||||
# DISCLAIMER:
|
||||
# Boilerplate parts of this code file were originally forked from
|
||||
# https://github.com/tkipf/gcn
|
||||
# which itself was very inspired by the keras package
|
||||
|
||||
class Model(object):
|
||||
def __init__(self, **kwargs):
|
||||
allowed_kwargs = {'name', 'logging', 'model_size'}
|
||||
for kwarg in kwargs.keys():
|
||||
assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg
|
||||
name = kwargs.get('name')
|
||||
if not name:
|
||||
name = self.__class__.__name__.lower()
|
||||
self.name = name
|
||||
|
||||
logging = kwargs.get('logging', False)
|
||||
self.logging = logging
|
||||
|
||||
self.vars = {}
|
||||
self.placeholders = {}
|
||||
|
||||
self.layers = []
|
||||
self.activations = []
|
||||
|
||||
self.inputs = None
|
||||
self.outputs = None
|
||||
|
||||
self.loss = 0
|
||||
self.accuracy = 0
|
||||
self.optimizer = None
|
||||
self.opt_op = None
|
||||
|
||||
def _build(self):
|
||||
raise NotImplementedError
|
||||
|
||||
def build(self):
|
||||
""" Wrapper for _build() """
|
||||
with tf.variable_scope(self.name):
|
||||
self._build()
|
||||
|
||||
# Build sequential layer model
|
||||
self.activations.append(self.inputs)
|
||||
for layer in self.layers:
|
||||
hidden = layer(self.activations[-1])
|
||||
self.activations.append(hidden)
|
||||
self.outputs = self.activations[-1]
|
||||
|
||||
# Store model variables for easy access
|
||||
variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
|
||||
self.vars = {var.name: var for var in variables}
|
||||
|
||||
# Build metrics
|
||||
self._loss()
|
||||
self._accuracy()
|
||||
|
||||
self.opt_op = self.optimizer.minimize(self.loss)
|
||||
|
||||
def predict(self):
|
||||
pass
|
||||
|
||||
def _loss(self):
|
||||
raise NotImplementedError
|
||||
|
||||
def _accuracy(self):
|
||||
raise NotImplementedError
|
||||
|
||||
def save(self, sess=None):
|
||||
if not sess:
|
||||
raise AttributeError("TensorFlow session not provided.")
|
||||
saver = tf.train.Saver(self.vars)
|
||||
save_path = saver.save(sess, "tmp/%s.ckpt" % self.name)
|
||||
print("Model saved in file: %s" % save_path)
|
||||
|
||||
def load(self, sess=None):
|
||||
if not sess:
|
||||
raise AttributeError("TensorFlow session not provided.")
|
||||
saver = tf.train.Saver(self.vars)
|
||||
save_path = "tmp/%s.ckpt" % self.name
|
||||
saver.restore(sess, save_path)
|
||||
print("Model restored from file: %s" % save_path)
|
||||
|
||||
|
||||
class MLP(Model):
|
||||
""" A standard multi-layer perceptron """
|
||||
def __init__(self, placeholders, dims, categorical=True, **kwargs):
|
||||
super(MLP, self).__init__(**kwargs)
|
||||
|
||||
self.dims = dims
|
||||
self.input_dim = dims[0]
|
||||
self.output_dim = dims[-1]
|
||||
self.placeholders = placeholders
|
||||
self.categorical = categorical
|
||||
|
||||
self.inputs = placeholders['features']
|
||||
self.labels = placeholders['labels']
|
||||
|
||||
self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
|
||||
|
||||
self.build()
|
||||
|
||||
def _loss(self):
|
||||
# Weight decay loss
|
||||
for var in self.layers[0].vars.values():
|
||||
self.loss += weight_decay * tf.nn.l2_loss(var)
|
||||
|
||||
# Cross entropy error
|
||||
if self.categorical:
|
||||
self.loss += metrics.masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],
|
||||
self.placeholders['labels_mask'])
|
||||
# L2
|
||||
else:
|
||||
diff = self.labels - self.outputs
|
||||
self.loss += tf.reduce_sum(tf.sqrt(tf.reduce_sum(diff * diff, axis=1)))
|
||||
|
||||
def _accuracy(self):
|
||||
if self.categorical:
|
||||
self.accuracy = metrics.masked_accuracy(self.outputs, self.placeholders['labels'],
|
||||
self.placeholders['labels_mask'])
|
||||
|
||||
def _build(self):
|
||||
self.layers.append(layers.Dense(input_dim=self.input_dim,
|
||||
output_dim=self.dims[1],
|
||||
act=tf.nn.relu,
|
||||
dropout=self.placeholders['dropout'],
|
||||
sparse_inputs=False,
|
||||
logging=self.logging))
|
||||
|
||||
self.layers.append(layers.Dense(input_dim=self.dims[1],
|
||||
output_dim=self.output_dim,
|
||||
act=lambda x: x,
|
||||
dropout=self.placeholders['dropout'],
|
||||
logging=self.logging))
|
||||
|
||||
def predict(self):
|
||||
return tf.nn.softmax(self.outputs)
|
||||
|
||||
class GeneralizedModel(Model):
|
||||
"""
|
||||
Base class for models that aren't constructed from traditional, sequential layers.
|
||||
Subclasses must set self.outputs in _build method
|
||||
|
||||
(Removes the layers idiom from build method of the Model class)
|
||||
"""
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
super(GeneralizedModel, self).__init__(**kwargs)
|
||||
|
||||
|
||||
def build(self):
|
||||
""" Wrapper for _build() """
|
||||
with tf.variable_scope(self.name):
|
||||
self._build()
|
||||
|
||||
# Store model variables for easy access
|
||||
variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
|
||||
self.vars = {var.name: var for var in variables}
|
||||
|
||||
# Build metrics
|
||||
self._loss()
|
||||
self._accuracy()
|
||||
|
||||
self.opt_op = self.optimizer.minimize(self.loss)
|
||||
|
||||
# SAGEInfo is a namedtuple that specifies the parameters
|
||||
# of the recursive GraphSAGE layers
|
||||
SAGEInfo = namedtuple("SAGEInfo",
|
||||
['layer_name', # name of the layer (to get feature embedding etc.)
|
||||
'neigh_sampler', # callable neigh_sampler constructor
|
||||
'num_samples',
|
||||
'output_dim' # the output (i.e., hidden) dimension
|
||||
])
|
||||
|
||||
class SampleAndAggregate(GeneralizedModel):
|
||||
"""
|
||||
Base implementation of unsupervised GraphSAGE
|
||||
"""
|
||||
|
||||
def __init__(self, placeholders, features, adj, degrees,
|
||||
layer_infos, concat=True, aggregator_type="mean",
|
||||
model_size="small", identity_dim=0,
|
||||
**kwargs):
|
||||
'''
|
||||
Args:
|
||||
- placeholders: Standard TensorFlow placeholder object.
|
||||
- features: Numpy array with node features.
|
||||
NOTE: Pass a None object to train in featureless mode (identity features for nodes)!
|
||||
- adj: Numpy array with adjacency lists (padded with random re-samples)
|
||||
- degrees: Numpy array with node degrees.
|
||||
- layer_infos: List of SAGEInfo namedtuples that describe the parameters of all
|
||||
the recursive layers. See SAGEInfo definition above.
|
||||
- concat: whether to concatenate during recursive iterations
|
||||
- aggregator_type: how to aggregate neighbor information
|
||||
- model_size: one of "small" and "big"
|
||||
- identity_dim: Set to positive int to use identity features (slow and cannot generalize, but better accuracy)
|
||||
'''
|
||||
super(SampleAndAggregate, self).__init__(**kwargs)
|
||||
if aggregator_type == "mean":
|
||||
self.aggregator_cls = MeanAggregator
|
||||
elif aggregator_type == "seq":
|
||||
self.aggregator_cls = SeqAggregator
|
||||
elif aggregator_type == "maxpool":
|
||||
self.aggregator_cls = MaxPoolingAggregator
|
||||
elif aggregator_type == "meanpool":
|
||||
self.aggregator_cls = MeanPoolingAggregator
|
||||
elif aggregator_type == "gcn":
|
||||
self.aggregator_cls = GCNAggregator
|
||||
else:
|
||||
raise Exception("Unknown aggregator: ", self.aggregator_cls)
|
||||
|
||||
# get info from placeholders...
|
||||
self.inputs1 = placeholders["batch1"]
|
||||
self.inputs2 = placeholders["batch2"]
|
||||
self.model_size = model_size
|
||||
self.adj_info = adj
|
||||
if identity_dim > 0:
|
||||
self.embeds = tf.get_variable("node_embeddings", [adj.get_shape().as_list()[0], identity_dim])
|
||||
else:
|
||||
self.embeds = None
|
||||
if features is None:
|
||||
if identity_dim == 0:
|
||||
raise Exception("Must have a positive value for identity feature dimension if no input features given.")
|
||||
self.features = self.embeds
|
||||
else:
|
||||
self.features = tf.Variable(tf.constant(features, dtype=tf.float32), trainable=False)
|
||||
if not self.embeds is None:
|
||||
self.features = tf.concat([self.embeds, self.features], axis=1)
|
||||
self.degrees = degrees
|
||||
self.concat = concat
|
||||
|
||||
self.dims = [(0 if features is None else features.shape[1]) + identity_dim]
|
||||
self.dims.extend([layer_infos[i].output_dim for i in range(len(layer_infos))])
|
||||
self.batch_size = placeholders["batch_size"]
|
||||
self.placeholders = placeholders
|
||||
self.layer_infos = layer_infos
|
||||
|
||||
self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
|
||||
|
||||
self.build()
|
||||
|
||||
def sample(self, inputs, layer_infos, batch_size=None):
|
||||
""" Sample neighbors to be the supportive fields for multi-layer convolutions.
|
||||
|
||||
Args:
|
||||
inputs: batch inputs
|
||||
batch_size: the number of inputs (different for batch inputs and negative samples).
|
||||
"""
|
||||
|
||||
if batch_size is None:
|
||||
batch_size = self.batch_size
|
||||
samples = [inputs]
|
||||
# size of convolution support at each layer per node
|
||||
support_size = 1
|
||||
support_sizes = [support_size]
|
||||
for k in range(len(layer_infos)):
|
||||
t = len(layer_infos) - k - 1
|
||||
support_size *= layer_infos[t].num_samples
|
||||
sampler = layer_infos[t].neigh_sampler
|
||||
node = sampler((samples[k], layer_infos[t].num_samples))
|
||||
samples.append(tf.reshape(node, [support_size * batch_size,]))
|
||||
support_sizes.append(support_size)
|
||||
return samples, support_sizes
|
||||
|
||||
|
||||
def aggregate(self, samples, input_features, dims, num_samples, support_sizes, batch_size=None,
|
||||
aggregators=None, name=None, concat=False, model_size="small"):
|
||||
""" At each layer, aggregate hidden representations of neighbors to compute the hidden representations
|
||||
at next layer.
|
||||
Args:
|
||||
samples: a list of samples of variable hops away for convolving at each layer of the
|
||||
network. Length is the number of layers + 1. Each is a vector of node indices.
|
||||
input_features: the input features for each sample of various hops away.
|
||||
dims: a list of dimensions of the hidden representations from the input layer to the
|
||||
final layer. Length is the number of layers + 1.
|
||||
num_samples: list of number of samples for each layer.
|
||||
support_sizes: the number of nodes to gather information from for each layer.
|
||||
batch_size: the number of inputs (different for batch inputs and negative samples).
|
||||
Returns:
|
||||
The hidden representation at the final layer for all nodes in batch
|
||||
"""
|
||||
|
||||
if batch_size is None:
|
||||
batch_size = self.batch_size
|
||||
|
||||
# length: number of layers + 1
|
||||
hidden = [tf.nn.embedding_lookup(input_features, node_samples) for node_samples in samples]
|
||||
new_agg = aggregators is None
|
||||
if new_agg:
|
||||
aggregators = []
|
||||
for layer in range(len(num_samples)):
|
||||
if new_agg:
|
||||
dim_mult = 2 if concat and (layer != 0) else 1
|
||||
# aggregator at current layer
|
||||
if layer == len(num_samples) - 1:
|
||||
aggregator = self.aggregator_cls(dim_mult*dims[layer], dims[layer+1], act=lambda x : x,
|
||||
dropout=self.placeholders['dropout'],
|
||||
name=name, concat=concat, model_size=model_size)
|
||||
else:
|
||||
aggregator = self.aggregator_cls(dim_mult*dims[layer], dims[layer+1],
|
||||
dropout=self.placeholders['dropout'],
|
||||
name=name, concat=concat, model_size=model_size)
|
||||
aggregators.append(aggregator)
|
||||
else:
|
||||
aggregator = aggregators[layer]
|
||||
# hidden representation at current layer for all support nodes that are various hops away
|
||||
next_hidden = []
|
||||
# as layer increases, the number of support nodes needed decreases
|
||||
for hop in range(len(num_samples) - layer):
|
||||
dim_mult = 2 if concat and (layer != 0) else 1
|
||||
neigh_dims = [batch_size * support_sizes[hop],
|
||||
num_samples[len(num_samples) - hop - 1],
|
||||
dim_mult*dims[layer]]
|
||||
h = aggregator((hidden[hop],
|
||||
tf.reshape(hidden[hop + 1], neigh_dims)))
|
||||
next_hidden.append(h)
|
||||
hidden = next_hidden
|
||||
return hidden[0], aggregators
|
||||
|
||||
def _build(self):
|
||||
labels = tf.reshape(
|
||||
tf.cast(self.placeholders['batch2'], dtype=tf.int64),
|
||||
[self.batch_size, 1])
|
||||
self.neg_samples, _, _ = (tf.nn.fixed_unigram_candidate_sampler(
|
||||
true_classes=labels,
|
||||
num_true=1,
|
||||
num_sampled=neg_sample_size,
|
||||
unique=False,
|
||||
range_max=len(self.degrees),
|
||||
distortion=0.75,
|
||||
unigrams=self.degrees.tolist()))
|
||||
|
||||
|
||||
# perform "convolution"
|
||||
samples1, support_sizes1 = self.sample(self.inputs1, self.layer_infos)
|
||||
samples2, support_sizes2 = self.sample(self.inputs2, self.layer_infos)
|
||||
num_samples = [layer_info.num_samples for layer_info in self.layer_infos]
|
||||
self.outputs1, self.aggregators = self.aggregate(samples1, [self.features], self.dims, num_samples,
|
||||
support_sizes1, concat=self.concat, model_size=self.model_size)
|
||||
self.outputs2, _ = self.aggregate(samples2, [self.features], self.dims, num_samples,
|
||||
support_sizes2, aggregators=self.aggregators, concat=self.concat,
|
||||
model_size=self.model_size)
|
||||
|
||||
neg_samples, neg_support_sizes = self.sample(self.neg_samples, self.layer_infos,
|
||||
neg_sample_size)
|
||||
self.neg_outputs, _ = self.aggregate(neg_samples, [self.features], self.dims, num_samples,
|
||||
neg_support_sizes, batch_size=neg_sample_size, aggregators=self.aggregators,
|
||||
concat=self.concat, model_size=self.model_size)
|
||||
|
||||
dim_mult = 2 if self.concat else 1
|
||||
self.link_pred_layer = BipartiteEdgePredLayer(dim_mult*self.dims[-1],
|
||||
dim_mult*self.dims[-1], self.placeholders, act=tf.nn.sigmoid,
|
||||
bilinear_weights=False,
|
||||
name='edge_predict')
|
||||
|
||||
self.outputs1 = tf.nn.l2_normalize(self.outputs1, 1)
|
||||
self.outputs2 = tf.nn.l2_normalize(self.outputs2, 1)
|
||||
self.neg_outputs = tf.nn.l2_normalize(self.neg_outputs, 1)
|
||||
|
||||
def build(self):
|
||||
self._build()
|
||||
|
||||
# TF graph management
|
||||
self._loss()
|
||||
self._accuracy()
|
||||
self.loss = self.loss / tf.cast(self.batch_size, tf.float32)
|
||||
grads_and_vars = self.optimizer.compute_gradients(self.loss)
|
||||
clipped_grads_and_vars = [(tf.clip_by_value(grad, -5.0, 5.0) if grad is not None else None, var)
|
||||
for grad, var in grads_and_vars]
|
||||
self.grad, _ = clipped_grads_and_vars[0]
|
||||
self.opt_op = self.optimizer.apply_gradients(clipped_grads_and_vars)
|
||||
|
||||
def _loss(self):
|
||||
for aggregator in self.aggregators:
|
||||
for var in aggregator.vars.values():
|
||||
self.loss += weight_decay * tf.nn.l2_loss(var)
|
||||
|
||||
self.loss += self.link_pred_layer.loss(self.outputs1, self.outputs2, self.neg_outputs)
|
||||
tf.summary.scalar('loss', self.loss)
|
||||
|
||||
def _accuracy(self):
|
||||
# shape: [batch_size]
|
||||
aff = self.link_pred_layer.affinity(self.outputs1, self.outputs2)
|
||||
# shape : [batch_size x num_neg_samples]
|
||||
self.neg_aff = self.link_pred_layer.neg_cost(self.outputs1, self.neg_outputs)
|
||||
self.neg_aff = tf.reshape(self.neg_aff, [self.batch_size, neg_sample_size])
|
||||
_aff = tf.expand_dims(aff, axis=1)
|
||||
self.aff_all = tf.concat(axis=1, values=[self.neg_aff, _aff])
|
||||
size = tf.shape(self.aff_all)[1]
|
||||
_, indices_of_ranks = tf.nn.top_k(self.aff_all, k=size)
|
||||
_, self.ranks = tf.nn.top_k(-indices_of_ranks, k=size)
|
||||
self.mrr = tf.reduce_mean(tf.div(1.0, tf.cast(self.ranks[:, -1] + 1, tf.float32)))
|
||||
tf.summary.scalar('mrr', self.mrr)
|
||||
|
||||
|
||||
class Node2VecModel(GeneralizedModel):
|
||||
def __init__(self, placeholders, dict_size, degrees, name=None,
|
||||
nodevec_dim=50, lr=0.001, **kwargs):
|
||||
""" Simple version of Node2Vec/DeepWalk algorithm.
|
||||
|
||||
Args:
|
||||
dict_size: the total number of nodes.
|
||||
degrees: numpy array of node degrees, ordered as in the data's id_map
|
||||
nodevec_dim: dimension of the vector representation of node.
|
||||
lr: learning rate of optimizer.
|
||||
"""
|
||||
|
||||
super(Node2VecModel, self).__init__(**kwargs)
|
||||
|
||||
self.placeholders = placeholders
|
||||
self.degrees = degrees
|
||||
self.inputs1 = placeholders["batch1"]
|
||||
self.inputs2 = placeholders["batch2"]
|
||||
|
||||
self.batch_size = placeholders['batch_size']
|
||||
self.hidden_dim = nodevec_dim
|
||||
|
||||
# following the tensorflow word2vec tutorial
|
||||
self.target_embeds = tf.Variable(
|
||||
tf.random_uniform([dict_size, nodevec_dim], -1, 1),
|
||||
name="target_embeds")
|
||||
self.context_embeds = tf.Variable(
|
||||
tf.truncated_normal([dict_size, nodevec_dim],
|
||||
stddev=1.0 / math.sqrt(nodevec_dim)),
|
||||
name="context_embeds")
|
||||
self.context_bias = tf.Variable(
|
||||
tf.zeros([dict_size]),
|
||||
name="context_bias")
|
||||
|
||||
self.optimizer = tf.train.GradientDescentOptimizer(learning_rate=lr)
|
||||
|
||||
self.build()
|
||||
|
||||
def _build(self):
|
||||
labels = tf.reshape(
|
||||
tf.cast(self.placeholders['batch2'], dtype=tf.int64),
|
||||
[self.batch_size, 1])
|
||||
self.neg_samples, _, _ = (tf.nn.fixed_unigram_candidate_sampler(
|
||||
true_classes=labels,
|
||||
num_true=1,
|
||||
num_sampled=neg_sample_size,
|
||||
unique=True,
|
||||
range_max=len(self.degrees),
|
||||
distortion=0.75,
|
||||
unigrams=self.degrees.tolist()))
|
||||
|
||||
self.outputs1 = tf.nn.embedding_lookup(self.target_embeds, self.inputs1)
|
||||
self.outputs2 = tf.nn.embedding_lookup(self.context_embeds, self.inputs2)
|
||||
self.outputs2_bias = tf.nn.embedding_lookup(self.context_bias, self.inputs2)
|
||||
self.neg_outputs = tf.nn.embedding_lookup(self.context_embeds, self.neg_samples)
|
||||
self.neg_outputs_bias = tf.nn.embedding_lookup(self.context_bias, self.neg_samples)
|
||||
|
||||
self.link_pred_layer = BipartiteEdgePredLayer(self.hidden_dim, self.hidden_dim,
|
||||
self.placeholders, bilinear_weights=False)
|
||||
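# The fixed_unigram_candidate_sampler call in _build above draws negatives with
# probability proportional to degree**0.75 (distortion=0.75), as in word2vec.
# A hedged numpy sketch of that sampling distribution (the degrees are made up):
import numpy as np
degrees = np.array([10.0, 5.0, 1.0, 1.0])
p = degrees ** 0.75
p /= p.sum()                                  # per-node negative-sampling probability
neg_ids = np.random.choice(len(degrees), size=8, replace=True, p=p)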
|
||||
def build(self):
|
||||
self._build()
|
||||
# TF graph management
|
||||
self._loss()
|
||||
self._minimize()
|
||||
self._accuracy()
|
||||
|
||||
def _minimize(self):
|
||||
self.opt_op = self.optimizer.minimize(self.loss)
|
||||
|
||||
def _loss(self):
|
||||
aff = tf.reduce_sum(tf.multiply(self.outputs1, self.outputs2), 1) + self.outputs2_bias
|
||||
neg_aff = tf.matmul(self.outputs1, tf.transpose(self.neg_outputs)) + self.neg_outputs_bias
|
||||
true_xent = tf.nn.sigmoid_cross_entropy_with_logits(
|
||||
labels=tf.ones_like(aff), logits=aff)
|
||||
negative_xent = tf.nn.sigmoid_cross_entropy_with_logits(
|
||||
labels=tf.zeros_like(neg_aff), logits=neg_aff)
|
||||
loss = tf.reduce_sum(true_xent) + tf.reduce_sum(negative_xent)
|
||||
self.loss = loss / tf.cast(self.batch_size, tf.float32)
|
||||
tf.summary.scalar('loss', self.loss)
|
||||
|
||||
def _accuracy(self):
|
||||
# shape: [batch_size]
|
||||
aff = self.link_pred_layer.affinity(self.outputs1, self.outputs2)
|
||||
# shape : [batch_size x num_neg_samples]
|
||||
self.neg_aff = self.link_pred_layer.neg_cost(self.outputs1, self.neg_outputs)
|
||||
self.neg_aff = tf.reshape(self.neg_aff, [self.batch_size, neg_sample_size])
|
||||
_aff = tf.expand_dims(aff, axis=1)
|
||||
self.aff_all = tf.concat(axis=1, values=[self.neg_aff, _aff])
|
||||
size = tf.shape(self.aff_all)[1]
|
||||
_, indices_of_ranks = tf.nn.top_k(self.aff_all, k=size)
|
||||
_, self.ranks = tf.nn.top_k(-indices_of_ranks, k=size)
|
||||
self.mrr = tf.reduce_mean(tf.div(1.0, tf.cast(self.ranks[:, -1] + 1, tf.float32)))
|
||||
tf.summary.scalar('mrr', self.mrr)
|
29
src/libnrl/graphsage/neigh_samplers.py
Normal file
@ -0,0 +1,29 @@
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
from libnrl.graphsage.layers import Layer
|
||||
|
||||
import tensorflow as tf
|
||||
flags = tf.app.flags
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
|
||||
"""
|
||||
Classes that are used to sample node neighborhoods
|
||||
"""
|
||||
|
||||
class UniformNeighborSampler(Layer):
|
||||
"""
|
||||
Uniformly samples neighbors.
|
||||
Assumes that adj lists are padded with random re-sampling
|
||||
"""
|
||||
def __init__(self, adj_info, **kwargs):
|
||||
super(UniformNeighborSampler, self).__init__(**kwargs)
|
||||
self.adj_info = adj_info
|
||||
|
||||
def _call(self, inputs):
|
||||
ids, num_samples = inputs
|
||||
adj_lists = tf.nn.embedding_lookup(self.adj_info, ids)
|
||||
adj_lists = tf.transpose(tf.random_shuffle(tf.transpose(adj_lists)))
|
||||
adj_lists = tf.slice(adj_lists, [0,0], [-1, num_samples])
|
||||
return adj_lists
|
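# What UniformNeighborSampler does, in numpy terms: look up the padded
# neighbour lists of the batch nodes, shuffle the neighbour positions, and keep
# the first num_samples entries. A hedged sketch (here the shuffle is per row,
# whereas the TF code above applies one shared shuffle via the transposes):
import numpy as np
adj_info = np.array([[1, 2, 3, 1],            # padded neighbour list of node 0
                     [0, 0, 2, 3]])           # padded neighbour list of node 1
ids, num_samples = np.array([0, 1]), 2
rows = adj_info[ids]                          # the embedding_lookup equivalent
rows = np.apply_along_axis(np.random.permutation, 1, rows)
sampled = rows[:, :num_samples]               # [len(ids), num_samples] neighbour sample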
128
src/libnrl/graphsage/prediction.py
Normal file
@ -0,0 +1,128 @@
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
from libnrl.graphsage.inits import zeros
|
||||
from libnrl.graphsage.layers import Layer
|
||||
import tensorflow as tf
|
||||
|
||||
flags = tf.app.flags
|
||||
FLAGS = flags.FLAGS
|
||||
|
||||
|
||||
class BipartiteEdgePredLayer(Layer):
|
||||
def __init__(self, input_dim1, input_dim2, placeholders, dropout=False, act=tf.nn.sigmoid,
|
||||
loss_fn='xent', neg_sample_weights=1.0,
|
||||
bias=False, bilinear_weights=False, **kwargs):
|
||||
"""
|
||||
Basic class that applies skip-gram-like loss
|
||||
(i.e., the dot product between a node and its target, and between the node and its negative samples)
|
||||
Args:
|
||||
bilinear_weights: use a bilinear weight for affinity calculation: u^T A v. If set to
|
||||
false, it is assumed that input dimensions are the same and the affinity will be
|
||||
based on dot product.
|
||||
"""
|
||||
super(BipartiteEdgePredLayer, self).__init__(**kwargs)
|
||||
self.input_dim1 = input_dim1
|
||||
self.input_dim2 = input_dim2
|
||||
self.act = act
|
||||
self.bias = bias
|
||||
self.eps = 1e-7
|
||||
|
||||
# Margin for hinge loss
|
||||
self.margin = 0.1
|
||||
self.neg_sample_weights = neg_sample_weights
|
||||
|
||||
self.bilinear_weights = bilinear_weights
|
||||
|
||||
if dropout:
|
||||
self.dropout = placeholders['dropout']
|
||||
else:
|
||||
self.dropout = 0.
|
||||
|
||||
# output a likelihood term
|
||||
self.output_dim = 1
|
||||
with tf.variable_scope(self.name + '_vars'):
|
||||
# bilinear form
|
||||
if bilinear_weights:
|
||||
#self.vars['weights'] = glorot([input_dim1, input_dim2],
|
||||
# name='pred_weights')
|
||||
self.vars['weights'] = tf.get_variable(
|
||||
'pred_weights',
|
||||
shape=(input_dim1, input_dim2),
|
||||
dtype=tf.float32,
|
||||
initializer=tf.contrib.layers.xavier_initializer())
|
||||
|
||||
if self.bias:
|
||||
self.vars['bias'] = zeros([self.output_dim], name='bias')
|
||||
|
||||
if loss_fn == 'xent':
|
||||
self.loss_fn = self._xent_loss
|
||||
elif loss_fn == 'skipgram':
|
||||
self.loss_fn = self._skipgram_loss
|
||||
elif loss_fn == 'hinge':
|
||||
self.loss_fn = self._hinge_loss
|
||||
|
||||
if self.logging:
|
||||
self._log_vars()
|
||||
|
||||
def affinity(self, inputs1, inputs2):
|
||||
""" Affinity score between batch of inputs1 and inputs2.
|
||||
Args:
|
||||
inputs1: tensor of shape [batch_size x feature_size].
|
||||
"""
|
||||
# shape: [batch_size, input_dim1]
|
||||
if self.bilinear_weights:
|
||||
prod = tf.matmul(inputs2, tf.transpose(self.vars['weights']))
|
||||
self.prod = prod
|
||||
result = tf.reduce_sum(inputs1 * prod, axis=1)
|
||||
else:
|
||||
result = tf.reduce_sum(inputs1 * inputs2, axis=1)
|
||||
return result
|
||||
|
||||
def neg_cost(self, inputs1, neg_samples, hard_neg_samples=None):
|
||||
""" For each input in batch, compute the sum of its affinity to negative samples.
|
||||
|
||||
Returns:
|
||||
Tensor of shape [batch_size x num_neg_samples]. For each node, a list of affinities to
|
||||
negative samples is computed.
|
||||
"""
|
||||
if self.bilinear_weights:
|
||||
inputs1 = tf.matmul(inputs1, self.vars['weights'])
|
||||
neg_aff = tf.matmul(inputs1, tf.transpose(neg_samples))
|
||||
return neg_aff
|
||||
|
||||
def loss(self, inputs1, inputs2, neg_samples):
|
||||
""" negative sampling loss.
|
||||
Args:
|
||||
neg_samples: tensor of shape [num_neg_samples x input_dim2]. Negative samples for all
|
||||
inputs in batch inputs1.
|
||||
"""
|
||||
return self.loss_fn(inputs1, inputs2, neg_samples)
|
||||
|
||||
def _xent_loss(self, inputs1, inputs2, neg_samples, hard_neg_samples=None):
|
||||
aff = self.affinity(inputs1, inputs2)
|
||||
neg_aff = self.neg_cost(inputs1, neg_samples, hard_neg_samples)
|
||||
true_xent = tf.nn.sigmoid_cross_entropy_with_logits(
|
||||
labels=tf.ones_like(aff), logits=aff)
|
||||
negative_xent = tf.nn.sigmoid_cross_entropy_with_logits(
|
||||
labels=tf.zeros_like(neg_aff), logits=neg_aff)
|
||||
loss = tf.reduce_sum(true_xent) + self.neg_sample_weights * tf.reduce_sum(negative_xent)
|
||||
return loss
|
||||
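# A hedged numpy shape/logic check for _xent_loss above (toy tensors): aff
# holds the logits of the true pairs, neg_aff those of the negatives, and both
# are pushed towards labels 1 and 0 respectively with sigmoid cross-entropy.
import numpy as np
rng = np.random.RandomState(0)
inputs1, inputs2 = rng.randn(2, 4), rng.randn(2, 4)   # [batch_size, dim]
neg_samples = rng.randn(3, 4)                         # [num_neg, dim]
aff = np.sum(inputs1 * inputs2, axis=1)               # [batch_size]
neg_aff = inputs1.dot(neg_samples.T)                  # [batch_size, num_neg]
true_xent = np.log1p(np.exp(-aff))                    # -log(sigmoid(aff))
negative_xent = np.log1p(np.exp(neg_aff))             # -log(1 - sigmoid(neg_aff))
loss = true_xent.sum() + 1.0 * negative_xent.sum()    # neg_sample_weights = 1.0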
|
||||
def _skipgram_loss(self, inputs1, inputs2, neg_samples, hard_neg_samples=None):
|
||||
aff = self.affinity(inputs1, inputs2)
|
||||
neg_aff = self.neg_cost(inputs1, neg_samples, hard_neg_samples)
|
||||
neg_cost = tf.log(tf.reduce_sum(tf.exp(neg_aff), axis=1))
|
||||
loss = tf.reduce_sum(aff - neg_cost)
|
||||
return loss
|
||||
|
||||
def _hinge_loss(self, inputs1, inputs2, neg_samples, hard_neg_samples=None):
|
||||
aff = self.affinity(inputs1, inputs2)
|
||||
neg_aff = self.neg_cost(inputs1, neg_samples, hard_neg_samples)
|
||||
diff = tf.nn.relu(tf.subtract(neg_aff, tf.expand_dims(aff, 1) - self.margin), name='diff')
|
||||
loss = tf.reduce_sum(diff)
|
||||
self.neg_shape = tf.shape(neg_aff)
|
||||
return loss
|
||||
|
||||
def weights_norm(self):
|
||||
return tf.norm(self.vars['weights'])  # note: tf.nn.l2_norm does not exist; tf.norm gives the l2 norm
|
293
src/libnrl/graphsage/unsupervised_train.py
Normal file
@ -0,0 +1,293 @@
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
import os
|
||||
import time
|
||||
import tensorflow as tf
|
||||
import numpy as np
|
||||
|
||||
from libnrl.graphsage.models import SampleAndAggregate, SAGEInfo, Node2VecModel
|
||||
from libnrl.graphsage.minibatch import EdgeMinibatchIterator
|
||||
from libnrl.graphsage.neigh_samplers import UniformNeighborSampler
|
||||
#from libnrl.graphsage.utils import load_data
|
||||
from libnrl.graphsage.__init__ import * #import default parameters
|
||||
|
||||
|
||||
# Define model evaluation function
|
||||
def evaluate(sess, model, minibatch_iter, size=None):
|
||||
t_test = time.time()
|
||||
feed_dict_val = minibatch_iter.val_feed_dict(size)
|
||||
outs_val = sess.run([model.loss, model.ranks, model.mrr],
|
||||
feed_dict=feed_dict_val)
|
||||
return outs_val[0], outs_val[1], outs_val[2], (time.time() - t_test)
|
||||
|
||||
'''
|
||||
def incremental_evaluate(sess, model, minibatch_iter, size):
|
||||
t_test = time.time()
|
||||
finished = False
|
||||
val_losses = []
|
||||
val_mrrs = []
|
||||
iter_num = 0
|
||||
while not finished:
|
||||
feed_dict_val, finished, _ = minibatch_iter.incremental_val_feed_dict(size, iter_num)
|
||||
iter_num += 1
|
||||
outs_val = sess.run([model.loss, model.ranks, model.mrr],
|
||||
feed_dict=feed_dict_val)
|
||||
val_losses.append(outs_val[0])
|
||||
val_mrrs.append(outs_val[2])
|
||||
return np.mean(val_losses), np.mean(val_mrrs), (time.time() - t_test)
|
||||
'''
|
||||
|
||||
def save_val_embeddings(sess, model, minibatch_iter, size, mod=""):
|
||||
val_embeddings = []
|
||||
finished = False
|
||||
seen = set([]) #a set storing node ids whose embeddings have already been collected
|
||||
nodes = []
|
||||
iter_num = 0
|
||||
name = "val"
|
||||
while not finished:
|
||||
feed_dict_val, finished, edges = minibatch_iter.incremental_embed_feed_dict(size, iter_num)
|
||||
iter_num += 1
|
||||
outs_val = sess.run([model.loss, model.mrr, model.outputs1],
|
||||
feed_dict=feed_dict_val)
|
||||
#ONLY SAVE FOR embeds1 because of planetoid
|
||||
for i, edge in enumerate(edges):
|
||||
if not edge[0] in seen:
|
||||
val_embeddings.append(outs_val[-1][i,:])
|
||||
nodes.append(edge[0]) #nodes: a list; has order
|
||||
seen.add(edge[0]) #seen: a set; NO order!!!
|
||||
#if not os.path.exists(out_dir):
|
||||
# os.makedirs(out_dir)
|
||||
|
||||
val_embeddings = np.vstack(val_embeddings)
|
||||
print(val_embeddings.shape)
|
||||
vectors = {}
|
||||
for i, embedding in enumerate(val_embeddings):
|
||||
vectors[nodes[i]] = embedding #warning: seen: a set; nodes: a list
|
||||
return vectors
|
||||
|
||||
''' #if we want to save embs, modify the following code
|
||||
np.save(out_dir + name + mod + ".npy", val_embeddings)
|
||||
with open(out_dir + name + mod + ".txt", "w") as fp:
|
||||
fp.write("\n".join(map(str,nodes)))
|
||||
'''
|
||||
|
||||
def construct_placeholders():
|
||||
# Define placeholders
|
||||
placeholders = {
|
||||
'batch1' : tf.placeholder(tf.int32, shape=(None), name='batch1'),
|
||||
'batch2' : tf.placeholder(tf.int32, shape=(None), name='batch2'),
|
||||
# negative samples for all nodes in the batch
|
||||
'neg_samples': tf.placeholder(tf.int32, shape=(None,),
|
||||
name='neg_sample_size'),
|
||||
'dropout': tf.placeholder_with_default(0., shape=(), name='dropout'),
|
||||
'batch_size' : tf.placeholder(tf.int32, name='batch_size'),
|
||||
}
|
||||
return placeholders
|
||||
|
||||
|
||||
def train(train_data, test_data=None, model='graphsage_mean'):
|
||||
print('---------- the graphsage model we used: ', model)
|
||||
print('---------- parameters we used: epochs, dim_1+dim_2, samples_1, samples_2, dropout, weight_decay, learning_rate, batch_size, normalize',
|
||||
epochs, dim_1+dim_2, samples_1, samples_2, dropout, weight_decay, learning_rate, batch_size, normalize)
|
||||
G = train_data[0]
|
||||
features = train_data[1] #note: features follow the order of graph.look_back_list, since id_map = {k: v for v, k in enumerate(graph.look_back_list)}
|
||||
id_map = train_data[2]
|
||||
|
||||
if not features is None:
|
||||
# pad with dummy zero vector
|
||||
features = np.vstack([features, np.zeros((features.shape[1],))])
|
||||
|
||||
random_context = False
|
||||
context_pairs = train_data[3] if random_context else None
|
||||
placeholders = construct_placeholders()
|
||||
minibatch = EdgeMinibatchIterator(G,
|
||||
id_map,
|
||||
placeholders, batch_size=batch_size,
|
||||
max_degree=max_degree,
|
||||
num_neg_samples=neg_sample_size,
|
||||
context_pairs = context_pairs)
|
||||
adj_info_ph = tf.placeholder(tf.int32, shape=minibatch.adj.shape)
|
||||
adj_info = tf.Variable(adj_info_ph, trainable=False, name="adj_info")
|
||||
|
||||
if model == 'graphsage_mean':
|
||||
# Create model
|
||||
sampler = UniformNeighborSampler(adj_info)
|
||||
layer_infos = [SAGEInfo("node", sampler, samples_1, dim_1),
|
||||
SAGEInfo("node", sampler, samples_2, dim_2)]
|
||||
|
||||
model = SampleAndAggregate(placeholders,
|
||||
features,
|
||||
adj_info,
|
||||
minibatch.deg,
|
||||
layer_infos=layer_infos,
|
||||
model_size=model_size,
|
||||
identity_dim = identity_dim,
|
||||
logging=True)
|
||||
elif model == 'gcn':
|
||||
# Create model
|
||||
sampler = UniformNeighborSampler(adj_info)
|
||||
layer_infos = [SAGEInfo("node", sampler, samples_1, 2*dim_1),
|
||||
SAGEInfo("node", sampler, samples_2, 2*dim_2)]
|
||||
|
||||
model = SampleAndAggregate(placeholders,
|
||||
features,
|
||||
adj_info,
|
||||
minibatch.deg,
|
||||
layer_infos=layer_infos,
|
||||
aggregator_type="gcn",
|
||||
model_size=model_size,
|
||||
identity_dim = identity_dim,
|
||||
concat=False,
|
||||
logging=True)
|
||||
|
||||
elif model == 'graphsage_seq': #LSTM aggregator as described in the GraphSAGE paper; very slow in practice
|
||||
sampler = UniformNeighborSampler(adj_info)
|
||||
layer_infos = [SAGEInfo("node", sampler, samples_1, dim_1),
|
||||
SAGEInfo("node", sampler, samples_2, dim_2)]
|
||||
|
||||
model = SampleAndAggregate(placeholders,
|
||||
features,
|
||||
adj_info,
|
||||
minibatch.deg,
|
||||
layer_infos=layer_infos,
|
||||
identity_dim = identity_dim,
|
||||
aggregator_type="seq",
|
||||
model_size=model_size,
|
||||
logging=True)
|
||||
|
||||
elif model == 'graphsage_maxpool':
|
||||
sampler = UniformNeighborSampler(adj_info)
|
||||
layer_infos = [SAGEInfo("node", sampler, samples_1, dim_1),
|
||||
SAGEInfo("node", sampler, samples_2, dim_2)]
|
||||
|
||||
model = SampleAndAggregate(placeholders,
|
||||
features,
|
||||
adj_info,
|
||||
minibatch.deg,
|
||||
layer_infos=layer_infos,
|
||||
aggregator_type="maxpool",
|
||||
model_size=model_size,
|
||||
identity_dim = identity_dim,
|
||||
logging=True)
|
||||
elif model == 'graphsage_meanpool':
|
||||
sampler = UniformNeighborSampler(adj_info)
|
||||
layer_infos = [SAGEInfo("node", sampler, samples_1, dim_1),
|
||||
SAGEInfo("node", sampler, samples_2, dim_2)]
|
||||
|
||||
model = SampleAndAggregate(placeholders,
|
||||
features,
|
||||
adj_info,
|
||||
minibatch.deg,
|
||||
layer_infos=layer_infos,
|
||||
aggregator_type="meanpool",
|
||||
model_size=model_size,
|
||||
identity_dim = identity_dim,
|
||||
logging=True)
|
||||
|
||||
elif model == 'n2v':
|
||||
model = Node2VecModel(placeholders, features.shape[0],
|
||||
minibatch.deg,
|
||||
#2x because graphsage uses concat
|
||||
nodevec_dim=2*dim_1,
|
||||
lr=learning_rate)
|
||||
else:
|
||||
raise Exception('Error: model name unrecognized.')
|
||||
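# Hedged reading of what the SAGEInfo layer_infos above imply for sampling
# cost: the number of sampled nodes per batch element grows as the running
# product of the num_samples values, deepest hop first (this mirrors how
# support_sizes is built in models.py; the numbers below are illustrative).
num_samples = [25, 10]                    # e.g. samples_1=25, samples_2=10
support_size, support_sizes = 1, [1]
for s in reversed(num_samples):
    support_size *= s
    support_sizes.append(support_size)
# support_sizes -> [1, 10, 250] nodes fed to the aggregators per batch element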
|
||||
config = tf.ConfigProto(log_device_placement=log_device_placement)
|
||||
config.gpu_options.allow_growth = True
|
||||
#config.gpu_options.per_process_gpu_memory_fraction = GPU_MEM_FRACTION
|
||||
config.allow_soft_placement = True
|
||||
|
||||
# Initialize session
|
||||
sess = tf.Session(config=config)
|
||||
merged = tf.summary.merge_all()
|
||||
#summary_writer = tf.summary.FileWriter(log_dir(), sess.graph)
|
||||
|
||||
# Init variables
|
||||
sess.run(tf.global_variables_initializer(), feed_dict={adj_info_ph: minibatch.adj})
|
||||
|
||||
# Train model
|
||||
|
||||
train_shadow_mrr = None
|
||||
shadow_mrr = None
|
||||
|
||||
total_steps = 0
|
||||
avg_time = 0.0
|
||||
epoch_val_costs = []
|
||||
|
||||
train_adj_info = tf.assign(adj_info, minibatch.adj)
|
||||
val_adj_info = tf.assign(adj_info, minibatch.test_adj)
|
||||
for epoch in range(epochs):
|
||||
minibatch.shuffle()
|
||||
|
||||
iter = 0
|
||||
epoch_val_costs.append(0)
|
||||
train_cost = 0
|
||||
train_mrr = 0
|
||||
train_shadow_mrr = 0
|
||||
val_cost = 0
|
||||
val_mrr = 0
|
||||
shadow_mrr = 0
|
||||
avg_time = 0
|
||||
while not minibatch.end():
|
||||
# Construct feed dictionary
|
||||
feed_dict = minibatch.next_minibatch_feed_dict()
|
||||
feed_dict.update({placeholders['dropout']: dropout})
|
||||
|
||||
t = time.time()
|
||||
# Training step
|
||||
outs = sess.run([merged, model.opt_op, model.loss, model.ranks, model.aff_all,
|
||||
model.mrr, model.outputs1], feed_dict=feed_dict)
|
||||
train_cost = outs[2]
|
||||
train_mrr = outs[5]
|
||||
if train_shadow_mrr is None:
|
||||
train_shadow_mrr = train_mrr#
|
||||
else:
|
||||
train_shadow_mrr -= (1-0.99) * (train_shadow_mrr - train_mrr)
|
||||
|
||||
if iter % validate_iter == 0:
|
||||
# Validation
|
||||
sess.run(val_adj_info.op)
|
||||
val_cost, ranks, val_mrr, duration = evaluate(sess, model, minibatch, size=validate_batch_size)
|
||||
sess.run(train_adj_info.op)
|
||||
epoch_val_costs[-1] += val_cost
|
||||
if shadow_mrr is None:
|
||||
shadow_mrr = val_mrr
|
||||
else:
|
||||
shadow_mrr -= (1-0.99) * (shadow_mrr - val_mrr)
|
||||
|
||||
#if total_steps % print_every == 0:
|
||||
#summary_writer.add_summary(outs[0], total_steps)
|
||||
|
||||
# Print results
|
||||
avg_time = (avg_time * total_steps + time.time() - t) / (total_steps + 1)
|
||||
|
||||
iter += 1
|
||||
total_steps += 1
|
||||
|
||||
if total_steps > max_total_steps:
|
||||
break
|
||||
|
||||
epoch += 1
|
||||
print("Epoch:", '%04d' % epoch,
|
||||
"train_loss=", "{:.5f}".format(train_cost),
|
||||
"train_mrr=", "{:.5f}".format(train_mrr),
|
||||
"train_mrr_ema=", "{:.5f}".format(train_shadow_mrr), # exponential moving average
|
||||
"val_loss=", "{:.5f}".format(val_cost),
|
||||
"val_mrr=", "{:.5f}".format(val_mrr),
|
||||
"val_mrr_ema=", "{:.5f}".format(shadow_mrr), # exponential moving average
|
||||
"time=", "{:.5f}".format(avg_time))
|
||||
|
||||
if total_steps > max_total_steps:
|
||||
break
|
||||
|
||||
print("Optimization Finished!")
|
||||
|
||||
sess.run(val_adj_info.op)
|
||||
#save_val_embeddings(sess, model, minibatch, validate_batch_size, log_dir())
|
||||
return save_val_embeddings(sess, model, minibatch, validate_batch_size) #return embs
|
||||
|
||||
|
||||
def graphsage_save_embeddings(self, filename): #to do...
|
||||
pass
|
117
src/libnrl/graphsage/utils.py
Normal file
@ -0,0 +1,117 @@
|
||||
from __future__ import print_function
|
||||
|
||||
#-----------
|
||||
#compatible with networkx >2.0 in line 18 and 32 by Chengbin
|
||||
#compatible with latest random.choice in line 94 by Chengbin
|
||||
#--------------
|
||||
|
||||
import numpy as np
|
||||
import random
|
||||
import json
|
||||
import sys
|
||||
import os
|
||||
|
||||
import networkx as nx
|
||||
from networkx.readwrite import json_graph
|
||||
version_info = list(map(int, nx.__version__.split('.')))
|
||||
major = version_info[0]
|
||||
minor = version_info[1]
|
||||
#assert (major <= 1) and (minor <= 11), "networkx major version > 1.11"
|
||||
|
||||
WALK_LEN=5
|
||||
N_WALKS=50
|
||||
|
||||
def load_data(prefix, normalize=True, load_walks=False):
|
||||
G_data = json.load(open(prefix + "-G.json"))
|
||||
G = json_graph.node_link_graph(G_data)
|
||||
'''
|
||||
if isinstance(G.nodes()[0], int):
|
||||
conversion = lambda n : int(n)
|
||||
else:
|
||||
conversion = lambda n : n
|
||||
'''
|
||||
conversion = lambda n : int(n) # compatible with networkx >2.0
|
||||
|
||||
if os.path.exists(prefix + "-feats.npy"):
|
||||
feats = np.load(prefix + "-feats.npy")
|
||||
else:
|
||||
print("No features present.. Only identity features will be used.")
|
||||
feats = None
|
||||
id_map = json.load(open(prefix + "-id_map.json"))
|
||||
id_map = {conversion(k):int(v) for k,v in id_map.items()}
|
||||
walks = []
|
||||
class_map = json.load(open(prefix + "-class_map.json"))
|
||||
if isinstance(list(class_map.values())[0], list):
|
||||
lab_conversion = lambda n : n
|
||||
else:
|
||||
lab_conversion = lambda n : int(n)
|
||||
|
||||
class_map = {conversion(k):lab_conversion(v) for k,v in class_map.items()}
|
||||
|
||||
## Remove all nodes that do not have val/test annotations
|
||||
## (necessary because of networkx weirdness with the Reddit data)
|
||||
broken_count = 0
|
||||
for node in G.nodes():
|
||||
if not 'val' in G.node[node] or not 'test' in G.node[node]:
|
||||
G.remove_node(node)
|
||||
broken_count += 1
|
||||
print("Removed {:d} nodes that lacked proper annotations due to networkx versioning issues".format(broken_count))
|
||||
|
||||
## Make sure the graph has edge train_removed annotations
|
||||
## (some datasets might already have this..)
|
||||
print("Loaded data.. now preprocessing..")
|
||||
for edge in G.edges():
|
||||
if (G.node[edge[0]]['val'] or G.node[edge[1]]['val'] or
|
||||
G.node[edge[0]]['test'] or G.node[edge[1]]['test']):
|
||||
G[edge[0]][edge[1]]['train_removed'] = True
|
||||
else:
|
||||
G[edge[0]][edge[1]]['train_removed'] = False
|
||||
|
||||
if normalize and not feats is None:
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
train_ids = np.array([id_map[n] for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']])
|
||||
train_feats = feats[train_ids]
|
||||
scaler = StandardScaler()
|
||||
scaler.fit(train_feats)
|
||||
feats = scaler.transform(feats)
|
||||
|
||||
if load_walks:
|
||||
with open(prefix + "-walks.txt") as fp:
|
||||
for line in fp:
|
||||
walks.append(map(conversion, line.split()))
|
||||
|
||||
return G, feats, id_map, walks, class_map
|
||||
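# The normalization above fits the scaler on training-node features only and
# then transforms every node, so val/test statistics never leak into training.
# A minimal sketch of that pattern (toy arrays, not the repo's data):
import numpy as np
from sklearn.preprocessing import StandardScaler
feats = np.random.rand(10, 4)
train_ids = np.arange(6)                  # pretend the first 6 nodes are training nodes
scaler = StandardScaler().fit(feats[train_ids])
feats = scaler.transform(feats)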
|
||||
def run_random_walks(G, nodes, num_walks=N_WALKS):
|
||||
pairs = []
|
||||
for count, node in enumerate(nodes):
|
||||
if G.degree(node) == 0:
|
||||
continue
|
||||
for i in range(num_walks):
|
||||
curr_node = node
|
||||
for j in range(WALK_LEN):
|
||||
next_node = random.choice(list(G.neighbors(curr_node))) #changed due to compatibility
|
||||
#next_node = random.choice(G.neighbors(curr_node))
|
||||
# self co-occurrences are useless
|
||||
if curr_node != node:
|
||||
pairs.append((node,curr_node))
|
||||
curr_node = next_node
|
||||
if count % 1000 == 0:
|
||||
print("Done walks for", count, "nodes")
|
||||
return pairs
|
||||
|
||||
if __name__ == "__main__": #这个地方需要改写,可以每次运行都跑一次
|
||||
""" Run random walks """
|
||||
graph_file = sys.argv[1]
|
||||
out_file = sys.argv[2]
|
||||
G_data = json.load(open(graph_file))
|
||||
G = json_graph.node_link_graph(G_data)
|
||||
nodes = [n for n in G.nodes() if not G.node[n]["val"] and not G.node[n]["test"]]
|
||||
G = G.subgraph(nodes)
|
||||
pairs = run_random_walks(G, nodes)
|
||||
with open(out_file, "w") as fp:
|
||||
fp.write("\n".join([str(p[0]) + "\t" + str(p[1]) for p in pairs]))
|
||||
|
||||
|
||||
#go to this file dir and run the following line in CMD
|
||||
#python utils.py ../example_data/toy-ppi-G.json ../example_data/toy-ppi-walks.txt
|
68
src/libnrl/grarep.py
Normal file
@ -0,0 +1,68 @@
|
||||
import math
|
||||
import numpy as np
|
||||
from numpy import linalg as la
|
||||
from sklearn.preprocessing import normalize
|
||||
|
||||
class GraRep(object):
|
||||
|
||||
def __init__(self, graph, Kstep, dim):
|
||||
self.g = graph
|
||||
self.Kstep = Kstep
|
||||
assert dim%Kstep == 0
|
||||
self.dim = int(dim/Kstep)
|
||||
self.train()
|
||||
|
||||
def getAdjMat(self):
|
||||
graph = self.g.G
|
||||
node_size = self.g.node_size
|
||||
look_up = self.g.look_up_dict
|
||||
adj = np.zeros((node_size, node_size))
|
||||
for edge in self.g.G.edges():
|
||||
adj[look_up[edge[0]]][look_up[edge[1]]] = 1.0
|
||||
adj[look_up[edge[1]]][look_up[edge[0]]] = 1.0
|
||||
# ScaleSimMat
|
||||
return np.matrix(adj/np.sum(adj, axis=1))
|
||||
|
||||
def GetProbTranMat(self, Ak):
|
||||
probTranMat = np.log(Ak/np.tile(
|
||||
np.sum(Ak, axis=0), (self.node_size, 1))) \
|
||||
- np.log(1.0/self.node_size)
|
||||
probTranMat[probTranMat < 0] = 0
|
||||
probTranMat[np.isnan(probTranMat)] = 0  # fix: '== np.nan' never matches; use np.isnan to zero out NaN entries
|
||||
return probTranMat
|
||||
|
||||
def GetRepUseSVD(self, probTranMat, alpha):
|
||||
U, S, VT = la.svd(probTranMat)
|
||||
Ud = U[:, 0:self.dim]
|
||||
Sd = S[0:self.dim]
|
||||
return np.array(Ud)*np.power(Sd, alpha).reshape((self.dim))
|
||||
|
||||
def save_embeddings(self, filename):
|
||||
fout = open(filename, 'w')
|
||||
node_num = len(self.vectors.keys())
|
||||
fout.write("{} {}\n".format(node_num, self.Kstep*self.dim))
|
||||
for node, vec in self.vectors.items():
|
||||
fout.write("{} {}\n".format(node,' '.join([str(x) for x in vec])))
|
||||
fout.close()
|
||||
|
||||
def train(self):
|
||||
self.adj = self.getAdjMat()
|
||||
self.node_size = self.adj.shape[0]
|
||||
self.Ak = np.matrix(np.identity(self.node_size))
|
||||
self.RepMat = np.zeros((self.node_size, int(self.dim*self.Kstep)))
|
||||
for i in range(self.Kstep):
|
||||
print('Kstep =', i)
|
||||
self.Ak = np.dot(self.Ak, self.adj)
|
||||
probTranMat = self.GetProbTranMat(self.Ak)
|
||||
Rk = self.GetRepUseSVD(probTranMat, 0.5)
|
||||
Rk = normalize(Rk, axis=1, norm='l2')
|
||||
self.RepMat[:, self.dim*i:self.dim*(i+1)] = Rk[:, :]
|
||||
# get embeddings
|
||||
self.vectors = {}
|
||||
look_back = self.g.look_back_list
|
||||
for i, embedding in enumerate(self.RepMat):
|
||||
self.vectors[look_back[i]] = embedding
|
||||
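# A hedged numpy sketch of one GraRep step as implemented above: build the
# k-step transition matrix, take the shifted positive log-probabilities, and
# keep the top-d left singular vectors scaled by the square root of the
# singular values (alpha = 0.5). Sizes below are toy values.
import numpy as np
n, d, k = 5, 2, 2
A = np.random.rand(n, n)
A = A / A.sum(axis=1, keepdims=True)            # row-normalized adjacency
Ak = np.linalg.matrix_power(A, k)               # k-step transition probabilities
X = np.log(Ak / Ak.sum(axis=0, keepdims=True)) - np.log(1.0 / n)
X[X < 0] = 0                                    # keep only the positive entries
U, S, VT = np.linalg.svd(X)
Rk = U[:, :d] * np.power(S[:d], 0.5)            # d-dimensional representation for step k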
|
||||
|
||||
|
||||
|
259
src/libnrl/line.py
Normal file
@ -0,0 +1,259 @@
|
||||
from __future__ import print_function
|
||||
import random
|
||||
import math
|
||||
import numpy as np
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
import tensorflow as tf
|
||||
from .classify import ncClassifier, lpClassifier, read_node_label, read_edge_label
|
||||
|
||||
|
||||
class _LINE(object):
|
||||
|
||||
def __init__(self, graph, rep_size=128, batch_size=1000, negative_ratio=5, order=3):
|
||||
self.cur_epoch = 0
|
||||
self.order = order
|
||||
self.g = graph
|
||||
self.node_size = graph.G.number_of_nodes()
|
||||
self.rep_size = rep_size
|
||||
self.batch_size = batch_size
|
||||
self.negative_ratio = negative_ratio
|
||||
|
||||
self.gen_sampling_table()
|
||||
self.sess = tf.Session()
|
||||
cur_seed = random.getrandbits(32)
|
||||
initializer = tf.contrib.layers.xavier_initializer(uniform=False, seed=cur_seed)
|
||||
with tf.variable_scope("model", reuse=None, initializer=initializer):
|
||||
self.build_graph()
|
||||
self.sess.run(tf.global_variables_initializer())
|
||||
|
||||
def build_graph(self):
|
||||
self.h = tf.placeholder(tf.int32, [None])
|
||||
self.t = tf.placeholder(tf.int32, [None])
|
||||
self.sign = tf.placeholder(tf.float32, [None])
|
||||
|
||||
cur_seed = random.getrandbits(32)
|
||||
self.embeddings = tf.get_variable(name="embeddings"+str(self.order), shape=[self.node_size, self.rep_size], initializer = tf.contrib.layers.xavier_initializer(uniform = False, seed=cur_seed))
|
||||
self.context_embeddings = tf.get_variable(name="context_embeddings"+str(self.order), shape=[self.node_size, self.rep_size], initializer = tf.contrib.layers.xavier_initializer(uniform = False, seed=cur_seed))
|
||||
# self.h_e = tf.nn.l2_normalize(tf.nn.embedding_lookup(self.embeddings, self.h), 1)
|
||||
# self.t_e = tf.nn.l2_normalize(tf.nn.embedding_lookup(self.embeddings, self.t), 1)
|
||||
# self.t_e_context = tf.nn.l2_normalize(tf.nn.embedding_lookup(self.context_embeddings, self.t), 1)
|
||||
self.h_e = tf.nn.embedding_lookup(self.embeddings, self.h)
|
||||
self.t_e = tf.nn.embedding_lookup(self.embeddings, self.t)
|
||||
self.t_e_context = tf.nn.embedding_lookup(self.context_embeddings, self.t)
|
||||
self.second_loss = -tf.reduce_mean(tf.log_sigmoid(self.sign*tf.reduce_sum(tf.multiply(self.h_e, self.t_e_context), axis=1)))
|
||||
self.first_loss = -tf.reduce_mean(tf.log_sigmoid(self.sign*tf.reduce_sum(tf.multiply(self.h_e, self.t_e), axis=1)))
|
||||
if self.order == 1:
|
||||
self.loss = self.first_loss
|
||||
else:
|
||||
self.loss = self.second_loss
|
||||
optimizer = tf.train.AdamOptimizer(0.001)
|
||||
self.train_op = optimizer.minimize(self.loss)
|
||||
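# A hedged numpy reading of the loss built above: positive edges use sign=+1
# and sampled negatives sign=-1, so the objective is
# -mean(log sigmoid(sign * <h_e, t_e>)), i.e. the LINE / skip-gram
# negative-sampling criterion. Toy embeddings below, purely illustrative:
import numpy as np
h_e = np.array([[0.2, 0.1], [0.3, -0.2]])      # source embeddings of a mini-batch
t_e = np.array([[0.4, 0.0], [0.1, 0.5]])       # target (context) embeddings
sign = np.array([1.0, -1.0])                   # first pair positive, second negative
logits = sign * np.sum(h_e * t_e, axis=1)
loss = -np.mean(np.log(1.0 / (1.0 + np.exp(-logits))))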
|
||||
|
||||
def train_one_epoch(self):
|
||||
sum_loss = 0.0
|
||||
batches = self.batch_iter()
|
||||
batch_id = 0
|
||||
for batch in batches:
|
||||
h, t, sign = batch
|
||||
feed_dict = {
|
||||
self.h : h,
|
||||
self.t : t,
|
||||
self.sign : sign,
|
||||
}
|
||||
_, cur_loss = self.sess.run([self.train_op, self.loss],feed_dict)
|
||||
sum_loss += cur_loss
|
||||
batch_id += 1
|
||||
print('epoch:{} sum of loss:{!s}'.format(self.cur_epoch, sum_loss))
|
||||
self.cur_epoch += 1
|
||||
|
||||
def batch_iter(self):
|
||||
look_up = self.g.look_up_dict
|
||||
|
||||
table_size = 1e8
|
||||
numNodes = self.node_size
|
||||
|
||||
edges = [(look_up[x[0]], look_up[x[1]]) for x in self.g.G.edges()]
|
||||
|
||||
data_size = self.g.G.number_of_edges()
|
||||
edge_set = set([x[0]*numNodes+x[1] for x in edges])
|
||||
shuffle_indices = np.random.permutation(np.arange(data_size))
|
||||
|
||||
# positive or negative mod
|
||||
mod = 0
|
||||
mod_size = 1 + self.negative_ratio
|
||||
h = []
|
||||
t = []
|
||||
sign = 0
|
||||
|
||||
start_index = 0
|
||||
end_index = min(start_index+self.batch_size, data_size)
|
||||
while start_index < data_size:
|
||||
if mod == 0:
|
||||
sign = 1.
|
||||
h = []
|
||||
t = []
|
||||
for i in range(start_index, end_index):
|
||||
if not random.random() < self.edge_prob[shuffle_indices[i]]:
|
||||
shuffle_indices[i] = self.edge_alias[shuffle_indices[i]]
|
||||
cur_h = edges[shuffle_indices[i]][0]
|
||||
cur_t = edges[shuffle_indices[i]][1]
|
||||
h.append(cur_h)
|
||||
t.append(cur_t)
|
||||
else:
|
||||
sign = -1.
|
||||
t = []
|
||||
for i in range(len(h)):
|
||||
t.append(self.sampling_table[random.randint(0, table_size-1)])
|
||||
|
||||
yield h, t, [sign]
|
||||
mod += 1
|
||||
mod %= mod_size
|
||||
if mod == 0:
|
||||
start_index = end_index
|
||||
end_index = min(start_index+self.batch_size, data_size)
|
||||
|
||||
def gen_sampling_table(self):
|
||||
table_size = 1e8
|
||||
power = 0.75
|
||||
numNodes = self.node_size
|
||||
|
||||
print("Pre-procesing for non-uniform negative sampling!")
|
||||
node_degree = np.zeros(numNodes) # out degree
|
||||
|
||||
look_up = self.g.look_up_dict
|
||||
for edge in self.g.G.edges():
|
||||
node_degree[look_up[edge[0]]] += self.g.G[edge[0]][edge[1]]["weight"]
|
||||
|
||||
norm = sum([math.pow(node_degree[i], power) for i in range(numNodes)])
|
||||
|
||||
self.sampling_table = np.zeros(int(table_size), dtype=np.uint32)
|
||||
|
||||
p = 0
|
||||
i = 0
|
||||
for j in range(numNodes):
|
||||
p += float(math.pow(node_degree[j], power)) / norm
|
||||
while i < table_size and float(i) / table_size < p:
|
||||
self.sampling_table[i] = j
|
||||
i += 1
|
||||
|
||||
data_size = self.g.G.number_of_edges()
|
||||
self.edge_alias = np.zeros(data_size, dtype=np.int32)
|
||||
self.edge_prob = np.zeros(data_size, dtype=np.float32)
|
||||
large_block = np.zeros(data_size, dtype=np.int32)
|
||||
small_block = np.zeros(data_size, dtype=np.int32)
|
||||
|
||||
total_sum = sum([self.g.G[edge[0]][edge[1]]["weight"] for edge in self.g.G.edges()])
|
||||
norm_prob = [self.g.G[edge[0]][edge[1]]["weight"]*data_size/total_sum for edge in self.g.G.edges()]
|
||||
num_small_block = 0
|
||||
num_large_block = 0
|
||||
cur_small_block = 0
|
||||
cur_large_block = 0
|
||||
for k in range(data_size-1, -1, -1):
|
||||
if norm_prob[k] < 1:
|
||||
small_block[num_small_block] = k
|
||||
num_small_block += 1
|
||||
else:
|
||||
large_block[num_large_block] = k
|
||||
num_large_block += 1
|
||||
while num_small_block and num_large_block:
|
||||
num_small_block -= 1
|
||||
cur_small_block = small_block[num_small_block]
|
||||
num_large_block -= 1
|
||||
cur_large_block = large_block[num_large_block]
|
||||
self.edge_prob[cur_small_block] = norm_prob[cur_small_block]
|
||||
self.edge_alias[cur_small_block] = cur_large_block
|
||||
norm_prob[cur_large_block] = norm_prob[cur_large_block] + norm_prob[cur_small_block] -1
|
||||
if norm_prob[cur_large_block] < 1:
|
||||
small_block[num_small_block] = cur_large_block
|
||||
num_small_block += 1
|
||||
else:
|
||||
large_block[num_large_block] = cur_large_block
|
||||
num_large_block += 1
|
||||
|
||||
while num_large_block:
|
||||
num_large_block -= 1
|
||||
self.edge_prob[large_block[num_large_block]] = 1
|
||||
while num_small_block:
|
||||
num_small_block -= 1
|
||||
self.edge_prob[small_block[num_small_block]] = 1
|
||||
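# The block above builds a classic alias table so that edges can then be drawn
# in O(1) time in proportion to their weights. A compact self-contained sketch
# of the same construction (illustrative only, not the repo's implementation):
import random

def alias_setup(probs):
    n = len(probs)
    prob, alias = [p * n for p in probs], [0] * n
    small = [i for i, q in enumerate(prob) if q < 1.0]
    large = [i for i, q in enumerate(prob) if q >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                         # overflow of l fills the slack of s
        prob[l] = prob[l] + prob[s] - 1.0
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                  # leftovers are (numerically) 1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    i = random.randrange(len(prob))          # pick a bucket uniformly
    return i if random.random() < prob[i] else alias[i]

# e.g. alias_draw(*alias_setup([0.5, 0.3, 0.2])) returns 0/1/2 with those probabilities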
|
||||
|
||||
def get_embeddings(self):
|
||||
vectors = {}
|
||||
embeddings = self.embeddings.eval(session=self.sess)
|
||||
# embeddings = self.sess.run(tf.nn.l2_normalize(self.embeddings.eval(session=self.sess), 1))
|
||||
look_back = self.g.look_back_list
|
||||
for i, embedding in enumerate(embeddings):
|
||||
vectors[look_back[i]] = embedding
|
||||
return vectors
|
||||
|
||||
class LINE(object):
|
||||
|
||||
def __init__(self, graph, rep_size=128, batch_size=1000, epoch=10, negative_ratio=5, order=3, label_file = None, clf_ratio = 0.5, auto_save = True):
|
||||
self.rep_size = rep_size
|
||||
self.order = order
|
||||
self.best_result = 0
|
||||
self.vectors = {}
|
||||
if order == 3:
|
||||
self.model1 = _LINE(graph, rep_size/2, batch_size, negative_ratio, order=1)
|
||||
self.model2 = _LINE(graph, rep_size/2, batch_size, negative_ratio, order=2)
|
||||
for i in range(epoch):
|
||||
self.model1.train_one_epoch()
|
||||
self.model2.train_one_epoch()
|
||||
'''
|
||||
if label_file:
|
||||
self.get_embeddings()
|
||||
X, Y = read_node_label(label_file)
|
||||
print("Training classifier using {:.2f}% nodes...".format(clf_ratio*100))
|
||||
clf = Classifier(vectors=self.vectors, clf=LogisticRegression())
|
||||
result = clf.split_train_evaluate(X, Y, clf_ratio)
|
||||
|
||||
if result['macro'] > self.best_result:
|
||||
self.best_result = result['macro']
|
||||
if auto_save:
|
||||
self.best_vector = self.vectors
|
||||
'''
|
||||
|
||||
else:
|
||||
self.model = _LINE(graph, rep_size, batch_size, negative_ratio, order=self.order)
|
||||
for i in range(epoch):
|
||||
self.model.train_one_epoch()
|
||||
'''
|
||||
if label_file:
|
||||
self.get_embeddings()
|
||||
X, Y = read_node_label(label_file)
|
||||
print("Training classifier using {:.2f}% nodes...".format(clf_ratio*100))
|
||||
clf = Classifier(vectors=self.vectors, clf=LogisticRegression())
|
||||
result = clf.split_train_evaluate(X, Y, clf_ratio)
|
||||
|
||||
if result['macro'] > self.best_result:
|
||||
self.best_result = result['macro']
|
||||
if auto_save:
|
||||
self.best_vector = self.vectors
|
||||
'''
|
||||
|
||||
self.get_embeddings()
|
||||
if auto_save and label_file:
|
||||
#self.vectors = self.best_vector
|
||||
pass
|
||||
|
||||
def get_embeddings(self):
|
||||
self.last_vectors = self.vectors
|
||||
self.vectors = {}
|
||||
if self.order == 3:
|
||||
vectors1 = self.model1.get_embeddings()
|
||||
vectors2 = self.model2.get_embeddings()
|
||||
for node in vectors1.keys():
|
||||
self.vectors[node] = np.append(vectors1[node], vectors2[node])
|
||||
else:
|
||||
self.vectors = self.model.get_embeddings()
|
||||
|
||||
def save_embeddings(self, filename):
|
||||
fout = open(filename, 'w')
|
||||
node_num = len(self.vectors.keys())
|
||||
fout.write("{} {}\n".format(node_num, self.rep_size))
|
||||
for node, vec in self.vectors.items():
|
||||
fout.write("{} {}\n".format(node,
|
||||
' '.join([str(x) for x in vec])))
|
||||
fout.close()
|
47
src/libnrl/node2vec.py
Normal file
@ -0,0 +1,47 @@
|
||||
from __future__ import print_function
|
||||
import time
|
||||
import warnings
|
||||
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
|
||||
from gensim.models import Word2Vec
|
||||
from . import walker
|
||||
|
||||
|
||||
class Node2vec(object):
|
||||
|
||||
def __init__(self, graph, path_length, num_paths, dim, p=1.0, q=1.0, dw=False, **kwargs):
|
||||
|
||||
kwargs["workers"] = kwargs.get("workers", 1)
|
||||
if dw:
|
||||
kwargs["hs"] = 1
|
||||
p = 1.0
|
||||
q = 1.0
|
||||
|
||||
self.graph = graph
|
||||
if dw:
|
||||
self.walker = walker.BasicWalker(graph, workers=kwargs["workers"])
|
||||
else:
|
||||
self.walker = walker.Walker(graph, p=p, q=q, workers=kwargs["workers"])
|
||||
print("Preprocess transition probs...")
|
||||
self.walker.preprocess_transition_probs()
|
||||
sentences = self.walker.simulate_walks(num_walks=num_paths, walk_length=path_length)
|
||||
kwargs["sentences"] = sentences
|
||||
kwargs["min_count"] = kwargs.get("min_count", 0)
|
||||
kwargs["size"] = kwargs.get("size", dim)
|
||||
kwargs["sg"] = 1
|
||||
|
||||
self.size = kwargs["size"]
|
||||
print("Learning representation...")
|
||||
word2vec = Word2Vec(**kwargs)
|
||||
self.vectors = {}
|
||||
for word in graph.G.nodes():
|
||||
self.vectors[word] = word2vec.wv[word]
|
||||
del word2vec
|
||||
|
||||
def save_embeddings(self, filename):
|
||||
fout = open(filename, 'w')
|
||||
node_num = len(self.vectors.keys())
|
||||
fout.write("{} {}\n".format(node_num, self.size))
|
||||
for node, vec in self.vectors.items():
|
||||
fout.write("{} {}\n".format(node,
|
||||
' '.join([str(x) for x in vec])))
|
||||
fout.close()
|
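# Hedged sketch of the DeepWalk/node2vec training step wrapped by the class
# above: the node-id "sentences" (random walks) are fed to gensim's skip-gram
# Word2Vec (keyword names follow the gensim 3.x API pinned in the README).
from gensim.models import Word2Vec
walks = [['1', '2', '3', '2'], ['4', '1', '2', '5']]   # toy walks, node ids as strings
w2v = Word2Vec(sentences=walks, size=8, window=2, min_count=0, sg=1, workers=1)
vec = w2v.wv['1']                                      # learned embedding of node '1'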
128
src/libnrl/tadw.py
Normal file
@ -0,0 +1,128 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
from __future__ import print_function
|
||||
import math
|
||||
import numpy as np
|
||||
from numpy import linalg as la
|
||||
from sklearn.preprocessing import normalize
|
||||
from .gcn.utils import *
|
||||
|
||||
'''
|
||||
#-----------------------------------------------------------------------------
|
||||
# part of code was originally forked from https://github.com/thunlp/OpenNE
|
||||
# modified by Chengbin Hou 2018
|
||||
# Email: Chengbin.Hou10@foxmail.com
|
||||
#-----------------------------------------------------------------------------
|
||||
'''
|
||||
|
||||
class TADW(object):
|
||||
|
||||
def __init__(self, graph, dim, lamb=0.2):
|
||||
self.g = graph
|
||||
self.lamb = lamb
|
||||
self.dim = dim
|
||||
self.train()
|
||||
|
||||
def getAdj(self): #changed to use the same data preprocessing; our preprocessing obtains better results
|
||||
'''
|
||||
graph = self.g.G
|
||||
node_size = self.g.node_size
|
||||
look_up = self.g.look_up_dict
|
||||
adj = np.zeros((node_size, node_size))
|
||||
for edge in self.g.G.edges():
|
||||
adj[look_up[edge[0]]][look_up[edge[1]]] = 1.0
|
||||
adj[look_up[edge[1]]][look_up[edge[0]]] = 1.0
|
||||
# ScaleSimMat
|
||||
return adj/np.sum(adj, axis=1) #original way may hit numerical errors sometimes...
|
||||
'''
|
||||
A = self.g.getA()
|
||||
return self.g.rowAsPDF(A)
|
||||
|
||||
|
||||
def getT(self): #changed with the same data preprocessing method
|
||||
g = self.g.G
|
||||
look_back = self.g.look_back_list
|
||||
self.features = np.vstack([g.nodes[look_back[i]]['feature']
|
||||
for i in range(g.number_of_nodes())])
|
||||
self.preprocessFeature() #call the orig data preprocessing method
|
||||
return self.features.T
|
||||
'''
|
||||
#changed with the same data preprocessing method, see self.g.preprocessAttrInfo(X=X, dim=200, method='svd')
|
||||
#seems get better result?
|
||||
X = self.g.getX()
|
||||
self.features = self.g.preprocessAttrInfo(X=X, dim=200, method='svd') #svd or pca for dim reduction
|
||||
return np.transpose(self.features)
|
||||
'''
|
||||
|
||||
def preprocessFeature(self): #the original data preprocessing method
|
||||
U, S, VT = la.svd(self.features)
|
||||
Ud = U[:, 0:200]
|
||||
Sd = S[0:200]
|
||||
self.features = np.array(Ud)*Sd.reshape(200)
|
||||
|
||||
def save_embeddings(self, filename):
|
||||
fout = open(filename, 'w')
|
||||
node_num = len(self.vectors.keys())
|
||||
fout.write("{} {}\n".format(node_num, self.dim))
|
||||
for node, vec in self.vectors.items():
|
||||
fout.write("{} {}\n".format(node,' '.join([str(x) for x in vec])))
|
||||
fout.close()
|
||||
|
||||
def train(self):
|
||||
self.adj = self.getAdj()
|
||||
# M=(A+A^2)/2 where A is the row-normalized adjacency matrix
|
||||
self.M = (self.adj + np.dot(self.adj, self.adj))/2
|
||||
# T is feature_size*node_num, text features
|
||||
self.T = self.getT() #transpose of self.features!!!
|
||||
self.node_size = self.adj.shape[0]
|
||||
self.feature_size = self.features.shape[1]
|
||||
self.W = np.random.randn(self.dim, self.node_size)
|
||||
self.H = np.random.randn(self.dim, self.feature_size)
|
||||
# Update
|
||||
for i in range(20): #trade-off between acc and speed, 20-50
|
||||
print('Iteration ', i)
|
||||
# Update W
|
||||
B = np.dot(self.H, self.T)
|
||||
drv = 2 * np.dot(np.dot(B, B.T), self.W) - \
|
||||
2*np.dot(B, self.M.T) + self.lamb*self.W
|
||||
Hess = 2*np.dot(B, B.T) + self.lamb*np.eye(self.dim)
|
||||
drv = np.reshape(drv, [self.dim*self.node_size, 1])
|
||||
rt = -drv
|
||||
dt = rt
|
||||
vecW = np.reshape(self.W, [self.dim*self.node_size, 1])
|
||||
while np.linalg.norm(rt, 2) > 1e-4:
|
||||
dtS = np.reshape(dt, (self.dim, self.node_size))
|
||||
Hdt = np.reshape(np.dot(Hess, dtS), [self.dim*self.node_size, 1])
|
||||
|
||||
at = np.dot(rt.T, rt)/np.dot(dt.T, Hdt)
|
||||
vecW = vecW + at*dt
|
||||
rtmp = rt
|
||||
rt = rt - at*Hdt
|
||||
bt = np.dot(rt.T, rt)/np.dot(rtmp.T, rtmp)
|
||||
dt = rt + bt * dt
|
||||
self.W = np.reshape(vecW, (self.dim, self.node_size))
|
||||
|
||||
# Update H
|
||||
drv = np.dot((np.dot(np.dot(np.dot(self.W, self.W.T),self.H),self.T)
|
||||
- np.dot(self.W, self.M.T)), self.T.T) + self.lamb*self.H
|
||||
drv = np.reshape(drv, (self.dim*self.feature_size, 1))
|
||||
rt = -drv
|
||||
dt = rt
|
||||
vecH = np.reshape(self.H, (self.dim*self.feature_size, 1))
|
||||
while np.linalg.norm(rt, 2) > 1e-4:
|
||||
dtS = np.reshape(dt, (self.dim, self.feature_size))
|
||||
Hdt = np.reshape(np.dot(np.dot(np.dot(self.W, self.W.T), dtS), np.dot(self.T, self.T.T))
|
||||
+ self.lamb*dtS, (self.dim*self.feature_size, 1))
|
||||
at = np.dot(rt.T, rt)/np.dot(dt.T, Hdt)
|
||||
vecH = vecH + at*dt
|
||||
rtmp = rt
|
||||
rt = rt - at*Hdt
|
||||
bt = np.dot(rt.T, rt)/np.dot(rtmp.T, rtmp)
|
||||
dt = rt + bt * dt
|
||||
self.H = np.reshape(vecH, (self.dim, self.feature_size))
|
||||
self.Vecs = np.hstack((normalize(self.W.T), normalize(np.dot(self.T.T, self.H.T))))
|
||||
# get embeddings
|
||||
self.vectors = {}
|
||||
look_back = self.g.look_back_list
|
||||
for i, embedding in enumerate(self.Vecs):
|
||||
self.vectors[look_back[i]] = embedding
|
||||
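# For reference, the alternating updates above appear to minimize the TADW
# objective  min_{W,H} ||M - W^T H T||_F^2 + (lambda/2)(||W||_F^2 + ||H||_F^2),
# with M = (A + A^2)/2 and T the reduced text features; each inner while-loop
# is a conjugate-gradient solve for one factor with the other held fixed.
# A toy shape check (all dimensions below are made up for illustration):
import numpy as np
n, f, d = 6, 5, 4                             # nodes, text-feature dim, embedding dim
M, T = np.random.rand(n, n), np.random.rand(f, n)
W, H = np.random.randn(d, n), np.random.randn(d, f)
residual = M - W.T.dot(H).dot(T)              # [n, n] term whose Frobenius norm is minimized
embeddings = np.hstack([W.T, T.T.dot(H.T)])   # [n, 2d], same layout as self.Vecs above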
|
260
src/libnrl/utils.py
Normal file
@ -0,0 +1,260 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
import time  # needed by dim_reduction below
import numpy as np
|
||||
from scipy import sparse
|
||||
# from sklearn.model_selection import train_test_split
|
||||
|
||||
|
||||
'''
|
||||
#-----------------------------------------------------------------------------
|
||||
# Chengbin Hou @ SUSTech 2018
|
||||
# Email: Chengbin.Hou10@foxmail.com
|
||||
#-----------------------------------------------------------------------------
|
||||
'''
|
||||
|
||||
# ---------------------------------utils for calculation--------------------------------
|
||||
|
||||
|
||||
def row_as_probdist(mat):
|
||||
"""Make each row of matrix sums up to 1.0, i.e., a probability distribution.
|
||||
Support both dense and sparse matrix.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
mat : scipy sparse matrix or dense matrix or numpy array
|
||||
The matrix to be normalized
|
||||
|
||||
Note
|
||||
----
|
||||
For a row with all entries 0, we normalize it to the uniform vector with all entries 1/n
|
||||
|
||||
Returns
|
||||
-------
|
||||
dense or sparse matrix:
|
||||
return dense matrix if input is dense matrix or numpy array
|
||||
return sparse matrix for sparse matrix input
|
||||
"""
|
||||
row_sum = np.array(mat.sum(axis=1)) # type: np.array
|
||||
zero_rows = row_sum == 0
|
||||
row_sum[zero_rows] = 1
|
||||
diag = sparse.dia_matrix((1 / row_sum, 0), (mat.shape[0], mat.shape[0]))
|
||||
mat = diag.dot(mat)
|
||||
mat += sparse.bsr_matrix(zero_rows.astype(int)).T.dot(sparse.bsr_matrix(np.repeat(1 / mat.shape[1], mat.shape[1])))
|
||||
|
||||
return mat
|
||||
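# A minimal dense sketch of the contract documented above (not calling the
# function itself): rows with links become probability distributions and an
# all-zero row becomes the uniform vector with entries 1/n.
import numpy as np
A = np.array([[1., 1., 0.],
              [0., 0., 0.],
              [2., 0., 2.]])
row_sum = A.sum(axis=1, keepdims=True)
P = np.where(row_sum > 0, A / np.where(row_sum == 0, 1.0, row_sum), 1.0 / A.shape[1])
# P.sum(axis=1) -> [1., 1., 1.]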
|
||||
|
||||
def pairwise_similarity(mat, type='cosine'):
|
||||
# XXX: possible to integrate pairwise_similarity with top_k to enhance performance?
|
||||
if type == 'cosine': # supports sparse and dense mat
|
||||
from sklearn.metrics.pairwise import cosine_similarity
|
||||
result = cosine_similarity(mat, dense_output=True)
|
||||
elif type == 'jaccard':
|
||||
from sklearn.metrics import jaccard_similarity_score
|
||||
from sklearn.metrics.pairwise import pairwise_distances
|
||||
# n_jobs=-1 means using all CPU for parallel computing
|
||||
result = pairwise_distances(mat.todense(), metric=jaccard_similarity_score, n_jobs=-1)
|
||||
elif type == 'euclidean':
|
||||
from sklearn.metrics.pairwise import euclidean_distances
|
||||
# note: similarity = - distance
|
||||
# other version: similarity = 1 - 2 / pi * arctan(distance)
|
||||
result = euclidean_distances(mat)
|
||||
result = -result
|
||||
# result = 1 - 2 / np.pi * np.arctan(result)
|
||||
elif type == 'manhattan':
|
||||
from sklearn.metrics.pairwise import manhattan_distances
|
||||
# note: similarity = - distance
|
||||
# other version: similarity = 1 - 2 / pi * arctan(distance)
|
||||
result = manhattan_distances(mat)
|
||||
result = -result
|
||||
# result = 1 - 2 / np.pi * np.arctan(result)
|
||||
else:
|
||||
print('Please choose from: cosine, jaccard, euclidean or manhattan')
|
||||
return 'Not found!'
|
||||
return result
|
||||
|
||||
|
||||
# ---------------------------------utils for preprocessing--------------------------------
|
||||
def node_auxi_to_attr(fin, fout):
|
||||
""" TODO...
|
||||
-> read auxi info associated with each node;
|
||||
-> preprocessing auxi via:
|
||||
1) NLP for sentences; or 2) one-hot for discrete features;
|
||||
-> then becomes node attr with m dim, and store them into attr file
|
||||
"""
|
||||
# https://radimrehurek.com/gensim/apiref.html
|
||||
# word2vec, doc2vec: convert sentences to vectors
|
||||
# text2vec, tfidf: convert discrete features to vectors
|
||||
pass
|
||||
|
||||
|
||||
def simulate_incomplete_stru():
|
||||
pass
|
||||
|
||||
|
||||
def simulate_incomplete_attr():
|
||||
pass
|
||||
|
||||
|
||||
def simulate_noisy_world():
|
||||
pass
|
||||
|
||||
# ---------------------------------utils for downstream tasks--------------------------------
|
||||
# XXX: read and save using panda or numpy
|
||||
|
||||
|
||||
def read_edge_label_downstream(filename):
|
||||
fin = open(filename, 'r')
|
||||
X = []
|
||||
Y = []
|
||||
while 1:
|
||||
line = fin.readline()
|
||||
if line == '':
|
||||
break
|
||||
vec = line.strip().split(' ')
|
||||
X.append(vec[:2])
|
||||
Y.append(vec[2])
|
||||
fin.close()
|
||||
return X, Y
|
||||
|
||||
|
||||
def read_node_label_downstream(filename):
|
||||
""" may be used in node classification task;
|
||||
part of the labels are used to train the classifier and
|
||||
the remaining labels serve as ground truth;
|
||||
note: similar method can be found in graph.py -> read_node_label
|
||||
"""
|
||||
fin = open(filename, 'r')
|
||||
X = []
|
||||
Y = []
|
||||
while 1:
|
||||
line = fin.readline()
|
||||
if line == '':
|
||||
break
|
||||
vec = line.strip().split(' ')
|
||||
X.append(vec[0])
|
||||
Y.append(vec[1:])
|
||||
fin.close()
|
||||
return X, Y
|
||||
|
||||
|
||||
def store_embedddings(vectors, filename, dim):
|
||||
""" store embeddings to file
|
||||
"""
|
||||
fout = open(filename, 'w')
|
||||
num_nodes = len(vectors.keys())
|
||||
fout.write("{} {}\n".format(num_nodes, dim))
|
||||
for node, vec in vectors.items():
|
||||
fout.write("{} {}\n".format(node, ' '.join([str(x) for x in vec])))
|
||||
fout.close()
|
||||
print('store the resulting embeddings in file: ', filename)
|
||||
|
||||
|
||||
def load_embeddings(filename):
|
||||
""" load embeddings from file
|
||||
"""
|
||||
fin = open(filename, 'r')
|
||||
num_nodes, size = [int(x) for x in fin.readline().strip().split()]
|
||||
vectors = {}
|
||||
while 1:
|
||||
line = fin.readline()
|
||||
if line == '':
|
||||
break
|
||||
vec = line.strip().split(' ')
|
||||
assert len(vec) == size + 1
|
||||
vectors[vec[0]] = [float(x) for x in vec[1:]]
|
||||
fin.close()
|
||||
assert len(vectors) == num_nodes
|
||||
return vectors
|
||||
|
||||
|
||||
#----------------- TODO: tidy the following into utils; the problematic parts are flagged in the comments below, the rest is fine for now -----------------------
|
||||
def generate_edges_for_linkpred(graph, edges_removed, balance_ratio=1.0):
|
||||
''' given a graph and edges_removed;
|
||||
generate non_edges not in [both graph and edges_removed];
|
||||
return all_test_samples including [edges_removed (pos samples), non_edges (neg samples)];
|
||||
return format X=[[1,2],[2,4],...] Y=[1,0,...] where Y tells where corresponding element has a edge
|
||||
'''
|
||||
g = graph
|
||||
num_edges_removed = len(edges_removed)
|
||||
num_non_edges = int(balance_ratio * num_edges_removed)
|
||||
num = 0
|
||||
#np.random.seed(2018)
|
||||
non_edges = []
|
||||
exist_edges = list(g.G.edges())+list(edges_removed)
|
||||
while num < num_non_edges:
|
||||
non_edge = list(np.random.choice(g.look_back_list, size=2, replace=False))
|
||||
if non_edge not in exist_edges:
|
||||
num += 1
|
||||
non_edges.append(non_edge)
|
||||
|
||||
test_node_pairs = edges_removed + non_edges
|
||||
test_edge_labels = list(np.ones(num_edges_removed)) + list(np.zeros(num_non_edges))
|
||||
return test_node_pairs, test_edge_labels
|
||||
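# The same sampling idea in plain networkx, as a hedged self-contained sketch
# (karate_club_graph and the 1:1 positive/negative ratio are illustrative only):
import random
import networkx as nx
G = nx.karate_club_graph()
edges_removed = list(G.edges())[:10]          # pretend these were held out as positives
G.remove_edges_from(edges_removed)
nodes, non_edges = list(G.nodes()), []
while len(non_edges) < len(edges_removed):    # balance_ratio = 1.0
    u, v = random.sample(nodes, 2)
    if not G.has_edge(u, v) and (u, v) not in edges_removed and (v, u) not in edges_removed:
        non_edges.append((u, v))
test_node_pairs = edges_removed + non_edges
test_edge_labels = [1.0] * len(edges_removed) + [0.0] * len(non_edges)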
|
||||
|
||||
def dim_reduction(mat, dim=128, method='pca'):
|
||||
''' dimensionality reduction: PCA, SVD, etc...
|
||||
dim = # of columns
|
||||
'''
|
||||
print('START dimensionality reduction using ' + method + ' ......')
|
||||
t1 = time.time()
|
||||
if method == 'pca':
|
||||
from sklearn.decomposition import PCA
|
||||
pca = PCA(n_components=dim, svd_solver='auto', random_state=None)
|
||||
mat_reduced = pca.fit_transform(mat) #sklearn pca auto remove mean, no need to preprocess
|
||||
elif method == 'svd':
|
||||
from sklearn.decomposition import TruncatedSVD
|
||||
svd = TruncatedSVD(n_components=dim, n_iter=5, random_state=None)
|
||||
mat_reduced = svd.fit_transform(mat)
|
||||
else: #to do... more methods... e.g. random projection, ica, t-sne...
|
||||
print('dimensionality reduction method not found......')
|
||||
t2 = time.time()
|
||||
print('END dimensionality reduction: {:.2f}s'.format(t2-t1))
|
||||
return mat_reduced
|
||||
|
||||
|
||||
def row_normalized(mat, is_transition_matrix=False):
|
||||
''' to do...
|
||||
Known issues: 1) the sparse matrix is slower than the dense one in this scenario (at least in this implementation);
|
||||
2) after testing, the dense-matrix row sums are not exactly 1, so the old naive correction below still seems necessary;
|
||||
3) when is_transition_matrix is True, all-zero rows must be assigned values; this is slightly awkward for sparse matrices since mat[i, :] = p cannot be assigned directly
|
||||
'''
|
||||
p = 1.0/mat.shape[0] #probability = 1/num of rows
|
||||
norms = np.asarray(mat.sum(axis=1)).ravel()
|
||||
for i, norm in enumerate(norms):
|
||||
if norm != 0:
|
||||
mat[i, :] /= norm
|
||||
else:
|
||||
if is_transition_matrix:
|
||||
mat[i, :] = p #every row of transition matrix should sum up to 1
|
||||
else:
|
||||
pass #do nothing; keep all-zero row
|
||||
return mat
|
||||
|
||||
''' the naive method follows '''
|
||||
def rowAsPDF(mat): #make each row sum up to 1, i.e., a probability distribution
|
||||
mat = np.array(mat)
|
||||
for i in range(mat.shape[0]):
|
||||
sum_row = mat[i,:].sum()
|
||||
if sum_row !=0:
|
||||
mat[i,:] = mat[i,:]/sum_row #e.g. a row [0, 1, 1, 1] -> [0, 1/3, 1/3, 1/3]; may leave a small floating-point error...
|
||||
else:
|
||||
# to do...
|
||||
# for node without any link... remain row as [0, 0, 0, 0] OR set to [1/n, 1/n, 1/n...]??
|
||||
pass
|
||||
if mat[i,:].sum() != 1.00: #naive trick to make sure each row sums exactly to 1
|
||||
error = 1.00 - mat[i,:].sum()
|
||||
mat[i,-1] += error
|
||||
return mat
|
||||
|
||||
|
||||
|
||||
def sparse_to_dense():
|
||||
''' TODO: convert to dense np.matrix format; remember to use dtype=float64 '''
|
||||
import scipy.sparse as sp
|
||||
pass
|
||||
|
||||
def dense_to_sparse():
|
||||
''' TODO: convert to sparse CSR format; remember to use dtype=float64 '''
|
||||
import scipy.sparse as sp
|
||||
pass
|
327
src/libnrl/walker.py
Normal file
@ -0,0 +1,327 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
from __future__ import print_function
|
||||
|
||||
import multiprocessing
|
||||
import random
|
||||
import time
|
||||
from itertools import chain
|
||||
|
||||
import numpy as np
|
||||
import networkx as nx  # 'from networkx import nx' is not portable across networkx versions
|
||||
|
||||
|
||||
'''
|
||||
#-----------------------------------------------------------------------------
|
||||
# part of code was originally forked from https://github.com/thunlp/OpenNE
|
||||
# modified by Chengbin Hou @ SUSTech 2018
|
||||
# Email: Chengbin.Hou10@foxmail.com
|
||||
# ***class BiasedWalker was created by Chengbin Hou
|
||||
# ***we realize two ways to do ABRW
|
||||
# 1) naive sampling (also multi-processor version)
|
||||
# 2) alias sampling (similar to node2vec)
|
||||
#-----------------------------------------------------------------------------
|
||||
'''
|
||||
|
||||
|
||||
def deepwalk_walk_wrapper(class_instance, walk_length, start_node):
|
||||
class_instance.deepwalk_walk(walk_length, start_node)
|
||||
|
||||
|
||||
# ===========================================ABRW-weighted-walker============================================
|
||||
class BiasedWalker: # ------ our method
|
||||
def __init__(self, g, P, workers):
|
||||
self.G = g.G # nx data structure
|
||||
self.P = P # biased transition probability; n*n; each row is a pdf for a node
|
||||
self.workers = workers
|
||||
self.node_size = g.node_size
|
||||
self.look_back_list = g.look_back_list
|
||||
self.look_up_dict = g.look_up_dict
|
||||
|
||||
# alias sampling for ABRW-------------------------------------------------------------------
|
||||
def simulate_walks(self, num_walks, walk_length):
|
||||
self.P_G = nx.to_networkx_graph(self.P, create_using=nx.DiGraph()) # create a new nx graph based on ABRW transition prob matrix
|
||||
t1 = time.time()
|
||||
self.preprocess_transition_probs() # note: we simply adapt node2vec
|
||||
t2 = time.time()
|
||||
print('Time for construct alias table: {:.2f}'.format(t2-t1))
|
||||
walks = []
|
||||
nodes = list(self.P_G.nodes())
|
||||
print('Walk iteration:')
|
||||
for walk_iter in range(num_walks):
|
||||
print(str(walk_iter+1), '/', str(num_walks))
|
||||
random.shuffle(nodes)
|
||||
for node in nodes:
|
||||
walks.append(self.node2vec_walk(walk_length=walk_length, start_node=node))
|
||||
|
||||
for i in range(len(walks)): # use the index to retrieve the original node ID
|
||||
for j in range(len(walks[0])):
|
||||
walks[i][j] = self.look_back_list[int(walks[i][j])]
|
||||
return walks
|
||||
|
||||
def node2vec_walk(self, walk_length, start_node): # to do...
|
||||
G = self.P_G # more efficient way instead of copy from node2vec
|
||||
alias_nodes = self.alias_nodes
|
||||
walk = [start_node]
|
||||
while len(walk) < walk_length:
|
||||
cur = walk[-1]
|
||||
cur_nbrs = list(G.neighbors(cur))
|
||||
if len(cur_nbrs) > 0:
|
||||
walk.append(cur_nbrs[alias_draw(alias_nodes[cur][0], alias_nodes[cur][1])])
|
||||
else:
|
||||
break
|
||||
return walk
|
||||
|
||||
def preprocess_transition_probs(self):
|
||||
G = self.P_G
|
||||
alias_nodes = {}
|
||||
for node in G.nodes():
|
||||
unnormalized_probs = [G[node][nbr]['weight'] for nbr in G.neighbors(node)]
|
||||
norm_const = sum(unnormalized_probs)
|
||||
normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
|
||||
alias_nodes[node] = alias_setup(normalized_probs)
|
||||
self.alias_nodes = alias_nodes
|
||||
|
||||
|
||||
'''
|
||||
#naive sampling for ABRW-------------------------------------------------------------------
|
||||
def weighted_walk(self, start_node):
|
||||
#
|
||||
#Simulate a weighted walk starting from start node.
|
||||
#
|
||||
G = self.G
|
||||
look_up_dict = self.look_up_dict
|
||||
look_back_list = self.look_back_list
|
||||
node_size = self.node_size
|
||||
walk = [start_node]
|
||||
|
||||
while len(walk) < self.walk_length:
|
||||
cur_node = walk[-1] #the last one entry/node
|
||||
cur_ind = look_up_dict[cur_node] #key -> index
|
||||
pdf = self.P[cur_ind,:] #the pdf of node with ind
|
||||
#pdf = np.random.randn(18163)+10 #......test multiprocessor
|
||||
#pdf = pdf / pdf.sum() #......test multiprocessor
|
||||
#next_ind = int( np.array( nx.utils.random_sequence.discrete_sequence(n=1,distribution=pdf) ) )
|
||||
next_ind = np.random.choice(len(pdf), 1, p=pdf)[0] #faster than nx
|
||||
#next_ind = 0 #......test multiprocessor
|
||||
next_node = look_back_list[next_ind] #index -> key
|
||||
walk.append(next_node)
|
||||
return walk
|
||||
|
||||
def simulate_walks(self, num_walks, walk_length):
|
||||
#
|
||||
#Repeatedly simulate weighted walks from each node.
|
||||
#
|
||||
G = self.G
|
||||
self.num_walks = num_walks
|
||||
self.walk_length = walk_length
|
||||
self.walks = [] #what we all need later as input to skip-gram
|
||||
nodes = list(G.nodes())
|
||||
|
||||
print('Walk iteration:')
|
||||
for walk_iter in range(num_walks):
|
||||
t1 = time.time()
|
||||
random.shuffle(nodes)
|
||||
for node in nodes: #for single cpu, if # of nodes < 2000 (speed up) or nodes > 20000 (avoid memory error)
|
||||
self.walks.append(self.weighted_walk(node)) #for single cpu, if # of nodes < 2000 (speed up) or nodes > 20000 (avoid memory error)
|
||||
#pool = multiprocessing.Pool(processes=3) #use all cpu by defalut or specify processes = xx
|
||||
#self.walks.append(pool.map(self.weighted_walk, nodes)) #ref: https://stackoverflow.com/questions/8533318/multiprocessing-pool-when-to-use-apply-apply-async-or-map
|
||||
#pool.close()
|
||||
#pool.join()
|
||||
t2 = time.time()
|
||||
print(str(walk_iter+1), '/', str(num_walks), ' each itr last for: {:.2f}s'.format(t2-t1))
|
||||
#self.walks = list(chain.from_iterable(self.walks)) #unlist...[[[x,x],[x,x]]] -> [x,x], [x,x]
|
||||
return self.walks
|
||||
'''
|
||||
|
||||
|
||||
# ===========================================deepWalk-walker============================================
|
||||
class BasicWalker:
|
||||
def __init__(self, G, workers):
|
||||
self.G = G.G
|
||||
self.node_size = G.get_num_nodes()
|
||||
self.look_up_dict = G.look_up_dict
|
||||
|
||||
def deepwalk_walk(self, walk_length, start_node):
|
||||
'''
|
||||
Simulate a random walk starting from start node.
|
||||
'''
|
||||
G = self.G
|
||||
look_up_dict = self.look_up_dict
|
||||
node_size = self.node_size
|
||||
|
||||
walk = [start_node]
|
||||
|
||||
while len(walk) < walk_length:
|
||||
cur = walk[-1]
|
||||
cur_nbrs = list(G.neighbors(cur))
|
||||
if len(cur_nbrs) > 0:
|
||||
walk.append(random.choice(cur_nbrs))
|
||||
else:
|
||||
break
|
||||
return walk
|
||||
|
||||
def simulate_walks(self, num_walks, walk_length):
|
||||
'''
|
||||
Repeatedly simulate random walks from each node.
|
||||
'''
|
||||
G = self.G
|
||||
walks = []
|
||||
nodes = list(G.nodes())
|
||||
print('Walk iteration:')
|
||||
for walk_iter in range(num_walks):
|
||||
# pool = multiprocessing.Pool(processes = 4)
|
||||
print(str(walk_iter+1), '/', str(num_walks))
|
||||
random.shuffle(nodes)
|
||||
for node in nodes:
|
||||
# walks.append(pool.apply_async(deepwalk_walk_wrapper, (self, walk_length, node, )))
|
||||
walks.append(self.deepwalk_walk(walk_length=walk_length, start_node=node))
|
||||
# pool.close()
|
||||
# pool.join()
|
||||
# print(len(walks))
|
||||
return walks
|
||||
|
||||
|
||||
# ===========================================node2vec-walker============================================
|
||||
class Walker:
|
||||
def __init__(self, G, p, q, workers):
|
||||
self.G = G.G
|
||||
self.p = p
|
||||
self.q = q
|
||||
self.node_size = G.node_size
|
||||
self.look_up_dict = G.look_up_dict
|
||||
|
||||
def node2vec_walk(self, walk_length, start_node):
|
||||
'''
|
||||
Simulate a random walk starting from start node.
|
||||
'''
|
||||
G = self.G
|
||||
alias_nodes = self.alias_nodes
|
||||
alias_edges = self.alias_edges
|
||||
look_up_dict = self.look_up_dict
|
||||
node_size = self.node_size
|
||||
|
||||
walk = [start_node]
|
||||
|
||||
while len(walk) < walk_length:
|
||||
cur = walk[-1]
|
||||
cur_nbrs = list(G.neighbors(cur))
|
||||
if len(cur_nbrs) > 0:
|
||||
if len(walk) == 1:
|
||||
walk.append(cur_nbrs[alias_draw(alias_nodes[cur][0], alias_nodes[cur][1])])
|
||||
else:
|
||||
prev = walk[-2]
|
||||
pos = (prev, cur)
|
||||
next = cur_nbrs[alias_draw(alias_edges[pos][0],
|
||||
alias_edges[pos][1])]
|
||||
walk.append(next)
|
||||
else:
|
||||
break
|
||||
return walk
|
||||
|
||||
def simulate_walks(self, num_walks, walk_length):
|
||||
'''
|
||||
Repeatedly simulate random walks from each node.
|
||||
'''
|
||||
G = self.G
|
||||
walks = []
|
||||
nodes = list(G.nodes())
|
||||
print('Walk iteration:')
|
||||
for walk_iter in range(num_walks):
|
||||
print(str(walk_iter+1), '/', str(num_walks))
|
||||
random.shuffle(nodes)
|
||||
for node in nodes:
|
||||
walks.append(self.node2vec_walk(walk_length=walk_length, start_node=node))
|
||||
return walks
|
||||
|
||||
def get_alias_edge(self, src, dst):
|
||||
'''
|
||||
Get the alias edge setup lists for a given edge.
|
||||
'''
|
||||
G = self.G
|
||||
p = self.p
|
||||
q = self.q
|
||||
|
||||
unnormalized_probs = []
|
||||
for dst_nbr in G.neighbors(dst):
|
||||
if dst_nbr == src:
|
||||
unnormalized_probs.append(G[dst][dst_nbr]['weight']/p)
|
||||
elif G.has_edge(dst_nbr, src):
|
||||
unnormalized_probs.append(G[dst][dst_nbr]['weight'])
|
||||
else:
|
||||
unnormalized_probs.append(G[dst][dst_nbr]['weight']/q)
|
||||
norm_const = sum(unnormalized_probs)
|
||||
normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
|
||||
|
||||
return alias_setup(normalized_probs)
|
||||
|
||||
def preprocess_transition_probs(self):
|
||||
'''
|
||||
Preprocessing of transition probabilities for guiding the random walks.
|
||||
'''
|
||||
G = self.G
|
||||
|
||||
alias_nodes = {}
|
||||
for node in G.nodes():
|
||||
unnormalized_probs = [G[node][nbr]['weight'] for nbr in G.neighbors(node)]
|
||||
norm_const = sum(unnormalized_probs)
|
||||
normalized_probs = [float(u_prob)/norm_const for u_prob in unnormalized_probs]
|
||||
alias_nodes[node] = alias_setup(normalized_probs)
|
||||
|
||||
alias_edges = {}
|
||||
triads = {}
|
||||
|
||||
look_up_dict = self.look_up_dict
|
||||
node_size = self.node_size
|
||||
for edge in G.edges():
|
||||
alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
|
||||
|
||||
self.alias_nodes = alias_nodes
|
||||
self.alias_edges = alias_edges
|
||||
|
||||
return
|
||||
|
||||
|
||||
def alias_setup(probs):
|
||||
'''
|
||||
Compute utility lists for non-uniform sampling from discrete distributions.
|
||||
Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/
|
||||
for details
|
||||
'''
|
||||
K = len(probs)
|
||||
q = np.zeros(K, dtype=np.float32)
|
||||
J = np.zeros(K, dtype=np.int32)
|
||||
|
||||
smaller = []
|
||||
larger = []
|
||||
for kk, prob in enumerate(probs):
|
||||
q[kk] = K*prob
|
||||
if q[kk] < 1.0:
|
||||
smaller.append(kk)
|
||||
else:
|
||||
larger.append(kk)
|
||||
|
||||
while len(smaller) > 0 and len(larger) > 0:
|
||||
small = smaller.pop()
|
||||
large = larger.pop()
|
||||
|
||||
J[small] = large
|
||||
q[large] = q[large] + q[small] - 1.0
|
||||
if q[large] < 1.0:
|
||||
smaller.append(large)
|
||||
else:
|
||||
larger.append(large)
|
||||
|
||||
return J, q
|
||||
|
||||
|
||||
def alias_draw(J, q):
|
||||
'''
|
||||
Draw sample from a non-uniform discrete distribution using alias sampling.
|
||||
'''
|
||||
K = len(J)
|
||||
|
||||
kk = int(np.floor(np.random.rand()*K))
|
||||
if np.random.rand() < q[kk]:
|
||||
return kk
|
||||
else:
|
||||
return J[kk]
|
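For a quick sanity check of the alias_setup / alias_draw pair above, a small usage sketch (illustrative only; the import path assumes this module lives at src/libnrl/walker.py as in this commit): drawing many samples from a short pdf should roughly reproduce its probabilities.

    from collections import Counter
    # from libnrl.walker import alias_setup, alias_draw  # assumed import path

    probs = [0.1, 0.2, 0.7]                              # any discrete pdf over K outcomes
    J, q = alias_setup(probs)                            # O(K) preprocessing
    samples = [alias_draw(J, q) for _ in range(100000)]  # O(1) per draw
    print(Counter(samples))                              # expect roughly 10% / 20% / 70%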
258
src/main.py
Normal file
@ -0,0 +1,258 @@
'''
demo of using (attributed) Network Embedding methods;
STEP1: load data -->
STEP2: prepare data -->
STEP3: learn node embeddings -->
STEP4: downstream evaluations

python src/main.py --method abrw --save-emb True

by Chengbin Hou 2018 <chengbin.hou10@foxmail.com>
'''

import time
import random
import numpy as np
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
from sklearn.linear_model import LogisticRegression  # to do... 1) put it in downstream.py; and 2) try SVM...
from libnrl.classify import ncClassifier, lpClassifier, read_node_label
from libnrl.graph import *
from libnrl.utils import *
from libnrl import abrw  # ANE method; Attributed Biased Random Walk
from libnrl import tadw  # ANE method
from libnrl import aane  # ANE method
from libnrl import asne  # ANE method
from libnrl.gcn import gcnAPI  # ANE method
from libnrl.graphsage import graphsageAPI  # ANE method
from libnrl import attrcomb  # ANE method
from libnrl import attrpure  # NE method that simply uses SVD or PCA for dim reduction
from libnrl import node2vec  # PNE method; includes deepwalk and node2vec
from libnrl import line  # PNE method
from libnrl.grarep import GraRep  # PNE method
# from libnrl import TriDNR  # to do... ANE method
# https://github.com/dfdazac/dgi  # to do... ANE method


def parse_args():
    parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter, conflict_handler='resolve')
    #-----------------------------------------------general settings--------------------------------------------------
    parser.add_argument('--graph-format', default='adjlist', choices=['adjlist', 'edgelist'],
                        help='graph/network format')
    parser.add_argument('--graph-file', default='data/cora/cora_adjlist.txt',
                        help='graph/network file')
    parser.add_argument('--attribute-file', default='data/cora/cora_attr.txt',
                        help='node attribute/feature file')
    parser.add_argument('--label-file', default='data/cora/cora_label.txt',
                        help='node label file')
    parser.add_argument('--emb-file', default='emb/unnamed_node_embs.txt',
                        help='node embeddings file; suggested naming: data_method_dim_embs.txt')
    parser.add_argument('--save-emb', default=False, type=bool,
                        help='save emb to disk if True')
    parser.add_argument('--dim', default=128, type=int,
                        help='node embeddings dimensions')
    parser.add_argument('--task', default='lp_and_nc', choices=['none', 'lp', 'nc', 'lp_and_nc'],
                        help='choices of downstream tasks: none, lp, nc, lp_and_nc')
    parser.add_argument('--link-remove', default=0.1, type=float,
                        help='simulate randomly missing links if necessary; a ratio in [0.0, 1.0]')
    # parser.add_argument('--attr-remove', default=0.0, type=float,
    #                     help='simulate randomly missing attributes if necessary; a ratio in [0.0, 1.0]')
    # parser.add_argument('--link-reserved', default=0.7, type=float,
    #                     help='for lp task, train/test split; a ratio in [0.0, 1.0]')
    parser.add_argument('--label-reserved', default=0.7, type=float,
                        help='for nc task, train/test split; a ratio in [0.0, 1.0]')
    parser.add_argument('--directed', default=False, type=bool,
                        help='directed or undirected graph')
    parser.add_argument('--weighted', default=False, type=bool,
                        help='weighted or unweighted graph')
    #-------------------------------------------------method settings-----------------------------------------------------------
    parser.add_argument('--method', default='abrw', choices=['node2vec', 'deepwalk', 'line', 'gcn', 'grarep', 'tadw',
                                                             'abrw', 'asne', 'aane', 'attrpure', 'attrcomb', 'graphsage'],
                        help='choices of Network Embedding methods')
    parser.add_argument('--ABRW-topk', default=30, type=int,
                        help='select the top k most attribute-similar nodes of a node; ranging [0, # of nodes]')
    parser.add_argument('--ABRW-alpha', default=0.8, type=float,
                        help='balance struc and attr info; ranging [0, 1]')
    parser.add_argument('--TADW-lamb', default=0.2, type=float,
                        help='balance struc and attr info; ranging [0, inf]')
    parser.add_argument('--AANE-lamb', default=0.05, type=float,
                        help='balance struc and attr info; ranging [0, inf]')
    parser.add_argument('--AANE-rho', default=5, type=float,
                        help='penalty parameter; ranging [0, inf]')
    parser.add_argument('--AANE-mode', default='comb', type=str,
                        help='choices of mode: comb, pure')
    parser.add_argument('--ASNE-lamb', default=1.0, type=float,
                        help='balance struc and attr info; ranging [0, inf]')
    parser.add_argument('--AttrComb-mode', default='concat', type=str,
                        help='choices of mode: concat, elementwise-mean, elementwise-max')
    parser.add_argument('--Node2Vec-p', default=0.5, type=float,
                        help='trade-off between BFS and DFS; grid search [0.25; 0.50; 1; 2; 4]')
    parser.add_argument('--Node2Vec-q', default=0.5, type=float,
                        help='trade-off between BFS and DFS; grid search [0.25; 0.50; 1; 2; 4]')
    parser.add_argument('--GraRep-kstep', default=4, type=int,
                        help='use k-step transition probability matrix')
    parser.add_argument('--LINE-order', default=3, type=int,
                        help='choices of the order(s): 1 for 1st order, 2 for 2nd order, 3 for 1st+2nd order')
    parser.add_argument('--LINE-no-auto-save', action='store_true',
                        help='do not save the best embeddings when training LINE')
    parser.add_argument('--LINE-negative-ratio', default=5, type=int,
                        help='the negative sampling ratio')
    #for walk based methods; some Word2Vec SkipGram parameters are not specified here
    parser.add_argument('--number-walks', default=10, type=int,
                        help='# of random walks per node')
    parser.add_argument('--walk-length', default=80, type=int,
                        help='length of each random walk')
    parser.add_argument('--window-size', default=10, type=int,
                        help='window size of the skipgram model')
    parser.add_argument('--workers', default=24, type=int,
                        help='# of parallel processes')
    #for deep learning based methods; parameters about layers and neurons used are not specified here
    parser.add_argument('--learning-rate', default=0.001, type=float,
                        help='learning rate')
    parser.add_argument('--batch-size', default=128, type=int,
                        help='batch size')
    parser.add_argument('--epochs', default=100, type=int,
                        help='epochs')
    parser.add_argument('--dropout', default=0.5, type=float,
                        help='dropout rate (1 - keep probability)')
    parser.add_argument('--weight-decay', type=float, default=0.0001,
                        help='weight for L2 loss on embedding matrix')
    args = parser.parse_args()
    return args


def main(args):
    g = Graph()  # see graph.py for commonly-used APIs; use g.G to access NetworkX APIs
    print('\nSummary of all settings: ', args)

    #---------------------------------------STEP1: load data-----------------------------------------------------
    print('\nSTEP1: start loading data......')
    t1 = time.time()
    #load graph structure info------
    if args.graph_format == 'adjlist':
        g.read_adjlist(path=args.graph_file, directed=args.directed)
    elif args.graph_format == 'edgelist':
        g.read_edgelist(path=args.graph_file, weighted=args.weighted, directed=args.directed)
    #load node attribute info------
    is_ane = (args.method == 'abrw' or args.method == 'tadw' or args.method == 'gcn' or args.method == 'graphsage' or
              args.method == 'attrpure' or args.method == 'attrcomb' or args.method == 'asne' or args.method == 'aane')
    if is_ane:
        assert args.attribute_file != ''
        g.read_node_attr(args.attribute_file)
    #load node label info------
    #to do... similar to attributes {'key_attribute': value}, labels could also be loaded as {'key_label': value}
    t2 = time.time()
    print('STEP1: end loading data; time cost: {:.2f}s'.format(t2-t1))

    #---------------------------------------STEP2: prepare data----------------------------------------------------
    print('\nSTEP2: start preparing data for link pred task......')
    t1 = time.time()
    test_node_pairs = []
    test_edge_labels = []
    if args.task == 'lp' or args.task == 'lp_and_nc':
        edges_removed = g.remove_edge(ratio=args.link_remove)
        test_node_pairs, test_edge_labels = generate_edges_for_linkpred(graph=g, edges_removed=edges_removed, balance_ratio=1.0)
    t2 = time.time()
    print('STEP2: end preparing data; time cost: {:.2f}s'.format(t2-t1))

    #-----------------------------------STEP3: upstream embedding task-------------------------------------------------
    print('\nSTEP3: start learning embeddings......')
    print('the graph: ', args.graph_file, '\nthe # of nodes: ', g.get_num_nodes(), '\nthe # of edges used during embedding (edges may be removed for the lp task): ', g.get_num_edges(),
          '\nthe # of isolated nodes: ', g.get_num_isolates(), '\nis directed graph: ', g.get_isdirected(), '\nthe model used: ', args.method)
    t1 = time.time()
    model = None
    if args.method == 'abrw':
        model = abrw.ABRW(graph=g, dim=args.dim, alpha=args.ABRW_alpha, topk=args.ABRW_topk, num_paths=args.number_walks,
                          path_length=args.walk_length, workers=args.workers, window=args.window_size)
    elif args.method == 'attrpure':
        model = attrpure.ATTRPURE(graph=g, dim=args.dim)
    elif args.method == 'attrcomb':
        model = attrcomb.ATTRCOMB(graph=g, dim=args.dim, comb_with='deepwalk',
                                  num_paths=args.number_walks, comb_method=args.AttrComb_mode)  # concat, elementwise-mean, elementwise-max
    elif args.method == 'asne':
        # note: args.link_reserved comes from a flag that is currently commented out in parse_args(),
        # and X_test_lp / Y_test_lp below are not defined in this script; adjust before using this method
        if args.task == 'nc':
            model = asne.ASNE(graph=g, dim=args.dim, alpha=args.ASNE_lamb, epoch=args.epochs, learning_rate=args.learning_rate, batch_size=args.batch_size,
                              X_test=None, Y_test=None, task=args.task, nc_ratio=args.label_reserved, lp_ratio=args.link_reserved, label_file=args.label_file)
        else:
            model = asne.ASNE(graph=g, dim=args.dim, alpha=args.ASNE_lamb, epoch=args.epochs, learning_rate=args.learning_rate, batch_size=args.batch_size,
                              X_test=X_test_lp, Y_test=Y_test_lp, task=args.task, nc_ratio=args.label_reserved, lp_ratio=args.link_reserved, label_file=args.label_file)
    elif args.method == 'aane':
        model = aane.AANE(graph=g, dim=args.dim, lambd=args.AANE_lamb, mode=args.AANE_mode)
    elif args.method == 'tadw':
        model = tadw.TADW(graph=g, dim=args.dim, lamb=args.TADW_lamb)
    elif args.method == 'deepwalk':
        model = node2vec.Node2vec(graph=g, path_length=args.walk_length,
                                  num_paths=args.number_walks, dim=args.dim,
                                  workers=args.workers, window=args.window_size, dw=True)
    elif args.method == 'node2vec':
        model = node2vec.Node2vec(graph=g, path_length=args.walk_length, num_paths=args.number_walks, dim=args.dim,
                                  workers=args.workers, p=args.Node2Vec_p, q=args.Node2Vec_q, window=args.window_size)
    elif args.method == 'grarep':
        model = GraRep(graph=g, Kstep=args.GraRep_kstep, dim=args.dim)
    elif args.method == 'line':
        if args.label_file and not args.LINE_no_auto_save:
            model = line.LINE(g, epoch=args.epochs, rep_size=args.dim, order=args.LINE_order,
                              label_file=args.label_file, clf_ratio=args.label_reserved)
        else:
            model = line.LINE(g, epoch=args.epochs, rep_size=args.dim, order=args.LINE_order)
    elif args.method == 'graphsage':
        model = graphsageAPI.graphsage_unsupervised_train(graph=g, graphsage_model='graphsage_mean')
        #we follow the default parameters, see __init__.py in the graphsage folder
        #choices: graphsage_mean, gcn ......
        #model.save_embeddings(args.emb_file)  #to do...
    elif args.method == 'gcn':
        model = graphsageAPI.graphsage_unsupervised_train(graph=g, graphsage_model='gcn')  # graphsage-gcn
    else:
        print('no method was found...')
        exit(0)
    '''
    elif args.method == 'gcn':  # OR use graphsage-gcn as in the graphsage method...
        assert args.label_file != ''    # must have node labels
        assert args.feature_file != ''  # different from previous ANE methods
        g.read_node_label(args.label_file)  # gcn is an end-to-end supervised ANE method
        model = gcnAPI.GCN(graph=g, dropout=args.dropout,
                           weight_decay=args.weight_decay, hidden1=args.hidden,
                           epochs=args.epochs, clf_ratio=args.label_reserved)
        # gcn does not have a model.save_embeddings() func
    '''
    if args.save_emb:
        model.save_embeddings(args.emb_file + time.strftime(' %Y%m%d-%H%M%S', time.localtime()))
        print('Saved node embeddings to file: ', args.emb_file)
    t2 = time.time()
    print('STEP3: end learning embeddings; time cost: {:.2f}s'.format(t2-t1))

    #---------------------------------------STEP4: downstream task-----------------------------------------------
    print('\nSTEP4: start evaluating ......: ')
    print('nc for node classification task; lp for link prediction task', args.task)
    t1 = time.time()
    if args.method != 'semi_supervised_gcn':  # except for semi-supervised methods, we first obtain emb and then eval emb
        vectors = 0
        if args.method == 'graphsage' or args.method == 'gcn':  # to do... run without this 'if'
            vectors = model
        else:
            vectors = model.vectors  # for other methods....
        del model, g
        #------lp task
        if args.task == 'lp' or args.task == 'lp_and_nc':
            #X_test_lp, Y_test_lp = read_edge_label(args.label_file)  # enable this if you want to load your own lp testing data, see classify.py
            print('During embedding we removed {:.2f}% of the links; the removed links are left for lp evaluation...'.format(args.link_remove*100))
            clf = lpClassifier(vectors=vectors)  # similarity/distance metric as clf; basically, lp is a binary clf problem
            clf.evaluate(test_node_pairs, test_edge_labels)
        #------nc task
        if args.task == 'nc' or args.task == 'lp_and_nc':
            X, Y = read_node_label(args.label_file)
            print('Training nc classifier using {:.2f}% node labels...'.format(args.label_reserved*100))
            clf = ncClassifier(vectors=vectors, clf=LogisticRegression())  # use Logistic Regression as clf; we may choose SVM or more advanced ones
            clf.split_train_evaluate(X, Y, args.label_reserved)
    t2 = time.time()
    print('STEP4: end evaluating; time cost: {:.2f}s'.format(t2-t1))


if __name__ == '__main__':
    #random.seed(2018)
    #np.random.seed(2018)
    main(parse_args())
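For reference, two example invocations that only use flags defined in the parser above; the Cora paths are the parser defaults, and the emb-file name just follows the naming suggested in the help text:

    python src/main.py --method deepwalk --task nc --graph-format adjlist \
        --graph-file data/cora/cora_adjlist.txt --label-file data/cora/cora_label.txt

    python src/main.py --method abrw --ABRW-topk 30 --ABRW-alpha 0.8 \
        --task lp_and_nc --link-remove 0.1 --save-emb True --emb-file emb/cora_abrw_128_embs.txt

Note that --save-emb, --directed and --weighted use type=bool, so any non-empty string (including 'False') is parsed as True; omit the flag entirely to keep the False default.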
63
src/vis.py
Normal file
@ -0,0 +1,63 @@
import pandas as pd
import tensorflow as tf
import numpy as np
from tensorflow.contrib.tensorboard.plugins import projector
import os


def read_node_label(filename):
    with open(filename, 'r') as f:
        node_label = {}  # dict
        for l in f.readlines():
            vec = l.split()
            node_label[int(vec[0])] = str(vec[1:])
        return node_label


def read_node_emb(filename):
    with open(filename, 'r') as f:
        node_emb = {}  # dict
        next(f)  # skip the header line: num_of_nodes, dim
        for l in f.readlines():
            vec = l.split()
            node_emb[int(vec[0])] = [float(i) for i in vec[1:]]
        return node_emb


# load the node labels and saved embeddings
label_file = './data/cora/cora_label.txt'
emb_file = './emb/abrw.txt'
label_dict = read_node_label(label_file)
emb_dict = read_node_emb(emb_file)

if label_dict.keys() != emb_dict.keys():
    print('ERROR: node ids do not match! Please check again')
    exit(0)

#embeddings = np.array([i for i in emb_dict.values()], dtype=np.float32)
embeddings = np.array([emb_dict[i] for i in sorted(emb_dict.keys(), reverse=False)], dtype=np.float32)

labels = [label_dict[i] for i in sorted(label_dict.keys(), reverse=False)]


# save embeddings and labels
emb_df = pd.DataFrame(embeddings)
emb_df.to_csv('emb/log/embeddings.tsv', sep='\t', header=False, index=False)

lab_df = pd.Series(labels, name='label')
lab_df.to_frame().to_csv('emb/log/node_labels.tsv', header=False, index=False)

# save tf variable
embeddings_var = tf.Variable(embeddings, name='embeddings')
sess = tf.Session()

saver = tf.train.Saver([embeddings_var])
sess.run(embeddings_var.initializer)
saver.save(sess, os.path.join('emb/log', "model.ckpt"), 1)

# configure tf projector
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = 'embeddings'
embedding.metadata_path = 'node_labels.tsv'

projector.visualize_embeddings(tf.summary.FileWriter('emb/log'), config)

# type "tensorboard --logdir=emb/log" in the terminal and have fun :)
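A minimal way to run the visualization script above (assuming embeddings were previously saved to ./emb/abrw.txt and the emb/log directory exists):

    python src/vis.py
    tensorboard --logdir=emb/log

Then open the TensorBoard Projector tab in a browser to explore the embedded nodes colored by label.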