OpenANE: the first Open source framework specialized in Attributed Network Embedding. The related paper was accepted by Neurocomputing. https://doi.org/10.1016/j.neucom.2020.05.080

OpenANE: The first Open source framework specialized in Attributed Network Embedding (ANE)

We reproduce several ANE (Attributed Network Embedding) methods as well as PNE (Pure Network Embedding) methods in one unified framework, where they all share the same I/O, downstream tasks, etc. This project started from OpenNE, which mainly integrates PNE methods in one unified framework.
OpenANE not only integrates PNE methods that consider pure structural information, but also provides state-of-the-art ANE methods that consider both structural and attribute information during embedding.

Authors: Chengbin HOU chengbin.hou10@foxmail.com & Zeyu DONG 11611716@mail.sustc.edu.cn 2018

Motivation

In many real-world scenarios, a network comes with node attributes, such as paper metadata in a citation network, user profiles in a social network, or even node degrees in a pure network. Unfortunately, PNE methods cannot make use of attribute information, which may further improve the quality of node embeddings.
From an engineering perspective, OpenANE offers more APIs for handling attribute information in graph.py and utils.py, so it should be easy to use for embedding an attributed network. Besides attributed networks, OpenANE can also handle pure networks, either by calling PNE methods or by assigning node degrees as node attributes and then calling ANE methods. Therefore, to some extent, ANE methods can be regarded as a generalization of PNE methods.
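To make the "node degrees as node attributes" idea concrete, the sketch below turns a pure edge list into rows in the attribute format described under "Your Own Dataset" (node_id attr1 attr2 ...). It is a minimal illustration, not OpenANE's own code: the helper names are hypothetical, the edge list is assumed to be undirected without duplicate edges, and the degree is used as a single numeric attribute.

```python
from collections import Counter


def degree_attributes(edges):
    """Map each node to its degree, treating the edge list as undirected.

    Hypothetical helper; assumes no duplicate edges in `edges`.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1  # each edge contributes to both endpoints
        degree[v] += 1
    return dict(degree)


def to_attribute_lines(degree):
    """Format {node: degree} as 'node_id attr1' rows, sorted by node id."""
    return [f"{node} {deg}" for node, deg in sorted(degree.items())]


if __name__ == "__main__":
    edges = [("0", "1"), ("0", "2"), ("1", "2"), ("2", "3")]
    for line in to_attribute_lines(degree_attributes(edges)):
        print(line)  # prints: 0 2 / 1 2 / 2 3 / 3 1 (one per line)
```

Writing these rows to a file gives an attribute file that an ANE method can consume like any other; richer encodings (for example, one-hot degree buckets) are a possible alternative.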

Methods

ABRW, SAGE-GCN, SAGE-Mean, ASNE, TADW, AANE, DeepWalk, Node2Vec, LINE, GraRep, AttrPure, AttrComb
Note: all methods in this framework are unsupervised, so they do not require any labels during the embedding phase.

For more details on each method, please see our paper https://arxiv.org/abs/1811.11728
If you find ABRW or this framework useful for your research, please consider citing it.

Usages

Requirements

pip install -r requirements.txt

Python 3.6.6 or above is required because the code uses f-strings (e.g. print(f'...')), which were introduced in Python 3.6.

To obtain node embeddings and evaluate their quality:

python src/main.py --method abrw --task lp_and_nc --emb-file emb/cora_abrw_emb --save-emb
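With --save-emb, the embeddings are written to the file given by --emb-file. The loader below is a sketch that assumes an OpenNE-style text format: an optional header line "<num_nodes> <dim>" followed by one "node_id v1 v2 ..." row per node. The exact format OpenANE writes is an assumption here, so check it against a real output file before relying on this helper (load_embeddings is a hypothetical name).

```python
def load_embeddings(lines):
    """Parse 'node_id v1 v2 ...' rows into {node_id: [floats]}.

    Assumes an OpenNE-style text file. Heuristic: a first line holding only
    two tokens, followed by longer rows, is treated as a header and skipped.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) > 1 and len(rows[0]) == 2 and len(rows[1]) != 2:
        rows = rows[1:]  # drop the '<num_nodes> <dim>' header
    emb = {}
    for node, *values in rows:
        emb[node] = [float(x) for x in values]
    dims = {len(v) for v in emb.values()}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding sizes: {sorted(dims)}")
    return emb


if __name__ == "__main__":
    # In practice: with open("emb/cora_abrw_emb") as f: emb = load_embeddings(f)
    sample = ["3 2", "0 0.1 0.2", "1 0.3 0.4", "2 -0.5 0.6"]
    emb = load_embeddings(sample)
    print(len(emb), len(emb["0"]))  # 3 2
```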

To get an intuitive feel for the node embeddings (2D visualization):

python src/vis.py --emb-file emb/cora_abrw_emb --label-file data/cora/cora_label.txt

Testing (Cora)

Parameter Settings

The default parameters for SAGE-GCN and SAGE-Mean are in src/libnrl/graphsage/__init__.py. The defaults for all other parameters are:

Parameter	Default
AANE_lamb	0.05
AANE_maxiter	10
AANE_rho	5
ABRW_alpha	0.8
ABRW_topk	30
ASNE_lamb	1
AttrComb_mode	concat
GraRep_kstep	4
LINE_negative_ratio	5
LINE_order	3
Node2Vec_p	0.5
Node2Vec_q	0.5
TADW_lamb	0.2
TADW_maxiter	10
batch_size	128
dim	128
dropout	0.5
epochs	100
label_reserved	0.7
learning_rate	0.001
link_remove	0.1
number_walks	10
walk_length	80
weight_decay	0.0001
window_size	10
workers	24
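Two of these defaults control how the data is split for the downstream tasks. The sketch below assumes that link_remove is the fraction of edges held out (and hidden from the embedding method) as positive test pairs for LP, and that label_reserved is the fraction of labelled nodes used to train the NC classifier, with the rest used for testing. Both readings are inferred from the parameter names; main.py is the authority on their exact meaning. The helpers split_edges and split_labels are hypothetical.

```python
import random


def split_edges(edges, link_remove, seed=0):
    """Hold out a fraction `link_remove` of edges as LP test positives.

    Returns (train_edges, removed_edges). Hypothetical helper; a real
    splitter might also avoid disconnecting nodes, which this one ignores.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_remove = int(round(link_remove * len(shuffled)))
    return shuffled[n_remove:], shuffled[:n_remove]


def split_labels(nodes, label_reserved, seed=0):
    """Keep a fraction `label_reserved` of labelled nodes for NC training.

    Returns (train_nodes, test_nodes). Hypothetical helper.
    """
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    n_train = int(round(label_reserved * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    edges = [(i, i + 1) for i in range(10)]
    train, removed = split_edges(edges, link_remove=0.1)
    print(len(train), len(removed))  # 9 1
    train_nodes, test_nodes = split_labels(range(10), label_reserved=0.7)
    print(len(train_nodes), len(test_nodes))  # 7 3
```

Using a fixed seed keeps the split reproducible across methods, so every method is scored on the same held-out edges and test nodes.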

Testing Results

Link Prediction (LP) and Node Classification (NC) tasks:

STEPS: Cora -> NE method -> node embeddings -> (downstream) LP/NC -> scores

Method	AUC (LP)	Micro-F1 (NC)	Macro-F1 (NC)
aane	0.8081	0.7296	0.6941
abrw	0.9376	0.8612	0.8523
asne	0.7728	0.6052	0.5656
attrcomb	0.9053	0.8446	0.8318
attrpure	0.7993	0.7368	0.7082
deepwalk	0.8465	0.8147	0.8048
grarep	0.8935	0.7632	0.7529
line	0.6930	0.6130	0.5949
node2vec	0.7935	0.7938	0.7856
sagegcn	0.8926	0.7964	0.7828
sagemean	0.8948	0.7899	0.7748
tadw	0.8877	0.8442	0.8321
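The scores above come from two downstream tasks. The sketch below shows one common way to compute them from embeddings, under stated assumptions: a candidate link is scored by the cosine similarity of its endpoint embeddings, AUC is the probability that a held-out true edge outscores a sampled non-edge, and Micro-F1/Macro-F1 are computed from single-label predictions of some classifier (not shown). OpenANE's own evaluation code may use a different link scorer or classifier; the function names here are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def auc(pos_scores, neg_scores):
    """ROC AUC: chance a random positive outscores a random negative.

    Ties count as 0.5. O(P * N), which is fine for a sketch.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    micro = _f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(_f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    emb = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
    pos = [cosine(emb["a"], emb["b"])]  # held-out true edge
    neg = [cosine(emb["a"], emb["c"]), cosine(emb["a"], emb["d"])]  # non-edges
    print(auc(pos, neg))  # 1.0
    print(f1_scores([0, 0, 1, 1], [0, 1, 1, 1]))
```

For single-label tasks Micro-F1 equals accuracy, while Macro-F1 averages per-class F1 and so weighs rare classes equally; that is why the two NC columns can differ.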

2D Visualization task:

STEPS: Cora -> NE method -> node embeddings -> (downstream) PCA to 2D -> vis

In the resulting plot, different colors indicate different ground-truth labels.
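The "PCA to 2D" step can be sketched with NumPy as follows: center the embedding matrix, take its top-2 right singular vectors as principal directions, and project every node onto them. This is a generic PCA sketch, not necessarily how vis.py implements the step (it may, for example, call a library PCA); pca_2d is a hypothetical name.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X (n_nodes x dim) onto their top-2 principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each embedding dimension
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T  # n_nodes x 2 coordinates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(100, 128))  # stand-in for 100 nodes with dim=128
    coords = pca_2d(emb)
    print(coords.shape)  # (100, 2)
```

The 2D coordinates can then be drawn as a scatter plot with one color per ground-truth label.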

Other Datasets

More well-prepared (attributed) network datasets are available at NetEmb-Datasets

Your Own Dataset

*--------------- Structural Info (each row) --------------------*
adjlist: node_id1 node_id2 node_id3 ... (neighbors of node_id1)
or edgelist: node_id1 node_id2 weight (weight is optional)
*--------------- Attribute Info (each row) ---------------------*
node_id1 attr1 attr2 ...
*--------------- Label Info (each row) -------------------------*
node_id1 label1 label2 ...
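A minimal reader for the three file types above might look like the sketch below. It follows the row layouts as described (adjlist: a node followed by its neighbors; edgelist: two node ids plus an optional weight; attribute and label files: a node id followed by values). Defaulting missing weights to 1.0 and keeping node ids as strings are assumptions; OpenANE's own loaders in graph.py may behave differently, and the function names are hypothetical.

```python
def parse_structure(lines, fmt="edgelist"):
    """Return a list of (u, v, weight) edges from adjlist or edgelist rows."""
    edges = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if fmt == "adjlist":
            # first id is the node, the remaining ids are its neighbors
            u, *neighbors = tokens
            edges.extend((u, v, 1.0) for v in neighbors)
        elif fmt == "edgelist":
            u, v = tokens[0], tokens[1]
            w = float(tokens[2]) if len(tokens) > 2 else 1.0  # weight optional
            edges.append((u, v, w))
        else:
            raise ValueError(f"unknown format: {fmt}")
    return edges


def parse_rows(lines, cast=str):
    """Parse 'node_id value1 value2 ...' rows (attribute or label files)."""
    table = {}
    for line in lines:
        tokens = line.split()
        if tokens:
            table[tokens[0]] = [cast(x) for x in tokens[1:]]
    return table


if __name__ == "__main__":
    print(parse_structure(["0 1 2", "1 2"], fmt="adjlist"))
    print(parse_structure(["0 1", "1 2 0.5"], fmt="edgelist"))
    print(parse_rows(["0 1 0 1"], cast=float))  # attributes as floats
    print(parse_rows(["0 A B"]))  # labels kept as strings
```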

Parameter Tuning

For a different dataset, one may need to search for the optimal parameters instead of using the defaults. For the meaning of each parameter and suggested values, please see main.py.

Contribution

We warmly welcome and appreciate contributions such as bug fixes and reproductions of new ANE methods. Please submit a pull request; once it is accepted, your contribution will automatically appear in this project. If your contribution is significant to this project, we will add you to the authors list.
