Link Prediction with DGL

Deep Learning/Graph Neural Network

Link Prediction with DGL

언킴 2022. 9. 1. 20:32

본 글에서는 DGL 을 통한 Link Prediction을 GraphSAGE를 통해 진행한다. Link Prediction은 두 노드가 연결되어 있는지, 아닌지를 확인하는 문제로 이해할 수 있다.

패키지 불러오기

본 글에서는 DGL 과 PyTorch 그리고 인접행렬을 만들거나 연산을 진행하기 위한 numpy와 scipy를 사용한다.

import dgl 
import dgl.function as fn 

import torch 
import torch.nn as nn 

import torch.optim as optim 
import torch.nn.functional as F 
import itertools 
import numpy as np 
import scipy.sparse as sp

from sklearn.metrics import roc_auc_score

데이터 불러오기

그래프 task에는 다양한 데이터셋이 존재하며 DGL에서도 다양한 데이터를 지원하고 있다. 여기서는 Cora 데이터셋을 사용하여 논문들 간의 인용이 있는지 예측하는 모델을 구축한다.

import dgl.data 

dataset = dgl.data.CoraGraphDataset()
g = dataset[0]

#  NumNodes: 2708
#  NumEdges: 10556
#  NumFeats: 1433
#  NumClasses: 7
#  NumTrainingSamples: 140
#  NumValidationSamples: 500
#  NumTestSamples: 1000

노드는 2708개가 있으며 10556개의 엣지가 존재한다. 노드의 feature는 1433개, 클래스는 7개가 존재한다. 실제 Link Prediction을 수행하기 위해서는 일부 데이터의 Link 즉, 엣지를 마스킹 처리한 후 이를 예측하는 형태로 진행하기 때문에 10%는 Test 데이터로 사용하기 위해 마스킹 처리를 수행한다. 그 후 인접 행렬을 만들고, 연결되지 않은 Edge에서 Negative Sampling을 수행한다.

u, v = g.edges()

eids = np.arange(g.number_of_edges())
eids = np.random.permutation(eids)
test_size = int(len(eids)*0.1)

train_size = g.number_of_edges() - test_size 
test_pos_u, test_pos_v = u[eids[:test_size]], v[eids[:test_size]]
train_pos_u, train_pos_v = u[eids[test_size:]], v[eids[test_size:]]


# negative edge
adj = sp.coo_matrix((np.ones(len(u)), (u.numpy(), v.numpy()))) # 인접행렬을 생성합니다. coo_marix(data, (row, col)) row와 col에 동시에 출현하는 index는 1로 mapping하는 것입니다.
adj_neg = 1 - adj.todense() # negative edge를 찾는 것이기 때문에 1을 빼서 adj_neg를 생성합니다.
neg_u, neg_v = np.where(adj_neg != 0)

neg_eids = np.random.choice(len(neg_u), g.number_of_edges()//2)
test_neg_u, test_neg_v = neg_u[neg_eids[:test_size]], neg_v[neg_eids[:test_size]]
train_neg_u, train_neg_v = neg_u[neg_eids[test_size:]], neg_v[neg_eids[test_size:]]

train_g = dgl.remove_edges(g, eids[:test_size])

모델 구축하기

Link Prediction을 수행하기 위해 GraphSAGE 라는 모델을 사용하며, DGL을 사용하면 매우 간단하게 구현이 가능하다.

from dgl.nn import SAGEConv 

class GraphSAGE(nn.Module):
    def __init__(self, in_feats, h_feats):
        super(GraphSAGE, self).__init__()

        self.in_feats = in_feats 
        self.h_feats = h_feats 

        self.conv1 = SAGEConv(self.in_feats, self.h_feats, aggregator_type = 'mean')
        self.conv2 = SAGEConv(self.h_feats, self.h_feats, aggregator_type = 'mean')
        self.relu = nn.ReLU()

    def forward(self, g, in_feats):
        h = self.conv1(g, in_feats)
        h = self.relu(h)
        h = self.conv2(g, h)
        return h
        

class DotPredictor(nn.Module):
    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h 
            g.apply_edges(fn.u_dot_v('h', 'h', 'score')) # `h`: source node, `h` destination node
            return g.edata['score'][:, 0]


class MLPPredictor(nn.Module):
    def __init__(self, h_feats):
        super().__init__()
        self.W1 = nn.Linear(h_feats*2, h_feats)
        self.W2 = nn.Linear(h_feats, 1)

    def apply_edges(self, edges):
        h = torch.cat([edges.src['h'], edges.dst['h'], 1])
        return {'score': self.W2(F.relu(self.W1(h))).squeeze(1)}
    
    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h
            g.apply_edges(self.apply_edges)
            return g.edata['score']

GraphSAGE는 Convolution Layer를 2개 쌓은 후 마지막에는 Dot Product 혹은 MLP, LSTM 등을 사용하여 최종 모델을 구축한다. 본 글에서는 Dot Product를 통한 연산을 예제로 사용할 것이며 MLP를 사용하고자 하는 경우 위에 표기된 MLPPredictor를 사용해도 된다.

학습하기

학습하는 방식은 일반적인 딥러닝과 동일하게 Optimizer를 설정하고, Criterion을 설정한 후 학습한다. 이때 노드 간의 연결이 존재하는 경우 1로 표기하고, 연결되지 않은 경우 0으로 표기하며 이를 예측하는 Binary Classification으로도 볼 수 있다.

model = GraphSAGE(train_g.ndata['feat'].shape[1], 16)

pred = DotPredictor()

def criterion(pos_score, neg_score):
    scores = torch.cat([pos_score, neg_score])
    labels = torch.cat([torch.ones(pos_score.shape[0]), torch.zeros(neg_score.shape[0])])
    return F.binary_cross_entropy_with_logits(scores, labels)

def accuracy(pos_score, neg_score):
    scores = torch.cat([pos_score, neg_score]).numpy()
    labels = torch.cat([torch.ones(pos_score.shape[0]), torch.zeros(neg_score.shape[0])]).numpy()
    return roc_auc_score(labels, scores)
    
all_logits = []

optimizer = optim.Adam(model.parameters())
for epoch in range(100):
    h = model(train_g, train_g.ndata['feat'])
    pos_score = pred(train_pos_g, h)
    neg_score = pred(train_neg_g, h)
    loss = criterion(pos_score, neg_score)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch+1) % 5 == 0 :
        print(f'In epoch {epoch+1}, loss: {loss:.4f}')


with torch.no_grad():
    pos_score = pred(test_pos_g, h)
    neg_score = pred(test_neg_g, h)
    print('AUC', accuracy(pos_score, neg_score))

저작자표시

'Deep Learning > Graph Neural Network' 카테고리의 다른 글

Community Detection (0)	2022.09.02
Graph를 이용한 Cascade 모델링 (0)	2022.09.02
DGL을 통한 Graph 생성하기. (0)	2022.09.01
Graph Convolution Network 구현 wtih DGL (0)	2022.09.01
Spectral GCN에서 Fourier Transform을 하는 이유 (0)	2022.06.22

현재글Link Prediction with DGL

ok-lab