4 min readNov 2, 2022

Gaussian Embedding of Large-scale Attributed Graphs :- Research paper summary

GitHub - bhagya-hettige/GLACE: Gaussian Embedding of Large-scale Attributed Graphs

You can't perform that action at this time. You signed in with another tab or window. You signed out in another tab or…

github.com

Core Ideas :-

Embbed nodes as Gausian distribution with some mean U and co- variance matrix Sigma
Gaussian distribution takes care of uncertainty in graphs
Mean U is similar point based representation of normal models ( bert etc)
Sigma is learnt as diagonal matrix to reduce parameters

Advantages :-

graphs constructed from real-world data can be very complex, noisy and imbalanced. Therefore, a mere point-based representation of the nodes may not be able to capture the variability of the graph and so some hidden patterns
Most existing graph embedding approaches are transductive and cannot infer embeddings for nodes unseen at training time. In practice, however, graphs evolve with time, and new nodes and edges can be added into the graph. There are a few recent studies which tried to provide a solution to this limitation. However, these methods either do not scale up to large graphs, or require additional information about the graph structure.

Architecture and some Equations :-

uses any encoder MLP bases transformer based CNN based for encoding each node into xi

1 is added so that ELU never creates negative outputs

We have our mean U and triangular matrix Sigma

Shared encoder for all nodes
Loss :- KL divergence

Where zi and zj are embeddings of connected nodes

Jargons :-

First order proximity

We learn first-order proximity of nodes, by modelling local pairwise proximity between two connected nodes in the graph. The empirical probability for first-order proximity measure observed in the original graph between nodes i and j is defined as the ratio of the weight of the edge (i, j) to the total of the weights of all the edges in the graph.

distance between the two distributions, O1 = DKL(ˆp1||p1)

Which means minimizing the the negative of log likelihood i.e NLL is same as minimizing the distance between two distributions

Second order proximity

Nodes which have more similar neighbourhoods should be closer in embedding space with respect to the nodes with less similar neighbourhoods

So we generate a context vector of all the neighbour nodes and calculate the loss between zi and Z’ which is generate by context vector f’ embeddings.

similarly O2 = Pi ∈V λiDKL(ˆp2(.|i)||p2(.|i))

this has a hard bottleneck on running performance because we need all nodes to calculate the context vector

so they do Negative edge sampling hence only take few of nodes connected reducing the cost

The encoder for context nodes is f0 : V → R m. The latent representation of context node i is z 0 i = N (µ0i , Σ0i ), where µ0i = f0µ (f0(xi)) and Σ0i = f 0 Σ(f0(xi))