Graph Emebeddings

Notes on graph embeddings, major references are:

What is Graph Embeddings?

  • Graph embeddings are the transformation of property graphs to a vector or a set of vectors.
  • Similar to word embeddings, which are vector representation of words.
  • Two major types of graph embeddings:
    1. Vertex Embeddings - Each vertex has its own vector representation. Two vertices with similar properties (e.g. similar neighbours, similar connection types, etc.) should have vectors close to each other.
    2. Graph Embeddings - Represent the whole graph as a single vector


There are many different machine learning tasks that are related to graphs and networks:

  • node classification: predict the class of a node
  • link prediction: predict whether two nodes are connected
  • community detection: detect densely linked clusters of nodes
  • network similarity: determine how similar two (sub)networks are


  • Make sure that the embeddings describe well the properties of the graphs
  • The speed of creating or learning the embeddings should not increase drastically with the size of the graph
  • How to determine the number of dimensions of the embedding

Node Embeddings

  • Learn a representation of the nodes in the embedding space (where the number of dimensions is smaller than the number of nodes)
  • The goal is to embed nodes such that similarity in the embedding space approximates similarity of the nodes in the network
  • Steps to learn node embeddings
    1. Define an encoder (i.e. a mapping from nodes to vectors)
    2. Define a node similarity function (i.e. a measure of node similarity in the original network)
    3. Optimize the parameters such that $z_u^{\top}z_v$ approximates $similarity(u, v)$

Different Approaches to Similarity

Adjacency-based Similarity

  • References:
  • Similarity function is just the edge weight between two nodes $u$ and $v$ in the original network
  • We want to minimize the following loss function: $\mathcal{L} = \sum_{(u,v)\in V \times V}||z_u^{\top}z_v - A_{u,v}||^2$, where $A_{u,v}$ is the corresponding cell in the adjacency matrix
  • Limitations:
    • $O(|V|^2)$ runtime (it must consider all node pairs)
    • It only considers direct and location connections

Multi-hop Similarity

Random Walk Approaches



  • DeepWalk: Online Learning of Social Representations
  • Implementation:
  • DeepWalk uses local information obtained from truncated random walks tolearnlatent representations by treating walks as the equivalent of sentences.
  • Steps:
    1. Sampling with random walks: about 32 to 64 random walks from each node will be performed
    2. Training skip-gram model: accept a vector representing one random walk of a node, predict the neighbour nodes (e.g. 20)
    3. Computing the embedding of each node


  • node2vec: Scalable Feature Learning for Networks
  • A generalization of DeepWalk
  • Parameters
    • $p$ (return parameter): controls how likely the random walk would go back to the previous node
    • $q$ (in-out parameter): controls how likely the random walk will visit new (undiscovered) nodes in the graph
  • Random walks become 2nd order Markov chains
  • Increase efficient sampling rate by reusing samples, for example:
    • Let $k$ be the number of steps we need to sample
    • We can perform a random walk of length $l > k$, such that we can generate samples for $l - k$ nodes at once
    • e.g. a random walk ${u, s1, s2, s3, s4, s5}$ results in $N(u) = {s1, s2, s3}$, $N(s1) = {s2, s3, s4}$, $N(s2) = {s3, s4, s5}$.
  • DeepWalk can be considered as a special case of node2vec with $p = q = 1$

Structural Deep Network Embedding (SDNE)

  • Structural Deep Network Embedding
  • Implementation:
  • Does not perform random walks
  • Designed to preserve the first and second order proximity:
    • First order proximity: local pairwise similarity between nodes linked by an edge, characterise local network sturcture (e.g. if a paper cites another paper, they address similar topics)
    • Second order proximity: indicate the similarity of the nodes’ neighbourhood structures; if two nodes share many neighbours, they tend to be similar to each other
  • SDNE is semi-supervised:
    • The unsupervised component: to learn a representation to reconstruct the second order proximity of nodes
    • The supervised component: to learn a representation that preserves the first order proximity of nodes

Graph Embeddings


Collective Classification

According to Sen et al., Give a network and an object $o$ in the network, three types of correlations can be utilized to determine the label of $o$:

  1. Correlations between the label of $o$ and the attributes of $o$
  2. Correlations between the label of $o$ and the attributes of the objects in the neighbourhood of $o$
  3. Correlations between the label of $o$ and the unobserved labels of the objects in the neighbourhood of $o$

Collective classification refers to the combined classification of a set of interlinked objects using all three types of information described above.

Applications: classification of webpages


Research Papers

Blogs and Articles


Source Codes