# Graph Emebeddings

Notes on graph embeddings, major references are:

## What is Graph Embeddings?

• Graph embeddings are the transformation of property graphs to a vector or a set of vectors.
• Similar to word embeddings, which are vector representation of words.
• Two major types of graph embeddings:
1. Vertex Embeddings - Each vertex has its own vector representation. Two vertices with similar properties (e.g. similar neighbours, similar connection types, etc.) should have vectors close to each other.
2. Graph Embeddings - Represent the whole graph as a single vector

## Applications

There are many different machine learning tasks that are related to graphs and networks:

• node classification: predict the class of a node
• link prediction: predict whether two nodes are connected
• community detection: detect densely linked clusters of nodes
• network similarity: determine how similar two (sub)networks are

## Challenges

• Make sure that the embeddings describe well the properties of the graphs
• The speed of creating or learning the embeddings should not increase drastically with the size of the graph
• How to determine the number of dimensions of the embedding

## Node Embeddings

• Learn a representation of the nodes in the embedding space (where the number of dimensions is smaller than the number of nodes)
• The goal is to embed nodes such that similarity in the embedding space approximates similarity of the nodes in the network
• Steps to learn node embeddings
1. Define an encoder (i.e. a mapping from nodes to vectors)
2. Define a node similarity function (i.e. a measure of node similarity in the original network)
3. Optimize the parameters such that $z_u^{\top}z_v$ approximates $similarity(u, v)$

### Different Approaches to Similarity

• References:
• Similarity function is just the edge weight between two nodes $u$ and $v$ in the original network
• We want to minimize the following loss function: $\mathcal{L} = \sum_{(u,v)\in V \times V}||z_u^{\top}z_v - A_{u,v}||^2$, where $A_{u,v}$ is the corresponding cell in the adjacency matrix
• Limitations:
• $O(|V|^2)$ runtime (it must consider all node pairs)
• It only considers direct and location connections

### Methods

#### DeepWalk

• DeepWalk: Online Learning of Social Representations
• Implementation: https://github.com/phanein/deepwalk
• DeepWalk uses local information obtained from truncated random walks tolearnlatent representations by treating walks as the equivalent of sentences.
• Steps:
1. Sampling with random walks: about 32 to 64 random walks from each node will be performed
2. Training skip-gram model: accept a vector representing one random walk of a node, predict the neighbour nodes (e.g. 20)
3. Computing the embedding of each node

#### node2vec

• node2vec: Scalable Feature Learning for Networks
• A generalization of DeepWalk
• Parameters
• $p$ (return parameter): controls how likely the random walk would go back to the previous node
• $q$ (in-out parameter): controls how likely the random walk will visit new (undiscovered) nodes in the graph
• Random walks become 2nd order Markov chains
• Increase efficient sampling rate by reusing samples, for example:
• Let $k$ be the number of steps we need to sample
• We can perform a random walk of length $l > k$, such that we can generate samples for $l - k$ nodes at once
• e.g. a random walk ${u, s1, s2, s3, s4, s5}$ results in $N(u) = {s1, s2, s3}$, $N(s1) = {s2, s3, s4}$, $N(s2) = {s3, s4, s5}$.
• DeepWalk can be considered as a special case of node2vec with $p = q = 1$ #### Structural Deep Network Embedding (SDNE)

• Structural Deep Network Embedding
• Implementation: https://github.com/suanrong/SDNE
• Does not perform random walks
• Designed to preserve the first and second order proximity:
• First order proximity: local pairwise similarity between nodes linked by an edge, characterise local network sturcture (e.g. if a paper cites another paper, they address similar topics)
• Second order proximity: indicate the similarity of the nodes’ neighbourhood structures; if two nodes share many neighbours, they tend to be similar to each other
• SDNE is semi-supervised:
• The unsupervised component: to learn a representation to reconstruct the second order proximity of nodes
• The supervised component: to learn a representation that preserves the first order proximity of nodes ## Collective Classification

According to Sen et al., Give a network and an object $o$ in the network, three types of correlations can be utilized to determine the label of $o$:

1. Correlations between the label of $o$ and the attributes of $o$
2. Correlations between the label of $o$ and the attributes of the objects in the neighbourhood of $o$
3. Correlations between the label of $o$ and the unobserved labels of the objects in the neighbourhood of $o$

Collective classification refers to the combined classification of a set of interlinked objects using all three types of information described above.

Applications: classification of webpages