Behind the scenes, with from-scratch mathematics

In many areas of daily life, such as social media, chemical compounds, and computer networks, data comes in the form of graphs, and it is difficult to glean anything significant from a graph at first glance. This is where AI comes into the picture. A primary tool for processing graph data and carrying out classification, grouping, or regression is the Graph Convolutional Neural Network (GCNN). With it we can classify whole graphs, individual nodes, or edges, among other things. In this blog, we will learn how a GCNN for graph classification works.

Any machine learning or deep learning algorithm needs labelled data to perform supervised classification. Suppose we want to classify groups of users on social media. The labelled data, call it Data A, should look somewhat like this:

The key issue here is that the input is not a table with several columns or features that are simple to process. Instead, the independent variable (X) is a graph, and classification must be based on the features of the nodes, the connections between the nodes, and/or the features of the edges. To process it, we must convert the data into the following tabular format, Data B:

Okay, let’s convert Data from ‘A’ to ‘B’…!!!

Every node of the graph has some features. In our case, a node may have features like age, gender, name, height, their posts, etc.
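
To make this concrete, here is a minimal sketch of how such a graph could be stored before any processing. The edge list, the two feature columns, and all the numbers are made up for illustration; they are not the values from the figures in this blog.

```python
import numpy as np

# Hypothetical 3-node undirected graph: node 0 is connected to nodes 1 and 2.
edges = [(0, 1), (0, 2)]
num_nodes = 3

# Node feature matrix X: one row per node, one column per feature.
# The two columns are placeholders (say, age and number of posts).
X = np.array([
    [25.0, 10.0],
    [30.0, 4.0],
    [35.0, 1.0],
])

# Adjacency matrix A: A[i, j] = 1 when nodes i and j are connected.
A = np.zeros((num_nodes, num_nodes))
for i, j in edges:
    A[i, j] = 1.0
    A[j, i] = 1.0  # undirected graph, so A is symmetric

print(A)
print(X.shape)
```

Together, A (the connections) and X (the node features) describe one graph, that is, one row of Data A.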

In this blog, we will understand the process in the following three steps:

1 — G: Solve the graph and create Convo

2 — C: Solve the Convo and create Tabular Data

3 — NN: Apply NN on that tabular data to predict

**Step 1**

**G:** Solve the graph and create Convo:

Every node has a feature vector like [x1, x2, …, xn], and each feature has some value:

To get the Convo, we perform the message passing step: each node's feature values are updated according to the features of its neighbours.

In our case, ‘f’ could be any mathematical function; we choose it according to the problem:

Let’s choose f = mean and update as follows:

Similarly, we can get the updated vector for each node. In our example, the updated values will be:
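
One round of message passing with f = mean can be sketched in a few lines of NumPy. This is only an illustration: the function name `mean_message_pass` and the numbers are hypothetical, and it assumes that each node averages its own features together with those of its direct neighbours. Averaging over the neighbours only is an equally valid choice.

```python
import numpy as np

def mean_message_pass(A, X):
    """One round of message passing with f = mean.

    Assumption: a node averages its own features together with the
    features of its direct neighbours (a self-loop is added).
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    degree = A_hat.sum(axis=1, keepdims=True)  # neighbours + the node itself
    return (A_hat @ X) / degree                # row-wise mean

# Hypothetical graph: node 0 is connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
# Hypothetical node features, one row per node.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

H = mean_message_pass(A, X)
print(H)
# Node 0 averages rows 0, 1, 2 -> [3, 4]
# Node 1 averages rows 1, 0    -> [2, 3]
# Node 2 averages rows 2, 0    -> [3, 4]
```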

Now the question is: how many times do we need to do message passing? It depends on how many hops it takes for the information to be shared over the whole graph. Alternatively, if your graph is huge and sparsely connected, you can settle for some optimal, smaller number of hops.

Suppose we add one extra node to our previous graph. We then need to do message passing for up to two hops so that the newly added node can share its feature information with the red node.
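
The effect of the number of hops can be checked with a small experiment. The sketch below uses a hypothetical chain of three nodes (0 - 1 - 2), where node 2 is two hops away from node 0, and again assumes a mean over the node and its neighbours. The features are one-hot, so column j of the result shows how much of node j's information has reached each node.

```python
import numpy as np

def mean_message_pass(A, X):
    # Mean over each node and its direct neighbours (self-loop assumed).
    A_hat = A + np.eye(A.shape[0])
    return (A_hat @ X) / A_hat.sum(axis=1, keepdims=True)

# Hypothetical chain 0 - 1 - 2: node 2 is two hops away from node 0.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# One-hot features, so we can trace whose information arrives where.
X = np.eye(3)

H1 = mean_message_pass(A, X)    # after one hop
H2 = mean_message_pass(A, H1)   # after two hops

print(H1[0])  # last entry is 0: nothing from node 2 has reached node 0 yet
print(H2[0])  # last entry is now positive: node 2's information has arrived
```

Repeating the same update once more is all it takes to widen each node's view by one hop.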

After completing the message passing, we can arrange the updated node features into a Convo, a 2D matrix, like this:

**Step 2**

**C:** Solve the Convo:

In our case, we have a Convo of size 3×2, which we pass to a CNN (= C + NN). The pooling function can be anything: max, min, average, mode, a mix, etc. I am taking the average.
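
As a rough sketch, average pooling over a 3×2 Convo just takes the mean of each feature column across the nodes. The matrix `H` below holds hypothetical values, not the ones from the figures.

```python
import numpy as np

# Hypothetical 3x2 Convo: 3 nodes (rows), 2 features (columns).
H = np.array([[3.0, 4.0],
              [2.0, 3.0],
              [3.0, 4.0]])

# Average pooling: collapse the node axis, keep one value per feature.
graph_vector = H.mean(axis=0)
print(graph_vector)  # [8/3, 11/3], roughly [2.667, 3.667]

# Swapping the pooling function is a one-line change, e.g. max pooling:
graph_vector_max = H.max(axis=0)
print(graph_vector_max)  # [3., 4.]
```

Whatever the number of nodes, the pooled vector always has one entry per feature, which is what lets graphs of different sizes share the same tabular format.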

Now we have a single vector that represents the whole graph. We can run a Multi-Layer Perceptron (MLP) on this converted data, which is the third step of the GCNN.

**Step 3**

**NN:** Neural network processing
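
A minimal sketch of the forward pass of such a network is shown below, assuming a binary classification task. The layer sizes, the ReLU and sigmoid activations, and the random untrained weights are all assumptions made for illustration; in practice the weights are learned from the labelled graphs.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer followed by a sigmoid output (binary class probability)."""
    hidden = relu(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

# Hypothetical pooled graph vector with 2 features.
graph_vector = np.array([8.0 / 3.0, 11.0 / 3.0])

# Untrained, randomly initialised weights: 2 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

prob = mlp_forward(graph_vector, W1, b1, W2, b2)
print(prob)  # a value between 0 and 1, meaningless until the weights are trained
```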

In this way, we can train the whole algorithm — conversion and then classification — on a graph-dataset.
