Introduction to Linkspotter

Introduction

Linkspotter is an R package that computes all the bivariate links of a dataset and visualizes them as a graph.

Its main features are:

  • Calculation of several correlation matrices corresponding to different link coefficients
  • Clustering of variables using unsupervised learning
  • Supervised discretization of a single variable or of a couple of variables.

It also offers a customizable user interface that makes it possible to:

  • visualize the links using a graph (the variables correspond to the nodes and the links correspond to the edges)
  • show the distribution of each variable using its histogram or barplot
  • visualize a link between a couple of variables using scatter plots, box plots, etc.

Available link coefficients include Pearson, Spearman, Kendall, MIC and MaxNMI.

Installation

library(devtools)
install_github("sambaala/linkspotter")

Behind a proxy:

library(devtools)
library(httr)
set_config(
  use_proxy(url="<my_proxy>", port=<my_proxy_port>)
)
install_github("sambaala/linkspotter")

Usage

Load the package:

library(linkspotter)

Take a look at the documentation:

help(package="linkspotter")

The examples below use the built-in “iris” dataset.

Calculate the MaxNMI between two variables

maxNMI(iris$Sepal.Length,iris$Petal.Length)
## [1] 0.6992338

Extract a correlation matrix from the correlation dataframe

The Pearson correlation matrix:

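# Assumption: corCouples, not created earlier in this guide, comes from multiBivariateCorrelation()
corCouples<-multiBivariateCorrelation(dataset = iris)
corMatrixPearson<-corCouplesToMatrix(x1_x2_val = corCouples[,c('X1','X2',"pearson")])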
print(corMatrixPearson)
##              Petal.Length Petal.Width Sepal.Length Sepal.Width Species
## Petal.Length    1.0000000   0.9628654    0.8717538  -0.4284401      NA
## Petal.Width     0.9628654   1.0000000    0.8179411  -0.3661259      NA
## Sepal.Length    0.8717538   0.8179411    1.0000000  -0.1175698      NA
## Sepal.Width    -0.4284401  -0.3661259   -0.1175698   1.0000000      NA
## Species                NA          NA           NA          NA       1
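
The NA entries for Species appear because at least one variable of each of those couples is qualitative. As a cross-check that does not rely on Linkspotter, the numeric part of this matrix can be reproduced with base R's cor(). This is only a sketch; the names is_quantitative and pearson_base are illustrative and not part of the package.

```r
# Base R sketch (not Linkspotter code): Pearson correlations on iris.
# Keep only the quantitative columns, since cor() requires numeric input.
is_quantitative <- sapply(iris, is.numeric)
pearson_base <- cor(iris[, is_quantitative], method = "pearson")
print(round(pearson_base, 7))

# The remaining column is qualitative (a factor), hence the NA values
# in the Linkspotter matrix for every couple that involves it.
print(names(iris)[!is_quantitative])
```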

The MaxNMI matrix:

corMatrixMaxNMI<-corCouplesToMatrix(x1_x2_val = corCouples[,c('X1','X2',"MaxNMI")])
print(corMatrixMaxNMI)
##              Petal.Length Petal.Width Sepal.Length Sepal.Width   Species
## Petal.Length    1.0000000   0.8351786    0.6992338   0.3789867 0.8702060
## Petal.Width     0.8351786   1.0000000    0.6322728   0.3703042 0.8920899
## Sepal.Length    0.6992338   0.6322728    1.0000000   0.2033015 0.4873895
## Sepal.Width     0.3789867   0.3703042    0.2033015   1.0000000 0.2606311
## Species         0.8702060   0.8920899    0.4873895   0.2606311 1.0000000
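
Unlike Pearson's coefficient, MaxNMI also yields a value for couples involving the qualitative variable Species. This guide does not detail how MaxNMI is computed, so the base R sketch below only illustrates the underlying quantity: a normalized mutual information between two variables that are already discretized. The normalization by the smaller entropy, the three equal-width bins, and the helper names entropy and nmi are all assumptions made for this example; maxNMI() relies on its own discretization, so its values will generally differ.

```r
# Base R sketch (not Linkspotter code): normalized mutual information
# between two discrete variables.

# Shannon entropy of a vector of probabilities (zero cells are ignored)
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Mutual information divided by the smaller marginal entropy (assumed choice)
nmi <- function(x, y) {
  joint <- table(x, y) / length(x)   # joint probabilities
  hx <- entropy(rowSums(joint))      # marginal entropy of x
  hy <- entropy(colSums(joint))      # marginal entropy of y
  mi <- hx + hy - entropy(joint)     # I(X;Y) = H(X) + H(Y) - H(X,Y)
  mi / min(hx, hy)
}

# Arbitrary discretization of a quantitative variable into 3 equal-width bins
petal_bins <- cut(iris$Petal.Length, breaks = 3)
nmi_value <- nmi(petal_bins, iris$Species)
print(nmi_value)
```

With this normalization the value lies between 0 (independent variables) and 1 (one variable fully determined by the other).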

Clustering of variables using a correlation matrix

cl<-clusterVariables(corMatrix = corMatrixMaxNMI)
print(cl)
##            var group
## 1 Petal.Length     1
## 2  Petal.Width     2
## 3 Sepal.Length     3
## 4  Sepal.Width     4
## 5      Species     2

Visualize the graph using Pearson correlation

linkspotterGraph(corDF = corCouples, variablesClustering = cl, 
  corMethod = "pearson", minCor = 0.25, smoothEdges = FALSE, 
  dynamicNodes = FALSE)

Visualize the graph using MaxNMI

linkspotterGraph(corDF = corCouples, variablesClustering = cl, 
  corMethod = "MaxNMI", minCor = 0.25, smoothEdges = FALSE, 
  dynamicNodes = TRUE)

Launch the customizable user interface

linkspotterUI(dataset = iris, corDF = corCouples, 
  variablesClustering = cl, appTitle = "Linkspotter example")

Additional features

Complete Linkspotter computation:

lsiris<-linkspotterComplete(iris)
## Number of variables: 5 
## Number of couples: 10 
## Number of observations: 150 
## Coef.: pearson, spearman, kendall, mic, MaxNMI
## Start time: 2024-11-20 06:22:25.345008 
## Correlation coef. computation finished: 2024-11-20 06:22:25.822929
## Clustering computation finished: 2024-11-20 06:22:25.829832
## Total Computation time: 0.485 secs

Complete Linkspotter computation from an external file:

lsiris<-linkspotterOnFile("iris.csv")
summary(lsiris)
##                     Length Class      Mode     
## computationTime      1     -none-     character
## launchShiny          1     -none-     function 
## dataset              5     data.frame list     
## targetVar            0     -none-     NULL     
## corDF               10     data.frame list     
## corMatrices          5     -none-     list     
## corGroups            2     data.frame list     
## clusteringCorMethod  1     -none-     character
## defaultMinCor        1     -none-     numeric  
## defaultCorMethod     1     -none-     character
## corMethods           5     -none-     character
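
The linkspotterOnFile call above expects a file named iris.csv to exist. One way to produce such a file from the built-in dataset is sketched here in base R. Writing to a temporary directory, the name csv_path, and the assumption that a comma-separated file with a header row is what linkspotterOnFile reads by default are choices made for this example.

```r
# Base R sketch: export iris to a CSV file that can be given to Linkspotter.
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(iris, csv_path, row.names = FALSE)

# Read the file back to check that nothing was lost in the round trip
iris_from_file <- read.csv(csv_path, stringsAsFactors = TRUE)
str(iris_from_file)

# csv_path could then be passed to the package, for example:
# lsiris <- linkspotterOnFile(csv_path)
```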

Then launch the user interface using:

lsiris$launchShiny()

Help:

help(linkspotterComplete)

User interface guide

‘Graphs’ tab

The graph

The variables correspond to the nodes and their links correspond to the edges. Node color depends on the clustering. For quantitative couples, edge color depends on the direction of the correlation (blue: positive correlation, red: negative correlation).

First features

  • Correlation coefficient drop-down list: selects the link coefficient to be visualized.
  • Minimum Correlation cursor: sets the minimum link value (from 0 to 1) required for an edge to be plotted.
  • Interest variable drop-down list: selects a variable of interest. When one is selected, a new cursor named Minimum Correlation with interest variable appears. It sets a second minimum correlation threshold that applies only to this variable of interest, in addition to the general threshold. The upper limit of this specific threshold is the general threshold.
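
The effect of the two thresholds can be sketched in base R, outside of the user interface. This is an interpretation of the description above, not Linkspotter's own code: comparing the absolute value of the coefficient to the thresholds, the choice of Sepal.Length as variable of interest, and the names couples, minCorInterest, touchesInterest and edges are assumptions made for this example. Only Pearson correlations between the quantitative iris variables are used.

```r
# Base R sketch (not Linkspotter code): which edges pass the thresholds?
cm <- cor(iris[, sapply(iris, is.numeric)])

# One row per couple of variables (upper triangle of the matrix)
idx <- which(upper.tri(cm), arr.ind = TRUE)
couples <- data.frame(
  X1 = rownames(cm)[idx[, "row"]],
  X2 = colnames(cm)[idx[, "col"]],
  pearson = cm[idx],
  stringsAsFactors = FALSE
)

minCor <- 0.25                  # general threshold
interestVar <- "Sepal.Length"   # variable of interest
minCorInterest <- 0.10          # specific threshold, at most minCor

strength <- abs(couples$pearson)
touchesInterest <- couples$X1 == interestVar | couples$X2 == interestVar

# An edge is kept if it passes the general threshold, or if it involves
# the variable of interest and passes the specific threshold.
keep <- strength >= minCor | (touchesInterest & strength >= minCorInterest)
edges <- couples[keep, ]

# Color by correlation direction, as described for the graph
edges$color <- ifelse(edges$pearson > 0, "blue", "red")
print(edges)
```

With these values, five of the six couples pass the general threshold, and the weaker Sepal.Length / Sepal.Width link is kept only because of the lower threshold attached to the variable of interest.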

Checkboxes

  • Highlight variable on click: if checked, a node can be brought into focus by clicking on it or by selecting it from the drop-down list that appears.
  • Variable Clustering: if checked, the nodes are colored according to the group to which they belong. Otherwise, all nodes are blue.
  • Color Edges by correlation direction: if checked, the edges are colored according to the direction of the correlation (blue for a positive correlation, red for a negative correlation and gray for NA) found between the two variables. The NA case corresponds to when at least one of the two variables is qualitative.
  • Smooth edges: if checked, edges are allowed to bend as needed. Otherwise, they remain straight.
  • Dynamic nodes stabilization: if checked, the graph is repositioned according to the graph stabilization algorithm after each movement by the user.
  • Re-stabilize: button that re-stabilizes the graph (the nodes are centered and redistributed on the plane in an “almost optimal” way).

General information

  • nb. observations: the number of entries in the dataset
  • nb. variables: the number of variables in the dataset
  • nb. couples: the number of couples of variables
  • nb. current edges: the current number of edges plotted according to the chosen thresholds
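
The first three figures follow directly from the shape of the dataset: with n variables there are n(n-1)/2 couples. A short base R check on iris, with illustrative variable names, gives the same counts as the log printed by linkspotterComplete(iris) earlier in this guide.

```r
# Base R sketch: general information for the iris dataset
n_obs <- nrow(iris)               # number of observations
n_vars <- ncol(iris)              # number of variables
n_couples <- choose(n_vars, 2)    # number of couples: n * (n - 1) / 2

cat("nb. observations:", n_obs, "\n")
cat("nb. variables:", n_vars, "\n")
cat("nb. couples:", n_couples, "\n")
```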

Click on a node of the graph

Clicking on a node produces the following:

A summary figure for the corresponding variable (bottom left of the graph)

Its type depends on the nature of the corresponding variable:

  • Quantitative variable: a histogram
  • Qualitative variable: a bar graph

A summary table for the corresponding variable (under general information)

Its type depends on the nature of the variable:

  • Quantitative variable: a table containing the min, 1st quartile, median, mean, 3rd quartile and max of the variable.
  • Qualitative variable: a table containing the frequency of each category (modality) of the variable.
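
Comparable summaries can be obtained in base R, outside of the user interface. The helper describe_variable below is hypothetical and only mirrors the rule described above (numeric summary for a quantitative variable, frequency table for a qualitative one); it is not a Linkspotter function.

```r
# Base R sketch (not Linkspotter code): summary table by variable type
describe_variable <- function(x) {
  if (is.numeric(x)) {
    summary(x)   # min, 1st quartile, median, mean, 3rd quartile, max
  } else {
    table(x)     # frequency of each category
  }
}

print(describe_variable(iris$Sepal.Length))
print(describe_variable(iris$Species))

# The matching figures could be drawn with hist(iris$Sepal.Length)
# and barplot(table(iris$Species)).
```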

‘Tables’ tab

This tab displays two tables:

  • A correlation matrix corresponding to the selected correlation coefficient
  • A table indicating the group to which each variable is assigned by the clustering.

The Correlation coefficient option selects which correlation coefficient to display, among those calculated initially.

Imports

Linkspotter uses and combines features from several other R packages, namely infotheo, minerva, energy, mclust, shiny, visNetwork, rAmCharts and ggplot2.