Title: | Bivariate Correlations Calculation and Visualization |
---|---|
Description: | Compute all the bivariate correlations of a dataframe and visualize them using the 'visNetwork' package. Several types of correlation coefficients (Pearson's r, Spearman's rho, Kendall's tau, distance correlation, maximal information coefficient and equal-frequency discretization-based maximal normalized mutual information) are used depending on the type of each variable pair (quantitative vs categorical, quantitative vs quantitative, categorical vs categorical). |
Authors: | Alassane Samba [aut, cre], Orange [cph] |
Maintainer: | Alassane Samba |
License: | MIT + file LICENSE |
Version: | 1.4.0.9000 |
Built: | 2024-11-20 06:22:27 UTC |
Source: | https://github.com/orange-opensource/linkspotter |
Discretize a quantitative variable by maximizing the Normalized Mutual Information obtained with a target qualitative variable
BeEFdiscretization.numfact( continuousY, factorX, includeNA = T, showProgress = F )
continuousY |
a numeric vector. |
factorX |
a factor. |
includeNA |
a boolean. TRUE to include NA values as a factor level. |
showProgress |
a boolean indicating whether to show a progress bar. |
a factor.
# discretize Sepal.Length by maximizing its NMI with Species
data(iris)
discreteSepalLength=BeEFdiscretization.numfact(continuousY=iris$Sepal.Length,factorX=iris$Species)
summary(discreteSepalLength)
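The search behind this function can be pictured with a base-R sketch. It is not the package's code: the helpers `nmi()` and `best_ef_discretization()` and the `maxBins` cap are hypothetical, and the sketch assumes quantile-based equal-frequency bins and the normalization NMI = I(X;Y) / sqrt(H(X) H(Y)). linkspotter may search and normalize differently.

```r
# Hypothetical BeEF-style search (not linkspotter's implementation):
# for each candidate number of bins k, cut the numeric vector at its
# empirical quantiles and keep the cut with the highest NMI with the factor.

# normalized mutual information between two factors (NA pairs are dropped)
# assumption: NMI = I(X;Y) / sqrt(H(X) * H(Y))
nmi <- function(x, y) {
  p <- table(x, y)
  p <- p / sum(p)
  H <- function(q) { q <- q[q > 0]; -sum(q * log(q)) }
  hx <- H(rowSums(p))
  hy <- H(colSums(p))
  if (hx == 0 || hy == 0) return(0)          # a constant variable carries no information
  (hx + hy - H(as.vector(p))) / sqrt(hx * hy)  # I(X;Y) = H(X) + H(Y) - H(X,Y)
}

best_ef_discretization <- function(y, f, maxBins = 20) {
  best <- list(nmi = -Inf, bins = NULL)
  for (k in 2:maxBins) {
    breaks <- unique(quantile(y, seq(0, 1, length.out = k + 1),
                              na.rm = TRUE, names = FALSE))
    if (length(breaks) < 3) next  # ties collapsed the cut to a single bin
    bins <- cut(y, breaks, include.lowest = TRUE)
    score <- nmi(bins, f)
    if (score > best$nmi) best <- list(nmi = score, bins = bins)
  }
  best
}

data(iris)
res <- best_ef_discretization(iris$Sepal.Length, iris$Species)
res$nmi
table(res$bins, iris$Species)
```

Capping the number of candidate bins keeps the loop cheap: each candidate costs one quantile cut and one contingency table.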
Discretize two quantitative variables by maximizing the Normalized Mutual Information obtained between them
BeEFdiscretization.numnum( continuousX, continuousY, maxNbBins = 100, includeNA = T, showProgress = F )
continuousX |
a numeric vector. |
continuousY |
a numeric vector. |
maxNbBins |
an integer giving the maximum number of bins (to limit computation time); maxNbBins=100 by default. |
includeNA |
a boolean. TRUE to include NA values as a factor level. |
showProgress |
a boolean indicating whether to show a progress bar. |
a list of two factors.
# discretize Sepal.Length and Sepal.Width jointly
data(iris)
disc=BeEFdiscretization.numnum(iris$Sepal.Length,iris$Sepal.Width)
summary(disc$x)
summary(disc$y)
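For two quantitative variables, the same idea extends to a grid over the number of bins of each variable. In this hypothetical base-R sketch, `ef_cut()`, `best_ef_pair()` and `maxBins` are illustrative names, not package functions; `maxBins` caps the bins per variable, which is only one possible reading of maxNbBins, and NMI is normalized by sqrt(H(X) H(Y)), again an assumption.

```r
# normalized mutual information between two factors (assumed normalization)
nmi <- function(x, y) {
  p <- table(x, y)
  p <- p / sum(p)
  H <- function(q) { q <- q[q > 0]; -sum(q * log(q)) }
  hx <- H(rowSums(p))
  hy <- H(colSums(p))
  if (hx == 0 || hy == 0) return(0)
  (hx + hy - H(as.vector(p))) / sqrt(hx * hy)
}

# equal-frequency cut into at most k bins; NULL if ties leave fewer than 2 bins
ef_cut <- function(v, k) {
  b <- unique(quantile(v, seq(0, 1, length.out = k + 1),
                       na.rm = TRUE, names = FALSE))
  if (length(b) < 3) return(NULL)
  cut(v, b, include.lowest = TRUE)
}

# try every (kx, ky) pair of bin counts and keep the pair of cuts with the best NMI
best_ef_pair <- function(x, y, maxBins = 10) {
  cx <- Filter(Negate(is.null), lapply(2:maxBins, function(k) ef_cut(x, k)))
  cy <- Filter(Negate(is.null), lapply(2:maxBins, function(k) ef_cut(y, k)))
  best <- list(nmi = -Inf, x = NULL, y = NULL)
  for (bx in cx) for (by in cy) {
    s <- nmi(bx, by)
    if (s > best$nmi) best <- list(nmi = s, x = bx, y = by)
  }
  best
}

data(iris)
res <- best_ef_pair(iris$Sepal.Length, iris$Sepal.Width)
res$nmi
summary(res$x)
summary(res$y)
```

The grid costs roughly maxBins^2 contingency tables, which is why a bin cap matters for computation time.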
Computation of a variable clustering based on a correlation matrix.
clusterVariables(corMatrix, nbCluster = 1:9)
corMatrix |
a dataframe corresponding to a correlation matrix |
nbCluster |
an integer or a vector of integers corresponding to the preferred number(s) of clusters for the clustering. |
a dataframe: the first column contains the variable names, the second column the index of the cluster each variable is assigned to.
# calculate a correlation dataframe
data(iris)
corDF <- multiBivariateCorrelation(dataset = iris, corMethods = "MaxNMI")
# transform it into a correlation matrix
corMatrix <- corCouplesToMatrix(x1_x2_val = corDF[,c('X1','X2',"MaxNMI")])
# perform the clustering
corGroups <- clusterVariables(corMatrix = corMatrix, nbCluster = 3)
print(corGroups)
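To make the idea of clustering variables (rather than observations) concrete, here is a base-R sketch that swaps in a different technique: average-linkage hierarchical clustering on the dissimilarity 1 - |r|. The function name `cluster_vars()` is hypothetical, and clusterVariables() may well use another algorithm; only the output shape (variable name, cluster index) mirrors the description above.

```r
# Stand-in technique: hierarchical clustering of variables, where strongly
# related variables (|r| close to 1) are close (distance close to 0).
cluster_vars <- function(corMatrix, k) {
  m  <- as.matrix(corMatrix)
  d  <- as.dist(1 - abs(m))              # dissimilarity between variables
  hc <- hclust(d, method = "average")
  data.frame(variable = rownames(m),
             cluster  = cutree(hc, k = k),  # cluster index per variable
             row.names = NULL)
}

data(iris)
r <- cor(iris[, 1:4], method = "spearman")
cluster_vars(r, k = 2)
```

Taking the absolute value means strongly negative correlations also pull variables together, which matches the use of absolute thresholds elsewhere in this manual.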
Transform a correlation couples dataframe (two variable-name columns plus one coefficient column) into a correlation matrix
corCouplesToMatrix(x1_x2_val)
x1_x2_val |
a dataframe of correlation values resulting from multiBivariateCorrelation(), restricted to three columns: the two variable-name columns (X1, X2) and a single coefficient. |
a dataframe corresponding to a correlation matrix.
# calculate a correlation dataframe
data(iris)
corDF<-multiBivariateCorrelation(dataset = iris, corMethods = "MaxNMI")
corMatrix<-corCouplesToMatrix(x1_x2_val = corDF[,c('X1','X2',"MaxNMI")])
print(corMatrix)
# convert back into couples; the values are MaxNMI, so name the coefficient accordingly
corCouples<-matrixToCorCouples(corMatrix,coefName="MaxNMI")
print(corCouples)
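The couples-to-matrix transformation is essentially a pivot. A minimal base-R sketch, assuming the first two columns hold variable names and the third holds the coefficient; `couples_to_matrix()` is a hypothetical name, and setting the diagonal to 1 is an assumption about how a variable relates to itself.

```r
couples_to_matrix <- function(df) {
  x1  <- as.character(df[[1]])
  x2  <- as.character(df[[2]])
  val <- df[[3]]
  vars <- sort(unique(c(x1, x2)))
  m <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
  diag(m) <- 1                                      # assumption: self-relationship is 1
  m[cbind(match(x1, vars), match(x2, vars))] <- val  # fill one triangle
  m[cbind(match(x2, vars), match(x1, vars))] <- val  # mirror to keep it symmetric
  as.data.frame(m)
}

couples <- data.frame(X1 = c("a", "a", "b"), X2 = c("b", "c", "c"),
                      coef = c(0.8, -0.2, 0.5))
couples_to_matrix(couples)
```

Mirroring each value keeps the matrix symmetric, so the same matrix works whichever order the two variables of a couple were listed in.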
This function creates a Shiny app folder containing a Shiny app object that Shiny Server can serve directly.
createShinyAppFolder(linkspotterObject, folderName)
linkspotterObject |
a linkspotter object, resulting from linkspotterComplete() or linkspotterOnFile() functions. |
folderName |
a character string corresponding to the name of the shiny app folder to create. |
data(iris)
lsOutputIris<-linkspotterComplete(iris)
tmpShinyFolder<-tempdir()
createShinyAppFolder(lsOutputIris, folderName=file.path(tmpShinyFolder,"myIrisLinkspotterShinyApp1"))
## Not run:
# launch the shiny app from the folder just created
shiny::runApp(file.path(tmpShinyFolder,"myIrisLinkspotterShinyApp1"))
## End(Not run)
Discretize a quantitative variable with equal frequency binning if possible
EFdiscretization(continuousX, nX, nbdigitsX = 3)
continuousX |
a numeric vector. |
nX |
an integer corresponding to the desired number of intervals. |
nbdigitsX |
number of significant digits to use in constructing levels. Default is 3. |
a factor.
data(iris)
disc.Sepal.Length=EFdiscretization(iris$Sepal.Length,5)
summary(disc.Sepal.Length)
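Equal-frequency binning places cut points at empirical quantiles so each interval holds roughly the same number of observations. A base-R sketch follows; `ef_bins()` is a hypothetical helper, not the package's implementation, and it assumes that duplicated quantiles (caused by tied values) are merged, which is one way fewer than nX intervals can result ("if possible").

```r
ef_bins <- function(x, nX, digits = 3) {
  probs  <- seq(0, 1, length.out = nX + 1)
  # cut points at the empirical quantiles; ties can produce duplicates
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  if (length(breaks) < 2) return(factor(x))  # constant input: a single level
  cut(x, breaks = breaks, include.lowest = TRUE, dig.lab = digits)
}

data(iris)
disc <- ef_bins(iris$Sepal.Length, 5)
table(disc)
```

Because Sepal.Length has many ties, the counts per bin are close to, but not exactly, 150 / 5.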
This function determines whether a given numeric or factor vector is a non-informative variable.
is.not.informative.variable(x, includeNA = T)
x |
a numeric or factor vector. |
includeNA |
a boolean. TRUE to include NA values as a factor level. |
data(iris)
is.not.informative.variable(iris$Sepal.Length)
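The manual does not state the criterion used. As an illustration only, here is a base-R sketch of one plausible definition, where a variable is non-informative when it takes fewer than two distinct values, with NA optionally counted as a value. The name `is_uninformative()` is hypothetical, and the package's actual criterion may differ.

```r
# Assumed criterion: fewer than two distinct values (NA counts if includeNA)
is_uninformative <- function(x, includeNA = TRUE) {
  vals <- if (includeNA) unique(x) else unique(x[!is.na(x)])
  length(vals) < 2
}

data(iris)
is_uninformative(iris$Sepal.Length)
is_uninformative(rep("a", 5))
is_uninformative(c("a", NA), includeNA = FALSE)
```

The includeNA switch matters for partially missing columns: c("a", NA) has two levels when NA is a level, but only one otherwise.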
Computation of correlation matrices, variable clustering and the customizable user interface to visualize them using a graph together with variables distributions and cross plots.
linkspotterComplete( dataset, targetVar = NULL, corMethods = c("pearson", "spearman", "kendall", "mic", "MaxNMI"), maxNbBins = 100, defaultMinCor = 0.3, defaultCorMethod = corMethods[length(corMethods)], clusteringCorMethod = defaultCorMethod, nbCluster = 1:9, printInfo = T, appTitle = "Linkspotter", htmlTop = "", htmlBottom = "" )
dataset |
the dataframe whose variables' bivariate correlations are to be analyzed. |
targetVar |
a vector of character strings corresponding to the names of the target variables. If not NULL, correlation coefficients are computed only with those target variables. |
corMethods |
a vector of correlation coefficients to compute. The available coefficients are "pearson", "spearman", "kendall", "mic", "distCor" and "MaxNMI". |
maxNbBins |
an integer used if corMethods includes 'MaxNMI'. It is the maximum number of bins (to limit computation time); maxNbBins=100 by default. |
defaultMinCor |
a double between 0 and 1. It is the minimum absolute correlation value displayed in the initial graph plot. |
defaultCorMethod |
a string. One of "pearson","spearman","kendall","mic", "distCor" or "MaxNMI". It is the correlation coefficient used for the initial graph plot. |
clusteringCorMethod |
a string. One of "pearson","spearman","kendall","mic", "distCor" or "MaxNMI". It is the correlation coefficient used for the variable clustering. |
nbCluster |
an integer or a vector of integers corresponding to the number(s) of clusters to consider. |
printInfo |
a boolean indicating whether to print on the console some information about the dataset and the estimated computation time. |
appTitle |
a string taken as the title of the user interface. |
htmlTop |
a character string of HTML code inserted in the HEAD tag, to customize the Shiny app. |
htmlBottom |
a character string of HTML code inserted at the end of the BODY tag, to customize the Shiny app. |
a list containing all the material enabling to analyze correlations:
computationTime
: a string
run_it
: a shiny.appobj object that can be run directly to deploy the user interface for a customizable visualization.
dataset
: the initial dataset
corDF
: the correlation data.frame including values for all coefficients
corMatrices
: a list of correlation matrices
corGroups
: data.frame a data.frame list
clusteringCorMethod
: a character
defaultMinCor
: a numeric
defaultCorMethod
: a string
corMethods
: vector of strings
# run linkspotter on iris example data
data(iris)
lsOutputIris<-linkspotterComplete(iris)
summary(lsOutputIris)
## Not run:
# launch the UI
lsOutputIris$launchShiny(options=list(port=8000))
## End(Not run)
Plot the Linkspotter graph
linkspotterGraph( corDF, variablesClustering = NULL, minCor = 0.3, corMethod = colnames(corDF)[-c(1:3, ncol(corDF))][length(colnames(corDF)[-c(1:3, ncol(corDF))])], smoothEdges = T, dynamicNodes = F, colorEdgesByCorDirection = F )
corDF |
a dataframe of correlation values resulting from multiBivariateCorrelation() |
variablesClustering |
a specific dataframe containing the output of the variable clustering resulting from the function clusterVariables() |
minCor |
a double between 0 and 1. It is the minimum absolute correlation value displayed in the graph plot. |
corMethod |
a string. One of "pearson","spearman","kendall","mic", "distCor" or "MaxNMI". It is the correlation coefficient used for the graph plot. |
smoothEdges |
a boolean. TRUE to let the edges be smooth. |
dynamicNodes |
a boolean. TRUE to let the graph re-organize itself after any movement. |
colorEdgesByCorDirection |
a boolean. TRUE to color the edges according to the correlation direction (positive -> blue, negative -> red, NA -> grey). |
a visNetwork object corresponding to a dynamic graph for the correlation matrix visualization.
# calculate a correlation dataframe
data(iris)
corDF=multiBivariateCorrelation(dataset = iris)
corMatrix=corCouplesToMatrix(x1_x2_val = corDF[,c('X1','X2',"spearman")])
corGroups=clusterVariables(corMatrix = corMatrix, nbCluster = 3)
# launch the graph
linkspotterGraph(corDF=corDF, variablesClustering=corGroups, minCor=0.3, corMethod='spearman', colorEdgesByCorDirection=TRUE)
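The effect of minCor and colorEdgesByCorDirection can be illustrated by building an edge table by hand. In this base-R sketch, `build_edges()` is a hypothetical helper, not the package's code: it keeps couples whose absolute coefficient reaches minCor, drops NA values, and colors edges by the sign of a signed coefficient. How the package determines direction for unsigned measures such as MaxNMI is not covered here.

```r
build_edges <- function(corDF, corMethod, minCor = 0.3, colorByDirection = TRUE) {
  v    <- corDF[[corMethod]]
  keep <- !is.na(v) & abs(v) >= minCor  # threshold on the absolute value
  edges <- data.frame(from  = corDF$X1[keep],
                      to    = corDF$X2[keep],
                      value = abs(v[keep]))
  if (colorByDirection) {
    edges$color <- ifelse(v[keep] > 0, "blue", "red")  # positive blue, negative red
  }
  edges
}

# couples built from a Spearman matrix, in the X1 / X2 / coefficient layout
data(iris)
r  <- cor(iris[, 1:4], method = "spearman")
ij <- which(upper.tri(r), arr.ind = TRUE)
corDF <- data.frame(X1 = rownames(r)[ij[, 1]], X2 = colnames(r)[ij[, 2]],
                    spearman = r[ij])
build_edges(corDF, "spearman", minCor = 0.3)
```

A table like this, with from and to columns, could be handed to visNetwork::visNetwork() together with a node table that has an id column.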
Plot the Linkspotter graph from a correlation matrix.
linkspotterGraphOnMatrix( corMatrix, cluster = FALSE, variablesClustering = NULL, minCor = 0.3, corMethod = "Coef.", smoothEdges = T, dynamicNodes = F, colorEdgesByCorDirection = F )
corMatrix |
a dataframe corresponding to a matrix of correlation or distance. |
cluster |
a boolean indicating whether to cluster the variables, or an integer giving the number of clusters directly. If variablesClustering is provided, the "cluster" parameter is ignored. |
variablesClustering |
a specific dataframe containing the output of the variable clustering resulting from the function clusterVariables() |
minCor |
a double between 0 and 1. It is the minimum absolute correlation value displayed in the graph plot. |
corMethod |
a string. One of "pearson","spearman","kendall","mic", "distCor" or "MaxNMI". It is the correlation coefficient used for the graph plot. |
smoothEdges |
a boolean. TRUE to let the edges be smooth. |
dynamicNodes |
a boolean. TRUE to let the graph re-organize itself after any movement. |
colorEdgesByCorDirection |
a boolean. TRUE to color the edges according to the correlation direction (positive -> blue, negative -> red, NA -> grey). |
a visNetwork object corresponding to a dynamic graph for the correlation matrix visualization.
# calculate a correlation dataframe
data(iris)
corDF=multiBivariateCorrelation(dataset = iris)
corMatrix=corCouplesToMatrix(x1_x2_val = corDF[,c('X1','X2',"pearson")])
# launch the graph
linkspotterGraphOnMatrix(corMatrix=corMatrix, minCor=0.3)
This function imports an external dataset, computes its correlation matrices, variable clustering and the customizable user interface to visualize them using a graph.
linkspotterOnFile( file, corMethods = c("pearson", "spearman", "kendall", "mic", "MaxNMI"), defaultMinCor = 0.3, defaultCorMethod = corMethods[length(corMethods)], clusteringCorMethod = corMethods[length(corMethods)], nbCluster = 1:9, printInfo = T, appTitle = "Linkspotter", htmlTop = "", htmlBottom = "", ... )
file |
the file containing a structured dataset whose bivariate correlations are to be analyzed. |
corMethods |
a vector of correlation coefficients to compute. The available coefficients are "pearson", "spearman", "kendall", "mic", "distCor" and "MaxNMI". |
defaultMinCor |
a double between 0 and 1. It is the minimum absolute correlation value displayed in the initial graph plot. |
defaultCorMethod |
a string. One of "pearson","spearman","kendall","mic", "distCor" or "MaxNMI". It is the correlation coefficient used for the initial graph plot. |
clusteringCorMethod |
a string. One of "pearson","spearman","kendall","mic", "distCor" or "MaxNMI". It is the correlation coefficient used for the variable clustering. |
nbCluster |
an integer or a vector of integers corresponding to the number(s) of clusters to consider. |
printInfo |
a boolean indicating whether to print on the console some information about the dataset and the estimated computation time. |
appTitle |
a string taken as the title of the user interface. |
htmlTop |
a character string of HTML code inserted in the HEAD tag, to customize the Shiny app. |
htmlBottom |
a character string of HTML code inserted at the end of the BODY tag, to customize the Shiny app. |
... |
further arguments passed to read.csv(). |
a list containing all the material enabling to analyze correlations:
computationTime
: a string
run_it
: a shiny.appobj object that can be run directly to deploy the user interface for a customizable visualization.
dataset
: the initial dataset
corDF
: the correlation data.frame including values for all coefficients
corMatrices
: a list of correlation matrices
corGroups
: data.frame a data.frame list
clusteringCorMethod
: a character
defaultMinCor
: a numeric
defaultCorMethod
: a string
corMethods
: vector of strings
# run linkspotter on iris example data
data(iris)
tmpCSV<-tempfile(fileext = '.csv')
write.csv(iris, tmpCSV, row.names = FALSE)
lsOutputIrisFromFile<-linkspotterOnFile(file=tmpCSV)
summary(lsOutputIrisFromFile)
## Not run:
# launch the UI
lsOutputIrisFromFile$launchShiny(options=list(port=8000))
## End(Not run)
Build the Linkspotter user interface
linkspotterUI( dataset, corDF, variablesClustering = NULL, defaultMinCor = 0.3, appTitle = "Linkspotter", htmlTop = "", htmlBottom = "", ... )
dataset |
the dataframe whose variables' bivariate correlations are contained in corDF |
corDF |
a dataframe of correlation values resulting from multiBivariateCorrelation() |
variablesClustering |
a specific dataframe containing the output of the variable clustering resulting from the function clusterVariables() |
defaultMinCor |
a double between 0 and 1. It is the minimum absolute correlation value displayed in the initial graph plot. |
appTitle |
a character string taken as the title of the user interface. |
htmlTop |
a character string of HTML code inserted in the HEAD tag, to customize the Shiny app. |
htmlBottom |
a character string of HTML code inserted at the end of the BODY tag, to customize the Shiny app. |
... |
further arguments passed to shiny::shinyApp(). |
a 'shiny.appobj' object that can be run directly to deploy the user interface for a customizable visualization.
# calculate a correlation dataframe
data(iris)
corDF=multiBivariateCorrelation(dataset = iris)
corMatrix=corCouplesToMatrix(x1_x2_val = corDF[,c('X1','X2',"MaxNMI")])
corGroups=clusterVariables(corMatrix = corMatrix, nbCluster = 3)
## Not run:
# launch the UI
linkspotterUI(dataset=iris, corDF=corDF, variablesClustering=corGroups, defaultMinCor=0.3, appTitle="Linkspotter on iris data", options = list(port=8000))
## End(Not run)
Transform a correlation matrix into a correlation couples dataframe
matrixToCorCouples(matrix, coefName = "Coef.", sortByDescAbs = F)
matrix |
a dataframe corresponding to a matrix of correlation. |
coefName |
a string: the name of the coefficient the values of the matrix represent. |
sortByDescAbs |
a boolean indicating whether to sort by descending absolute value of the coefficient. |
a dataframe corresponding to all correlation couples from the matrix.
# calculate a correlation dataframe
data(iris)
corDF<-multiBivariateCorrelation(dataset = iris)
corMatrix<-corCouplesToMatrix(x1_x2_val = corDF[,c('X1','X2',"pearson")])
print(corMatrix)
corCouples<-matrixToCorCouples(matrix = corMatrix,coefName="pearson")
print(corCouples)
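Going from a symmetric matrix back to couples only needs one triangle, so each pair appears once. A base-R sketch; `matrix_to_couples()` is a hypothetical name that mirrors the coefName and sortByDescAbs arguments described above, not the package's implementation.

```r
matrix_to_couples <- function(m, coefName = "Coef.", sortByDescAbs = FALSE) {
  m <- as.matrix(m)
  keep <- which(upper.tri(m), arr.ind = TRUE)  # each unordered pair once, no diagonal
  out <- data.frame(X1 = rownames(m)[keep[, 1]],
                    X2 = colnames(m)[keep[, 2]],
                    value = m[keep])
  names(out)[3] <- coefName
  if (sortByDescAbs) out <- out[order(-abs(out[[3]])), ]
  rownames(out) <- NULL
  out
}

data(iris)
r <- cor(iris[, 1:4])
matrix_to_couples(r, coefName = "pearson", sortByDescAbs = TRUE)
```

Sorting by descending absolute value puts the strongest relationships first, whatever their sign.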
Computes the MaxNMI between two variables regardless of their types, discretizing them with Best Equal-Frequency-based discretization (BeEF) when necessary.
maxNMI(x, y, includeNA = T, maxNbBins = 100, showProgress = F)
x |
a numeric or factor vector. |
y |
a numeric or factor vector. |
includeNA |
a boolean. TRUE to include NA values as a factor level. |
maxNbBins |
an integer giving the maximum number of bins (to limit computation time); maxNbBins=100 by default. |
showProgress |
a boolean indicating whether to show a progress bar. |
a double between 0 and 1 corresponding to the MaxNMI.
# compute the MaxNMI of a numeric vs a categorical variable, then of two numeric variables
data(iris)
maxNMI(iris$Sepal.Length,iris$Species)
maxNMI(iris$Sepal.Length,iris$Sepal.Width)
Computation of a correlation dataframe.
multiBivariateCorrelation( dataset, targetVar = NULL, corMethods = c("pearson", "spearman", "kendall", "mic", "MaxNMI"), maxNbBins = 100, showProgress = T )
dataset |
the dataframe which variables bivariate correlations are to be analyzed. |
targetVar |
a vector of character strings corresponding to the names of the target variables. If not NULL, correlation coefficients are computed only with those target variables. |
corMethods |
a vector of correlation coefficients to compute. The available coefficients are "pearson", "spearman", "kendall", "mic", "distCor" and "MaxNMI". |
maxNbBins |
an integer used if corMethods includes 'MaxNMI'. It is the maximum number of bins (to limit computation time); maxNbBins=100 by default. |
showProgress |
a boolean indicating whether to show a progress bar. |
a dataframe containing the correlation values for each specified correlation coefficient.
# compute all bivariate correlations of the iris data
data(iris)
corDF<-multiBivariateCorrelation(iris)
print(corDF)
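The long layout of the correlation dataframe (one row per variable couple, one column per coefficient) can be reproduced for the numeric columns with base R alone. This sketch covers only Pearson, Spearman and Kendall via stats::cor(); the helper name `pairwise_cor()` is hypothetical, and MIC, distCor and MaxNMI, as well as categorical variables, are deliberately left out.

```r
pairwise_cor <- function(df, methods = c("pearson", "spearman", "kendall")) {
  num   <- df[vapply(df, is.numeric, logical(1))]  # numeric columns only
  pairs <- t(combn(names(num), 2))                 # one row per variable couple
  out   <- data.frame(X1 = pairs[, 1], X2 = pairs[, 2])
  for (meth in methods) {
    out[[meth]] <- apply(pairs, 1, function(p)
      cor(num[[p[1]]], num[[p[2]]], method = meth, use = "pairwise.complete.obs"))
  }
  out
}

data(iris)
pairwise_cor(iris)
```

Keeping one column per coefficient is what makes the corDF[,c('X1','X2',"pearson")] style of selection used in the examples of this manual possible.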
Calculate the Normalized Mutual Information between two categorical variables (for two categorical variables, this is their MaxNMI)
NormalizedMI(x, y, includeNA = T)
x |
a factor. |
y |
a factor. |
includeNA |
a boolean. TRUE to include NA values as a factor level. |
a double between 0 and 1 corresponding to the MaxNMI.
# discretize Sepal.Length according to Species, then compute its NMI with Species
data(iris)
discreteSepalLength=BeEFdiscretization.numfact(continuousY=iris$Sepal.Length,factorX=iris$Species)
NormalizedMI(iris$Species,discreteSepalLength)
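Normalized mutual information rescales I(X;Y) = H(X) + H(Y) - H(X,Y) into [0, 1]. The base-R sketch below assumes the normalization sqrt(H(X) H(Y)), which is one common choice; NormalizedMI() may divide by a different quantity. The name `nmi()` is hypothetical, and includeNA is mimicked by turning NA into its own level with addNA().

```r
nmi <- function(x, y, includeNA = TRUE) {
  if (includeNA) {
    x <- addNA(as.factor(x), ifany = TRUE)  # NA becomes a regular level
    y <- addNA(as.factor(y), ifany = TRUE)
  }
  p <- table(x, y)       # NA pairs are dropped when includeNA = FALSE
  p <- p / sum(p)        # joint distribution
  H <- function(q) { q <- q[q > 0]; -sum(q * log(q)) }
  hx <- H(rowSums(p))
  hy <- H(colSums(p))
  if (hx == 0 || hy == 0) return(0)  # a constant variable carries no information
  mi <- hx + hy - H(as.vector(p))    # I(X;Y) = H(X) + H(Y) - H(X,Y)
  mi / sqrt(hx * hy)                 # assumed normalization
}

data(iris)
nmi(iris$Species, iris$Species)
nmi(iris$Species, cut(iris$Sepal.Length, 3))
```

With this normalization the result stays within [0, 1] because I(X;Y) never exceeds min(H(X), H(Y)), which is at most sqrt(H(X) H(Y)).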