ds.cor {dsBaseClient}    R Documentation

Calculates the correlation between two variables


Description

This function calculates the correlation between two variables, or the correlation matrix for the variables of an input dataframe.


Usage

ds.cor(x = NULL, y = NULL, naAction = "pairwise.complete",
  type = "split", datasources = NULL)



Arguments

x: a character, the name of a vector, matrix or dataframe of variable(s) for which the correlation(s) is (are) calculated.


y: NULL (default) or the name of a vector, matrix or dataframe with dimensions compatible with x.


naAction: a character string giving the method for computing correlations in the presence of missing values. This must be one of the strings "casewise.complete" or "pairwise.complete". If naAction is set to 'casewise.complete', the function omits every row of the whole dataframe that includes at least one missing value before calculating the correlations. If naAction is set to 'pairwise.complete' (default), the function divides the input dataframe into subset dataframes, one for each pair of variables (all combinations are considered), omits the rows with missing values from each pair separately, and then calculates the correlation for each pair.


type: a character which represents the type of analysis to carry out. If type is set to 'split' (default), the correlation of two variables or the correlation matrix of an input dataframe, together with the number of complete cases and the number of missing values, is returned separately for each study. If type is set to 'combine', the pooled correlation, the total number of complete cases and the total number of missing values, aggregated across all the studies involved, are returned.


datasources: a list of opal object(s) obtained after logging in to the opal servers; these objects also hold the data assigned to R, as a dataframe, from the opal datasources.
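
The difference between the two naAction settings can be illustrated locally with base R. This is only a sketch on a made-up dataframe: it uses stats::cor with its own 'use' options as stand-ins for 'casewise.complete' and 'pairwise.complete', and it is not the code that ds.cor runs on the servers.

```r
# Toy dataframe with missing values in different rows (illustrative values only).
df <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, NA, 6),
  c = c(1, NA, 2, 5, 4, 3)
)

# Casewise: drop every row that contains at least one NA, then correlate.
casewise <- cor(df, use = "complete.obs")

# Pairwise: for each pair of columns, drop only the rows missing in that pair.
pairwise <- cor(df, use = "pairwise.complete.obs")

# The casewise result again, obtained by filtering the rows explicitly.
casewise_manual <- cor(df[complete.cases(df), ])

# The pairwise entry for (a, b) again, obtained by filtering that pair only.
ok_ab <- complete.cases(df$a, df$b)
pairwise_ab <- cor(df$a[ok_ab], df$b[ok_ab])
```

With casewise deletion every correlation is based on the same rows, whereas with pairwise deletion each entry of the matrix may be based on a different number of rows.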


Details

In addition to computing correlations, this function produces a table outlining the number of complete cases and a table outlining the number of missing values, so that the user can judge the 'relevance' of each correlation from the number of complete cases included in its calculation.
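
As a rough local illustration of what such tables contain, the counts can be derived in base R as below. The dataframe is made up, and the definition of 'pairwise missing' used here (rows where at least one variable of the pair is missing) is an assumption for the sketch, not a statement about the exact tables ds.cor returns.

```r
# Toy dataframe with one missing value per variable (illustrative values only).
df <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, NA, 6),
  c = c(1, NA, 2, 5, 4, 3)
)

# 1 where a value is observed, 0 where it is missing.
observed <- (!is.na(as.matrix(df))) * 1

# Entry (i, j) counts the rows where both variable i and variable j are observed.
pairwise_complete <- crossprod(observed)

# Assumed definition: rows where at least one variable of the pair is missing.
pairwise_missing <- nrow(df) - pairwise_complete

# Casewise counterpart: rows with no missing value in any variable.
casewise_complete <- sum(complete.cases(df))

# Number of missing values in each variable.
missing_per_variable <- colSums(is.na(df))
```

A correlation backed by only a handful of complete cases deserves less weight than one backed by most of the rows, which is the judgement these counts are meant to support.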


Value

a list containing the number of missing values in each variable, the number of missing values casewise or pairwise (depending on the argument naAction), the correlation matrix, the number of complete cases used, and an error message which indicates whether or not the input variables pass the disclosure control (i.e. none of them is dichotomous with a level having fewer counts than the pre-specified threshold). If any of the input variables does not pass the disclosure control, all the output values are replaced with NAs. If all the variables are valid and pass the control, the output matrices are returned and the error message is returned as NA.
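
For intuition about the pooled correlation returned when type is 'combine', the sketch below shows one way a correlation across studies can be obtained from per-study sums without stacking the raw rows. The two studies, their values and the helper summarise_study are hypothetical, and this is not claimed to be how ds.cor is implemented.

```r
# Two hypothetical studies measuring the same two variables (made-up values).
study1 <- data.frame(x = c(1.0, 2.0, 3.0, 4.0), y = c(1.2, 1.9, 3.4, 3.8))
study2 <- data.frame(x = c(2.5, 3.5, 5.0), y = c(2.0, 4.1, 4.7))

# Hypothetical helper: the sums each study would need to share.
summarise_study <- function(d) {
  c(n = nrow(d), sx = sum(d$x), sy = sum(d$y),
    sxx = sum(d$x^2), syy = sum(d$y^2), sxy = sum(d$x * d$y))
}

# Add the summaries across studies, then apply the usual Pearson formula.
total <- summarise_study(study1) + summarise_study(study2)
pooled <- with(as.list(total),
  (n * sxy - sx * sy) / sqrt((n * sxx - sx^2) * (n * syy - sy^2)))

# Reference value: the correlation computed on the stacked raw data.
stacked <- rbind(study1, study2)
reference <- cor(stacked$x, stacked$y)
```

Because the sums are additive across studies, pooled agrees with reference up to floating-point error even though no individual-level rows are combined.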


Author(s)

Gaye A; Avraam D; Burton PR


Examples

## Not run: 

#  # load the file that contains the login details
#  data(glmLoginData)
#  library(opal)
#  # login and assign specific variable(s)
#  # (by default the assigned dataset is a dataframe named 'D')
#  myvar <- list('LAB_HDL', 'LAB_TSC', 'LAB_GLUC_ADJUSTED', 'GENDER')
#  opals <- opal::datashield.login(logins=glmLoginData, assign=TRUE, variables=myvar)
#  # Example 1: generate the correlation matrix for the assigned dataset 'D' 
#  # which contains 4 vectors (3 continuous and 1 categorical)
#  ds.cor(x='D')
#  # Example 2: generate the correlation matrix for the dataset 'D' combined for all 
#  # studies and removing any missing values casewise 
#  ds.cor(x='D', naAction='casewise.complete', type='combine')
#  # Example 3: calculate the correlation between two vectors 
#  # (first assign the vectors from 'D')
#  ds.assign(newobj='labhdl', toAssign='D$LAB_HDL')
#  ds.assign(newobj='labtsc', toAssign='D$LAB_TSC')
#  ds.assign(newobj='gender', toAssign='D$GENDER')
#  ds.cor(x='labhdl', y='labtsc', naAction='pairwise.complete', type='combine')
#  ds.cor(x='labhdl', y='labtsc', naAction='casewise.complete', type='combine')
#  ds.cor(x='labhdl', y='gender', naAction='pairwise.complete', type='combine')
#  ds.cor(x='labhdl', y='gender', naAction='casewise.complete', type='combine')
#  # clear the Datashield R sessions and logout
#  opal::datashield.logout(opals)

## End(Not run)

[Package dsBaseClient version 5.0.0 ]