ds.cov {dsBaseClient}    R Documentation

Calculates the covariance between two variables

Description

This function calculates the covariance of two variables or the variance-covariance matrix for the variables of an input dataframe.

Usage

ds.cov(x = NULL, y = NULL, naAction = "pairwise.complete",
  type = "split", datasources = NULL)

Arguments

x

a character, the name of a vector, matrix or dataframe of the variable(s) for which the covariance(s) is (are) calculated.

y

NULL (default) or the name of a vector, matrix or dataframe whose dimensions are compatible with those of x.

naAction

a character string giving the method for computing covariances in the presence of missing values. This must be one of the strings "casewise.complete" or "pairwise.complete". If naAction is set to 'casewise.complete', the function omits every row of the dataframe that contains at least one missing value before calculating the covariances. If naAction is set to 'pairwise.complete' (default), the function splits the input dataframe into subset dataframes, one for each pair of variables (all combinations are considered), omits the rows with missing values from each pair separately, and then calculates the covariance of each pair.

type

a character string specifying the type of analysis to carry out. If type is set to 'split' (default), the covariance of two variables or the variance-covariance matrix of an input dataframe, together with the number of complete cases and missing values, is returned for each study separately. If type is set to 'combine', the pooled covariance, the total number of complete cases and the total number of missing values, aggregated across all the studies involved, are returned.

datasources

a list of opal object(s) obtained after logging in to the opal servers; these objects also hold the data assigned to R, as a dataframe, from the opal datasources.
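
The difference between type = 'split' and type = 'combine' can be illustrated locally in base R. The sketch below is an assumption about how a pooled covariance can be built from per-study aggregates (counts, sums and sums of products); it is not taken from the dsBaseClient source and does not contact any server. The two data frames and the helper summarise_study are made up for illustration.

```r
# Two made-up studies; in a real analysis each would sit on its own server.
study1 <- data.frame(x = c(1, 2, 3, 4), y = c(2, 1, 4, 3))
study2 <- data.frame(x = c(5, 6, 7),    y = c(6, 8, 7))
studies <- list(study1 = study1, study2 = study2)

# 'split'-style result: one covariance per study.
split_cov <- sapply(studies, function(d) cov(d$x, d$y, use = "complete.obs"))

# Hypothetical helper: aggregates that each study could share
# without revealing individual-level rows.
summarise_study <- function(d) {
  d <- d[complete.cases(d), ]
  c(n = nrow(d), sx = sum(d$x), sy = sum(d$y), sxy = sum(d$x * d$y))
}

# 'combine'-style result: add the aggregates up, then apply
# cov = (sum(xy) - sum(x) * sum(y) / n) / (n - 1).
totals <- rowSums(sapply(studies, summarise_study))
pooled_cov <- unname(
  (totals["sxy"] - totals["sx"] * totals["sy"] / totals["n"]) / (totals["n"] - 1)
)
```

In this sketch the pooled value equals the covariance that would be obtained if the rows of all studies were stacked into one table, which in general differs from any average of the per-study covariances.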

Details

In addition to computing covariances, this function produces a table of the number of complete cases and a table of the number of missing values, so that the user can judge the 'relevance' of each covariance from the number of complete cases included in its calculation.
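
To see why the complete-case counts matter, the following local sketch uses base R's cov(), whose use = "complete.obs" and use = "pairwise.complete.obs" options are the closest local analogues of 'casewise.complete' and 'pairwise.complete'. The data frame is made up, and the count tables are built here by hand; the layout returned by ds.cov may differ.

```r
# Made-up data frame with one missing value in each column.
df <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, NA, 8, 10),
  c = c(5, 3, 4, NA, 1)
)

# Casewise: rows with any NA are dropped before all covariances are computed.
cov_casewise <- cov(df, use = "complete.obs")
n_casewise   <- sum(complete.cases(df))

# Pairwise: for each pair of columns, only rows with NA in that pair are dropped.
cov_pairwise <- cov(df, use = "pairwise.complete.obs")

# Table of complete cases per pair: entry [i, j] counts the rows in which
# both column i and column j are observed.
observed   <- (!is.na(as.matrix(df))) + 0   # 1 = observed, 0 = missing
n_pairwise <- crossprod(observed)

# Table of rows lost to missing values per pair.
n_missing_pairwise <- nrow(df) - n_pairwise
```

Here the casewise covariances rest on only two rows, while each pairwise covariance rests on three, so the two methods can give quite different estimates for the same pair of variables.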

Value

a list containing the number of missing values in each variable, the number of missing values casewise or pairwise (depending on the naAction argument), the covariance matrix, the number of complete cases used, and an error message which indicates whether or not the input variables pass the disclosure control (i.e. none of them is dichotomous with a level that has fewer counts than the pre-specified threshold). If any of the input variables does not pass the disclosure control, then all the output values are replaced with NAs. If all the variables are valid and pass the control, then the output matrices are returned and the error message is set to NA.
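
The disclosure rule described above can be sketched in base R as follows. This is only an illustration of the stated condition: the function name passes_disclosure_check and the threshold of 3 are hypothetical, and the actual check and threshold are set on the server side and may differ.

```r
# Hypothetical helper: TRUE if the variable is not dichotomous, or if it is
# dichotomous and both levels have at least `threshold` observations.
passes_disclosure_check <- function(v, threshold = 3) {
  v <- v[!is.na(v)]          # missing values do not count towards a level
  counts <- table(v)         # number of observations per distinct value
  if (length(counts) != 2) {
    return(TRUE)             # not dichotomous, so the rule does not apply
  }
  all(counts >= threshold)
}

ok_binary    <- passes_disclosure_check(c(0, 0, 0, 1, 1, 1))   # both levels have 3
risky_binary <- passes_disclosure_check(c(0, 0, 0, 0, 1))      # one level has 1
continuous   <- passes_disclosure_check(c(1.2, 3.4, 5.6))      # three distinct values
```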

Author(s)

Gaye A; Avraam D; Burton PR

Examples

## Not run: 

#  # load the dataset that contains the login details
#  data(glmLoginData)
#  library(opal)
#
#  # login and assign specific variable(s)
#  # (by default the assigned dataset is a dataframe named 'D')
#  myvar <- list('LAB_HDL', 'LAB_TSC', 'LAB_GLUC_ADJUSTED', 'GENDER')
#  opals <- opal::datashield.login(logins=glmLoginData, assign=TRUE, variables=myvar)
#
#  # Example 1: generate the covariance matrix for the assigned dataset 'D' 
#  # which contains 4 vectors (3 continuous and 1 categorical)
#  ds.cov(x='D')
#
#  # Example 2: generate the covariance matrix for the dataset 'D' combined for all 
#  # studies and removing any missing values casewise 
#  ds.cov(x='D', naAction='casewise.complete', type='combine')
#
#  # Example 3: calculate the covariance between two vectors 
#  # (first assign the vectors from 'D')
#  ds.assign(newobj='labhdl', toAssign='D$LAB_HDL')
#  ds.assign(newobj='labtsc', toAssign='D$LAB_TSC')
#  ds.assign(newobj='gender', toAssign='D$GENDER')
#  ds.cov(x='labhdl', y='labtsc', naAction='pairwise.complete', type='combine')
#  ds.cov(x='labhdl', y='labtsc', naAction='casewise.complete', type='combine')
#  ds.cov(x='labhdl', y='gender', naAction='pairwise.complete', type='combine')
#  ds.cov(x='labhdl', y='gender', naAction='casewise.complete', type='combine')
#
#  # clear the DataSHIELD R sessions and logout
#  opal::datashield.logout(opals)


## End(Not run)


[Package dsBaseClient version 5.0.0 ]