ds.asFactor {dsBaseClient}R Documentation

Converts a server-side numeric vector into a factor

Description

This function assigns a server-side numeric vector into a factor type.

Usage

ds.asFactor(
  input.var.name = NULL,
  newobj.name = NULL,
  forced.factor.levels = NULL,
  fixed.dummy.vars = FALSE,
  baseline.level = 1,
  datasources = NULL
)

Arguments

input.var.name

a character string which provides the name of the variable to be converted to a factor.

newobj.name

a character string that provides the name for the output variable that is stored on the data servers. Default asfactor.newobj.

forced.factor.levels

the levels that the user wants to split the input variable. If NULL (default) a vector with all unique levels from all studies are created.

fixed.dummy.vars

boolean. If TRUE the input variable is converted to a factor but presented as a matrix of dummy variables. If FALSE (default) the input variable is converted to a factor and assigned as a vector.

baseline.level

an integer indicating the baseline level to be used in the creation of the matrix with dummy variables. If the fixed.dummy.vars is set to FALSE then any value of the baseline level is not taken into account.

datasources

a list of DSConnection-class objects obtained after login. If the datasources argument is not specified the default set of connections will be used: see datashield.connections_default.

Details

Converts a numeric vector into a factor type which is represented either as a vector or as a matrix of dummy variables depending on the argument fixed.dummy.vars. The matrix of dummy variables also depends on the argument baseline.level.

To understand how the matrix of the dummy variable is created let's assume that we have the vector (1, 2, 1, 3, 4, 4, 1, 3, 4, 5) of ten integer numbers. If we set the argument fixed.dummy.vars = TRUE, baseline.level = 1 and forced.factor.levels = c(1,2,3,4,5). The input vector is converted to the following matrix of dummy variables:

DV2 DV3 DV4 DV5
0 0 0 0
1 0 0 0
0 0 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

For the same example if the baseline.level = 3 then the matrix is:

DV1 DV2 DV4 DV5
1 0 0 0
0 1 0 0
1 0 0 0
0 0 0 0
0 0 1 0
0 0 1 0
1 0 0 0
0 0 0 0
0 0 1 0
0 0 0 1

In the first instance the first row of the matrix has zeros in all entries indicating that the first data point belongs to level 1 (as the baseline level is equal to 1). The second row has 1 at the first (DV2) column and zeros elsewhere, indicating that the second data point belongs to level 2. In the second instance (second matrix) where the baseline level is equal to 3, the first row of the matrix has 1 at the first (DV1) column and zeros elsewhere, indicating again that the first data point belongs to level 1. Also as we can see the fourth row of the second matrix has all its elements equal to zero indicating that the fourth data point belongs to level 3 (as the baseline level, in that case, is 3).

If the baseline.level is set to be equal to a value that is not one of the levels of the factor then a matrix of dummy variables is created having as many columns as the number of levels. In that case in each row there is a unique entry equal to 1 at a certain column indicating the level of each data point. So, for the above example where the vector has five levels if we set the baseline.level equal to a value that does not belong to those five levels (baseline.level=8) the matrix of dummy variables is:

DV1 DV2 DV3 DV4 DV5
1 0 0 0 0
0 1 0 0 0
1 0 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 1 0
1 0 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1

Server functions called: asFactorDS1 and asFactorDS2

Value

ds.asFactor returns the unique levels of the converted variable in ascending order and a validity message with the name of the created object on the client-side and the output matrix or vector in the server-side.

Author(s)

DataSHIELD Development Team

Examples

## Not run: 
 
  ## Version 6, for version 5 see Wiki
  # Connecting to the Opal servers
  
  require('DSI')
  require('DSOpal')
  require('dsBaseClient')
  builder <- DSI::newDSLoginBuilder()
  builder$append(server = "study1", 
                 url = "http://192.168.56.100:8080/", 
                 user = "administrator", password = "datashield_test&", 
                 table = "CNSIM.CNSIM1", driver = "OpalDriver")
  builder$append(server = "study2", 
                 url = "http://192.168.56.100:8080/", 
                 user = "administrator", password = "datashield_test&", 
                 table = "CNSIM.CNSIM2", driver = "OpalDriver")
  builder$append(server = "study3",
                 url = "http://192.168.56.100:8080/", 
                 user = "administrator", password = "datashield_test&", 
                 table = "CNSIM.CNSIM3", driver = "OpalDriver")
  logindata <- builder$build()
  
  # Log onto the remote Opal training servers
  connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D") 

  ds.asFactor(input.var.name = "D$PM_BMI_CATEGORICAL", 
              newobj.name = "fact.obj", 
              forced.factor.levels = NULL, #a vector with all unique levels 
                                           #from all studies is created
              fixed.dummy.vars = TRUE, #create a matrix of dummy variables
              baseline.level = 1,
              datasources = connections)#all the Opal servers are used, in this case 3 
                                        #(see above the connection to the servers) 
  ds.asFactor(input.var.name = "D$PM_BMI_CATEGORICAL", 
              newobj.name = "fact.obj", 
              forced.factor.levels = c(2,3), #the variable is split in 2 levels
              fixed.dummy.vars = TRUE, #create a matrix of dummy variables
              baseline.level = 1,
              datasources = connections[1])#only the first Opal server is used ("study1")
              
   # Clear the Datashield R sessions and logout  
   datashield.logout(connections) 



## End(Not run)

[Package dsBaseClient version 6.1.1 ]