ds.dataFrame {dsBaseClient} R Documentation

ds.dataFrame calling dataFrameDS

Description

Creates a data frame from its elemental components: pre-existing data frames, single variables and matrices.

Usage

ds.dataFrame(x = NULL, row.names = NULL, check.rows = FALSE,
  check.names = TRUE, stringsAsFactors = TRUE, completeCases = FALSE,
  DataSHIELD.checks = FALSE, newobj = "df_new", datasources = NULL,
  notify.of.progress = FALSE)

Arguments

x

This is a vector of character strings giving the names of the elemental components to be combined. For example, the call ds.dataFrame(x=c('DF_input','matrix.m','var_age'),newobj='DF_output') will combine a pre-existing data.frame called DF_input with a matrix called matrix.m and a variable called var_age. The output will be the combined data.frame DF_output. As many elemental components as needed may be combined, in any order, e.g. 3 data.frames, 7 variables and 2 matrices. For convenience, the x argument can alternatively be specified in a two-step procedure, the first step being an assignment in the native R environment on the client side: x.components<-c('DF_input1','matrix.m','DF_input2', 'var_age'); ds.dataFrame(x=x.components,newobj='DF_output')
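To make the idea of the x argument concrete, the sketch below runs entirely in a local R session: a vector of object names is resolved with mget() and the objects are combined column-wise with data.frame(). The helper combine_components() and the toy objects are hypothetical; this is not the code of dataFrameDS, only an approximation based on the statement that dataFrameDS is almost the same as data.frame().

```r
# Hypothetical local sketch (not dataFrameDS): resolve a character vector of
# object names, as passed in x, and combine the objects with data.frame().
combine_components <- function(x, env = parent.frame()) {
  components <- mget(x, envir = env)  # look up each named object
  # Plain vectors take their object name as the column name; data frames and
  # matrices (objects with dim) contribute their own column names instead.
  is_vector <- vapply(components, function(obj) is.null(dim(obj)), logical(1))
  names(components) <- ifelse(is_vector, x, "")
  do.call(data.frame, components)
}

DF_input <- data.frame(id = 1:3, sex = c("f", "m", "f"))
matrix.m <- matrix(c(1.2, 3.4, 5.6, 7.8, 9.0, 2.1), nrow = 3,
                   dimnames = list(NULL, c("m1", "m2")))
var_age  <- c(34, 51, 47)

# Two-step form: build the name vector first, then combine.
x.components <- c("DF_input", "matrix.m", "var_age")
DF_output <- combine_components(x.components)
names(DF_output)  # "id" "sex" "m1" "m2" "var_age"
```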

row.names

NULL, or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
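As an illustration of the "single character string specifying a column" form, here is how native R's data.frame() treats it; it is assumed, not confirmed, that dataFrameDS passes row.names through in the same way. The column and values are made up.

```r
# Native R (not a DataSHIELD call): use the 'id' column as row names.
df <- data.frame(id = c("s1", "s2", "s3"), age = c(34, 51, 47),
                 row.names = "id")
rownames(df)  # "s1" "s2" "s3"
names(df)     # "age" (the id column is consumed as row names)
```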

check.rows

logical. If TRUE, the rows are checked for consistency of length and names. Default FALSE.

check.names

logical. If TRUE, the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names and are not duplicated. If necessary they are adjusted (by make.names) so that they are. As a slight modification to the standard data.frame() function in native R, if any column names are duplicated, the second and subsequent occurrences are given the suffixes .1, .2 etc. by ds.dataFrame, so there are never any duplicates when check.names is invoked by the serverside function dataFrameDS.
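The renaming rules referred to above can be seen by calling make.names() directly in a local R session. The input names are illustrative only.

```r
# Native R: duplicated names receive .1, .2, ... suffixes when unique = TRUE.
make.names(c("age", "age", "age", "bmi"), unique = TRUE)
# "age" "age.1" "age.2" "bmi"

# Syntactically invalid names are repaired: a leading "X" is added and the
# space is replaced by ".".
make.names("1st visit")
# "X1st.visit"
```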

stringsAsFactors

logical: should character vectors be converted to factors? The 'factory-fresh' default is TRUE.
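Note that since R 4.0.0 the native data.frame() default is stringsAsFactors = FALSE, so a local comparison should set the argument explicitly, as in this small sketch with made-up data.

```r
# Native R: request factor conversion explicitly (the default is FALSE in R >= 4.0.0).
df <- data.frame(sex = c("f", "m", "f"), stringsAsFactors = TRUE)
is.factor(df$sex)  # TRUE
levels(df$sex)     # "f" "m"
```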

completeCases

logical. Default FALSE. If TRUE then any rows with missing values in any of the elemental components of the final output data.frame will be deleted.
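A local approximation of completeCases = TRUE, assuming (not confirmed by this page) that it behaves like keeping only the rows flagged by R's complete.cases(). The data are hypothetical.

```r
# Hypothetical local sketch: keep only rows with no missing value in any column.
df <- data.frame(age = c(34, NA, 47, 29), bmi = c(22.1, 25.3, NA, 27.8))
df_cc <- df[complete.cases(df), ]
nrow(df_cc)  # 2 (rows 2 and 3 each contain an NA and are dropped)
```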

DataSHIELD.checks

logical. If TRUE, all DataSHIELD checks are undertaken (time-consuming). Default FALSE.

newobj

This is a character string providing a name for the output data.frame, which defaults to 'df_new' if no name is specified.

datasources

specifies the particular opal object(s) to use. If the <datasources> argument is not specified, the default set of opals will be used. The default opals are called default.opals and the default can be set using the function ds.setDefaultOpals. If <datasources> is to be specified, it should be set without inverted commas: e.g. datasources=opals.em or datasources=default.opals. To apply the function solely to, e.g., the second opal server in a set of three, the argument can be specified as datasources=opals.em[2]. To specify the first and third opal servers in a set, use datasources=opals.em[c(1,3)].
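The subsetting shown above is ordinary R list indexing. The sketch assumes, as the examples suggest, that a set of opals is an R list; the character strings below are hypothetical stand-ins for the connection objects a real DataSHIELD login would return. Single brackets keep a (shorter) list, whereas double brackets would extract one element on its own.

```r
# Hypothetical stand-ins for a list of three opal connections.
opals.em <- list(study1 = "conn1", study2 = "conn2", study3 = "conn3")

sub <- opals.em[c(1, 3)]  # first and third servers, still a list
names(sub)                # "study1" "study3"
is.list(sub)              # TRUE
is.list(opals.em[[2]])    # FALSE: [[ ]] extracts the element itself
```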

notify.of.progress

specifies whether console output should be produced to indicate progress. The default value for notify.of.progress is FALSE.

Details

A data frame is a list of variables, all with the same number of rows and with unique row names, which is of class 'data.frame'. ds.dataFrame creates a data frame by combining a series of elemental components, which may be pre-existing data.frames, matrices or variables. A critical requirement is that the length of every component variable and the number of rows of every component data.frame or matrix must be the same; the output data.frame then has this same number of rows. ds.dataFrame calls the serverside function dataFrameDS, which is almost the same as the native R function data.frame(), and so several of the arguments are precisely the same as for data.frame().
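The equal-rows requirement is worth checking explicitly in a local comparison, because native data.frame() silently recycles a shorter vector whose length divides the longest one. The helper check_component_rows() below is a hypothetical local sketch, not part of dsBaseClient; it uses NROW(), which returns the length of a vector and the number of rows of a matrix or data frame.

```r
# Hypothetical helper: stop unless every component has the same number of rows.
check_component_rows <- function(components) {
  n <- vapply(components, NROW, integer(1))
  if (length(unique(n)) != 1L) {
    stop("components have differing numbers of rows: ",
         paste(names(n), n, sep = "=", collapse = ", "))
  }
  invisible(n[[1]])
}

components <- list(age = c(34, 51, 47, 29), sex = c("f", "m"))
nrow(do.call(data.frame, components))  # 4: native R recycles 'sex'
msg <- tryCatch(check_component_rows(components),
                error = function(e) conditionMessage(e))
msg  # "components have differing numbers of rows: age=4, sex=2"
```

The explicit check was chosen over relying on data.frame() so that mismatched inputs fail loudly instead of producing recycled rows.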

Value

the object specified by the <newobj> argument (or default name <df_new>), which is written to the serverside. In addition, two validity messages are returned indicating whether <newobj> has been created in each data source and, if so, whether it is in a valid form. If its form is not valid in at least one study (e.g. because a disclosure trap was tripped and creation of the full output object was blocked), ds.dataFrame() also returns any studysideMessages that can explain the error in creating the full output object. As well as appearing on the screen at run time, the relevant studysideMessages can be viewed at a later date using the ds.message function. If you type ds.message("newobj"), it will print the relevant studysideMessage from any datasource in which there was an error in creating <newobj> and a studysideMessage was saved. If there was no error and <newobj> was created without problems, no studysideMessage will have been saved and ds.message("newobj") will return the message: "ALL OK: there are no studysideMessage(s) on this datasource".

Author(s)

DataSHIELD Development Team


[Package dsBaseClient version 5.0.0 ]