ds.dataFrameSubset {dsBaseClient} | R Documentation |
Subsets a data frame by rows and/or by columns.
ds.dataFrameSubset(
df.name = NULL,
V1.name = NULL,
V2.name = NULL,
Boolean.operator = NULL,
keep.cols = NULL,
rm.cols = NULL,
keep.NAs = NULL,
newobj = NULL,
datasources = NULL,
notify.of.progress = FALSE
)
df.name |
A character string providing the name of the data frame to be subsetted. |
V1.name |
A character string specifying the name of the vector to which the Boolean operator is applied to define the subset. For more information see Details. |
V2.name |
A character string specifying the name of the vector to compare with V1.name. |
Boolean.operator |
A character string specifying one of six possible Boolean operators: "==", "!=", ">", ">=", "<" or "<=". |
keep.cols |
a numeric vector specifying the numbers of the columns to be kept in the final subset. |
rm.cols |
a numeric vector specifying the numbers of the columns to be removed from the final subset. |
keep.NAs |
Logical. If TRUE, rows containing missing values are included in the subset. If FALSE or NULL, all rows with at least one missing value are removed from the subset. |
newobj |
a character string that provides the name for the output
object that is stored on the data servers. Default |
datasources |
a list of |
notify.of.progress |
Logical. Specifies whether console output should be produced to indicate progress. Default FALSE. |
Subset a pre-existing data frame using the standard set of Boolean operators (==, !=, >, >=, <, <=).
Subsetting is performed by rows, but it is also possible to select columns to keep or remove. If you wish to keep all rows in the subset (e.g. if the primary aim is to subset by columns and not by rows), the V1.name and V2.name parameters can both be set to a vector that, in each study, has the same length as the data frame to be subsetted, in which every element is 1 and there are no missing values. Comparing that vector with itself using == then selects every row. For more information see Example 2 below.
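The row and column rules described above can be illustrated on an ordinary local data frame. The following is a minimal base R sketch, not the server-side implementation: the helper `subset_local()` and the toy data are hypothetical and are not part of dsBaseClient. Two behaviours are assumptions made for the sketch: rows whose comparison evaluates to NA are treated as not selected, and incomplete rows are removed after column selection.

```r
# Hypothetical local analogue of the subsetting rule (base R only).
# subset_local() is NOT a dsBaseClient function; it only mimics the idea.
subset_local <- function(df, v1, v2, op,
                         keep.cols = NULL, rm.cols = NULL, keep.NAs = FALSE) {
  ops <- c("==", "!=", ">", ">=", "<", "<=")
  stopifnot(op %in% ops, length(v1) == nrow(df), length(v2) == nrow(df))

  # Apply the chosen Boolean operator element-wise to the two vectors
  keep <- match.fun(op)(v1, v2)
  # Assumption: a comparison that evaluates to NA does not select the row
  keep[is.na(keep)] <- FALSE

  # Column selection by column number; rm.cols is applied after keep.cols
  cols <- seq_len(ncol(df))
  if (!is.null(keep.cols)) cols <- keep.cols
  if (!is.null(rm.cols)) cols <- setdiff(cols, rm.cols)

  out <- df[keep, cols, drop = FALSE]

  # keep.NAs = FALSE or NULL: drop rows with at least one missing value
  # (assumption: checked on the columns that remain after selection)
  if (!isTRUE(keep.NAs)) out <- out[complete.cases(out), , drop = FALSE]
  out
}

# Toy data, invented for illustration only
toy <- data.frame(id   = 1:5,
                  tsc  = c(5, 6, NA, 4, 7),
                  trig = c(2, 8, 1, 3, NA),
                  sex  = c("f", "m", "f", NA, "m"))

# Rows where tsc > trig, all columns, incomplete rows removed
rows_gt <- subset_local(toy, toy$tsc, toy$trig, ">")

# Same comparison, but rows containing NAs in other columns are kept
rows_gt_na <- subset_local(toy, toy$tsc, toy$trig, ">", keep.NAs = TRUE)

# All rows, selected columns: compare a vector of ones with itself
ones <- rep(1, nrow(toy))
cols_only <- subset_local(toy, ones, ones, "==",
                          keep.cols = c(1, 4), keep.NAs = TRUE)
```

In the last call every element of `ones == ones` is TRUE, so no row is filtered out and only the column selection has an effect, which is the purpose of the all-ones vector described above.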
Server functions called: dataFrameSubsetDS1 and dataFrameSubsetDS2
ds.dataFrameSubset returns the object specified by the newobj argument, which is written to the server-side. In addition, two validity messages are returned to the client-side, indicating the name of the newobj that has been created in each data source and whether it is in a valid form.
DataSHIELD Development Team
## Not run:
## Version 6, for version 5 see the Wiki
# connecting to the Opal servers
require('DSI')
require('DSOpal')
require('dsBaseClient')
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM1", driver = "OpalDriver")
builder$append(server = "study2",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM2", driver = "OpalDriver")
builder$append(server = "study3",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM3", driver = "OpalDriver")
logindata <- builder$build()
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
# Subsetting a data frame
#Example 1: Include some rows and all columns in the subset
ds.dataFrameSubset(df.name = "D",
V1.name = "D$LAB_TSC",
V2.name = "D$LAB_TRIG",
Boolean.operator = ">",
keep.cols = NULL, #All columns are included in the new subset
rm.cols = NULL, #All columns are included in the new subset
keep.NAs = FALSE, #All rows with NAs are removed
newobj = "new.subset",
datasources = connections[1],#only the first server is used ("study1")
notify.of.progress = FALSE)
#Example 2: Include all rows and some columns in the new subset
#Select complete cases (rows without NA)
ds.completeCases(x1 = "D",
newobj = "complet",
datasources = connections)
#Create a vector with all ones
ds.make(toAssign = "complet$LAB_TSC-complet$LAB_TSC+1",
newobj = "ONES",
datasources = connections)
#Subset the data
ds.dataFrameSubset(df.name = "complet",
V1.name = "ONES",
V2.name = "ONES",
Boolean.operator = "==",
keep.cols = c(1:4,10), #only columns 1, 2, 3, 4 and 10 are selected
rm.cols = NULL,
keep.NAs = FALSE,
newobj = "subset.all.rows",
datasources = connections, #all servers are used
notify.of.progress = FALSE)
# Clear the DataSHIELD R sessions and log out
datashield.logout(connections)
## End(Not run)