ds.meanSdGp {dsBaseClient} | R Documentation |

This function calculates the mean and SD of a continuous variable for each class of a single factor.

```
ds.meanSdGp(
x = NULL,
y = NULL,
type = "both",
do.checks = FALSE,
datasources = NULL
)
```

`x` |
a character string specifying the name of a numeric continuous variable. |

`y` |
a character string specifying the name of a categorical variable of class factor. |

`type` |
a character string that represents the type of analysis to carry out.
This can be set as `"combine"`, `"split"` or `"both"`. Default is `"both"`. |

`do.checks` |
logical. If TRUE the administrative checks are undertaken to ensure that the input objects are defined in all studies and that the variables are of equivalent class in each study. Default is FALSE to save time. |

`datasources` |
a list of |

This function calculates the mean, standard deviation (SD), N (number of observations) and the standard error of the mean (SEM) of a continuous variable broken down into subgroups defined by a single factor.
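As a rough local illustration of these four statistics, the sketch below computes them per factor level with plain base R on an in-memory data frame. It is not the DataSHIELD client-server implementation: the data frame `df` and its columns `age` and `group` are made up for the example, and the SEM is assumed to be the SD divided by the square root of N.

```r
# Hypothetical toy data: one continuous variable and one factor.
df <- data.frame(
  age   = c(50, 54, 58, 61, 63, 65, 70, 74),
  group = factor(c("a", "a", "a", "b", "b", "b", "b", "b"))
)

# Per-group summaries computed with tapply().
n_gp    <- tapply(df$age, df$group, length)
mean_gp <- tapply(df$age, df$group, mean)
sd_gp   <- tapply(df$age, df$group, sd)
sem_gp  <- sd_gp / sqrt(n_gp)  # assumed definition: SEM = SD / sqrt(N)

# Collect the results in one table, one row per factor level.
summary_gp <- data.frame(
  group = names(mean_gp),
  N     = as.vector(n_gp),
  Mean  = as.vector(mean_gp),
  SD    = as.vector(sd_gp),
  SEM   = as.vector(sem_gp)
)
print(summary_gp)
```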

There are important differences between the `ds.meanSdGp` function and the `ds.meanByClass` function:

(A) `ds.meanSdGp` does not actually subset the data; it simply calculates the required statistics
and reports them. This means you cannot use this function if you wish to physically break the
data into subsets. On the other hand, it makes the function much faster than `ds.meanByClass`
if you do not need to create physical subsets.

(B) `ds.meanByClass` allows you to specify up to three categorising factors, but `ds.meanSdGp`
only allows one. However, this is not a serious problem. If you have two factors (e.g. sex with
two levels `[0,1]` and `BMI.categorical` with three levels `[1,2,3]`), you simply need to create
a new factor that combines the two in a way that gives each combination of levels a different
value in the new factor. So, in the example given, the calculation `newfactor = (3*sex) + BMI`
gives you six values:

(1) `sex = 0` and `BMI = 1` -> `newfactor = 1`

(2) `sex = 0` and `BMI = 2` -> `newfactor = 2`

(3) `sex = 0` and `BMI = 3` -> `newfactor = 3`

(4) `sex = 1` and `BMI = 1` -> `newfactor = 4`

(5) `sex = 1` and `BMI = 2` -> `newfactor = 5`

(6) `sex = 1` and `BMI = 3` -> `newfactor = 6`
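The mapping above can be checked locally with a few lines of base R. This is only a sketch on made-up in-memory vectors (`sex` and `bmi` are hypothetical names); the equivalent server-side construction in DataSHIELD is not shown here.

```r
# Hypothetical toy data using the level codings from the text:
# sex in {0, 1} and BMI category in {1, 2, 3}, one row per combination.
sex <- factor(c(0, 0, 0, 1, 1, 1))
bmi <- factor(c(1, 2, 3, 1, 2, 3))

# Convert factor labels back to their numeric codes before doing arithmetic;
# as.numeric() applied directly to a factor returns internal level indices.
sex_num <- as.numeric(as.character(sex))
bmi_num <- as.numeric(as.character(bmi))

# Combine the two factors so that each combination gets its own value.
newfactor <- factor((3 * sex_num) + bmi_num)

# One row per sex/BMI combination with the resulting combined level.
print(data.frame(sex = sex_num, BMI = bmi_num, newfactor = newfactor))
```

The multiplier 3 is the number of levels of the BMI factor; a different number of levels in the second factor would need a correspondingly different multiplier to keep the combined values distinct.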

(C) At present, `ds.meanByClass` takes the sample size in each group to be the total sample
size (i.e. it includes all observations in each group regardless of whether or not they include
missing values for the continuous variable or the factor). In contrast, `ds.meanSdGp` always
reports the number of observations that are non-missing both for the continuous variable and
the factor. This makes sense: for `ds.meanByClass`, the total size of the physical subsets was
important, but `ds.meanSdGp` undertakes analysis without physical subsetting, so only the
observations with non-missing values in both variables contribute to the calculation of means
and SDs within each group, and it is logical to consider those counts as primary. The only
reference `ds.meanSdGp` makes to missing counts is in the reporting of `Ntotal` and `Nmissing`
overall (i.e. not broken down by group).

For the future, we plan to extend `ds.meanByClass` to report both total and non-missing
counts in subgroups.
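The two counting conventions described in (C) can be contrasted on a small local example. This is a base R sketch on made-up vectors, not the DataSHIELD code. The names `Ntotal`, `Nvalid` and `Nmissing` follow the text, while the assumption that `Nmissing` counts rows missing in either variable is mine.

```r
# Hypothetical toy data with missing values in both variables.
x <- c(10, 12, NA, 15, 18, NA, 20, 22)
g <- factor(c("a", "a", "a", "b", "b", NA, "b", NA))

# Rows that are non-missing in BOTH the continuous variable and the factor.
valid <- !is.na(x) & !is.na(g)

Ntotal   <- length(x)         # all observations
Nvalid   <- sum(valid)        # observations usable for means and SDs
Nmissing <- Ntotal - Nvalid   # assumed: missing in x, in g, or in both

# Non-missing count per group (the convention described for ds.meanSdGp).
Nvalid_gp <- table(g[valid])

# Count per group regardless of missing x (closer to the convention
# described for ds.meanByClass); rows with a missing factor value
# cannot be assigned to any group and are dropped by table().
Ntotal_gp <- table(g)

print(c(Ntotal = Ntotal, Nvalid = Nvalid, Nmissing = Nmissing))
print(Nvalid_gp)
print(Ntotal_gp)
```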

Depending on the argument `type`, different analyses can be carried out:

(1) `"combine"`: a pooled table of results is generated.

(2) `"split"`: a table of results is generated for each study.

(3) `"both"`: both sets of outputs are produced.
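To make the difference between study-specific and pooled results concrete, the sketch below shows one standard way of combining per-study N, mean and SD for a single group into pooled values. It is an illustration in base R under my own assumptions, not necessarily how `ds.meanSdGp` or `meanSdGpDS` computes the pooled table; the function `pool_summaries` and the vectors `study1` and `study2` are hypothetical.

```r
# Hypothetical raw values of one group in two studies
# (in DataSHIELD these would never leave the servers).
study1 <- c(50, 54, 58)
study2 <- c(61, 63, 65, 70, 74)

# Study-specific summaries, comparable to a "split" style of output.
n <- c(length(study1), length(study2))
m <- c(mean(study1), mean(study2))
s <- c(sd(study1), sd(study2))

# Combine per-study summaries into pooled N, mean, SD and SEM.
pool_summaries <- function(n, mean, sd) {
  N <- sum(n)
  pooled_mean <- sum(n * mean) / N
  # Total sum of squares = within-study part + between-study part.
  ss <- sum((n - 1) * sd^2) + sum(n * (mean - pooled_mean)^2)
  pooled_sd <- sqrt(ss / (N - 1))
  c(N = N, Mean = pooled_mean, SD = pooled_sd, SEM = pooled_sd / sqrt(N))
}

pooled <- pool_summaries(n, m, s)
print(pooled)

# Sanity check: pooling the summaries matches analysing the stacked data.
print(all.equal(pooled[["Mean"]], mean(c(study1, study2))))
print(all.equal(pooled[["SD"]], sd(c(study1, study2))))
```

Only the summaries `n`, `m` and `s` are needed for the pooling step, which is why this kind of calculation suits a setting where individual-level data stay on each server.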

Server function called: `meanSdGpDS`

`ds.meanSdGp` returns to the client-side the mean, SD, Nvalid and SEM combined
across studies and/or separately for each study, depending on the argument `type`.

DataSHIELD Development Team

`ds.subsetByClass` to subset by the classes of factor vector(s).

`ds.subset` to subset by complete cases (i.e. removing missing values), threshold,
columns and rows.

```
## Not run:
## Version 6, for version 5 see the Wiki

# Connect to the Opal servers
require('DSI')
require('DSOpal')
require('dsBaseClient')

builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "SURVIVAL.EXPAND_NO_MISSING1", driver = "OpalDriver")
builder$append(server = "study2",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "SURVIVAL.EXPAND_NO_MISSING2", driver = "OpalDriver")
builder$append(server = "study3",
               url = "http://192.168.56.100:8080/",
               user = "administrator", password = "datashield_test&",
               table = "SURVIVAL.EXPAND_NO_MISSING3", driver = "OpalDriver")
logindata <- builder$build()

# Log in and assign each table to the symbol "D" on its server
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

# Example 1: Calculate the mean, SD, Nvalid and SEM of the continuous variable age.60 (age in
# years centralised at 60), broken down by time.id (a six level factor relating to survival time)
# and report the pooled results combined across studies.
ds.meanSdGp(x = "D$age.60",
            y = "D$time.id",
            type = "combine",
            do.checks = FALSE,
            datasources = connections)

# Example 2: Calculate the mean, SD, Nvalid and SEM of the continuous variable age.60 (age in
# years centralised at 60), broken down by time.id (a six level factor relating to survival time)
# and report both study-specific results and the pooled results combined across studies.
# Save the returned output to msg.b.
msg.b <- ds.meanSdGp(x = "D$age.60",
                     y = "D$time.id",
                     type = "both",
                     do.checks = FALSE,
                     datasources = connections)

# Clear the DataSHIELD R sessions and log out
DSI::datashield.logout(connections)
## End(Not run)
```

[Package *dsBaseClient* version 6.3.0 ]