ds.histogram {dsBaseClient} | R Documentation |

`ds.histogram`

function plots a non-disclosive histogram in the client-side.

```
ds.histogram(
x = NULL,
type = "split",
num.breaks = 10,
method = "smallCellsRule",
k = 3,
noise = 0.25,
vertical.axis = "Frequency",
datasources = NULL
)
```

`x` |
a character string specifying the name of a numerical vector. |

`type` |
a character string that represents the type of graph to display.
The |

`num.breaks` |
a numeric specifying the number of breaks of the histogram. Default value
is |

`method` |
a character string that defines which histogram will be created.
The |

`k` |
the number of the nearest neighbours for which their centroid is calculated.
Default |

`noise` |
the percentage of the initial variance that is used as the variance of the embedded
noise if the argument |

`vertical.axis, ` |
a character string that defines what is shown in the vertical axis of the
plot. The |

`datasources` |
a list of |

`ds.histogram`

function allows the user to plot
distinct histograms (one for each study) or a combined histogram that merges
the single plots.

In the argument `type`

can be specified two types of graphics to display:

`'combine'`

: a histogram that merges the single plot is displayed.`'split'`

: each histogram is plotted separately.

In the argument `method`

can be specified 3 different histograms to be created:

`'smallCellsRule'`

: the histogram of the actual variable is created but bins with low counts are removed.`'deterministic'`

: the histogram of the scaled centroids of each`k`

nearest neighbours of the original variable where the value of`k`

is set by the user.`'probabilistic'`

: the histogram shows the original distribution disturbed by the addition of random stochastic noise. The added noise follows a normal distribution with zero mean and variance equal to a percentage of the initial variance of the input variable. This percentage is specified by the user in the argument`noise`

.

In the `k`

argument the user can choose any value for `k`

equal
to or greater than the pre-specified threshold
used as a disclosure control for this method and lower than the number of observations
minus the value of this threshold. By default the value of `k`

is set to be equal to 3
(we suggest k to be equal to, or bigger than, 3). Note that the function fails if the user
uses the default value but the study has set a bigger threshold.
The value of `k`

is used only if the argument
`method`

is set to `'deterministic'`

.
Any value of k is ignored if the
argument `method`

is set to `'probabilistic'`

or `'smallCellsRule'`

.

In the `noise`

argument the percentage of the initial variance
that is used as the variance of the embedded
noise if the argument `method`

is set to `'probabilistic'`

.
Any value of noise is ignored if the argument
`method`

is set to `'deterministic'`

or `'smallCellsRule'`

.
The user can choose any value for noise equal to or greater
than the pre-specified threshold `'nfilter.noise'`

.
By default the value of noise is set to be equal to 0.25.

In the argument `vertical.axis`

can be specified two types of histograms:

`'Frequency'`

: the histogram of the frequencies is returned.`'Density'`

: the histogram of the densities is returned.

Server function called: `histogramDS2`

one or more histogram objects and plots depending on the argument `type`

DataSHIELD Development Team

```
## Not run:
## Version 6, for version 5 see the Wiki
# Connecting to the Opal servers
require('DSI')
require('DSOpal')
require('dsBaseClient')
builder <- DSI::newDSLoginBuilder()
builder$append(server = "study1",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM1", driver = "OpalDriver")
builder$append(server = "study2",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM2", driver = "OpalDriver")
builder$append(server = "study3",
url = "http://192.168.56.100:8080/",
user = "administrator", password = "datashield_test&",
table = "CNSIM.CNSIM3", driver = "OpalDriver")
logindata <- builder$build()
# Log onto the remote Opal training servers
connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")
# Compute the histogram
# Example 1: generate a histogram for each study separately
ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
type = "split",
datasources = connections) #all studies are used
# Example 2: generate a combined histogram with the default small cells counts
suppression rule
ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
method = 'smallCellsRule',
type = 'combine',
datasources = connections[1]) #only the first study is used (study1)
# Example 3: if a variable is of type factor the function returns an error
ds.histogram(x = 'D$PM_BMI_CATEGORICAL',
datasources = connections)
# Example 4: generate a combined histogram with the deterministic method for k=50
ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
k = 50,
method = 'deterministic',
type = 'combine',
datasources = connections[2])#only the second study is used (study2)
# Example 5: create a histogram and the probability density on the plot
hist <- ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
method = 'probabilistic', type='combine',
num.breaks = 30,
vertical.axis = 'Density',
datasources = connections)
lines(hist$mids, hist$density)
# clear the Datashield R sessions and logout
datashield.logout(connections)
## End(Not run)
```

[Package *dsBaseClient* version 6.3.0 ]