ds.histogram {dsBaseClient} | R Documentation |

The `ds.histogram` function plots a non-disclosive histogram on the client side.

ds.histogram( x = NULL, type = "split", num.breaks = 10, method = "smallCellsRule", k = 3, noise = 0.25, vertical.axis = "Frequency", datasources = NULL )

`x` |
a character string specifying the name of a numerical vector. |

`type` |
a character string that represents the type of graph to display.
The `type` argument can be set to `'combine'` or `'split'`. Default `'split'`. |

`num.breaks` |
a numeric specifying the number of breaks of the histogram. Default value
is 10. |

`method` |
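a character string that defines which histogram will be created.
The `method` argument can be set to `'smallCellsRule'`, `'deterministic'` or `'probabilistic'`. Default `'smallCellsRule'`. |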

`k` |
the number of nearest neighbours whose centroid is calculated.
Default `k` value is 3. |

`noise` |
the percentage of the initial variance that is used as the variance of the embedded
noise if the argument `method` is set to `'probabilistic'`. Default `noise` value is 0.25. |

`vertical.axis` |
a character string that defines what is shown on the vertical axis of the
plot. The `vertical.axis` argument can be set to `'Frequency'` or `'Density'`. Default `'Frequency'`. |

`datasources` |
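a list of connection objects obtained after login (for example, the `connections` object returned by `DSI::datashield.login` in the examples). |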

The `ds.histogram` function allows the user to plot
distinct histograms (one for each study) or a combined histogram that merges
the single plots.

The `type` argument specifies one of two types of graphic to display:

`'combine'`: a histogram that merges the single plots is displayed.

`'split'`: each histogram is plotted separately.
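The difference between the two types can be illustrated locally in base R. This is a minimal sketch only: the three vectors are simulated stand-ins for study data, and summing counts over shared breaks is an assumption about how a merged histogram could be formed, not a description of what `ds.histogram` does internally.

```r
# Hypothetical local data standing in for three studies (not DataSHIELD objects).
set.seed(1)
studies <- list(
  study1 = rnorm(200, mean = 27, sd = 4),
  study2 = rnorm(150, mean = 28, sd = 5),
  study3 = rnorm(180, mean = 26, sd = 4)
)

# Shared breaks spanning the range of all studies, so that bins line up.
rng <- range(unlist(studies))
breaks <- seq(rng[1], rng[2], length.out = 11)  # 10 bins

# 'split'-like view: one histogram object per study.
split_hists <- lapply(studies, hist, breaks = breaks, plot = FALSE)

# 'combine'-like view (assumed approach): add the per-study counts bin by bin.
combined_counts <- Reduce(`+`, lapply(split_hists, `[[`, "counts"))
```

Because every study uses the same breaks, the combined counts account for every observation across the three simulated studies.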

The `method` argument specifies which of three histograms is created:

`'smallCellsRule'`: the histogram of the actual variable is created but bins with low counts are removed.

`'deterministic'`: the histogram of the scaled centroids of each `k` nearest neighbours of the original variable, where the value of `k` is set by the user.

`'probabilistic'`: the histogram shows the original distribution disturbed by the addition of random noise. The added noise follows a normal distribution with zero mean and variance equal to a percentage of the initial variance of the input variable. This percentage is specified by the user in the argument `noise`.
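The idea behind `'smallCellsRule'` can be sketched locally in base R. The data are simulated and the threshold `min_count` is a hypothetical value chosen for illustration; on a real server the threshold is set by the data custodian and the suppression happens on the server side.

```r
set.seed(42)
x <- rnorm(500, mean = 27, sd = 5)  # hypothetical stand-in for a server-side variable
min_count <- 3                      # assumed threshold, for illustration only

h <- hist(x, breaks = 10, plot = FALSE)

# Bins that are non-empty but hold fewer than min_count observations
# are treated as potentially disclosive and blanked out.
suppressed <- h$counts > 0 & h$counts < min_count
h$counts[suppressed] <- 0
h$density[suppressed] <- 0

plot(h, main = "Bins below the threshold removed")
```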

In the `k` argument the user can choose any value
equal to or greater than the pre-specified threshold
used as a disclosure control for this method and lower than the number of observations
minus the value of this threshold. By default the value of `k` is set to 3
(we suggest `k` to be equal to, or bigger than, 3). Note that the function fails if the user
uses the default value but the study has set a bigger threshold.
The value of `k` is used only if the argument `method` is set to `'deterministic'`.
Any value of `k` is ignored if the argument `method` is set to `'probabilistic'` or `'smallCellsRule'`.
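A minimal local sketch of the deterministic idea follows. The data are simulated, `k_threshold` is a hypothetical threshold value, and the scaling step (stretching the centroids about their mean so that their standard deviation matches that of the original variable) is an assumption made for illustration; the server-side function may differ in detail.

```r
set.seed(7)
x <- rnorm(300, mean = 27, sd = 5)  # hypothetical stand-in for a server-side variable
k <- 3
k_threshold <- 3                    # assumed value of the study's disclosure threshold

# Range check described in the text: threshold <= k < n - threshold.
stopifnot(k >= k_threshold, k < length(x) - k_threshold)

# For each value, average its k nearest values (the value itself counts as one).
knn_centroid <- function(x, k) {
  vapply(seq_along(x), function(i) {
    nearest <- order(abs(x - x[i]))[seq_len(k)]
    mean(x[nearest])
  }, numeric(1))
}

centroids <- knn_centroid(x, k)

# Assumed scaling: restore the spread that averaging has shrunk.
scaled <- mean(centroids) + (centroids - mean(centroids)) * sd(x) / sd(centroids)

h <- hist(scaled, breaks = 10, plot = FALSE)
```

The histogram is then built from `scaled` rather than from `x`, so no original value is binned directly.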

The `noise` argument sets the percentage of the initial variance
that is used as the variance of the embedded
noise if the argument `method` is set to `'probabilistic'`.
Any value of `noise` is ignored if the argument `method` is set to `'deterministic'` or `'smallCellsRule'`.
The user can choose any value for `noise` equal to or greater
than the pre-specified threshold `'nfilter.noise'`.
By default the value of `noise` is set to 0.25.
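The effect of `noise` can be sketched locally in base R. The data are simulated; the only part taken from the description above is that the added noise is normal with zero mean and variance equal to `noise` times the variance of the input variable.

```r
set.seed(123)
x <- rnorm(1000, mean = 27, sd = 5)  # hypothetical stand-in for a server-side variable
noise <- 0.25

# Zero-mean normal noise whose variance is noise * var(x).
noisy <- x + rnorm(length(x), mean = 0, sd = sqrt(noise * var(x)))

h <- hist(noisy, breaks = 10, plot = FALSE)

# Ratio of variances; independent noise should inflate it to roughly 1 + noise.
var(noisy) / var(x)
```

A larger `noise` gives stronger protection but a flatter, wider histogram than the true distribution.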

The `vertical.axis` argument specifies one of two types of histogram:

`'Frequency'`: the histogram of the frequencies is returned.

`'Density'`: the histogram of the densities is returned.
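The relation between the two vertical axes can be checked locally with the base R `hist` function, used here on simulated data as a stand-in for the object returned to the client. Densities are the counts divided by the total number of observations and by the bin width, so the bars of a density histogram have a total area of 1.

```r
set.seed(99)
x <- rnorm(400, mean = 27, sd = 5)  # hypothetical stand-in for a server-side variable

h <- hist(x, breaks = 10, plot = FALSE)
widths <- diff(h$breaks)

# Frequencies are the raw bin counts; densities rescale them by n and bin width.
density_by_hand <- h$counts / (sum(h$counts) * widths)

# Total area under the density histogram.
sum(h$density * widths)
```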

Server function called: `histogramDS2`

Returns one or more histogram objects and plots, depending on the argument `type`.

DataSHIELD Development Team

    ## Not run: 

    ## Version 6, for version 5 see the Wiki

    # Connecting to the Opal servers

    require('DSI')
    require('DSOpal')
    require('dsBaseClient')

    builder <- DSI::newDSLoginBuilder()
    builder$append(server = "study1",
                   url = "http://192.168.56.100:8080/",
                   user = "administrator", password = "datashield_test&",
                   table = "CNSIM.CNSIM1", driver = "OpalDriver")
    builder$append(server = "study2",
                   url = "http://192.168.56.100:8080/",
                   user = "administrator", password = "datashield_test&",
                   table = "CNSIM.CNSIM2", driver = "OpalDriver")
    builder$append(server = "study3",
                   url = "http://192.168.56.100:8080/",
                   user = "administrator", password = "datashield_test&",
                   table = "CNSIM.CNSIM3", driver = "OpalDriver")
    logindata <- builder$build()

    # Log onto the remote Opal training servers
    connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

    # Compute the histogram
    # Example 1: generate a histogram for each study separately
    ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
                 type = "split",
                 datasources = connections) # all studies are used

    # Example 2: generate a combined histogram with the default small cells counts
    #            suppression rule
    ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
                 method = 'smallCellsRule',
                 type = 'combine',
                 datasources = connections[1]) # only the first study is used (study1)

    # Example 3: if a variable is of type factor the function returns an error
    ds.histogram(x = 'D$PM_BMI_CATEGORICAL',
                 datasources = connections)

    # Example 4: generate a combined histogram with the deterministic method for k=50
    ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
                 k = 50,
                 method = 'deterministic',
                 type = 'combine',
                 datasources = connections[2]) # only the second study is used (study2)

    # Example 5: create a histogram and the probability density on the plot
    hist <- ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
                         method = 'probabilistic', type = 'combine',
                         num.breaks = 30,
                         vertical.axis = 'Density',
                         datasources = connections)
    lines(hist$mids, hist$density)

    # clear the Datashield R sessions and logout
    datashield.logout(connections)

    ## End(Not run)

[Package *dsBaseClient* version 6.1.1 ]