stats

factory stats

Description:
  • Provides utility functions for statistics.

    • These functions are accessible via the aidesys:: prefix.

Methods

(static) random(whatopt, degreeopt) → {double}

Description:
  • Returns a deterministic pseudo-random number.

    • This function is accessible via the aidesys:: prefix.
Parameters:
Name Type Attributes Default Description
what char <optional>
'i'

Random distribution.

  • 'u', "uniform": drawn from a uniform distribution in [0, 1] (degree is not used).
  • 'n', "normal": drawn from a Gaussian normal distribution (degree is not used).
  • 'g', "gamma": drawn from a Gamma distribution, of a given degree.
degree uint <optional>
1

The Gamma distribution degree, usually 1, 2, 5 or 10.

Returns:

A random number.

Type
double
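
To illustrate what "deterministic pseudo-random" and "Gamma distribution of a given degree" can mean in practice, here is a minimal sketch built on the standard <random> header. It is not the aidesys implementation: the helper names sampleGamma and meanOfGammaDraws, the fixed seed, and the unit scale are assumptions made for this example.

```cpp
#include <random>

// Hypothetical helper (not part of aidesys): draws one value from a Gamma
// distribution of the given degree and unit scale, using the caller's generator.
double sampleGamma(std::mt19937 &generator, unsigned int degree = 1)
{
  std::gamma_distribution<double> gamma(static_cast<double>(degree), 1.0);
  return gamma(generator);
}

// Hypothetical helper: mean of `count` draws, restarting from a fixed seed so
// that two calls with the same arguments return exactly the same value.
double meanOfGammaDraws(unsigned int degree, unsigned int count, unsigned int seed = 0)
{
  std::mt19937 generator(seed);
  double sum = 0;
  for (unsigned int i = 0; i < count; i++)
    sum += sampleGamma(generator, degree);
  return count == 0 ? 0 : sum / count;
}
```

With unit scale, the mean of the draws approaches the degree as the number of draws grows, which gives a quick sanity check of the generator.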

(static) getStat(data, length, hsizeopt, histoopt, maskopt) → {string}

Description:
  • Computes the usual 1D statistics, returned as a parsable weak JSON string.

    0x1 As a function of the sample moments:

    • "count": the number of observed values.
    • "min": the observed minimal value.
    • "max": the observed maximal value.
    • "mean": the mean value.
    • "stdev": the standard deviation (2nd order).
      • Returns an unbiased estimator of the theoretical standard deviation E[(x - m)^2]^(1/2). Any standard deviation lower than DBL_EPSILON (1e-15) is set to 0.
    • "skew": the skewness (3rd order).
      • Returns an estimator of the theoretical skewness E[(x - m)^3 / stdev^3]. Skewness is zero for symmetric distributions.
    • "kurt": the kurtosis (4th order).
      • Returns an estimator of the theoretical kurtosis E[(x - m)^4 / stdev^4] - 3. Kurtosis is equal to 0 for the Gaussian distribution.
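
The moment-based values listed above can be sketched as follows. This is an illustration only, not the library's code: the struct Moments and the function computeMoments are hypothetical names, the standard deviation uses the n - 1 denominator, and the skewness and kurtosis use simple plug-in estimators, which may differ from the estimators used by getStat.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the moment-based statistics.
struct Moments {
  std::size_t count;
  double mean, stdev, skew, kurt;
};

// Computes the count, mean, standard deviation, skewness and excess kurtosis.
Moments computeMoments(const std::vector<double> &data)
{
  Moments m = {data.size(), 0, 0, 0, 0};
  if (m.count == 0)
    return m;
  for (double x : data)
    m.mean += x;
  m.mean /= m.count;
  // Sums of the 2nd, 3rd and 4th powers of the centered values.
  double m2 = 0, m3 = 0, m4 = 0;
  for (double x : data) {
    double d = x - m.mean;
    m2 += d * d;
    m3 += d * d * d;
    m4 += d * d * d * d;
  }
  // The n - 1 denominator gives the unbiased variance estimator.
  if (m.count > 1)
    m.stdev = std::sqrt(m2 / (m.count - 1));
  if (m.stdev > 0) {
    double s = m.stdev;
    m.skew = (m3 / m.count) / (s * s * s);
    m.kurt = (m4 / m.count) / (s * s * s * s) - 3; // 0 for a Gaussian
  }
  return m;
}
```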

    0x2 As a function of the sample moments, with respect to usual distributions:

    • "gamma-degree": the degree of a Gamma distribution, with the same mean and variance as the empirical distribution.
      • Returns the degree d >= 1 or 0 if undefined, assuming that the probability distribution has the form
        p(t) = (t / tau)^(d - 1) exp(-t / tau) / (tau (d - 1)!), e.g., d = 1 gives an exponential distribution, i.e., the inter-event times of a Poisson process.
    • "gamma-rate": the rate of a Gamma distribution, with the same mean and variance as the empirical distribution.
      • Returns the rate tau >= 1 or 0 if undefined, assuming a Gamma distribution as above.
    • "uniform-entropy": the entropy of a uniform distribution with the same mean and variance.
    • "normal-entropy": the entropy of a Gaussian distribution with the same mean and variance.
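
As a sketch of how such values can be derived from the mean and standard deviation alone: a Gamma distribution of degree d and parameter tau has mean d tau and variance d tau^2, a uniform distribution of width w has variance w^2 / 12, and a Gaussian of variance s^2 has entropy log_2(2 pi e s^2) / 2. The names GammaFit, fitGamma, uniformEntropy and normalEntropy are hypothetical. The sketch returns raw real-valued estimates in bits and assumes stdev > 0 for the entropies; any rounding or bounding applied by getStat is not reproduced.

```cpp
#include <cmath>

// Hypothetical container for the moment-matched Gamma parameters.
struct GammaFit {
  double degree; // d = mean^2 / variance
  double rate;   // tau = variance / mean
};

// Matches the mean and variance of a Gamma distribution; returns 0, 0 if undefined.
GammaFit fitGamma(double mean, double stdev)
{
  GammaFit fit = {0, 0};
  if (mean > 0 && stdev > 0) {
    double variance = stdev * stdev;
    fit.degree = mean * mean / variance;
    fit.rate = variance / mean;
  }
  return fit;
}

// Entropy, in bits, of a uniform distribution with the given standard deviation.
double uniformEntropy(double stdev)
{
  return std::log2(std::sqrt(12.0) * stdev);
}

// Entropy, in bits, of a Gaussian distribution with the given standard deviation.
double normalEntropy(double stdev)
{
  const double pi = std::acos(-1.0);
  return 0.5 * std::log2(2 * pi * std::exp(1.0) * stdev * stdev);
}
```

For a given variance the Gaussian entropy is always the larger of the two, since the Gaussian maximizes entropy at fixed variance.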

    0X4 As a function of the histogram:

    • "hsize": the histogram sampling size.
    • "hsize_from_standard_deviation": the histogram sampling size, according to (Scott 1979).
    • "hsize_from_inter_quartile_range": the histogram sampling size, according to (Izenman 1991).
    • "mode": the mode value, i.e. the most frequent value, with 2nd order interpolation.
    • "median", "lower-quartile", "upper-quartile", "inter-quartile-range": The median corresponds to the percentile at 50%, the quartiles to the percentiles at 25% and 75%. Values are calculated from the histogram with 1st order interpolation.
    • "median", "lower-quartile", "upper-quartile", "inter-quartile-range": Here, values are calculated from the sorted data.
    • "entropy": The distribution entropy, in bits.
      • We use the approximation:
        -sum_i p_i log_2(p_i) + log_2(epsilon),
        where p_i is the probability of the i-th box of size epsilon.
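
The entropy approximation above can be sketched directly from a histogram of counts. The function name histogramEntropy is hypothetical, and empty boxes are skipped using the usual convention 0 log 0 = 0.

```cpp
#include <cmath>
#include <vector>

// Entropy in bits: -sum_i p_i log_2(p_i) + log_2(epsilon),
// where p_i is the relative count of the i-th box of size epsilon.
double histogramEntropy(const std::vector<unsigned int> &histo, double epsilon)
{
  double total = 0;
  for (unsigned int c : histo)
    total += c;
  if (total == 0 || epsilon <= 0)
    return 0;
  double entropy = std::log2(epsilon);
  for (unsigned int c : histo)
    if (c > 0) { // an empty box contributes nothing
      double p = c / total;
      entropy -= p * std::log2(p);
    }
  return entropy;
}
```

For instance, four equally filled boxes of size 0.25 cover [0, 1] uniformly and give an entropy of 0 bits, the entropy of the uniform density on [0, 1].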

    0X8 As a function of the histogram, with respect to usual distributions:

    • "uniform-divergence": The Kullback-Leibler divergence between the empirical distribution and a uniform distribution having the same mean and variance.
    • "normal-divergence": The Kullback-Leibler divergence between the empirical distribution and a Gaussian distribution having the same mean and variance.
    • "gamma-1-divergence": The Kullback-Leibler divergence between the empirical distribution and a Gamma distribution of degree 1 having the same mean and variance.
    • "gamma-2-divergence": The Kullback-Leibler divergence between the empirical distribution and a Gamma distribution of degree 2 having the same mean and variance.
    • "gamma-5-divergence": The Kullback-Leibler divergence between the empirical distribution and a Gamma distribution of degree 5 having the same mean and variance.
    • "gamma-10-divergence": The Kullback-Leibler divergence between the empirical distribution and a Gamma distribution of degree 10 having the same mean and variance.
    • "best-model": The best model, using the following numerical code: 0: "normal"; 1, 2, 5, 10: "gamma-1", "gamma-2", "gamma-5", "gamma-10"; -1: "uniform" as a fallback option.
Parameters:
Name Type Attributes Default Description
data Array | Vector | function

The 1D data.

  • It can be provided as a double[] array or a std::vector<double> buffer.
  • It can also be given as a double data() random function or as a lambda:
    aidesys::getStat([] () { return aidesys::random('g', 10); }, 1000);
length uint

The data length, if given as an array or density function.

hsize uint <optional>
0

An optional histogram size to compute histogram-based values between min and max. If not specified but the mask includes estimations based on a histogram, an optimal size is calculated as the maximum of the (Scott 1979) and (Izenman 1991) estimations, with a fallback to 100 if the estimations fail.
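
The size selection described above can be sketched as follows, assuming the commonly cited forms of the two rules: a bin width of 3.49 stdev n^(-1/3) for the standard-deviation rule and 2 IQR n^(-1/3) for the inter-quartile-range rule. These constants, and the names binCount and chooseHistogramSize, are assumptions of this sketch and may not match the library's implementation.

```cpp
#include <algorithm>
#include <cmath>

// Number of bins of the given width needed to cover the range, or 0 if undefined.
unsigned int binCount(double range, double width)
{
  if (!(range > 0) || !(width > 0))
    return 0;
  return static_cast<unsigned int>(std::ceil(range / width));
}

// Maximum of the two estimations, with a fallback to 100 if both fail.
unsigned int chooseHistogramSize(unsigned int count, double min, double max,
                                 double stdev, double iqr)
{
  if (count == 0)
    return 100;
  double scale = std::cbrt(static_cast<double>(count)); // n^(1/3)
  unsigned int fromStdev = binCount(max - min, 3.49 * stdev / scale);
  unsigned int fromIqr = binCount(max - min, 2.0 * iqr / scale);
  unsigned int hsize = std::max(fromStdev, fromIqr);
  return hsize > 0 ? hsize : 100;
}
```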

histo Array <optional>

An optional unsigned int[] histogram buffer to return histogram values between min and max.

mask uint <optional>
0xF

An optional bit mask selecting the calculations to perform; clear a bit to deselect the corresponding group:

  • 0x1 As a function of the sample moments.
  • 0x2 As a function of the sample moments, with respect to usual distributions.
  • 0x4 As a function of the histogram.
  • 0x8 As a function of the histogram, with respect to usual distributions.
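
Since the mask combines four independent bits, groups can be switched off with ordinary bitwise operations. The enumerator names and the helpers isSelected and deselect below are hypothetical, introduced only to make the bit values readable; the library itself documents the raw values.

```cpp
// Hypothetical names for the four documented mask bits.
enum StatMask : unsigned int {
  STAT_MOMENTS = 0x1,          // sample moments
  STAT_MOMENTS_MODELS = 0x2,   // moments, with respect to usual distributions
  STAT_HISTOGRAM = 0x4,        // histogram-based values
  STAT_HISTOGRAM_MODELS = 0x8, // histogram, with respect to usual distributions
  STAT_ALL = 0xF               // default: everything selected
};

// True if at least one bit of the group is set in the mask.
bool isSelected(unsigned int mask, unsigned int group)
{
  return (mask & group) != 0;
}

// Returns the mask with the bits of the given group cleared.
unsigned int deselect(unsigned int mask, unsigned int group)
{
  return mask & ~group;
}
```

For example, deselect(STAT_ALL, STAT_HISTOGRAM_MODELS) yields 0x7, which keeps everything except the divergence computations.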
Returns:

The 1D statistics parameters returned as a parsable JSON string.

  • The following construct extracts a given parameter:
    getStatValue(name, getStat(...))
Type
string

(static) getStatValue(name, stat) → {double}

Description:
  • Extracts a getStat value from the JSON string.

Parameters:
Name Type Description
name string

The parameter name.

stat string

The string returned by getStat.

Returns:

The related parameter value, or NAN if not a number.

Type
double

(static) getDivergence(data1, data2, hsize)

Description:
  • Calculates the Kullback-Leibler divergence between an empirical distribution and another probability density.

    • The Kullback-Leibler divergence d(p||q) = sum_w p(w) ln(p(w) / q(w)), where p is the empirical distribution and q the model distribution, corresponds to the average number of extra bits needed when coding p with q.
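
A minimal sketch of this divergence for two histograms with the same number of boxes is given below. It uses log_2 so that the result is in bits, as stated for the return value; with ln the same sum would be in nats. The function name klDivergenceBits and the handling of empty boxes (infinite divergence when q is empty where p is not) are assumptions of this sketch, not the documented behavior of getDivergence.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Kullback-Leibler divergence d(p||q), in bits, between two count histograms.
double klDivergenceBits(const std::vector<unsigned int> &p,
                        const std::vector<unsigned int> &q)
{
  const double nan = std::numeric_limits<double>::quiet_NaN();
  if (p.size() != q.size())
    return nan;
  double totalP = 0, totalQ = 0;
  for (std::size_t i = 0; i < p.size(); i++) {
    totalP += p[i];
    totalQ += q[i];
  }
  if (totalP == 0 || totalQ == 0)
    return nan;
  double divergence = 0;
  for (std::size_t i = 0; i < p.size(); i++) {
    if (p[i] == 0)
      continue; // 0 log 0 = 0 by convention
    if (q[i] == 0)
      return std::numeric_limits<double>::infinity();
    double pi = p[i] / totalP, qi = q[i] / totalQ;
    divergence += pi * std::log2(pi / qi);
  }
  return divergence;
}
```

The divergence is 0 for identical histograms and is not symmetric in p and q.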
Parameters:
Name Type Description
data1 Histogram | Array | Vector | function

The 1D empirical data.

  • It can be provided as an unsigned int[] array corresponding to the data histogram.
  • It can also be provided as a double[] array, a std::vector<double> buffer, or a double p() random function or lambda.
data2 Histogram | Array | Vector | function

The 1D model distribution data.

  • It can be provided as an unsigned int[] array corresponding to the data histogram.
  • It can also be provided as a double[] array, a std::vector<double> buffer, or a double p() random function or lambda.
length uint

The data length, if given as arrays or density functions.

hsize uint

The histogram size.

Returns:

The Kullback-Leibler divergence in bits.

(static) plotHistogram(file, data, length, hsizeopt, model)

Description:
  • Plots a histogram, as exemplified here:

Parameters:
Name Type Attributes Default Description
file string

The plot base name; the output is stored in $file(.png|.dat|.gnuplot.sh), as documented for gnuplot.

data Array | Vector | function | string

The 1D data.

  • It can be provided as a double[] array or a std::vector<double> buffer.
  • It can also be given as a double data() random function or as a lambda.
length uint

The data length, if given as an array or density function.

hsize uint <optional>
0

The histogram size, automatically adjusted if 0.

model bool

If true, also draws the best model adjustment.

(static) plotStatCurve(file, A, x0, x1)

Description:
  • Plots means and standard deviations as a curve, as exemplified here:

Parameters:
Name Type Description
file string

The plot base name; the output is stored in $file(.png|.dat|.gnuplot.sh), as documented for gnuplot.

A Array

std::vector<std::string> of getStat strings.

x0 double

The abscissa of histograms[0].

x1 double

The abscissa of histograms[histograms.size()-1].

(static) plotStatBoxes(file, An)

Description:
  • Plots means and standard deviations as statistical boxes, as exemplified here:

Parameters:
Name Type Description
file string

The plot base name; the output is stored in $file(.png|.dat|.gnuplot.sh), as documented for gnuplot.

An Array

std::map<std::string, std::string> of named getStat strings, i.e., getStat strings indexed by names.