Basic Statistics#

The basic statistics algorithm computes the following quantitative characteristics of a data set:

  • minimums/maximums

  • sums

  • means

  • sums of squares

  • sums of squared differences from the means

  • second order raw moments

  • variances

  • standard deviations

  • variation coefficients

| Operation | Computational methods | Programming Interface |
|---|---|---|
| Computing | dense | compute(…), compute_input, compute_result |

Mathematical formulation#

Computing#

Given a set \(X\) of \(n\) \(p\)-dimensional feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\), the problem is to compute the following sample characteristics for each feature in the data set:

| Statistic | Definition |
|---|---|
| Minimum | \(\text{min}(j) = \min_i \{x_{ij}\}\) |
| Maximum | \(\text{max}(j) = \max_i \{x_{ij}\}\) |
| Sum | \(s(j) = \sum_i x_{ij}\) |
| Sum of squares | \(s_2(j) = \sum_i x_{ij}^2\) |
| Mean | \(m(j) = \frac {s(j)} {n}\) |
| Second order raw moment | \(a_2(j) = \frac {s_2(j)} {n}\) |
| Sum of squared differences from the mean | \(\text{SDM}(j) = \sum_i (x_{ij} - m(j))^2\) |
| Variance | \(k_2(j) = \frac {\text{SDM}(j) } {n - 1}\) |
| Standard deviation | \(\text{stdev}(j) = \sqrt {k_2(j)}\) |
| Variation coefficient | \(V(j) = \frac {\text{stdev}(j)} {m(j)}\) |
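
The definitions above are not independent. As a short worked step (a standard identity, not something specific to this library), expanding the square in \(\text{SDM}(j)\) and using \(\sum_i x_{ij} = n\, m(j)\) expresses the centered sum through the raw sums:

\[
\text{SDM}(j) = \sum_i x_{ij}^2 - 2\, m(j) \sum_i x_{ij} + n\, m(j)^2 = s_2(j) - n\, m(j)^2,
\qquad
k_2(j) = \frac{n}{n - 1} \left( a_2(j) - m(j)^2 \right).
\]

The second form is algebraically equivalent to the definition of \(k_2(j)\), but in floating-point arithmetic it can lose precision when the mean is large relative to the spread of the data.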

Computation method: dense#

The method computes the basic statistics for each feature in the data set.
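
To make the per-feature definitions concrete, the following is a minimal reference sketch in plain C++. It assumes a dense, row-major \(n \times p\) array and mirrors the formulas from the table directly. The `column_stats` struct and the `basic_statistics_dense` function are hypothetical names introduced for illustration; this is not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for per-feature results; not a oneDAL type.
struct column_stats {
    double min, max, sum, sum_squares, mean;
    double second_order_raw_moment, sum_squares_centered;
    double variance, standard_deviation, variation;
};

// Computes the statistics for every column j of a dense row-major n x p array.
std::vector<column_stats> basic_statistics_dense(const std::vector<double>& x,
                                                 std::size_t n,
                                                 std::size_t p) {
    // The variance divides by n - 1, so at least two rows are required.
    assert(n > 1 && x.size() == n * p);
    std::vector<column_stats> out(p);
    for (std::size_t j = 0; j < p; ++j) {
        column_stats s{}; // all fields start at zero
        s.min = s.max = x[j];
        // First pass: min, max, s(j), s_2(j)
        for (std::size_t i = 0; i < n; ++i) {
            const double v = x[i * p + j];
            s.min = std::min(s.min, v);
            s.max = std::max(s.max, v);
            s.sum += v;
            s.sum_squares += v * v;
        }
        s.mean = s.sum / n;
        s.second_order_raw_moment = s.sum_squares / n;
        // Second pass: SDM(j), the sum of squared differences from the mean
        for (std::size_t i = 0; i < n; ++i) {
            const double d = x[i * p + j] - s.mean;
            s.sum_squares_centered += d * d;
        }
        s.variance = s.sum_squares_centered / (n - 1);
        s.standard_deviation = std::sqrt(s.variance);
        // Not meaningful when the mean is zero; left unguarded in this sketch.
        s.variation = s.standard_deviation / s.mean;
        out[j] = s;
    }
    return out;
}
```

For the three rows (1, 10), (2, 20), (3, 30), the first feature has sum 6, sum of squares 14, mean 2, and variance 1; the second has mean 20 and standard deviation 10. Both features have a variation coefficient of 0.5.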

Programming Interface#

Refer to API Reference: Basic statistics.

Distributed mode#

The algorithm supports distributed execution in SPMD mode (only on GPU).
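
As an illustration of why these statistics suit distributed execution, the sketch below shows one common way to combine partial results computed on two disjoint blocks of rows. It is an assumption for illustration only: the `partial_stats` struct and the `merge_partials` function are hypothetical, and the source does not state how the library combines results across ranks. The centered sum of squares is merged with the standard pairwise update, which adds a correction term for the difference between the two block means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical per-block partial result for one feature; not a oneDAL type.
struct partial_stats {
    std::size_t n; // number of rows in the block
    double min, max, sum, sum_squares, sum_squares_centered;
};

// Combines the partial results of two disjoint, non-empty row blocks.
partial_stats merge_partials(const partial_stats& a, const partial_stats& b) {
    assert(a.n > 0 && b.n > 0);
    partial_stats r{};
    r.n = a.n + b.n;
    r.min = std::min(a.min, b.min);
    r.max = std::max(a.max, b.max);
    r.sum = a.sum + b.sum;
    r.sum_squares = a.sum_squares + b.sum_squares;
    // Difference between the block means
    const double delta = b.sum / b.n - a.sum / a.n;
    // SDM = SDM_a + SDM_b + delta^2 * n_a * n_b / (n_a + n_b)
    r.sum_squares_centered = a.sum_squares_centered + b.sum_squares_centered +
                             delta * delta * (static_cast<double>(a.n) * b.n / r.n);
    return r;
}
```

The mean, second order raw moment, variance, standard deviation, and variation coefficient then follow from the merged `n`, `sum`, `sum_squares`, and `sum_squares_centered` using the definitions in the table. For example, merging the block {1, 2} with the block {3} gives a sum of 6 and a centered sum of squares of 2, the same as computing over {1, 2, 3} directly.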

Usage Example#

Computing#

#include <iostream>

#include "oneapi/dal/algo/basic_statistics.hpp"
#include "oneapi/dal/table/common.hpp"

namespace dal = oneapi::dal;

// Note: streaming a table with operator<< assumes a printing helper such as
// the one shipped with the library's examples is available.
void run_computing(const dal::table& data) {
    // Descriptor with default parameters
    const auto bs_desc = dal::basic_statistics::descriptor{};

    // Compute all statistics for every feature (column) of the table
    const auto result = dal::compute(bs_desc, data);

    std::cout << "Minimum:\n" << result.get_min() << std::endl;
    std::cout << "Maximum:\n" << result.get_max() << std::endl;
    std::cout << "Sum:\n" << result.get_sum() << std::endl;
    std::cout << "Sum of squares:\n" << result.get_sum_squares() << std::endl;
    std::cout << "Sum of squared differences from the mean:\n"
              << result.get_sum_squares_centered() << std::endl;
    std::cout << "Mean:\n" << result.get_mean() << std::endl;
    std::cout << "Second order raw moment:\n"
              << result.get_second_order_raw_moment() << std::endl;
    std::cout << "Variance:\n" << result.get_variance() << std::endl;
    std::cout << "Standard deviation:\n" << result.get_standard_deviation() << std::endl;
    std::cout << "Variation:\n" << result.get_variation() << std::endl;
}

Examples#