# Basic Statistics

The basic statistics algorithm computes the following quantitative characteristics of a dataset:

• minimums/maximums

• sums

• means

• sums of squares

• sums of squared differences from the means

• second order raw moments

• variances

• standard deviations

• variations

| Operation | Computational methods | Programming Interface | | |
|---|---|---|---|---|
| Computing | dense | compute(…) | compute_input | compute_result |
| Partial Computing | dense | partial_compute(…) | partial_compute_input | partial_compute_result |
| Finalize Computing | dense | finalize_compute(…) | partial_compute_result | compute_result |

## Mathematical formulation

### Computing

Given a set $$X$$ of $$n$$ $$p$$-dimensional feature vectors $$x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$$, the problem is to compute the following sample characteristics for each feature in the data set:

| Statistic | Definition |
|---|---|
| Minimum | $$min(j) = \smash{\displaystyle \min_i } \{x_{ij}\}$$ |
| Maximum | $$max(j) = \smash{\displaystyle \max_i } \{x_{ij}\}$$ |
| Sum | $$s(j) = \sum_i x_{ij}$$ |
| Sum of squares | $$s_2(j) = \sum_i x_{ij}^2$$ |
| Means | $$m(j) = \frac {s(j)} {n}$$ |
| Second order raw moment | $$a_2(j) = \frac {s_2(j)} {n}$$ |
| Sum of squared difference from the means | $$\text{SDM}(j) = \sum_i (x_{ij} - m(j))^2$$ |
| Variance | $$k_2(j) = \frac {\text{SDM}(j) } {n - 1}$$ |
| Standard deviation | $$\text{stdev}(j) = \sqrt {k_2(j)}$$ |
| Variation coefficient | $$V(j) = \frac {\text{stdev}(j)} {m(j)}$$ |
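The definitions above can be illustrated with a minimal standalone C++ sketch. It does not use oneDAL and is not how the library implements the dense method; the `feature_stats` struct and `compute_feature_stats` function are hypothetical names introduced for illustration. The sketch assumes row-major data, $$n \ge 2$$, and a non-zero mean for the selected feature.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the statistics of one feature j.
struct feature_stats {
    double min, max, sum, sum_squares, mean;
    double second_order_raw_moment, sum_squares_centered;
    double variance, standard_deviation, variation;
};

// Computes the statistics of feature j for row-major data with n rows and p columns.
// Assumes n >= 2 and a non-zero mean; no input validation is performed.
feature_stats compute_feature_stats(const std::vector<double>& x,
                                    std::size_t n,
                                    std::size_t p,
                                    std::size_t j) {
    feature_stats r{};
    r.min = r.max = x[j];

    // First pass: min(j), max(j), s(j), s_2(j).
    for (std::size_t i = 0; i < n; ++i) {
        const double v = x[i * p + j];
        r.min = std::min(r.min, v);
        r.max = std::max(r.max, v);
        r.sum += v;
        r.sum_squares += v * v;
    }
    r.mean = r.sum / n;                           // m(j)
    r.second_order_raw_moment = r.sum_squares / n; // a_2(j)

    // Second pass: SDM(j) taken directly from its definition.
    for (std::size_t i = 0; i < n; ++i) {
        const double d = x[i * p + j] - r.mean;
        r.sum_squares_centered += d * d;
    }
    r.variance = r.sum_squares_centered / (n - 1);   // k_2(j)
    r.standard_deviation = std::sqrt(r.variance);    // stdev(j)
    r.variation = r.standard_deviation / r.mean;     // V(j)
    return r;
}
```

For the three observations $$(1, 10)$$, $$(2, 20)$$, $$(3, 30)$$, calling `compute_feature_stats(x, 3, 2, 0)` gives a sum of 6, a mean of 2, an SDM of 2, a variance of 1, and a variation coefficient of 0.5.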

### Partial Computing

Given a block of a dataset $$X = \{ x_1, \ldots, x_n \}$$ with $$n$$ feature vectors of dimension $$p$$, each partial statistic is a $$1 \times p$$ vector with one value per feature. The partial statistics are computed with the following formulas:

| Statistic | Definition |
|---|---|
| Partial Minimum | $$min(j) = \smash{\displaystyle \min_i } \{x_{ij}\}$$ |
| Partial Maximum | $$max(j) = \smash{\displaystyle \max_i } \{x_{ij}\}$$ |
| Partial Sum | $$s(j) = \sum_i x_{ij}$$ |
| Partial Sum of squares | $$s_2(j) = \sum_i x_{ij}^2$$ |
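A minimal standalone sketch of how such partial statistics can be accumulated block by block for a single feature is shown below. It is an illustration only: `partial_stats`, `update`, and `merge` are hypothetical names, not the oneDAL `partial_compute_result` type or the `partial_compute(…)` function.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical accumulator for one feature; the initial values are the
// identities of min, max, and sum, so an empty accumulator is neutral.
struct partial_stats {
    std::size_t n = 0;
    double min = std::numeric_limits<double>::infinity();
    double max = -std::numeric_limits<double>::infinity();
    double sum = 0.0;
    double sum_squares = 0.0;
};

// Folds one block of observations of a single feature into the accumulator.
partial_stats update(partial_stats acc, const std::vector<double>& block) {
    for (double v : block) {
        acc.min = std::min(acc.min, v);
        acc.max = std::max(acc.max, v);
        acc.sum += v;
        acc.sum_squares += v * v;
    }
    acc.n += block.size();
    return acc;
}

// Combines two accumulators built from disjoint blocks.
partial_stats merge(const partial_stats& a, const partial_stats& b) {
    partial_stats r;
    r.n = a.n + b.n;
    r.min = std::min(a.min, b.min);
    r.max = std::max(a.max, b.max);
    r.sum = a.sum + b.sum;
    r.sum_squares = a.sum_squares + b.sum_squares;
    return r;
}
```

Because all four quantities are associative reductions, feeding the blocks `{1, 2}` and `{3, 4, 5}` through `update` one after the other yields the same accumulator as processing each block separately and calling `merge`.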

### Finalize Computing

Given a partial result accumulated over all processed blocks, each final statistic is a $$1 \times p$$ vector with one value per feature. The final statistics are computed with the following formulas:

| Statistic | Definition |
|---|---|
| Finalize Minimum | $$min(j) = \smash{\displaystyle \min_i } \{x_{ij}\}$$ |
| Finalize Maximum | $$max(j) = \smash{\displaystyle \max_i } \{x_{ij}\}$$ |
| Finalize Sum | $$s(j) = \sum_i x_{ij}$$ |
| Finalize Sum of squares | $$s_2(j) = \sum_i x_{ij}^2$$ |
| Finalize Means | $$m(j) = \frac {s(j)} {n}$$ |
| Finalize Second order raw moment | $$a_2(j) = \frac {s_2(j)} {n}$$ |
| Finalize Sum of squared difference from the means | $$\text{SDM}(j) = \sum_i (x_{ij} - m(j))^2$$ |
| Finalize Variance | $$k_2(j) = \frac {\text{SDM}(j) } {n - 1}$$ |
| Finalize Standard deviation | $$\text{stdev}(j) = \sqrt {k_2(j)}$$ |
| Finalize Variation coefficient | $$V(j) = \frac {\text{stdev}(j)} {m(j)}$$ |
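At finalization the individual observations are no longer available, so one way to obtain $$\text{SDM}(j)$$ from accumulated sums alone is the algebraic identity $$\sum_i (x_{ij} - m(j))^2 = s_2(j) - \frac {s(j)^2} {n}$$. The sketch below applies it for a single feature. It is a standalone illustration under that assumption, not a description of the oneDAL internals; `final_stats` and `finalize` are hypothetical names, and the sketch assumes $$n \ge 2$$ and a non-zero mean.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical container for the statistics derived at finalization.
// Minimum, maximum, sum, and sum of squares pass through unchanged.
struct final_stats {
    double mean;
    double second_order_raw_moment;
    double sum_squares_centered;
    double variance;
    double standard_deviation;
    double variation;
};

// Derives the remaining statistics of one feature from the accumulated
// row count n, sum s(j), and sum of squares s_2(j).
final_stats finalize(std::size_t n, double sum, double sum_squares) {
    final_stats r{};
    r.mean = sum / n;                          // m(j)
    r.second_order_raw_moment = sum_squares / n; // a_2(j)
    // SDM(j) = s_2(j) - s(j)^2 / n; this form can lose precision through
    // cancellation when the mean is large relative to the spread.
    r.sum_squares_centered = sum_squares - sum * sum / n;
    r.variance = r.sum_squares_centered / (n - 1); // k_2(j)
    r.standard_deviation = std::sqrt(r.variance);  // stdev(j)
    r.variation = r.standard_deviation / r.mean;   // V(j)
    return r;
}
```

For the five observations 1, 2, 3, 4, 5 the accumulated values are $$n = 5$$, $$s = 15$$, and $$s_2 = 55$$, so `finalize(5, 15.0, 55.0)` gives a mean of 3, an SDM of 10, and a variance of 2.5.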

### Computation method: dense

The method computes the basic statistics for each feature in the data set.

## Programming Interface

Refer to API Reference: Basic statistics.

## Online mode

The algorithm supports online mode.

## Distributed mode

The algorithm supports distributed execution in SPMD mode (only on GPU).

## Usage Example

### Computing

    #include <iostream>

    #include "oneapi/dal/algo/basic_statistics.hpp"

    namespace dal = oneapi::dal;

    // Printing a table with `<<` assumes that a helper `operator<<` for
    // `dal::table` is available, as in the oneDAL example utilities.
    void run_computing(const dal::table& data) {
        // Create a descriptor with default parameters.
        const auto bs_desc = dal::basic_statistics::descriptor{};

        // Compute the statistics for every feature (column) of `data`.
        const auto result = dal::compute(bs_desc, data);

        // Each getter returns a table with one value per feature.
        std::cout << "Minimum:\n" << result.get_min() << std::endl;
        std::cout << "Maximum:\n" << result.get_max() << std::endl;
        std::cout << "Sum:\n" << result.get_sum() << std::endl;
        std::cout << "Sum of squares:\n" << result.get_sum_squares() << std::endl;
        std::cout << "Sum of squared difference from the means:\n"
                  << result.get_sum_squares_centered() << std::endl;
        std::cout << "Mean:\n" << result.get_mean() << std::endl;
        std::cout << "Second order raw moment:\n"
                  << result.get_second_order_raw_moment() << std::endl;
        std::cout << "Variance:\n" << result.get_variance() << std::endl;
        std::cout << "Standard deviation:\n" << result.get_standard_deviation() << std::endl;
        std::cout << "Variation:\n" << result.get_variation() << std::endl;
    }


## Examples

Batch Processing:

Online Processing: