k-Nearest Neighbors Classification (k-NN)

\(k\)-NN classification, regression, and search algorithms are based on finding the \(k\) nearest observations to the training set. For classification, the problem is to infer the class of a new feature vector by computing the majority vote of its \(k\) nearest observations from the training set. For regression, the problem is to infer the target value of a new feature vector by computing the average target value of its \(k\) nearest observations from the training set. For search, the problem is to identify the \(k\) nearest observations from the training set to a new feature vector. The nearest observations are computed based on the chosen distance metric.

Operation

Computational methods

Programming Interface

Training

Brute-force

k-d tree

train(…)

train_input

train_result

Inference

Brute-force

k-d tree

infer(…)

infer_input

infer_result

Mathematical formulation

Refer to Developer Guide: k-Nearest Neighbors Classification.

Programming Interface

All types and functions in this section are declared in the oneapi::dal::knn namespace and be available via inclusion of the oneapi/dal/algo/knn.hpp header file.

Enum classes

enum class voting_mode
voting_mode::uniform

Uniform weights for neighbors for prediction voting.

voting_mode::distance

Weight neighbors by the inverse of their distance.

Result options

class result_option_id

Public Methods

constexpr result_option_id() = default
constexpr result_option_id(const result_option_id_base &base)

Descriptor

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default, typename Distance = oneapi::dal::minkowski_distance::descriptor<Float>>
class descriptor
Template Parameters
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

  • Method – Tag-type that specifies an implementation of algorithm. Can be method::brute_force or method::kd_tree.

  • Task – Tag-type that specifies type of the problem to solve. Can be task::classification, task::regression, or task::search.

  • Distance – The descriptor of the distance used for computations. Can be minkowski_distance::descriptor or chebyshev_distance::descriptor.

Constructors

descriptor(std::int64_t class_count, std::int64_t neighbor_count)

Creates a new instance of the class with the given class_count and neighbor_count property values.

template<typename M = Method, typename None = detail::enable_if_brute_force_t<M>>
descriptor(std::int64_t class_count, std::int64_t neighbor_count, const distance_t &distance)

Creates a new instance of the class with the given class_count, neighbor_count and distance property values. Used with method::brute_force only.

template<typename T = Task, typename None = detail::enable_if_not_classification_t<T>>
descriptor(std::int64_t neighbor_count)

Creates a new instance of the class with the given neighbor_count property value. Used with task::search and task::regression only.

template<typename T = Task, typename None = detail::enable_if_not_classification_t<T>>
descriptor(std::int64_t neighbor_count, const distance_t &distance)

Creates a new instance of the class with the given neighbor_count and distance property values. Used with task::search and task::regression only.

Properties

std::int64_t class_count

The number of classes c.

Getter & Setter
std::int64_t get_class_count() const
auto & set_class_count(std::int64_t value)
Invariants
result_option_id result_options

Choose which results should be computed and returned.

Getter & Setter
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
std::int64_t neighbor_count

The number of neighbors k.

Getter & Setter
std::int64_t get_neighbor_count() const
auto & set_neighbor_count(std::int64_t value)
Invariants
voting_mode voting_mode

The voting mode.

Getter & Setter
voting_mode get_voting_mode() const
auto & set_voting_mode(voting_mode value)
const distance_t &distance

Choose distance type for calculations. Used with method::brute_force only.

Getter & Setter
template <typename M = Method, typename None = detail::enable_if_brute_force_t<M>> const distance_t & get_distance() const
template <typename M = Method, typename None = detail::enable_if_brute_force_t<M>> auto & set_distance(const distance_t &dist)

Method tags

struct brute_force

Tag-type that denotes brute-force computational method.

struct kd_tree

Tag-type that denotes k-d tree computational method.

using by_default = brute_force

Alias tag-type for brute-force computational method.

Task tags

struct classification

Tag-type that parameterizes entities used for solving classification problem.

struct regression

Tag-type that parameterizes entities used for solving the regression problem.

struct search

Tag-type that parameterizes entities used for solving the search problem.

using by_default = classification

Alias tag-type for classification task.

Model

template<typename Task = task::by_default>
class model
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification, task::search and task::regression.

Constructors

model()

Creates a new instance of the class with the default property values.

Training train(...)

Input

template<typename Task = task::by_default>
class train_input
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

train_input(const table &data, const table &responses)

Creates a new instance of the class with the given data and responses property values.

train_input(const table &data)

Properties

const table &labels

Vector of labels y for the training set X. Default value: table{}.

Getter & Setter
const table & get_labels() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_labels(const table &value)
const table &data

The training set X. Default value: table{}.

Getter & Setter
const table & get_data() const
auto & set_data(const table &data)
const table &responses

Vector of responses y for the training set X. Default value: table{}.

Getter & Setter
const table & get_responses() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_responses(const table &responses)

Result

template<typename Task = task::by_default>
class train_result
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

train_result()

Creates a new instance of the class with the default property values.

Properties

const model<Task> &model

The trained k-NN model. Default value: model<Task>{}.

Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

Operation

template<typename Descriptor>
knn::train_result train(const Descriptor &desc, const knn::train_input &input)
Parameters
  • desc – k-NN algorithm descriptor knn::descriptor

  • input – Input data for the training operation

Preconditions
input.data.has_data == true
input.labels.has_data == true
input.data.row_count == input.labels.row_count
input.labels.column_count == 1
input.labels[i] >= 0
input.labels[i] < desc.class_count

Inference infer(...)

Input

template<typename Task = task::by_default>
class infer_input
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

infer_input(const table &data, const model<Task> &model)

Creates a new instance of the class with the given model and data property values.

Properties

const model<Task> &model

The trained k-NN model. Default value: model<Task>{}.

Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &m)
const table &data

The dataset for inference \(X'\). Default value: table{}.

Getter & Setter
const table & get_data() const
auto & set_data(const table &data)

Result

template<typename Task = task::by_default>
class infer_result
Template Parameters

Task – Tag-type that specifies type of the problem to solve. Can be task::classification or task::search.

Constructors

infer_result()

Creates a new instance of the class with the default property values.

Properties

const table &distances

Distances to nearest neighbors. Default value: table{}.

Getter & Setter
const table & get_distances() const
auto & set_distances(const table &value)
const table &responses

The predicted responses. Default value: table{}.

Getter & Setter
const table & get_responses() const
template <typename T = Task, typename None = detail::enable_if_not_search_t<T>> auto & set_responses(const table &value)
const table &indices

Indices of nearest neighbors. Default value: table{}.

Getter & Setter
const table & get_indices() const
auto & set_indices(const table &value)
const table &labels

The predicted labels. Default value: table{}.

Getter & Setter
const table & get_labels() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_labels(const table &value)
const result_option_id &result_options

Result options that indicates availability of the properties.

Getter & Setter
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)

Operation

template<typename Descriptor>
knn::infer_result infer(const Descriptor &desc, const knn::infer_input &input)
Parameters
  • desc – k-NN algorithm descriptor knn::descriptor

  • input – Input data for the inference operation

Preconditions
input.data.has_data == true
Postconditions
result.labels.row_count == input.data.row_count
result.labels.column_count == 1
result.labels[i] >= 0
result.labels[i] < desc.class_count

Usage Example

Training

knn::model<> run_training(const table& data,
                        const table& labels) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = train(knn_desc, data, labels);

   return result.get_model();
}

Inference

table run_inference(const knn::model<>& model,
                  const table& new_data) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = infer(knn_desc, model, new_data);

   print_table("labels", result.get_labels());
}

Examples