nearestNeighborScorer

Score events by finding nearest neighbor.

When adding standard score rules, the rules must match exactly, or they don't apply. This operator allows you to expand the scoring capability to events that don't match exactly by using the nearest or closest known score. See interpolateScorer for an alternative approach.

This operator gives the same score as the nearest or closest known score (see https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm for the k = 1 case). A known score is anything that you have scored explicitly using the score rules interface.

For example, assume that you have two known scores. Bytes = 0 has a score of 0.0, and bytes = 500 has a score is 5.0. For a row with bytes = 300, nearestNeighborScorer assigns the value 5.0, because bytes=300 is closer to bytes=500 than it is to byes=0. The operator assigns the same value 5.0 to bytes=600 because 600 is also closest to 500.

Note: This operator does not apply score rules as usual. It uses the score rules for interpolation. If you later provide additional known scores, this operator re-adjusts the scores based on the new information. This operator requires at least two known scores to be able to work properly.

Operator Usage in Easy Mode

  1. Click + on the parent node.
  2. Enter the Nearest Neighbor Scorer operator in the search field and select the operator from the Results to open the operator form.
  3. In the Input Table drop-down, enter or select the name of the table to run this operator on.
  4. In the Columns, click Add More to add the columns used to calculate nearness.
  5. Optional. In the Default Value for Nulls, click Add More to add the default value to replace nulls.
  6. Click Run to view the result.
  7. Click Save to add the operator to the playbook.
  8. Click Cancel to discard the operator form.

Usage Details

nearestNeighborScorer(inputTable, columns)

Input:
inputTable: Table containing the data to run this operator on.
columns: Columns used to calculate nearness.

Output:
A score table where each row is scored based on the nearest example in the given scores.

Input
table = github_logs


Did this page help you?