formClusters

Group events based on similar text

Group events based on similar content in text fields.

Suppose the input includes millions of logs and you want to sort the events into a few groups. The group by operator in LQL groups events only when the text or values match exactly. With that operator, 'system was shutdown' and 'system shut down' would be in different groups because the text match is not exact. With the formClusters operator, both events would be placed in the same group.
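The difference between exact-match grouping and similarity-based grouping can be sketched in Python. This is an illustration only: it uses the standard library's difflib as a stand-in similarity measure, and the source does not say which algorithm formClusters actually uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher

events = ["system was shutdown", "system shut down"]

# Exact-match grouping: every distinct string becomes its own group.
exact_groups = defaultdict(list)
for text in events:
    exact_groups[text].append(text)

# Similarity-based comparison (stand-in measure, not LogicHub's algorithm):
# a ratio close to 1.0 means the two strings are nearly the same.
similarity = SequenceMatcher(None, events[0], events[1]).ratio()

print("exact-match groups:", len(exact_groups))
print("similarity ratio:", round(similarity, 2))
```

Exact matching keeps the two messages apart, while a similarity score gives a clustering step grounds to put them together.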

Operator Usage in Easy Mode

  1. Click + on the parent node.
  2. Enter Form Clusters operator in the search field and select the operator from the Results to open the operator form.
  3. In the Table field, enter or select the table in which to create clusters.
  4. In the Fields for Clustering field, enter the list of column names to apply the clustering algorithm on.
  5. In the Number of Clusters field, enter the number of clusters that you want to create.
  6. Click Run to view the result.
  7. Click Save to add the operator to the playbook.
  8. Click Cancel to discard the operator form.

Usage Details

formClusters(table, fieldsForClustering, numberOfClusters, fieldsForGrouping)

Input:

This is a clustering operator. It takes:
table - the table to create clusters in
fieldsForClustering (String[]) - the fields to apply clustering on
numberOfClusters - the number of clusters to create
fieldsForGrouping (String*) - optional grouping fields; rows with the same fieldsForGrouping values are placed in only one cluster.
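The fieldsForGrouping guarantee can be expressed as a check on the operator's output. The Python sketch below is a hypothetical helper, not part of LogicHub: the function name, the "host" field, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

def grouping_is_respected(rows, fields_for_grouping, cluster_field="lhub_cluster_id"):
    """Return True if rows sharing the same grouping values share one cluster ID."""
    clusters_by_key = defaultdict(set)
    for row in rows:
        key = tuple(row[f] for f in fields_for_grouping)
        clusters_by_key[key].add(row[cluster_field])
    # Every distinct grouping key must map to exactly one cluster ID.
    return all(len(ids) == 1 for ids in clusters_by_key.values())

# Hypothetical output rows: "host" plays the role of fieldsForGrouping.
ok_rows = [
    {"host": "a", "lhub_cluster_id": "cluster_0"},
    {"host": "a", "lhub_cluster_id": "cluster_0"},
    {"host": "b", "lhub_cluster_id": "cluster_1"},
]
bad_rows = [
    {"host": "a", "lhub_cluster_id": "cluster_0"},
    {"host": "a", "lhub_cluster_id": "cluster_1"},
]

print(grouping_is_respected(ok_rows, ["host"]))
print(grouping_is_respected(bad_rows, ["host"]))
```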

Output:

lhub_cluster_id: Group ID. This operator splits the data into multiple groups and assigns an ID to each group. The IDs take the form "cluster_N"; the example below uses "cluster_0" and "cluster_1".

Example

Input
table = github_logs

id  address
1   154 E. Dana st, Mountain View, CA, USA
2   154 Data, Mountain View, CA, 94041
3   Somewhere in Houston, TX
4   14054 San ramon st, Houston, TX, 77077

formClusters(table, ["address"], 2)

Output

id  address                                 lhub_cluster_id
1   154 E. Dana st, Mountain View, CA, USA  cluster_1
2   154 Data, Mountain View, CA, 94041      cluster_1
3   Somewhere in Houston, TX                cluster_0
4   14054 San ramon st, Houston, TX, 77077  cluster_0
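To get a feel for how text similarity can produce this split, here is a minimal Python sketch that clusters the four example addresses. It assumes word-overlap (Jaccard) similarity and a simple merge-the-closest-pair loop; both are stand-ins chosen for illustration, and form_clusters_sketch is a hypothetical name, not the operator's actual implementation.

```python
import re
from itertools import combinations

def tokens(text):
    # Lowercased alphanumeric words of the text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def form_clusters_sketch(rows, field, number_of_clusters):
    """Merge the two most similar clusters until number_of_clusters remain."""
    token_sets = [tokens(row[field]) for row in rows]
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Similarity of the closest pair of rows across the two clusters.
        return max(jaccard(token_sets[i], token_sets[j]) for i in c1 for j in c2)

    while len(clusters) > number_of_clusters:
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected

    labels = {i: f"cluster_{n}" for n, members in enumerate(clusters) for i in members}
    return [dict(row, lhub_cluster_id=labels[i]) for i, row in enumerate(rows)]

rows = [
    {"id": 1, "address": "154 E. Dana st, Mountain View, CA, USA"},
    {"id": 2, "address": "154 Data, Mountain View, CA, 94041"},
    {"id": 3, "address": "Somewhere in Houston, TX"},
    {"id": 4, "address": "14054 San ramon st, Houston, TX, 77077"},
]

for row in form_clusters_sketch(rows, "address", 2):
    print(row["id"], row["lhub_cluster_id"])
```

In this sketch the two Mountain View rows share one cluster and the two Houston rows share the other. The label numbering is arbitrary and need not match the IDs the operator assigns.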

© 2017-2021 LogicHub®. All Rights Reserved.