Group events based on similar content in text fields.
Suppose the input includes millions of logs and you want to group the events into a few groups. The group by operator in LQL groups events only on an exact match of text or values. With that operator, 'system was shutdown' and 'system shut down' would be in different groups because the text match is not exact. With the formClusters operator, both events would be placed in the same group.
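The difference between exact-match grouping and similarity-based grouping can be sketched outside of LQL. The Python below is only an illustration: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the helper names `group_exact` and `group_similar` and the 0.8 threshold are assumptions made for this sketch. It is not the algorithm that formClusters uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def group_exact(messages):
    """Group messages by exact text match, like an exact-match group by."""
    groups = defaultdict(list)
    for msg in messages:
        groups[msg].append(msg)
    return list(groups.values())


def group_similar(messages, threshold=0.8):
    """Greedy grouping: add a message to the first group whose first
    member is at least `threshold` similar, else start a new group."""
    groups = []
    for msg in messages:
        for group in groups:
            if SequenceMatcher(None, msg, group[0]).ratio() >= threshold:
                group.append(msg)
                break
        else:
            groups.append([msg])
    return groups


messages = ["system was shutdown", "system shut down", "disk quota exceeded"]
exact_groups = group_exact(messages)      # three groups, one per distinct string
similar_groups = group_similar(messages)  # the two shutdown messages share a group
```

Exact matching keeps the two shutdown messages apart, while the similarity-based version puts them together, which is the behavior the formClusters operator is meant to provide.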
- Click + on the parent node.
- Enter Form Clusters operator in the search field and select the operator from the Results to open the operator form.
- In the Table field, enter or select the table in which to create clusters.
- In the Fields for Clustering field, enter the list of column names to apply the clustering algorithm on.
- In the Number of Clusters field, enter the number of clusters to create.
- Click Run to view the result.
- Click Save to add the operator to the playbook.
- Click Cancel to discard the operator form.
formClusters(table, fieldsForClustering, numberOfClusters, fieldsForGrouping)
This is a clustering operator. It takes:
table - the table to create clusters in
fieldsForClustering (String) - the list of column names to apply the clustering algorithm on
numberOfClusters - the number of clusters to create
fieldsForGrouping (String*) - optional list of columns for grouping; rows that share the same fieldsForGrouping values are all placed in a single cluster.
lhub_cluster_id: Group ID. The operator splits the data into multiple groups and assigns a numbered ID to each group, where the assigned group IDs are "cluster_0", "cluster_1", and so on.
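The fieldsForGrouping guarantee can be expressed as a simple check on the operator's output: every distinct combination of grouping values should map to exactly one lhub_cluster_id. The Python sketch below checks that property on rows represented as dictionaries. The function name `grouping_respected`, the `host` column, and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def grouping_respected(rows, fields_for_grouping, cluster_field="lhub_cluster_id"):
    """Return True if rows sharing the same grouping values share one cluster ID."""
    clusters_per_key = defaultdict(set)
    for row in rows:
        key = tuple(row[f] for f in fields_for_grouping)
        clusters_per_key[key].add(row[cluster_field])
    return all(len(ids) == 1 for ids in clusters_per_key.values())


# Hypothetical output rows; "host" is an invented grouping column.
ok_rows = [
    {"host": "web-1", "message": "system was shutdown", "lhub_cluster_id": "cluster_0"},
    {"host": "web-1", "message": "system shut down", "lhub_cluster_id": "cluster_0"},
    {"host": "db-1", "message": "disk quota exceeded", "lhub_cluster_id": "cluster_1"},
]
# Adding a "db-1" row in a different cluster violates the guarantee.
bad_rows = ok_rows + [
    {"host": "db-1", "message": "system shut down", "lhub_cluster_id": "cluster_0"},
]

ok_result = grouping_respected(ok_rows, ["host"])
bad_result = grouping_respected(bad_rows, ["host"])
```

Here `ok_result` is true because each host appears in one cluster only, and `bad_result` is false because `db-1` is split across two clusters.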
table = github_logs
| | address |
|---|---|
| 1 | 154 E. Dana st, Mountain View, CA, USA |
| 2 | 154 Data, Mountain View, CA, 94041 |
| 3 | Somewhere in Houston, TX |
| 4 | 14054 San ramon st, Houston, TX, 77077 |
formClusters(table, ["address"], 2)
| | address | lhub_cluster_id |
|---|---|---|
| 1 | 154 E. Dana st, Mountain View, CA, USA | cluster_1 |
| 2 | 154 Data, Mountain View, CA, 94041 | cluster_1 |
| 3 | Somewhere in Houston, TX | cluster_0 |
| 4 | 14054 San ramon st, Houston, TX, 77077 | cluster_0 |
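To build intuition for why the two Mountain View addresses and the two Houston addresses end up together, the example can be approximated in Python. This is a minimal sketch under stated assumptions: it tokenizes each address, measures Jaccard similarity between token sets, and merges the most similar clusters (average linkage) until the requested number remains. The function `form_clusters_sketch` and its helpers are hypothetical, and the actual algorithm behind formClusters is not documented here and may differ. The cluster labels it assigns are arbitrary, so only the grouping should be compared with the result above, not which group is called cluster_0.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase alphanumeric tokens of a text field."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def form_clusters_sketch(rows, field, number_of_clusters):
    """Agglomerative clustering on one text field (illustrative only)."""
    if number_of_clusters < 1:
        raise ValueError("number_of_clusters must be at least 1")
    token_sets = [tokens(row[field]) for row in rows]
    clusters = [[i] for i in range(len(rows))]  # start with one cluster per row

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        sims = [jaccard(token_sets[i], token_sets[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > number_of_clusters:
        # Merge the most similar pair; x < y, so popping y keeps x valid.
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(y)
        clusters[x].extend(merged)

    labels = {i: f"cluster_{n}" for n, members in enumerate(clusters) for i in members}
    return [dict(row, lhub_cluster_id=labels[i]) for i, row in enumerate(rows)]


rows = [
    {"address": "154 E. Dana st, Mountain View, CA, USA"},
    {"address": "154 Data, Mountain View, CA, 94041"},
    {"address": "Somewhere in Houston, TX"},
    {"address": "14054 San ramon st, Houston, TX, 77077"},
]
clustered = form_clusters_sketch(rows, "address", 2)
```

In this sketch the first two rows share tokens such as "154", "mountain", "view", and "ca", and the last two share "houston" and "tx", so each pair is merged into its own cluster.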