matchSimilarFromCorpusPerGroup

Finds similar events from a model, per group.

This operator works together with the buildTermCorpusPerGroup operator: buildTermCorpusPerGroup builds the model, and matchSimilarFromCorpusPerGroup matches the text against the corpus in that model and adds the columns that were kept when the model was built.

Operator Usage in Easy Mode

Click + on the parent node.
Enter the Match Similar From Corpus Per Group operator in the search field and select the operator from the Results to open the operator form.
In the Table drop-down, enter or select the name of the table to run this operator on.
In the Model Name drop-down, enter or select the name of the model.
In the Grouping Column drop-down, enter or select the name of the column for grouping.
In the Column drop-down, enter or select the name of the column that contains the text from which to extract TF-IDF features.
Click Run to view the result.
Click Save to add the operator to the playbook.
Click Cancel to discard the operator form.

Usage Details

Uses the processed corpus from buildTermCorpusPerGroup and a new column of text to return the Cosine similarity.

matchSimilarFromCorpusPerGroup(table: TableReference, modelName:String, groupColumn:String, column: String)

Input Parameters
table (TableReference) - Table name
modelName (String) - Model name
groupColumn (String) - Name of the column used for grouping
column (String) - Name of the column that contains the text from which to extract TF-IDF features

Returns:
Returns the greatest cosine similarity score, 'lhubcosineSimilarity', between the text and the TF-IDF terms from the saved corpus. The score ranges from 0.0 (no match) to 1.0 (perfect match). The output also includes the columns specified in the columnsKeep argument at corpus creation, with the 'lhub' prefix.
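The scoring described above can be illustrated with a minimal Python sketch. It is not the operator's implementation: it assumes whitespace tokenization, a smoothed IDF formula, and that terms absent from the saved corpus are ignored. The function names (build_idf, vectorize, cosine, best_match) are hypothetical.

```python
import math
from collections import Counter


def build_idf(corpus_docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(corpus_docs)
    # Document frequency: number of corpus documents containing each term.
    df = Counter(term for doc in corpus_docs for term in set(doc.split()))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(text, idf):
    """TF-IDF weights for the terms of `text` that the corpus knows about."""
    tf = Counter(text.split())
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(text, corpus_docs):
    """Return (index, score) of the corpus document most similar to `text`.

    Assumes `corpus_docs` is a non-empty list of strings.
    """
    idf = build_idf(corpus_docs)
    query = vectorize(text, idf)
    scores = [cosine(query, vectorize(doc, idf)) for doc in corpus_docs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

With a corpus of ["h a c d i j b", "gg aa ff jj c i b"], calling best_match("gg aa ff jj c i b", corpus) selects the second document with a score of approximately 1.0, while text that shares no terms with the corpus scores 0.0.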

Example

Input
Table and model name from the buildTermCorpusPerGroup operator

server	corpus
server1	h a c d i j b
server1	gg aa ff jj c i b
server2	k o m p n l q

matchSimilarFromCorpusPerGroup(inputTable, "corpusModel", "server", "corpus")
// table = inputTable
// model name that was created by the buildTermCorpusPerGroup operator = "corpusModel"
// grouping column = "server"; buildTermCorpusPerGroup creates an individual model for each group in the "server" column, and matching loads the model for each row's group
// text column = "corpus"

Output

server	corpus	label	domain	lhub_confidence
server1	h a c d i j b	x	google	more than 0.5
server1	gg aa ff jj c i b	y	facebook	more than 0.5
server2	k o m p n l q	z	apple	more than 0.5

The label and domain columns come from corpusModel: when the model was built, its parameters were set to keep the ["label", "domain"] columns, so these columns are added to the output based on the best match. lhub_confidence is the confidence score of the best match (e.g. the cosine similarity).
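The per-group lookup behind this example can be sketched in Python as follows. This is an illustration under stated assumptions, not the operator's code: the in-memory layout of the models, the function names, and the model entries are hypothetical (the entries are chosen only to mirror the example tables), and the score is a plain term-count cosine standing in for the TF-IDF score.

```python
import math
from collections import Counter


def term_cosine(a, b):
    """Cosine similarity on raw term counts (a stand-in for the TF-IDF score)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(n * cb[t] for t, n in ca.items())
    norm_a = math.sqrt(sum(n * n for n in ca.values()))
    norm_b = math.sqrt(sum(n * n for n in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_per_group(rows, models, group_column, column):
    """Attach the kept columns of the best match from each row's group model."""
    out = []
    for row in rows:
        # Load only the model that belongs to this row's group.
        entries = models.get(row[group_column], [])
        best, best_score = None, 0.0
        for entry in entries:
            score = term_cosine(row[column], entry["text"])
            if best is None or score > best_score:
                best, best_score = entry, score
        result = dict(row)
        if best is not None:
            result.update(best["kept"])  # columns kept at model build time
        result["lhub_confidence"] = best_score
        out.append(result)
    return out


# Hypothetical per-group models, one list of entries per "server" value.
models = {
    "server1": [
        {"text": "h a c d i j b", "kept": {"label": "x", "domain": "google"}},
        {"text": "gg aa ff jj c i b", "kept": {"label": "y", "domain": "facebook"}},
    ],
    "server2": [
        {"text": "k o m p n l q", "kept": {"label": "z", "domain": "apple"}},
    ],
}
rows = [
    {"server": "server1", "corpus": "h a c d i j b"},
    {"server": "server1", "corpus": "gg aa ff jj c i b"},
    {"server": "server2", "corpus": "k o m p n l q"},
]
matched = match_per_group(rows, models, "server", "corpus")
```

Each row is compared only against the entries of its own group, so a row from server2 never matches text stored for server1. A row whose group has no model keeps its original columns and gets a score of 0.0.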
