Current limitations of existing operators
Given multiple tables that contain lhub_score values, we want to combine them and assign a final score, where the final score combines manual scoring with predictions from machine learning models.
Currently this can be done as follows:
1. Combine the tables using the autoJoinScores operator.
2. Select the score columns.
3. Select distinct rows.
4. Assign scores using machine learning, if a score combiner model exists.
5. Assign or update scores manually.
6. Split the scored table into two separate tables:
   a. Manually assigned scores (trainTable), plus the previous trainTable loaded from files.
   b. No scores assigned (scoreTable).
7. Create a model using trainTable.
8. Predict scores for the table with no scores assigned (scoreTable).
9. Store trainTable as a CSV file so that it can be loaded back in (step 6.a).
Building a fully working playbook requires all of these steps, and even that is not enough, because every run has to read all previous trainTables.
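The split, merge, and store steps of this manual workflow can be sketched in plain Python. This is only an illustration under stated assumptions, not how the playbook operators are implemented: the function names, the use of "-" as the marker for an unscored row, the choice of key fields, and the in-memory CSV text standing in for stored files are all hypothetical.

```python
import csv
import io

UNSCORED = "-"  # assumed marker for a row with no score assigned


def split_scored_table(rows, score_key="lhub_score"):
    """Split rows into (train_table, score_table) by whether a score is assigned."""
    train_table = [r for r in rows if r[score_key] != UNSCORED]
    score_table = [r for r in rows if r[score_key] == UNSCORED]
    return train_table, score_table


def merge_train_tables(previous, current, key_fields):
    """Merge stored training rows with new ones; newer rows win on key clashes."""
    merged = {}
    for row in previous + current:
        merged[tuple(row[k] for k in key_fields)] = row
    return list(merged.values())


def dump_csv(rows, fieldnames):
    """Serialize rows to CSV text (a stand-in for writing a file)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def load_csv(text):
    """Parse CSV text back into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical data: one manually scored row, one unscored row,
# and a training table stored by an earlier run.
rows = [
    {"user": "alice", "table1": "3", "table2": "7", "lhub_score": "8"},
    {"user": "bob", "table1": "1", "table2": "2", "lhub_score": "-"},
]
previous_csv = "user,table1,table2,lhub_score\ncarol,5,5,4\n"

train_table, score_table = split_scored_table(rows)
train_table = merge_train_tables(
    load_csv(previous_csv), train_table, key_fields=("user", "table1", "table2")
)
stored = dump_csv(train_table, ["user", "table1", "table2", "lhub_score"])
```

Every run has to repeat the load and merge before a model can be trained, which is the overhead described above.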
Create a new operator that automates the whole process. The operator will:
- Combine the tables using the autoJoinScores operator.
- Select distinct rows.
- Load the previously stored rules for the table, along with the model.
- Score the table if the model exists, then update the scores with the rules.
  - For example, if the predictor assigns a score of 6 but a rule assigns 9, the final result must be 9.
- Show the result in the UI so that the user can reassign any score they disagree with.
  - Provide an extra column, such as a confidence score, where 1.0 indicates that the score comes from a rule.
- If the user updates the score of a row in the UI:
  - Retrain the model.
  - Add the new rule to the training dataset.
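The rule-over-prediction behavior described above can be sketched as follows. This is a minimal illustration, not the operator's actual implementation: `apply_rules`, the shape of `rules` (a dictionary keyed by row key), the `predict` callback returning a `(score, confidence)` pair, and the handling of a missing model are all assumptions made for the example.

```python
def apply_rules(rows, rules, predict, key_fields):
    """Score each row; a stored rule always wins over the model's prediction.

    rules:   dict mapping a row key tuple to a manually assigned score (assumed shape)
    predict: callable returning (score, confidence), or None if no model exists yet
    """
    scored = []
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        if key in rules:
            # Confidence 1.0 signals that the score comes from a rule.
            score, confidence = rules[key], 1.0
        elif predict is not None:
            score, confidence = predict(row)
        else:
            # Assumption: without a model, the row stays unscored.
            score, confidence = "-", None
        scored.append({**row, "lhub_score": score, "confidence": confidence})
    return scored


def stub_predict(row):
    """Hypothetical stand-in for a trained model: always predicts 6."""
    return 6, 0.7


rules = {("alice",): 9}
rows = [{"user": "alice"}, {"user": "bob"}]
result = apply_rules(rows, rules, stub_predict, key_fields=("user",))
```

Here the row for `alice` ends up with a score of 9 and a confidence of 1.0 even though the stub model predicts 6, while `bob` keeps the model's prediction.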
- Click + on the parent node.
- Enter the Supervised Scorer operator in the search field and select the operator from the Results to open the operator form.
- In the Table drop-down, enter or select a table to apply the operator.
- Click Run to view the result.
- Click Save to add the operator to the playbook.
- Click Cancel to discard the operator form.
listOfScoreTables: list of score tables to join
table1 with columns: lhub_score, user
table2 with columns: lhub_score, user
where the lhub_score values differ between the two tables (e.g. table1's lhub_score is based on scoreByRandomness of user, and table2's lhub_score is based on scoreByAnomaly).
The output should contain the columns lhub_score, table1, table2, and user,
where the table1 and table2 columns hold the lhub_score values from table1 and table2 respectively.
Initially, every lhub_score in the output table is "-", and you can assign any value from 0 to 10 to a specific row.
Once you have assigned scores to 2 or more rows, the operator trains a model and assigns scores to the remaining unscored events.
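The expected output shape can be illustrated with a small Python sketch. It is not the operator's implementation: `join_score_tables` and `ready_to_train` are hypothetical helpers, the join on `user` and the `None` fill for a user missing from one table are assumptions, and the score values are made up for the example.

```python
UNSCORED = "-"  # initial value of every lhub_score in the output


def join_score_tables(tables, key="user", score_key="lhub_score"):
    """Join score tables on `key`; each table's score goes in a column named after it."""
    names = list(tables)
    joined = {}
    for name, rows in tables.items():
        for row in rows:
            # Assumption: a key missing from some table gets None in that column.
            per_table = joined.setdefault(row[key], {n: None for n in names})
            per_table[name] = row[score_key]
    return [
        {score_key: UNSCORED, **per_table, key: k}
        for k, per_table in joined.items()
    ]


def ready_to_train(rows, score_key="lhub_score", minimum=2):
    """True once at least `minimum` rows have a manually assigned score."""
    return sum(r[score_key] != UNSCORED for r in rows) >= minimum


# Made-up inputs: the same users scored differently in each table.
table1 = [{"lhub_score": 4, "user": "alice"}, {"lhub_score": 7, "user": "bob"}]
table2 = [{"lhub_score": 2, "user": "alice"}, {"lhub_score": 9, "user": "bob"}]

output = join_score_tables({"table1": table1, "table2": table2})
before = ready_to_train(output)   # no rows scored yet

output[0]["lhub_score"] = 8       # user assigns scores to two rows
output[1]["lhub_score"] = 3
after = ready_to_train(output)    # threshold of 2 assigned rows reached
```

Each output row carries the columns lhub_score, table1, table2, and user, and training becomes possible only after the second manual assignment.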