buildDecisionTree

Build rule queries

Given multiple numeric columns and one column with fewer than 3 distinct values (the labels), the operator tries to build multiple queries, each representing a single label value.
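The idea of one query per label can be illustrated with a small Python sketch. It assumes a hypothetical nested-dictionary tree (internal nodes test `feature <= threshold`, leaves carry a label). This is not the operator's TreeModel format, which is not documented here. Every root-to-leaf path becomes a conjunction of conditions, and the paths ending in the same label are OR-ed into one query. The helper names `leaf_paths` and `rule_queries` are illustrative only.

```python
from collections import defaultdict

def leaf_paths(node, conditions=()):
    """Yield (conditions, label) for every leaf reachable from node.

    Assumption: a hypothetical tree format, not the operator's TreeModel.
    """
    if "label" in node:
        yield list(conditions), node["label"]
        return
    f, t = node["feature"], node["threshold"]
    yield from leaf_paths(node["left"], conditions + (f"{f} <= {t}",))
    yield from leaf_paths(node["right"], conditions + (f"{f} > {t}",))

def rule_queries(tree):
    """Build one WHERE-style clause per label by OR-ing the paths that reach it."""
    by_label = defaultdict(list)
    for conds, label in leaf_paths(tree):
        by_label[label].append("(" + " and ".join(conds) + ")")
    return {label: " or ".join(paths) for label, paths in by_label.items()}

# A depth-2 tree: split on f1 first, then on f2.
tree = {
    "feature": "f1", "threshold": 0.5,
    "left": {
        "feature": "f2", "threshold": 0.5,
        "left": {"label": 0},
        "right": {"label": 2},
    },
    "right": {"label": 1},
}

for label, where in sorted(rule_queries(tree).items()):
    print(label, "->", where)
```

With this tree, label 1 maps to `(f1 > 0.5)` and label 2 maps to `(f1 <= 0.5 and f2 > 0.5)`.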

Operator Usage in Easy Mode

Click + on the parent node.
Enter Build Decision Tree in the search field and select the operator from the results to open the operator form.
In the Table drop-down, enter or select the table from which to create the model.
In the Max Depth field, enter the maximum depth of the decision tree.
In the Impurity field, enter the impurity threshold that determines whether a node is counted as decided or undecided.
Optional. In the Columns field, click Add More to add more columns. The first column is treated as the label column; the rest are used as feature columns.
Click Run to view the result.
Click Save to add the operator to the playbook.
Click Cancel to discard the operator form.

Usage Details

buildDecisionTree(table: TableReference, maxDepth: Long, impurity: Double, columns: String*)

Parameters:
table (TableReference) - The table from which to create the model
maxDepth (Long) - Maximum depth of the decision tree
impurity (Double) - Impurity threshold for a node (that is, a query) to be counted as decided rather than undecided. You can think of it as inverse confidence. For example, 0.01 means an uncertainty of 0.01, i.e., a rule is created if the node is 99% certain.
columns (String*) - List of columns that manually specifies the label and feature columns. The first column acts as the label and the rest are features. If you specify only one column, it is used as the label and all remaining numeric columns are automatically used as feature columns.
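The relationship between the impurity threshold and certainty can be sketched in Python. The operator's actual impurity measure is not documented here, so this sketch assumes misclassification impurity (1 minus the share of the most common label), which matches the "0.01 means 99% certain" reading. It also assumes that a node at or below the threshold counts as decided. The function names are hypothetical.

```python
from collections import Counter

def node_impurity(labels):
    """Misclassification impurity: 1 - share of the most common label.

    Assumption: the operator may use a different impurity measure.
    """
    if not labels:
        raise ValueError("a node needs at least one row")
    counts = Counter(labels)
    return 1 - max(counts.values()) / len(labels)

def is_decided(labels, impurity_threshold):
    """Assumed rule: a node is decided when its impurity is at or below the threshold."""
    return node_impurity(labels) <= impurity_threshold

# 99 rows labelled 1 and one row labelled 0: roughly 99% certain.
node = [1] * 99 + [0]
print(round(node_impurity(node), 4))
print(is_decided(node, 0.05))
print(is_decided(node, 0.005))
```

Comparisons of floating-point impurity against a threshold exactly equal to it (for example 0.01 here) can go either way due to rounding, so thresholds are best chosen with some margin.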
Returns:
One row with one column whose cell contains a JSON object with TreeModel and Data objects. TreeModel contains the array of queries; Data contains the predicted data.

Example

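// Three random numeric feature columns
df = `select rand() as f1, rand() as f2, rand() as f3 from table`
// Label: 1 if f1 > 0.5, else 2 if f2 > 0.5, else 0
df1 = `select *, case when f1 > 0.5 then 1 else case when f2 > 0.5 
       then 2 else 0 end end as label from df`
// Explicit label column followed by feature columns
df2 = buildDecisionTree(df1, 5, 0.05, "label", "f1", "f2", "f3")
// Label only; the remaining numeric columns become features. This should also work:
// buildDecisionTree(df1, 5, 0.5, "label")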
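Why this example is easy for the operator can be checked with a short Python sketch (plain Python, not the query language above). It mirrors `df` and `df1`: three uniform random features and the same nested CASE label. It then groups rows by the two splits a tree could learn (f1 first, then f2). Each region contains exactly one label, so a tree of depth 2 separates the classes and f3 carries no signal. The operator's own tree may choose different splits. `make_rows` and `region` are hypothetical helpers.

```python
import random

def make_rows(n, seed=0):
    """Mimic df/df1: three uniform features plus the nested CASE label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1, f2, f3 = rng.random(), rng.random(), rng.random()
        # case when f1 > 0.5 then 1 else case when f2 > 0.5 then 2 else 0 end end
        label = 1 if f1 > 0.5 else (2 if f2 > 0.5 else 0)
        rows.append({"f1": f1, "f2": f2, "f3": f3, "label": label})
    return rows

def region(row):
    """Depth-2 partition: split on f1, then on f2 (f3 is ignored)."""
    if row["f1"] > 0.5:
        return "f1 > 0.5"
    return "f1 <= 0.5 and f2 > 0.5" if row["f2"] > 0.5 else "f1 <= 0.5 and f2 <= 0.5"

rows = make_rows(1000)
labels_by_region = {}
for row in rows:
    labels_by_region.setdefault(region(row), set()).add(row["label"])

for name, labels in sorted(labels_by_region.items()):
    print(name, "->", labels)  # each region holds a single label
```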

Output
The output is a JSON object containing TreeModel and Data objects. TreeModel contains the tree nodes (that is, the conditions). Data contains the input rows with the following additional columns, which are the output of the decision tree model: lhub_decision_tree_node_impurity, lhub_decision_tree_path, lhub_decision_tree_predicted_label, and lhub_isDecided.

"data":{
   "TreeModel":[...],
   "Data":[...]
}

You can also extract the data into a table as follows:

jsonToTable(df2, "RESULT.Data")

This returns the input table with the additional columns listed above, which hold the prediction and path information.
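If the result cell is exported and processed outside the platform, a Python sketch like the following could split the Data rows into decided and undecided predictions. It assumes the cell holds an object with TreeModel and Data keys, as in the snippet above, and that lhub_isDecided is a boolean. Both are assumptions, and the sample cell below is hypothetical, not real operator output.

```python
import json

# Hypothetical result cell; real TreeModel and Data contents will differ.
cell = json.dumps({
    "TreeModel": [],
    "Data": [
        {"f1": 0.7, "label": 1, "lhub_decision_tree_predicted_label": 1, "lhub_isDecided": True},
        {"f1": 0.2, "label": 0, "lhub_decision_tree_predicted_label": 2, "lhub_isDecided": False},
    ],
})

def split_decided(cell_json):
    """Split Data rows on the (assumed boolean) lhub_isDecided column."""
    rows = json.loads(cell_json)["Data"]
    decided = [r for r in rows if r.get("lhub_isDecided")]
    undecided = [r for r in rows if not r.get("lhub_isDecided")]
    return decided, undecided

decided, undecided = split_decided(cell)
print(len(decided), len(undecided))
```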