I am using the FURIA rule induction algorithm in my thesis to induce rules from a multi-class neural network, and I would like help understanding the FURIA output.
1. The two images below (please see the hyperlinks) show FURIA outputs for the 'talk.politics.guns' class of the 20 newsgroups dataset from scikit-learn. Interestingly, both outputs are produced by one model with the same hyperparameters: the top image is the result of the first optimisation step, and the bottom image is the result of the second optimisation step.
My question is: how can these outputs be so different from each other, and can I achieve more consistent behaviour?
2. Does anyone know why FURIA counts every "=> text_class=talk.politics.guns (CF = 0.0)" or "=> text_class=merged (CF = 0.0)" as a separate rule, especially when CF = 0? CF refers to the certainty factor of a rule.
3. Does anyone know what " => text_class=merged (CF = 0.0)" means? What exactly does FURIA do here?
4. If a rule states "(israel = 0) and (serdar = 0) and (arab = 0) => text_class=merged (CF = 0.91)", how can the CF equal 0.91 when there is no correlation between the feature values and the probability of the class?
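For reference, my understanding from the FURIA paper (Hühn & Hüllermeier, 2009) is that CF is an m-estimate (with m = 2) of the rule's precision, smoothed towards the class prior. If that is right, a rule made entirely of negated conditions can still get a high CF simply because the examples it covers are mostly of the target class. The counts below are made up purely to illustrate the arithmetic:

```python
def certainty_factor(class_prior, covered_in_class, covered_total, m=2.0):
    """m-estimate of rule confidence as I understand FURIA's CF:
    the (fuzzy-weighted) fraction of covered examples belonging to the
    class, smoothed towards the class prior."""
    return (m * class_prior + covered_in_class) / (m + covered_total)

# Made-up counts: the rule covers 98 examples, 90 of which are in the
# class, with a class prior of 0.5 — giving CF = 0.91 even though every
# condition is a negation.
print(round(certainty_factor(0.5, 90, 98), 2))  # → 0.91
```

So (if my reading of the paper is correct) the CF does not measure correlation between individual features and the class, only the purity of the covered region.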
5. I want to compare the interpretability of the FURIA and RIPPER-k rule induction algorithms, using the F-score/fidelity score as the evaluation metric. I hypothesise that FURIA should be better than RIPPER-k (i.e. achieve a higher F-score), since it produces independent rules as opposed to hierarchical rules. That said, in all my experiments FURIA achieves a higher precision but a much lower recall, so its F-score is always below RIPPER-k's.
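To illustrate the pattern with made-up numbers (not my actual results): the F-score is the harmonic mean of precision and recall, so it is dominated by the smaller of the two, and a modest recall deficit outweighs a precision advantage:

```python
def f1(precision, recall):
    # Harmonic mean: dominated by the smaller of the two values.
    return 2 * precision * recall / (precision + recall)

# Hypothetical scores in the pattern I observe:
furia = f1(0.95, 0.40)   # high precision, low recall
ripper = f1(0.80, 0.70)  # more balanced
print(round(furia, 3), round(ripper, 3))  # → 0.563 0.747
```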
Does anyone know what could be the reason for that?
I hope that someone can help to answer these questions. Thank you in advance.