Hello,

I am using M5Rules to predict "Period (Sec)", i.e. the 6th column in my data. I have written a parser in .NET to convert the rules to Excel formulas, but when I apply the formulas in Excel I get completely wrong results. The formula seems correct, and I am afraid there is something fundamental that I do not grasp. For debugging purposes, I set minNumInstances = 200 and trained on the whole set to produce as few rules as possible. Here's the output:

=== Run information ===

Scheme:       weka.classifiers.rules.M5Rules -M 200.0
Relation:     data
Instances:    4026
Attributes:   6
              Number of Storeys
              Number of Spans
              Length of Spans (m)
              Opening percentage (%)
              Masonry wall Stiffeness Et (x10^5 kN/m)
              Period (Sec)
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

M5 pruned model rules
(using smoothed linear models) :
Number of Rules : 8

Rule: 1
IF
    Number of Storeys > 9.5
    Opening percentage (%) > 37.5
    Number of Storeys > 15.5
THEN
Period (Sec) =
    0.1215 * Number of Storeys
    - 0.0883 * Number of Spans
    + 0.243 * Length of Spans (m)
    + 0.0002 * Opening percentage (%)
    - 0.0002 * Masonry wall Stiffeness Et (x10^5 kN/m)
    - 0.9072 [882/29.044%]

Rule: 2
IF
    Number of Storeys > 6.5
    Opening percentage (%) <= 62.5
THEN
Period (Sec) =
    0.0532 * Number of Storeys
    + 0.0016 * Length of Spans (m)
    + 0.0108 * Opening percentage (%)
    - 0.0233 * Masonry wall Stiffeness Et (x10^5 kN/m)
    + 0.1191 [1104/26.789%]

Rule: 3
IF
    Number of Storeys > 6.5
    Number of Storeys > 10.5
THEN
Period (Sec) =
    0.1259 * Number of Storeys
    + 0.1856 * Length of Spans (m)
    + 0 * Opening percentage (%)
    - 1.0226 [523/8.941%]

Rule: 4
IF
    Number of Storeys > 6.5
THEN
Period (Sec) =
    0.1222 * Number of Storeys
    + 0.113 * Length of Spans (m)
    + 0.0001 * Opening percentage (%)
    - 0.6161 [419/11.291%]

Rule: 5
IF
    Number of Storeys > 3.5
    Opening percentage (%) > 37.5
THEN
Period (Sec) =
    0.1229 * Number of Storeys
    + 0.061 * Length of Spans (m)
    + 0.0002 * Opening percentage (%)
    - 0.3746 [378/32.319%]

Rule: 6
IF
    Number of Storeys > 1.5
    Opening percentage (%) <= 62.5
THEN
Period (Sec) =
    0.0481 * Number of Storeys
    + 0.0008 * Length of Spans (m)
    + 0.0026 * Opening percentage (%)
    + 0.0041 [327/64.783%]

Rule: 7
IF
    Number of Storeys > 1.5
THEN
Period (Sec) =
    0.0071 * Number of Storeys
    + 0.0219 * Length of Spans (m)
    + 0.2009 [210/26.645%]

Rule: 8
Period (Sec) =
    + 0.1423 [183/100%]

Time taken to build model: 0.14 seconds

=== Evaluation on training set ===

Time taken to test model on training data: 0.01 seconds

=== Summary ===

Correlation coefficient                  0.9856
Mean absolute error                      0.0824
Root mean squared error                  0.1326
Relative absolute error                 12.7376 %
Root relative squared error             16.896  %
Total Number of Instances             4026
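M5Rules produces a decision list: an instance is predicted by the first rule, in the printed order, whose conditions all hold, and the last rule (with no conditions) is the default. As an independent check on the Excel column, here is a minimal Python sketch of that evaluation using the rounded coefficients exactly as printed above. The attribute constants, the tuple layout, and the `holds`/`predict` helpers are illustrative names chosen for this sketch, not part of Weka.

```python
from typing import Dict, List, Tuple

# Illustrative attribute keys, copied from the attribute list in the run information.
STOREYS = "Number of Storeys"
SPANS = "Number of Spans"
LENGTH = "Length of Spans (m)"
OPENING = "Opening percentage (%)"
STIFF = "Masonry wall Stiffeness Et (x10^5 kN/m)"

Condition = Tuple[str, str, float]                      # (attribute, ">" or "<=", threshold)
Rule = Tuple[List[Condition], Dict[str, float], float]  # (conditions, coefficients, intercept)

# The eight rules in printed order, with the 4-decimal coefficients Weka displays.
RULES: List[Rule] = [
    ([(STOREYS, ">", 9.5), (OPENING, ">", 37.5), (STOREYS, ">", 15.5)],
     {STOREYS: 0.1215, SPANS: -0.0883, LENGTH: 0.243, OPENING: 0.0002, STIFF: -0.0002}, -0.9072),
    ([(STOREYS, ">", 6.5), (OPENING, "<=", 62.5)],
     {STOREYS: 0.0532, LENGTH: 0.0016, OPENING: 0.0108, STIFF: -0.0233}, 0.1191),
    ([(STOREYS, ">", 6.5), (STOREYS, ">", 10.5)],
     {STOREYS: 0.1259, LENGTH: 0.1856, OPENING: 0.0}, -1.0226),
    ([(STOREYS, ">", 6.5)],
     {STOREYS: 0.1222, LENGTH: 0.113, OPENING: 0.0001}, -0.6161),
    ([(STOREYS, ">", 3.5), (OPENING, ">", 37.5)],
     {STOREYS: 0.1229, LENGTH: 0.061, OPENING: 0.0002}, -0.3746),
    ([(STOREYS, ">", 1.5), (OPENING, "<=", 62.5)],
     {STOREYS: 0.0481, LENGTH: 0.0008, OPENING: 0.0026}, 0.0041),
    ([(STOREYS, ">", 1.5)],
     {STOREYS: 0.0071, LENGTH: 0.0219}, 0.2009),
    ([], {}, 0.1423),  # default rule: no conditions, constant model
]


def holds(instance: Dict[str, float], cond: Condition) -> bool:
    """True if a single rule condition is satisfied by the instance."""
    attr, op, threshold = cond
    return instance[attr] > threshold if op == ">" else instance[attr] <= threshold


def predict(instance: Dict[str, float]) -> Tuple[int, float]:
    """Return (rule number, prediction) from the first rule whose conditions all hold."""
    for number, (conds, coefs, intercept) in enumerate(RULES, start=1):
        if all(holds(instance, c) for c in conds):
            return number, intercept + sum(w * instance[a] for a, w in coefs.items())
    raise ValueError("no rule matched")  # not reached: the default rule always matches


if __name__ == "__main__":
    # Hypothetical instance, not taken from the data set.
    x = {STOREYS: 5, SPANS: 3, LENGTH: 4.0, OPENING: 50, STIFF: 2.0}
    print(predict(x))  # rule 5 fires, giving roughly 0.4939
```

Comparing `predict` on a few rows against the matching cells in column F, and noting which rule fires for each row, would show whether the spreadsheet and the printed rule list disagree for specific instances.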

The correlation coefficient is high (0.9856). In Excel, the corresponding data are in the following columns:

A: Number of Storeys
B: Number of Spans
C: Length of Spans (m)
D: Opening percentage (%)
E: Masonry wall Stiffeness Et (x10^5 kN/m)

In cell F2, I compute the predicted Period (Sec) with the formula:

=IF(AND(A2>9.5,D2>37.5,A2>15.5),0.1215*A2-0.0883*B2+0.243*C2+0.0002*D2-0.0002*E2-0.9072,IF(AND(A2>6.5,D2<=62.5),0.0532*A2+0.0016*C2+0.0108*D2-0.0233*E2+0.1191,IF(AND(A2>6.5,A2>10.5),0.1259*A2+0.1856*C2+0*D2-1.0226,IF(A2>6.5,0.1222*A2+0.113*C2+0.0001*D2-0.6161,IF(AND(A2>3.5,D2>37.5),0.1229*A2+0.061*C2+0.0002*D2-0.3746,IF(AND(A2>1.5,D2<=62.5),0.0481*A2+0.0008*C2+0.0026*D2+0.0041,IF(A2>1.5,0.0071*A2+0.0219*C2+0.2009,0.1423)))))))
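The .NET parser is not shown, so as a hypothetical cross-check, here is a small Python sketch that turns a rule list into the same nested-IF shape: each rule becomes IF(condition, linear model, remaining rules), and the final unconditional rule is the innermost else-branch. The tuple layout, the function names, and the column letters A-E (taken from the mapping above) are assumptions for illustration. Fed the eight printed rules, it produces the formula above character for character, which suggests the transcription itself matches the printed rules.

```python
from typing import List, Tuple

# Hypothetical representation: conditions as (column, op, threshold),
# linear terms as (coefficient, column), plus an intercept.
Cond = Tuple[str, str, float]
Term = Tuple[float, str]
Rule = Tuple[List[Cond], List[Term], float]


def num(x: float) -> str:
    """Compact number formatting: 0.243 -> '0.243', 0.0 -> '0'."""
    return f"{x:g}"


def linear(terms: List[Term], intercept: float, row: int) -> str:
    """Render a linear model such as '0.1215*A2-0.0883*B2-0.9072'."""
    parts = [f"{'-' if c < 0 else '+'}{num(abs(c))}*{col}{row}" for c, col in terms]
    parts.append(f"{'-' if intercept < 0 else '+'}{num(abs(intercept))}")
    text = "".join(parts)
    return text[1:] if text.startswith("+") else text  # drop a leading '+'


def condition(conds: List[Cond], row: int) -> str:
    """One test stays bare; several are wrapped in AND(...)."""
    tests = [f"{col}{row}{op}{num(t)}" for col, op, t in conds]
    return tests[0] if len(tests) == 1 else "AND(" + ",".join(tests) + ")"


def to_excel(rules: List[Rule], row: int = 2) -> str:
    """Build a nested IF that applies the rules in order (first match wins)."""
    *guarded, default = rules
    assert not default[0], "the last rule is expected to be the unconditional default"
    formula = linear(default[1], default[2], row)
    for conds, terms, intercept in reversed(guarded):
        formula = f"IF({condition(conds, row)},{linear(terms, intercept, row)},{formula})"
    return "=" + formula


if __name__ == "__main__":
    # Rules 7 and 8 only; prints the innermost part of the formula above.
    tail = [([("A", ">", 1.5)], [(0.0071, "A"), (0.0219, "C")], 0.2009), ([], [], 0.1423)]
    print(to_excel(tail))  # =IF(A2>1.5,0.0071*A2+0.0219*C2+0.2009,0.1423)
```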

When I plot the predicted values against the actual ones, they are way, way off, as shown in the picture. The points should lie close to the 45-degree line:

<https://weka.8497.n7.nabble.com/file/t6958/1.png>
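If the Excel column reproduced the model, computing the same statistics on it should land near Weka's training-set summary (correlation 0.9856, MAE 0.0824, RMSE 0.1326, RAE 12.7376 %, RRSE 16.896 %), apart from any effect of the rounded coefficients in the printout. Below is a minimal Python sketch of those statistics. It assumes, as for evaluation on training data, that the baseline for the relative errors is the mean of the actual values. The `summary` function name and the example numbers are hypothetical.

```python
import math
from typing import Dict, Sequence


def summary(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Regression statistics in the style of Weka's summary block.

    Relative errors are measured against always predicting the mean of `actual`.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    err = [p - a for a, p in zip(actual, predicted)]
    dev = [a - mean_a for a in actual]
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum(d * d for d in dev)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(ss_a * ss_p),  # Pearson correlation
        "mae": sum(abs(e) for e in err) / n,
        "rmse": math.sqrt(sum(e * e for e in err) / n),
        "rae_pct": 100 * sum(abs(e) for e in err) / sum(abs(d) for d in dev),
        "rrse_pct": 100 * math.sqrt(sum(e * e for e in err) / ss_a),
    }


if __name__ == "__main__":
    # Hypothetical values: every prediction is off by exactly +1.
    print(summary([1, 2, 3, 4], [2, 3, 4, 5]))
```

The example illustrates why the error measures are worth checking alongside the correlation: predictions shifted by a constant still have a correlation of 1.0, while MAE is 1.0 and RAE is 100 %.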

I am completely baffled. I suspect it is something obvious, but I cannot find it. The same happens with properly trained models that have many rules. Any ideas?

