Forecast goes way wrong

Tom
Hi,

I have a time series dataset (see end of mail) with a lot of zero values. It represents historical monthly sales data for one customer, with zeros added for the months without sales.
I'm trying to forecast the next value in the series.

I'm using the timeseriesForecasting plugin, version 1.0.27 (latest), in Weka 3.8.4.

As for settings, they are pretty basic:
- Target selection: 'omzet'
- Number of time units to forecast: 1
- Time stamp: 'maand_global'
- Perform evaluation: yes
- Base learner (this is just the default SMOreg, nothing configured): weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
- Lag length: custom, minimum = 1, maximum = 11 (half of my dataset); under "more options", the defaults 'powers of time' and 'products of time' are left enabled
- Evaluation: RMSE; Evaluate on training: true; Evaluate on held out training: 0.1

Using these settings, the next forecasted value is 69288, which makes no sense whatsoever, since that value is huge in comparison with the others.

What did I do wrong, or is this algorithm wrong or unsuited?

(Note: I tried the same with the default MultilayerPerceptron; that one yields 3636 as output, which is lower but still unreasonably high.)

Best regards,
  Tom




Data (arff file):


@relation QueryResult-weka.filters.unsupervised.attribute.Remove-R2-4

@attribute omzet numeric
@attribute maand_global numeric

@data
85.120003,231
0,232
0,233
0,234
0,235
0,236
0,237
0,238
0,239
0,240
0,241
2354.120117,242
0,243
0,244
1760.160034,245
0,246
0,247
0,248
0,249
0,250
0,251
0,252





Full run-log:

=== Run information ===

Scheme:
SMOreg -C 1.0 -N 0 -I "RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "PolyKernel -E 1.0 -C 250007"

Lagged and derived variable options:
-F omzet -L 1 -M 11 -G maand_global

Relation:     QueryResult-weka.filters.unsupervised.attribute.Remove-R2-4
Instances:    22
Attributes:   2
              omzet
              maand_global

Transformed training data:

              omzet
              maand_global
              Lag_omzet-1
              Lag_omzet-2
              Lag_omzet-3
              Lag_omzet-4
              Lag_omzet-5
              Lag_omzet-6
              Lag_omzet-7
              Lag_omzet-8
              Lag_omzet-9
              Lag_omzet-10
              Lag_omzet-11
              maand_global^2
              maand_global^3
              maand_global*Lag_omzet-1
              maand_global*Lag_omzet-2
              maand_global*Lag_omzet-3
              maand_global*Lag_omzet-4
              maand_global*Lag_omzet-5
              maand_global*Lag_omzet-6
              maand_global*Lag_omzet-7
              maand_global*Lag_omzet-8
              maand_global*Lag_omzet-9
              maand_global*Lag_omzet-10
              maand_global*Lag_omzet-11

omzet:
SMOreg

weights (not support vectors):
 +       0.0392 * (normalized) maand_global
 +       0.0165 * (normalized) Lag_omzet-1
 +       0.0135 * (normalized) Lag_omzet-2
 +       0.3874 * (normalized) Lag_omzet-3
 -       0.0052 * (normalized) Lag_omzet-4
 -       0.005  * (normalized) Lag_omzet-5
 -       0.2884 * (normalized) Lag_omzet-6
 +       0.002  * (normalized) Lag_omzet-7
 -       0.003  * (normalized) Lag_omzet-8
 -       0.0282 * (normalized) Lag_omzet-9
 -       0.0351 * (normalized) Lag_omzet-10
 +       0.5198 * (normalized) Lag_omzet-11
 +       0.0401 * (normalized) maand_global^2
 +       0.0409 * (normalized) maand_global^3
 +       0.0185 * (normalized) maand_global*Lag_omzet-1
 +       0.0134 * (normalized) maand_global*Lag_omzet-2
 +       0.3817 * (normalized) maand_global*Lag_omzet-3
 -       0.0071 * (normalized) maand_global*Lag_omzet-4
 -       0.0058 * (normalized) maand_global*Lag_omzet-5
 -       0.287  * (normalized) maand_global*Lag_omzet-6
 +       0.0035 * (normalized) maand_global*Lag_omzet-7
 -       0.0014 * (normalized) maand_global*Lag_omzet-8
 -       0.0282 * (normalized) maand_global*Lag_omzet-9
 -       0.0351 * (normalized) maand_global*Lag_omzet-10
 +       0.5198 * (normalized) maand_global*Lag_omzet-11
 -       0.1092



Number of kernel evaluations: 210 (97.266% cached)

=== Future predictions from end of training data ===
inst#         omzet 
231           85.12 
232               0 
233               0 
234               0 
235               0 
236               0 
237               0 
238               0 
239               0 
240               0 
241               0 
242       2354.1201 
243               0 
244               0 
245         1760.16 
246               0 
247               0 
248               0 
249               0 
250               0 
251*      -4728.9192 

=== Future predictions from end of test data ===
inst#       omzet 
251             0 
252             0 
253*      69288.7472 

=== Evaluation on training data ===
Target                      1-step-ahead
========================================
omzet
  N                                    9
  Root mean squared error         2.2847

Total number of instances: 20

=== Evaluation on test data ===
Target                      1-step-ahead
========================================
omzet
  N                                    2
  Root mean squared error      4670.1525

Total number of instances: 2


_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Forecast goes way wrong

Mark Hall
I don’t think you have enough data, and you are most likely overfitting by using up to 12 lagged variables :-) Furthermore, there is no apparent trend or seasonality, so including a timestamp is exacerbating overfitting. Try a maximum lag length of six, no timestamp (set “<None>” in the Basic config panel) and try either LinearRegression or SMOreg with default settings. At least the one-step-ahead prediction after the training step is not outrageous in this case. 
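
To put rough numbers on the overfitting concern, the sketch below simply counts the inputs listed under "Transformed training data" in the run log and compares them with the number of training rows that have a complete lag window. The variable names are illustrative. The assumption that only rows with all 11 lags available count towards the training evaluation is an inference from the N = 9 reported in the log, not a documented statement about the plugin.

```python
# Rough bookkeeping for the configuration in the original post.
n_train = 20      # "Total number of instances: 20" in the training evaluation
max_lag = 11      # custom maximum lag length

n_time = 1              # maand_global
n_lags = max_lag        # Lag_omzet-1 .. Lag_omzet-11
n_powers = 2            # maand_global^2, maand_global^3
n_products = max_lag    # maand_global*Lag_omzet-1 .. maand_global*Lag_omzet-11

n_features = n_time + n_lags + n_powers + n_products

# Assumption: a row is only fully usable once all of its lags exist.
n_complete_rows = n_train - max_lag

print(f"input features:            {n_features}")
print(f"rows with a full lag window: {n_complete_rows}")
```

This gives 25 inputs against 9 fully observed rows, which is consistent with the pattern in the log: a training RMSE of 2.28 next to a test RMSE of 4670.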

Cheers, 
Mark.
On 5 Jan 2021, 10:09 PM +1300, Tom <[hidden email]>, wrote:
[original message, quoted in full above]

Re: Forecast goes way wrong

Tom
Hi, thanks for the reply!

Yes, the results of what you suggest are much more in line with what I would expect (-666, which I would clamp to 0 in the output; 0 is a normal value given the input).

I guess I don't understand well enough what the lagging does internally, so I just cobbled something together from a few internet sources.
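
As a simplified picture of what lagging does, the sketch below turns a single series into regression rows in which the inputs for month t are the values from months t-1, t-2, and so on. The function name `make_lagged_rows` and the row layout are made up for illustration; how the plugin itself treats the first rows, where earlier values do not exist, may differ from the `None` placeholders used here.

```python
from typing import Dict, List


def make_lagged_rows(series: List[float], max_lag: int) -> List[Dict]:
    """Build one row per time step: the target plus its max_lag previous values.

    Lags that would reach before the start of the series are set to None.
    """
    rows = []
    for t, target in enumerate(series):
        lags = [series[t - k] if t - k >= 0 else None
                for k in range(1, max_lag + 1)]
        rows.append({"target": target, "lags": lags})
    return rows


# The first 20 'omzet' values from the ARFF data (rounded to two decimals).
omzet = [85.12] + [0.0] * 10 + [2354.12, 0.0, 0.0, 1760.16] + [0.0] * 5

rows = make_lagged_rows(omzet, max_lag=3)
complete = [r for r in rows if None not in r["lags"]]

for r in rows[10:14]:
    print(r)
print("rows with all lags known:", len(complete), "of", len(rows))
```

Every extra lag adds an input column and removes one fully observed row from the start of the series, which is why long lag windows are costly on a series this short.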

However, why does the choice of timestamp matter? The attribute I chose increments by 1 per row, the same as when choosing 'None', except that it starts at 231 instead of 1. The forecasts are -989 and -666 respectively.
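
One measurable difference between the two timestamp choices shows up once 'powers of time' are added: over 231..250 the cubed timestamp is almost a straight-line function of the timestamp itself, while over 1..20 it is visibly curved. The sketch below computes the Pearson correlation for both ranges. This is offered only as a possible contributing factor; it is not a confirmed explanation of the -989 versus -666 difference, and it takes the statement above that 'None' counts from 1 at face value.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


t_index = list(range(1, 21))       # artificial index starting at 1
t_global = list(range(231, 251))   # maand_global values in the training data

corr_index = pearson(t_index, [t ** 3 for t in t_index])
corr_global = pearson(t_global, [t ** 3 for t in t_global])

print(f"corr(t, t^3) for 1..20:    {corr_index:.5f}")
print(f"corr(t, t^3) for 231..250: {corr_global:.5f}")
```

When derived inputs are this close to being copies of one another, the learner sees a differently conditioned problem even though both timestamps advance by 1 per row.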

In this dataset I can't see any seasonality either, but for other customers and their sales there might be, and for some I have more data than these 22 records.
Would it be reasonable to say "take a maximum lag of `max(12, nr of samples / 4)`"? (In my code I currently use nr_samples / 2, hence the 11.) Or is this something that cannot be generalized?
Using the default lag options (that is, without the 'custom' setting), my output is -4661 with 'maand_global' as the timestamp, so I guess there isn't a clever built-in setting I could rely on either?
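
To make the proposed rule concrete, the sketch below evaluates it literally and, for comparison, under a second reading in which 12 acts as an upper cap. Both function names are hypothetical, integer division is an assumption, and the capped variant is only an interpretation of the intent; nothing in the thread endorses either rule.

```python
def lag_rule_as_written(n_samples: int) -> int:
    # Literal reading of the rule in the post: max(12, n / 4).
    return max(12, n_samples // 4)


def lag_rule_capped(n_samples: int) -> int:
    # Alternative reading (an assumption): at most 12 lags, at least 1.
    return max(1, min(12, n_samples // 4))


for n in (22, 48, 120):
    print(f"n={n:4d}  as written: {lag_rule_as_written(n):3d}  "
          f"capped: {lag_rule_capped(n):3d}")
```

For the 22-record series the literal rule gives 12 lags, more than the 11 used in the original run, while the capped reading gives 5, close to the maximum lag of six suggested earlier in the thread.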

Best regards,
  Tom


On Wed, Jan 6, 2021 at 1:42 AM Mark Hall <[hidden email]> wrote:
[Mark Hall's reply and the original message, both quoted in full above]