Remove duplicate instances

GiselyP
Hi guys,

I have some duplicate rows in my dataset. Are there any filters to remove
these?



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Remove duplicate instances

Peter Reutemann
> I have some duplicate rows in my dataset. Is there any filters to remove
> these?

I'm not aware of a filter that is part of Weka or one of its packages.
However, the ADAMS framework has a RemoveDuplicates Weka filter that
removes duplicate rows.
I recommend downloading the "adams-ml-app-snapshot" zip file:
  https://adams.cms.waikato.ac.nz/download/snapshot/
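
For readers who want to see what "removing duplicate rows" amounts to, here is a minimal sketch in plain Java. It does not use the ADAMS or Weka API; the class name `DedupRowsSketch`, the method `removeDuplicateRows`, and the sample rows are hypothetical and only illustrate the idea of keeping the first occurrence of each row, assuming rows are compared value by value.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not the ADAMS or Weka filter.
class DedupRowsSketch {

    // Returns the rows in their original order, keeping only the first
    // occurrence of each distinct row. Two rows count as duplicates when
    // every value matches (List.equals compares element by element).
    static List<List<String>> removeDuplicateRows(List<List<String>> rows) {
        Set<List<String>> seen = new LinkedHashSet<>(rows);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Made-up sample data: the third row repeats the first.
        List<List<String>> rows = List.of(
            List.of("sunny", "85", "no"),
            List.of("rainy", "70", "yes"),
            List.of("sunny", "85", "no"));
        // prints [[sunny, 85, no], [rainy, 70, yes]]
        System.out.println(removeDuplicateRows(rows));
    }
}
```

A `LinkedHashSet` is used here so that the surviving rows stay in their original order, which a plain `HashSet` would not guarantee.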

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 577-5304
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Remove duplicate instances

Eibe Frank-3
I actually made a WEKA filter for this not that long ago, so you can now do this from within WEKA:


Cheers,
Eibe

Re: Remove duplicate instances

Edward Wiskers
Hi Eibe, 

Assume there are 2 attributes (neither of them is the class) and they have exactly the same instances. I applied the "RemoveDuplicates" filter but nothing changed, i.e., I can still see both attributes together with their instances. Any advice on that would be highly appreciated.

Cheers, 
Edward

Re: Remove duplicate instances

Eibe Frank-2
Administrator
My guess is that the class value must be the same as well. Have you tried it for that case?
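
To make the distinction concrete, here is a small plain-Java sketch of the two possible comparisons. It is not the WEKA filter's implementation, and whether the filter really compares the class value is only the guess stated above. The class `ClassValueDedupSketch`, the method `dedup`, its `ignoreClass` flag, and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not how WEKA implements its filter.
class ClassValueDedupSketch {

    // Keeps the first occurrence of each row. When ignoreClass is true, the
    // value at classIndex is left out of the comparison; otherwise the whole
    // row, class value included, must match for a row to count as a duplicate.
    static List<List<String>> dedup(List<List<String>> rows, int classIndex, boolean ignoreClass) {
        Set<List<String>> seen = new HashSet<>();
        List<List<String>> kept = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> key = new ArrayList<>(row);
            if (ignoreClass) {
                key.remove(classIndex); // remove(int) drops the element at that position
            }
            if (seen.add(key)) { // add returns false if the key was already seen
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data; the class value is the last column (index 2).
        List<List<String>> rows = List.of(
            List.of("sunny", "85", "no"),
            List.of("sunny", "85", "yes"),  // same attributes, different class
            List.of("sunny", "85", "no"));  // exact duplicate of the first row
        System.out.println(dedup(rows, 2, false).size()); // prints 2
        System.out.println(dedup(rows, 2, true).size());  // prints 1
    }
}
```

If the filter behaves like the full-row comparison, rows that agree on every attribute but differ in the class value would all be kept.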

Cheers,
Eibe

Re: Remove duplicate instances

Edward Wiskers
Hi Eibe,

> My guess is that the class value must be the same as well. Have you tried it for that case?

I also tried that, but the filter doesn't work. Maybe there is a bug.

Cheers,
Edward
