Duplicate Report 3.0 [MM2+3]
Kevinowpb,
The script hasn't yet been modified to work with MM3, hence the [MM2] in the thread title.
Advanced Duplicate Find & Fix Find More From Same - Custom Search. | Transfer PlayStat & Copy-Paste Tags/AlbumArt between any tracks.
Tagging Inconsistencies Do you think you have your tags in order? Think again...
Play History & Stats Node Like having your Last-FM account stored locally, but more advanced.
Case & Leading Zero Fixer Works on filenames too!
All My Scripts
Updating this script to MM3 is on my todo list.
Download my scripts at my own MediaMonkey fansite.
All the code for my website and scripts is safely backed up immediately and for free using Dropbox.
No, but it is still on my list!
It's on my (incredibly long) list!
Hi Trixmoto,
I really like your script.
The idea to create a report with the possibility to do some actions on the duplicates is brilliant.
The only thing I was missing is some sort of fuzzy or approximate string search.
More often than not, I don't have exact duplicate names.
There are some differences like:
- spelling errors ("abracadabra" -> "abracababra")
or
- One title has more info than the other ("I Want You" -> "I Want You (Chica Cherry Cola)")
For this type of comparison there are metric functions that give the "distance" between two strings. The distance is a non-negative integer that says, roughly, how many operations are needed to transform string1 into string2. An operation is something like inserting, deleting, or substituting a character.
I searched the internet a bit and found the following information:
http://en.wikipedia.org/wiki/Fuzzy_string_searching
http://en.wikipedia.org/wiki/Levenshtein_distance
http://en.wikipedia.org/wiki/Damerau-Le ... n_distance
http://en.wikipedia.org/wiki/Bitap_algorithm
http://en.wikibooks.org/wiki/Algorithm_ ... plications
I took the above VBA implementation and converted it into VBS.
It's a simple VBS script, that runs in WSH.
Code: Select all
' VB Script Document
Option Explicit

Dim text1 : text1 = "abracadabra"
Dim text2 : text2 = "abarcadabra"
Dim text3 : text3 = "abarcababra"
Dim text4 : text4 = "MozartSinatra"

WScript.Echo damerau_levenshtein( text1, text2, 3 )
WScript.Echo damerau_levenshtein( text2, text3, 3 )
WScript.Echo damerau_levenshtein( text1, text3, 3 )
WScript.Echo damerau_levenshtein( text1, text4, 20 )

Function damerau_levenshtein( s1, s2, limit )
    'result caches intermediate distances, indexed by the remaining string lengths.
    'Values are stored as distance + 1, so an empty (0) entry means "not computed yet".
    ReDim result(Len(s1), Len(s2))
    damerau_levenshtein = damerau_levenshtein_recurse( s1, s2, limit, result )
End Function

Function damerau_levenshtein_recurse( s1, s2, limit, result )
    'This function returns the Levenshtein distance capped by the limit parameter.
    'VBScript has no optional arguments, so limit must always be passed.
    'Usage : e.g. damerau_levenshtein("correctly written words","corectly writen words",4) to identify similar spellings
    '        Pass a limit larger than both string lengths to avoid capping the result.
    Dim diagonal
    Dim horizontal
    Dim vertical
    Dim swap
    Dim final
    'Start of the strings analysis
    If result(Len(s1), Len(s2)) < 1 Then
        If Abs(Len(s1) - Len(s2)) >= limit Then
            final = limit
        Else
            If Len(s1) = 0 Or Len(s2) = 0 Then
                'End of recursion
                final = Len(s1) + Len(s2)
            Else
                'Core of the Levenshtein algorithm
                If Mid(s1, 1, 1) = Mid(s2, 1, 1) Then
                    final = damerau_levenshtein_recurse(Mid(s1, 2), Mid(s2, 2), limit, result)
                Else
                    If Mid(s1, 1, 1) = Mid(s2, 2, 1) And Mid(s1, 2, 1) = Mid(s2, 1, 1) Then
                        'Damerau extension counting swapped letters
                        swap = damerau_levenshtein_recurse(Mid(s1, 3), Mid(s2, 3), limit - 1, result)
                        final = 1 + swap
                    Else
                        'The function minimum is implemented via the limit parameter.
                        'The diagonal search usually reaches the limit the quickest.
                        diagonal = damerau_levenshtein_recurse(Mid(s1, 2), Mid(s2, 2), limit - 1, result)
                        horizontal = damerau_levenshtein_recurse(Mid(s1, 2), s2, diagonal, result)
                        vertical = damerau_levenshtein_recurse(s1, Mid(s2, 2), horizontal, result)
                        final = 1 + vertical
                    End If
                End If
            End If
        End If
    Else
        'Retrieve intermediate result
        final = result(Len(s1), Len(s2)) - 1
    End If
    'Return the distance capped by the limit
    If final < limit Then
        damerau_levenshtein_recurse = final
        'Store intermediate result
        result(Len(s1), Len(s2)) = final + 1
    Else
        damerau_levenshtein_recurse = limit
    End If
End Function
So integrating the above function would mean a major change to your code structure, and I wasn't sure if you would like me to completely change your script...
So, if you like my idea, maybe you could integrate the above function into your great script for the next release?
I would really love it.
CIAo, uwe..
Author of Radio-DJ
Yeah, fuzzy matching is certainly something that I plan to invest some time in and probably add to a number of my scripts once I've got something working well. Thanks for doing this work for me, I'll certainly try to make good use of it.
Bug
Hi Trixmoto
Really love this script! However, I get this bug when I try to flag a lot of files (500+).
Here are the errors:
By the way, I'm using MM3 and your version 2.1. Oh, and only now do I see that it isn't even supposed to work in MM3. Sorry!
Yeah, it's not ready for MM3 yet, but it is on my list to update.
New version (2.2) is now available to download from my website. Changes include...
- Made compatible with MM3
- Fixed commit errors when deleting dupes
- Added option to fuzzy match titles (thanks to Uwuerfel)
The fuzzy matching is based on the Damerau-Levenshtein distance algorithm. Please increase the "distance" value slowly as it will add a significant delay to the processing. There is a timeout though so it shouldn't ever jam the script.
Are you trying to fake me out?? I get the following error when trying to download the script:
Code: Select all
File does not exist. Make sure you specified correct file name.
Am I moving too fast??
Nyn
3.2x - Win7 Ultimate (Zen Touch 2 16 GB/Zen 8GB)
Link to Favorite Scripts/Skins
Join Dropbox, the online site to share your files
I'm planning to borrow the Damerau-Levenshtein distance code from your script.
But it can't be downloaded from your site:
File does not exist. Make sure you specified correct file name.
My apologies, I forgot to actually upload the files - a minor flaw!
@Bex - the algorithm works well if the two strings are actually similar (e.g. Monkey and Mokney), but if they are completely different and you use a distance above about 4 it just takes forever (literally hours in some cases). That's why I put in the "CheckFuzzy" function and the timeout. If you find a way to improve it, please let me know!
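One possible way around the slowdown on very different strings is sketched below. It is not what the script actually does. It fills a table iteratively instead of recursing, so the work is proportional to Len(s1) * Len(s2) however different the strings are. The function name OsaDistance is hypothetical. It computes the "optimal string alignment" variant of Damerau-Levenshtein (insert, delete, substitute, adjacent swap), so its results may differ in some cases from the recursive function posted earlier in the thread. The limit parameter caps the result the same way.

```vbscript
Option Explicit

' Hypothetical helper: iterative optimal-string-alignment distance, capped at limit.
' Comparison is case-sensitive, as with the = operator on strings in VBScript.
Function OsaDistance(s1, s2, limit)
    Dim n, m, i, j, cost, best
    n = Len(s1)
    m = Len(s2)

    ' The distance is at least the difference in length, so cap early.
    If Abs(n - m) >= limit Then
        OsaDistance = limit
        Exit Function
    End If

    ' d(i, j) = distance between the first i chars of s1 and the first j chars of s2.
    Dim d()
    ReDim d(n, m)
    For i = 0 To n
        d(i, 0) = i
    Next
    For j = 0 To m
        d(0, j) = j
    Next

    For i = 1 To n
        For j = 1 To m
            If Mid(s1, i, 1) = Mid(s2, j, 1) Then cost = 0 Else cost = 1

            best = d(i - 1, j) + 1                                               ' deletion
            If d(i, j - 1) + 1 < best Then best = d(i, j - 1) + 1                ' insertion
            If d(i - 1, j - 1) + cost < best Then best = d(i - 1, j - 1) + cost  ' match or substitution

            ' Adjacent swap, e.g. "nk" versus "kn".
            If i > 1 And j > 1 Then
                If Mid(s1, i, 1) = Mid(s2, j - 1, 1) And Mid(s1, i - 1, 1) = Mid(s2, j, 1) Then
                    If d(i - 2, j - 2) + 1 < best Then best = d(i - 2, j - 2) + 1
                End If
            End If

            d(i, j) = best
        Next
    Next

    If d(n, m) < limit Then
        OsaDistance = d(n, m)
    Else
        OsaDistance = limit
    End If
End Function

' "Monkey" and "Mokney" differ by one adjacent swap, so this should print 1.
WScript.Echo OsaDistance("Monkey", "Mokney", 3)
```

Like the earlier script, this runs under WSH (for example with cscript). Memory use grows with the product of the two string lengths, which is usually acceptable for track titles.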
SQL errors
Hi trixmoto,
thanks for a great script - I love the concept, and finding the dupes is working well with my collection, but I get SQL errors when trying to do the processing (removal of dupes).
This is what I get:
Any idea what the problem might be?
I'm using MM version 3.0.3.1140
Thanks.