Duplicate Report 3.0 [MM2+3]


Bex
Posts: 6316
Joined: Fri May 21, 2004 5:44 am
Location: Sweden

Post by Bex »

Kevinowpb,
The script hasn't yet been modified to work with MM3, hence the [MM2] in the thread title.
Advanced Duplicate Find & Fix Find More From Same - Custom Search. | Transfer PlayStat & Copy-Paste Tags/AlbumArt between any tracks.
Tagging Inconsistencies Do you think you have your tags in order? Think again...
Play History & Stats Node Like having your Last-FM account stored locally, but more advanced.
Case & Leading Zero Fixer Works on filenames too!

All My Scripts
trixmoto
Posts: 10024
Joined: Fri Aug 26, 2005 3:28 am
Location: Hull, UK

Post by trixmoto »

Updating this script to MM3 is on my todo list.
Download my scripts at my own MediaMonkey fansite.
All the code for my website and scripts is safely backed up immediately and for free using Dropbox.
Kevinowpb
Posts: 129
Joined: Sat Dec 22, 2007 10:18 am
Location: West Palm Beach

Post by Kevinowpb »

done yet? LOL
:)
trixmoto
Posts: 10024
Joined: Fri Aug 26, 2005 3:28 am
Location: Hull, UK

Post by trixmoto »

No, but it is still on my list! :)
sommo

Post by sommo »

Is there an MM3 version of this plugin? I would really like it :D

Thanks
trixmoto
Posts: 10024
Joined: Fri Aug 26, 2005 3:28 am
Location: Hull, UK

Post by trixmoto »

It's on my (incredibly long) list! :)
uwuerfel
Posts: 76
Joined: Tue Jan 08, 2008 3:29 pm
Location: Germany

Post by uwuerfel »

Hi Trixmoto,

I really like your script.
The idea to create a report with the possibility to do some actions on the duplicates is brilliant.

The only thing I was missing is some sort of fuzzy or approximate string search.

More often than not, I don't have exact duplicate names.
There are some differences like:
- spelling errors ("abracadabra" -> "abracababra")
or
- One title has more info than the other ("I Want You" -> "I Want You (Chica Cherry Cola)")

For this type of comparison there are metric functions that tell you the "distance" between two strings. The distance is a natural number that says, more or less, that you need n operations to transform string1 into string2. An operation is something like inserting, deleting, or exchanging a character.
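
To make the "number of operations" idea concrete, here is a minimal sketch of the plain Levenshtein distance, applied to the two example pairs above. It is written in Python rather than VBScript purely for illustration, and the function name `levenshtein` is a hypothetical one chosen here; it is not code from the Duplicate Report script.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes, and substitutions
    needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current
    return previous[-1]


# The spelling-error example: one substituted letter.
print(levenshtein("abracadabra", "abracababra"))  # 1
# The "more info" example: 20 characters appended to the shorter title.
print(levenshtein("I Want You", "I Want You (Chica Cherry Cola)"))  # 20
```

The second result shows why a small distance limit catches spelling errors but not titles with extra text appended: the appended text counts one operation per character.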

I searched the internet a little and found the following information:

http://en.wikipedia.org/wiki/Fuzzy_string_searching
http://en.wikipedia.org/wiki/Levenshtein_distance
http://en.wikipedia.org/wiki/Damerau-Le ... n_distance
http://en.wikipedia.org/wiki/Bitap_algorithm
http://en.wikibooks.org/wiki/Algorithm_ ... plications


I took the above VBA implementation and converted it into VBS.
It's a simple VBS script that runs in WSH.

' VB Script Document
option explicit

dim text1 : text1 = "abracadabra"
dim text2 : text2 = "abarcadabra"
dim text3 : text3 = "abarcababra"
dim text4 : text4 = "MozartSinatra"

wscript.echo  damerau_levenshtein( text1, text2, 3 )
wscript.echo  damerau_levenshtein( text2, text3, 3 )
wscript.echo  damerau_levenshtein( text1, text3, 3 )
wscript.echo  damerau_levenshtein( text1, text4, 20 )

Function damerau_levenshtein( s1, s2, limit )
    ReDim result(Len(s1), Len(s2))
    damerau_levenshtein = damerau_levenshtein_recurse( s1, s2, limit, result )
end function


Function damerau_levenshtein_recurse( s1, s2, limit, result )
'This function returns the Damerau-Levenshtein distance capped by the limit parameter.
'VBScript has no optional parameters, so a limit must always be passed.
'Usage : e.g. damerau_levenshtein("Thibault","Gorisse",100) with a limit larger than both strings to get the uncapped distance
' or damerau_levenshtein("correctly written words","corectly writen words",4) to identify similar spellings
                    
    Dim diagonal 
    Dim horizontal 
    Dim vertical 
    Dim swap 
    Dim final 
    
    
    'Start of the strings analysis
    If result(Len(s1), Len(s2)) < 1 Then
        If Abs(Len(s1) - Len(s2)) >= limit Then
            final = limit
        Else
            If Len(s1) = 0 Or Len(s2) = 0 Then
                'End of recursivity
                final = Len(s1) + Len(s2)
            Else
            
                'Core of levenshtein algorithm
                If Mid(s1, 1, 1) = Mid(s2, 1, 1) Then
                    final = damerau_levenshtein_recurse(Mid(s1, 2), Mid(s2, 2), limit, result)
                Else
                    
                    If Mid(s1, 1, 1) = Mid(s2, 2, 1) And Mid(s1, 2, 1) = Mid(s2, 1, 1) Then
                        'Damerau extension counting swapped letters
                        swap = damerau_levenshtein_recurse(Mid(s1, 3), Mid(s2, 3), limit - 1, result)
                        final = 1 + swap
                    Else
                        'The function minimum is implemented via the limit parameter.
                        'The diagonal search usually reaches the limit the quickest.
                        diagonal = damerau_levenshtein_recurse(Mid(s1, 2), Mid(s2, 2), limit - 1, result)
                        horizontal = damerau_levenshtein_recurse(Mid(s1, 2), s2, diagonal, result)
                        vertical = damerau_levenshtein_recurse(s1, Mid(s2, 2), horizontal, result)
                        final = 1 + vertical
                    End If
                End If
                
            End If
        End If
    Else
        'retrieve intermediate result
        final = result(Len(s1), Len(s2)) - 1
    End If
        
    'returns the distance capped by the limit
    If final < limit Then
        damerau_levenshtein_recurse = final
        'store intermediate result
        result(Len(s1), Len(s2)) = final + 1
    Else
        damerau_levenshtein_recurse = limit
    End If
    
End Function

At first I hoped that the comparison in your script was a function that I could easily swap for the one above, but then I discovered that you use dictionary objects...
So integrating the above function would mean a major change to your code structure, and I wasn't sure whether you would like it if I completely changed your script...

So, if you like my idea, maybe you could integrate the above function into your great script for the next release?

I would really love it :-)


CIAo, uwe..
Author of Radio-DJ
trixmoto
Posts: 10024
Joined: Fri Aug 26, 2005 3:28 am
Location: Hull, UK

Post by trixmoto »

Yeah, fuzzy matching is certainly something that I plan to invest some time in and probably add to a number of my scripts once I've got something working well. Thanks for doing this work for me, I'll certainly try to make good use of it. :)
chester
Posts: 26
Joined: Sun Jul 15, 2007 1:58 pm

Bug

Post by chester »

Hi Trixmoto

Really love this script! However, I get this bug when I try to flag a lot of files (500+).

Here are the errors:


Image

Image

By the way, I'm using MM3 and your version 2.1. Oh, and I've only just noticed that it isn't even supposed to work in MM3. Sorry!
trixmoto
Posts: 10024
Joined: Fri Aug 26, 2005 3:28 am
Location: Hull, UK

Post by trixmoto »

Yeah, it's not ready for MM3 yet, but it is on my list to update. :)
trixmoto
Posts: 10024
Joined: Fri Aug 26, 2005 3:28 am
Location: Hull, UK

Post by trixmoto »

New version (2.2) is now available to download from my website. Changes include...

- Made compatible with MM3
- Fixed commit errors when deleting dupes
- Added option to fuzzy match titles (thanks to Uwuerfel)

The fuzzy matching is based on the Damerau-Levenshtein distance algorithm. Please increase the "distance" value slowly as it will add a significant delay to the processing. There is a timeout though so it shouldn't ever jam the script.
nynaevelan
Posts: 5559
Joined: Wed Feb 07, 2007 11:07 pm
Location: New Jersey, USA

Post by nynaevelan »

Are you trying to fake me out?? :-? I get the following error when trying to download the script:

File does not exist. Make sure you specified correct file name.
Am I moving too fast??

Nyn
3.2x - Win7 Ultimate (Zen Touch 2 16 GB/Zen 8GB)
Link to Favorite Scripts/Skins

Join Dropbox, the online site to share your files
Bex
Posts: 6316
Joined: Fri May 21, 2004 5:44 am
Location: Sweden

Post by Bex »

I'm planning to steal the Damerau-Levenshtein distance algorithm from your script. :lol:
But it can't be downloaded from your site:
File does not exist. Make sure you specified correct file name.
trixmoto
Posts: 10024
Joined: Fri Aug 26, 2005 3:28 am
Location: Hull, UK

Post by trixmoto »

My apologies, I forgot to actually upload the files - a minor flaw! :oops:

@Bex - the algorithm works well if the two strings are actually similar (e.g. Monkey and Mokney), but if they are completely different and you use a distance above about 4 it just takes forever (literally hours in some cases) - that's why I put in the "CheckFuzzy" function and the timeout. If you find a way to improve it, please let me know! :)
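
One possible way around the long run times on dissimilar strings is to fill a table row by row instead of recursing, so the work is bounded by the product of the two string lengths. The sketch below is only an illustration under that assumption: it is written in Python rather than VBScript, the name `osa_distance` is hypothetical, and it is not the code used in the script or its "CheckFuzzy" function. It computes the optimal string alignment variant of the Damerau-Levenshtein distance (adjacent swaps count as one operation), capped at `limit` as in the function posted earlier in the thread.

```python
def osa_distance(s1: str, s2: str, limit: int) -> int:
    """Optimal string alignment distance between s1 and s2, capped at limit."""
    # The length difference is a lower bound on the distance.
    if abs(len(s1) - len(s2)) >= limit:
        return limit

    two_back = None                      # row i-2, needed for swaps
    previous = list(range(len(s2) + 1))  # row i-1
    for i in range(1, len(s1) + 1):
        current = [i] + [0] * len(s2)
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            best = min(
                previous[j] + 1,         # delete from s1
                current[j - 1] + 1,      # insert into s1
                previous[j - 1] + cost,  # substitute or match
            )
            # Damerau extension: two adjacent letters swapped.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                best = min(best, two_back[j - 2] + 1)
            current[j] = best
        # Once every cell in a row has reached the limit, no later
        # cell can drop below it, so stop early.
        if min(current) >= limit:
            return limit
        two_back, previous = previous, current
    return min(previous[-1], limit)


print(osa_distance("Monkey", "Mokney", 4))            # 1
print(osa_distance("abracadabra", "abarcababra", 3))  # 2
print(osa_distance("abracadabra", "MozartSinatra", 4))  # 4 (capped)
```

Because each row is computed once, completely different strings cost no more than similar ones of the same length, and the early exit usually ends the comparison after a few rows when the limit is small.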
tbok
Posts: 9
Joined: Tue Dec 05, 2006 9:37 am

SQL errors

Post by tbok »

Hi trixmoto,

thanks for a great script - I love the concept, and finding the dupes is working well with my collection, but I get SQL errors when trying to do the processing (removal of dupes).

This is what I get:

Image

Any idea what the problem might be? :(

I'm using MM version 3.0.3.1140

Thanks.