Duplicate record elimination


Re: Duplicate record elimination

by dtsig » Sun Jan 03, 2016 1:42 pm

indiramovilla, so that I understand: you are talking about removing tracks from *albums* that are also on other *albums*. Is this true? So you would delete 'Lady Jane' from the Rolling Stones' Flowers album because it is also on their Aftermath album and is the same (bit, etc.)?
Or do you actually have duplicates of say 'Lady Jane' on the Flowers album?

Re: Duplicate record elimination

by Peke » Sun Jan 03, 2016 1:24 pm

Nice thing to read, thank you for posting. It raises interesting questions about duplicate handling.

Personally, I think a fully automatic duplicate decision is bad, but a semi-automatic one can be achieved with scripting.

The Advanced Duplicate Find and Fix script is one such option.

Duplicate record elimination

by indiramovilla » Sun Jan 03, 2016 1:21 pm

As a user of MediaMonkey for the last 8 years, and a database developer by profession, I have been concerned with the issue of duplicate record elimination for many years now.

Some, perhaps, may discount the importance of that issue, because to them the only cost is some waste in disk space, and that cost is insignificant, these days.

However, for me the issue is different. I needed a utility program that would allow me to identify duplicates, and from the set of those records to automatically choose the one with the highest quality, or bitrate, or some other criterion.

To that effect, I researched what programs were available, and about a year ago I chose one called Similarity. After a year’s use, I am very satisfied with the program, and for that reason I wanted to share my experience with other MM users via this forum.

Like MediaMonkey, Similarity is very efficient with large databases and all file types. My own database has 130,000 records, and Similarity can complete a re-scan that includes 10,000 new records in under 3 hours. For those with much larger databases, one of Similarity's developers told me that the forthcoming new version will be able to efficiently handle databases of over a million records.

After having located the duplicate groups, Similarity analyzes those files to determine the quality rating of each recording. It then chooses automatically from each duplicate group which file to keep and which to delete based on predetermined criteria and priorities. Obviously, the quality of each recording is the most important criterion, with the possible exception of old, historical recordings. Other factors include bitrate, sampling rate, size, length, and location (incoming vs. permanent location of database).
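For anyone who wants to script something similar, here is a rough sketch of that kind of "pick the best file per duplicate group" rule. It is only an illustration, not how Similarity actually works: the `Track` fields, the 0..1 `quality` score, and the priority order in `rank_key` are all assumptions made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Track:
    # All fields are hypothetical stand-ins for whatever your tool reports.
    path: str
    quality: float     # assumed 0..1 quality score, higher is better
    bitrate: int       # kbps
    sample_rate: int   # Hz
    size: int          # bytes
    in_library: bool   # True = permanent location, False = incoming folder

def rank_key(track):
    # Assumed priority order: quality first, then bitrate, sampling rate,
    # permanent location over incoming, and finally file size.
    return (track.quality, track.bitrate, track.sample_rate,
            track.in_library, track.size)

def choose_keeper(group):
    """Return (file to keep, files to delete) for one duplicate group."""
    keeper = max(group, key=rank_key)
    to_delete = [t for t in group if t is not keeper]
    return keeper, to_delete
```

Changing the order of the fields in `rank_key` changes the priorities, for example putting `in_library` first if you always want to keep the copy that is already in the permanent location.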

All in all, with the help of Similarity I have brought my music database into optimal shape, and every time I add new recordings, I let Similarity do the thinking and deciding about which records to keep and which to delete. So, when I am listening to music, I am sure to be listening to the best available recording of each particular piece.

Naturally, if the same capability can be achieved with MMW, I'm open to learning how.