Topic: Weighted Top Album Rating - Expression Help (Read 5495 times)

vagskal · « **on:** December 20, 2012, 12:58:23 pm »

I have recently added information from several all time top albums lists (see http://www.besteveralbums.com/index.php for one source for this information). Now I am wondering how I can get a weighted value to show with reasonable accuracy how well an album is doing on those lists. The problem is that the lists are of various length, containing the top 50, 75, 100, 200, 500 and 1,000 top albums ever with a numbered rank for each album, i.e. no. 3 of 50 or no. 763 of 1,000, so I would need some kind of weighted value that would ideally take into account also if an album is included in the unnumbered list of 1,001 albums to hear before you die. If the 1-5 star rating in half star increments by All Music Guide (Rovi) could be taken into account as well, it would be great.

I am not good at math or statistics and could not figure it out by myself so I thought I would asked all the smart people here for any ideas how to create a reasonable value for the relative/weighted popularity of an album. Feeling up to the challenge, MrC?

If an album is in a numbered list, the number on the list for that album is entered in an integer type field, and the AMG 1-5 star rating is entered in a decimal type field (0.5-5). If the album is in the unnumbered list of 1,001 albums to hear before you die, 1 is entered in a particular integer type field in MC.

MrC · « **Reply #1 on:** December 20, 2012, 02:16:32 pm »

When you have various population sizes, you need to normalize the pools. You can either scale all values to the largest pool size (to work in integers), or unitize by dividing each value by their respective pool size (to work in fractions between 0 and 1). Then, you can add each sample's ranks and divide by the number of pools to get a overall weighted rank.

You'll have to decide what bias to apply to the 1001 albums, probably as a percentage, and include that in the calculation.

I'm not certain about this: "how well an album is doing on those lists.". Are you asking to compare the overall weighted rank against each original list again (i.e. to see how your own biased rank relates to the original rank)?

More info:

http://en.wikipedia.org/wiki/Weighted_mean

vagskal · « **Reply #2 on:** December 20, 2012, 06:53:44 pm »

Thanks for the reply!

I found that wiki page while googling before posting but did not understand what to do in my case.

I am not trying to weigh anything against my own preferences/ratings, just use the ratings in those list. Is there a reasonable solution if we forget about the 1,001 albums unnumbered list and the .5-5 AMG listing for the time being? Presume that there is no preferred list of albums, i.e. the numbered list with 50, 75, 100 etc items have equal weight so that no. 1 of a list with 50 items has equal weight as no. 1 of a list with 1,000 items, but no. 49 of the first list has more weight than no. 999 of the second list.

MrC · « **Reply #3 on:** December 20, 2012, 10:06:35 pm »

For this discussion, I'll assume that larger values are better, so we're talking about ratings vs. rankings.

What I meant by your own preference, was:

  - what weight you'd assign to items for the non-rated 1001 list (sounds like we can ignore this for now)

  - how do you evaluate items on some lists, but not others - are they absent because they weren't rated, or because they didn't make the cut? These could be forced to the min values, or just ignored in the overall computation. Either way, the basic formula per track would be:

  (value_L1 / size_L1) + (value_L2 / size_L2) + ... + (value_Ln / size_Ln)

divided by

   the number of values

To deal with missing values, they could be set to 0.

Since the list sizes are different, a point of correction: a 1 from one list will not have equal weight with a 1 from another list of different size. They each have a normalized value of 1 / list max value.

The above formulas assume that each list contributes 1/n'th portion. But you could add a weighting factor to each list: list_1 might be a heavyweight, contributing 50% to the total, the others 25% each. In that case, you'd multiple the normalized weighting factor against each summand above.

Example lists, sizes and values:

Size 1 - 5 1 - 100 1 - 300
Value 2.5 20 90

Normalized values:

  (2.5 / 5) + (20 / 100) + (90 / 300)
   .50 + .20 + .30

Unweighted mean (each value contributes 1/list size):

(.50 * .33) + (.20 * .33) + (.30 * .33)
   .165 + .066    .1
.33

So on a 1 - 10 scale, this puts .33 at 3.3.

For our Weighted mean (with our example 50% : 25% : 25% weighting ratio), each value's contribution is biased:

   (.50 * .5) + (.20 * .25) + (.30 * .25)
   .25 + .05 + .075
   .375

or 3.75 on that 1 - 10 scale.

If you wanted to add in your 1001 list, with some bias, that list really is a list of max size 1 (with 0 or 1 values), and you could then use the above weighted mean formula, with say a weighting factor of maybe 10%, a slight bias.

vagskal · « **Reply #4 on:** December 21, 2012, 04:09:36 am »

Thanks!

Quote from: MrC on December 20, 2012, 10:06:35 pm

For this discussion, I'll assume that larger values are better

It is actually the other way around, album no. 1 is the very best album. Since you stated this I assume a fix is not as easy as sorting the normalized mean values descending instead of ascending.

If an album does not have a value for a list it did not make that list, so I think a 0 value for that list is fine when dividing with the total number of lists.

I think the normalized values will be enough for the numbered lists with no need to weigh those lists against each other. Would it be possible to use these values and somehow add a value for the AMG 0.5-5 rating heavily downplayed and a value if the album is on the 1,001 albums list?

MrC · « **Reply #5 on:** December 21, 2012, 05:41:56 pm »

The arbitrary human value of a rating isn't important for doing the math, only that all lists agree on the same ordering. I mentioned it because I was thinking that large-valued AMG ratings were good, whereas top N lists are typically the converse.

For the AMG ratings - what is the minimal increment interval? Is it .5, yielding 10 values? If an arbitrary decimal, it will have to be limited to some number of decimal places so that the value can be scaled to a finite, max integer. Once scaled, then the weighted rating just falls in line with the above formula.

vagskal · « **Reply #6 on:** December 22, 2012, 04:22:10 am »

Thanks!

I am still struggling with this. I did state at the beginning that I am not good at math.

This is how far I have come:
FormatNumber(Math((((FormatNumber([Esq. 75 INTERNT])/75)+(FormatNumber([Guardian 100 INTERNT])/100)+(FormatNumber([Melody Maker 100 INTERNT])/100)+(FormatNumber([Mojo 100 INTERNT])/100)+(FormatNumber([NARM 200 INTERNT])/200)+(FormatNumber([NME 100 INTERNT])/100)+(FormatNumber([Q 100 INTERNT])/100)+(FormatNumber([RSM INTERNT])/500)+(FormatNumber([VH1 100 INTERNT])/100)+(FormatNumber([Virgin INTERNT])/1000))/10+((650/1000)*FormatNumber([1001 Albums])/10))*1000),2)

As you can see I have put the same weight to an album in the unnumbered 1,001 albums list as if that album was no. 650 on a list with 1,000 entries.

I must reverse the order for the numbered lists (no. 1 is the very best) otherwise the normalized mean value will not be correct - The Beach Boys' Pet Sound which is usually on top of the lists is way down the list with normalized means, for example. It was not as easy as doing (101-[Some Numbered 100 List])/100, because the field can be empty. Do I have to enter repeated tests for empty fields or is there an easier way?

As for the AMG ratings they are between 1 and 5 in .5 increments, i.e. 9 different values (at least I do not have a .5 star album). 5 is the very best.

MrC · « **Reply #7 on:** December 22, 2012, 07:22:50 pm »

First, I've corrected some egregious typos and errors above. Let's ignore those and I'll restate how to proceed here, showing examples.

But let's simplify.

- Instead of inverting many lists, let's just invert the AMG list.
- Convert all lists to range from 0 to n, where a 0 value is worst, n is best
- Use the max value of a list as our normalizing scaling factor
- Use the weighted mean formula, setting the scale factors appropriately
- I've use simple list names - change to your own as necessary.

AMG now ranges from 0 to 5, at .5 increments, scaled as 0 to 10 with an increment of 1. Inverting, means a raw score of .5 (scaled 1) value will be a 4.5 (scaled 9) value.

Using the original example scores from above plus another sample, attached Example 1 shows the raw scores, AMG scaled and inverted scores, normalized scores, and both unweighted and weighted means.

We'll simply use the weighted score formula, setting the weights to be equal. Then, you can just change the weights if you want later, easily enough.

The formula for the weighted scores is then simple:

For all n's, the sum of all (weighted_scale_factor_n * normalized score_n)

For AMG, the normalized scores are calculated as:

   (max scaled AMG - scaled AMG) / max AMG value
   = (10 - ([AMG] * 2)) / 10

All other normalized scores are calculated as:

   list score / max list value

In MC, here's the calculation:

math(
   (.33 * ((10 - ([_AMG] * 2)) // 10)) +
   (.33 * ([_L2] // 100 )) +
   (.33 * ([_L3] // 300 ))
   )

Now, let's include the 1001 list. It is simply a list of 0 or 1 values. You wanted to set the default value as 650/100, so that's equivalent to .65 normalized. And now that we have 4 lists, and each list's score contributes equally in weight (25%), we can just scale .65 by 25% to get a weight factor of .1625 for the 1001 list. If the 1001 list has a weight of .1625, all the other lists must contribute (1 - .1625) / 3 or 0.2791666667.

Example 2 shows some sample calculated values.

So the new formula for our 4 lists is now:

math(
   (.2791666667 * ((10 - (formatnumber([_AMG],2) * 2)) // 10)) +
   (.2791666667 * (formatnumber([_L2]) // 100 )) +
   (.2791666667 * (formatnumber([_L3]) // 300 )) +
   (.1625 * formatnumber([_L4]))
   )

You can use calculated values for the weights if you want, rather than the hardcoded values I used - I wanted to simplify how the expression appear.

If you want to know how your new calculation compares to any given list, multiple it by the list size. For AMG, you'll have to multiply, invert, and then scale by .5.

vagskal · « **Reply #8 on:** December 23, 2012, 03:20:40 am »

Thanks, MrC! You have put a lot of work into this, as usual. I am grateful.

But I am still struggling with just the numbered lists. The overall (compound) value does not seem right to me. I thought it had something to do with 0 being the best overall value when the lists are not inverted. The album Pet Sounds appears on all numbered lists with low values (indicating it is a very good album), but an album that appears on only one list as number 270 of 500 of course gets a better/lower overall rating if the numbered lists are not inverted.

I have tried to illustrate the issue in the attached image (using comma as the decimal point).

I want to invert the numbered lists also for other reasons (I want to use the overall ratings in thumbnail text and in groupings and there it is easier to see and understand if a high value means a highly rated album, and I want to sort by the overall rating also in views where there are albums that do not appear on any list without changing the global Sort empty strings last setting).

Is there an easy way to invert the numbered lists without having to enter repeated tests for empty fields in the already large expression?

As you can see in the attached image I, for the inverted lists, added 1 to the max number - ((Max no + 1) - Position) / Max no - since I thought even the worst album on a list should have some credit for just entering the list. Is that correct math?

Thanks again!

vagskal · « **Reply #9 on:** December 23, 2012, 06:20:26 am »

OK, I think I have got it now. I had to test for empty fields for every list since I could not find an easier way:

FormatNumber(Math(((If(IsEmpty([Esq. 75 INTERNT]),0,/(76-[Esq. 75 INTERNT]/)//75)+If(IsEmpty([Guardian 100 INTERNT]),0,/(101-[Guardian 100 INTERNT]/)//100)+If(IsEmpty([Melody Maker 100 INTERNT]),0,/(101-[Melody Maker 100 INTERNT]/)//100)+If(IsEmpty([Mojo 100 INTERNT]),0,/(101-[Mojo 100 INTERNT]/)//100)+If(IsEmpty([NARM 200 INTERNT]),0,/(201-[NARM 200 INTERNT]/)//200)+If(IsEmpty([NME 100 INTERNT]),0,/(101-[NME 100 INTERNT]/)//100)+If(IsEmpty([Q 100 INTERNT]),0,/(101-[Q 100 INTERNT]/)//100)+If(IsEmpty([RSM INTERNT]),0,/(501-[RSM INTERNT]/)//500)+If(IsEmpty([VH1 100 INTERNT]),0,/(101-[VH1 100 INTERNT]/)//100)+If(IsEmpty([Virgin INTERNT]),0,/(1001-[Virgin INTERNT]/)//1000)+((1001-650)/1000)*FormatNumber([1001 Albums]))/10)*1000),2)

It would sure be easier to debug if the math() function could return Expression Error instead of 0 when something is wrong.

And here are the overall top albums (see the image). The first column contains the compound ranking and the second column shows on how many lists the album appears.

Thank you for your help, MrC!

MrC · « **Reply #10 on:** December 24, 2012, 05:27:25 am »

You can use formatnumber() on an empty value to give you zero. This avoids your If(isempty()) test.

No need to escape parens inside Math().

A trick on Math() debugging. Make the function name Math bogus, like Xath, and then the expression column will show you MC's substitutions. It makes at least reading the formula possible. This helps catch forgotten escaped forward slashes and missing values due to empty field replacements.

The problem you describe in the previous post, about outlier sample scores, or missing sample scores, usually requires probability functions or techniques such as tossing out the high and low scores, and taking what remains, or more complex techniques. Some require the data set, so you're not easily going to do that in MC (w/out global vars anyway).

In your case, since the lists are incongruous wrt. time on a list, rating periods, etc., you'll just have to make your own judgement calls as to how you want to weight or score such items.

For your needs, inverting a score x in simple linear range from min to max is done by using the formula: (max - x) + min.

vagskal · « **Reply #11 on:** December 24, 2012, 08:57:17 am »

Thanks for the reply!

Quote from: MrC on December 24, 2012, 05:27:25 am

You can use formatnumber() on an empty value to give you zero. This avoids your If(isempty()) test.

I found that, with FormatNumber() giving me 0 for empty fields, the IsEmpty() tests were necessary for this bit: (76-FormatNumber([Esq. 75 INTERNT]))/75, i.e. to avoid having (76-0)/75 calculated; omitting the FormatNumber() resulted in giving me 0 for not just that test inside the Math() function but for the whole Math() function.

Quote from: MrC on December 24, 2012, 05:27:25 am

No need to escape parens inside Math().

I found that it was necessary when the Math() function included If() functions with parens and / inside.

Quote

A trick on Math() debugging. Make the function name Math bogus, like Xath, and then the expression column will show you MC's substitutions. It makes at least reading the formula possible. This helps catch forgotten escaped forward slashes and missing values due to empty field replacements.

Thanks, I did not know that.

(I found that forward slashes do not have to be escaped inside the math() function, unless they are there inside another function like an if() function.)

Quote from: MrC on December 24, 2012, 05:27:25 am

The problem you describe in the previous post, about outlier sample scores, or missing sample scores, usually requires probability functions or techniques such as tossing out the high and low scores, and taking what remains, or more complex techniques. Some require the data set, so you're not easily going to do that in MC (w/out global vars anyway).

In your case, since the lists are incongruous wrt. time on a list, rating periods, etc., you'll just have to make your own judgement calls as to how you want to weight or score such items.

For your needs, inverting a score x in simple linear range from min to max is done by using the formula: (max - x) + min.

Thanks. I think what I have now is good enough for my purposes.

Merry Christmas!

PS. Now I was thinking of how I could add up the compound album scores for an album artist to have a ranking of album artists. I guess I will have to look at other threads for ideas for creative use of global variables.

INTERACT FORUM

Author Topic: Weighted Top Album Rating - Expression Help (Read 5495 times)

vagskal

Weighted Top Album Rating - Expression Help

MrC

Re: Weighted Top Album Rating - Expression Help

vagskal

Re: Weighted Top Album Rating - Expression Help

MrC

Re: Weighted Top Album Rating - Expression Help

vagskal

Re: Weighted Top Album Rating - Expression Help

MrC

Re: Weighted Top Album Rating - Expression Help

vagskal

Re: Weighted Top Album Rating - Expression Help

MrC

Re: Weighted Top Album Rating - Expression Help

vagskal

Re: Weighted Top Album Rating - Expression Help

vagskal

Re: Weighted Top Album Rating - Expression Help

MrC

Re: Weighted Top Album Rating - Expression Help

vagskal

Re: Weighted Top Album Rating - Expression Help