Please Implement This: Intrinsic Performance Ratings

Flourish

Would it be possible to give an "intrinsic performance rating" (IPR) on the Elo scale on a per-game basis?

This paper outlines the methodology behind calculating an intrinsic performance rating (http://www.cse.buffalo.edu/~regan/papers/pdf/ReHa11c.pdf).

Because we already have an engine analyzing each move with the analysis button, it should be trivially easy to also calculate the IPR from that analysis.

Couple of advantages:
1) It would give a good estimate of one's "true skill", whereas a site rating is biased in one way or another by the quality of the chess pool on that site.

2) It would help quickly identify cheaters. People with an IPR of, say, 3000, should be flagged for cheating.

Hellball

From reading a couple of paragraphs, it appears the paper is built around the Elo system.

Lichess uses Glicko-2. I'm not sure how 'portable' the metric you describe is.

Toadofsky

I'm very curious whether IPR can be applied to lichess tactics puzzles to differentiate between good and bad moves. In my spare time I'm seeing whether my idea would be feasible...

Flourish

I'm quite confident Elo and Glicko can be converted to each other quite easily.

NorwegianHobbyplayer

Cool idea!

Toadofsky

Flourish, that paper references Regan's personal web site, stating that the code and methodology are published there.

I cannot find the code or methodology. How do you believe this change to be trivial? (What am I overlooking here?)

static_shadow

How could Elo and Glicko-2 be converted quite easily? Glicko-2 accounts for rating volatility, so the amount one gains or loses in any given match depends on the volatility of each person's rating. It isn't a simple matter of just subtracting a certain number of points; it factors in rating confidence (the original Glicko system applied a Rating Deviation to the end of a rating to describe confidence in the rating, whereas Glicko-2 factors this confidence into the actual rating calculations themselves). Because of the volatility calculation, I'm not sure how you could easily convert said ratings. They aren't compatible with each other in the slightest.
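To make the comparison concrete, here is a minimal sketch of the expected-score formulas for Elo and for the original Glicko system, as I understand them from the published descriptions. The function names are my own, and this is not lichess code. It only shows where the rating deviation enters the calculation; it does not settle whether an IPR computed on the Elo scale carries over to a Glicko-2 pool.

```python
import math

# Glicko scaling constant, q = ln(10) / 400.
Q = math.log(10) / 400


def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under Elo."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def g(rd: float) -> float:
    """Glicko attenuation factor: shrinks toward 0 as the opponent's RD grows."""
    return 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)


def glicko_expected(r_a: float, r_b: float, rd_b: float) -> float:
    """Expected score of A against B under original Glicko, given B's RD."""
    return 1 / (1 + 10 ** (-g(rd_b) * (r_a - r_b) / 400))


if __name__ == "__main__":
    print(elo_expected(1600, 1500))
    print(glicko_expected(1600, 1500, 0))    # RD of 0: same curve as Elo
    print(glicko_expected(1600, 1500, 350))  # uncertain opponent: pulled toward 0.5
```

With an RD of 0 the Glicko curve coincides with the Elo curve; the larger the opponent's RD, the closer the expected score moves to 0.5. Glicko-2's volatility term affects how ratings are updated after games and is not modeled here.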

There are two other issues I can think of. First of all, not every game is analyzed by the engine; a game is only analyzed when someone requests the analysis. Hundreds if not thousands of games every day are left without engine analysis. The other problem is that the engine analysis is limited, and therefore at times inaccurate. The same engine running independently on someone's computer at a depth of, say, 15 might produce more accurate results. While at lower levels this isn't very important, at higher levels it might falsely flag cheaters. There might also be a huge blowback from top players being told their "true" rating is far below what their rating against the current pool of players indicates, and that weaker players' "true rating" is actually substantially higher, simply because of inaccurate analysis.

Any ideas on how those 2 issues could be accounted for?

Shipwreck

I think it would be great! I don't see the need to combine this with ratings though. Just leave the ratings of the site as they are; this is better suited as an addition to the *average centipawn loss* stat. Of course it would be great to have the overall average across games as well.

static_shadow

OK, now that's something I think is interesting, Shipwreck. It would be neat to track average centipawn loss over all of someone's games. But again there is the pesky problem I mentioned: not all games are analyzed by the engine. For something like that to be accurate, the system would have to force analysis at the end of every game, and that would strain resources quite a bit.
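For anyone wondering what the per-game number would involve, here is a rough sketch of average centipawn loss computed from a list of engine evaluations. Everything in it is an assumption on my part: the function name, the input layout (evaluations in centipawns from White's point of view, one for the starting position and one after each ply), and the cap of 1000 centipawns. It is not how lichess actually computes the stat.

```python
from typing import List, Tuple


def average_centipawn_loss(evals: List[int], cap: int = 1000) -> Tuple[float, float]:
    """Return (white_acpl, black_acpl) for one analyzed game.

    evals[0] is the evaluation before the first move and evals[i] is the
    evaluation after ply i, always from White's point of view.
    """
    # Clamp extreme evaluations so one blunder in a won game does not dominate.
    clamped = [max(-cap, min(cap, e)) for e in evals]

    white_losses: List[int] = []
    black_losses: List[int] = []
    for ply in range(1, len(clamped)):
        before, after = clamped[ply - 1], clamped[ply]
        if ply % 2 == 1:
            # White just moved: a drop in the evaluation is White's loss.
            white_losses.append(max(0, before - after))
        else:
            # Black just moved: a rise in the evaluation is Black's loss.
            black_losses.append(max(0, after - before))

    def mean(values: List[int]) -> float:
        return sum(values) / len(values) if values else 0.0

    return mean(white_losses), mean(black_losses)


if __name__ == "__main__":
    print(average_centipawn_loss([20, 10, 60, 60, 30]))
```

An overall figure across games would only cover the games that have actually been analyzed, which is exactly the coverage problem described above.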

yanez

#10
