lichess.org

How Elo Ratings Actually Work

@CheckRaiseMate This was a great post. I have ratings turned off. The biggest issue for me isn’t losing rating points (I do play quite a few casual games); it’s just losing. Losing to a much better player is expected and therefore OK, but when I play like a fool, it is unbearable even though there’s nothing at stake. Crazy, I know.
I mostly agree with the author, but with two reservations. Another user (Jezzat) has already mentioned one of them in #3. Having a peak blitz rating above 2800 is not very relevant when one is currently rated around 2675, and even less relevant when we recall a few players of dubious reputation (if not largely unknown) who reached incredibly high ratings by playing their friends. (There were such cases.)
As for the Lichess ratings, in many cases they provide the only information I have about the opponent at the start of the game. Say someone's rapid rating is 300 points higher than their blitz rating, which is in turn 300 points higher than their bullet rating: then I am more likely to play the Berlin with Black, pay more attention to the clock and not rematch. (Many such players are just older or slower for various legitimate reasons, but I have seen too many closed accounts that looked like that. As for the Berlin, it is part of my regular repertoire, but also the only opening where I have chances against Stockfish.) When playing blitz without increment, it might be good to know the opponent's bullet rating, at least to understand who is likely to do better when both sides drop to the last 30 seconds. A look at the opponent's ratings also helps me decide whether to berserk or not. (That said, I rarely do that, as I do not like bullet.) Opponents' titles also give me some information. I almost never play just for flagging against GMs, WGMs or IMs, but might occasionally do so when the opponent has no title or hides it. Ratings or titles can be misleading in many ways, but when you have no other information about the opponent and like to vary your play depending on whom you are playing, they might provide at least some vague information, which is still better than nothing if you know how to handle it.
My question is: how do you calculate the winning, losing and drawing odds based on rating?

Let's say I am 1500 Elo and my opponent is 1800 Elo. How do I calculate the odds for the game?

Thanks!

As you can see, my username is based on ELO, because I love the system, but I am a beginner and don't understand it.
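A minimal sketch of the standard logistic Elo formula, assuming the conventional 400-point scale: the expected score of A against B is 1 / (1 + 10^((R_B - R_A) / 400)). Note that this gives an *expected score*, not separate win, draw and loss probabilities: a draw counts as half a point, so the same expected score can come from many different mixes of wins and draws. Splitting it into three odds needs an extra draw model that plain Elo does not provide. Lichess itself uses Glicko-2, so its numbers will not map exactly onto this formula.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the logistic
    Elo formula with the conventional 400-point scale (an assumption)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    me, opponent = 1500, 1800
    e = expected_score(me, opponent)
    # The two players' expected scores always add up to 1.
    print(f"Expected score for {me} vs {opponent}: {e:.3f}")
    print(f"Expected score for {opponent} vs {me}: {1 - e:.3f}")
```

For 1500 vs 1800 this gives roughly 0.151, i.e. about 15 points per 100 games, whether those come from wins, draws, or both.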
Thanks for the article! I've been on the lookout for something like this for a long time.

The article raises some questions:

1) Is there any rationale for adopting the logistic curve to predict game results (other than the fact that it has the right shape)? For instance, to take up your War analogy (the card game), would a game of War between two players with decks of unequal quality (how to quantify?) produce the expected win prediction?

2) I'm guessing that the K factor used to update ratings will decrease as the number of rated games goes up. How is K calculated?

3) Do we have numbers showing the accuracy of predicted performance based on the Elo rating system?

4) Have investigations been made to determine whether or not alternative rating systems have superior predictive power to Elo?

Thanks again
@RealDavidNavara Honored that you took the time to comment! I'm a little surprised though that you take so much trouble to maximize your online blitz results. I tend to see online blitz as mostly practice. Is there any reward for having a higher Lichess rating?
@Jane_Blond said in #14:
> Thanks for the article! I've been on the lookout for something like this for a long time.
>
> The article raises some questions:
>
> 1) Is there any rationale for adopting the logistic curve to predict game results (other than the fact that it has the right shape)? For instance, to take up your War analogy (the card game), would a game of War between two players with decks of unequal quality (how to quantify?) produce the expected win prediction?

Elo gave some justifications, but I'm not sure how much sense they really make. It seems to make little difference what kind of curve you use, as long as the underlying distribution of performances is sort of bell-shaped.
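To illustrate how little the choice of curve matters, here is a small comparison sketch. It assumes a normal model in which each player's single-game performance has a standard deviation of 200 points (a commonly cited choice; treat it as an assumption), so the performance difference has SD 200·√2 and the expected score is Φ(d / (200·√2)). That is compared with the logistic 400-point curve.

```python
import math


def logistic_expected(diff: float) -> float:
    """Expected score for a rating advantage `diff` under the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def normal_expected(diff: float, sigma: float = 200.0) -> float:
    """Expected score if each player's performance is normal with SD `sigma`
    (assumed 200). The difference then has SD sigma * sqrt(2), and
    Phi(diff / (sigma * sqrt(2))) simplifies to 0.5 * (1 + erf(diff / (2 * sigma)))."""
    return 0.5 * (1.0 + math.erf(diff / (2.0 * sigma)))


if __name__ == "__main__":
    for diff in (0, 100, 200, 400, 800):
        print(f"{diff:4d}  logistic={logistic_expected(diff):.3f}  "
              f"normal={normal_expected(diff):.3f}")
```

Under these assumptions the two curves stay within a couple of percentage points of each other everywhere; the logistic curve gives the underdog slightly fatter tails.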

>
> 2) I'm guessing that the K factor used to update ratings will decrease as the number of rated games goes up. How is K calculated?
>

Different federations use different rules. Most do decrease the K-factor in some way with more games or a higher rating.
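A minimal sketch of how K enters the update, R' = R + K·(S − E). The `k_factor` schedule below is hypothetical: its thresholds are loosely modeled on FIDE-style rules (a high K for newcomers, a lower K for established players, the lowest K above 2400) but simplified, omitting details such as age-based rules. Check your own federation's regulations for the real numbers.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Logistic Elo expected score (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def k_factor(rating: float, games_played: int) -> int:
    """Hypothetical K-factor schedule; thresholds are illustrative only."""
    if games_played < 30:   # newcomer: ratings move fast
        return 40
    if rating < 2400:       # established player
        return 20
    return 10               # strong player: ratings move slowly


def update_rating(rating: float, opp_rating: float, score: float, k: int) -> float:
    """New rating after one game: R' = R + K * (S - E), with S in {0, 0.5, 1}."""
    return rating + k * (score - expected_score(rating, opp_rating))


if __name__ == "__main__":
    k = k_factor(1500, games_played=100)
    print(update_rating(1500, 1800, 1.0, k))  # upset win: large gain
    print(update_rating(1500, 1800, 0.0, k))  # expected loss: small drop
```

With K = 20, a 1500 beating an 1800 gains about 17 points, while losing costs only about 3, because the loss was already mostly "priced in".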

> 3) Do we have numbers showing the accuracy of predicted performance based on the Elo rating system?
>
> 4) Have investigations been made to determine whether or not alternative rating systems have superior predictive power to Elo?
>
> Thanks again

Yes, there have been some studies. It seems to work pretty well, but there are some quirks. I've heard that the lower-rated player slightly outperforms the expectation the system gives them. The Kaggle competition linked in the article has many attempts at improving the system.
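For anyone wanting to check this on their own game records, here is a sketch of two simple checks. The Brier score (mean squared error between actual and expected score) is a generic forecast-accuracy measure swapped in here for illustration, not necessarily what those studies used; `underdog_residual` directly tests the "lower-rated player outperforms" claim. No data is included; you would feed in your own `(rating_a, rating_b, score_of_a)` tuples.

```python
from typing import Sequence, Tuple

# (rating_a, rating_b, score_of_a), with score in {0, 0.5, 1}
Game = Tuple[float, float, float]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expected score (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def brier_score(games: Sequence[Game]) -> float:
    """Mean squared error between actual and Elo-expected scores (lower is better)."""
    errors = [(s - expected_score(ra, rb)) ** 2 for ra, rb, s in games]
    return sum(errors) / len(errors)


def underdog_residual(games: Sequence[Game]) -> float:
    """Average (actual - expected) score of the lower-rated player.
    A positive value means underdogs outperform their Elo expectation."""
    residuals = []
    for ra, rb, s in games:
        if ra == rb:
            continue  # equal ratings: no underdog
        if ra < rb:
            residuals.append(s - expected_score(ra, rb))
        else:
            residuals.append((1.0 - s) - expected_score(rb, ra))
    if not residuals:
        raise ValueError("no games with a rating difference")
    return sum(residuals) / len(residuals)
```

A consistently positive `underdog_residual` over a large sample would support the quirk mentioned above; a single game, of course, tells you nothing.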
Did someone say "accuracy"? What accuracy, with respect to what frame of reference?

Sorry, but even engine play (and maybe engine play foremost) has no measurement system that could tell us whether Elo reflects chess accuracy (here meaning: if we had the full legal game tree at our fingertips and could compute all of it backward to the position of interest, a kind of rigorous accuracy relative to perfect play).

It is purely an "averaging" statistic based on large populations of games and players in a given pool, with some random pairing scheme (and I would argue the less tiered the better; I am for well-stirred tanks in general, my motto... kidding). You could be playing hacky sack in pairings (if that were a competitive game).

But I guess some have tried to link Elo to something tangible in chess land. The ELOMETER website might give those asking a taste. @toscani, in some recent thread, also mentioned another one. But in ELOMETER's case, as usual, output is only given for individual queries; the population data is not part of it (not publicly output), as far as I know. The site has been around for a while and keeps accumulating interesting population data, but the market is individuals wanting their own measure. Since anonymity on the internet is possible, and so is anonymization of population data, I do not understand how such data can be left dormant for so long. I hope I am mistaken.

It would be a good place to start answering such questions about the association between an averaging statistic such as ratings based on a competitive pairing pool (the words matter here) and well-defined chess challenges that can be measured and reproduced: a representation of a chess skill set via a well-chosen set of positions. (But again, how is success in responding to the challenge measured?)

The typical approach is to hide an engine very deep inside some construction of move-quality measures and use it as the reference. But then, see above.
Am I correct in believing that Lichess doesn't use the Elo system to determine rating points?
@RadRuss said in #18:
> Am I correct in believing that Lichess doesn't use the Elo system to determine rating points?

Glicko-2. The Lichess FAQ has an article on the rating system, which links to more detailed explanations too.
It is more adaptive to the dynamics of pairings in a general pool of online players. It couples the rating estimate with a model of the rating deviation (RD), which depends on the time between game events. (Updates are normally per game event, but there are also performance ratings for tournaments... though I might be wrong about that part.)
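A minimal sketch of one Glicko-2 rating period, following the steps in Mark Glickman's public description of the system: each player has a rating, a rating deviation (RD) and a volatility. Results shift the rating, the RD shrinks as games are played, and the RD grows when a player is inactive. The system constant `tau = 0.5` and the convergence tolerance are assumptions; Lichess's own implementation and parameters may differ.

```python
import math
from typing import List, Tuple

SCALE = 173.7178  # Glicko-2 internal scale factor from Glickman's description


def _g(phi: float) -> float:
    """Down-weights results against opponents whose rating is uncertain."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def _e(mu: float, mu_j: float, phi_j: float) -> float:
    """Expected score against opponent j on the internal scale."""
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating: float, rd: float, vol: float,
                   results: List[Tuple[float, float, float]],
                   tau: float = 0.5, eps: float = 1e-6):
    """One rating period. results: (opp_rating, opp_rd, score) tuples.
    Returns (new_rating, new_rd, new_vol)."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No games this period: only the deviation grows.
        return rating, SCALE * math.sqrt(phi ** 2 + vol ** 2), vol

    v_inv, delta_sum = 0.0, 0.0
    for r_j, rd_j, s in results:
        mu_j, phi_j = (r_j - 1500.0) / SCALE, rd_j / SCALE
        g, e = _g(phi_j), _e(mu, mu_j, phi_j)
        v_inv += g ** 2 * e * (1.0 - e)
        delta_sum += g * (s - e)
    v = 1.0 / v_inv          # estimated variance of the rating from game results
    delta = v * delta_sum    # estimated rating improvement

    # New volatility: root finding (Illinois variant of regula falsi).
    a = math.log(vol ** 2)

    def f(x: float) -> float:
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        den = 2.0 * (phi ** 2 + v + ex) ** 2
        return num / den - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # Inflate the deviation by the volatility, then shrink it with the new games.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * delta_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol


if __name__ == "__main__":
    # Worked example from Glickman's description of Glicko-2.
    print(glicko2_update(1500, 200, 0.06,
                         [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]))
```

With these inputs the result should land close to the worked example in Glickman's description (rating about 1464, RD about 151.5, volatility essentially unchanged near 0.06). The sharp drop in RD, from 200 to about 151.5, is the "adaptive" part: the system becomes more confident as games accumulate.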