Great work Ed and Richard!
Some very good points by Afmob. I agree that SS scores shouldn't go in the rating sticky because they depend on the version of SS. BTW the original rating sticky for A50 to A74 said that a few puzzles couldn't be solved by SS, stating how many steps were done. I wonder if those entries are still valid. I'm not suggesting that statements of how many steps were done should be removed, just updated if appropriate since we know that SS has been significantly improved since that sticky was first posted.
However, there is still a valid place for SS scores in messages like Ed's one in this thread, where they are given in the context of a particular version of SS.
Is it really true that the only acceptable chains are AIC and that any others are T&E? Mike has made the point in previous discussions that AIC don't alter the grid state whereas some other steps that start with a "what if" assumption do change the state as they progress through a chain. That is clearly a valid distinction. All chains, whether AIC, contradiction chains, forcing chains or pure T&E must start with a "what if" assumption.
Then there is methodical combination/permutation analysis. Sometimes in my walkthroughs I present this as a series of sub-steps, so at first sight it may look like T&E. Others will just give a long string of combinations and permutations, which doesn't give that same impression. How it contributes to the rating will clearly depend on the level of the analysis. In most cases a puzzle needing methodical combination/permutation analysis will be at least a Hard 1.25 and more likely a 1.5; some will rate higher than that if really heavy analysis or repeated analysis is required.
Ed asked us to comment on ones that are Just In or are Out. Of course that depends on one's interpretation of the range covered by each rating level. I must admit I was surprised when, in discussion with Ed, I learned that he considers the rating 1.25 to cover scores from 1.25 to 1.49; I subsequently learned that Para has the same interpretation. I had interpreted 1.25 to cover roughly 1.15 to 1.35 and still take that view. On that basis some of the ones that Ed scores In are, in my view, Out, and vice versa. The difference of interpretation only matters when comparing human ratings with SS scores, which clearly must follow Ed's interpretation. Among humans the difference doesn't matter. If I rate a puzzle as Hard 1.25 meaning 1.35 and Ed gives the same rating meaning 1.45, we both know what we mean by a Hard 1.25: one that isn't quite hard enough for a 1.5 rating.
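To make the two readings of a rating band concrete, here is a minimal sketch. The function names are my own invention for illustration, the band width of 0.25 and the half-width of 0.10 are just the figures quoted above ("1.25 to 1.49" and "roughly 1.15 to 1.35"), and none of this is how SS itself works.

```python
def hundredths(x: float) -> int:
    """Convert a score such as 1.45 to whole hundredths (145) to avoid float noise."""
    return round(x * 100)


def in_band_floor(rating: float, score: float, width: float = 0.25) -> bool:
    """Ed's reading: a rating covers scores from rating up to, but not including, rating + width."""
    r, s, w = hundredths(rating), hundredths(score), hundredths(width)
    return r <= s < r + w


def in_band_centred(rating: float, score: float, half_width: float = 0.10) -> bool:
    """My reading (assumed half-width 0.10): a rating covers roughly rating - 0.10 to rating + 0.10."""
    r, s, h = hundredths(rating), hundredths(score), hundredths(half_width)
    return r - h <= s <= r + h


# A score of 1.45 is inside the 1.25 band on Ed's reading but outside it on mine.
print(in_band_floor(1.25, 1.45), in_band_centred(1.25, 1.45))
# A score of 1.20 is the other way round.
print(in_band_floor(1.25, 1.20), in_band_centred(1.25, 1.20))
```

The same score can therefore be In under one reading and Out under the other, which is why it only matters when a human rating is compared with an SS score.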
Even on Ed's interpretation there are several that are Just In, ranging from 0.21 to 0.24 above the Sticky rating, but there is one that is 0.25 above and so is listed as OUT.
Maybe it would be more realistic, IMHO, to consider that SS scores ought only to be quoted to the nearest 0.05, particularly since Ed mentions that puzzle rotation is used with sometimes different scores for the various orientations. I realise there would be a downside to this since any that are currently Just In at 0.23 and 0.24 would become OUT at 0.25.
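The effect of quoting scores to the nearest 0.05 can be sketched as follows. This is only an illustration under my own assumptions: the function names are hypothetical, a puzzle is treated as OUT when its score is 0.25 or more above the Sticky rating (taken from the 0.25 case mentioned above), and scores below the Sticky rating are not modelled.

```python
def round_to_nearest_005(score: float) -> float:
    """Round a score quoted to two decimals to the nearest 0.05."""
    h = round(score * 100)            # work in whole hundredths
    return 5 * ((h + 2) // 5) / 100   # no ties are possible with whole hundredths


def is_out(sticky_rating: float, ss_score: float, limit: float = 0.25) -> bool:
    """Assumed rule: OUT when the score is at least `limit` above the Sticky rating."""
    diff = round(ss_score * 100) - round(sticky_rating * 100)
    return diff >= round(limit * 100)


sticky = 1.25
for score in (1.46, 1.47, 1.48, 1.49):   # 0.21 to 0.24 above the Sticky rating
    rounded = round_to_nearest_005(score)
    print(score, is_out(sticky, score), rounded, is_out(sticky, rounded))
```

On these assumptions the scores 0.21 and 0.22 above the Sticky rating round down and stay In, while those 0.23 and 0.24 above round up to 0.25 and become OUT.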
There were a few cases where I was amazed at the differences.
A71 (Full Border) was one that Mike and I did as a "tag", mostly Mike with a few steps from me in the middle. While it was a relatively easy puzzle for a "tag" solution, I'm completely surprised that SS rated it as low as 1.11.
Mav 4 is scored at 2.57 by SS. Having solved that puzzle, although I haven't yet gone through the posted walkthroughs or posted my own, I don't see how it can be that high. It took a bit of methodical combination/permutation analysis but not particularly heavy work.
A85 (Original Version), in which I participated a little in the "tag" solution, was a real brute. I don't understand how SS rates it as 1.94 while at the same time stating T&E. If a software solver needs to use T&E, how can the rating be that low?
Discrepancies for the very highest rated ones are hardly surprising. I think Afmob was suggesting that we shouldn't be concerned about that, which I agree with.
Finally Ed's NOTE 4. "We may need to have a discussion about adapting Mike's original rating definitions now that we have some firm numbers to work with". Sorry but I must disagree with that. After all the discussions and work that has been put in, surely it's too late for that unless some minor clarification changes are being suggested. Some existing ratings for individual puzzles in the sticky may need to be changed; that suggestion has been around for some time. I know for example that several people think that Mav 1 should be downgraded from 1.75 to 1.5. If that happens I'll accept the majority view but it won't change my personal view of that puzzle.