(Edited 18 Apr 2007 to improve wording of hypotheses and add another example.)
Mike,
Please don't feel you need to excuse yourself for any delays in responding. I'm usually not timely myself even in more limited discussions. Anyway, I have given all this some more thought in the interim.
After reviewing everything to date, I find that I still didn't quite state things in the most general way, and that some more examples would be useful. So here I will attempt to consolidate and make some minor adjustments on everything stated so far, and also provide several hypothetical examples.
*************************************************************************************
Generalized Two Sector Sue de Coq
Suppose, for some box and some line intersecting that box, that A is some set of unsolved cells in the box, B is some set of unsolved cells in the line, and C is a set of two or three unsolved cells in the line/box intersection, such that:
- A, B, and C are pairwise disjoint (no two share a common cell).
A and B have no common candidate digits.
The total number of cells in A, B, and C equals the number of distinct candidate digits in A, B, and C.
Then, letting T denote the set of all cells in the pattern, i.e., T = A union B union C:
- All candidate digits in T which are not candidates in B can be eliminated from any cell in the box which is in neither A nor C.
All candidate digits in T which are not candidates in A can be eliminated from any cell in the line which is in neither B nor C.
Note: For a "classic" Sue de Coq pattern, an additional constraint is added -- that all candidate digits in A, and all candidate digits in B, are also candidates in C. However, this constraint is not necessary and the same eliminations follow without it. I suspect that in practice that the vast majority of generalized patterns will in fact be classic patterns. However, I'm wondering if any solver programs have been coded to recognize the generalized pattern described here.
*************************************************************************************
The following diagram is intended to facilitate discussion, not to be physically suggestive of the Sue de Coq pattern.
Code: Select all
.--------.--------.
| | |
| A | A' |
| | |
'--------'--------+--------. A, A', C are all in the box
| |
| C |
| |
'--------'
| |
| B' | B, B', C are all in the line
| |
'--------'
| |
| B | T is the set A union B union C
| |
'--------'
A' is the set of all unsolved cells in the box which are in neither A nor C, and similarly B' is the set of all unsolved cells in the line which are in neither B nor C. In an actual Sudoku grid, the cells in any of these sets do not need to be physically contiguous or connected. Also, if there are three unsolved cells in the line/box intersection and C contains only two of them, then the remaining cell of the line/box intersection will be in both A' and B'.
In terms of the diagram, the conclusions are that all digits appearing in T (A union B union C) which are not candidates in B can be removed from A', and that all digits in T which are not candidates in A can be removed from B'.
My assumption has been that the argument from the Sue de Coq perspective is something like this:
By hypothesis, if there are N total cells in T, then in total there are N distinct candidate digits in T. None of these digits can be placed twice in T: certainly no digit can be placed twice in the box, and similarly no digit can be placed twice in the line; and, since by hypothesis A and B have no common candidates, no digit can appear in both A and B. Thus, in order for all N cells of T to be filled, each of the N digits must appear (exactly once) in it.
Each candidate digit of T which is not a candidate in B must reside somewhere in A or C, and thus can be eliminated from all cells in A'. Likewise, each candidate digit of T which is not a candidate in A must reside somewhere in B or C, and thus can be eliminated from all cells in B'.
However, upon a careful rereading of the original Sue de Coq thread in the Sudoku Players Forum,
(here), I see that the argument used there is not along the line of reasoning presented here, but in fact resembles the Sudopedia argument. I perhaps should back off from my claim earlier in this thread that the Sudopedia presentation is an Almost Locked Candidates argument, although it does have much of that flavor.
I believe that it was after I read the statement in the Sudopedia that the Sue de Coq technique is a special case of subset counting, that (after a bit of thought) I jumped to the conclusion that the entire pattern was the subset being counted. (This view was recently reinforced when I read the second external reference link given on the Sudopedia page
(here). The author, Andries E. Brouwer, explains the Sue de Coq eliminations with an argument similar to the one given above -- a subset counting argument based on the set of cells in the entire pattern. That reference also states some general subset counting principles which are useful.) What we see in the Sudopedia, and in the Sudoku Player's Forum thread, is really a subset counting argument for the set A union C (and similarly for B union C). I don't feel that the argument from this viewpoint is as easily understood as the reasoning presented above, nor is it clearly expressed in general terms in either source. At the risk of being long-winded, I will attempt to elaborate the argument from that viewpoint, since some interesting insights can be gained along the way. (Some readers may prefer to skip to the "Examples" section below.)
To this end, some symbology and definitions (some of which have appeared earlier in this thread) will be useful:
Definitions
If S is any set of cells (or widgets, etc.), then |S| denotes the number of elements in S.
If S denotes a set of cells, then D(S) denotes the set of all candidate digits in S.
If a set of N cells in some house contains N + k candidate digits, then the set will be said to be an Almost ... Almost Locked Set (A...ALS) with k degrees of freedom.
So k is just the candidate digit count minus the cell count.
If S is an A...ALS with k degrees of freedom, then
For k = 0, we have a locked (or naked) set; for k = 1, we have an Almost Locked Set; and for k = 2, we have an Almost Almost Locked Set, etc.
If we agree that two sets of cells "see" each other if the two sets are disjoint, and all cells of the two sets reside in some common house, then we have the following self-evident principle:
- If A is an A...ALS with k degrees of freedom, and S is any set of cells which sees A, then S can contain at most k digits of D(A).
*************************************************************************************
Now suppose that A, B, and C are sets of cells satisfying the generalized two sector hypotheses, and again let T denote the combined set of cells: T = A union B union C.
First we note that from the hypothesis that sets A, B, and C are pairwise disjoint, the number of cells in T is |A| + |B| + |C|.
Now let kA and kB denote the number of degrees of freedom of sets A and B, respectively.
Since A and B have no common candidates, the total set of candidate digits in T can be broken down into three mutually exclusive sets:
- Those digits which appear as candidates in A (i.e., those in D(A)). There are |A| + kA such digits.
Those digits which appear as candidates in B. There are |B| + kB such digits.
Those digits which appear as candidates only in C (in neither of A or B). I'll call these digits "C only" candidates. This set may be empty. Let nCO denote the number of such digits.
From the hypothesis that the total number of cells in T equals the total number of candidate digits appearing in T, we have the relation
- |A| + |B| + |C| = |A| + kA + |B| + kB + nCO
which reduces to
**********************************************************************************
Aside: As Mike pointed out earlier in the thread, to make things interesting, we can assume that both kA and kB are at least one. (If either of A or B is a naked subset, there are corresponding eliminations in its house, independent of the other sets in the pattern.)
Now since |C| is 2 or 3, and for "interesting" patterns we have kA, kB >= 1, the above relation shows that there's not much leeway in the configuration:
- If |C| = 2, we must have kA = kB = 1, nCO = 0.
If nCO > 0, we must have |C| = 3, kA = kB = nCO = 1.
So nCO is 0 or 1 -- there is at most one "C only" digit.
We also note that at least one of the sets A, B must be an Almost Locked Set. The other can be an Almost Almost Locked Set if |C| = 3 and nCO = 0.
So it's not surprising that many Sue de Coq configurations can be entirely understood in terms of Almost Locked Set arguments.
*********************************************************************************
Diagram repeated for convenience:
Code: Select all
.--------.--------.
| | |
| A | A' |
| | |
'--------'--------+--------. A, A', C are all in the box
| |
| C |
| |
'--------'
| |
| B' | B, B', C are all in the line
| |
'--------'
| |
| B | T is the set A union B union C
| |
'--------'
Now consider the set C. Again we can break down the candidate digits in C into three mutually exclusive sets -- those in D(A), those in D(B), and the "C only" digit, if any. Since C sees set A (in the box), which has kA degrees of freedom, C can contain at most kA digits of D(A). Likewise, C can contain at most kB digits of D(B). Finally, C can contain at most nCO of the "C only" candidate digits (because that's all there are).
So C can contain at most
but we've shown above that C has exactly that many cells.
So, in order for all cells of C to be filled, C must contain (some set of) kA digits of D(A), (some set of) kB digits of D(B), and the "C only" digit, if any.
In the box, the |A| cells of set A will combine with the kA + nCO cells of C which contain the kA digits of D(A), and the "C only" digit (if any), to form a locked set -- |A| + kA + nCO cells with |A| + kA + nCO digits. So, these digits can be removed as candidates from all cells in A'. Note that this set of digits contains exactly those digits in T which are not in B.
Of course, everything in this discussion is symmetrical with respect to the box and the line so we similarly derive the conclusion that those digits in T which are not in A can be removed from all cells in B'.
***************************************************************************
Examples
Note: In the following diagrams, all eliminated digits are shown after the "-" sign.
In this classic 4-cell Sue de Coq pattern, |C| = 2, and the remaining cell in the line box intersection benefits from both the eliminations in the box and in the line.
Code: Select all
.----------------------.-----------------.------------------.
row 1 | A12 *-12 *-12 | . . . | . . . |
row 2 | *-12 *-12 *-12 | . . . | . . . |
row 3 | *-1234 C1234 C1234 | B34 *-34 *-34 | *-34 *-34 *-34 |
.----------------------.-----------------.------------------.
Set C = r3c23, 2 cells with digits 1234
Set A = r1c1, 1 cell with digits 12
Set B = r3c4, 1 cell with digits 34
This 5-cell example is not a classic Sue de Coq pattern, since digit 5 is in A but not in C. Again we have |C| = 2, and the remaining cell of the line/box intersection benefits from both the eliminations in the box and in the line.
Code: Select all
.-----------------------.-----------------.------------------.
row 1 | A125 *-125 *-125 | . . . | . . . |
row 2 | *-125 A125 *-125 | . . . | . . . |
row 3 | *-12345 C1234 C1234 | B34 *-34 *-34 | *-34 *-34 *-34 |
.-----------------------.-----------------.------------------.
Set C = r3c23, 2 cells with digits 1234
Set A = r1c1|r2c2 2 cells with digits 125
Set B = r3c4, 1 cell with digits 34
This 5-cell pattern shows that if |C| = 2, it is possible that one of the sets A or B can contain the remaining cell in the line/box intersection:
Code: Select all
.---------------------.-----------------.------------------.
row 1 | *-125 *-125 *-125 | . . . | . . . |
row 2 | *-125 A125 *-125 | . . . | . . . |
row 3 | A125 C1234 C1234 | B34 *-34 *-34 | *-34 *-34 *-34 |
.---------------------.-----------------.------------------.
Set C = r3c23, 2 cells with digits 1234
Set A = r2c2|r3c1, 2 cells with digits 125
Set B = r3c4, 1 cell with digits 34
In this 6-cell pattern, |C| = 2 and each of sets A and B contains a digit not in C.
Code: Select all
.------------------------.-------------------.---------------------.
row 1 | A125 *-125 *-125 | . . . | . . . |
row 2 | *-125 A125 *-125 | . . . | . . . |
row 3 | *-123456 C1234 C1234 | B346 *-346 *-346| B346 *-346 *-346 |
.------------------------.-------------------.---------------------.
Set C = r3c23, 2 cells with digits 1234
Set A = r1c1|r2c2, 2 cells with digits 125
Set B = r3c47, 2 cells with digits 346
For completeness, this is a 5-cell classic Sue de Coq pattern with a "C only" digit (namely, 5):
Code: Select all
.------------------------.-------------------.---------------------.
row 1 | A12 *-125 *-125 | . . . | . . . |
row 2 | *-125 *-125 *-125 | . . . | . . . |
row 3 | C12345 C12345 C12345 | B34 *-345 *-345 | *-345 *-345 *-345 |
.------------------------.-------------------.---------------------.
Set C = r3c123, 3 cells with digits 12345
Set A = r1c1, 1 cell with digits 12
Set B = r3c4, 1 cell with digits 34
Clearly, this last example could be extended to a 6-cell non-classic pattern if B consisted of two cells in the line, say r3c47, containing digits 3,4, and 6, for example.
This is a 5-cell pattern with set A an Almost Locked Set and set B an Almost Almost Locked Set.
Code: Select all
.------------------------.--------------------.---------------------.
row 1 | A12 *-12 *-12 | . . . | . . . |
row 2 | *-12 *-12 *-12 | . . . | . . . |
row 3 | C12345 C12345 C12345 | B345 *-345 *-345 | *-345 *-345 *-345 |
.------------------------.--------------------.---------------------.
Set C = r3c123, 3 cells with digits 12345
Set A = r1c1, 1 cell with digits 12
Set B = r3c4, 1 cell with digits 345