Sue de Coq Help Needed
Sue de Coq Help Needed
All,
I am hoping someone can help me in understanding the criteria for the subject technique. Please use the Sudopedia example for this technique in any replies.
The following is a quote from the Sudopedia: "The Sue de Coq technique uses two intersecting sets A and B, where A is a set of N cells with N candidates in a line, B is a set of N cells with N candidates in a box, and the sets A - B and B - A have no common candidates. Candidates can be eliminated from the cells in the line that are not in A, and the cells in the box that are not in B."
My questions are:
a. Please list the members of set A and B. I think they are:
A={1, 2, 4, 5, 8, 9} and B={1, 2, 5, 6, 8, 9}
b. I presume N equals 6 in this example. I presume since N is used in the above quote for both line and box, that N has to be the same for both the line and box. In other words, do the unsolved number of cells in the line and in the box have to be the same?
c. What is A-B and B-A? I think the answer is 4 and 6 respectively. I am puzzled how you could have common candidates.
d. The last sentence totally puzzles me. In the line, 1, 2, and 9 are eliminated, but aren't they in Set A? Likewise, 1 is eliminated in the box, but 1 is in B.
Anyway, any help, using the example in the Suopedia and/or additional examples, will be most appreciated. I guess I must have missed the chapter on sets or just totally have forgotten the concept.
Once I have an explanation I can grasp, then the next trick will be how to easily identify this technique.
Regards,
George
I am hoping someone can help me in understanding the criteria for the subject technique. Please use the Sudopedia example for this technique in any replies.
The following is a quote from the Sudopedia: "The Sue de Coq technique uses two intersecting sets A and B, where A is a set of N cells with N candidates in a line, B is a set of N cells with N candidates in a box, and the sets A - B and B - A have no common candidates. Candidates can be eliminated from the cells in the line that are not in A, and the cells in the box that are not in B."
My questions are:
a. Please list the members of set A and B. I think they are:
A={1, 2, 4, 5, 8, 9} and B={1, 2, 5, 6, 8, 9}
b. I presume N equals 6 in this example. I presume since N is used in the above quote for both line and box, that N has to be the same for both the line and box. In other words, do the unsolved number of cells in the line and in the box have to be the same?
c. What is A-B and B-A? I think the answer is 4 and 6 respectively. I am puzzled how you could have common candidates.
d. The last sentence totally puzzles me. In the line, 1, 2, and 9 are eliminated, but aren't they in Set A? Likewise, 1 is eliminated in the box, but 1 is in B.
Anyway, any help, using the example in the Suopedia and/or additional examples, will be most appreciated. I guess I must have missed the chapter on sets or just totally have forgotten the concept.
Once I have an explanation I can grasp, then the next trick will be how to easily identify this technique.
Regards,
George
Sue de Coq Hypotheses
George,
I commented on some problems with the Sue de Coq page in the Sudopedia in my post in this thread, although I didn't complete my thoughts on how everything should read and how the technique can be explained. So I will attempt to do so here (you don't need to consult that thread).
First of all, just a comment on the notation S - T, where S and T are two sets of cells (or widgets, or whatever). This denotes the set of cells (or whatever) which are in S but not in T.
As you've noticed, Sue de Coq in its simplest form is sometimes referred to as the "Two Sector Disjoint Subsets" technique. The usage of the term "disjoint" here is probably behind some of your confusion. In the presentation in the Sudopedia, A and B are sets of cells. Now A - B and B - A are the sectors here, but we always have (tautologically) that A - B and B - A are disjoint, so that's not what is really meant here. What is meant is that the set of candidate digits in A - B is disjoint from the set of candidate digits in B - A, or alternatively, A - B and B - A have no common candidates.
It's not absolutely necessary, but I think things will be clearer if some simple notation is used to avoid this confusion. Here I'll adopt this notation:
*************************************************************
Suppose that A is a set of cells in a box, and B is a set of cells in a line which intersects that box, and that (A intersect B) contains 2 or 3 cells from the line box/intersection; furthermore, suppose that D(A - B) is disjoint from D(B - A), that both D(A - B) and D(B - A) are subsets of D(A intersect B), and that the number of cells in (A union B) equals the number of digits in D(A union B). Then for any digit d in D(A union B), d can be removed as a candidate from any cell outside of (A union B) which sees all cells with candidate d in (A union B). This means that:
For any digit d in D(A - B), d can be removed as a candidate from any cell in the box which is not in A.
Similarly, for any digit d in D(B - A), d can be removed as a candidate from any cell in the line which is not in B.
Finally, for any digit d which appears in (A intersect B) only, that is, any d in D(A intersect B) - D(A - B) - D(B - A), d may be removed as a candidate from any cell in the line or the box which is not in (A union B). Note that in many Sue de Coq positions, this set of digits will be empty, in which case this conclusion is not applicable.
*************************************************************
Below is the relevant portion of the grid in the Sudopedia example, with sets A and B identified by corresponding prefixes to the cells. Just a note on a notational convention -- I'm listing all eliminated digits after the "-" sign.
Set A is r1c7, r3c789.
Set B is r3c2, r3c789.
Set A - B is r1c7, with D(A - B) = {5,8}
Set B - A is r3c2, with D(B - A) = {2,9}. Note that D(B - A) and D(A - B) are disjoint.
Set (A intersect B) is r3c789, with D(A intersect B) = {1,2,5,8,9}. Note that D(B - A) and D(A - B) are subsets of this set.
Set (A union B) contains 5 cells and D(A union B) contains 5 digits, {1,2,5,8,9}.
Then the first conclusion tells us that 5 and 8 (the digits in D(A - B)) can be removed from any cell in box 3 which is not in A. In this case, the first conclusion yields nothing.
The second conclusion tells us that 2 and 9 (the digits in D(B - A)) can be removed from any cell in row 3 which is not in B -- thus the eliminations of 2 and 9 shown in r3c45.
The third conclusion does apply in this case. Digit 1 appears in the intersection only; that is, "1" is in D(A intersect B), but in neither of D(A - B) nor D(B - A). So the conclusion tells us that 1 can be removed from all cells in the box and the line which are not in (A union B) -- thus, the eliminations of 1 shown in r2c89 and r3c4.
It's possible to understand all of this from the reasoning given in the Sudopedia, but that's not really looking at the position from the Sue de Coq perspective. By hypothesis we have the same number of cells in the entire structure (A union B) as the number of candidates in the entire structure -- D(A union B). It follows from our hypotheses that no digit d can be placed twice in the structure. Why is this true? Well, digit d certainly cannot appear twice in box 3, nor twice in row 3; thus, the only conceivable way to place d twice in the entire structure would be for d to appear once in A - B and once in B - A. However, this would be contrary to the hypothesis that D(A - B) and D(B - A) are disjoint.
Thus, in order to fill all 5 cells of the entire structure (A union B), each digit d in D(A union B) -- which contains 5 digits -- must appear exactly once in the structure. Thus, if any cell outside of (A union B) sees all d candidates in (A union B), it has to see digit d somewhere and thus cannot itself contain d. From this the three conclusions above follow, although you can simply work from this principle alone if you prefer.
To answer one of your direct questions, A and B do not have to be the same size.
Another observation: If (A intersect B) contains 2 cells, then the remaining cell in the line/box intersection may be the target for both of the eliminations in the box and the eliminations in the line (the first and second conclusions).
The hypotheses for the Sue de Coq eliminations are a bit restrictive, but they do produce a nice symmetry between the eliminations in the line and the box. There may be some "almost" Sue de Coq positions from which eliminations of multiple digits can be made using similar counting arguments, but the symmetry between the line and box eliminations is lost. If you like, I can continue on with an example, but for the moment I'll stop here and let you absorb the strictly Sue de Coq positions.
I commented on some problems with the Sue de Coq page in the Sudopedia in my post in this thread, although I didn't complete my thoughts on how everything should read and how the technique can be explained. So I will attempt to do so here (you don't need to consult that thread).
First of all, just a comment on the notation S - T, where S and T are two sets of cells (or widgets, or whatever). This denotes the set of cells (or whatever) which are in S but not in T.
As you've noticed, Sue de Coq in its simplest form is sometimes referred to as the "Two Sector Disjoint Subsets" technique. The usage of the term "disjoint" here is probably behind some of your confusion. In the presentation in the Sudopedia, A and B are sets of cells. Now A - B and B - A are the sectors here, but we always have (tautologically) that A - B and B - A are disjoint, so that's not what is really meant here. What is meant is that the set of candidate digits in A - B is disjoint from the set of candidate digits in B - A, or alternatively, A - B and B - A have no common candidates.
It's not absolutely necessary, but I think things will be clearer if some simple notation is used to avoid this confusion. Here I'll adopt this notation:
- If A is a set of cells, then D(A) denotes the set of all candidate digits in A.
*************************************************************
Suppose that A is a set of cells in a box, and B is a set of cells in a line which intersects that box, and that (A intersect B) contains 2 or 3 cells from the line box/intersection; furthermore, suppose that D(A - B) is disjoint from D(B - A), that both D(A - B) and D(B - A) are subsets of D(A intersect B), and that the number of cells in (A union B) equals the number of digits in D(A union B). Then for any digit d in D(A union B), d can be removed as a candidate from any cell outside of (A union B) which sees all cells with candidate d in (A union B). This means that:
For any digit d in D(A - B), d can be removed as a candidate from any cell in the box which is not in A.
Similarly, for any digit d in D(B - A), d can be removed as a candidate from any cell in the line which is not in B.
Finally, for any digit d which appears in (A intersect B) only, that is, any d in D(A intersect B) - D(A - B) - D(B - A), d may be removed as a candidate from any cell in the line or the box which is not in (A union B). Note that in many Sue de Coq positions, this set of digits will be empty, in which case this conclusion is not applicable.
*************************************************************
Below is the relevant portion of the grid in the Sudopedia example, with sets A and B identified by corresponding prefixes to the cells. Just a note on a notational convention -- I'm listing all eliminated digits after the "-" sign.
Code: Select all
.------------------.---------------------.-----------------------.
row 1 | 1 29 7 | 259 6 28 | A58 4 3 |
row 2 | 5 8 4 | 129 3 12 | 7 69-1 126-1 |
row 3 | 3 B29 6 | 45-129 48-29 7 | AB158 AB159 AB125 |
.------------------.---------------------.-----------------------.
Set A is r1c7, r3c789.
Set B is r3c2, r3c789.
Set A - B is r1c7, with D(A - B) = {5,8}
Set B - A is r3c2, with D(B - A) = {2,9}. Note that D(B - A) and D(A - B) are disjoint.
Set (A intersect B) is r3c789, with D(A intersect B) = {1,2,5,8,9}. Note that D(B - A) and D(A - B) are subsets of this set.
Set (A union B) contains 5 cells and D(A union B) contains 5 digits, {1,2,5,8,9}.
Then the first conclusion tells us that 5 and 8 (the digits in D(A - B)) can be removed from any cell in box 3 which is not in A. In this case, the first conclusion yields nothing.
The second conclusion tells us that 2 and 9 (the digits in D(B - A)) can be removed from any cell in row 3 which is not in B -- thus the eliminations of 2 and 9 shown in r3c45.
The third conclusion does apply in this case. Digit 1 appears in the intersection only; that is, "1" is in D(A intersect B), but in neither of D(A - B) nor D(B - A). So the conclusion tells us that 1 can be removed from all cells in the box and the line which are not in (A union B) -- thus, the eliminations of 1 shown in r2c89 and r3c4.
It's possible to understand all of this from the reasoning given in the Sudopedia, but that's not really looking at the position from the Sue de Coq perspective. By hypothesis we have the same number of cells in the entire structure (A union B) as the number of candidates in the entire structure -- D(A union B). It follows from our hypotheses that no digit d can be placed twice in the structure. Why is this true? Well, digit d certainly cannot appear twice in box 3, nor twice in row 3; thus, the only conceivable way to place d twice in the entire structure would be for d to appear once in A - B and once in B - A. However, this would be contrary to the hypothesis that D(A - B) and D(B - A) are disjoint.
Thus, in order to fill all 5 cells of the entire structure (A union B), each digit d in D(A union B) -- which contains 5 digits -- must appear exactly once in the structure. Thus, if any cell outside of (A union B) sees all d candidates in (A union B), it has to see digit d somewhere and thus cannot itself contain d. From this the three conclusions above follow, although you can simply work from this principle alone if you prefer.
To answer one of your direct questions, A and B do not have to be the same size.
Another observation: If (A intersect B) contains 2 cells, then the remaining cell in the line/box intersection may be the target for both of the eliminations in the box and the eliminations in the line (the first and second conclusions).
The hypotheses for the Sue de Coq eliminations are a bit restrictive, but they do produce a nice symmetry between the eliminations in the line and the box. There may be some "almost" Sue de Coq positions from which eliminations of multiple digits can be made using similar counting arguments, but the symmetry between the line and box eliminations is lost. If you like, I can continue on with an example, but for the moment I'll stop here and let you absorb the strictly Sue de Coq positions.
Great answer, Ron!
Nevertheless, it has to be said that the Sudopedia definition is just plain wrong! Somebody should change it. I also didn't understand the Sudopedia definition, and only started to understand the technique after consulting other sources, which defeats the whole purpose of a centralized knowledge base, which is what Sudopedia is aspiring to be.
I also don't like the fact the Sudopedia definition ends with the sentence "Candidates can be eliminated from the cells...", without being explicit about which candidates can be eliminated. This information should be part of any formal definition.
Also confusing is the fact that the explanation of the example does not seem to bear any resemblance to the formal definition. This encourages people to ignore the confusing definition, and latch onto the concrete example instead. Unfortunately, there is only one example provided, which also happens to correspond to the same pattern used in the original sources and most other solving guides (i.e., three common cells, and a single bivalue cell in each of A - B and B - A). This encourages people to think that once they understand this one pattern, they have completely understood the principle, which is clearly not the case.
Nevertheless, it has to be said that the Sudopedia definition is just plain wrong! Somebody should change it. I also didn't understand the Sudopedia definition, and only started to understand the technique after consulting other sources, which defeats the whole purpose of a centralized knowledge base, which is what Sudopedia is aspiring to be.
I also don't like the fact the Sudopedia definition ends with the sentence "Candidates can be eliminated from the cells...", without being explicit about which candidates can be eliminated. This information should be part of any formal definition.
Also confusing is the fact that the explanation of the example does not seem to bear any resemblance to the formal definition. This encourages people to ignore the confusing definition, and latch onto the concrete example instead. Unfortunately, there is only one example provided, which also happens to correspond to the same pattern used in the original sources and most other solving guides (i.e., three common cells, and a single bivalue cell in each of A - B and B - A). This encourages people to think that once they understand this one pattern, they have completely understood the principle, which is clearly not the case.
Cheers,
Mike
Mike
Alternate Formulation of Sue de Coq Technique
George and Mike,
I formulated my first post in terms of two intersecting sets of cells A and B, as I was trying to maintain some similarity with the presentation in the Sudopedia. However, as Mike points out, there is really little which can be salvaged from that page, so I wish I hadn't limited myself that way. It's a bit cleaner (especially with my inability to use some special characters) to present this in terms of three disjoint sets of cells, as was done on the Sudoku Players Forum, in the thread in which the "Sue de Coq" pattern was first presented (here). So George, I'm sorry if you've been struggling to understand my first post, but I'm going to shift gears and present things slightly differently. I think it will be easier in the long run. None of this is original to me -- it's just my understanding of the thread in the Sudoku Players Forum. However, I don't really see any one easily identifiable place in that thread where the Sue de Coq constraints and eliminations are summarized. This may be why there seems to be some confusion over exactly what "Sue de Coq" is.
********************************************************************************
Suppose, for some box and some line intersecting that box, that C is a set of two or three cells in the line/box intersection, that A is a set of cells in the box, but disjoint from the line/box intersection, and that B is a set of cells in the line, but disjoint from the line/box intersection, such that:
Of course, the counting argument that was used in the first formulation is still applicable here. From the hypothesis that A and B have no common candidates, we conclude that each of the candidate digits in C must placed (exactly once) somewhere in the entire structure, A union B union C, and from this the three conclusions follow.
As a second example, consider the position below, which is reached after initial basic techniques are used in the 19 September 2006 Daily Nightmare. By way of explanation, when I first worked on this puzzle, I was trying to learn the ALS XZ technique, and in my eagerness to use this, I overlooked a simpler way to proceed. Please ignore any simpler approaches you may see.
Actually, something good came from my oversight. For those readers familiar with Almost Locked Sets (ALS's), I observed that the sets
In the position (repeated below), sets A, B, and C in the new Sue de Coq formulation are identified by corresponding prefixes to the cells.
In column 6, the digits 3 and 6 appear only in r5c6, and in the line/box intersection, r79c6. Digits 3 and 6 cannot both be placed in the line/box intersection, as that would conflict with the bivalued "36" cell in r8c4. In column 6, digits 3 and 6 cannot both lie outside of the line box intersection, since there is only one cell available to hold them outside of the line/box intersection, namely r5c6. Therefore, one of 3, 6 must appear in r5c6 (so that r5c6 can't be 1) and the other must appear in one of r79c6. Whichever it is will pair up with r8c4 to eliminate 3's and 6's in other cells of the box.
If you've followed this then you've probably realized that the reasoning presented in the Sudopedia example is really using the ALC viewpoint rather than the Sue de Coq viewpoint.
Anyway, we've seen that there are many ways to skin this cat. I'd be interested to see a real life Sue de Coq position which can't be considered as a "2 ALS's with two restricted common digits" position. (I don't mean to suggest that Sue de Coq is not useful. Players not familiar with ALS's can still use Sue de Coq to advantage.)
I formulated my first post in terms of two intersecting sets of cells A and B, as I was trying to maintain some similarity with the presentation in the Sudopedia. However, as Mike points out, there is really little which can be salvaged from that page, so I wish I hadn't limited myself that way. It's a bit cleaner (especially with my inability to use some special characters) to present this in terms of three disjoint sets of cells, as was done on the Sudoku Players Forum, in the thread in which the "Sue de Coq" pattern was first presented (here). So George, I'm sorry if you've been struggling to understand my first post, but I'm going to shift gears and present things slightly differently. I think it will be easier in the long run. None of this is original to me -- it's just my understanding of the thread in the Sudoku Players Forum. However, I don't really see any one easily identifiable place in that thread where the Sue de Coq constraints and eliminations are summarized. This may be why there seems to be some confusion over exactly what "Sue de Coq" is.
********************************************************************************
Suppose, for some box and some line intersecting that box, that C is a set of two or three cells in the line/box intersection, that A is a set of cells in the box, but disjoint from the line/box intersection, and that B is a set of cells in the line, but disjoint from the line/box intersection, such that:
- A and B have no common candidate digits.
All candidate digits in A, and all candidate digits in B, are candidates in C.
The total number of cells in A, B, and C equals the number of distinct candidate digits in C.
- All candidate digits in A can be eliminated from any cell in the box which is not in either of A or C.
All candidate digits in B can be eliminated from any cell in the line which is not in either of B or C.
If C contains some candidate d which does not appear as a candidate in either of A or B, then d can be eliminated from any cell in the line or the box which is not in C.
Of course, the counting argument that was used in the first formulation is still applicable here. From the hypothesis that A and B have no common candidates, we conclude that each of the candidate digits in C must placed (exactly once) somewhere in the entire structure, A union B union C, and from this the three conclusions follow.
As a second example, consider the position below, which is reached after initial basic techniques are used in the 19 September 2006 Daily Nightmare. By way of explanation, when I first worked on this puzzle, I was trying to learn the ALS XZ technique, and in my eagerness to use this, I overlooked a simpler way to proceed. Please ignore any simpler approaches you may see.
Code: Select all
.---------------------.----------------------.---------------------.
| 2 1 3 | 489 49 #48 | 57 57 6 |
| 7 5 9 | 16 16 2 | 8 34 34 |
| 6 8 4 | 7 3 5 | 9 2 1 |
:---------------------+----------------------+---------------------:
| 3 9 1 | 456 46 7 | 2 456 8 |
| 8 4 2 | 13569 169 136 | 567 1567 57 |
| 5 6 7 | 2 8 #14 | 3 149 49 |
:---------------------+----------------------+---------------------:
| 9 237 56 | 1368 27 #1368 | 4 35678 2357 |
| 4 237 8 | *36 5 9 | 1 367 237 |
| 1 237 56 | 3468 27 #3468 | 567 356789 23579 |
'---------------------'----------------------'---------------------'
- r8c4 -- 1 cell with 2 candidates, {3,6}
- r1679c6 -- 4 cells with 5 candidates, {1,3,4,6,8}
In the position (repeated below), sets A, B, and C in the new Sue de Coq formulation are identified by corresponding prefixes to the cells.
Code: Select all
.---------------------.----------------------.---------------------.
| 2 1 3 | 489 49 B48 | 57 57 6 |
| 7 5 9 | 16 16 2 | 8 34 34 |
| 6 8 4 | 7 3 5 | 9 2 1 |
:---------------------+----------------------+---------------------:
| 3 9 1 | 456 46 7 | 2 456 8 |
| 8 4 2 | 13569 169 36-1 | 567 1567 57 |
| 5 6 7 | 2 8 B14 | 3 149 49 |
:---------------------+----------------------+---------------------:
| 9 237 56 | 18-36 27 C1368 | 4 35678 2357 |
| 4 237 8 | A36 5 9 | 1 367 237 |
| 1 237 56 | 48-36 27 C3468 | 567 356789 23579 |
'---------------------'----------------------'---------------------'
- set A = r8c4, with candidates {3,6}
set B = r16c6, with candidates {1,4,8}
set C = r79c6, with candidates {1,3,4,6,8}
- A and B have no common candidates.
All candidates in A, and all candidates in B, are candidates in C.
Together, sets A, B, and C contain 5 cells, and 5 distinct candidate digits.
- All candidate digits in A can be removed from any cell in the box which is not in either of A or C -- thus the eliminations of 3 and 6 in r79c4.
All candidate digits in B can be removed from any cell in the line which is not in either of B or C -- thus the elimination of 1 shown in r5c6.
In this case, there are no candidates which appear only in C.
Code: Select all
.---------------------.----------------------.---------------------.
| 2 1 3 | 489 49 48 | 57 57 6 |
| 7 5 9 | 16 16 2 | 8 34 34 |
| 6 8 4 | 7 3 5 | 9 2 1 |
:---------------------+----------------------+---------------------:
| 3 9 1 | 456 46 7 | 2 456 8 |
| 8 4 2 | 13569 169 *36-1 | 567 1567 57 |
| 5 6 7 | 2 8 14 | 3 149 49 |
:---------------------+----------------------+---------------------:
| 9 237 56 | 18-36 27 *1368 | 4 35678 2357 |
| 4 237 8 | *36 5 9 | 1 367 237 |
| 1 237 56 | 48-36 27 *3468 | 567 356789 23579 |
'---------------------'----------------------'---------------------'
If you've followed this then you've probably realized that the reasoning presented in the Sudopedia example is really using the ALC viewpoint rather than the Sue de Coq viewpoint.
Anyway, we've seen that there are many ways to skin this cat. I'd be interested to see a real life Sue de Coq position which can't be considered as a "2 ALS's with two restricted common digits" position. (I don't mean to suggest that Sue de Coq is not useful. Players not familiar with ALS's can still use Sue de Coq to advantage.)
Ron,
Many thanks for another impressive post, and for taking the time to explain this so thoroughly. I certainly find your reasoning significantly easier to follow than the original Sue de coq thread!
One point I didn't realize up to now is the fact that the intersection of the two sets (C in your example) must consist of at least 2 cells. But this has now become clear. To avoid being a naked subset, the disjoint sets A and B must each have at least one excess candidate (compared to the number of cells of which they consist). Because of the condition that (A + B + C) must consist of N candidates distributed across N cells, these excess candidates from A and B must be "absorbed" by the intersection C, which must therefore consist of at least 2 cells.
What I'm less sure about is your comment that:
"All candidate digits in A, and all candidate digits in B, are candidates in C."
By similar logic to that applied above, I can see that this must indeed be the case in all the examples I've seen so far, where A and B both consist of a single cell. Otherwise, if C were to lose a candidate, either (A + C) or (B + C) would reduce to a naked subset of (N - 1) candidates distributed across (N - 1) cells.
However, in a more complicated example, where either A or B (or both) consist of more than one cell, this restriction would (according to my current understanding) no longer apply. I assume that, in general, C can afford to lose up to 2 candidates less than the combined number of cells in (A + B).
Ron, am I on the right track here?
Many thanks for another impressive post, and for taking the time to explain this so thoroughly. I certainly find your reasoning significantly easier to follow than the original Sue de coq thread!
One point I didn't realize up to now is the fact that the intersection of the two sets (C in your example) must consist of at least 2 cells. But this has now become clear. To avoid being a naked subset, the disjoint sets A and B must each have at least one excess candidate (compared to the number of cells of which they consist). Because of the condition that (A + B + C) must consist of N candidates distributed across N cells, these excess candidates from A and B must be "absorbed" by the intersection C, which must therefore consist of at least 2 cells.
What I'm less sure about is your comment that:
"All candidate digits in A, and all candidate digits in B, are candidates in C."
By similar logic to that applied above, I can see that this must indeed be the case in all the examples I've seen so far, where A and B both consist of a single cell. Otherwise, if C were to lose a candidate, either (A + C) or (B + C) would reduce to a naked subset of (N - 1) candidates distributed across (N - 1) cells.
However, in a more complicated example, where either A or B (or both) consist of more than one cell, this restriction would (according to my current understanding) no longer apply. I assume that, in general, C can afford to lose up to 2 candidates less than the combined number of cells in (A + B).
Ron, am I on the right track here?
Cheers,
Mike
Mike
Mike,
I don't have time for a response with examples, but I did want to acknowledge your comments. I believe you're definitely on the right track, maybe a bit ahead of me. I included the condition that you question because, as I remember, that was the way the originator (our "boy named Sue," if you've read the "Discussion" tab in the Sudopedia topic) described the pattern in the original Sudoku Player's Forum thread. I don't have time at the moment to confirm that, because, as you suggest, it takes a while to plod through all that's in that thread.
You made some good observations. I agree with what you seem to be suggesting, but it's late for me so you should check me on this.
We can delete the condition in question (the second one) if the third is reworded slightly, something like:
The total number of cells in A, B, and C equals the number of distinct candidate digits in A, B, and C. (Formerly, this was the same as the number of distinct candidate digits in C.)
Then the conclusions can be something like this:
Every digit (in the combined structure) which is not a candidate in B can be removed from any cell in the box which is in neither A nor C.
Similarly, every digit (in the combined structure) which is not a candidate in A can be removed from any cell in the line which is in neither B nor C.
Now we don't have to mention specifically the third conclusion as it's covered by these two. So it looks like this more general formulation is actually easier to express.
I don't have time for a response with examples, but I did want to acknowledge your comments. I believe you're definitely on the right track, maybe a bit ahead of me. I included the condition that you question because, as I remember, that was the way the originator (our "boy named Sue," if you've read the "Discussion" tab in the Sudopedia topic) described the pattern in the original Sudoku Player's Forum thread. I don't have time at the moment to confirm that, because, as you suggest, it takes a while to plod through all that's in that thread.
You made some good observations. I agree with what you seem to be suggesting, but it's late for me so you should check me on this.
We can delete the condition in question (the second one) if the third is reworded slightly, something like:
The total number of cells in A, B, and C equals the number of distinct candidate digits in A, B, and C. (Formerly, this was the same as the number of distinct candidate digits in C.)
Then the conclusions can be something like this:
Every digit (in the combined structure) which is not a candidate in B can be removed from any cell in the box which is in neither A nor C.
Similarly, every digit (in the combined structure) which is not a candidate in A can be removed from any cell in the line which is in neither B nor C.
Now we don't have to mention specifically the third conclusion as it's covered by these two. So it looks like this more general formulation is actually easier to express.
Ron,
Sorry it took so long for me to get back to you on this one.
I just want to say that the new generalized description looks fine to me now. Maybe it can serve as the basis for a new formal definition in Sudopedia.
Thanks ever so much for your elaborate and very helpful answers. I look forward to "meeting" you again on another topic in this forum soon!
Sorry it took so long for me to get back to you on this one.
I just want to say that the new generalized description looks fine to me now. Maybe it can serve as the basis for a new formal definition in Sudopedia.
Thanks ever so much for your elaborate and very helpful answers. I look forward to "meeting" you again on another topic in this forum soon!
Cheers,
Mike
Mike
Generalized 2 Sector Sue de Coq
(Edited 18 Apr 2007 to improve wording of hypotheses and add another example.)
Mike,
Please don't feel you need to excuse yourself for any delays in responding. I'm usually not timely myself even in more limited discussions. Anyway, I have given all this some more thought in the interim.
After reviewing everything to date, I find that I still didn't quite state things in the most general way, and that some more examples would be useful. So here I will attempt to consolidate and make some minor adjustments on everything stated so far, and also provide several hypothetical examples.
*************************************************************************************
Generalized Two Sector Sue de Coq
Suppose, for some box and some line intersecting that box, that A is some set of unsolved cells in the box, B is some set of unsolved cells in the line, and C is a set of two or three unsolved cells in the line/box intersection, such that:
*************************************************************************************
The following diagram is intended to facilitate discussion, not to be physically suggestive of the Sue de Coq pattern.
A' is the set of all unsolved cells in the box which are in neither A nor C, and similarly B' is the set of all unsolved cells in the line which are in neither B nor C. In an actual Sudoku grid, the cells in any of these sets do not need to be physically contiguous or connected. Also, if there are three unsolved cells in the line/box intersection and C contains only two of them, then the remaining cell of the line/box intersection will be in both A' and B'.
In terms of the diagram, the conclusions are that all digits appearing in T (A union B union C) which are not candidates in B can be removed from A', and that all digits in T which are not candidates in A can be removed from B'.
My assumption has been that the argument from the Sue de Coq perspective is something like this:
By hypothesis, if there are N total cells in T, then in total there are N distinct candidate digits in T. None of these digits can be placed twice in T: certainly no digit can be placed twice in the box, and similarly no digit can be placed twice in the line; and, since by hypothesis A and B have no common candidates, no digit can appear in both A and B. Thus, in order for all N cells of T to be filled, each of the N digits must appear (exactly once) in it.
Each candidate digit of T which is not a candidate in B must reside somewhere in A or C, and thus can be eliminated from all cells in A'. Likewise, each candidate digit of T which is not a candidate in A must reside somewhere in B or C, and thus can be eliminated from all cells in B'.
However, upon a careful rereading of the original Sue de Coq thread in the Sudoku Players Forum, (here), I see that the argument used there is not along the line of reasoning presented here, but in fact resembles the Sudopedia argument. I perhaps should back off from my claim earlier in this thread that the Sudopedia presentation is an Almost Locked Candidates argument, although it does have much of that flavor.
I believe that it was after I read the statement in the Sudopedia that the Sue de Coq technique is a special case of subset counting, that (after a bit of thought) I jumped to the conclusion that the entire pattern was the subset being counted. (This view was recently reinforced when I read the second external reference link given on the Sudopedia page (here). The author, Andries E. Brouwer, explains the Sue de Coq eliminations with an argument similar to the one given above -- a subset counting argument based on the set of cells in the entire pattern. That reference also states some general subset counting principles which are useful.) What we see in the Sudopedia, and in the Sudoku Player's Forum thread, is really a subset counting argument for the set A union C (and similarly for B union C). I don't feel that the argument from this viewpoint is as easily understood as the reasoning presented above, nor is it clearly expressed in general terms in either source. At the risk of being long-winded, I will attempt to elaborate the argument from that viewpoint, since some interesting insights can be gained along the way. (Some readers may prefer to skip to the "Examples" section below.)
To this end, some symbology and definitions (some of which have appeared earlier in this thread) will be useful:
Definitions
If S is any set of cells (or widgets, etc.), then |S| denotes the number of elements in S.
If S denotes a set of cells, then D(S) denotes the set of all candidate digits in S.
If a set of N cells in some house contains N + k candidate digits, then the set will be said to be an Almost ... Almost Locked Set (A...ALS) with k degrees of freedom.
So k is just the candidate digit count minus the cell count.
If S is an A...ALS with k degrees of freedom, then
If we agree that two sets of cells "see" each other if the two sets are disjoint, and all cells of the two sets reside in some common house, then we have the following self-evident principle:
Now suppose that A, B, and C are sets of cells satisfying the generalized two sector hypotheses, and again let T denote the combined set of cells: T = A union B union C.
First we note that from the hypothesis that sets A, B, and C are pairwise disjoint, the number of cells in T is |A| + |B| + |C|.
Now let kA and kB denote the number of degrees of freedom of sets A and B, respectively.
Since A and B have no common candidates, the total set of candidate digits in T can be broken down into three mutually exclusive sets:
Aside: As Mike pointed out earlier in the thread, to make things interesting, we can assume that both kA and kB are at least one. (If either of A or B is a naked subset, there are corresponding eliminations in its house, independent of the other sets in the pattern.)
Now since |C| is 2 or 3, and for "interesting" patterns we have kA, kB >= 1, the above relation shows that there's not much leeway in the configuration:
*********************************************************************************
Diagram repeated for convenience:
Now consider the set C. Again we can break down the candidate digits in C into three mutually exclusive sets -- those in D(A), those in D(B), and the "C only" digit, if any. Since C sees set A (in the box), which has kA degrees of freedom, C can contain at most kA digits of D(A). Likewise, C can contain at most kB digits of D(B). Finally, C can contain at most nCO of the "C only" candidate digits (because that's all there are).
So C can contain at most
So, in order for all cells of C to be filled, C must contain (some set of) kA digits of D(A), (some set of) kB digits of D(B), and the "C only" digit, if any.
In the box, the |A| cells of set A will combine with the kA + nCO cells of C which contain the kA digits of D(A), and the "C only" digit (if any), to form a locked set -- |A| + kA + nCO cells with |A| + kA + nCO digits. So, these digits can be removed as candidates from all cells in A'. Note that this set of digits contains exactly those digits in T which are not in B.
Of course, everything in this discussion is symmetrical with respect to the box and the line so we similarly derive the conclusion that those digits in T which are not in A can be removed from all cells in B'.
***************************************************************************
Examples
Note: In the following diagrams, all eliminated digits are shown after the "-" sign.
In this classic 4-cell Sue de Coq pattern, |C| = 2, and the remaining cell in the line box intersection benefits from both the eliminations in the box and in the line.
This 5-cell example is not a classic Sue de Coq pattern, since digit 5 is in A but not in C. Again we have |C| = 2, and the remaining cell of the line/box intersection benefits from both the eliminations in the box and in the line.
This 5-cell pattern shows that if |C| = 2, it is possible that one of the sets A or B can contain the remaining cell in the line/box intersection:
In this 6-cell pattern, |C| = 2 and each of sets A and B contains a digit not in C.
For completeness, this is a 5-cell classic Sue de Coq pattern with a "C only" digit (namely, 5):
Clearly, this last example could be extended to a 6-cell non-classic pattern if B consisted of two cells in the line, say r3c47, containing digits 3,4, and 6, for example.
This is a 5-cell pattern with set A an Almost Locked Set and set B an Almost Almost Locked Set.
Mike,
Please don't feel you need to excuse yourself for any delays in responding. I'm usually not timely myself even in more limited discussions. Anyway, I have given all this some more thought in the interim.
After reviewing everything to date, I find that I still didn't quite state things in the most general way, and that some more examples would be useful. So here I will attempt to consolidate and make some minor adjustments on everything stated so far, and also provide several hypothetical examples.
*************************************************************************************
Generalized Two Sector Sue de Coq
Suppose, for some box and some line intersecting that box, that A is some set of unsolved cells in the box, B is some set of unsolved cells in the line, and C is a set of two or three unsolved cells in the line/box intersection, such that:
- A, B, and C are pairwise disjoint (no two share a common cell).
A and B have no common candidate digits.
The total number of cells in A, B, and C equals the number of distinct candidate digits in A, B, and C.
- All candidate digits in T which are not candidates in B can be eliminated from any cell in the box which is in neither A nor C.
All candidate digits in T which are not candidates in A can be eliminated from any cell in the line which is in neither B nor C.
*************************************************************************************
The following diagram is intended to facilitate discussion, not to be physically suggestive of the Sue de Coq pattern.
Code: Select all
.--------.--------.
| | |
| A | A' |
| | |
'--------'--------+--------. A, A', C are all in the box
| |
| C |
| |
'--------'
| |
| B' | B, B', C are all in the line
| |
'--------'
| |
| B | T is the set A union B union C
| |
'--------'
In terms of the diagram, the conclusions are that all digits appearing in T (A union B union C) which are not candidates in B can be removed from A', and that all digits in T which are not candidates in A can be removed from B'.
My assumption has been that the argument from the Sue de Coq perspective is something like this:
By hypothesis, if there are N total cells in T, then in total there are N distinct candidate digits in T. None of these digits can be placed twice in T: certainly no digit can be placed twice in the box, and similarly no digit can be placed twice in the line; and, since by hypothesis A and B have no common candidates, no digit can appear in both A and B. Thus, in order for all N cells of T to be filled, each of the N digits must appear (exactly once) in it.
Each candidate digit of T which is not a candidate in B must reside somewhere in A or C, and thus can be eliminated from all cells in A'. Likewise, each candidate digit of T which is not a candidate in A must reside somewhere in B or C, and thus can be eliminated from all cells in B'.
However, upon a careful rereading of the original Sue de Coq thread in the Sudoku Players Forum, (here), I see that the argument used there is not along the line of reasoning presented here, but in fact resembles the Sudopedia argument. I perhaps should back off from my claim earlier in this thread that the Sudopedia presentation is an Almost Locked Candidates argument, although it does have much of that flavor.
I believe that it was after I read the statement in the Sudopedia that the Sue de Coq technique is a special case of subset counting, that (after a bit of thought) I jumped to the conclusion that the entire pattern was the subset being counted. (This view was recently reinforced when I read the second external reference link given on the Sudopedia page (here). The author, Andries E. Brouwer, explains the Sue de Coq eliminations with an argument similar to the one given above -- a subset counting argument based on the set of cells in the entire pattern. That reference also states some general subset counting principles which are useful.) What we see in the Sudopedia, and in the Sudoku Player's Forum thread, is really a subset counting argument for the set A union C (and similarly for B union C). I don't feel that the argument from this viewpoint is as easily understood as the reasoning presented above, nor is it clearly expressed in general terms in either source. At the risk of being long-winded, I will attempt to elaborate the argument from that viewpoint, since some interesting insights can be gained along the way. (Some readers may prefer to skip to the "Examples" section below.)
To this end, some symbology and definitions (some of which have appeared earlier in this thread) will be useful:
Definitions
If S is any set of cells (or widgets, etc.), then |S| denotes the number of elements in S.
If S denotes a set of cells, then D(S) denotes the set of all candidate digits in S.
If a set of N cells in some house contains N + k candidate digits, then the set will be said to be an Almost ... Almost Locked Set (A...ALS) with k degrees of freedom.
So k is just the candidate digit count minus the cell count.
If S is an A...ALS with k degrees of freedom, then
- |D(S)| = |S| + k
If we agree that two sets of cells "see" each other if the two sets are disjoint, and all cells of the two sets reside in some common house, then we have the following self-evident principle:
- If A is an A...ALS with k degrees of freedom, and S is any set of cells which sees A, then S can contain at most k digits of D(A).
Now suppose that A, B, and C are sets of cells satisfying the generalized two sector hypotheses, and again let T denote the combined set of cells: T = A union B union C.
First we note that from the hypothesis that sets A, B, and C are pairwise disjoint, the number of cells in T is |A| + |B| + |C|.
Now let kA and kB denote the number of degrees of freedom of sets A and B, respectively.
Since A and B have no common candidates, the total set of candidate digits in T can be broken down into three mutually exclusive sets:
- Those digits which appear as candidates in A (i.e., those in D(A)). There are |A| + kA such digits.
Those digits which appear as candidates in B. There are |B| + kB such digits.
Those digits which appear as candidates only in C (in neither of A or B). I'll call these digits "C only" candidates. This set may be empty. Let nCO denote the number of such digits.
- |A| + |B| + |C| = |A| + kA + |B| + kB + nCO
- |C| = kA + kB + nCO
Aside: As Mike pointed out earlier in the thread, to make things interesting, we can assume that both kA and kB are at least one. (If either of A or B is a naked subset, there are corresponding eliminations in its house, independent of the other sets in the pattern.)
Now since |C| is 2 or 3, and for "interesting" patterns we have kA, kB >= 1, the above relation shows that there's not much leeway in the configuration:
- If |C| = 2, we must have kA = kB = 1, nCO = 0.
If nCO > 0, we must have |C| = 3, kA = kB = nCO = 1.
So nCO is 0 or 1 -- there is at most one "C only" digit.
We also note that at least one of the sets A, B must be an Almost Locked Set. The other can be an Almost Almost Locked Set if |C| = 3 and nCO = 0.
*********************************************************************************
Diagram repeated for convenience:
Code: Select all
.--------.--------.
| | |
| A | A' |
| | |
'--------'--------+--------. A, A', C are all in the box
| |
| C |
| |
'--------'
| |
| B' | B, B', C are all in the line
| |
'--------'
| |
| B | T is the set A union B union C
| |
'--------'
So C can contain at most
- kA + kB + nCO digits
So, in order for all cells of C to be filled, C must contain (some set of) kA digits of D(A), (some set of) kB digits of D(B), and the "C only" digit, if any.
In the box, the |A| cells of set A will combine with the kA + nCO cells of C which contain the kA digits of D(A), and the "C only" digit (if any), to form a locked set -- |A| + kA + nCO cells with |A| + kA + nCO digits. So, these digits can be removed as candidates from all cells in A'. Note that this set of digits contains exactly those digits in T which are not in B.
Of course, everything in this discussion is symmetrical with respect to the box and the line so we similarly derive the conclusion that those digits in T which are not in A can be removed from all cells in B'.
***************************************************************************
Examples
Note: In the following diagrams, all eliminated digits are shown after the "-" sign.
In this classic 4-cell Sue de Coq pattern, |C| = 2, and the remaining cell in the line box intersection benefits from both the eliminations in the box and in the line.
Code: Select all
.----------------------.-----------------.------------------.
row 1 | A12 *-12 *-12 | . . . | . . . |
row 2 | *-12 *-12 *-12 | . . . | . . . |
row 3 | *-1234 C1234 C1234 | B34 *-34 *-34 | *-34 *-34 *-34 |
.----------------------.-----------------.------------------.
Set C = r3c23, 2 cells with digits 1234
Set A = r1c1, 1 cell with digits 12
Set B = r3c4, 1 cell with digits 34
Code: Select all
.-----------------------.-----------------.------------------.
row 1 | A125 *-125 *-125 | . . . | . . . |
row 2 | *-125 A125 *-125 | . . . | . . . |
row 3 | *-12345 C1234 C1234 | B34 *-34 *-34 | *-34 *-34 *-34 |
.-----------------------.-----------------.------------------.
Set C = r3c23, 2 cells with digits 1234
Set A = r1c1|r2c2 2 cells with digits 125
Set B = r3c4, 1 cell with digits 34
Code: Select all
.---------------------.-----------------.------------------.
row 1 | *-125 *-125 *-125 | . . . | . . . |
row 2 | *-125 A125 *-125 | . . . | . . . |
row 3 | A125 C1234 C1234 | B34 *-34 *-34 | *-34 *-34 *-34 |
.---------------------.-----------------.------------------.
Set C = r3c23, 2 cells with digits 1234
Set A = r2c2|r3c1, 2 cells with digits 125
Set B = r3c4, 1 cell with digits 34
Code: Select all
.------------------------.-------------------.---------------------.
row 1 | A125 *-125 *-125 | . . . | . . . |
row 2 | *-125 A125 *-125 | . . . | . . . |
row 3 | *-123456 C1234 C1234 | B346 *-346 *-346| B346 *-346 *-346 |
.------------------------.-------------------.---------------------.
Set C = r3c23, 2 cells with digits 1234
Set A = r1c1|r2c2, 2 cells with digits 125
Set B = r3c47, 2 cells with digits 346
Code: Select all
.------------------------.-------------------.---------------------.
row 1 | A12 *-125 *-125 | . . . | . . . |
row 2 | *-125 *-125 *-125 | . . . | . . . |
row 3 | C12345 C12345 C12345 | B34 *-345 *-345 | *-345 *-345 *-345 |
.------------------------.-------------------.---------------------.
Set C = r3c123, 3 cells with digits 12345
Set A = r1c1, 1 cell with digits 12
Set B = r3c4, 1 cell with digits 34
This is a 5-cell pattern with set A an Almost Locked Set and set B an Almost Almost Locked Set.
Code: Select all
.------------------------.--------------------.---------------------.
row 1 | A12 *-12 *-12 | . . . | . . . |
row 2 | *-12 *-12 *-12 | . . . | . . . |
row 3 | C12345 C12345 C12345 | B345 *-345 *-345 | *-345 *-345 *-345 |
.------------------------.--------------------.---------------------.
Set C = r3c123, 3 cells with digits 12345
Set A = r1c1, 1 cell with digits 12
Set B = r3c4, 1 cell with digits 345
Last edited by Ron Moore on Wed Apr 18, 2007 9:05 pm, edited 1 time in total.
Ron,
All of what you say really appeals to me. Indeed, I can't really fault any of it, and particularly liked your formalization based on degrees of freedom. I can honestly say - and I'm not just saying this to be polite - it's the best description of Sue de Coq I've ever read.
The only thing I would like to point out is that maybe it would be a good idea to generalize even more (if and when the definitions need to be migrated to another source such as Sudopedia)! In particular, I'm thinking about jigsaws here, where a box/line intersection may consist of more than 3 cells. For example, instead of saying:
"C is a set of two or more unsolved cells in the line/box intersection".
I would certainly be very interested in seeing a generalized Sue de Coq implementation, especially when properly adapted and applied to jigsaws, where I have high hopes for it. Now, with Ruud's Daily Jigsaws (in particular, the "Ultra Hard" ones at the weekend), we have - possibly for the first time on the web (?!) - a consistent source of high-quality jigsaws that require advanced vanilla techniques to solve. I'm hoping that these jigsaws will start attracting a new "hard-core" audience because of this.
All of what you say really appeals to me. Indeed, I can't really fault any of it, and particularly liked your formalization based on degrees of freedom. I can honestly say - and I'm not just saying this to be polite - it's the best description of Sue de Coq I've ever read.
The only thing I would like to point out is that maybe it would be a good idea to generalize even more (if and when the definitions need to be migrated to another source such as Sudopedia)! In particular, I'm thinking about jigsaws here, where a box/line intersection may consist of more than 3 cells. For example, instead of saying:
it might instead be better to say:C is a set of two or three unsolved cells in the line/box intersection
"C is a set of two or more unsolved cells in the line/box intersection".
I would certainly be very interested in seeing a generalized Sue de Coq implementation, especially when properly adapted and applied to jigsaws, where I have high hopes for it. Now, with Ruud's Daily Jigsaws (in particular, the "Ultra Hard" ones at the weekend), we have - possibly for the first time on the web (?!) - a consistent source of high-quality jigsaws that require advanced vanilla techniques to solve. I'm hoping that these jigsaws will start attracting a new "hard-core" audience because of this.
Cheers,
Mike
Mike
Mike,
I made some edits to my previous post to improve the wording of the hypotheses and add another example (with set B being an AALS). I didn't change the basic hypotheses as you mention, since I haven't dared venture into jigsaws yet. So here I'll limit myself to standard Sudokus. I see from your posts elsewhere that you're very much into jigsaws. By all means feel free to adapt anything here to that (or other) variants in another topic.
Thanks for your interest and comments.
I made some edits to my previous post to improve the wording of the hypotheses and add another example (with set B being an AALS). I didn't change the basic hypotheses as you mention, since I haven't dared venture into jigsaws yet. So here I'll limit myself to standard Sudokus. I see from your posts elsewhere that you're very much into jigsaws. By all means feel free to adapt anything here to that (or other) variants in another topic.
Thanks for your interest and comments.