Sue de Coq Help Needed

If you invented that new way to solve these little puzzles, tell us about it
Post Reply
nj3h
Gold Member
Gold Member
Posts: 111
Joined: Mon Jul 10, 2006 12:58 pm
Location: Virginia / USA

Sue de Coq Help Needed

Post by nj3h »

All,

I am hoping someone can help me in understanding the criteria for the subject technique. Please use the Sudopedia example for this technique in any replies.

The following is a quote from the Sudopedia: "The Sue de Coq technique uses two intersecting sets A and B, where A is a set of N cells with N candidates in a line, B is a set of N cells with N candidates in a box, and the sets A - B and B - A have no common candidates. Candidates can be eliminated from the cells in the line that are not in A, and the cells in the box that are not in B."

My questions are:
a. Please list the members of set A and B. I think they are:
A={1, 2, 4, 5, 8, 9} and B={1, 2, 5, 6, 8, 9}
b. I presume N equals 6 in this example. I presume since N is used in the above quote for both line and box, that N has to be the same for both the line and box. In other words, do the unsolved number of cells in the line and in the box have to be the same?
c. What is A-B and B-A? I think the answer is 4 and 6 respectively. I am puzzled how you could have common candidates.
d. The last sentence totally puzzles me. In the line, 1, 2, and 9 are eliminated, but aren't they in Set A? Likewise, 1 is eliminated in the box, but 1 is in B.

Anyway, any help, using the example in the Suopedia and/or additional examples, will be most appreciated. I guess I must have missed the chapter on sets or just totally have forgotten the concept.

Once I have an explanation I can grasp, then the next trick will be how to easily identify this technique.

Regards,
George
Ron Moore
Addict
Addict
Posts: 72
Joined: Sun Aug 13, 2006 3:34 am
Location: New Mexico

Sue de Coq Hypotheses

Post by Ron Moore »

George,

I commented on some problems with the Sue de Coq page in the Sudopedia in my post in this thread, although I didn't complete my thoughts on how everything should read and how the technique can be explained. So I will attempt to do so here (you don't need to consult that thread).

First of all, just a comment on the notation S - T, where S and T are two sets of cells (or widgets, or whatever). This denotes the set of cells (or whatever) which are in S but not in T.

As you've noticed, Sue de Coq in its simplest form is sometimes referred to as the "Two Sector Disjoint Subsets" technique. The usage of the term "disjoint" here is probably behind some of your confusion. In the presentation in the Sudopedia, A and B are sets of cells. Now A - B and B - A are the sectors here, but we always have (tautologically) that A - B and B - A are disjoint, so that's not what is really meant here. What is meant is that the set of candidate digits in A - B is disjoint from the set of candidate digits in B - A, or alternatively, A - B and B - A have no common candidates.

It's not absolutely necessary, but I think things will be clearer if some simple notation is used to avoid this confusion. Here I'll adopt this notation:
  • If A is a set of cells, then D(A) denotes the set of all candidate digits in A.
Now the Sue de Coq position and eliminations can be described as follows. Things would be neater if the standard symbols for the union and intersection of two sets were available, but I can't figure out how to make those work. In the abstract, this may seem a bit cumbersome but things should clear up with the subsequent discussion of the example in the Sudopedia:

*************************************************************

Suppose that A is a set of cells in a box, and B is a set of cells in a line which intersects that box, and that (A intersect B) contains 2 or 3 cells from the line box/intersection; furthermore, suppose that D(A - B) is disjoint from D(B - A), that both D(A - B) and D(B - A) are subsets of D(A intersect B), and that the number of cells in (A union B) equals the number of digits in D(A union B). Then for any digit d in D(A union B), d can be removed as a candidate from any cell outside of (A union B) which sees all cells with candidate d in (A union B). This means that:

For any digit d in D(A - B), d can be removed as a candidate from any cell in the box which is not in A.

Similarly, for any digit d in D(B - A), d can be removed as a candidate from any cell in the line which is not in B.

Finally, for any digit d which appears in (A intersect B) only, that is, any d in D(A intersect B) - D(A - B) - D(B - A), d may be removed as a candidate from any cell in the line or the box which is not in (A union B). Note that in many Sue de Coq positions, this set of digits will be empty, in which case this conclusion is not applicable.

*************************************************************

Below is the relevant portion of the grid in the Sudopedia example, with sets A and B identified by corresponding prefixes to the cells. Just a note on a notational convention -- I'm listing all eliminated digits after the "-" sign.

Code: Select all

      .------------------.---------------------.-----------------------.
row 1 | 1     29    7    | 259     6      28   |  A58     4      3     |
row 2 | 5     8     4    | 129     3      12   |   7      69-1   126-1 |
row 3 | 3    B29    6    | 45-129  48-29  7    | AB158  AB159  AB125   |
      .------------------.---------------------.-----------------------.


Set A is r1c7, r3c789.

Set B is r3c2, r3c789.

Set A - B is r1c7, with D(A - B) = {5,8}

Set B - A is r3c2, with D(B - A) = {2,9}. Note that D(B - A) and D(A - B) are disjoint.

Set (A intersect B) is r3c789, with D(A intersect B) = {1,2,5,8,9}. Note that D(B - A) and D(A - B) are subsets of this set.

Set (A union B) contains 5 cells and D(A union B) contains 5 digits, {1,2,5,8,9}.

Then the first conclusion tells us that 5 and 8 (the digits in D(A - B)) can be removed from any cell in box 3 which is not in A. In this case, the first conclusion yields nothing.

The second conclusion tells us that 2 and 9 (the digits in D(B - A)) can be removed from any cell in row 3 which is not in B -- thus the eliminations of 2 and 9 shown in r3c45.

The third conclusion does apply in this case. Digit 1 appears in the intersection only; that is, "1" is in D(A intersect B), but in neither of D(A - B) nor D(B - A). So the conclusion tells us that 1 can be removed from all cells in the box and the line which are not in (A union B) -- thus, the eliminations of 1 shown in r2c89 and r3c4.

It's possible to understand all of this from the reasoning given in the Sudopedia, but that's not really looking at the position from the Sue de Coq perspective. By hypothesis we have the same number of cells in the entire structure (A union B) as the number of candidates in the entire structure -- D(A union B). It follows from our hypotheses that no digit d can be placed twice in the structure. Why is this true? Well, digit d certainly cannot appear twice in box 3, nor twice in row 3; thus, the only conceivable way to place d twice in the entire structure would be for d to appear once in A - B and once in B - A. However, this would be contrary to the hypothesis that D(A - B) and D(B - A) are disjoint.

Thus, in order to fill all 5 cells of the entire structure (A union B), each digit d in D(A union B) -- which contains 5 digits -- must appear exactly once in the structure. Thus, if any cell outside of (A union B) sees all d candidates in (A union B), it has to see digit d somewhere and thus cannot itself contain d. From this the three conclusions above follow, although you can simply work from this principle alone if you prefer.

To answer one of your direct questions, A and B do not have to be the same size.

Another observation: If (A intersect B) contains 2 cells, then the remaining cell in the line/box intersection may be the target for both of the eliminations in the box and the eliminations in the line (the first and second conclusions).

The hypotheses for the Sue de Coq eliminations are a bit restrictive, but they do produce a nice symmetry between the eliminations in the line and the box. There may be some "almost" Sue de Coq positions from which eliminations of multiple digits can be made using similar counting arguments, but the symmetry between the line and box eliminations is lost. If you like, I can continue on with an example, but for the moment I'll stop here and let you absorb the strictly Sue de Coq positions.
mhparker
Grandmaster
Grandmaster
Posts: 345
Joined: Sat Jan 20, 2007 10:47 pm
Location: Germany

Post by mhparker »

Great answer, Ron!

Nevertheless, it has to be said that the Sudopedia definition is just plain wrong! Somebody should change it. I also didn't understand the Sudopedia definition, and only started to understand the technique after consulting other sources, which defeats the whole purpose of a centralized knowledge base, which is what Sudopedia is aspiring to be.

I also don't like the fact the Sudopedia definition ends with the sentence "Candidates can be eliminated from the cells...", without being explicit about which candidates can be eliminated. This information should be part of any formal definition.

Also confusing is the fact that the explanation of the example does not seem to bear any resemblance to the formal definition. This encourages people to ignore the confusing definition, and latch onto the concrete example instead. Unfortunately, there is only one example provided, which also happens to correspond to the same pattern used in the original sources and most other solving guides (i.e., three common cells, and a single bivalue cell in each of A - B and B - A). This encourages people to think that once they understand this one pattern, they have completely understood the principle, which is clearly not the case.
Cheers,
Mike
Ron Moore
Addict
Addict
Posts: 72
Joined: Sun Aug 13, 2006 3:34 am
Location: New Mexico

Alternate Formulation of Sue de Coq Technique

Post by Ron Moore »

George and Mike,

I formulated my first post in terms of two intersecting sets of cells A and B, as I was trying to maintain some similarity with the presentation in the Sudopedia. However, as Mike points out, there is really little which can be salvaged from that page, so I wish I hadn't limited myself that way. It's a bit cleaner (especially with my inability to use some special characters) to present this in terms of three disjoint sets of cells, as was done on the Sudoku Players Forum, in the thread in which the "Sue de Coq" pattern was first presented (here). So George, I'm sorry if you've been struggling to understand my first post, but I'm going to shift gears and present things slightly differently. I think it will be easier in the long run. None of this is original to me -- it's just my understanding of the thread in the Sudoku Players Forum. However, I don't really see any one easily identifiable place in that thread where the Sue de Coq constraints and eliminations are summarized. This may be why there seems to be some confusion over exactly what "Sue de Coq" is.

********************************************************************************

Suppose, for some box and some line intersecting that box, that C is a set of two or three cells in the line/box intersection, that A is a set of cells in the box, but disjoint from the line/box intersection, and that B is a set of cells in the line, but disjoint from the line/box intersection, such that:
  • A and B have no common candidate digits.

    All candidate digits in A, and all candidate digits in B, are candidates in C.

    The total number of cells in A, B, and C equals the number of distinct candidate digits in C.
Then:
  • All candidate digits in A can be eliminated from any cell in the box which is not in either of A or C.

    All candidate digits in B can be eliminated from any cell in the line which is not in either of B or C.

    If C contains some candidate d which does not appear as a candidate in either of A or B, then d can be eliminated from any cell in the line or the box which is not in C.
********************************************************************************

Of course, the counting argument that was used in the first formulation is still applicable here. From the hypothesis that A and B have no common candidates, we conclude that each of the candidate digits in C must placed (exactly once) somewhere in the entire structure, A union B union C, and from this the three conclusions follow.

As a second example, consider the position below, which is reached after initial basic techniques are used in the 19 September 2006 Daily Nightmare. By way of explanation, when I first worked on this puzzle, I was trying to learn the ALS XZ technique, and in my eagerness to use this, I overlooked a simpler way to proceed. Please ignore any simpler approaches you may see.

Code: Select all

.---------------------.----------------------.---------------------.
| 2      1      3     |  489    49    #48    | 57     57     6     |
| 7      5      9     |  16     16     2     | 8      34     34    |
| 6      8      4     |  7      3      5     | 9      2      1     |
:---------------------+----------------------+---------------------:
| 3      9      1     |  456    46     7     | 2      456    8     |
| 8      4      2     |  13569  169    136   | 567    1567   57    |
| 5      6      7     |  2      8     #14    | 3      149    49    |
:---------------------+----------------------+---------------------:
| 9      237    56    |  1368   27    #1368  | 4      35678  2357  |
| 4      237    8     | *36     5      9     | 1      367    237   |
| 1      237    56    |  3468   27    #3468  | 567    356789 23579 |
'---------------------'----------------------'---------------------'
Actually, something good came from my oversight. For those readers familiar with Almost Locked Sets (ALS's), I observed that the sets
  • r8c4 -- 1 cell with 2 candidates, {3,6}
and
  • r1679c6 -- 4 cells with 5 candidates, {1,3,4,6,8}
are two ALS's which share 2 restricted common digits, 3 and 6. At the time I noted that there were eliminations in two different houses (box 8 and column 6) which could be made in this position, but it was some time later before I really understood what was going on to make this work. Eventually I came up with the idea I presented in my post in this thread. The eliminations are the same as those which we'll derive from the Sue de Coq technique.

In the position (repeated below), sets A, B, and C in the new Sue de Coq formulation are identified by corresponding prefixes to the cells.

Code: Select all

.---------------------.----------------------.---------------------.
| 2      1      3     |  489    49    B48    | 57     57     6     |
| 7      5      9     |  16     16     2     | 8      34     34    |
| 6      8      4     |  7      3      5     | 9      2      1     |
:---------------------+----------------------+---------------------:
| 3      9      1     |  456    46     7     | 2      456    8     |
| 8      4      2     |  13569  169    36-1  | 567    1567   57    |
| 5      6      7     |  2      8     B14    | 3      149    49    |
:---------------------+----------------------+---------------------:
| 9      237    56    |  18-36  27    C1368  | 4      35678  2357  |
| 4      237    8     | A36     5      9     | 1      367    237   |
| 1      237    56    |  48-36  27    C3468  | 567    356789 23579 |
'---------------------'----------------------'---------------------'
  • set A = r8c4, with candidates {3,6}

    set B = r16c6, with candidates {1,4,8}

    set C = r79c6, with candidates {1,3,4,6,8}
We quickly confirm that:
  • A and B have no common candidates.

    All candidates in A, and all candidates in B, are candidates in C.

    Together, sets A, B, and C contain 5 cells, and 5 distinct candidate digits.
Then the Sue de Coq eliminations are:
  • All candidate digits in A can be removed from any cell in the box which is not in either of A or C -- thus the eliminations of 3 and 6 in r79c4.

    All candidate digits in B can be removed from any cell in the line which is not in either of B or C -- thus the elimination of 1 shown in r5c6.

    In this case, there are no candidates which appear only in C.
As an aside, we can also view this as an Almost Locked Candidates (ALC) position, yielding the same eliminations. ALC was the topic of discussion in a thread I recently initiated (here). Again, a big part of the problem here was determining exactly what "Almost Locked Candidates" means. From the ALC viewpoint, the argument would go something like this (position repeated for convenience):

Code: Select all

.---------------------.----------------------.---------------------.
| 2      1      3     |  489    49     48    | 57     57     6     |
| 7      5      9     |  16     16     2     | 8      34     34    |
| 6      8      4     |  7      3      5     | 9      2      1     |
:---------------------+----------------------+---------------------:
| 3      9      1     |  456    46     7     | 2      456    8     |
| 8      4      2     |  13569  169   *36-1  | 567    1567   57    |
| 5      6      7     |  2      8      14    | 3      149    49    |
:---------------------+----------------------+---------------------:
| 9      237    56    |  18-36  27    *1368  | 4      35678  2357  |
| 4      237    8     | *36     5      9     | 1      367    237   |
| 1      237    56    |  48-36  27    *3468  | 567    356789 23579 |
'---------------------'----------------------'---------------------'
In column 6, the digits 3 and 6 appear only in r5c6, and in the line/box intersection, r79c6. Digits 3 and 6 cannot both be placed in the line/box intersection, as that would conflict with the bivalued "36" cell in r8c4. In column 6, digits 3 and 6 cannot both lie outside of the line box intersection, since there is only one cell available to hold them outside of the line/box intersection, namely r5c6. Therefore, one of 3, 6 must appear in r5c6 (so that r5c6 can't be 1) and the other must appear in one of r79c6. Whichever it is will pair up with r8c4 to eliminate 3's and 6's in other cells of the box.

If you've followed this then you've probably realized that the reasoning presented in the Sudopedia example is really using the ALC viewpoint rather than the Sue de Coq viewpoint.

Anyway, we've seen that there are many ways to skin this cat. I'd be interested to see a real life Sue de Coq position which can't be considered as a "2 ALS's with two restricted common digits" position. (I don't mean to suggest that Sue de Coq is not useful. Players not familiar with ALS's can still use Sue de Coq to advantage.)
mhparker
Grandmaster
Grandmaster
Posts: 345
Joined: Sat Jan 20, 2007 10:47 pm
Location: Germany

Post by mhparker »

Ron,

Many thanks for another impressive post, and for taking the time to explain this so thoroughly. I certainly find your reasoning significantly easier to follow than the original Sue de coq thread!

One point I didn't realize up to now is the fact that the intersection of the two sets (C in your example) must consist of at least 2 cells. But this has now become clear. To avoid being a naked subset, the disjoint sets A and B must each have at least one excess candidate (compared to the number of cells of which they consist). Because of the condition that (A + B + C) must consist of N candidates distributed across N cells, these excess candidates from A and B must be "absorbed" by the intersection C, which must therefore consist of at least 2 cells.

What I'm less sure about is your comment that:

"All candidate digits in A, and all candidate digits in B, are candidates in C."

By similar logic to that applied above, I can see that this must indeed be the case in all the examples I've seen so far, where A and B both consist of a single cell. Otherwise, if C were to lose a candidate, either (A + C) or (B + C) would reduce to a naked subset of (N - 1) candidates distributed across (N - 1) cells.

However, in a more complicated example, where either A or B (or both) consist of more than one cell, this restriction would (according to my current understanding) no longer apply. I assume that, in general, C can afford to lose up to 2 candidates less than the combined number of cells in (A + B).

Ron, am I on the right track here?
Cheers,
Mike
Ron Moore
Addict
Addict
Posts: 72
Joined: Sun Aug 13, 2006 3:34 am
Location: New Mexico

Post by Ron Moore »

Mike,

I don't have time for a response with examples, but I did want to acknowledge your comments. I believe you're definitely on the right track, maybe a bit ahead of me. I included the condition that you question because, as I remember, that was the way the originator (our "boy named Sue," if you've read the "Discussion" tab in the Sudopedia topic) described the pattern in the original Sudoku Player's Forum thread. I don't have time at the moment to confirm that, because, as you suggest, it takes a while to plod through all that's in that thread.

You made some good observations. I agree with what you seem to be suggesting, but it's late for me so you should check me on this.

We can delete the condition in question (the second one) if the third is reworded slightly, something like:

The total number of cells in A, B, and C equals the number of distinct candidate digits in A, B, and C. (Formerly, this was the same as the number of distinct candidate digits in C.)

Then the conclusions can be something like this:

Every digit (in the combined structure) which is not a candidate in B can be removed from any cell in the box which is in neither A nor C.

Similarly, every digit (in the combined structure) which is not a candidate in A can be removed from any cell in the line which is in neither B nor C.

Now we don't have to mention specifically the third conclusion as it's covered by these two. So it looks like this more general formulation is actually easier to express.
mhparker
Grandmaster
Grandmaster
Posts: 345
Joined: Sat Jan 20, 2007 10:47 pm
Location: Germany

Post by mhparker »

Ron,

Sorry it took so long for me to get back to you on this one.

I just want to say that the new generalized description looks fine to me now. Maybe it can serve as the basis for a new formal definition in Sudopedia.

Thanks ever so much for your elaborate and very helpful answers. I look forward to "meeting" you again on another topic in this forum soon!
Cheers,
Mike
Ron Moore
Addict
Addict
Posts: 72
Joined: Sun Aug 13, 2006 3:34 am
Location: New Mexico

Generalized 2 Sector Sue de Coq

Post by Ron Moore »

(Edited 18 Apr 2007 to improve wording of hypotheses and add another example.)

Mike,

Please don't feel you need to excuse yourself for any delays in responding. I'm usually not timely myself even in more limited discussions. Anyway, I have given all this some more thought in the interim.

After reviewing everything to date, I find that I still didn't quite state things in the most general way, and that some more examples would be useful. So here I will attempt to consolidate and make some minor adjustments on everything stated so far, and also provide several hypothetical examples.

*************************************************************************************

Generalized Two Sector Sue de Coq

Suppose, for some box and some line intersecting that box, that A is some set of unsolved cells in the box, B is some set of unsolved cells in the line, and C is a set of two or three unsolved cells in the line/box intersection, such that:
  • A, B, and C are pairwise disjoint (no two share a common cell).

    A and B have no common candidate digits.

    The total number of cells in A, B, and C equals the number of distinct candidate digits in A, B, and C.
Then, letting T denote the set of all cells in the pattern, i.e., T = A union B union C:
  • All candidate digits in T which are not candidates in B can be eliminated from any cell in the box which is in neither A nor C.

    All candidate digits in T which are not candidates in A can be eliminated from any cell in the line which is in neither B nor C.
Note: For a "classic" Sue de Coq pattern, an additional constraint is added -- that all candidate digits in A, and all candidate digits in B, are also candidates in C. However, this constraint is not necessary and the same eliminations follow without it. I suspect that in practice that the vast majority of generalized patterns will in fact be classic patterns. However, I'm wondering if any solver programs have been coded to recognize the generalized pattern described here.

*************************************************************************************

The following diagram is intended to facilitate discussion, not to be physically suggestive of the Sue de Coq pattern.

Code: Select all

.--------.--------.
|        |        |
|   A    |    A'  |
|        |        |
'--------'--------+--------.    A, A', C are all in the box
                  |        |
                  |   C    |
                  |        |
                  '--------'
                  |        |
                  |   B'   |    B, B', C are all in the line
                  |        |
                  '--------'
                  |        |
                  |   B    |    T is the set A union B union C
                  |        |
                  '--------'
A' is the set of all unsolved cells in the box which are in neither A nor C, and similarly B' is the set of all unsolved cells in the line which are in neither B nor C. In an actual Sudoku grid, the cells in any of these sets do not need to be physically contiguous or connected. Also, if there are three unsolved cells in the line/box intersection and C contains only two of them, then the remaining cell of the line/box intersection will be in both A' and B'.

In terms of the diagram, the conclusions are that all digits appearing in T (A union B union C) which are not candidates in B can be removed from A', and that all digits in T which are not candidates in A can be removed from B'.

My assumption has been that the argument from the Sue de Coq perspective is something like this:

By hypothesis, if there are N total cells in T, then in total there are N distinct candidate digits in T. None of these digits can be placed twice in T: certainly no digit can be placed twice in the box, and similarly no digit can be placed twice in the line; and, since by hypothesis A and B have no common candidates, no digit can appear in both A and B. Thus, in order for all N cells of T to be filled, each of the N digits must appear (exactly once) in it.

Each candidate digit of T which is not a candidate in B must reside somewhere in A or C, and thus can be eliminated from all cells in A'. Likewise, each candidate digit of T which is not a candidate in A must reside somewhere in B or C, and thus can be eliminated from all cells in B'.

However, upon a careful rereading of the original Sue de Coq thread in the Sudoku Players Forum, (here), I see that the argument used there is not along the line of reasoning presented here, but in fact resembles the Sudopedia argument. I perhaps should back off from my claim earlier in this thread that the Sudopedia presentation is an Almost Locked Candidates argument, although it does have much of that flavor.

I believe that it was after I read the statement in the Sudopedia that the Sue de Coq technique is a special case of subset counting, that (after a bit of thought) I jumped to the conclusion that the entire pattern was the subset being counted. (This view was recently reinforced when I read the second external reference link given on the Sudopedia page (here). The author, Andries E. Brouwer, explains the Sue de Coq eliminations with an argument similar to the one given above -- a subset counting argument based on the set of cells in the entire pattern. That reference also states some general subset counting principles which are useful.) What we see in the Sudopedia, and in the Sudoku Player's Forum thread, is really a subset counting argument for the set A union C (and similarly for B union C). I don't feel that the argument from this viewpoint is as easily understood as the reasoning presented above, nor is it clearly expressed in general terms in either source. At the risk of being long-winded, I will attempt to elaborate the argument from that viewpoint, since some interesting insights can be gained along the way. (Some readers may prefer to skip to the "Examples" section below.)

To this end, some symbology and definitions (some of which have appeared earlier in this thread) will be useful:

Definitions

If S is any set of cells (or widgets, etc.), then |S| denotes the number of elements in S.

If S denotes a set of cells, then D(S) denotes the set of all candidate digits in S.

If a set of N cells in some house contains N + k candidate digits, then the set will be said to be an Almost ... Almost Locked Set (A...ALS) with k degrees of freedom.

So k is just the candidate digit count minus the cell count.

If S is an A...ALS with k degrees of freedom, then
  • |D(S)| = |S| + k
For k = 0, we have a locked (or naked) set; for k = 1, we have an Almost Locked Set; and for k = 2, we have an Almost Almost Locked Set, etc.

If we agree that two sets of cells "see" each other if the two sets are disjoint, and all cells of the two sets reside in some common house, then we have the following self-evident principle:
  • If A is an A...ALS with k degrees of freedom, and S is any set of cells which sees A, then S can contain at most k digits of D(A).
*************************************************************************************

Now suppose that A, B, and C are sets of cells satisfying the generalized two sector hypotheses, and again let T denote the combined set of cells: T = A union B union C.

First we note that from the hypothesis that sets A, B, and C are pairwise disjoint, the number of cells in T is |A| + |B| + |C|.

Now let kA and kB denote the number of degrees of freedom of sets A and B, respectively.

Since A and B have no common candidates, the total set of candidate digits in T can be broken down into three mutually exclusive sets:
  • Those digits which appear as candidates in A (i.e., those in D(A)). There are |A| + kA such digits.

    Those digits which appear as candidates in B. There are |B| + kB such digits.

    Those digits which appear as candidates only in C (in neither of A or B). I'll call these digits "C only" candidates. This set may be empty. Let nCO denote the number of such digits.
From the hypothesis that the total number of cells in T equals the total number of candidate digits appearing in T, we have the relation
  • |A| + |B| + |C| = |A| + kA + |B| + kB + nCO
which reduces to
  • |C| = kA + kB + nCO
**********************************************************************************

Aside: As Mike pointed out earlier in the thread, to make things interesting, we can assume that both kA and kB are at least one. (If either of A or B is a naked subset, there are corresponding eliminations in its house, independent of the other sets in the pattern.)

Now since |C| is 2 or 3, and for "interesting" patterns we have kA, kB >= 1, the above relation shows that there's not much leeway in the configuration:
  • If |C| = 2, we must have kA = kB = 1, nCO = 0.

    If nCO > 0, we must have |C| = 3, kA = kB = nCO = 1.

    So nCO is 0 or 1 -- there is at most one "C only" digit.

    We also note that at least one of the sets A, B must be an Almost Locked Set. The other can be an Almost Almost Locked Set if |C| = 3 and nCO = 0.
So it's not surprising that many Sue de Coq configurations can be entirely understood in terms of Almost Locked Set arguments.

*********************************************************************************

Diagram repeated for convenience:

Code: Select all

.--------.--------.
|        |        |
|   A    |    A'  |
|        |        |
'--------'--------+--------.    A, A', C are all in the box
                  |        |
                  |   C    |
                  |        |
                  '--------'
                  |        |
                  |   B'   |    B, B', C are all in the line
                  |        |
                  '--------'
                  |        |
                  |   B    |    T is the set A union B union C
                  |        |
                  '--------'
Now consider the set C. Again we can break down the candidate digits in C into three mutually exclusive sets -- those in D(A), those in D(B), and the "C only" digit, if any. Since C sees set A (in the box), which has kA degrees of freedom, C can contain at most kA digits of D(A). Likewise, C can contain at most kB digits of D(B). Finally, C can contain at most nCO of the "C only" candidate digits (because that's all there are).

So C can contain at most
  • kA + kB + nCO digits
but we've shown above that C has exactly that many cells.

So, in order for all cells of C to be filled, C must contain (some set of) kA digits of D(A), (some set of) kB digits of D(B), and the "C only" digit, if any.

In the box, the |A| cells of set A will combine with the kA + nCO cells of C which contain the kA digits of D(A), and the "C only" digit (if any), to form a locked set -- |A| + kA + nCO cells with |A| + kA + nCO digits. So, these digits can be removed as candidates from all cells in A'. Note that this set of digits contains exactly those digits in T which are not in B.

Of course, everything in this discussion is symmetrical with respect to the box and the line so we similarly derive the conclusion that those digits in T which are not in A can be removed from all cells in B'.

***************************************************************************

Examples

Note: In the following diagrams, all eliminated digits are shown after the "-" sign.

In this classic 4-cell Sue de Coq pattern, |C| = 2, and the remaining cell in the line box intersection benefits from both the eliminations in the box and in the line.

Code: Select all

      .----------------------.-----------------.------------------.
row 1 | A12     *-12  *-12   |  .    .     .   |  .     .     .   |
row 2 | *-12    *-12  *-12   |  .    .     .   |  .     .     .   |
row 3 | *-1234  C1234  C1234 | B34  *-34  *-34 | *-34  *-34  *-34 |
      .----------------------.-----------------.------------------.

      Set C = r3c23, 2 cells with digits 1234
      Set A = r1c1,  1 cell  with digits 12
      Set B = r3c4,  1 cell  with digits 34
This 5-cell example is not a classic Sue de Coq pattern, since digit 5 is in A but not in C. Again we have |C| = 2, and the remaining cell of the line/box intersection benefits from both the eliminations in the box and in the line.

Code: Select all

       
      .-----------------------.-----------------.------------------.
row 1 | A125     *-125  *-125 |  .    .     .   |  .     .     .   |
row 2 | *-125    A125   *-125 |  .    .     .   |  .     .     .   |
row 3 | *-12345  C1234  C1234 | B34  *-34  *-34 | *-34  *-34  *-34 |
      .-----------------------.-----------------.------------------.     

      Set C = r3c23,     2 cells with digits 1234
      Set A = r1c1|r2c2  2 cells with digits 125
      Set B = r3c4,      1 cell  with digits 34
This 5-cell pattern shows that if |C| = 2, it is possible that one of the sets A or B can contain the remaining cell in the line/box intersection:

Code: Select all

      .---------------------.-----------------.------------------.
row 1 | *-125  *-125  *-125 |  .    .     .   |  .     .     .   |
row 2 | *-125  A125   *-125 |  .    .     .   |  .     .     .   |
row 3 | A125   C1234  C1234 | B34  *-34  *-34 | *-34  *-34  *-34 |
      .---------------------.-----------------.------------------.

      Set C = r3c23,     2 cells with digits 1234
      Set A = r2c2|r3c1, 2 cells with digits 125
      Set B = r3c4,      1 cell with digits  34
In this 6-cell pattern, |C| = 2 and each of sets A and B contains a digit not in C.

Code: Select all

      .------------------------.-------------------.---------------------.
row 1 | A125      *-125  *-125 |  .     .     .    |  .      .      .    |
row 2 | *-125     A125   *-125 |  .     .     .    |  .      .      .    |
row 3 | *-123456  C1234  C1234 | B346  *-346  *-346| B346   *-346  *-346 |
      .------------------------.-------------------.---------------------.

      Set C = r3c23,     2 cells with digits 1234
      Set A = r1c1|r2c2, 2 cells with digits 125
      Set B = r3c47,     2 cells with digits 346
For completeness, this is a 5-cell classic Sue de Coq pattern with a "C only" digit (namely, 5):

Code: Select all

      .------------------------.-------------------.---------------------.
row 1 | A12     *-125   *-125  |  .    .      .    |  .      .      .    |
row 2 | *-125   *-125   *-125  |  .    .      .    |  .      .      .    |
row 3 | C12345  C12345  C12345 | B34  *-345  *-345 | *-345  *-345  *-345 |
      .------------------------.-------------------.---------------------.

      Set C = r3c123, 3 cells with digits 12345
      Set A = r1c1,   1 cell with digits  12
      Set B = r3c4,   1 cell with digits  34
Clearly, this last example could be extended to a 6-cell non-classic pattern if B consisted of two cells in the line, say r3c47, containing digits 3,4, and 6, for example.

This is a 5-cell pattern with set A an Almost Locked Set and set B an Almost Almost Locked Set.

Code: Select all

      .------------------------.--------------------.---------------------.
row 1 | A12     *-12    *-12   |  .    .      .     |  .      .      .    |
row 2 | *-12    *-12    *-12   |  .    .      .     |  .      .      .    |
row 3 | C12345  C12345  C12345 | B345  *-345  *-345 | *-345  *-345  *-345 |
      .------------------------.--------------------.---------------------.

      Set C = r3c123, 3 cells with digits 12345
      Set A = r1c1,   1 cell with digits  12
      Set B = r3c4,   1 cell with digits  345
Last edited by Ron Moore on Wed Apr 18, 2007 9:05 pm, edited 1 time in total.
mhparker
Grandmaster
Grandmaster
Posts: 345
Joined: Sat Jan 20, 2007 10:47 pm
Location: Germany

Post by mhparker »

Ron,

All of what you say really appeals to me. Indeed, I can't really fault any of it, and particularly liked your formalization based on degrees of freedom. I can honestly say - and I'm not just saying this to be polite - it's the best description of Sue de Coq I've ever read.

The only thing I would like to point out is that maybe it would be a good idea to generalize even more (if and when the definitions need to be migrated to another source such as Sudopedia)! In particular, I'm thinking about jigsaws here, where a box/line intersection may consist of more than 3 cells. For example, instead of saying:
C is a set of two or three unsolved cells in the line/box intersection
it might instead be better to say:

"C is a set of two or more unsolved cells in the line/box intersection".

I would certainly be very interested in seeing a generalized Sue de Coq implementation, especially when properly adapted and applied to jigsaws, where I have high hopes for it. Now, with Ruud's Daily Jigsaws (in particular, the "Ultra Hard" ones at the weekend), we have - possibly for the first time on the web (?!) - a consistent source of high-quality jigsaws that require advanced vanilla techniques to solve. I'm hoping that these jigsaws will start attracting a new "hard-core" audience because of this.
Cheers,
Mike
Ron Moore
Addict
Addict
Posts: 72
Joined: Sun Aug 13, 2006 3:34 am
Location: New Mexico

Post by Ron Moore »

Mike,

I made some edits to my previous post to improve the wording of the hypotheses and add another example (with set B being an AALS). I didn't change the basic hypotheses as you mention, since I haven't dared venture into jigsaws yet. So here I'll limit myself to standard Sudokus. I see from your posts elsewhere that you're very much into jigsaws. By all means feel free to adapt anything here to that (or other) variants in another topic.

Thanks for your interest and comments.
Post Reply