CQ WW CC reveals its log-checking techniques

I think everyone has noticed that the CQWWCC is becoming more and more open and is carrying on a dialogue with the contest community.
A striking example is the interview Doug KR2Q gave during the latest webinar held online by PVRC. Here is an excerpt, which we think sheds a lot of light on modern log-checking techniques:

Every year we work to further refine the log checking software. All submitted logs go through the same computerized log checking. Every log is checked against every other log and against the Cluster Spot files. As a result, multiple output files are created. These include, but are not limited to:

BAD file: This file shows the number of busts caused in other logs. The percentage is the number of busts found in other logs for this call, divided by the number of crosschecks. Example: KR2Q was copied incorrectly 4 times in 1081 logs checked = 0.4% -OR- IZ5EKV was copied incorrectly 48 times in 1487 logs checked = 3.2%
CROSS CHECK file: sort of the inverse of the BAD file. How many times (and percent) the call was copied correctly.
CROSS BAND QSO file: sorted by call: how many times the “wrong band” was recorded for verified QSO’s.
ERROR file: Listing of all entrants who have “impossible” calls in their log and where in their log each error occurs. For example, each of these is the entire call sign as logged: “4JEE”, “T88”, or “7Q1/”.
MM file: Identifies all maritime mobiles and the correct zone.
NIL file: This file shows the number of not in log contacts caused in other logs. The percentage is the number of not in log contacts found in other logs for this call divided by the number of crosschecked contacts.
OBT (off by two) file: Identifies possible “not in logs” by looking at callsigns which are “off by two” instead of just “off by one.”
Reverse Log file: A listing of every callsign submitted by entrants with a total of the number of QSOs for each “logged” QSO. I’ll talk more about this later.
SPOT file: A tabular (excel file) listing of every entrant. Includes Callsign, Continent, Prefix, Zone, District, Category of entry, Score, Total QSOs, Total QSOs on day 1 and day 2, Cluster (spot) hits total and percent, Cluster hits and percents for day 1 and day 2, Total Mult and percents, Mults and percents day 1 and day 2, Total Mult cluster hits and percent, Mult cluster hits and percents day 1 and day 2, plus 89 other measures. Great for station to station comparisons.
TAB file: List of all entrants: Call, Country, Continent, Zone, Category, Certificate, Claimed Score, Raw calculated score, Final Score, Percent Reduction, Total Q, Z, C and band by band Q, Z, C (18 columns), Dupes, Bads, NILs, Uniques, Cross-bands, how many times the entrant caused a NIL in other logs, and percentages for each of those measures. Also great for comparisons.
UNIQUE file: Listing per entrant of all uniques, along with a sub-listing of calls it might be, based on other entrants’ logs. For example, say KR2Q works YL2OO, which ends up being Unique. The sub-listing includes: YL2AO (worked by only 1 entrant, probably a bust), YL2CO (claimed by just two entrants), YL2KO (worked by 2478 other entrants and the likely real call), YL2LO (worked by one entrant) and YL2PO (worked by only one entrant).
MONOBAND ENTRY file: A listing of all claimed QSO’s with monoband entrants, but where the QSO was NOT on the entrant’s band of entry.
Zone 2 file: Listing of the only REAL zone 2 entrants.
LOG file: original cabrillo submission for every entrant.
Output LOG file: Computer-adjudicated file of every entrant.
NIL folder: Listing by entrant call of all the NILs they caused, by QRG and time
Reverse Log Folder: Reviews all logs received and builds a file based on those entries. If KR2Q works 1000 stations and 900 submit logs, the RL file for KR2Q would have 900 entries. Includes all data from the submitted “other” logs, including time and QRG.
Spot Folder: Contains a file for every entrant. Combines the submitted log data with the reverse log data, and the Cluster File data. Identifies time deltas, QRG deltas, Uniques, Cross-check status, whether the contact was while running or S+P, the QRG of the RL of the RL, whether or not the QSO is a mult, whether or not the QSO aligns with a cluster spot, and other valuable key metrics.
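
To make two of the measures above concrete, here is a minimal Python sketch of the BAD-file percentage and of an “off by one” / “off by two” lookup for a Unique. It is an illustration only, not the committee's software: the function names are made up, plain edit distance is assumed as the meaning of “off by N”, and YL2KK is a hypothetical call added to show the off-by-two case. The other calls and counts come from the examples above.

```python
def bust_rate(busts: int, crosschecks: int) -> float:
    """Percent of cross-checked logs in which a call was copied incorrectly."""
    if crosschecks <= 0:
        raise ValueError("crosschecks must be positive")
    return 100.0 * busts / crosschecks


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def likely_calls(unique_call, worked_counts, max_distance=1):
    """Calls within max_distance of unique_call, most-worked first."""
    near = [(call, n) for call, n in worked_counts.items()
            if call != unique_call
            and edit_distance(unique_call, call) <= max_distance]
    return sorted(near, key=lambda item: (-item[1], item[0]))


# Number of entrants claiming each call (from the UNIQUE example above).
worked_counts = {
    "YL2AO": 1, "YL2CO": 2, "YL2KO": 2478, "YL2LO": 1, "YL2PO": 1,
    "YL2KK": 3,  # hypothetical call, two characters away from YL2OO
}

print(round(bust_rate(4, 1081), 1))    # KR2Q example
print(round(bust_rate(48, 1487), 1))   # IZ5EKV example
print(likely_calls("YL2OO", worked_counts))                  # off by one
print(likely_calls("YL2OO", worked_counts, max_distance=2))  # off by two
```

Sorting by the number of entrants who worked each candidate puts YL2KO first, which matches the reasoning that the call worked by thousands of others is the likely real one.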

Certain output files and key metrics allow us to compare logs against each other. When one or more of the metrics appears too high, the log can be flagged as an outlier. Outlier logs are subject to further committee scrutiny. Sometimes outlier logs turn out to be valid and the flag was simply the result of a legitimate operation. In such a case, the flag is a false positive. I should also point out that very few logs reach the stage of multiple flags which warrant further scrutiny. The vast majority of entrants “play fairly.”
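
The interview does not say which statistics the committee uses to decide that a metric is “too high.” As an illustration only, the sketch below flags outliers with a robust z-score built from the median and the median absolute deviation, one common technique for this kind of comparison. The threshold of 3.5, the function name, the placeholder calls and the percentages are all assumptions made up for the example.

```python
from statistics import median


def flag_outliers(metric_by_call, threshold=3.5):
    """Return calls whose metric is unusually high compared with the field.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. Only the high side is flagged.
    """
    values = list(metric_by_call.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be ranked
    flagged = [call for call, v in metric_by_call.items()
               if 0.6745 * (v - med) / mad > threshold]
    return sorted(flagged)


# Hypothetical cluster-hit percentages for six placeholder entrants.
cluster_hit_percent = {
    "CALL_A": 41.0, "CALL_B": 38.5, "CALL_C": 44.0,
    "CALL_D": 40.0, "CALL_E": 43.0, "CALL_F": 92.0,
}

print(flag_outliers(cluster_hit_percent))
```

A flag from a test like this only marks a log for a closer look. As the text says, a flagged log can still be a legitimate operation.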

Box scores also undergo a human review, irrespective of the flags generated by the computerized review. No log is ever disqualified or reclassified without a human, line-by-line review of the actual log.

Sometimes a log passes the computer review (that is, no flags), but the human review turns up something worthy of further investigation. Experienced contesters can be better at finding “funny stuff” in a log after an eyeball review as compared to the relatively limited number of algorithms that are programmed into the software.

Additionally, the CQWWCC encourages input from outside the committee. If you are suspicious that a log may not truly represent the entrant’s actual effort, then please let us know. All external input is appreciated. Sometimes, the input is simply a “fishing expedition” and the suspect log turns out to be perfectly acceptable. Other times, the outside input is very valuable and helps us to zero in more quickly on a problem log.

Recently, we have been receiving a fair amount of input about “cluster cheaters.” Sometimes, lots of sincere and extensive effort (genuine Hard Work) has gone into the submitted “so-called” proof of Cluster Spotting Assistance while claiming to be unassisted. And all too often, this input is statistically invalid. In these times, we are fast approaching the advent of “everything is spotted all the time.” If you track the movements of almost any of the most-needed mults (such as HC8 or 3V8 and even the “more common” stations) you will find that they are often spotted within 5 minutes of getting on the air or within 5 minutes of changing bands or frequency. Sometimes, they are spotted in the same minute! And then, they are spotted so frequently that it defies all reason. In such cases, it would be nearly impossible for anyone to work one of these stations without also being flagged as “spotting assistance” based on DX Cluster information. In several cases, the DX was spotted so often and for so long that any of us could log them during any given minute within a 4 and a half hour timeframe and still have them end up being a “cluster hit.”
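
The saturation effect described above can be shown with a small Python sketch. The interview does not define a “cluster hit,” so the definition used here (a QSO logged within a fixed number of minutes after a spot) and the 10-minute window are assumptions, as are the function names and spot times.

```python
from bisect import bisect_right


def is_cluster_hit(qso_minute, spot_minutes, window=10):
    """True if a spot was posted at most `window` minutes before the QSO.

    spot_minutes must be sorted; times are minutes from an arbitrary start.
    """
    i = bisect_right(spot_minutes, qso_minute)
    return i > 0 and qso_minute - spot_minutes[i - 1] <= window


def hit_coverage(spot_minutes, start, end, window=10):
    """Fraction of minutes in [start, end) at which a QSO would be a hit."""
    spots = sorted(spot_minutes)
    minutes = range(start, end)
    hits = sum(is_cluster_hit(m, spots, window) for m in minutes)
    return hits / len(minutes)


# A station re-spotted every 5 minutes for 4.5 hours (270 minutes):
# any minute you log it counts as a "hit", assisted or not.
print(hit_coverage(range(0, 270, 5), 0, 270))

# The same period with only two spots: most minutes are not hits.
print(hit_coverage([0, 120], 0, 270))
```

When coverage approaches 100%, a simple hit count cannot separate assisted from unassisted operators, because everyone who works the station lands inside a spot window.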

Certainly, to find a “cluster cheater,” you have to look at this data, but equally, you have to be very careful to look at the big picture. Simple “hit” associations are not enough – which is why the CQWWCC is constantly developing and testing new statistical measures.

But please do not be dissuaded from attempting to help out the committee. We want your input – and the earlier the better. Please don’t wait until the scores are published to say, “Hey - look at this!”

Doug KR2Q
