Sunday, May 14, 2006

Data mining thin ore for terrorists

USA Today has created quite a furor by reporting that the NSA has been getting records of telephone calls to analyze for signs of terrorist activity.

The administration has been trying to defend the practice by saying that they are only getting telephone numbers, not addresses or names. When a name and address are only a click away from the telephone number, that defence is meaningless. The fact that the administration would even make the argument is a cause for concern rather than reassurance. The legality of the whole practice is being questioned. Based on their comments so far, the administration is relying on hair-splitting turns of phrase to defend their actions.

Analyzing telephone records is hardly a new idea. Like many people, I have an unfinished novel sitting in a bottom drawer. Mine was written in 1994 and turned on the use of telephone records to predict a terrorist attack by the IRA.

What became immediately obvious to me when I was writing the book was that to be able to predict a terrorist attack you need several things. You, of course, need a comprehensive record of telephone calls and lots of processing power. My hero effectively stole both of those. Beyond that, you need a lot of terrorist events, you need terrorists who are creatures of habit and you need to know the identities of at least some of the nodes on the terrorists' network. Otherwise you are just trying to drink from the proverbial fire hose. The IRA struggle in Northern Ireland created enough actions by a hierarchical organization, in a small enough region and population group, and with the identities of enough IRA people known, to make my story at least plausible. Every shooting, bombing, arrest or funeral was likely to generate related telephone traffic, both before and after the event.

A very few events occurring over a large area, committed by individual al Qaeda cells which are essentially independent, will make prediction almost impossible. The signal-to-noise ratio is just too low. In the world of fiction, my hero (a nerdy computer programmer) was able to stop a bombing. It is doubtful that the NSA ever will.
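To put rough numbers on the signal-to-noise problem, here is a small Python sketch. Every figure in it is an assumption I made up for illustration (the population size, the number of real plotters, and how accurate the screening is), and the function name `flagged_breakdown` is hypothetical. It only shows the arithmetic: when real targets are extremely rare, even a very accurate filter flags far more innocent people than guilty ones.

```python
def flagged_breakdown(population, true_targets, hit_rate, false_alarm_rate):
    """Return (true_hits, false_alarms, precision) for a screening rule.

    hit_rate: fraction of real targets the rule flags.
    false_alarm_rate: fraction of everyone else the rule flags by mistake.
    precision: fraction of all flagged people who are real targets.
    """
    innocents = population - true_targets
    true_hits = true_targets * hit_rate
    false_alarms = innocents * false_alarm_rate
    precision = true_hits / (true_hits + false_alarms)
    return true_hits, false_alarms, precision


# All figures below are illustrative assumptions, not data from any source.
hits, alarms, precision = flagged_breakdown(
    population=200_000_000,   # assumed number of phone users
    true_targets=100,         # assumed number of real plotters
    hit_rate=0.99,            # assumed: the rule catches 99% of plotters
    false_alarm_rate=0.001,   # assumed: it wrongly flags 0.1% of everyone else
)
print(f"true hits: {hits:.0f}")
print(f"false alarms: {alarms:.0f}")
print(f"share of flagged people who are real targets: {precision:.4%}")
```

Under these made-up assumptions the false alarms outnumber the real hits by roughly two thousand to one, which is the fire hose again in a different form.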

A database of all the telephone calls made in the United States would be much more useful in identifying which Congressmen have mistresses than it is ever likely to be in predicting terrorist attacks. The administration will deny that such information would ever be used to target political opponents of the government. The government assurances are essentially meaningless. Years ago the FBI targeted Martin Luther King. Someone in the current administration targeted Valerie Plame and deliberately disclosed confidential information about her to strike at her husband, and no one has been punished.

While the telephone data may never predict an attack, it could be genuinely useful after the fact in finding co-conspirators. In Canada, the Air India bombing killed over three hundred people and remains unsolved. We know who made the bomb. If the RCMP could go back through a comprehensive database of telephone calls, they might be able to find who else was involved in the bombing.

The issues raised by the telephone database apply to other types of data as well. Where I live, in Northern BC, we have had a series of killings of young women along a highway corridor stretching six hundred miles (now dubbed "the Highway of Tears"). At least some of the victims were probably picked up hitch-hiking. One theory is that someone who regularly travels that corridor is responsible for many or all of those killings. If the RCMP had access to a comprehensive database of credit card transactions (including gas company credit cards) they might be able to zero in on a handful of people who travel the route and who were in the right place at the right time to be responsible for the killings.
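The "right place at the right time" search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (card, town, date), the placeholder town names, the card IDs, the dates and the helper names `cards_near_events` and `shortlist` are all invented for the example, and a real investigation would have far messier data.

```python
from collections import defaultdict
from datetime import date, timedelta


def cards_near_events(transactions, events, window_days=1):
    """Map each card to the set of events it was near in both place and time."""
    window = timedelta(days=window_days)
    matches = defaultdict(set)
    for card, town, when in transactions:
        for event_id, (event_town, event_date) in events.items():
            if town == event_town and abs(when - event_date) <= window:
                matches[card].add(event_id)
    return matches


def shortlist(matches, min_events=2):
    """Return, in sorted order, the cards that turn up near at least min_events events."""
    return sorted(card for card, hit in matches.items() if len(hit) >= min_events)


# Hypothetical events and purchases; none of this is real data.
events = {
    "e1": ("Town A", date(2000, 1, 10)),
    "e2": ("Town B", date(2000, 3, 5)),
    "e3": ("Town C", date(2000, 6, 20)),
}
transactions = [
    ("card-1", "Town A", date(2000, 1, 10)),
    ("card-1", "Town B", date(2000, 3, 4)),
    ("card-1", "Town C", date(2000, 6, 21)),
    ("card-2", "Town A", date(2000, 1, 10)),
    ("card-3", "Town B", date(2000, 3, 5)),
    ("card-3", "Town C", date(2000, 6, 1)),   # same town, but weeks too early
]

matches = cards_near_events(transactions, events)
print(shortlist(matches))
```

A single match means very little, since plenty of people buy gas in any given town on any given day. It is the repeated coincidence across several events that narrows the list, and that is also why the search only works if the database covers everybody.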

This example illustrates both the possibilities and the temptations of comprehensive databases.

If a database of telephone calls existed, the temptation to use it to try to solve routine murders would be enormous. Most murder victims are killed by someone they know. A comprehensive telephone database would quickly identify the associates of a murder victim (or drug dealer). But as the database is accessed more and more often for more and more purposes, the chance for abuse grows with every new user and every new use. The more people who have access, the harder it will be to assign responsibility for a misuse.

The USA Today story talks about some of the results from the database being shared with the FBI and the DEA. The INS and the IRS are probably only steps behind. What form that sharing would take does not appear to have been discussed, but it is easy to see a situation arising where other government agencies start forwarding special query "suggestions" or express requests to the NSA for processing.

Cellphone interceptions and credit card records have destroyed the careers of BC politicians in the past. American politicians are right to be worried about this NSA project: they are among the most likely victims of any invasion of privacy and misuse of the data.