How not to report a research paper
Over on Twitter, I received a tweet suggesting that "File-Sharing Tools Could Put Personal Health Data at Risk" - a reasonable statement of fact, possibly, although not exactly an earth-shattering one. File-sharing tools could put any kind of data "at risk" if by at risk you mean available for sharing with other people, since that's kind of the point of them. You might equally write "postage stamps could put personal health data at risk" because if you inadvertently put a postage stamp or two on a repeat prescription form you could, you know, post it to someone. By mistake.
But gullible health informatician that I am, I followed the link. I arrived at a story on a website called 'iHealthBeat' - where a severe shortage of space characters apparently doesn't prevent them from "reporting technology's impact on healthcare". They ran the story under the less straightforward headline "Study: File-Sharing Tools Could Put Personal Health Data at Risk" and they start their article by claiming that...
"Physicians who use file-sharing software on their computers could inadvertently put their patients' health and financial information at risk, according to a recent study published in the Journal of the American Medical Informatics Association, the Montreal Gazette reports"
First up, I love the third-hand reporting. So the iHealthBeat reporting of technology's impact didn't require them to read the actual study, just the Montreal Gazette's reporting of the study. Maybe iHealthBeat need a new strapline - "reporting the mainstream media's reporting of technology's impact on health" perhaps?
Secondly note how the very first word in the story is "physicians". Right from the off, the journalists have turned a study based on the functioning of computers (in fact, on IP addresses, which might - when home networks are considered - represent more than one computer each) into a news story based on the behaviour of doctors.
The iHealthBeat 'news story' also references further reporting of the study in the pages of Healthcare IT News, where the story was presented as "Docs' file sharing risky business for patient data".
Likewise the Montreal Gazette present the study findings as "File-sharing programs might put doctors' patient records at risk: Study".
So all three of the news sources covering the study present the findings as being related to doctors accidentally making data on their patients available over peer to peer networking. But this is not, and could not be, a conclusion or even an inference of the study. It's purely media speculation. And these days, stoking up fear that people in authority are "misusing" personal data - either deliberately or through their negligence - is a popular line amongst journalists.
But the study itself - the full text is available through the BMJ website - makes much, much more limited claims and observations about the risks presented by peer to peer networking.
According to the researchers, they trawled peer to peer networks and analysed file content (and also searches conducted over those networks) for personal health information, and also personal financial information.
Most of the paper is an interesting and thoughtful consideration of whether it is even possible to conduct research such as this, since the act of monitoring such material is itself accessing personal information (including personal health information) for which the subjects have not given consent. The authors conclude, and I agree with them, that they have conducted the study with the best possible ethical parameters, and that the study was justifiable.
Where the paper discusses the extent of breaches of confidentiality, it makes clear that often it is the patient, not the doctor, who occasions the breach. They reference an earlier study which showed that approximately 10% of secondhand computers sold in Canada were found to contain identifiable health information, and go on to say that many Canadians "are selling or disposing of their computers unaware that they contain PHI".
It is unfortunate that the study (and even more so, the news coverage of the study) lumps together personal financial information and personal health information. The prevalence of personal health information was approximately one quarter that of personal financial information amongst Canadian IP addresses identified, and one tenth that of personal financial information when IP addresses in the USA were considered.
The study conclusion was that "around 0.5% of IP addresses were disclosing PHI in the USA and Canada. This was significantly less than the amount of PFI that was being disclosed". Yet the media repeatedly presented the total figure as being representative of both PHI and PFI breaches. So the Montreal Gazette said, "Out of 23-24 million files, researchers found about two per cent, or tens of thousands, in Canada which contained private health and financial information and could be accessed using a simple search tool. In the U.S., that number was significantly greater at five per cent, in the hundreds of thousands, said El Emam." Nice touch, to quote the author of the study while actually misrepresenting the findings of his research. Note the 'and' in the first sentence, which dramatically alters the meaning of the findings.
To be scrupulously fair to the journalists, the study authors probably didn't make their lives easy by presenting two charts (one for PHI exposure and one for PFI exposure) side by side with different scales on the Y axis. For sure the lines look similar, so maybe you can be forgiven for assuming the findings are similar too. But, you know, actually reading the graphs rather than just looking at them would give it away.
Interestingly, the study showed a much lower level of personally identifiable information (i.e. not specifically health or finance related) generally available through file sharing than some other studies (between 7 and 11 per cent, as against previous studies reporting 49% and 61%). The researchers suggest that the previous studies, which you can be sure whipped up a media storm all of their own, used targeted searches and thus excluded a large chunk of available files, and also didn't actually examine the files for personally identifiable information but used the filename to decide if it might contain any!
None of the nuance or emphasis of the original study paper is present in the media representation of the study. Even when the researchers gave quotes to the newspapers such as "I think it's important for the public to be aware of the risks of running those programs" (i.e. it is the public - not doctors per se - who are running the file sharing software) that still did not blunt the media's presentation of this story as being about reckless physicians rather than reckless patients.
Turning the scary percentages into raw numbers suggests that the researchers looked at files being shared by approximately 1,600 computers, and found personal health information on maybe eight of them. Is that bad? Yes. But is it evidence that the healthcare community are wilfully exposing your prescriptions to the warez fraternity? No. Probably not.
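For anyone who wants to check that back-of-the-envelope sum, here is a minimal sketch in Python. It assumes the two figures quoted in this post (roughly 1,600 computers and "around 0.5%" disclosing PHI) and, as I have done above, treats each IP address as a single computer, which may not hold for home networks. The function name `expected_count` is my own, purely for illustration.

```python
def expected_count(total: int, rate_percent: float) -> int:
    """Convert a percentage of a population into a rounded head count."""
    return round(total * rate_percent / 100)


# Assumed inputs, taken from the figures quoted in this post.
computers_examined = 1600  # approximate number of computers sampled
phi_rate_percent = 0.5     # "around 0.5%" disclosing personal health information

phi_computers = expected_count(computers_examined, phi_rate_percent)
print(phi_computers)  # prints 8
```

Eight machines out of about 1,600 is the number hiding behind the headlines.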
Turning to minor issues of fact, Healthcare IT News told their readers that "Researchers used popular file sharing software such as Limewire, BitTorrent and Kazaa to gain access to documents they downloaded from a representative sample of IP addresses" - whereas the researchers made clear that they had explicitly chosen not to study the use or misuse of BitTorrent clients. Oddly, this didn't stop the Montreal Gazette from choosing the makers of BitTorrent to be their 'industry spokesperson' for responding to the study.
Interestingly, the study authors agreed a 'special protocol' such that if there were any cases discovered of "disclosure of particularly sensitive personal information or personal health information for a large number of individuals, then they would be reported to the appropriate privacy commissioner". They do not comment on whether this protocol had to be activated at any time during the study.
Does any of this matter? As we grapple in the UK with the task of converting our health information systems from manual and paper records to electronic ones, the spectre of 'breaches of confidentiality' will only grow. Likewise, with the Government and Opposition both seemingly intent on demonising anyone who uses file-sharing software of any kind, such "Limewire ate my babies" stories are likely to become commonplace. Of course there are risks in using such technology, but the benefits of electronic health records in terms of improved healthcare and high quality translational research are potentially dramatic. If we allow the media to sensationalise and misrepresent the risks, there is a real chance that benefits will be lost and medical informatics will be set back by decades all because journalists either cannot or will not report the actual evidence being presented to them.