Evaluating a Probabilistic Algorithm for Matching Patients’ Records in Health Information Exchange - SIIM News Fall 2009
Tyler McClintock, BS; Daisy Nie, BS; and Joseph Gitlin, DPH, FSIIM
Systematic reviews of medical literature suggest that implementation of health information exchange has the potential to improve both quality and efficiency of patient care.1-3 In recent years, these improvements have become particularly relevant to health care reform efforts in the United States as increased efficiency would likely yield substantial cost savings. Though a standardized Nationwide Health Information Network (NHIN) could yield a net value of $77.8 billion annually, consensus does not yet exist on the cost and structure of such a system.4,5 As a result, researchers have focused primarily on creating independent Regional Health Information Organizations (RHIOs).6 These networks allow trials of many different organizational and technological approaches to health information exchange, providing useful feedback to guide the planning of future networks. A nation of successful RHIOs could eventually be linked together as the infrastructure for a NHIN, thereby allowing the widespread exchange of medical data and images that many believe would allow health information exchange to reach its full potential in impacting quality, efficiency, and costs.7 This development strategy accommodates the future possibility of a NHIN while avoiding the technical, political, and economic obstacles involved with building such a network from the ground up.
Both deterministic and probabilistic identification methods have been proposed for matching patient records. Deterministic matching depends on user-specified rules to distinguish “match” from “non-match” (e.g., defining that two records with the same Social Security number and birth date constitute a “match”), whereas the probabilistic approach uses algorithms that consider both the frequency and uniqueness of data, ultimately assigning a score that signifies the probability of a match.11 The theoretical basis for probabilistic matching lies in Fellegi and Sunter’s 1969 article on the theory of record linkage, in which they presented the first formal probabilistic linkage rule for conditionally independent attributes in records.12 Research has suggested that probabilistic matching is most appropriate for large-scale systems of complex data where sensitivity is of utmost importance.13,14 Additionally, probabilistic matching has been shown to be particularly effective in the presence of typographical errors and record duplication, both of which are common sources of error in medical data systems.15
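The contrast between the two approaches can be illustrated with a minimal sketch. It is not the Initiate algorithm evaluated in this study, whose internals are not described here. It assumes a Fellegi-Sunter style score in which each field contributes a log-likelihood ratio built from an m-probability (agreement given a true match) and a u-probability (agreement given a non-match). The field names, the m and u values, and the sample records are all hypothetical.

```python
import math

# Hypothetical m- and u-probabilities (illustrative values, not from the study):
# m = P(field agrees | records belong to the same patient)
# u = P(field agrees | records belong to different patients)
FIELD_PARAMS = {
    "ssn": (0.95, 0.0001),
    "date_of_birth": (0.97, 0.003),
    "last_name": (0.90, 0.01),
    "gender": (0.98, 0.5),
}


def deterministic_match(a, b):
    """Rule-based match: identical Social Security number and birth date."""
    return a["ssn"] == b["ssn"] and a["date_of_birth"] == b["date_of_birth"]


def match_weight(a, b):
    """Sum of log2 likelihood ratios over fields, assuming conditional independence."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Hypothetical records: rec_b is rec_a with two SSN digits transposed.
rec_a = {"ssn": "123-45-6789", "date_of_birth": "1950-02-03",
         "last_name": "SMITH", "gender": "F"}
rec_b = {"ssn": "123-45-6798", "date_of_birth": "1950-02-03",
         "last_name": "SMITH", "gender": "F"}
rec_c = {"ssn": "987-65-4321", "date_of_birth": "1972-11-30",
         "last_name": "JONES", "gender": "F"}

print(deterministic_match(rec_a, rec_b))  # the strict rule rejects the typo
print(match_weight(rec_a, rec_b))         # positive: likely the same patient
print(match_weight(rec_a, rec_c))         # negative: likely different patients
```

In this sketch the transposed Social Security number defeats the deterministic rule, while the probabilistic score stays positive because the other fields agree. A production system would compare the score against tuned thresholds and would typically use approximate string comparison rather than exact equality.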
The need for effective automated matching of patient records has become a priority as many health care facilities have implemented electronic patient record systems, and there is substantial interest at many levels of government, industry, and academe in developing viable health information exchange. This is especially pertinent to the Baltimore/Washington “patient catchment” area that includes health care facilities representing university medical centers, private hospitals, military and VA treatment organizations, as well as a wide variety of medical clinics and offices. In this area, the diversity of health care delivery programs poses a substantial challenge to establishing a RHIO in terms of quality of care, patient and provider acceptance, and economic viability. A prompt, accurate patient matching process is a fundamental step in the development of such a health information exchange. This evaluation of a probabilistic matching system developed by Initiate Systems, Inc. (Chicago, IL) is intended to facilitate the implementation of health information exchange in the Baltimore/Washington area.
The primary objectives of the study were to:
1. Evaluate a probabilistic algorithm to determine the number and characteristics of “matching” radiology patients seen at both Johns Hopkins Hospital and Bayview Medical Center during a three-year period.
For both medical centers in the study, we selected all examinations in the Hopkins Radiology Information System (RIS) from the three-year period of July 1, 2003, through June 30, 2006. This amounted to approximately 1.2 million radiology examinations from the Johns Hopkins Hospital (JHH) and 250,000 from Bayview Medical Center. Each examination record contained parameters related to patient identification, including institution, history number, patient name, Social Security number, home address, telephone number, date of birth, gender, date of examination, and Current Procedural Terminology (CPT) examination code.
The examination records were processed, and a unique master patient index (MPI) number was assigned to each individual patient identified by the probabilistic algorithm. In addition to the MPI, a “matching indicator code” was added by the algorithm to the study record as follows:
To test the accuracy of the matching algorithm, the study team drew a representative sample (1 in 1,000) of patient examination records from the RIS files of the two medical centers. The sample contained 240 patients with a total of 1,207 examinations. The patient identifiers on each sample record were compared with the corresponding parameters on the clinical record in the Hopkins electronic patient record (EPR) system, which included physician notes, medical image reports, and diagnoses related to each radiology examination.
The distribution of radiology patients by matching category (Figure 1) shows that the total patient set of 236,323 comprises 5.0 percent Crossover, 30.9 percent Singleton, and 64.1 percent Same Source patients. This may be compared with the distribution of 1,418,043 examinations by matching category (Figure 2), where 11.6 percent of the examinations are from Crossover patients, 5.2 percent are from Singleton patients, and 83.3 percent are from patients coded as Same Source. Notably, the Crossover patients who make up 5.0 percent of the total patient set in Figure 1 account for 11.6 percent of total examinations in Figure 2. In contrast, Singleton patients account for 30.9 percent of the total patient set but only 5.2 percent of total examinations.
These differences are reflected in the mean number of imaging examinations per patient within each matching category (Figure 3). Across both medical centers, Singleton patients have only one examination, Same Source patients average 7.8 examinations, and Crossover patients average 13.9 examinations. This finding illustrates the high number of encounters for each Crossover patient and stresses the importance of accurately identifying these patients across disparate medical databases.
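The per-category means follow from the patient and examination shares reported above. As a quick arithmetic check, the sketch below divides each category's share of the 1,418,043 examinations by its share of the 236,323 patients. Because the published percentages are rounded, the results are approximate; the variable and function names are illustrative.

```python
TOTAL_PATIENTS = 236_323
TOTAL_EXAMS = 1_418_043

# (share of patients, share of examinations) as reported in the text
SHARES = {
    "Crossover": (0.050, 0.116),
    "Singleton": (0.309, 0.052),
    "Same Source": (0.641, 0.833),
}


def mean_exams_per_patient(patient_share, exam_share):
    """Estimated examinations per patient for one matching category."""
    return (exam_share * TOTAL_EXAMS) / (patient_share * TOTAL_PATIENTS)


means = {
    category: round(mean_exams_per_patient(p, e), 1)
    for category, (p, e) in SHARES.items()
}
print(means)  # {'Crossover': 13.9, 'Singleton': 1.0, 'Same Source': 7.8}
```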
The distribution of JHH’s imaging examinations by matching category was found to be 4.9 percent Singleton, 86.7 percent Same Source, and 8.4 percent Crossover. This is in contrast to Bayview’s results of 6.2 percent for Singleton, 67.8 percent for Same Source, and 26.0 percent for Crossover patient examinations. The difference in the distribution of examinations by matching category between JHH and Bayview was found to be significant (p < 0.001) based upon the chi-square test. Such a finding is particularly noteworthy because of the large proportion (26.0 percent) of Bayview examinations that were performed on patients who also had examinations at JHH. This emphasizes the importance of record sharing and the need for accurate patient identification.
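A chi-square test of this kind can be sketched as follows. The exact per-category counts are not given in the text, so the sketch rebuilds approximate counts from the rounded totals (about 1.2 million and 250,000 examinations) and the reported percentages; the resulting statistic is therefore illustrative only and is not the authors' computed value. For a 2 x 3 table there are two degrees of freedom, in which case the chi-square survival function is exactly exp(-x/2), so no statistics library is needed.

```python
import math

# Approximate counts rebuilt from rounded totals and percentages (an assumption;
# the article does not report the exact per-category counts).
JHH_TOTAL, BAYVIEW_TOTAL = 1_200_000, 250_000
observed = [
    # Singleton, Same Source, Crossover
    [JHH_TOTAL * share for share in (0.049, 0.867, 0.084)],
    [BAYVIEW_TOTAL * share for share in (0.062, 0.678, 0.260)],
]


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1) * (3 - 1) = 2
p_value = math.exp(-stat / 2)  # exact chi-square tail probability when dof == 2
print(dof, stat, p_value)
```

With samples this large, even modest differences in proportions produce a very large statistic, which is consistent with the reported p < 0.001.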
The comparison of the 1,207 sample examination records with the EPR clinical information revealed no “false positives” or “false negatives” associated with the probabilistic algorithm used by Initiate to identify matching records in the study. Using principles of Bayesian inference, our analysis indicates that we can be 95 percent confident that the “true error rate” is less than 0.8 percent.
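One common way to obtain a bound of this kind is sketched below. It assumes a uniform Beta(1, 1) prior on the error rate and zero observed errors, so the posterior is Beta(1, n + 1) and the upper credible bound has a closed form. The article does not state the prior used or whether patients or examinations were the unit of analysis, so this sketch is not expected to reproduce the 0.8 percent figure; the function name is illustrative.

```python
def upper_credible_bound(n_checked, confidence=0.95):
    """Upper bound on the error rate after n_checked records with zero errors.

    Assumes a uniform Beta(1, 1) prior, giving a Beta(1, n_checked + 1)
    posterior whose CDF is 1 - (1 - p) ** (n_checked + 1).
    """
    return 1 - (1 - confidence) ** (1 / (n_checked + 1))


# The two candidate sample sizes mentioned in the text.
for n in (240, 1207):
    print(n, f"{upper_credible_bound(n):.4%}")
```

The bound tightens as more records are checked without error, which is why the choice of sampling unit and prior matters when interpreting the reported figure.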
Tyler McClintock is a medical student at New York University. Daisy Nie is a medical student at the University of Chicago. The study was conducted when both were research assistants in the Department of Radiology in the Johns Hopkins University School of Medicine, under the direction of Associate Professor Dr. Joseph Gitlin.
1) Chaudhry B, Wang J, Wu S, Maglione M, Mojica W, Roth E, Morton SC, Shekelle PG: Systematic Review: Impact of Health Information Technology on Quality, Efficiency, and Costs of Medical Care. Ann Intern Med 144:742-752, 2006
2) Shekelle PG, Morton SC, Keeler EB: Costs and benefits of health information technology. Evid Rep Technol Assess (Full Rep) 132:1–71, 2006
3) Bates DW, Gawande AA: Improving Safety with Information Technology. N Engl J Med 348(25):2526-2534, 2003
4) Walker J, Pan E, Johnston D, Adler-Milstein J, Bates DW, Middleton B: The value of health care information exchange and interoperability. Health Aff (Millwood) 24:w10-w18, 2005
5) Kaushal R, Blumenthal D, Poon EG, Jha AK, Franz C et al.: The Costs of a National Health Information Network. Ann Intern Med 143:165-173, 2005
6) Adler-Milstein J, McAfee AP, Bates DW, Jha AK: The State Of Regional Health Information Organizations: Current Activities and Financing. Health Aff (Millwood) 27:w60-w69, 2008
7) Adler-Milstein J, Bates DW, Jha AK: U.S. Regional Health Information Organizations: Progress And Challenges. Health Aff (Millwood) 28(2):483-492, 2009
8) Jha AK, DesRoches CM, Campbell EG, Donelan K, Rao SR, Ferris TG, Shields A, Rosenbaum S, Blumenthal D: Use of Electronic Health Records in U.S. Hospitals. N Engl J Med 360(16):1628-1638, 2009
9) Fernandes L, O’Connor M: Future of Patient Identification. J AHIMA 77(1):36-40, 2006
10) Brailer DJ: Interoperability: the key to the future health care system. Health Aff (Millwood) Suppl Web Exclusives:W5-19–W5-21, 2005
11) Grannis SJ, Overhage JM, Hui S, McDonald CJ: Analysis of a probabilistic record linkage technique without human review. AMIA Annu Symp Proc 2003:259-263
12) Fellegi IP, Sunter AB: A theory for record linkage. J Am Stat Assoc 64:1183-1210, 1969
13) Clark DE, Hahn DR: Comparison of probabilistic and deterministic record linkage in the development of a statewide trauma registry. Proc Annu Symp Comput Appl Med Care 1995:397-401
14) Gomatam S, Carter R, Ariet M, Mitchell G: An empirical comparison of record linkage procedures. Stat Med 21:1485-1496, 2002
15) Schumacher S: Probabilistic versus deterministic data matching: making an accurate decision. DM Review, 2007