Tuesday, 21 February 2012
Saturday, 14 January 2012
Architectures for Pseudonymisation (overview)
In this blog post, I analyse various topologies for delivering pseudonymisation technology in line with accepted standards. Software architectures are numerous, so this post addresses only the most recent or topical options available in what is becoming a crowded marketplace.
Option 1: Data Warehouse, Behind the Firewall
This approach is the most widely adopted, especially in the NHS, and the benefits are obvious. Data is housed behind the organisation's firewall and centralised to provide a single version of the truth for patient identification. Data moving into and out of the organisation is strictly controlled and tracked by deploying the pseudonymisation ‘engine’ within a new or already established data warehouse. New sources of patient data are integrated into the central data warehouse, which is intended to function as a reporting repository, not necessarily as an ‘interfacing’ technology. The final option (Option 3) covers this in greater detail, as interfacing data warehouses to applications used within an organisation can fundamentally weaken the pseudonymisation engine.
Organisations select this particular strategy with the following in mind:
· No dilution or ambiguity in terms of responsibility. The organisation processes its patients.
· Data Warehouses are the ideal way to control pseudonyms in a centralised manner.
· Connecting for Health, via the “Pseudonymisation Implementation Plan”, encouraged NHS trusts to implement pseudonymisation technology in a Data Warehousing environment.
· Key IG Toolkit requirements can only be deployed in a Data Warehousing environment.
· Ability to deploy ‘military grade’ encryption.
· Ability to control costs.
· Ability to centrally audit access to patient identifiable data.
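To make the idea of a centrally controlled and audited engine concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular NHS product: the class name, the in-memory lookup table and the audit log format are all hypothetical, and it uses keyed hashing (HMAC-SHA256) as a stand-in for whatever algorithm a real engine would use.

```python
import hashlib
import hmac
from datetime import datetime, timezone


class PseudonymisationEngine:
    """Hypothetical warehouse-side engine: keyed pseudonyms plus an access audit."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key   # assumed to be held only inside the organisation
        self._lookup = {}        # pseudonym -> identifier, for authorised re-identification
        self.audit_log = []      # entries of (timestamp, user, action, pseudonym)

    def pseudonymise(self, identifier: str) -> str:
        # HMAC-SHA256 yields the same pseudonym for the same identifier and key
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        self._lookup[digest] = identifier
        return digest

    def reidentify(self, pseudonym: str, user: str) -> str:
        # Record the access before the identifier is returned
        self.audit_log.append((datetime.now(timezone.utc), user, "reidentify", pseudonym))
        return self._lookup[pseudonym]


# Illustrative key and identifier only
engine = PseudonymisationEngine(secret_key=b"example-key-not-for-production")
pseudonym = engine.pseudonymise("example-patient-0001")
```

Because the key and the lookup table both sit in one place, responsibility for the pseudonyms stays with the organisation, and every re-identification leaves a record that can be reviewed.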
Option 2: Deployed externally, data transfer
This approach takes organisational data feeds into an externally hosted data centre to create a ‘super repository’ of pseudonyms. A super repository has been attempted before at national level (NHS), and it has the benefit that all pseudonyms are managed centrally, as a single system.
This approach is not as widely deployed as Option 1, as many organisations could view it with a degree of suspicion. It involves data transfer, so technical issues (at best) or data interception (at worst) could occur. Also, if this type of solution were deployed to a significant number of trusts, and interfacing were required to ‘wire in’ pseudo data to organisational applications, the sheer number of applications that the pseudo data would need to be technically compliant with could limit the strength of the pseudonym technology used (e.g. 16-bit encryption), which could result in a breach of privacy. Finally, ‘cloud’ solutions of this type are usually billed on a subscription basis, which is a new type of costing model compared to the traditional “licence + support” model commonly used across many organisations, not just the NHS.
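The risk of a weak pseudonym can be put into rough numbers. The sketch below assumes, purely for illustration, that the "strength" in question is the width of the pseudonym in bits and that pseudonyms are spread evenly across that space. It then uses the standard birthday-bound approximation to estimate the chance that at least two patients end up sharing a pseudonym. The function name and the sample sizes are my own.

```python
import math


def collision_probability(n_patients: int, pseudonym_bits: int) -> float:
    """Birthday-bound approximation of P(at least two patients share a pseudonym)."""
    pairs = n_patients * (n_patients - 1) / 2
    # 1 - exp(-pairs / 2**bits), written with expm1 to stay accurate for tiny values
    return -math.expm1(-pairs / 2 ** pseudonym_bits)


# Compare a narrow pseudonym with wider ones for 1,000 patients
for bits in (16, 32, 128):
    print(bits, collision_probability(1_000, bits))
```

Under these assumptions a 16-bit pseudonym is almost certain to collide even for a thousand patients, which is the kind of weakening that could lead to records being confused or re-identified.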
Option 3: ‘Interfaced’ Solution, Behind the Firewall
This option contains many of the facets of Options 1 and 2, in that a centralised repository is required, but that repository must service multiple (modern and legacy) applications and services. The organisation benefits from having a centralised system, but that system must support both:
· Reporting
· Interfacing
The solution must meet the exacting interface standards of all applications mentioned, and of all applications that would be procured in the future. Therefore, the algorithms used in the pseudonymisation engine must be capable of creating values which are consistent with every application that is to be protected by the pseudonymisation solution. In this author's opinion, this would inevitably weaken the pseudonymisation engine, as it has many obligations to fulfil beyond producing patient-level reports.
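One way to see why interfacing weakens the engine is to look at how much room each application's field leaves for a pseudonym. The sketch below is a simplified illustration: the application names and field formats are invented, and it assumes that a pseudonym every application must accept can carry no more information than the most restrictive field allows.

```python
import math

# Hypothetical interface constraints: (number of allowed characters, maximum field length)
APPLICATION_FIELDS = {
    "reporting_warehouse": (16, 64),  # 64 hexadecimal characters
    "legacy_pas": (10, 10),           # 10 numeric digits only
    "lab_system": (36, 12),           # 12 alphanumeric characters
}


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Number of bits needed to index every value the field can hold."""
    return length * math.log2(alphabet_size)


def shared_pseudonym_bits(fields: dict) -> float:
    # Upper bound: the narrowest field caps the pseudonym shared by all applications
    return min(keyspace_bits(alphabet, length) for alphabet, length in fields.values())


print(shared_pseudonym_bits(APPLICATION_FIELDS))
```

In this made-up example the reporting warehouse alone could hold a 256-bit pseudonym, but the single legacy numeric field drags the shared value down to roughly 33 bits.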
To conclude, multiple local government and advisory bodies have recommended pseudonymisation as the best, and indeed only, way of preventing breaches of the Data Protection Act. In the case of the NHS, this requirement has been passed by NHS Connecting for Health to local organisations, to be implemented as they see fit. By default this creates an issue: an individual whose data is stored at different trusts in different parts of the country will possess multiple pseudonyms, and multiple pseudonym types. This scenario is now the reality facing NHS organisations, and as such data management (incorporating pseudonymisation) across multiple organisations must now be placed into a “next best fit” model, as so many versions of the truth exist.
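The multiple-pseudonym problem is easy to reproduce. In this minimal sketch, two hypothetical trusts each pseudonymise the same patient identifier with their own secret key (keyed hashing with HMAC-SHA256 is again my assumption, not a statement of what any trust actually uses). The resulting pseudonyms do not match, so the two records cannot be joined without some further agreement between the trusts.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, trust_key: bytes) -> str:
    """Keyed pseudonym: stable within one trust, different across trusts."""
    return hmac.new(trust_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative identifier and keys only
patient = "example-patient-0001"
trust_a = pseudonymise(patient, b"trust-a-key")
trust_b = pseudonymise(patient, b"trust-b-key")

# False here means the same patient looks like two unrelated people
records_link = trust_a == trust_b
print(records_link)
```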