The Oka-Bi Pseudonymisation Solution was put through a battery of tests during 2011, and all aspects of the solution were verified, from functionality testing, through end-user usability, to performance testing. What follows is a brief overview of the test environment, offered so that the wider NHS can benefit from feedback from this unique test scenario, in which an enterprise data warehouse containing the data of 3 Primary Care Trusts (PCTs) was triplicated and loaded through the Oka-Bi Pseudonymisation Engine.
The technical test platform was as follows: a 64-bit Windows 2008 server with 8GB of RAM, a dual-core processor and 100 gigabytes of available disk space.
Data was prepared in a SQL Server 2000 loading database, and the Oka-Bi New Safe Haven and Pseudonymisation Engine was prepared in under 1 hour by use of the “Code Generator” application.
The data to be loaded consisted of the following, for each PCT:
A&E Historical Data – 7 years
Outpatient Data – 7 years
Inpatient Data – 5 years
Registered Population Data – 1 year
This data was then triplicated, with the aim of conducting an intensive test of the Oka-Bi Pseudonymisation Engine. Overall, a simulated 9 PCT load was carried out by Oka-Bi pseudonymisation specialists. This can be most easily thought of as follows: the 4 datasets for each organisation constitute the data warehouse, and the data warehouse was triplicated, with a single set of pseudonyms being used for each PCT's (triplicated) data. This provided a demanding test of the multi-pseudonym technology embedded in the Oka-Bi Pseudonymisation Engine, which achieved 100% accuracy.
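The idea of "a single set of pseudonyms per PCT" can be illustrated with a minimal sketch. This is not the Oka-Bi engine's method, which is not described here; it assumes, purely for illustration, that pseudonyms are derived with a keyed hash (HMAC-SHA256) and that each organisation has its own secret key. The names `PCT_KEYS`, `pseudonymise` and `load_copies` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-organisation secret keys (assumption for illustration only).
# In practice such keys would be held only inside the administrative safe haven.
PCT_KEYS = {
    "PCT_A": b"secret-key-for-pct-a",
    "PCT_B": b"secret-key-for-pct-b",
    "PCT_C": b"secret-key-for-pct-c",
}


def pseudonymise(pct: str, patient_id: str) -> str:
    """Derive a deterministic pseudonym for a patient within one PCT."""
    digest = hmac.new(PCT_KEYS[pct], patient_id.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability in this sketch.
    return digest.hexdigest()[:16]


def load_copies(pct: str, patient_ids: list[str], copies: int = 3) -> list[list[str]]:
    """Pseudonymise the same source data several times, as in a triplicated load."""
    return [[pseudonymise(pct, pid) for pid in patient_ids] for _ in range(copies)]
```

Under these assumptions, an accuracy check of the kind described above amounts to confirming that every copy of a PCT's data yields identical pseudonyms, while the same identifier in a different PCT yields a different one, for example `load_copies("PCT_A", ["patient-001"])` returning three identical lists.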
After loading had taken place (over a 2 day period), the data was linked to the new Nottinghamshire Enterprise Data Warehouse, which is the delivery database for the data passing through the Oka-Bi solution. It is important to note that end users should not have direct access to the New Safe Haven, as this is an administrative database only, according to Connecting for Health guidelines. Some notable challenges were faced in this phase of development, as de-pseudonymised data needed to be available to end users via many different end user applications (Access and SQL Server sessions, to name but a few). These challenges were overcome through extensions to the toolkit, which meant that end user applications appeared in the precise context required by the de-pseudonymisation engine.
End user testing took place over a period of 2 months, with no major changes required to the engine. In terms of technical feedback, this was the richest part of the programme from the supplier perspective, as the final specifications were formalised based on ETL and end user query timings. Oka-Bi now possess an engine which can operate in a range of scenarios, and can scale as required in multiple architectural environments, with no compromise in terms of accuracy and performance.
The toolkit was significantly augmented during the last 6 months of test activity (some of which fell outside the load scenario above), so as to provide assurance to Oka-Bi that the toolkit can scale on demand to multiple scenarios. We were determined to make best use of the opportunity to test against such a huge dataset, and are looking forward to assisting new customers with such a new technical concept (i.e. the delivery of full scale pseudonymisation solutions on the Microsoft SQL Server platform).
It is also very important to note that the audit trail facility was a significant feature of the toolkit, and passed all tests with 100% accuracy. Every individual access request to pseudonymised and non-pseudonymised data was successfully recorded in the Oka-Bi Audit Log, giving senior management across the trusts insight into the usage behaviour of the tri-PCT community.
To conclude, we now possess at Oka-Bi a unique perspective on the demands of such a rigorous, multifaceted requirement as a pseudonymisation solution. And, in the changing NHS, we are proud to have developed a toolkit which operates efficiently in multiple contexts, with identical results, and provides the ability to automate the most demanding parts of pseudonymisation software development, leading to dramatic reductions in development time.
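As a rough illustration of what recording every access request involves, the sketch below keeps one entry per request and summarises usage per user. It is an assumption-laden toy, not the Oka-Bi Audit Log: the class names `AuditEntry` and `AuditLog`, the recorded fields and the in-memory storage are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One recorded access request (fields are assumptions for this sketch)."""
    timestamp: datetime
    user: str
    dataset: str
    identifiable: bool  # True for non-pseudonymised (de-pseudonymised) data


class AuditLog:
    """In-memory stand-in for an audit table."""

    def __init__(self) -> None:
        self.entries: list[AuditEntry] = []

    def record(self, user: str, dataset: str, identifiable: bool) -> AuditEntry:
        """Record a single access request, whichever kind of data it touches."""
        entry = AuditEntry(datetime.now(timezone.utc), user, dataset, identifiable)
        self.entries.append(entry)
        return entry

    def usage_by_user(self) -> dict[str, int]:
        """Count recorded requests per user, as a simple usage summary."""
        counts: dict[str, int] = {}
        for entry in self.entries:
            counts[entry.user] = counts.get(entry.user, 0) + 1
        return counts
```

A test of such a facility would issue a known number of requests against both pseudonymised and identifiable data and check that the log holds exactly one entry for each.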