As illustrated in the figure above, one way to achieve this level of control is to let the participating institutions independently administer their research platforms (platform instances) and connect them in a federated structure governed by a federated identity management arrangement. Under this model, participating institutions interact in a flat hierarchy that avoids the need for mutual trust between different administrative domains: each institution retains full control of its platform services and resources and manages them according to its own preferences. In this context, resources consist of data sets, algorithms, compute infrastructure (CPU, storage), and metadata that describe the resources and express relationships between them. Note that a platform instance should make no assumptions about the implementation details of other platform instances, nor about the safety of their execution contexts. In short, it should never entrust them with sensitive information.
From a user perspective, each platform instance should allow users to seamlessly search the collective metadata for resources and to access those resources anywhere in the federation, as long as the permission rules specified by the owning institution allow it.
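The federated search described above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the names (`Resource`, `PlatformInstance`, `search_federation`) and the visibility model (a set of user ids, with `"*"` meaning public) are assumptions made for the example. The key property it demonstrates is that each instance evaluates its own visibility rules locally before any metadata leaves its administrative domain.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    kind: str                                      # e.g. "dataset" or "algorithm"
    owner: str                                     # owning institution
    visible_to: set = field(default_factory=set)   # user ids; {"*"} means public

@dataclass
class PlatformInstance:
    institution: str
    catalog: list = field(default_factory=list)

    def search(self, user: str, query: str) -> list:
        # Each instance enforces its own visibility rules locally, so only
        # metadata the user is allowed to see crosses the domain boundary.
        return [r for r in self.catalog
                if query.lower() in r.name.lower()
                and ("*" in r.visible_to or user in r.visible_to)]

def search_federation(instances, user, query):
    # The user's home instance fans the query out to every peer and merges
    # the per-instance answers into one global view.
    results = []
    for inst in instances:
        results.extend(inst.search(user, query))
    return results
```

For example, a user authenticated at one institution would see both a public data set hosted elsewhere and a private algorithm shared with them, merged into a single result list.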
Going back to our earlier example, after Bob authenticates with the platform instance of his institution, he is presented with a global view of all the metadata visible to him from all the platform instances combined, including the data and algorithms made publicly available by Alice. He can explore this view and discover data sets of interest or research projects related to his own. Alice is presented with a similar view (scoped to her own permission rights) after authenticating with the platform of her institution. Alice can use her view of the research world as a one-stop shop to execute an algorithm on selected input data sets. Under the hood, the platform instances cooperate to orchestrate the execution while enforcing their respective rules. For instance, due to the private nature of the data, Bob's institution could impose additional restrictions on how external entities such as Alice access its data: for example, the data may be accessible only by authorized applications running on behalf of authorized persons or labs (Alice), and only on servers administered by Bob's institution. Under this scenario, Alice never sees the input data; she sees only the results of the execution of her algorithms, stripped of all personally identifiable information.
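The orchestration rule in this scenario can be sketched as a policy check on the owning institution's side. Everything here is illustrative: the policy structure, the function names, and the de-identification step are assumptions made for the example, not the platform's real API. The point is that the authorization decision and the stripping of personally identifiable information both happen inside the data owner's domain, so the external user only ever receives de-identified results.

```python
def authorize_execution(policy, user, algo_name, server):
    # Every condition of the owning institution must hold: an authorized
    # user, an approved algorithm, and a server it administers itself.
    return (user in policy["authorized_users"]
            and algo_name in policy["approved_algorithms"]
            and server in policy["local_servers"])

def strip_pii(rows, pii_fields=("name", "ssn")):
    # Placeholder de-identification: drop direct identifiers before any
    # result leaves the owning institution.
    return [{k: v for k, v in row.items() if k not in pii_fields}
            for row in rows]

def run_remote(policy, user, algo_name, algorithm, server, dataset):
    # Executed inside the data owner's domain, on its own servers.
    if not authorize_execution(policy, user, algo_name, server):
        raise PermissionError("execution denied by owning institution")
    return strip_pii(algorithm(dataset))
```

In this sketch, Alice submits an algorithm by name; Bob's instance runs it only if the policy allows, and she receives the de-identified output rows rather than the raw records.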
Using this model as a basis for secure collaborative research, we can build business logic layers that extend it with additional useful capabilities. Because all resource access operations must be authorized by the platform instances that own the resources, we can automatically preserve a trace of user activities, such as the algorithm executions and input data sets used to create an output data set, and derive a lineage from that trace, expressed in a form similar to the one illustrated in the figure.
Patients and hospitals can use this lineage for attribution purposes, to find out who is using their data, how, and for what purposes.
Conversely, doctors and researchers can verify the origin of the data and trace it back to its sources through all the intermediate transformation steps (as long as the permissions allow it).
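Both directions of lineage use can be sketched over the same authorization trail. This is a hedged sketch, assuming a simple list-of-records log format; the record fields and function names are invented for illustration. Since every authorized operation is logged, attribution ("who used my data?") is a forward scan of the log, while provenance ("where did this result come from?") is a backward walk from an output through all intermediate transformation steps.

```python
audit_log = []

def record_access(user, operation, inputs, output):
    # Every authorized operation leaves a trace in the owner's log.
    audit_log.append({"user": user, "op": operation,
                      "inputs": list(inputs), "output": output})

def who_used(resource, trail):
    # Attribution: which users accessed this resource, and how.
    return [(e["user"], e["op"]) for e in trail if resource in e["inputs"]]

def origins_of(resource, trail):
    # Provenance: walk backwards from an output through every
    # intermediate step to all contributing sources.
    sources, frontier = set(), [resource]
    while frontier:
        node = frontier.pop()
        for e in trail:
            if e["output"] == node:
                for src in e["inputs"]:
                    if src not in sources:
                        sources.add(src)
                        frontier.append(src)
    return sources
```

For example, if Alice's pipeline first transforms a hospital data set into an intermediate result and then into her final output, the hospital can see that Alice used its data, and Alice's final result traces back through the intermediate step to the original data set.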