This blog post is the last of three in a DFG blog series dealing with the significance of artificial intelligence for good research practice and open science:
- Part I: The Role of Artificial Intelligence in Research Practice
- Part II: The Relations Between Open Science and Artificial Intelligence
- Part III: Data Tracking and Artificial Intelligence
Part III: Data Tracking and Artificial Intelligence
How can the research landscape respond to the analysis of data that may have been tracked and collected without authorization?
Two different aspects come together here. The first relates to data tracking – that is, the recording, collection, and analysis of data on individuals’ behaviour when using digital services. In the research landscape, tracking takes place, for example, during literature searches on publishers’ platforms: what search terms are used, what web pages are clicked on, who uses which publication and where? It ranges from legal forms of data tracking needed to legitimise and optimise services, through contested grey areas such as the assessment of scientific quality or predictive science, to analysis and sharing practices that violate data protection regulations and are questionable from a research ethics perspective. The boundaries are often blurred. The situation becomes critical when the collected usage data are linked to other personal data gathered “in real life” outside the system – for example by integrated service providers or commercial analysis services. The result can be individual usage profiles that allow conclusions to be drawn about persons and their preferences. Without even realizing it, researchers lose their digital anonymity. More seriously still, such data create potential for misuse, which may result in discrimination against individuals or other adverse social consequences.
What makes the problem worse is that many users are not even aware that they are being tracked, what exactly is being tracked, or why. Nor do users always give the explicit consent required under data protection law. The digitalization of research has shifted power in favour of information platforms and publishers, and it has also shifted control over researchers’ digital data and the responsibility for protecting it: away from library environments, whose high data protection standards met scientific requirements, and towards the current handling of personal usage data, which often takes place via commercial platforms and follows commercial privacy standards that are ethically questionable.
This development impacts academic freedom. Providers of academic information – commercial or non-commercial – urgently need to take action and fulfil their ethical responsibility: by ensuring transparency, complying with mandatory statutory data protection requirements, and parsimoniously collecting only those data that are necessary for the information infrastructure’s services – ideally following privacy-by-design principles. Transparency creates trust, and the academic community needs to be able to trust that academic information platforms are reliable and ethically sound.
AI and Data Tracking
The second aspect concerns the analysis of these tracking data by AI systems. Used positively, such analyses can, for example, improve service offerings, enable the creation of desired personal search profiles, or support science analytics. Used negatively, they can lead to discrimination and abuse of power, predictive science, and hidden control mechanisms in the research system. If the personal data fed into AI systems have been tracked illicitly, without the users’ consent, this is a clear violation of applicable data protection law and carries legal consequences; moreover, all results of the AI processing then rest on shaky legal ground. Transparency also plays a major role here: researchers must know and critically question what data are used for AI analyses, where those data come from, and how the analyses are conducted. Clear ethical guidelines are needed, especially for the use of personal data by AI. If these questions cannot be answered unequivocally, the basis of trust in AI systems and their analysis results will be lost.
The DFG’s Position on Data Tracking in Publishing
The increasing practice among academic publishers and platform operators of recording and analysing the usage behaviour of researchers through tracking and of sharing these data with third parties threatens the anonymity of researchers and is contrary to academic freedom. The DFG has therefore engaged intensively in recent years with the challenge of data tracking in research and has formulated clear positions and recommendations for action (DFG, 2021).
The focus is on the call for ethical reflection on this topic that goes beyond the technical and legal issues associated with tracking practices. First and foremost, transparency must be created regarding the type and scope of tracking, users must be clearly informed about the use of their data, and provider-side data protection regulations must be consistently complied with. The latter – although clearly formulated in the legislation – has often been the exception to date. Furthermore, the DFG calls for the ethical handling of data. Only absolutely essential data should be collected (data parsimony), and data protection should be integrated into the system architecture according to privacy-by-design principles. Research institutions and libraries also have a responsibility. The DFG calls on them to critically examine their contracts with publishers and platforms and to ensure that data protection and ethical standards are upheld.
The DFG plays an active part in shaping this process. For example, it set up a group of experts who introduced data protection aspects into the national DEAL negotiations for the first time (Altschaffel et al., 2024). This was pioneering work in the sense that legal frameworks were systematically examined, and, for the first time, more comprehensive data protection requirements were tabled in the negotiations and contractually agreed with the major academic publishers. Even if the contractually agreed data protection provisions do not go far enough from the perspective of the academic community, they are nonetheless a good basis for discussion. In addition, legislative improvements must be made in order to legitimately secure more researcher-friendly data protection solutions in the long term.
Critical Thinking
In both the professional and the private sphere, it is worth questioning your own reading and search behaviour in the digital space and taking a critical look at the privacy policies of online providers. This is the only way to know who is tracking what and with whom your data are being shared – and the only way to put yourself in a position to act confidently and with self-determination, to use technological tools such as blockers for additional tracking protection, or to consciously choose more data-protection-friendly alternatives to commercial and/or tracking platforms. The same applies to academic publishing. Researchers can freely choose where to publish, and ideally they should not follow established reputation mechanisms alone. Questions to ask yourself concern not only the type of licence (e.g. an open licence such as the Creative Commons attribution licence, CC BY) but also whether you want to publish with purely commercial providers or those with non-transparent tracking, or opt instead for a diamond open access venue.
Libraries and information institutions can do valuable educational work by training researchers and students in personal data protection and the ethically sound use of AI, giving technical tips on tracking protection, and providing comprehensive information services, including in the area of responsible publishing. It is important to keep sensitising researchers to this topic and raising awareness.
References
- Altschaffel, R., Beurskens, M., Dittmann, J., Horstmann, W., Kiltz, S., Lauer, G., Ludwig, J., Mittermaier, B., & Stump, K. (2024). Datentracking und DEAL – Zu den Verhandlungen 2022/2023 und den Folgen für die wissenschaftlichen Bibliotheken. RuZ – Recht und Zugang, 5(1), 23–40. https://doi.org/10.5771/2699-1284-2024-1-23
- Deutsche Forschungsgemeinschaft (DFG, German Research Foundation). (2021). Data tracking in research: Aggregation and use or sale of usage data by academic publishers. https://www.dfg.de/resource/blob/174924/d99b797724796bc1a137fe3d6858f326/datentracking-papier-en-data.pdf
Suggested citation
Bilic-Merdes, M., Brandt, S., & Lentze, M. (2025). Perspektiven der DFG auf KI und Open Access, Teil III. Datentracking und Künstliche Intelligenz. open-access.network. https://doi.org/10.64395/6kdx2-jsf90
This article is licensed under the Creative Commons Attribution 4.0 International Licence (CC BY 4.0).
