This contribution is part of a series that deals, at irregular intervals, with the topics of artificial intelligence and open access/open research.
Note: This blog post was first published in German on 4 August 2021 in the blog of iRights.info. It is part of the Creative Commons (CC) Germany FAQs (https://de.creativecommons.net/faqs/), which were created by the members of CC's German Chapter. The lead author was Fabian Rack, a lawyer with iRights.Law.
Research data and databases, too, can be made freely reusable with Creative Commons licenses. They can be used, for example, in the development of new technologies as training material for artificial intelligence or machine learning. In the fifth part of the CC FAQs, we present what needs to be taken into account when doing so.
iRights.info has been reporting and providing information on Creative Commons for years now. At irregular intervals, we present typical and frequently requested topics from the CC Germany FAQs and prepare them with various focuses.
The CC Germany FAQ page contains around 130 questions and answers in total. The present text deals with fundamental questions regarding the opening and use of data and databases by means of a Creative Commons license and the application to artificial intelligence systems.
Creative Commons (CC): Frequently asked questions (FAQs)
Around 130 CC-related FAQs and their answers have been available in German since mid-2021. Although the form and content of the CC Germany FAQs are modeled on the official U.S. CC FAQs, they address numerous particularities of German and European law.
The CC Germany FAQs are available free of charge at https://de.creativecommons.net/faqs/. For ease of reference, they are divided into five large thematic blocks:
1. About Creative Commons
2. General information about CC licenses
3. For licensors
4. For licensees
5. Databases, data, and AI
The CC Germany FAQs are themselves licensed under a Creative Commons license (CC BY 4.0). They were created by members of the German Chapter of Creative Commons. The lead author was Fabian Rack, a lawyer with iRights.Law and an author with iRights.info.
Not all research data are protected by copyright. In many disciplines, research data, and the databases to which they belong, are freely accessible anyway.
Anyone who wishes to enable the general public to freely use copyrighted (research) data and databases can make them available under a CC0 license, thereby releasing them into the public domain by waiving all copyright and related rights worldwide, insofar as that is legally possible.
But who decides whether and when a database may be released? If a CC license is applied to a database, what does that license cover? And how can CC-licensed data be used to train artificial intelligence? As a supplement to Part 4 of the CC FAQs, we have compiled details about databases and Creative Commons from the FAQs.
→ Question 5.1.7: If a CC license is applied to a database, what does that license cover? Does it also cover the respective data contained in, or the elements of, the database?
Whether the CC license applied to a database also covers the contents of that database depends on the way the licensing is implemented. Licensors may license databases as a whole – that is, both their structure and the elements they contain. However, it is also possible to license the database and the elements it contains separately, and therefore not uniformly.
Anyone who applies a CC license to a database without further specification also licenses the individual elements of that database. Thus, in the absence of further specification, the individual elements of the CC-licensed database may also be used in accordance with the license terms and conditions. Of course, the elements of a database are covered by the terms and conditions of the license only insofar as these elements are protected by copyright or related rights in the first place.
However, if licensors do not wish to license the database and the elements it contains (e.g. figures) in the same way, they must indicate this expressly. For this use case, there are also special licenses that make this explicitly clear: To license only the structure of a database but not the independent elements it contains, the Open Data Commons Attribution License (ODC-By) or the Open Data Commons Open Database License (ODbL) can be used.
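The distinction between licensing a database as a whole and licensing its structure separately from its elements can be sketched as a small metadata record. This is only an illustration under stated assumptions: the class `DatabaseLicenseNotice` and its fields are hypothetical names, not part of any CC or Open Data Commons tooling, and the fallback rule simply mirrors the answer above (without further specification, the elements share the database's license).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseLicenseNotice:
    """Hypothetical record of how a database and its elements are licensed."""

    database_license: str                    # license applied to the database
    elements_license: Optional[str] = None   # set only if elements are licensed differently
    elements_excluded: bool = False          # True if the license covers the structure only

    def license_for_elements(self) -> Optional[str]:
        """Return the license that applies to the individual elements, if any."""
        if self.elements_excluded:
            # The licensor expressly limited the license to the structure.
            return None
        # Without further specification, the elements share the database's license.
        return self.elements_license or self.database_license


# No further specification: elements are licensed like the database.
uniform = DatabaseLicenseNotice("CC BY 4.0")
print(uniform.license_for_elements())

# Structure-only licensing, as with ODbL: elements are not covered.
structure_only = DatabaseLicenseNotice("ODbL 1.0", elements_excluded=True)
print(structure_only.license_for_elements())
```

Whether an element is protected by copyright or related rights at all is a legal question that such a record cannot answer; it only documents what the licensor has declared.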
→ Question 5.1.8: Who decides whether a database may be shared under a CC license?
As with all other protected subject matter, the rights holder decides whether a database may be shared under a CC license. Where a database falls under sui generis protection, the rights holder is the database maker. In contrast to a (database) work, the rights holder is not always the person, or the group of persons, who created the database. Rather, the database maker is whoever made the investment in the procurement, checking, or presentation of the collection of database elements.
If you wish to publish the database under a CC license, and you are not the database maker yourself, you must therefore ensure that you obtain the necessary rights from the database maker.
When sharing a database, you may also have to take care of the rights regarding the individual elements contained in the database: If these elements are protected in their own right, and third parties have rights in them, these parties must also grant the necessary rights so that the elements contained in the database can be licensed (together with the structure). If you do not have the necessary rights, you must explicitly exclude such elements from the CC license under which the database is released. However, this should be avoided, if possible, as it restricts the reuse of the database (content).
→ Question 5.2.1: Can CC-licensed content be used in the development of new technologies as training material for artificial intelligence/machine learning?
Yes, the CC licenses are also designed for such uses. The uses allowed under the CC license terms and conditions are so broad that they are also open to new technologies. That is one of the great advantages of CC licenses. However, this use may be permitted even without a license.
If copyrighted content must be copied, adapted, or shared when inputting training material for AI applications, this is covered by the CC licenses – with the respective restrictions if commercial use (NC) or derivatives (ND) are prohibited.
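When assembling training material from many CC-licensed sources, the NC and ND restrictions mentioned above can be flagged automatically before a human review. The following is a minimal sketch, assuming license codes written in the usual form such as "CC BY-NC-ND 4.0"; the names `PlannedUse`, `license_modules`, and `restrictions_hit` are hypothetical, and the check looks only at the license modules, not at uses that may be permitted by law regardless of the license.

```python
from __future__ import annotations

from dataclasses import dataclass

KNOWN_MODULES = {"BY", "NC", "ND", "SA"}


@dataclass(frozen=True)
class PlannedUse:
    """Hypothetical description of how the training material will be used."""

    commercial: bool       # is the use commercial?
    adapts_content: bool   # does the use involve derivatives of the content?


def license_modules(license_code: str) -> set[str]:
    """Extract module abbreviations from a code such as 'CC BY-NC-ND 4.0'."""
    tokens = license_code.upper().replace("-", " ").split()
    return {token for token in tokens if token in KNOWN_MODULES}


def restrictions_hit(license_code: str, use: PlannedUse) -> list[str]:
    """List the license modules that the planned use may conflict with."""
    modules = license_modules(license_code)
    hits = []
    if use.commercial and "NC" in modules:
        hits.append("NC")
    if use.adapts_content and "ND" in modules:
        hits.append("ND")
    return hits


use = PlannedUse(commercial=True, adapts_content=True)
for code in ["CC BY 4.0", "CC BY-NC 4.0", "CC BY-NC-ND 4.0", "CC0 1.0"]:
    print(code, restrictions_hit(code, use))
```

A non-empty result is a prompt for closer legal review rather than a verdict: whether a given processing step actually counts as commercial use or as a derivative depends on the facts of the case.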
However, it is quite possible that these forms of use are permitted by law. If this is the case, the CC license is no longer relevant. In Germany, for example, there is a limitation [on the rights of database makers] for the purposes of text and data mining, which permits copies and adaptations of copyrighted content for non-commercial purposes. In this case, users are not obliged to adhere to the license terms and conditions, because the restrictions contained in these terms and conditions do not apply in the case of uses permitted by law. Due to a reform of EU copyright law, further new lawfully permitted uses have recently (as of June 9, 2021) been created for mining, which also cover commercial environments.
→ Question 5.2.2: Besides copyright, what further rights of others do I have to consider in the case of training material for machine learning?
The training material that you use or the results that you generate may affect data protection rights and privacy and publicity rights or may conflict with ethical research standards. The CC licenses do not contain any provisions for these aspects, because the permissions for use that they grant are aimed solely at protection under copyright law.
The aforementioned four questions and answers are from the Creative Commons Germany FAQs (authors: Rack/Jaeger/Klimpel/Kreutzer/Weitzmann) and are licensed under a CC BY 4.0 license. The selection of FAQs for this post was made by the editors of iRights.info (El-Auwad/Fischer).
Overview: CC FAQs on iRights.info
Do you have any questions or uncertainties about Creative Commons licenses? If so, the CC Germany FAQs can help! iRights.info offers a seven-part overview:
- Part 1: Why Creative Commons licenses are needed and how exactly they work
- Part 2: Correctly understanding and applying Creative Commons license modules – Example: Attribution (CC-BY)
- Part 3: Combining Creative Commons license modules correctly – Particularities of the NC (non-commercial) module
- Part 4: Databases and Creative Commons licenses: What must you pay attention to?
- Part 5: Data and Creative Commons licenses – Training material for artificial intelligence
- Part 6: Creative Commons: What do I do if license violations occur? How do I enforce my rights?
- Part 7: How do Creative Commons [licenses] relate to the public domain and open access?
Also interesting: The iRights.info dossier on Creative Commons with many helpful tips and texts.