Copyleft CCAI License Extends to Use in AI Model Training

A group of researchers from Yale University has proposed (PDF) a new type of open license, CCAI (Contextual Copyleft AI), which extends copyleft to generative AI models.

The core idea of CCAI is that using content released under this license as machine-learning training data causes the copyleft conditions to apply to the resulting generative AI models. The authors hope the license will help curb abuse in AI projects and prevent the spread of pseudo-open AI models that are marketed as open source but, because their training data and training tools are withheld, remain tied to their vendor.

CCAI stipulates that any distribution or publication of exact copies or modified derivative works of CCAI-licensed material must be made under the same license terms, without imposing additional restrictions. This requirement applies to any AI model, dataset or AI system trained using CCAI-licensed software or its output.

For generative AI models trained on covered material, CCAI requires disclosure of the model’s source code, a detailed description of the training data, and the model’s parameters, weights and architecture.

CCAI can also be used as an additional requirement attached to existing copyleft licenses such as AGPLv3. In that form, it extends the license to training datasets, training code and model weights, in line with the Open Source Initiative (OSI) criteria for the openness of AI systems. Code distributed under such a license may be used to train an AI model only if all users are given a description of the training dataset, the code used to train the model, and the trained model itself.

The text of the additional requirement reads: “When using software to train, optimize, or create any machine learning model or generative AI system, any resulting model, dataset, or system must be published on terms consistent with the requirements of this license. This requirement covers providing access to the code used for training, a description of the data used in training, and the parameters and architecture of the model. Providing access to the model or the result of its work over