In episode 15 of Canadian IP Voices, we were joined by Paul Gagnon and Misha Benjamin, experts on how to protect data and artificial intelligence (AI). In the episode, Paul and Misha discuss:
- how AI systems are built
- how data is used to train AI systems
- copyright concerns regarding the ownership of AI and associated data
Understanding AI systems
When we think of AI, we may picture a robot performing human tasks. In some cases, that's not so far from reality! However, AI is a lot more prevalent in our society than you probably think. For example, you may be surprised to learn that when your phone uses facial recognition, it's using AI to do so.
AI systems are able to accomplish complex tasks thanks to the data they're being fed. To recognize your face, your phone's software was "taught" to recognize and distinguish cues like the shape of your eye and its iris. But before it recognized you, it had to be taught how to determine what is a face, then to distinguish a nose, eyes, iris, mouth and so on.
AI systems and data
AI systems are software programs that learn by analyzing large amounts of data. First, the data must be prepared and organized so that it can be used to deliver the intended outcome. It's during this data organization period (sometimes called the "cleanup") that labeling work is often required. For example, if the data consists of images of faces, a person would select suitable images and label this raw data with informative tags to point out the nose, eyes, iris, etc. As Misha explains, this is one of the key steps, and it's often the most time-consuming part of the AI creation process.
Next, the best-suited algorithm must be selected. This important task requires the software creator to choose the best learning method for the AI system. Once the right algorithm is selected, the training process begins with the hope that the AI system can learn and become "intelligent". Once the system has learned to recognize a face, nose, eyes and so on, the data and labels are no longer needed and are not part of the resulting AI system.
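The steps described above (labeled data, a chosen learning method, training) can be sketched in a few lines of Python. This is a minimal illustration only, not how any real facial recognition system is built: the two numeric "features", the labels and the choice of a simple perceptron as the learning method are all assumptions made for the example. What it shows is that the trained system ends up as a handful of numbers, and that it keeps working after the labeled data is discarded.

```python
# Hypothetical labeled data: each example is (features, label).
# The two features are made-up measurements; label 1 = "face", 0 = "not a face".
labeled_data = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.1, 0.2), 0),
    ((0.2, 0.1), 0),
]


def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Learn weights from labeled examples with the perceptron update rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # 0 when the guess was right
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    # The resulting "AI system" is only these learned numbers.
    return {"weights": weights, "bias": bias}


def predict(model, features):
    """Classify new features using only the learned weights and bias."""
    activation = sum(w * x for w, x in zip(model["weights"], features))
    return 1 if activation + model["bias"] > 0 else 0


model = train_perceptron(labeled_data)
del labeled_data  # the training data and labels are no longer needed

print(predict(model, (0.85, 0.9)))  # prints 1 ("face")
print(predict(model, (0.15, 0.1)))  # prints 0 ("not a face")
```

Real systems use far larger datasets and far more complex learning methods, but the same separation holds: the data is an input to training, while the trained model is a separate artifact.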
From a copyright perspective, the data used in AI systems presents some challenges.
Collection and use of data
As expressed by Misha, the collection and use of data is essential to the AI learning process. Without data, an AI system cannot learn and loses its value. Although AI systems copy tremendous amounts of data, current legislation remains unclear about which data can be used.
Practices like "data scraping" and using publicly available data are commonly used to create training sets. Although data is not copyright protected per se, the data scraping process often involves the reproduction of large quantities of text, photos and compilations of data to extract relevant data and information. Some of this material may be copyright protected, and extracting data from it can be problematic since copyright law normally requires permission from copyright owners for the use of their protected material. However, getting such permission is a real challenge considering the overwhelming amount of data that is required in an AI learning process.
In Canada, the Copyright Act does not address the use of data by AI systems or the terminology related to it.
As Paul Gagnon expresses, "if you were to do a plain reading of the Copyright Act, then you would see that all of this AI training process falls into a legislative grey zone. The use of data that's being made by AI systems doesn't fit neatly into any of the definitions contained in the Act. This means that from a legal standpoint, it currently is not clear what leg AI developers are standing on."
Applicability of the fair dealing exception
Sections 29, 29.1 and 29.2 of the Copyright Act provide an exception allowing for the use and reproduction of copyright-protected material without permission from the copyright owner 1) for certain purposes (i.e., research, private study, education, parody or satire, criticism or review, and news reporting); and 2) under the condition that the dealing is fair. The analysis of fairness is fact-specific and depends on the balancing of a range of factors developed by the courts (including the purpose, character and amount of the dealing, and the alternatives to it).
Many are still wondering whether AI's learning process could fall under this exception, notably under the research purpose. Even if the use were for an allowable purpose, it would still need to pass the second part of the test and be considered fair.
If the courts or Parliament were to clarify that the fair dealing exception applies to AI systems, stakeholder views would surely be divided. On the one hand, allowing AI systems to make free use of copyright-protected material could promote innovation and make more training data available. On the other hand, copyright owners could have concerns about control over their creations. After all, their creations are the result of their hard work, and they would be interested in compensation for the use of them.
As of today, the fair dealing exception in relation to AI remains untested in Canadian courts.
The importance of standardized language in licence drafting
For Paul and Misha, the importance of better-adapted terminology when drafting licences for AI systems cannot be overstated, because this is how owners of data for AI systems can control how the data is used.
As Paul explains, AI creators will often draft a licence that allows for the "use" of the work without actually detailing what that notion of use means in relation to its data. Many of them will borrow licence language from open source software. But this is problematic because these licences may imply that both the source code and the data can be freely redistributed, for any purpose. This may not be what the data creators had intended. Therefore, as licences are binding contracts, their language should take into account the many intricacies of how data can be used.
To offer a solution, Misha and Paul worked together with leading AI experts and co-authored a paper titled Towards Standardization of Data Licenses: The Montreal Data License, which offers guidelines on the different use cases and rights that can be granted or withheld with respect to data.
As Paul says, "the current terminology is really not giving the right level of control over what can actually be done because there are different interests in the copyrighted data. You should be able to modulate those permissions and give more specifics about what can and can't be done. That's the spirit behind the guidelines created by Misha and I."
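To make the idea of "modulating" permissions more concrete, here is a minimal sketch in Python of a licence that grants or withholds individual rights instead of a single blanket "use". The field names and the three rights shown are hypothetical and chosen only for illustration; they are not the actual terms or categories of the Montreal Data License.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLicence:
    """Hypothetical licence: each right is granted or withheld separately."""
    may_train_models: bool
    may_use_commercially: bool
    may_redistribute_data: bool


@dataclass(frozen=True)
class IntendedUse:
    """Hypothetical description of what a licensee wants to do with the data."""
    trains_model: bool
    is_commercial: bool
    redistributes_data: bool


def is_permitted(licence, use):
    """A use is permitted only if every right it needs has been granted."""
    checks = [
        (use.trains_model, licence.may_train_models),
        (use.is_commercial, licence.may_use_commercially),
        (use.redistributes_data, licence.may_redistribute_data),
    ]
    return all(granted for needed, granted in checks if needed)


# Example: data may be used to train models, but not commercially,
# and the data itself may not be passed on.
research_only = DataLicence(
    may_train_models=True,
    may_use_commercially=False,
    may_redistribute_data=False,
)

academic_project = IntendedUse(
    trains_model=True, is_commercial=False, redistributes_data=False)
paid_product = IntendedUse(
    trains_model=True, is_commercial=True, redistributes_data=False)

print(is_permitted(research_only, academic_project))  # prints True
print(is_permitted(research_only, paid_product))      # prints False
```

A licence that only says the data may be "used" cannot express the difference between these two cases, which is the gap the guidelines aim to close.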
AI-generated work: Uncertainty regarding copyright protection and ownership
Another challenge is the ownership of AI-generated works. AI is taking on the art industry. AI systems are creating images, songs, poems, books, etc. In 2017, an AI called Bot Dylan wrote folk music after analyzing and learning from 23,000 Irish songs. Not only was it the best-named bot, its music was quite catchy too. This surprised many who did not think that machines could create good music.
To find out more about intellectual property in Canada and about copyright issues surrounding AI systems and machine learning, listen to episode 15 of Canadian IP Voices with Paul Gagnon and Misha Benjamin.
2021 Government consultation
In 2021, Canada conducted a Consultation on a Modern Copyright Framework for Artificial Intelligence and the Internet of Things. The consultation notably sought stakeholder views on whether to make changes to Canada's copyright framework to better support text and data mining and to clarify authorship and ownership of works created by AI, as well as copyright liability regarding AI. The Government of Canada is currently reviewing the submissions provided by stakeholders and evaluating options for the next steps.