The information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages and privacy requirements.
Submissions added after the initial posting on June 19, 2024:
E
Entertainment Software Association of Canada
Technical Evidence
Q – How does your organization access and collect copyright-protected content, and encode it in training datasets?
A – Video game companies can be sources for generative AI training input, creators of generative AI output, developers of generative AI models and users of third-party generative AI software tools. For example, some companies use image, text, and code generation tools (including proprietary, licensed third-party and open-source software) for purposes such as content creation, ideation, concept testing and development, generating mock virtual worlds, and generating computer-controlled character dialogue.
Q – In your area of knowledge or organization, what is the involvement of humans in the development of AI systems?
A – Humans are always involved in the development of AI systems related to the creation of games. There are myriad uses of AI in video games that would evidence a level of creativity that meets the current legal standard for copyrightability, i.e., of a human who exercises skill and judgment in creating the work. For example, some game companies use image, text, and code generation tools (including proprietary, licensed third-party and open-source software) for purposes such as content creation, ideation, concept testing and development, generating mock virtual worlds, and generating computer-controlled character dialogue.
Copyright eligibility for machine-assisted output with the requisite human creative contribution is the right policy result. It incentivizes creativity and innovation which, in turn, spurs the creation of new types of work. We recommend that the government avoid rigid rules for determining human authorship and instead look at whether the human's use of AI is done with sufficient creative control and in such a way that the output reflects the product of human creativity. The focus should be on both the quantitative and qualitative aspects of human contribution.
Q – How do businesses and consumers use AI systems and AI-assisted and AI-generated content in your area of knowledge, work, or organization?
A – AI technology has been deployed in games for over two decades as a useful tool for a variety of purposes, such as background and terrain generation, processing or analysis of data within the game, or quality control. Although certain uses of generative AI have raised questions of copyrightability and authorship, it is important to remember that AI technology is, and should be treated as, any other software tool with respect to copyright protection.
Generative AI technology gives video game companies the opportunity to elevate the game experience for players and be responsive to their expectations while supporting the programmers, artists, writers, musicians, and others that are integral to game development in ways that allow those creatives to focus less on tedious tasks and more on meaningful projects that will ultimately enrich the gameplay experience.
Video game companies have long utilized AI within games and consider generative AI to be instrumental for developing and operating the next generation of video games. AI is used to improve content creation, art generation, animation, sound and music, natural language processing (for example, natural speech and responses from non-player characters within the game), as well as automating repetitive and tedious tasks on the developer side.
Some ESAC members use generative AI tools to accomplish all the tasks listed above in a much more efficient manner. Current generative AI tools have the potential to improve workflow and reduce game development costs. If game development becomes easier, companies will seek to do more and be more innovative and productive.
Generative AI tools also allow artists to spend more time on the truly creative aspects of making in-game artwork, freeing up time from more tedious, less creative tasks, such as fleshing out backgrounds once the artistic direction has been set. The script for an open-world game usually runs to more than 100,000 lines of dialogue and is often supported in at least five languages. One example of the use of generative AI in games is a scriptwriting tool that frees writers to spend their valuable time on the core game plot and narrative rather than on "barks", a term for non-player character (NPC) dialogue, which is often short and tedious to write. Intelligent barks can nonetheless be a central feature of player immersion in video games: the more responsive they are to players, the better and more realistic the gameplay experience.
Text and Data Mining
Q – What would more clarity around copyright and TDM in Canada mean for the AI industry and the creative industry?
A – The current legal framework has been designed to adapt to new technologies and has been able to work through unique questions that come up as a result of new technologies. In light of this, the existing legal framework should have a chance to work.
Q – If the Government were to amend the Act to clarify the scope of permissible TDM activities, what should be its scope and safeguards? What would be the expected impact of such an exception on your industry and activities?
A – The current legal framework has been designed to adapt to new technologies and has been able to work through unique questions that come up as a result of new technologies. In light of this, the existing legal framework should have a chance to work.
Q – Should there be any obligations on AI developers to keep records of or disclose what copyright-protected content was used in the training of AI systems?
A – Consistent with our position that the government should encourage a robust marketplace for emerging technologies, such as generative AI, we believe that the mandated disclosure of copyrighted works used in machine learning needs careful consideration and a balancing of priorities, especially when the AI developer owns or licenses the works at issue or the resulting output, or when mandated disclosure could jeopardize confidential information, trade secrets or other protected data.
The objectives of such a requirement will differ in different contexts, such as closed AI models versus AI models used by the general public. If the purpose is to enable copyright owners to enforce their exclusive rights, then there would appear to be little justification for imposing a transparency and disclosure requirement on a developer of a non-public-facing AI system that is trained on the developer's own works or on internal, licensed or legally accessed data, or on an implementer of a foundation model that fine-tunes the model on its own works. In these types of situations, transparency and disclosure mandates should not apply.
Transparency and record-keeping mandates with respect to generative AI models also raise questions of feasibility. Any requirements should be narrowly tailored to their particular purpose. Training materials for foundation models may constitute millions, or even billions, of data entries, the maintenance of which becomes onerous for developers. And to the extent that a developer or publisher is also a user of open-source or other third-party software, an attenuated chain of responsibility becomes burdensome and does not substantially advance the goals that spurred the demand for such mandates in the first place. We therefore recommend that any such mandates consider both feasibility and relevance to the objective they seek to achieve.
Authorship and Ownership of Works Generated by AI
Q – Should the Government propose any clarification or modification of the copyright ownership and authorship regimes in light of AI-assisted or AI-generated works? If so, how?
A – No legislative change is needed. We think that a work containing AI-assisted content should be eligible for copyright protection, and the current legislation permits this. Copyright laws and regulations are already drafted to address the advent of new technologies. ISED should refrain from taking categorical and formalistic approaches to the eligibility of AI-assisted content for copyright protection, especially while the technology is still evolving. We also ask that the government be wary of calls to mandate either the marking of expressive works or new disclosure requirements that could result in the disclosure of confidential information, and instead allow different creative industries to take the approach that works best for their stakeholders.
We believe that if there is sufficient human contribution to either the input or the output, consistent with existing legal principles, the resulting work should be eligible for copyright protection.
We recommend that the government avoid rigid rules for determining human authorship and instead look at whether the human's use of AI is done with sufficient creative control and in such a way that the output reflects the product of human creativity. There are myriad uses of AI in video games that would evidence a level of creativity that meets the current legal standard. The focus should be on both the quantitative and qualitative aspects of human contribution.
Q – Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?
A – No. The UK is unique in providing protection for computer-generated works where there is no human author; we neither recommend nor see a need for this approach. We recommend that the government avoid rigid rules for determining human authorship and instead look at whether the human's use of AI is done with sufficient creative control and in such a way that the output reflects the product of human creativity. There are myriad uses of AI in video games that would evidence a level of creativity that meets the current legal standard.
We believe that if there is sufficient human contribution to either the input or the output, consistent with existing legal principles, the resulting work should be eligible for copyright protection.
Infringement and Liability regarding AI
Q – Are there concerns about existing legal tests for demonstrating that an AI-generated work infringes copyright (e.g., AI-generated works including complete reproductions or a substantial part of the works that were used in TDM, licensed or otherwise)?
A – No. The current law is sufficient and we do not think new legislation on copyright liability is needed (at this time). Current principles of copyright liability can and do apply to new technologies and new technological uses. Canadian copyright law remains an adequate framework within which to analyze legal questions involving generative AI, such as authorship, ownership, and liability for infringement, given the current state of the technology.
At this point, there is no reason to believe that existing statutory and common law doctrines based on fact-intensive inquiry are insufficient to address complex questions of access to content/training data, protection and ownership of the resulting output and use.
Q – Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?
A – No. The current law is sufficient and we do not think new legislation on copyright liability is needed (at this time). Current principles of copyright liability can and do apply to new technologies and new technological uses. Canadian copyright law remains an adequate framework within which to analyze legal questions involving generative AI, such as authorship, ownership, and liability for infringement, given the current state of the technology.
At this point, there is no reason to believe that existing statutory and common law doctrines based on fact-intensive inquiry are insufficient to address complex questions of access to content/training data, protection and ownership of the resulting output and use.
Comments and Suggestions
N/A
F
Fédération professionnelle des journalistes du Québec
Technical Evidence
N/A
Text and Data Mining
N/A
Authorship and Ownership of Works Generated by AI
N/A
Infringement and Liability regarding AI
N/A
Comments and Suggestions
The development of digital tools is transforming day-to-day operations in many industries, particularly the media.
The capture of the advertising market by the digital giants has been disrupting the business models of many media outlets for just over a decade.
The hearings held during the study of Bill C-18 illustrated, among other things, the media's efforts to transform and adapt, as well as the urgent need for a fair financial contribution from the digital giants to the media.
One of these digital giants, Google, has reached an agreement with Ottawa worth $100 million annually. A step in the right direction, in the FPJQ's view, but one that remains incomplete, since Meta still refuses to contribute and to resume sharing media content on its platforms.
Meta, and particularly its Facebook platform, is for many Quebecers and Canadians the main gateway to journalistic content. This blockade, in addition to depriving media outlets of potential revenue, deprives citizens of vital information.
Meanwhile, media outlets are ceasing operations, job losses now number in the thousands, and several regions are at risk of becoming news deserts. This hemorrhage has dragged on for more than ten years and is harming Canadian democracy.
It is in this context that consumer generative artificial intelligence (AI) systems entered the market in 2022.
As the Consultation on Copyright in the Age of Generative Artificial Intelligence paper notes:
- Generative AI systems operate by means of models trained on vast datasets, whether of text, images or other types of data. Through various machine learning techniques, these models build a representation of the patterns identified in the training data.
One of the most important data sources for the pioneers of generative artificial intelligence (AI) systems remains content produced by the media, outlets already strained by a financial crisis.
At the FPJQ's most recent annual conference, an entire workshop examined the ethical and intellectual property issues related to generative artificial intelligence (AI).
At the FPJQ, we believe it is essential that the review of the Copyright Act include a remuneration mechanism for the use of media data in a Canadian context, not just an American one.
Already in the United States, at a recent Senate hearing on the impact of AI on journalism, legislators agreed that OpenAI and others should compensate media outlets for the use of their work in AI projects.
In December, The New York Times filed suit in federal court in New York against OpenAI, creator of the ChatGPT software, and Microsoft, its principal investor, for copyright infringement.
Furthermore, the FPJQ believes that the review of the Copyright Act must also ensure the traceability of the training data used in generative artificial intelligence (AI) systems. We must know where the data used to build these systems, and the models on which they rely, come from. This traceability would also make it possible to better assess the quality of content produced by generative AI.
The viability of news businesses, and of democracy itself, is at stake.
The Fédération professionnelle des journalistes du Québec is a non-profit organization that brings together approximately 1,600 journalists from more than 250 print and electronic media outlets, making it the largest and most representative journalistic organization in Canada.
FICPI Canada
Technical Evidence
FICPI appreciates that the Government has solicited diverse views in the present Consultation and, as such, does not wish to express views on aspects of the questionnaire that do not relate specifically to those in the IP profession or to IP policy in general. In particular, FICPI declines to respond to the "Technical Evidence" questions.
Text and Data Mining
The first category of the Consultation relates to text and data mining (TDM) and the training of machine learning models using, inter alia, copyright-protected content.
The Government has indicated an interest in maintaining balance between innovation and incentives to creativity and asks whether amendments should be introduced in the Act to clarify how the copyright framework applies to TDM activities and, if so, what those amendments should be.
The Consultation first queries "what would more clarity around copyright and TDM in Canada mean for the AI industry and the creative industry?". FICPI appreciates that increased clarity and thus certainty would likely be favoured by industry. However, FICPI stresses that the Copyright Act should not be amended to address all specific circumstances or technologies, and must maintain flexibility to permit the courts to address future technological development. As recent history has proven, technological progress is increasingly rapid and any attempts to address specific technological innovation in the Copyright Act would inevitably lead to the need to constantly amend the Act to again catch up with technology.
Rather, FICPI is of the view that the Copyright Act already addresses copyright concerns relating to generative AI, such that legislative amendments to the Act are not yet required.
The Consultation suggests that two existing exceptions to copyright infringement could potentially apply in the context of TDM activities: 1) the fair dealing exception for research in section 29; and, 2) the exception for temporary reproductions for technological processes in section 30.71.
The Consultation notes that certain respondents in the earlier 2021 consultation favoured expansion of the fair dealing provision. FICPI, however, stresses that any further amendments to the fair dealing provisions should be considered only insofar as they would increase harmonization with international norms (i.e., to bring fair dealing closer to other jurisdictions' fair use provisions) generally, and not in response to specific technological innovations.
In other words, FICPI endorses the view of several respondents to the 2021 consultation who noted that the Copyright Act already fairly balances interests between creators and users and contains broad provisions that enable courts to make reasonable decisions to maintain that balance. If TDM activities are being undertaken for purposes relating to research, for example, those TDM activities would be non-infringing if deemed to fall under section 29 of the Act.
Similarly, FICPI encourages the Government to allow courts to determine if particular TDM activities would be permitted under Section 30.71. However, FICPI would oppose statutory amendment of Section 30.71 for the mere purpose of addressing TDM.
FICPI notes, however, that Section 30.71(c) requires that "the reproduction exists only for the duration of the technological process"; in certain cases where generative AI systems are continuously trained, there may be an argument that the duration of the technological process is indefinite.
The Consultation has asked several other questions better directed to the AI industry and creators, rather than IP practitioners. These include questions of available licences, record-keeping requirements and remuneration. FICPI declines to respond directly to these issues but urges the Government to recognize that developers of generative AI tools may be foreign entities using Canadian-owned copyrighted content, Canadian entities using foreign-owned content, or a blend of these. Therefore, to the extent possible, the Government must take a position that does not disadvantage Canadians and, preferably, seeks to adopt an approach that is in harmony with its international counterparts.
Authorship and Ownership of Works Generated by AI
The second category relates to issues of granting copyright to AI-generated content, and furthermore to questions of authorship and ownership of such content.
As with its suggested approach in respect of TDM, FICPI is of the view that the Copyright Act already addresses authorship and ownership of content created wholly or in part by generative AI.
As the consultation notes, "Canadian copyright jurisprudence suggests that 'authorship' must be attributed to a natural person who exercises skill and judgment in creating the work..."
The threshold applied by courts in determining whether "skill and judgment" was exercised in the creation of a particular work remains applicable and sufficient in the consideration of generative AI. As the consultation notes, in many current generative AI systems, "[t]he creation of a work or other subject matter by AI may involve some degree of human input, either by programmers or users instructing an AI application to perform its task." These inputs are colloquially referred to as "prompts" to the AI application.
The current Act and existing jurisprudence therefore already empower courts to consider whether those who prompt a system could be considered authors. In particular, courts are already empowered to evaluate whether a specific prompt, or set of prompts, required the human to exercise sufficient skill and judgment in the creation of the work.
To the extent that a work was wholly created by generative AI without any prompt at all, it remains possible for a party to argue that a human was involved in the creation of the AI application itself. Again, FICPI believes it should be left to the courts to determine whether such humans exercised skill and judgment attributable to the creation of the work. This evaluation can already be conducted under the existing provisions of the Act.
Furthermore, if the Government were to enact a legislative bar to copyright protection for generated works, this would inevitably cause a lack of clarity in the marketplace. Generative AI systems are already sufficiently sophisticated to create works that appear to be human-created, and this level of sophistication will only increase. Users would therefore be in the unfortunate position of not knowing which works are protected by copyright and which are not. Furthermore, there would inevitably be infringers of copyrighted works who deny infringement by arguing that a work could have been generated by AI, which would place an undue burden on creators to prove a work was not created by AI.
FICPI therefore urges the government to maintain the status quo and allow courts to address these issues as cases arise.
Infringement and Liability regarding AI
The third category relates "to the use and commercialization of AI systems and the liability for any infringement that occurs. This includes questions regarding the determination of the persons liable when AI-generated outputs infringe copyright-protected works."
The Consultation notes that "[g]iven the novelty of AI technologies, Canadian courts have not yet rendered decisions regarding liability for infringement that may result from the use of AI, either through the inputs used to train an AI or through the outputs generated by an AI system in the form of works."
The Consultation also notes that the Government has received very little stakeholder feedback on this discussion topic.
As mentioned earlier, FICPI strongly urges the Government to permit issues to actually arise, and to allow courts to render decisions under the existing provisions of the Copyright Act before enacting potentially unnecessary amendments thereto.
With respect to the specific proposals discussed in the Consultation, FICPI suggests that any additional burden (e.g., record keeping) imposed outside of international norms would likely decrease Canadian competitiveness.
Comments and Suggestions
This response to the Consultation paper: Consultation on Copyright in the Age of Generative Artificial Intelligence (the "Consultation"), is being submitted on behalf of FICPI Canada.
As you are aware, our organization, FICPI (the Fédération Internationale des Conseils en Propriété Intellectuelle), comprises more than 5000 intellectual property attorneys in private practice in 86 countries. FICPI Canada is a self-governing national association of FICPI and represents the interests of Canadian intellectual property professionals. Our membership includes senior professionals at most major Canadian intellectual property firms. Our clients span all types and sizes of businesses, including multi-national corporations, small and medium size enterprises, and individuals.
The Consultation
Thank you for taking the time to consult with stakeholders in respect of Canada's copyright framework and its applicability to generative AI technology.
The Consultation was distributed via the Innovation, Science & Economic Development (ISED) website. The website also explains that "The Government continues working toward amending the Copyright Act for the benefit of all Canadians."
The Consultation considers three primary topics: Section 2.1 relates to text and data mining (TDM); Section 2.2 relates to authorship and ownership of works generated by AI; and Section 2.3 relates to infringement and liability regarding AI.
The Consultation notes that "[the Government of] Canada remains mindful of approaches taken by its international partners that could serve the needs of a functioning copyright marketplace" and that "the Government will aim to balance two main objectives: a) To support innovation and investment in AI and other digital and emerging technologies in all sectors in Canada… b) To support Canada's creative industries and preserve the incentive to create and invest provided by the rights set out in the Canadian Copyright Act (the Act), including to be adequately remunerated for the use of their works or other copyright subject matter."
FICPI endorses the Government's objective to balance technological innovation and investment with the rights of creators.
FICPI believes that it is premature to consider amendments to the Copyright Act merely to address concerns relating to generative AI. FICPI is of the view that the Copyright Act already addresses copyright concerns relating to generative AI, and that it already empowers courts to balance competing interests to determine how to address specific issues arising in the use of generative AI. FICPI urges the Government to allow the courts to exercise their role respecting statutory interpretation before pre-emptively circumscribing specific generative AI exceptions within the Act.
We would welcome the opportunity to participate in future consultations. While FICPI opposes statutory amendments at this time, it welcomes consultation on any proposed amendments. We thank you again for the opportunity to participate in the consultation process and for considering these remarks.
Please do not hesitate to contact us should you have any questions.
G
Michael Geist
Technical Evidence
N/A
Text and Data Mining
The inclusion of an explicit exception for text and data mining (sometimes referred to as informational analysis) within the Copyright Act’s fair dealing provisions is long overdue. The adoption of a specific text and data mining exception is consistent with the 2019 Copyright Act review, which extensively examined the issue and recommended:
The evidence persuaded the Committee that facilitating the informational analysis of lawfully acquired copyrighted content could help Canada’s promising future in artificial intelligence become reality. The Committee therefore recommends:
Recommendation 23
That the Government of Canada introduce legislation to amend the Copyright Act to facilitate the use of a work or other subject-matter for the purpose of informational analysis.
The federal government has invested millions to support research and commercialization of AI in the hopes of making it a world leader. However, the current state of Canadian copyright law undermines this investment by inhibiting innovation through the creation of legal uncertainty and high barriers faced by the very groups the investment aimed to attract.
AI research works by making machines smart. Whether the focus is automated translation, big data analytics, or new search capabilities, it depends on data being fed into the system. Machines learn by scanning, reading, listening to, or viewing human-created works. The better the inputs, the better the outputs; and the more inputs there are, the lower the likelihood of biased or inaccurate results.
Canadian copyright law inhibits this because restrictive rules limit the data sets that can be used for machine learning purposes, resulting in fewer pictures to scan, videos to watch, or text to analyze. Without a clear rule to permit machine learning, the Canadian legal framework trails behind other countries that have reduced risks associated with using data sets in AI activities in a manner that fairly treats both innovators and creators. Under the Canadian system, researchers either must risk copyright infringement by using protected works to make their machines smarter (which has a chilling effect on innovation), or severely limit the data sets used, thereby producing less “smart” machines than would be possible under a more open copyright regime. This raises concerns of bias and discrimination.
Within the current framework, the fair dealing rules provide some protections and allow for some use of copyrighted work by AI companies without permission. Canadian courts have ruled that it is a right that should be interpreted in a broad and liberal manner and there are several purposes that would permit some text and data mining activities – notably exceptions for research, education, and private study.
The corporations and high-profile talent attracted by the investment in the Canadian AI system have been calling for such an exception for many years. In 2018, various technology groups noted that the current uncertainties in the Copyright Act limit the ability of Canadian companies to “access a basic necessary resource to train their algorithms”. Indeed, as of the time of this writing, of the 121 companies on the Government of Canada’s approved AI vendor list, 87 are Canadian, with almost all other vendors coming from competing nations with TDM exemptions. Canada is one of the top countries in the world for AI research talent, with rates of growth currently exceeding those of the United States, the UK, Germany, France, and Italy. Unfortunately, we lag behind in AI commercialization. A clearly articulated copyright framework that allows TDM for commercial use is an important step toward changing that tide.
Internationally, other countries have addressed this issue through text and data mining exceptions. In the above-mentioned discussion, Microsoft noted that there is broad acceptance of text and data mining exceptions around the world and that Canada is poised to fall behind and be at a global disadvantage without one. They also noted that, unsurprisingly, the countries that have adopted or are considering these exceptions are at the forefront of research involving data analytics and artificial intelligence.
Countries such as the United States and Israel have elected to open up their fair use provision to exempt text and data mining activities, with Israel’s Attorney General issuing guidance to inform the exception. Others have created specific exceptions. Examples of such exceptions can be found in Japan, the European Union, the United Kingdom, and Singapore.
The EU enacted a mandatory exception for text and data mining for the purposes of scientific research, alongside a broader exception from which rightsholders are permitted to “contract out”. The U.K.’s exception allows copies of works to be made without permission of the copyright owner for the purposes of automated analytical techniques to analyze text and data for patterns, trends, and other information. The law does not allow contracts to restrict data mining activities, but the exception is limited to non-commercial research. The approach adopted by Singapore specifically exempts text and data mining in their Copyright Act by exempting copies of works wherein the “copy is made for the purpose of computational data analysis; or preparing the work or recording for computational data analysis.”
To avoid falling behind internationally and to position itself as a world leader, Canada needs to adopt a broad exception for text and data mining. Ultimately, machine learning does not harm the primary purposes of the original work: the goal is not to republish or compete with copyrighted materials, but to ensure that researchers and AI companies can mine text and data for informational analysis purposes. Including commercial uses in the exception will therefore not harm rights holders and will facilitate Canadian innovation.
Recognizing that there are concerns about the scope of a text and data mining exception, the government should consider including transparency requirements alongside the exception. There is a need for all stakeholders – copyright owners, users, and the broader Canadian public – to have easy access to disclosures about what content has been used in training AI systems. A mandatory transparency requirement would be akin to the attribution requirements in some fair dealing exceptions. By providing attribution in the form of transparent disclosures, the text and data mining exception would enable machine learning while also providing necessary safeguards for creators to better monitor and respond to the permitted use of their works within this exception.
Authorship and Ownership of Works Generated by AI
There is no significant uncertainty at the moment and no need for legislative reform.
Infringement and Liability regarding AI
Inclusion of Copyright Materials in LLMs
The inclusion of copyright materials in LLMs has emerged as a major source of concern for some rights holders, who argue that their rights are being infringed upon by virtue of the inclusion of their works without permission. I believe it is premature to introduce legislative reforms on the use of copyright works within LLMs. Indeed, notwithstanding the calls for immediate legislative reform, I believe that there are better approaches that balance the copyright concerns with the policy goals of developing beneficial generative AI systems that may support a wide range of activities including education, health care, and commerce. As the UK Minister for AI and intellectual property recently noted, there is no rush to regulate the AI field.
First, there are myriad cases currently before the courts worldwide. These cases are likely to provide a first analysis of many of the copyright-related concerns with the inclusion of copyright materials within LLMs. For example, in Andersen v Stability AI Ltd, there are four different claims: infringement of copyright; removal of copyright management information under the DMCA; right-of-publicity claims alleging that the defendants knowingly used the plaintiffs’ names and artistic identities in their products (by allowing a user to request art in a specific artist’s style); and unfair competition claims under the Lanham Act for the use of their art for commercial gain without permission or proper attribution. In Authors Guild v OpenAI Inc, the claims focus on the use of published works to train LLMs, alleging that training involves reproducing the works and that this reproduction is central to the quality of the OpenAI product.
There are many other cases that will canvass these claims. Generative AI companies will likely point to uses that do not infringe copyright, the inclusion of materials not subject to copyright protection, and the temporary nature of the reproductions, largely for statistical and analytical purposes. The AI companies typically do not reproduce actual full text of the underlying materials found in the LLMs.
Given the legal uncertainties, it would be premature to intervene with legislative reform at this time. Rather, the government should maintain a watching brief on the litigation to see how these cases unfold and whether reforms may be required. Intervention with legislative reform runs the risk of altering or undermining both creator and user rights as the technology continues to evolve, market-based solutions emerge, and courts address the application of LLMs to current copyright laws. Rather than leaping into reforms that may have negative effects and entrench the power of a handful of AI and tech companies, it is preferable to better understand how the law has been applied to LLMs and generative AI tools and then identify potential gaps or reform solutions.
Second, while allowing the litigation process to unfold, the government could encourage several private sector developments to the benefit of all stakeholders. These include greater transparency about which materials are included within LLMs, akin to an attribution requirement for some fair dealing purposes. It could also include work toward an AI version of the robots.txt standard for data scraping. While the robots.txt standard has worked well for decades within the context of Internet search, there are other considerations in generative AI. Unlike search, generative AI tools may not direct the end user to the original source material, suggesting the mutual benefit in search may not be replicated in AI. As the legal process unfolds, a new standard specific to generative AI and LLMs is needed to allow rights holders to opt out of the inclusion of their works within LLMs.
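For illustration, some AI providers have already published crawler tokens that rights holders can address through the existing robots.txt mechanism. A sketch of such an opt-out file might read as follows (GPTBot and Google-Extended are tokens that OpenAI and Google, respectively, have announced; coverage and behaviour vary by provider):

```text
# Allow conventional search indexing to continue
User-agent: Googlebot
Disallow:

# Opt out of AI training crawlers that honour robots.txt
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

A limitation of this approach is that compliance is voluntary and reaches only crawlers that identify themselves, which is part of why a purpose-built standard specific to generative AI is needed.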
Third, the government moved quickly last year to develop Generative AI guidelines that address issues related to transparency, security, and fair practices. While there were some concerns expressed with how the guidelines were drafted, they provide a useful starting point for governing activities in the sector. Since the guidelines are only effective if implemented, the government should actively ensure that they are respected by AI companies and work to identify whether further provisions or amendments are needed.
Outputs of Generative AI Systems
Similar to the analysis on the inputs into generative AI systems, the outputs are also the subject of litigation. While there are some cases involving questions regarding the copyrightability of machine-generated works, the more notable issue at the moment is whether works created by a generative AI system may infringe copyright if they appear to replicate an original work that may have been included within an LLM.
However, notwithstanding some fears expressed in the media, replication is likely rare given the vast amounts of data used to train an AI system. For example, a study on GitHub’s Copilot found that reproduction of code took place in only 0.1% of cases. A study on replication in the LAION Aesthetics dataset, which includes 12 million images, found that 1.88% of random outputs had a high similarity score with the training material, which was considered a high incidence rate among current studies. The study noted that the reason for the high similarity results was the prevalence of popular images in the dataset. The study also used a small sample set, covering only 0.6% of LAION’s training data.
In sum, it is fair to say that there is a small degree of memorization, which can lead to replication of certain popular sources across AI models, but current studies reveal that the rate is quite low. The process of training an AI necessarily involves breaking large quantities of data apart, clustering similar items together, and passing them through a noise filter. At the end of this process, little is left of the original work in the AI model, with some exceptions.
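The replication studies mentioned above rest on automated similarity scoring. As a minimal illustration only (not the studies' actual methodology, which relies on learned image and code embeddings), a crude text-based check might compare overlapping word n-grams between a model output and a training source:

```python
def ngrams(text, n=5):
    """Split a text into its set of overlapping word n-grams."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(output, source, n=5):
    """Jaccard similarity between the n-gram sets of two texts (0.0 to 1.0)."""
    a, b = ngrams(output, n), ngrams(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

training_text = "the quick brown fox jumps over the lazy dog near the river bank"

# Verbatim replication scores 1.0; an unrelated output scores 0.0.
assert similarity(training_text, training_text) == 1.0
assert similarity("a slow red hen walks under the busy bridge", training_text) == 0.0
```

A study would flag outputs whose score exceeds a chosen threshold; the low incidence rates reported above reflect how rarely real outputs cross such thresholds.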
The technological realities of generative AI suggest that infringing outputs are likely very rare. The government should not intervene with legislative reform to address what may be a non-issue. Rather, it should await the outcomes of litigation that may examine these issues in greater detail. Intervening at this premature stage could harm creator rights, the development of AI technologies, and Canada’s competitiveness in a rapidly growing sector.
Comments and Suggestions
I am a law professor at the University of Ottawa where I hold the Canada Research Chair in Internet and E-commerce Law and serve as a member of the Centre for Law, Technology and Society. I focus on the intersection between law and technology with an emphasis on digital policies. I have edited multiple texts on Canadian copyright law and appeared many times before House of Commons committees on copyright law and policy. I submit these comments in a personal capacity representing only my own views.
The consultation raises several questions related to generative AI and copyright. I have focused on three in this submission:
(1) Should Canada proceed with a text and data mining exception as recommended in the 2019 Copyright Act review?
(2) Should Canada introduce legislative reforms to address the use of copyright works in large language models (LLMs) that are central to the development of generative AI technologies?
(3) Should Canada introduce legislative reforms to address copyright-related questions arising from the outputs of generative AI systems?
My submission argues as follows:
1. Introducing a text and data mining exception into Canadian copyright law is long overdue and should proceed as a copyright reform priority. Similar provisions are widely used in other jurisdictions. Proceeding with the exception would ensure that Canada implements a copyright framework for AI that encourages innovation and investment while also providing appropriate protections for creators.
2. It is premature to introduce legislative reforms on the use of copyright works within LLMs. While there are both technical and copyright related issues related to LLMs, the copyright issue is currently before courts around the world in multiple cases that raise questions related to inclusion of copyrighted works within LLMs, whether such use constitutes infringement, and the potential application of limitations and exceptions. Given that these issues should become clearer as those cases progress, the government should maintain a watching brief to determine how the cases before the courts develop, whether market-based licensing alternatives emerge, and how the technology adjusts to reflect copyright-related concerns.
3. It is similarly premature to introduce legislative reforms to address the outputs of generative AI systems. While many have expressed concerns about the occasional similarities between generative AI outputs and copyrighted works that may have been included within LLMs, a deeper dive into the technology suggests that infringement is very rare. The courts will again be called upon to examine these claims and the government should await those outcomes before proceeding with any potential legislative reforms.
I also note that the next statutorily mandated Copyright Act review is due. Before proceeding with any reforms, it would be useful to conduct an assessment of the implementation of the recommendations from the last review conducted by the Standing Committee on Industry, Science and Technology and scope out a future review to update on outstanding issues and address emerging ones such as generative AI.
Daniel Gervais
Technical Evidence
N/A
Text and Data Mining
As I see it, Canadian policy should ideally aim to achieve three objectives.
First, it should ensure the proper development of the AI industry in Canada. Second, it should ensure that Canadian “content” is properly reflected in AI training datasets for Large Language Models (LLMs), especially those used by Canadians. Third, it should ensure that Canadian creators are fairly compensated for the use of their copyright works, whether in Canada or elsewhere. These objectives are not presented in any kind of hierarchical order.
To begin with what I consider low-hanging fruit, a transparency obligation concerning the data used to train AI models would allow Canadians and the Government of Canada to know whether those objectives are being achieved. The recently adopted EU AI Act may provide a useful precedent in that regard.
Many LLMs are trained on available online material, sometimes any material that the AI system can locate. The status of online material is often misunderstood. A copyright work that is available online is not “copyright-free” unless it is licensed under those terms, or in the public domain. Some amateur content uploaded to various platforms and services is licensed under broad Terms of Service that allow reuse for several purposes, but those T&Cs must be considered case by case. Some of those T&Cs may allow reproduction for training. As most of the material was presumably made available before the emergence of LLMs, the idea that it is all subject to an implied license for LLM training strikes me as at least anachronistic.
Beyond publicly available material, there are troubling reports like this one (https://aicopyright.substack.com/p/the-books-used-to-train-llms) of datasets of books available electronically that were never authorized by rightsholders and where the “this was freely available” argument fails--independently of its legal value. Having said that, however, in my view the need for a new exception for TDM has not been demonstrated.
First, bargaining around the issue may require a better understanding of where fair dealing ends, a determination that courts have not yet made.
Second, even if and when a specific exception were added to the Act, Canadian courts may fall back on fair dealing, as happened in educational photocopying cases despite the existence of a specific exception.
Third, an exception allowing any and all use of copyright material for AI training purposes would fail to achieve the third objective identified above--and also the second objective unless it was accompanied by a transparency obligation that cannot be circumvented by relying instead on fair dealing. Moreover, a new exception is a policy risk in that it is almost certain to produce unintended effects. A *limitation* on rights (instead of a full exception) is appropriate, but its objective and scope should both be well delineated. Given the large number of small, medium, and large companies operating now or likely to do so in the future in the AI/TDM space and the even larger number of potentially affected copyright and related rights holders, individual transactions between rightsholders and users will not suffice. Transaction costs would be insurmountable. This supports the case for a limitation. The limitation that users require would limit their liability and allow them to proceed without having to ask permission.
Naturally, users--even those with multibillion-dollar budgets--also want it all for free, but that would fail to achieve the third objective. The limitation should achieve all three objectives and include fair compensation for creators.
In thinking about the contours of such a limitation, four points should be made.
First, as I have explained in several publications (e.g., Collective Management of Copyright and Related Rights, 3rd edn, chapter 1), if a collective licensing system is in place, then functionally rightsholders do not (and probably cannot) say no despite the existence of an exclusive right. Why would they? Hence, this is the functional (though not the legal) equivalent of a compulsory license. Put differently, if a collective management approach is in place, licenses with worldwide effect could be made available, rendering a new exception or limitation unnecessary. The license could not only compensate creators, it could also set parameters for fair reporting of material used, thereby ensuring transparency.
Second, a statutory limitation could be appropriate, with or without voluntary collective licenses. Canada could innovate by recognizing a right for creators to be compensated for the use of protected material for text and data mining, as Canada did in 1997 with private copying. The argument that the distribution of remuneration of that nature to creators is a "black box" can be easily refuted. I can provide additional information in that respect if that would be useful.
Third, any limitation must factor in Canada's international obligations under the Berne Convention, the WCT, the WPPT, and the TRIPS Agreement, including the three-step test. I won’t elaborate on this here but could do so separately. I have also published extensively on this topic.
Fourth and finally, the deeper question concerning a possible limitation is the nature of the right that would form the basis for a right to remuneration--whether as a voluntary license and/or a statutory limitation of creators' rights. There is little doubt that the training of LLMs often requires the making of a local copy of the training material. This is prima facie infringement in the jurisdiction where it is taking place, subject to applicable exceptions and limitations. This is a potential area for a licensing solution, even--I might say especially--when parties disagree about the exact scope of existing exceptions, as this would avoid years of uncertainty and litigation costs.
It must be underscored that, though LLMs prima facie infringe when they ingest copyright material, subject to fair dealing, even if this matter is resolved in a way that ensures proper payment for creators, the "ingestion" process is unlikely to function as a source of ongoing remuneration. Large AI models will likely ingest any given element once or very few times. Larger models may then be made available to other AI companies as an "infrastructural layer". As there is only one human timeline, once most or all copyright material (roughly literary and artistic works from the past century) has been ingested, it will be difficult to justify ongoing royalties sufficient to allow creators to continue to do what they do. A separate area for potential licenses is the creation of outputs based on training data that is protected by copyright, a matter to which I return in my answers to the following questions.
A last point is that the copying of material for the training of LLMs either omits or deletes copyright management information. This would seem to constitute a second, separate source of liability. It could also be managed, however, under the terms of a license that would contain proper parameters for the lawful use of protected material.
Authorship and Ownership of Works Generated by AI
I do not believe there is much real uncertainty about most of the issues identified in the questionnaire.
First, it is beyond cavil that historically copyright (droit d'auteur) has always depended on originality generated by (and only by) humans. Second, although what LLMs produce looks like literary and artistic works, the human creative process is completely different from the process used by LLMs to produce their outputs. Third--and here I want to take a very broad view before returning to the copyright-specific questions because I consider this background essential to debate the issue--recognizing machines as authors would greatly accelerate the replacement of human journalists, writers, songwriters, and artists, causing potentially irreversible damage to human progress. As we assess the impact of LLMs on human evolution and the evolution of ideas, we should at the very least not push the LLM gas pedal to the floor because there might be a cliff ahead. This is not the same issue as cases of genuine human-machine collaboration, a matter to which I return below.
To be clear, this is not opposition to AI or LLM technology in any way. There are areas in which this technology will produce almost entirely beneficial results for humans, such as medical research--for example, its ability to identify new molecules and predict their efficacy, or to correlate certain genetic or other biomarkers and specific diseases (though I suspect the same predictive capabilities of the technology will be used by insurance companies to deny coverage to many Canadians without legislative interventions).
For purposes of this response, however, it is important to note that the easiest task for LLMs is to mimic certain outputs of the human creative process. This does have some benefits in the short term, such as allowing people to improve a draft text they've written. However, as research in neuroscience and psychology has shown, once a cognitive task has been delegated to a machine, humans quickly lose the ability to perform it well. For example, people who learned to drive before the omnipresence of GPS can probably still do so in areas they knew before they started relying on a GPS, but depend on a GPS for most other trips.
Professional creators of many forms of literary and artistic works, from journalists to songwriters to filmmakers to visual artists, learn by watching and getting feedback from more experienced creators.
Accelerating the replacement of those humans by removing their ability to earn a decent living because machines can mimic the format of their creative endeavors cheaper and faster is fraught with potentially catastrophic risks. Imagine a world where all or almost all music is machine-produced or a world in which Canadians get all their news from machines, not other humans (journalists).
As I wrote in an Essay published a few years ago:
"If machines can produce […] literary and artistic works cheaper and faster than human creators, it a highly likely that industry will favor them over their human counterparts. In the copyright sphere, delegating to machines the task of helping us understand and interpret our world has profound consequences. It is through this interpretation that humans can become true agents in the world and ultimately change it. Delegating this very task to machines is thus pregnant with implications for the future for it changes its arc. It will not be complete obliteration of course. There will always be humans who write, pick up a paintbrush, or try to make a movie or sculpture, but if most of what we are given to read, watch or listen to comes from machines, much will be lost. If copyright protection is granted on outputs without a human cause, and assuming that the cost of machine productions will be lower (and machines will not ask for ongoing royalty payments or have reversion rights) then market forces will inescapably push for a replacement of human authors whenever it is commercially feasible." D. Gervais,
The Human Cause, in Research Handbook on Intellectual Property and Artificial Intelligence (R. Abbott, ed), (Edward Edgar, 2022) pp 21-38.
I stand by that statement.
I would add that copyright is an incentive, and I am not aware of convincing data to show a major crisis of underinvestment in Generative AI.
In crafting the best policy, therefore, I urge the government to resist the commonly held view that any and all disruption caused by AI companies is per se positive and must be allowed by law, and instead consider that a diminution of the percentage of works created by humans available in commerce, from journalism to essays to novels, is not necessarily a clear positive. At the very least, it should not be accelerated by providing copyright rights to autonomous machine outputs.

Arguments in opposition to the approach outlined above usually begin with the observation that it takes a lot of time and money to train an AI system. Fair enough. Yet it also took time and money to produce thousands of telephone books (getting the data from every subscriber, arranging it, and then printing and distributing the books), but that was not a basis for copyright protection. Nor is Einstein's E=mc2 protected by copyright (or any form of IP for that matter). A related argument is that because some AI outputs have commercial value, they should be protected. This is plainly wrong. Courts have long recognized that "what is worth copying is worth protecting" is not a correct statement of the law. A third argument is that LLMs should be considered authors because they mimic human creativity by creating outputs that may look like they could have been produced by a human author, occasionally very well. I submit that mimicry is not a sound policy basis for a claim to a right. Then, if rights in machine outputs were recognized, who would be the rightsholder? The owner of the machine? The company that programmed the algorithm (and there could be many, such as the company that created the model and the company that reused and adapted it)? The company responsible for training? The person(s) who prompted the machine?
Recognizing one over the other is difficult, but applying the notion of joint authorship in that context is doctrinally wrong (for these purported authors have not in fact collaborated).
That being said, the authorship issue is a sliding scale in this context. A work can be produced by an author with the assistance of AI tools, as opposed to outputs produced autonomously by the machine, what I call outputs without a (sufficient) human cause. If a work has sufficient human authorship, then the use of AI tools should not prevent the copyrightability of the work, though if the machine’s contribution is separable, then a question can be raised about the copyrightability of the machine-produced portions. I discuss this in more detail in The Human Cause chapter cited above.
Finally, the question of the copyrightability of prompts is interesting. A detailed prompt (long enough and with sufficient originality) might be considered as text protected by copyright (literary work). The more interesting question is whether authoring (or “engineering”) a prompt means that the prompt “engineer” is the author of the resultant output. In almost every case, the answer should be negative.
One can imagine situations, however, in which a series of consecutive detailed prompts could contain expressions of specific ideas that reflect human creative choices directly perceptible in the machine’s output, in which case the argument that the prompts’ originality may have “transferred” to the output could at least be made with some credibility. I see those situations as exceptional, however. But to avoid confusion, one must be very clear. Even if the prompt(s) contains detailed *ideas* that are reflected in the machine's output, ideas are not protected by copyright. Thus, one must look for expression in the prompt that transferred to the output, for example, an instruction to use some specific, original text.
In summary, granting rights to machine productions would be a major doctrinal jump and a normative error. It would be the first type of second-degree intellectual property--exclusive rights not to something humans have made, but to what was made by what they have made. I see no reason to change copyright law so fundamentally without a very compelling reason, taking into account the major risks to human progress that any acceleration of the replacement of human authors is likely to cause.
Infringement and Liability regarding AI
LLMs necessarily produce outputs based on their training data. If that training data consists (entirely or even in part) of material protected by copyright or a related right, then that material is undeniably the basis for the output. That does not mean that machines infringe, however.
The principal rights to discuss are the rights of reproduction, translation, and adaptation.
The test for infringement of the right of reproduction is well known. The reproduction right is infringed only if a substantial part of a protected work is copied. This allows, for example, short quotes (at least of a larger work) to be taken without permission. Substantiality relates more to the quality than the quantity of what was taken. To determine whether someone has copied a work, a three-part test is applied. The plaintiff must establish: first, that they created or otherwise own a copyright in a protected work; second, objective similarity between the plaintiff’s work and the defendant’s product (whether the defendant’s product is itself possibly a copyright work is immaterial); and third, that the defendant had access to the plaintiff’s work. The first question can be answered quickly and easily for most of the material protected by copyright used for LLM training purposes. Similarly, because there is at least some transparency (more would be far better) in terms of the material used, the third question can also be answered quickly and easily in the vast majority of cases. The key debate is, therefore, almost always about the second part of the test.
The question is whether the output infringes the reproduction right in one or more identifiable pre-existing works. This test has been applied by Canadian courts for decades. The fact that a potentially infringing output was generated by a machine does not and should not change the applicable test.
Although any infringement that does occur would most likely involve a work contained in the training dataset, it is certainly conceivable that an LLM could produce an output that copies a work not contained in its training dataset. The same three-part test should apply.
Whether an output infringes the rights of adaptation or translation also requires identifying which preexisting work(s) was infringed. As the majority of the Supreme Court of Canada explained in Théberge:
"[W]hile there is no explicit and independent concept of 'derivative work' in our Act, the words 'produce or reproduce the work ... in any material form whatever' in s. 3(1) confers on artists and authors the exclusive right to control the preparation of derivative works such as the union leaflet incorporating and multiplying the Michelin man in the Michelin case, supra. [...] In King Features Syndicate Inc. v. O .and M Kleemann Ltd., [1941] A.C. 417 (H.L.), under a provision in the English Act similar to s. 3(1) of our Act, the plaintiff's copyright in the cartoon character 'Popeye the Sailor' was held to be infringed by an unauthorized doll, i.e., the two dimensional character was reproduced without authorization in a new three-dimensional form."
Again, whether the output was produced autonomously by the machine, by a human, or by a machine-human "collaborative" effort, the infringement analysis remains essentially the same.
I note that several major providers of AI services--specifically LLMs--have offered indemnifications to their users. Those indemnifications should be evaluated very cautiously. The legal text supporting the indemnifications for possibly infringing outputs often contains significant exclusions. For example, OpenAI’s Terms of Service (version dated November 6, 2023) exclude indemnification where “(i) Customer or Customer’s End Users knew or should have known the Output was infringing or likely to infringe, (ii) Customer or Customer’s End Users disabled, ignored, or did not use any relevant citation, filtering or safety features or restrictions provided by OpenAI, (iii) Output was modified, transformed, or used in combination with products or services not provided by or on behalf of OpenAI, (iv) Customer or its End Users did not have the right to use the Input or fine-tuning files to generate the allegedly infringing Output, (v) the claim alleges violation of trademark or related rights based on Customer’s or its End Users’ use of Output in trade or commerce, and (vi) the allegedly infringing Output is from content from a Third Party Offering.” Whether the “should have known” clause imposes a duty on users to check whether a particular output may be infringing is unclear, but the standard is certainly open to various interpretations. Moreover, excluding any material that the user modified is noteworthy, as many users are likely to at least tweak the machine’s output. Would something like a format change be sufficient to exclude the application of the protection? Indemnifications offered by Google and Microsoft similarly contain important limitations, for good reasons.
Google’s indemnity clause for example excludes customer uses “after receiving notice of an infringement claim.” There is ample support, therefore, for the claim in a recent Forbes article that “if you read the fine print, the protections offered are narrower than what’s suggested by the PR.” Brad Stone, "AI Legal Protections May Not Save You from Getting Sued", Forbes, 13 November 2023.
I have also noted the promotion of automated filtering meant to prevent infringement. I am skeptical.
Yet, humans often disagree and must rely on long litigation to decide whether a particular literary or artistic production infringes rights in one or more pre-existing works, often because of the fuzzy contours of fair dealing. How can this be automated with sufficiently high accuracy that a user could safely rely on it?
In sum, although in common parlance LLM outputs are necessarily *based on* the material contained in their training dataset, it does not follow that every output will infringe a copyright or related right in that material. As explained above, rights in one or more identifiable works must be infringed. This means that fair remuneration for creators in terms of outputs is likely to be difficult to achieve, with the strongest basis at this juncture being the avoidance or resolution of litigation. As discussed in answers to previous questions, LLMs will infringe when they ingest copyright material, subject to fair dealing. Even if this matter is resolved in a way that ensures proper payment for creators, the "ingestion" process is unlikely to function as a source of ongoing remuneration. Larger LLMs will likely ingest a specific element only once or very few times.
A potential solution to consider would be the creation of a right to remuneration based on the use of tokenized copyright works and objects of related rights in LLM datasets. I do not have the space here to elaborate more fully on this possibility, but I would be happy to do so separately in due time. Ideally, such an approach would reflect a multilateral consensus on the importance of human creators. A quick turnaround by the international community after a major technological breakthrough is far from unprecedented. In 1996, barely two years after such a major shock (the invention of the HTTP protocol and the "World Wide Web"), two new treaties were adopted under the aegis of WIPO. The emergence of LLMs is at least as important a technological change, and arguably a far bigger one.
Comments and Suggestions
N/A
Goodbot Society
Technical Evidence
N/A
Text and Data Mining
Clarity on regulatory requirements on copyright and TDM in Canada can allow the AI and creative industries to develop and implement practices that account for the interests of the public and end users. TDM activities are conducted widely in Canada in the public and private sectors. Applications are broad and can impact end users in many critical ways.
In the public sector, law enforcement, health organizations, research institutions and nonprofits engage in data mining and analysis using TDM to generate insights and inform decision making on a range of issues. See Palantir's contract with Defense http://tinyurl.com/mtstbp42; the OPP's use of Palantir http://tinyurl.com/5h22wren; the Canadian military's data mining of social media http://tinyurl.com/5xyzyyza; PHAC's mining of social media http://tinyurl.com/2vdnfu9c; data mining by universities and nonprofits including Population Data BC http://tinyurl.com/4hzee8a4, Cybera https://tinyurl.com/3uenysvb, University of Toronto http://tinyurl.com/bpn62ks3 and University of Waterloo http://tinyurl.com/4f7cwmzd.
In the private sector, companies collect and use data about customers to generate insights into consumer behaviour, while data brokers collect and use data to generate insights that are sold to other companies. See Loblaws http://tinyurl.com/4t7vxhsh and Canadian Internet Policy and Public Interest Clinic's draft report on Canada's data brokerage industry http://tinyurl.com/23uarcup.
Unfortunately, resources that support TDM activities and their products are often not freely available to the public. Archives of academic research, legal information, financial information and news are typically paywalled and available only through a few major publishers, with little competition. Market dominance allows publishers to bargain for licensing and assignment rights that perpetuate monopolies over information. Sarah Lamdan's book Data Cartels (Stanford University Press, 2022) describes the monopolization of information by US companies such as RELX and Thomson Reuters, while Tim Ribaric notes similar trends in Canada http://tinyurl.com/2razce4u. Major publishers often use this data to perform TDM to generate insights which are sold to other companies (see Data Cartels, and LexisNexis' Canadian websites, such as Intelligize http://tinyurl.com/4t649y78).
In addition, the largest data mining companies typically serve only large clients such as corporations, universities, law enforcement, and governments (see the websites of the largest data mining companies for their primary customers http://tinyurl.com/yc8d3fva). The Copyright Act supports copyright over code and data compilations that meet the test of originality (see the synopsis of the Federal Court of Appeal's decision in 2017 FCA 236 on database copyright http://tinyurl.com/3rjwpff2) and is likely to be used by companies to protect products in ways that severely limit end users' access.
Even for organizations that can afford TDM services, licenses are expensive, sometimes costing millions for a single archive (Data Cartels). Even then, licences do not necessarily provide end users with desired access to conduct TDM (e.g. formatted data, data analysis outside TDM vendor environments).
TDM services are typically fee-based licenses that provide access to (see Peter McCracken and Emma Raub http://tinyurl.com/mwrmpk4s):
1. copyrighted data made available by a vendor in formats amenable to TDM activities
2. data that a vendor has copyright in and that can be downloaded for use in TDM activities in any environment, but which is not necessarily formatted for TDM
3. third-party copyrighted data aggregated by a vendor, where end users can perform TDM activities using copyrighted data that must be performed within a vendor's environment due to restrictions from third-party copyright holders, and
4. proprietary data formatted by the TDM vendor where data can only be accessed and TDM activity can only be performed within a vendor environment.
The latter two licenses are the ones predominantly described in the literature, and by creating closed TDM ecosystems they limit the scope of TDM activities that can be performed.
To protect end users' interests, Canada should amend the Copyright Act to clarify the scope of permissible TDM activities and the safeguards that should be in place. In doing so, Canada should consider how to address the following concerns:
- AI systems should be transparent. AI developers should be obligated to both keep records of and disclose what copyright-protected content is used in training AI systems. Moreover, end users should be able to access and review sources relied on by generative AI models such as chatbots.
- Products intended for TDM activities involving personal information, or products created by performing TDM on data that includes personal information, should be strictly regulated in terms of the copyright protections extended to creators. Serious consideration should be given to whether it is consistent with the public interest to profit from products using personal information (PI) at all, even if the PI is de-identified or anonymized.
- Copyright laws that specifically regulate databases are required. At present, the Copyright Act protects databases independently created by the author that display a threshold minimum degree of skill, judgment and labour in their overall selection or arrangement. However, this threshold minimum is unclear. Given that databases are crucial for performing TDM activities and require intensive resources to maintain, economic incentives such as those provided by copyright may be required to sustain databases; however, economic incentives must be balanced with the need for information to remain a public resource. The Copyright Act should be amended to provide guidance on when a database will be subject to protection, to ensure copyright cannot be used to limit end users' access to databases that are not truly entitled to copyright protection.
- The Copyright Act does not enshrine access to copyrighted materials for public purposes as a policy goal, but Canada has an obligation to ensure copyright does not stymie flows of information in the public interest. For example, while fair dealing currently places the burden of proof on end users to prove they did not infringe copyright, the Copyright Act should grant explicit end-user rights to use copyrighted material for permitted purposes (e.g. research, policy-making).
- Copyright law should preserve and protect information as a public resource. This should encompass an explicit set of end-user purposes including research, teaching and disseminating current affairs. It should also include the ability to understand and challenge automated or AI-enabled decisions made by organizations that can substantially impact the lived experiences of end users, such as access to or denial of resources and services. As such, all academic research, legal and financial information and news should be made readily available to end users, without prohibitive cost or effort.
- Even if Canada does not agree that the Copyright Act should explicitly protect information as a public resource, end users should have a right to access, without cost, copyrighted materials used in TDM activities and AI-enabled decision tools that impact end users, including to remedy harms. A right to access is distinct from a fair dealing exception, which does not guarantee access to copyrighted materials and instead places the burden on end users to prove they accessed copyrighted materials for a purpose protected by fair dealing. When it comes to understanding and challenging AI-facilitated decision-making processes about themselves, an end user should not have to worry about, first, how they can access the copyrighted materials used in that decision-making, and second, whether they are complying with copyright laws when accessing those materials. (See this article on the federal government's guidelines on employees' use of AI for critical concerns: http://tinyurl.com/3dhnw7sd)
- Different classes of creators have differing interests, levels of bargaining power, and classes of works, but copyright laws do not distinguish between these classes of creators or works. Consideration should be given to whether current assignment and licensing regimes adequately protect all classes of creators (e.g. individual authors versus a publishing conglomerate) and adequately balance the public interest in the free flow of information (e.g. should assignment and licensing terms be solely determined by negotiations between the copyright owner and the assignee/licensee, or should there be mandated considerations in those negotiations, such as sufficient public access), and whether the same copyright protections should extend to all classes of works (e.g. should research papers be subject to the same length of copyright as a novel).
- Copyright law should work in tandem with other areas of law such as patent, privacy, competition and contract law to protect and preserve information as a public resource.
As Canada considers copyright and TDM policy, discussions in the US and EU are instructive.
1. The EU Database Protection Directive is instructive in terms of how not to balance economic incentives for database creation and maintenance with public access, having led to overprotection of databases (see Pamela Andanda http://tinyurl.com/a84vy9sv and David Freedman http://tinyurl.com/rcrj729n).
2. Both the US and the EU have created exceptions to copyright law for TDM performed for the purposes of scholarly research and teaching, and of scientific research, respectively. Exceptions are listed in the US under the Digital Millennium Copyright Act (DMCA) and in the EU under the Directive on Copyright in the Digital Single Market (see the Authors Alliance petition to the Library of Congress on renewing DMCA exemptions http://tinyurl.com/2xvyrwaw and Thomas Margoni and Martin Kretschmer's work on the EU Directive http://tinyurl.com/23nakewm).
Authorship and Ownership of Works Generated by AI
GoodBot submits that the Canadian copyright ownership and authorship regimes in respect of AI-assisted or AI-generated works lack clarity. This lack of clarity results in uncertainty and risk for an end user who is consuming and using such works.
Clarification is required both at the legislative level in the Copyright Act and in Copyright Office procedure. Both copyright authorship and ownership should be restricted to natural persons, with artificial intelligence (AI) characterised as a tool that supports the creative process of such a natural person.
The copyright regime is designed to protect copyright owners while promoting creativity and the orderly exchange of ideas (http://tinyurl.com/ypcyb8z5). This is an incentive structure suited for human behaviour and should be limited to natural persons. Since AI, as a tool embodied in a machine or software, cannot be incentivized in the same manner as a human, it is therefore not reasonable to allow AI to be copyright authors or owners.
To give context to the discussion below, it is important to distinguish between copyright authorship and ownership. Subject to the exceptions provided within the Copyright Act, the author of a work is generally the first owner of the copyright, although copyright ownership is assignable (Copyright Act, s. 13). While the Copyright Act does not define "author" specifically, the term generally refers to the person who created the work, or the person who first put it in a fixed form (Gould Estate v Stoddart Publishing Co Ltd (1998), 39 OR 555 (Ont CA)). The Copyright Act already alludes to the concept of authorship being tied to a natural person, given that the Act ties the term of copyright protection to the life and death of an author (Copyright Act, s. 23), and Canadian courts have interpreted this as meaning that an author must be a natural person or human being (See for example P.S. Knight Co Ltd v Canadian Standards Association, 2018 FCA 222, at para 147; Setanta Sport Limited v 2049630 Ontario Inc (Verde Minho Tapas & Lounge), 2007 FC 899, at para 4). Under the Copyright Act, copyright protects "original" expression (Copyright Act, s. 5). To be an original expression or original work, the work must be an exercise of an author's skill and judgment in expression of their ideas (CCH Canadian Ltd v Law Society of Upper Canada, 2004 SCC 13).
To restrict copyright authorship to a natural person, the definition of "original" work in section 5 of the Copyright Act should be amended to define the nature and extent of human input or contribution required for a work to be "original" – namely, that an "author" must be a natural person or human being – thus reinforcing the notion at case law that the skill and judgment required in creating the work must be exercised by a natural person.
By restricting copyright authorship to a natural person, a question that must be addressed is what nature or quality of human contribution is required for copyright authorship. For example, do the human contributions of the creator of an AI tool contribute to authorship? Does the human user of AI perform authorship in creating a work?
GoodBot is of the view that characterising AI as a tool can provide a helpful starting point for determining the quality of human contribution required for copyright authorship (for example, the US District Court (for the District of Columbia) has referred to the use of AI by artists "in their toolbox", see Stephen Thaler v. Shira Perlmutter and The United States Copyright Office (1:22-cv-01564) http://tinyurl.com/2496aatk). By characterising AI as a tool, the human creator of the AI tool is not a part of the creative process of creating an original work, and thus is not an author of the original work. Copyright authorship in an AI-assisted work would thus fall with the AI user who creates the original work.
Copyright authorship not being available to the creator of an AI tool provides a clear benefit to the end-user – these works being free from copyright allows for them to be freely used and reused by anyone. This is in contrast to an unfair monopoly that would result from the creator of an AI tool having ownership in every piece of work produced using that AI tool. The copyright lies with the user, namely, the author who used the AI to create the work.
In the US, the District Court held that the US Copyright Act requires human authorship and therefore only protects works of "human creation" (Stephen Thaler v. Shira Perlmutter and The United States Copyright Office (1:22-cv-01564) http://tinyurl.com/2496aatk).
In Europe, a work can be copyrighted if the work is the "author's own intellectual creation" (Copyright Directive (2001/29/EC), as interpreted by a number of Court of Justice of the European Union (CJEU) cases starting with Infopaq in 2009 http://tinyurl.com/yeyb2nkp). While not harmonized under the current EU copyright framework (EU copyright law nowhere expressly states that copyright requires a human creator), it is generally understood by commentators that some level of human contribution is required for copyright authorship (See, for example, Hugenholtz, P.B., Quintais, J.P. Copyright and Artificial Creation: Does EU Copyright Law Protect AI-Assisted Output?. IIC 52, 1190–1216 (2021). http://tinyurl.com/ymn8nskz).
With respect to the quantity of human contribution that is required for copyright authorship, the extent of human input that is necessary to rise to the level of authorship needs to be clearly defined, allowing for a distinction to be made between work that is "AI-assisted", containing both human and AI-generated contributions, and "AI-generated" work that is wholly created by an AI tool.
In the US, the US Copyright Office has issued guidance indicating that for those seeking registration of their copyrighted works, any more than a "de minimis" amount of AI-generated content in the work must be disclosed and disclaimed, and thus, excluded from copyright registration (Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence http://tinyurl.com/3ru54v65). At the time of writing, however, it is not clear how much human contribution or authorship is required for copyright registration in the US, nor has a clear standard been defined for a "de minimis" amount of AI-generated content.
The US Copyright Office has held that entering a series of prompts into an AI system does not make someone an author (Letter from the Copyright Review Board to Tamara Pester, Esq., (5 Sep 2023) "Second Request for Reconsideration for Refusal to Register Théâtre D'opéra Spatial SR #1-11743923581 http://tinyurl.com/e75bsdte).
By contrast, in China, a recent Beijing Internet Court decision indicates that prompt engineering, parameter setting and output selection of a generative AI system require originality of the author, and therefore the work resulting from such intellectual input is subject to copyright protection (China's First Case on Copyrightability of AI-Generated Picture, Seagull Song http://tinyurl.com/4sfrvnw7).
In Canada, while a copyrightable work is protected by copyright laws the moment it is created and fixed in a material form, copyright may be formally registered at the Canadian Copyright Office.
The copyright registration process by the Canadian Copyright Office requires more oversight, and more substantive review, at least of authorship. Oversight of copyright authorship could be facilitated by a duty on the applicant to disclose to the Office the presence of any AI-generated content in the work, in a similar manner to the requirement in the US Copyright Office where applicants have a duty to disclose the inclusion of AI-generated content in a work submitted for registration and to provide a brief explanation of the human author's contributions to the work (Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence http://tinyurl.com/3ru54v65).
The Canadian Copyright Office has granted copyright registration for a work in which a natural person and an AI system were listed as co-authors (Canadian Copyright Registration No. 1188619, "SURYAST" http://tinyurl.com/33hxuf4j). Since registrations are not substantively reviewed, this result creates confusion in the standard for authorship. In the long term, a lack of substantive review of copyright registrations can lead to inconsistency between the Copyright Office and case law that may need to be resolved in the courts, and ultimately to a lack of public confidence in the copyright regime. Canada's copyright policy – including Copyright Office directives – should be consistent and explicit in allowing only natural persons to be listed as authors, thereby making clear that AI tools cannot be authors.
Infringement and Liability regarding AI
Under the existing legal test for copyright infringement, Canadian courts analyze whether the allegedly infringing work reproduces at least a substantial part of the copyrighted work (Techno-Pieux Inc v Techno Piles Inc, 2023 FC 581). The court will analyze the copyrighted work as a whole and, in particular, whether the author's skill and judgment resulted in an original work. The court has also stated that the standard for an "original work" is lower than novelty or uniqueness but still requires skill and judgment. The Canadian Intellectual Property Office has approved certain copyright applications in which AI is listed as a co-author and the human author has exercised skill and judgment in creating the work; however, the substance of the copyright claim has not been examined (See, for example, "SURYAST" by Ankit Sahni, Canadian Copyright Database, 1188619, 1 Dec 2021, http://tinyurl.com/33hxuf4j).
While Canada has not yet stated its position on copyright infringement by AI-generated works, other jurisdictions may provide some guidance for the Canadian approach.
China
China recently passed an official regulation addressing AI and intellectual property right infringements (Cyberspace Administration of China, Measures for the Management of Generative Artificial Intelligence Services, Jul 13, 2023, http://tinyurl.com/3cywpsve [in Chinese]; English translation of the draft version http://tinyurl.com/y92xsa54). The Measures impose liability on AI content providers, requiring them under Article 4.5 to respect the lawful rights of other persons and to prevent IP infringement. Furthermore, AI content providers are required under Article 13 to stop or suspend the generation of right-infringing content if they receive complaints or realize that the content may violate intellectual property rights. Article 15 also requires AI content providers to optimize their training models to prevent the recurrence of right-infringing content within 90 days of discovering the infringement or receiving a complaint.
The Chinese approach likely is based on the rationale that AI developers have direct knowledge of the datasets used to produce the AI-generated results. In contrast, AI platform users generally do not have access to the raw AI model training data. Interestingly, this approach does not waive or indemnify users for creating and using the AI because users can be jointly liable along with AI developers for copyright infringement under the Chinese law.
Japan
The Japanese Government takes a position to separate the infringement issues into two parts: (1) infringement occurred during training and development of the AI, and (2) infringement occurred during the generation and output of AI using trained or untrained data.
For the first type of infringement, according to Fukuoka, Japanese copyright law provides a high degree of flexibility when companies use copyrighted materials for model training under the data analysis exception, as long as the purpose is "not to enjoy the ideas or sentiment expressed in the work" (See Shinnosuke Fukuoka et al., "Legal Issues in Generative AI under Japanese Law – Copyright", Jul 11 2023, page 3, http://tinyurl.com/4urycmdx). Consent is generally not needed if the exception applies, but may still be necessary when copyright holders' interests are unreasonably impaired. Japanese law does not, however, clearly define what "unreasonably impaired" means for training and other AI uses.
At the generation stage, infringement can occur when the generated content ends up similar to a copyrighted work, especially when the output comes from AI models trained on that copyrighted work. Fukuoka suggests companies may not rely on the data analysis exception to escape legal liability at this stage.
Singapore
Similar to Japan, the Intellectual Property Office of Singapore (IPOS) prohibits AI companies from using copyrighted work without permission for purposes not covered by the data analysis exceptions (See Intellectual Property Office of Singapore, "COPYRIGHT FACTSHEET ON COPYRIGHT ACT 2021", Nov 24, 2022, http://tinyurl.com/2tfn67tm). Unlike Japan, which is silent on whether companies can use illegally obtained copyright materials, IPOS emphasizes lawful access, meaning that companies cannot circumvent paywalls and must demonstrate due diligence in verifying the sources used for obtaining copyrighted works.
USA
Regarding the AI model training stage, as the US does not have a specific data analysis exception, AI companies rely on the fair use doctrine to conduct these activities. US jurisprudence such as Authors Guild v. Google, 804 F.3d 202 (2d Cir. 2015), in which Google's organization and publication of digital snippets of copyrighted books was held to be fair use, suggests that a wide scope of activities may qualify as fair use. AI model training may be comparable to what Google did, as the use was transformative and public disclosure of the copyrighted materials is usually limited. As such, companies may conduct AI model training without the right holders' consent.
The United States Copyright Office (USCO) issued an official policy document that clarifies its position on copyright for AI generated works (USCO, Mar 16 2023, "Rule Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence", http://tinyurl.com/3urwhzh4). The policy affirms that the office only grants copyright to AI works with human authorship and rejects applications that are based on works autonomously generated by AI. Applicants also need to disclaim AI-generated content if it exceeds the de minimis standard. This means that the office has the authority to reject applications if the AI-generated elements could be eligible for copyright protection had they been created by a human, even though the specific standard is not clearly defined. USCO has rejected several applications, and at least one US court supports USCO's perspective (Thaler v. Perlmutter and USCO (1:22-cv-01564), D.D.C, Jun 2 2022).
In Andersen v. Stability AI Ltd, 23-cv-00201-WHO (N.D. Cal. Oct. 30, 2023), the court allowed the plaintiff to pursue an infringement claim against Stability AI, where the plaintiff used a third-party report as evidence that her copyrighted work had been used by Stability AI for AI training.
EU
The draft EU Artificial Intelligence Act, Article 28 adds an obligation that AI content providers are required to record and publish a "sufficiently detailed summary" of the training data protected by copyright laws (European Parliament, "Texts adopted – Artificial Intelligence Act",14 Jun 2023, https://tinyurl.com/yc6ywv4z). According to Matt Hervey, a UK Gowling WLG lawyer, the Recital 58a also mentions that AI deployers must also conduct risk mitigations, including protecting third-party copyrights and other IP rights (Matt Hervey, Jun 26 2023, "The EU AI Act and IP", https://tinyurl.com/4mus2cd4).
Canada's Desired Approach
Canada should balance the public interest with the rights of copyright holders in its approach to AI. AI development can severely impact the day-to-day life of copyright holders and the general public. It is unreasonable and infeasible to put any responsibility on public users, who generally lack a technical understanding of generative models. However, AI service providers and deployers have the money and resources to implement safeguards and other measures to increase transparency and take ownership of the system.
Therefore, GoodBot recommends that
- Canada adopt the EU approach requiring AI developers to disclose any and all copyrighted work used to train AI models. This requirement addresses a key barrier to determining if an AI system has accessed or copied a copyrighted work, and provides more concrete grounds to the general public in establishing infringement claims.
- Canada should make AI developers liable for using illegally obtained data to train AI models. Both the Japanese and US fair use approaches are poorly defined and lack explicit disclosure obligations, which risks prioritizing the rights of AI developers over end users. AI developers should hold responsibility for protecting and disclosing existing copyrights, and thus the Singapore and EU approaches are more reasonable. Importantly, Canada's fair dealing differs from the US's fair use in that it explicitly provides a fixed category of written exceptions under Canadian copyright law; such exceptions may be tabled accordingly to address the issue of generative AI. AI developers will also infringe when using trained models to generate output that infringes an existing copyrighted work. Thus, Canada should move towards the Chinese, Singaporean, and EU approaches, under which AI developers must demonstrate that data has been obtained legally and are also responsible for third-party copyright infringement and for removing infringing output.
- To better protect the public interest and place more of the burden on AI developers, Canada should make AI developers responsible for clarifying copyrights, including mandatory disclosure of copyrighted work used in training models.
- Canada should introduce a rebuttable presumption in the amendment stating that general users are sheltered from copyright liability for using AI to produce work if they do not knowingly direct the AI to generate infringing content. To balance AI developers' interests, developers may invoke a due diligence defence by warning users about potential copyright infringement, e.g., "a user must not input explicit prompts that are directed to generate copyright-infringing content". This amendment can also incentivize AI developers. For instance, it could establish a well-defined exception that places the responsibility on AI developers and content providers to use copyrighted data responsibly when training AI models for specific non-commercial research purposes that ultimately benefit the general public.
Comments and Suggestions
While GoodBot's submission is mindful of how generative AI impacts workers and creators, i.e. the future of work, we assume that entities representing these interests will submit industry-specific recommendations. Our submission therefore focuses on the impact of policies on end users (i.e. everyday Canadians), who we assume will be under-represented in other submissions on copyright policy.
AI developments have significant implications for Canadians that need to be addressed through various forms of intellectual property beyond copyright, such as patents and trademarks.
At this time, the advent of Generative AI, coupled with a historical lack of intentional focus on the rights of end users and effective enforcement mechanisms has resulted in a lack of policy coherence which contributes to confusion, uncertainty and harm to end users and the public interest. Policy exists more in theory than in reality for many Canadians impacted by technology-enabled harms.
The Government of Canada must address gaps and inconsistencies in policy while prioritizing the establishment of effective enforcement mechanisms capable of rebalancing towards the rights of end users, who are at an asymmetrical disadvantage to technology companies and especially Big Tech.
The Government of Canada should revise policy and operational processes that prioritize the rapidly evolving needs of end users, including through substantive rights-based policies that consider and account for harms and impacts on end-users. Rebalancing rights also requires that policy frameworks provide improved access to legal remedies and enforcement mechanisms that monitor legal obligations and ensure that rights can be realized.
Increasingly, conversations also prioritize the need to align the development of technology with the public interest. Yet while technology industries, governments, and civil society (including academia and nonprofits) all engage in different ways on how technology impacts people, families, communities, societies and democracies, there is limited shared understanding of the issues and the solutions to address them.
These differences are in part a product of the different language, approaches, priorities and incentives facing different stakeholder groups, which often contribute to growing power asymmetries and to adversarial relationships between rights-focused actors prioritizing the public interest and industry-aligned actors prioritizing minimal obligations. This conflict is not productive and is not aligned with the public interest, which is the domain of government.
The result is a thin common layer of understanding of what is in the public interest (which often comes down to 'what is legal?' and/or 'what can I get away with?'). A consequence of this permissive approach is that even riskier applications once considered unthinkable are later pursued once market dominance is established (see OpenAI's recent decision to revoke military bans on the use of Generative AI products).
Developing and thickening a robust public interest layer is in everyone's interest, especially that of end users, content creators, workers, policymakers, entrepreneurs, sectors and markets, if perhaps not monopolies whose dominance is itself an issue.
A robust layer of public interest means that:
- companies are actively disincentivized from employing harmful practices or business models that pit user and societal well-being against company profits
- healthy technology markets can emerge that are no longer dominated by a handful of ungovernable monopolies and where emerging responsible companies can compete on trust
- companies own the primary burden of demonstrating that models are safe and fair, and that they have been developed with lawfully obtained data
- transparency requirements can enable robust oversight and accountability when companies fail to adhere to legal obligations or when they cause harm
- end users are aware of their rights and have accessible mechanisms available through which corrective measures can be realized
Moving policy frameworks to rebalance toward the public interest requires substantive obligations for generative AI companies. It is – for example – in the public's interest to understand how AI models are developed, trained and normalized in order to address harms arising from models and to provide recourse when they occur. Such obligations should include but are not limited to the disclosure of data used to train AI models, demonstrations of legal provenance, and accessible remedial procedures such as the expedient removal of copyright-protected content from models as well as compensation (where appropriate) to creators whose works were infringed.
Such obligations can and should be scaled according to size (e.g. level of profit, investment) and risk, including the risk posed by copyrighted content in data sets and the risks posed by the development of AI models, as well as the extent of potential breaches causing harm to end users. GoodBot recommends modelling EU approaches to risk, which require risk-based audits and human rights impact assessments based on the assessed risks of AI models.
The Text and Data Mining submission specifically raises the question of whether it is consistent with the public interest for creators to profit off products created using personal information at all, even if that personal information is de-identified or anonymized in the final product. This is especially true if reasonable efforts have not been made to address known risks, such as bias, that could impact automated decision systems.
It is also important to note the role of technology lobbies, backed by industry groups, that attempt to disincentivize countries and regions from robustly legislating on AI. These groups are often mandated with weakening policy obligations and legislation by pitting countries against one another, dangling the promise of AI investment dollars (see the conflict between the UK, France and Germany in the final days of the AI Act http://tinyurl.com/47ps3ksc). While industry bodies invest in lobby groups, there is no equivalent lobby for end users.
Robust and coherent multilateral public policy is therefore also critical as an antidote to companies attempting to undermine policy. Canada should advance substantive rights-based global policy efforts to regulate AI that balance end-user rights and that include civil society delegations that can represent these interests.
Generative AI is likely to cause disruptive societal challenges in the years to come. On January 14, 2024, International Monetary Fund (IMF) chief Kristalina Georgieva raised concerns that nearly 40% of jobs around the world could be affected by the rise of AI, a trend which is likely to deepen inequality. The IMF is calling on governments to establish social safety nets and offer retraining programs to counter the impact of AI http://tinyurl.com/mv3rmwnx. This post was published on the same day that Oxfam released a report demonstrating that five of the wealthiest men in the world – including tech bosses Elon Musk, Jeff Bezos and Larry Ellison – have doubled their fortunes since the pandemic, accumulating wealth at nearly twice the rate of the rest of the world put together over the past two years. http://tinyurl.com/9srbw9xz
GoodBot welcomes Canada's efforts to amend the Copyright Act to regulate the use of generative AI and recommends prioritizing and safeguarding the rights of end users (i.e. everyday Canadians) over industry interests. In embracing this regulatory end, we emphasize that this submission is not anti-technology but rather underscores a commitment to fostering the responsible use of technology. Through the proposed recommendations we encourage the Canadian government to establish a framework that balances innovation with ethical considerations, ensuring a public interest-oriented integration of generative AI within legal boundaries. Canada should ensure that policy frameworks address public interest requirements and limit the ability of copyright policy to be used as a tool to stymie access to information that is in the public interest.
GoodBot believes effective public policy can enable a wave of technology developments that prioritize and align with the public interest, and that it is in the long-term interests of even technology companies to be effectively regulated, avoiding races to the bottom that are harmful to society. The longer it takes to regulate, the more painful it will be for companies to adhere to new obligations.
This moment also provides an opportunity to address the vast and growing asymmetry of power of Big Tech which is positioned to further consolidate its dominance in ways that are making companies ungovernable.
Google
Technical Evidence
Our Approach To Developing Artificial Intelligence
Recent years have seen huge breakthroughs in the use and application of artificial intelligence — and AI holds major promise for people around the world. It has the potential to unlock major benefits, from better understanding diseases to mitigating climate change and driving prosperity through greater economic opportunity.
AI also already powers Google's core products, which help billions of people every day. Whether it's asking for movie times, finding the nearest doctor, or finding better routes home, our work in AI is centered on improving people's everyday experiences. Some of our most popular products at Google — like Lens and Translate — have at their core AI technologies like optical character recognition and machine learning. And countless other Google products now have AI built into them, making them more helpful to billions of people.
Many of these improvements are possible thanks to Google Research's introduction of the Transformer model in 2017. The Transformer is considered the foundation of modern language models; on top of this architecture we are now able to build AI language models — like BERT, PaLM, MUM, LaMDA, and Gemini — that can do everything from solving complex math word problems to answering questions in new languages. They can even express their reasoning through chain-of-thought prompting.
We believe our approach to AI must be both bold and responsible. To us that means developing AI in a way that maximizes the benefits to society while addressing the challenges. We were one of the first companies to publish a set of AI Principles, and we use an AI risk-assessment framework to identify and mitigate risks. Google DeepMind likewise adheres to our AI Principles and has a dedicated internal governance body — the Responsibility and Safety Council — tasked with upholding them. We also constantly learn from our research, our experiences, our users, and the wider community — and incorporate what we learn into our approach to developing and deploying AI.
How AI Systems Work
Artificial Intelligence and Machine Learning
Before discussing the process of machine learning, it is important to understand some basic terminology and types of AI systems. The term artificial intelligence describes a broad and diverse set of technologies. AI systems can be built in many ways. For instance, the computer opponent in a basic chess app might have some deterministic behaviours that are hard-coded by the developer in the form of if-then statements known as decision or production rules. The same is often true even of more advanced AI systems. The Deep Blue chess system that was the first computer to win a match against reigning world champion Garry Kasparov in 1997 was an expert system built using a large rule-set meant to imitate experts in a deterministic manner.
Much of the recent progress we've seen in AI is based on machine learning (ML), a subfield of computer science where computers learn to recognize patterns from example data, rather than being programmed with specific rules. Because of the way they are built, ML models are able to complete tasks and solve problems that would have been impossible for expert systems. Deep learning is a specific ML technique based on neural networks. Neural networks use nodes or "artificial neurons," inspired by models of brain neurons, as fundamental processing units which receive numeric inputs from, and pass outputs to, other neurons. Deep learning connects multiple layers of these artificial neurons. The interconnections between these neurons, also referred to as nodes, are numerical weights that essentially represent the importance of the contribution of that neuron to the final output. It is also relevant to note that there is no copy of the training data — whether text, images, or other formats — present in the model itself. Deep neural networks themselves determine the attributes of the data that they use to recognize patterns, as opposed to a human coder setting those attributes manually.
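The layered structure described above can be sketched in a few lines of illustrative Python. This is a toy: all names, weight values, and inputs here are invented for illustration, and real deep networks have millions or billions of learned weights rather than a handful.

```python
def relu(x):
    # A common nonlinearity applied by each artificial neuron.
    return max(0.0, x)

def layer(inputs, weights, biases):
    # Each output neuron computes a weighted sum of its inputs plus a
    # bias, then applies the nonlinearity. The weights are plain numbers
    # encoding each input's contribution -- no training data is stored.
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Arbitrary illustrative weights for a tiny two-layer network.
w1, b1 = [[0.5, -0.2], [0.3, 0.8]], [0.1, -0.1]
w2, b2 = [[1.0, -1.0]], [0.0]

hidden = layer([1.0, 2.0], w1, b1)   # first layer of neurons
output = layer(hidden, w2, b2)       # second layer, producing the output
```

The point of the sketch is that the model itself consists only of the numeric weights and biases; the inputs pass through and are transformed, but no copy of them persists in the model.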
For example, the AlphaGo model, developed by Google DeepMind, was the first computer program to defeat any professional human Go player and, soon after, the first to defeat a Go world champion. To accomplish that, AlphaGo had to employ an ML model. That is because Go is a profoundly complex game, one googol (10^100) times more complex than chess. The power of the decision-making engine underlying Deep Blue would have been no match for a Go world champion.
ML models have long been used for classification or prediction purposes, e.g., a system that can detect cats in photos or predict vehicular traffic patterns. However, recently, the ML models that have captured the most attention are generative AI models.
Generative AI
Generative AI models can use what they have learned to create new content, such as text, images, music, and computer code. A "large language model" (LLM) is a generative AI model that finds patterns in human language, making it suitable for a range of writing tasks, including predicting the next words to complete a sentence or suggesting grammatical edits that preserve what you mean to say. During training, a model evaluates the proximity, order, frequency, and other attributes of portions of words, called tokens, in its training data. Tokens represent language data in its most disaggregated form. For example, the word "indistinguishable" is made up of three tokens: "in", "distinguish", and "able". The model itself selects which attributes of tokens to use. In this way, training is the discovery of probabilities of relationships between the tokens — ultimately not in any individual text, but in all of the text on which the model is trained. The trained model then comprises a large network of weights that represent these learned relationships. The model can then respond to a prompt and generate new text with a probability of addressing the prompt as determined by its training.
Generative AI models are not databases or information retrieval systems. To be sure, when, for instance, an LLM is prompted for facts, it can generate articulate responses that may give the impression that it is retrieving information. But, fundamentally, the model is generating responses based on a statistical estimation of what a satisfactory response should look like. Put simply, it produces an average group of words, pixels, or sounds related to a prompt. Some have referred to this as, not an answer, but merely "answer-shaped." To understand how generative AI systems are built, it is easiest to take as an example the LLMs — like LaMDA, PaLM, and MusicLM — that underlie many of Google's latest AI advances.
The technical process of "learning" for an LLM begins with training the model to identify relationships and patterns among words in a large dataset. Through this process, a generative AI model will adjust its parameters to reflect the mathematical relationships in the data. Once the model has adjusted its parameters to accurately reflect these relationships, it can then use them to generate new outputs based on those parameters. The number of parameters needed to capture the complexities and nuances of human language and facts about the world is vast.
LLMs are developed in multiple stages, including pre-training and fine-tuning. Pre-training is a way of training an ML model on a wide variety of data. This gives the model a head start for when it is later trained on a smaller dataset of labeled data for a specific task. When an LLM is pre-trained, training material is analyzed to examine and extract statistical relationships among the individual tokens, words, and sentences, e.g., their frequency, importance, and semantic relationship to each other. The AI "model" is simply the encapsulation of those statistical facts in numbers. And given enough content — on the scale of hundreds of billions of tokens — the model may be able to embody the ways human language hangs together as a whole in the form of its parameters and nodes. Importantly, given the volume of tokens that models need to train on, any particular work standing alone is not essential or even necessary for that training. Instead, it is the total collection of works that is needed to train an AI model. Following pre-training, the model can be refined through a process called fine-tuning. Fine-tuning an LLM is the process of adapting a pre-trained LLM to improve its performance on a specific task. The model learns from additional example data to help hone its capabilities. For instance, one can fine-tune a general purpose LLM to teach it how to summarize technical reports in general by using a smaller set of examples of technical reports and accurate summaries.
Text and Data Mining
Text and Data Mining Exceptions
Innovation in AI fundamentally depends on the ability of LLMs to learn in the computational sense from the widest possible variety of publicly available material. Fair use and text and data mining exceptions around the world support innovation by ensuring that developers are able to assemble the building blocks needed for the development of AI. These provisions further the purpose of copyright law by purposefully and carefully balancing protections for creators with the need for innovation and cumulative creativity.
Canadian law already includes important exceptions and limitations to copyright that support the training of AI models. The fair dealing exception for research and private study is generally understood to include AI training, as "many uses [of copyrighted works] made for machine-learning purposes are likely to be "fair" under the second step [of the fair dealing test], not least because such copies do not compromise the core interests of the copyright owner or substitute for the work of the author in the market." Similarly, the exception for "temporary reproductions for technological processes" could also apply to the training of AI models in circumstances where the copyrighted works included in the training models were not retained beyond what was required for the training. However, to further encourage innovation, policymakers should consider creating even more certainty through the adoption of an express text and data mining exception for both research and commercial uses. Indeed, after conducting an extensive mandatory review of the Copyright Act, the Standing Committee on Industry, Science and Technology similarly concluded that "facilitating the informational analysis of lawfully acquired copyrighted content could help Canada's promising future in artificial intelligence become reality" and recommended "that the Government of Canada introduce legislation to amend the Copyright Act to facilitate the use of a work or other subject-matter for the purpose of informational analysis".
It is not a coincidence that the leading innovative countries in the world have fair use or a specific text and data mining exception that includes commercial uses. And Canadian policymakers should consider a similar model to ensure that AI developers have the certainty necessary to invest in AI in Canada. For example, Japan's "non-enjoyment" statute recognizes that it must be permissible to use a work when the person's purpose is not to personally enjoy the work, but simply for use in data analysis. Singapore's Copyright Act similarly recognizes that copies made in the course of computational data analysis are permitted. Jurisdictions that recognize a flexible fair use exception, such as the United States, have held similar activities to be legal fair uses. And when the European Union's Copyright Directive ("EUCD") adopted provisions on text and data mining ("TDM") for both research and commercial uses, it did so with an understanding that these new technologies held enormous beneficial promise.
These types of provisions are also in line with the Berne Three-Step Test. As explained in A Balanced Interpretation of the "Three-Step Test" in Copyright Law ("Munich Declaration"), the Three-Step Test's restriction of limitations and exceptions to exclusive rights to certain special cases does not prevent legislatures from introducing open-ended limitations and exceptions, so long as the scope of such limitations and exceptions is reasonably foreseeable. Additionally, limitations and exceptions do not conflict with a normal exploitation of protected subject matter, if they are based on important competing considerations. Finally, the Three-Step Test should be interpreted in a manner that respects legitimate public interests, notably in scientific progress and cultural, social, or economic development.
Fair use provisions as well as narrower legislative exceptions that permit text and data mining for purposes of training machine learning models fit this description. Their scope is reasonably foreseeable and does not conflict with the normal exploitation of copyrighted works as copyright protection has always focused on protecting expression, not facts within or about that expression. And, as already described, the public interest in the scientific progress such uses make possible is entirely appropriate to consider when conducting a Three-Step analysis. Professor Martin Senftleben, who has written extensively about the three-step test, has even argued that TDM is not a traditional category of use that could be contemplated by existing international treaties – rather, it is 'an automated, analytical type of use that does not affect the expressive core of literary and artistic works', which therefore falls outside international copyright harmonization, and the three-step test, altogether.
We understand that some have argued that AI developers should be required to license or get permission for training. However, such a requirement would be essentially impossible given the large amount of data needed to train AI models and the lack of comprehensive data about copyright ownership. As a result, it would effectively block the development and use of large language models and other types of cutting-edge AI. And if innovators are unable to leverage these building blocks needed for the development of AI, the many opportunities that come with this technology will be at risk. We will not be able to use AI to help unlock scientific discoveries and to tackle humanity's greatest challenges and opportunities – from improving cancer screening to developing solutions to tackle climate change. In addition, any limitation on the ability to train on publicly available material increases the risk that models will be trained on non-representative data — including potentially excluding marginalized or alternative voices from the training data. For example, restrictions or impediments to the training of models might lead some model developers to favor older data sets (such as out-of-copyright books from more than 100 years ago that are in the public domain), which could result in model outputs being skewed based on biased or inaccurate assumptions about, e.g., race, nationality, gender roles, and gender identity.
Some have suggested a collective licensing structure as a potential solution to these challenges. However, there simply aren't any copyright collectives governing the wide array of copyrightable works that are currently being used (or could potentially be) in large data sets, or that fully cover the wide array of works within a class (e.g., non-Canadian works, etc). And in instances where there are collectives governing certain specific commercial uses of certain specific classes of works, a requirement to license a certain class would create an incentive to not develop AI models using those classes of works in favour of others which do not require a license (e.g., public domain, open source, non-copyrightable, etc). As a result, Canadian policymakers should reject proposals that would require licensing or permission to train AI models.
There has also been significant discussion about compelling AI developers to disclose the datasets that they have trained on. Most LLMs are trained on a wide variety of publicly available online data, including web-crawled data, rather than on 'offline' datasets that are prospectively compiled and documented. Given that fact, a disclosure requirement would be unsound policy for several reasons. First, the source of much of the training, validation, testing, and input data is the massive volume of content available on the entire open World Wide Web — in contrast with models that use a limited number of well-defined, readily identifiable sources. Second, identifying the datasets used to train particular systems would expose competitively sensitive (and potentially trade-secret-protected) information. And third, AI developers do not have access to detailed or accurate information about copyright status, ownership, or licensing terms for the content available on the public web. In fact, there is no such source of truth anywhere in the world. Thus, complying with disclosure rules may simply prove impossible from the start. In addition, recently Google and other AI developers announced improved web publisher controls for training of generative AI models. These controls, and others like them soon to follow, make disclosure requirements unnecessary because they enable rightsholders to know and control ex ante whether their online content may be used for training of future models. Further, giving web publishers the ability to choose whether or not their content may be used for training may also facilitate new, market-based solutions.
Authorship and Ownership of Works Generated by AI
Authorship and Ownership of Works Generated by AI
Scholars and policymakers alike have recognized that AI systems do not need an incentive to create, and so there is no sound public policy reason to extend copyright protection to AI-generated works. That said, the presence or absence of sufficient human intervention in the creative process is a nuance that will need to be addressed on a case-by-case basis. In particular, it is likely that most commercial uses of AI will entail at least some amount of human creativity. There may also be many cases where creators use these tools integratively as part of their creative process. In that circumstance, the final work product may well be protected by copyright. The question of what the scope of that copyright would be is a matter properly handled by the copyright offices and courts in the context of specific disputes.
Infringement and Liability regarding AI
Infringement and Liability regarding AI
We believe that existing copyright rules regarding infringement and liability, specifically those relating to platform safe-harbours, are sufficient to address the unique issues raised by AI technologies. The structure of the existing regime carefully balances the protections for rightsholders against the burdens on other stakeholders. It would be contrary to sound public policy to make any amendments to the Copyright Act that would unduly increase the obligations or risk for copyright infringement liability of internet intermediaries, including, but not limited to, amendments that could make intermediary stakeholders liable for replicating a style or method of creation and amendments that would exclude developers of generative AI systems from the safe harbour provisions at section 31.1 of the Copyright Act.
In most jurisdictions, a work is not infringing unless its creator has improperly copied expressive content from a copyright-protected work — for example, under Canadian copyright law, a work must be produced or reproduced in "substantial part" for there to be infringement. Some have offered more novel infringement theories, challenging replication of an artist's style or arguing that all output of an AI system is an infringing work of the content the system was trained on. These theories are not founded in any cognizable copyright principles, and in fact run contrary to accepted copyright canons.
Styles and creative methods are not copyrightable. In Cinar v. Robinson, the Supreme Court established that courts must adopt a qualitative and holistic approach to determining whether elements of a work are "generic" and therefore not copyrightable in the context of an infringement claim. The "style" of a work is arguably too generic to attract copyright protection according to this approach. Extending copyright protection to styles would impede the creation of wide swathes of original works and would run headlong into core concerns around freedom of expression. It would be similarly troubling to create a rule that regulates the method of drawing from or looking at other works during the creative process as it would treat every instance of mere inspiration as a basis for a claim of infringement. As was recognized long ago in the U.S. case Emerson v. Davies, every new work "borrows and must necessarily borrow, and use much which was well known and used before."
Under the Copyright Act, safe harbour provisions shield intermediaries from obligations or liability in connection with alleged or proven copyright infringement where the intermediaries only provide the technical means by which others infringe. Canada's technologically neutral copyright laws have allowed stakeholders to constantly innovate without negatively impacting the creative and economic interests of rightsholders and users. The law has long been wary of permitting rightsholders to hold up technologies merely because they could potentially be used for infringing purposes. In Society of Composers, Authors and Music Publishers of Canada v. Canadian Assn. of Internet Providers, the Supreme Court of Canada held that Internet Service Providers do not authorize infringement by merely providing connectivity to their users—connectivity that the court emphasized was an important innovation for the dissemination of works. This holding mirrors a similar doctrine in U.S. copyright law which states that products or services that have substantial non-infringing uses do not invite copyright infringement. This rule exists to limit the copyright monopoly to its proper scope so that new technologies and the markets for them are allowed to develop. Generative AI is a technology engineered to create new works, not to copy or facilitate the copying of existing works. Excluding developers of generative AI systems from the Copyright Act's safe harbour provisions would put all innovation in the field of machine learning at risk.
The possibility that a generative AI system can, through "prompt engineering," be made to output content that is substantially similar to existing content does raise questions around the proper boundary between direct and secondary infringement. When an AI system is prompted by a user to produce an infringing output, any resulting liability should attach to the user as the party whose volitional conduct caused the infringement in the same way that the current Notice and Notice regime is designed to hold the internet subscribers committing infringing actions accountable for their behaviour rather than the internet service providers. The AI developer can be liable (or not) under doctrines of secondary copyright liability based on whether they had any actual knowledge that specific infringing material was being created using its system. A rule that would hold AI developers directly (and strictly) liable for any infringing outputs that users create would impose crushing liability on AI developers, even if they have undertaken reasonable measures to prevent infringing activity by users. Had that standard applied in the past, we would not have legal access to photocopiers, personal audio and video recording devices, or personal computers — all of which are capable of being used for infringement as well as for substantial beneficial purposes.
We are aware of concerns that AI systems can output content that is substantially similar to individual pieces of content on which they were trained. According to the leading research paper in the field, this is an exceedingly rare occurrence, even under adversarial prompting. The possibility that AI models can occasionally, despite the best efforts of their developers, output content that replicates existing expression is a bug, not a feature, and developers are taking a range of measures to limit that occurrence even further, including deduplication of training data. This problem is well understood to be an open research challenge in the AI developer community and is, on that basis, a focus of significant attention that is expected to lead to effective interventions. It is a problem more effectively addressed technically, rather than legislatively.
Comments and Suggestions
How AI Will Unlock Scientific Discoveries, Help Organizations Tackle Societal Challenges, and Improve Our Everyday Lives
AI's potential societal benefits to Canada and the world cannot be overstated. The technology's uses are extensive. From powering research that enables new scientific breakthroughs to product integrations designed to make everyday life easier, we're exploring responsible and innovative AI technologies that make a true difference for humanity. We are excited about the promise AI holds for solving some of the most persistent challenges facing our world.
AI has the potential to significantly improve healthcare, including maternal care, cancer treatments, and tuberculosis screening. For example, understanding how a protein folds is important for medical research, but it is also time-intensive and painstaking. Google DeepMind's AlphaFold predicted 200 million protein structures that previously would have taken several years each to discover, effectively saving hundreds of millions of years of researchers' time. Structural biologists who use AlphaFold have seen their productivity grow 20% faster than those who do not. Google Research also recently announced a new LLM that could be a helpful tool for clinicians: Med-PaLM.
AI can also help with mitigating and adapting to climate change: by tracking wildfire boundaries in real time; helping to reduce carbon emissions by decreasing stop-and-go traffic; and providing critical flood forecasts. Partnerships in the field of climate science will help organizations develop innovative solutions. For example, Google Research teamed up with American Airlines and Breakthrough Energy to bring together large amounts of data — like satellite imagery and weather and flight path data — to develop forecast maps to test if pilots can choose routes that avoid creating contrails (i.e., the thin, white lines sometimes seen behind airplanes that account for roughly 35% of aviation's global warming impact). This partnership showed that contrail avoidance has the potential to be a cost-effective, scalable solution to reduce the climate impact of flying.
In addition, AI is powering progress in making the world's information accessible to people everywhere. Google's Data Commons project synthesizes publicly available data from government agencies and other authoritative sources into an open source, API-accessible knowledge graph available to everyone. It links references to unique entities (such as cities, counties, organizations, etc.) that exist across different datasets to nodes on the graph, such that users can access data about a particular entity aggregated from different sources without the significant data wrangling procedures required to clean or join records. Data Commons is also now using LLMs to create a natural language interface that allows users to ask questions, making it even more useful.
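The entity-linking idea behind such a knowledge graph can be sketched in a few lines. The identifiers and fields below are illustrative only, not the actual Data Commons schema or API:

```python
from collections import defaultdict

# Two hypothetical source datasets that reference the same city by a shared
# entity identifier (the identifier and field names are made up).
census_rows = [{"place_id": "city/example_001", "population": 1013240}]
climate_rows = [{"station_city": "city/example_001", "avg_temp_c": 15.2}]

# The graph maps each entity node to the facts attached to it.
graph = defaultdict(dict)

# Link each record to the node for the entity it references.
for row in census_rows:
    graph[row["place_id"]]["population"] = row["population"]
for row in climate_rows:
    graph[row["station_city"]]["avg_temp_c"] = row["avg_temp_c"]

# A single lookup now aggregates facts from both sources, with no manual joins.
print(graph["city/example_001"])
```

Because both datasets resolve to the same node, a user querying that entity gets the merged view without the record-cleaning and joining work the paragraph above describes.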
And through our 1,000 Languages Initiative, we are working to build an AI model that will support the world's 1,000 most-spoken languages, bringing greater inclusion to billions of people in historically marginalized or underserved communities all around the world. While more than 7,000 languages are spoken globally, only a few are well represented online today. That means traditional approaches to training LLMs on text from the World Wide Web fail to capture the diversity of global communication. We've already made significant progress towards this goal with a Universal Speech Model trained on more than 400 languages. We also recently sponsored a competition that tasked researchers with developing AI models that could recognize American Sign Language Fingerspelling and translate it into text. Advances in this and similar areas can make the devices we all use more accessible.
AI can also make a difference in our everyday lives. For example, AI already powers many products that millions (and in some cases billions) of people use, such as Google Maps, Google Translate, Google Lens, and more. And now we are leveraging AI to help people ignite their creativity with Bard, increase their productivity with Workspace tools, and revolutionize the way they access knowledge in Search. These types of tools have the potential to make everyday experiences easier, more productive, and more creative.
How We Are Working With the Creative Community To Unlock New Opportunities
We also believe in AI's potential to amplify and augment human creativity, unlocking new opportunities for artists, creators, journalists, musicians, and consumers to engage creatively with new tools and expand the pie for everyone. In fact, we are already seeing creators exploring new areas, including the creation of new types of music, books, photography, and other art. We're excited about how AI can supercharge human creativity — not replacing it, but enhancing, enabling, and liberating it.
With this in mind, we are committed to building tools that increase access to information and create new and expanded economic and creative opportunities for artists, small businesses, and creators of all kinds. To do this, we are working closely with the creative community to put these tools in the hands of creators and to tackle new challenges as they emerge.
For example, we are working closely with our music partners to develop an AI framework that helps us work toward our common goals. This includes YouTube's Music AI Incubator, which will help inform YouTube's approach as we work with some of music's most innovative artists, songwriters, and producers, across a diverse range of cultures, genres, and experience. We also announced a set of principles that will govern YouTube's work on AI.
In addition, we announced Lab Sessions, a series of experimental collaborations with visionaries — from artists to academics, scientists to students, creators to entrepreneurs — to help them use AI to compose new music, support the creative writing process, better learn sign language, and more.
Through the Google News Initiative, we are supporting training programs for journalists — so they can use AI in their work — and research into how AI can support the news ecosystem. In addition, we have built research tools like Pinpoint, which helps journalists and academics explore and analyze large collections of documents. Recently, a study by JournalismAI showed that almost three quarters (73%) of news organizations surveyed believe generative AI applications, such as Bard or ChatGPT, present new opportunities for journalism. Some respondents noted that AI can free up journalists' capacity for more creative work by taking on time-intensive tasks such as interview transcription and fact-checking.
We are also prioritizing approaches that will allow us to send valuable traffic to web publishers, including news publishers. We've heard from web publishers that they want greater choice and control over how their content is used for emerging generative AI use cases. That is why we announced Google-Extended, a new control that web publishers can use to manage whether their sites help improve Bard and our Vertex AI generative APIs, including future generations of models that power those products. And it's why we're committed to engaging with the web and AI communities to explore additional machine-readable approaches to choice and control for web publishers.
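As publicly documented by Google at the time of its announcement, Google-Extended is implemented as a robots.txt user-agent token, so a publisher who wishes to withhold a site from these uses can add a directive such as:

```
# robots.txt: withhold this site from Bard / Vertex AI generative training
User-agent: Google-Extended
Disallow: /
```

This reuses the long-established Robots Exclusion Protocol, so publishers can express the choice with tooling they already have, without affecting how the site is crawled for Search.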
AI in Canada
Google's core platforms have long been powered by AI. Google was one of the first companies to use machine learning in its products and became an "AI first" company in 2015. This technology offers radical potential for exponential growth, and Google is working to help Canada fully realize AI's economic potential.
Google has been in Canada since 2001, with offices in Waterloo, Toronto, and Montreal, proudly supporting Canada's thriving tech sector and the Canadians who use our products every day. Google has AI research labs in Toronto and Montreal.
In Toronto, our mission is to drive innovation in neural network architectures and learning procedures in areas of strategic importance to Google. Since 2016, the group's research has spanned a wide spectrum of basic questions in deep learning and diverse domains of application, including new architectures, new techniques for training neural networks, robust learning from small amounts of labeled data, and unsupervised learning. Current research is focused on four broad areas: neural network techniques, generative models, computer vision and graphics, and machine learning for systems and software.
In November 2016, Google announced the launch of the Google Brain Team in Montreal, a deep learning and AI research group linked to Google headquarters in Mountain View. The following year, DeepMind Montreal was launched. Since then, the team has grown into a diverse group of research scientists and engineers and has made major contributions to DeepMind's mission and the scientific advancement of AI, contributing to over 45 scientific papers in leading machine learning conferences and peer reviewed journals. These include work in diverse topics such as: Infection detection, collaborative AI, ethical and social risks, and theorem proving.
In 2023, Google Brain and Google DeepMind merged. Today, Google Research in Montreal performs both open-ended and applied research, in numerous areas including reinforcement learning, meta-learning, optimization, program synthesis, generative modeling, machine translation, and more. We also support the local academic community and have several academic collaborations, including with Mila – Quebec Artificial Intelligence Institute. In just five years, the local research teams in Montreal have helped boost the AI ecosystem in Canada, and the thriving research team is now helping reverse the "academic brain drain".
Guilde des musiciens et musiciennes du Québec (GMMQ)
Technical Evidence
This submission represents a joint position of the following associations and society:
Artisti, a Canadian collective management society representing various performers for the collective management of their right to equitable remuneration and their right to remuneration arising from private copying, as well as all or part of their exclusive rights;
The Union des artistes (UDA), a professional union representing artists in several disciplines working in French or in any other language, with the exception of productions made and performed in English;
The Guilde des musiciens et musiciennes du Québec (GMMQ), an artists' association legally recognized in Quebec to represent professional musicians, notably in negotiating collective agreements governing their working conditions and remuneration.
These organizations see the potential of AI as a creative tool: many of their members already use it as an instrument enabling them to deliver a performance. It is nevertheless essential to regulate the use of the technology, particularly in the context of text and data mining ("TDM"), since performers' performances are currently being used for that purpose without their knowledge and without compensation.
Text and Data Mining
Question 1: Greater clarity and transparency would make it possible to better understand how TDM works, including how performers' performances are used, as well as the roles and responsibilities of the various stakeholders. This would also make it possible to determine: (i) in which context(s) informational analysis is, or is not, authorized under the current Canadian copyright regime, and thus (ii) which licences and compensation must be paid to the holders of performers' performances.
Question 2: Yes, TDM activities are currently being conducted in Canada to train algorithmic models. The development and training of AI systems may involve the reproduction of copyright-protected content, including performers' performances or their voice and image outside a performance, without their consent or fair compensation. This is obviously problematic and must be remedied. Moreover, it is essential that performers' authorization (on an "opt-in" rather than "opt-out" basis) be obtained before any reproduction of their performances, voice or image, and that fair and equitable compensation be paid to them in return for that use. Obtaining these consents will have to take into account the particularities of each content reproduced. For example, in the case of fixed performances, a separate consent will have to be obtained from performers where the authorization initially granted to producers does not cover TDM, which is currently the case. In this regard, it is also important to recall that performers who have consented to having their performances incorporated into a cinematographic work currently cannot exercise their rights under section 15(1), given section 17(1) of the Copyright Act, nor can they benefit from moral rights in those audiovisual performances. To resolve these issues, Canada should ratify the Beijing Treaty, which would allow audiovisual performers to exercise better control over their performances incorporated into cinematographic works.
Question 3: Yes. Moreover, it is difficult for performers to determine which content is used in the context of TDM and the extent of that use. To address this gap, a transparency or record-keeping obligation could be imposed on entities developing and training AI systems.
Question 4: Various licences are available for TDM activities involving the exercise of a right reserved to copyright holders, namely reproduction. These licences can be negotiated directly with copyright holders, including performers, or obtained through a collective management society. Indeed, with respect to performers, the possibility of reproducing their performances for TDM purposes is generally not included in the authorizations they have given to producers of sound recordings or cinematographic works, those authorizations essentially covering the commercial exploitation of the sound recordings and cinematographic works. Authorizations for TDM purposes would therefore have to be obtained from performers or their collective management society, which would be entirely capable of issuing them.
As a reminder: these licences do not currently appear to be obtained by those conducting TDM activities. This obviously creates a revenue shortfall, notably for performers, who struggle to obtain fair compensation for the use of their content. Since the performances reproduced for TDM purposes are performances fixed in sound or audiovisual recordings, and are performances of works, several mechanisms could be envisaged to compensate the rights holders concerned for this use of their performances: these include the introduction of a right to equitable remuneration for TDM, or a right to remuneration through a mechanism similar to that for private copying.
Question 5: The reproduction of performers' performances for TDM purposes is not covered by the contractual provisions that governed the fixation of those performances. This means that, for this activity, the performer's authorization should therefore systematically be obtained. Indeed, it must not be forgotten that the reproduction of a performance for TDM purposes will often involve the reproduction of a performer's voice and image (biometric data), which are attributes of their personality protected by personality rights, the right to privacy and personal data protection legislation.
Given these various legislative protections, it therefore seems impossible to contemplate an exception permitting the use of such performances involving an artist's voice or image for TDM purposes, since a set of other legislative provisions would thwart and contradict the introduction of such an exception.
We are therefore not in favour of adopting a general exception permitting TDM, which would, moreover, also be contrary to Canada's commitments under various international treaties, such as the Berne Convention, TRIPS and CUSMA, which specify that any limitation or exception to which Canada intends to subject a copyright must be confined to certain special cases that do not conflict with the normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the author.
Thus, if the government nevertheless decided to adopt a TDM exception (which we do not recommend), it must ensure compliance with its international commitments, for example by ensuring that the exception is: (i) limited to specific cases (for example, research purposes); (ii) subject to strict conditions of application (for example, access to the work or subject matter of copyright must be lawful); and (iii) accompanied by the payment of fair compensation to copyright holders, as well as an opt-out mechanism for copyright holders.
Finally, this exception should not apply to moral rights, but only to so-called "economic" rights.
Question 6 (record keeping): Yes, we recommend it: this is an essential obligation that should be incorporated into the Copyright Act. Transparency is one of the fundamental principles that should guide AI system developers at all times.
Question 7: The level of remuneration must be fair and equitable, based on the uses made of the protected content. In all cases, remuneration should be aligned with the authorizations obtained and take into account the particularities of each content reproduced. For example, in the case of fixed performances, separate remuneration will have to be paid to performers where the authorization initially granted to the producers of the reproduced content did not cover TDM.
Question 8: As explained above, we do not recommend introducing a TDM exception in Canada. On the contrary, it is essential to ensure compliance with the Copyright Act and the other legislative provisions currently applicable (such as personality rights and those relating to the protection of personal information, which are at stake when performers' performances are reproduced for purposes other than those initially consented to, or when their voice and image are reproduced outside a performance), by ensuring that performers' authorization is obtained and that fair and equitable remuneration is paid to them when their content is used for TDM purposes.
We also recommend that a transparency or record-keeping obligation be imposed on researchers and developers of generative AI systems in the context of TDM. If, however, Canada wished, despite our recommendations and in contravention of the personality rights, right to privacy and personal information protection rights that protect performers' voice and image, to introduce a TDM exception, it must ensure that this exception complies with international standards, is limited in application and is accompanied by an opt-out mechanism for copyright holders.
To this end, the Canadian government could examine the situation prevailing in the European Union, Switzerland and the United Kingdom.
Authorship and Ownership of Works Generated by AI
Question 1: Yes. This uncertainty has repercussions in particular on the remuneration of artists such as musicians, whose content becomes "diluted" on platforms such as Spotify. Indeed, to the extent that "artificial" content floods streaming platforms, genuine content will be drowned in that sea of "artificial" content, which could capture a share of the royalties that would otherwise go to real performers.
The lack of protection for "artificial" content also has an impact on the protection of genuine performers' performances.
In addition, there is uncertainty surrounding the ownership of, and remuneration for, an "artificial" performance incorporating a performer's voice, image or likeness where the performer has not authorized such incorporation.
Finally, under the Copyright Act, a "performer's performance" is protected only if it is "attached" to a work. Consequently, performers' rights could be jeopardized if they perform "artificial" content that is not protected by copyright.
We therefore recommend that performances be protected regardless of whether the artists perform "artificial" content not protected by copyright. After all, performers' performances are protected even when they relate to a public-domain work that no longer benefits from copyright protection. It would therefore be possible to extend the protection of performances to provide that the performance of "artificial" content is protected in the same way as the performance of a work, all the more so since article 9 of the Rome Convention provides that "Any Contracting State may, by its domestic laws and regulations, extend the protection provided for in this Convention to artists who do not perform literary or artistic works."
Better protection of performers' rights could therefore be achieved by (i) revising the definitions of "performer's performance" and "performer" in the Copyright Act; (ii) introducing moral rights for audiovisual performers (for example, through ratification of the Beijing Treaty); and (iii) introducing presumptions of infringement of performers' economic and/or moral rights where their performances (or components thereof, such as their voice or image) are reproduced in a generative AI context without their knowledge.
Question 2: If no human contribution can be identified in connection with an "artificial performance", we do not recommend protecting it. However, to the extent that a human contribution is identifiable in connection with an artificial performance, whether (i) through the incorporation of a performer's performance, voice, image or likeness, or (ii) through a human's use of artificial intelligence that could be likened to that of an instrumental musician, that human contribution should benefit from protection.
The government could also specify that a "performer", for the purposes of the Copyright Act, must be a human being. It could also provide for a presumption that a performer's voice and image constitute a substantial part of their performance.
This would provide greater certainty as to the application of the Copyright Act.
Finally, we also recommend amending the definitions of "performer" and "performer's performance" in the Copyright Act so that a performance is no longer tied exclusively to works. The fact that a performance is of an "artificial intelligence product" rather than of a work should not prevent its protection.
Question 3: No, there are no instructive approaches in other countries. While the legislation of the United Kingdom, Ireland and New Zealand attributes ownership of computer-generated works to the person who made the arrangements necessary for the creation of the work, we do not recommend taking this route, because those provisions were introduced in a context foreign to generative AI, and this technology raises far more complex questions. Moreover, we are not aware of any jurisdiction that has addressed the specific question of the ownership of a performer's performance, voice and image within a generative AI product.
Infringement and Liability regarding AI
Question 1: It can be difficult for a performer:
(a) to identify the person or persons responsible for an infringement of their rights or the counterfeiting of their performance; and
(b) to establish that the party that used their voice, image or likeness had access to a pre-existing performance (rather than simply their voice or image outside a performance), that the performance (and not simply the voice or image outside a performance) was the source of the copy, and that a substantial part of the performance was reproduced.
Furthermore, since the Copyright Act contains no presumption that a performer's voice or image constitutes a substantial part of their performance, the existing legal tests might not make it possible to demonstrate that an artificial intelligence product using a performer's voice or image infringes the copyright that the performer holds in their performances.
Question 2: The multiplicity of actors, the opacity of AI systems, and the pixelization of certain performances, as well as of performers' voices and images, which makes the original performances difficult to identify.
Question 3: We are not aware of companies commercializing AI applications taking such measures.
To prevent AI products from infringing performers' copyright in their performances, the necessary authorizations could be obtained upstream of the uses, by means of licences.
The option of using public-domain performances would not always avoid infringement of rights other than copyright, such as the right to one's voice or image, since a performer may outlive the term of protection of their performances. In that case, the unauthorized use of a public-domain performance incorporating their voice or image would nevertheless still infringe their personality rights.
Question 4: No, the Copyright Act has sufficient mechanisms to determine liability in cases of copyright infringement.
That said, it would nevertheless be desirable, for the purposes of determining what constitutes infringement of a performance, to recognize, through the introduction of a presumption, that a performer's voice or image constitutes a substantial part of their performance.
Finally, Canada could impose a transparency or record-keeping obligation on entities developing and training AI systems.
Question 5: See the answer to question 2.
Question 6: Yes. In its draft "AI Act" regulation, the European Parliament introduced a transparency obligation whereby entities developing AI systems must publish a sufficiently detailed summary of their use of "training data protected under copyright law", as well as appropriate, clear and visible information distinguishing generated content from the original. This approach seems commendable to us, but Canada should go even further. In particular, the Canadian transparency obligation should also apply to performances and their components (the performer's voice, image and likeness), as well as to outputs generated by or with AI.
Comments and Suggestions
Our organizations welcome the public consultation, seeing in this exercise the government's willingness to clarify the impact of AI on copyright. Our organizations do not wish to hinder the advancement of AI, but want to preserve the balance underlying the Copyright Act by safeguarding Canadian culture, human creativity and the interests of copyright holders.
To that end, we recommend that the principles grouped under the acronym "A.R.T." (Authorization, Remuneration and Transparency) guide the government's actions in the context of this public consultation and of any resulting amendments to the Copyright Act.
Moreover, it is important that the public consultation not be limited to the interests of authors and other holders of copyright in works, but also cover the interests of performers in their performances and in their voice, image and likeness.
Generative AI is indeed deeply disrupting these creators, notably in the context of deepfakes (in French, "hypertrucage"). In this respect, audiovisual performers do not have sufficient rights to protect their performances, including in the context of generative AI and deepfakes. To remedy this situation, we recommend extending these artists' exclusive rights and moral rights, for example by ratifying the Beijing Treaty.
H
Hinterland
Technical Evidence
We do not use generative AI tools or LLMs in our business, as we are a creator-focused tech company that believes in the primacy of human creativity and the value of copyright for the protection of IP and creators' rights.
We do use AI extensively in our products, but they are video games, so we use AI differently from the current crop of generative tools like Midjourney and ChatGPT.
Text and Data Mining
Currently it's unclear whether existing copyright, trademark, or other IP-protective legal frameworks are sufficient to protect our creations from TDM or other scraping techniques employed in the development of LLMs and other generative AI technologies.
Authorship and Ownership of Works Generated by AI
Yes, the government should clarify the status of AI-generated outputs as it relates to copyright and trademark protections. There is vast uncertainty within the tech and entertainment industries, and it will be very important for Canada to have clear laws in this area. The lack of a clear and safe environment in which to invest in the creation of entertainment IP will have a significantly negative impact on industries like video games, animation, film and television, literature, music, and art, all of which are culturally significant to Canada and represent a significant amount of economic activity in terms of investment, employment, and revenue.
Infringement and Liability regarding AI
N/A
Comments and Suggestions
N/A
I
ID Quotient Advisory Group
Technical Evidence
Within my firm, we use generative AI for RFP creation, code generation, and defining new opportunities for robotic process automation as it pertains to compliance and procurement. Leveraging machine learning and neural networks is germane to our ability to meet client timelines and reduce OPEX costs for our clients. In particular, we often use AI to perform gap analyses of legacy IT infrastructure assets.
Human beings are imperative to the development of AI systems, but caution must be exercised as it pertains to confirmation bias and sample bias in particular. Circular logic in algorithms is a major concern, particularly when using AI to create services and processes for the public.
Citizen social services offer the greatest area in which to observe the impact of robotic process automation and AI, as we consider the administration-heavy tasks overseen by institutions such as the Canada Revenue Agency, the Ministry of Health, and Service Ontario, amongst others. As we look at metrics of success, savings of taxpayer dollars should be a key metric to expedite implementation of AI.
Text and Data Mining
TDM activities are being conducted in Canada, in particular in the finance and insurance industry around use cases where underwriting is considered. Safeguards for TDM should rely heavily on guidance from PIPEDA and GDPR – to ensure that any platforms hosting personally identifiable information are safeguarded against cybersecurity threats.
Countries such as Norway and Estonia take a more innovative, future-focused approach to TDM.
Authorship and Ownership of Works Generated by AI
There is uncertainty in understanding how to use GenAI to create outlines and draft research – this is analogous to the perception of using internet sources in research papers and content creation that existed in the early 2000s. Educational institutions must ensure that training for natural language processing, as well as proper use cases for content creation, are part of curricula moving forward. Approaches to AI and education in Japan are a good resource for Canada when considering these issues.
Infringement and Liability regarding AI
Efforts to identify infringing outputs should go beyond Boolean search and leverage contextual insights when trying to determine copyright infringement. Generative AI can in fact be utilized to collate legislative resources for copyright law.
Comments and Suggestions
N/A
IMPF – Independent Music Publishers International
Technical Evidence
IMPF (Independent Music Publishers International Forum) represents 200 of the world's leading independent music publishing companies. We are engaged in international AI-related policy discussions, and have submitted to enquiries in the United States, the European Union ("AI Act"), the United Kingdom and Australia. In October 2023, we published ethical guidelines on generative artificial intelligence, welcoming technological developments insofar as they improve our business and our capacity to assist the writers we represent. These guidelines are aimed at enhancing the relationship between the creative side (in our case, writers and music publishers) and tech companies providing AI applications. This should ultimately enable transparent collaboration for the benefit of all stakeholders, including users of AI applications. Given the rights we represent, our comments concern musical and literary works only.
We welcome this timely consultation. A legally, politically, and commercially successful AI ecosystem depends upon all relevant parts working in tandem for this common goal. The fundamental starting point for the collaboration is compliance with the law, in this case mainly copyright law but also other rules such as data protection and unfair competition.
Tech companies providing AI applications (run by various, sometimes third-party, entities selling datasets, ultimately for commercial purposes) scrape the internet to collect data for machine learning. This implicates many rights which require express permission from rightsholders, including copyright for the reproductions. In our view, such a requirement is not superseded by any of the potentially available exceptions (e.g., text and data mining, temporary copying, fair use, depending on the jurisdiction). In the absence of binding Canadian court decisions on the application of exceptions, general copyright rules apply and the express permission of the creator and the rightsholder is required. In any case, an exception would only apply to copyright, not to other rights such as data protection and unfair competition rules.
Additionally, data scraping is often expressly prohibited in the Terms and Conditions of the scraped websites; this constitutes a legally binding express prohibition which needs to be respected.
Text and Data Mining
Three preliminary observations on the application of text and data mining (TDM) exceptions in the machine learning process:
- Copyright law is not the only consideration relating to activities by artificial intelligence service providers; consequently, we recommend that the Government also seek information on other applicable legal instruments, including data protection and unfair competition rules.
- Even within copyright law, other activities, including communication to the public, might require a licence; such activities are by definition not covered under a (text and data) mining exception.
- We generally challenge the notion that musical and literary works are mere "data"; whilst this framing might suit the purpose of commercial service providers, music and words are far more important to individual humans and society as a whole than "data".
Clarity around copyright legislation and TDM in Canada is important, but it is for judges to interpret the law. We are concerned about the separation of powers if the interpretation of existing laws is moved from judges to policymakers.
Text and data mining exceptions should not be used to avoid requesting licences. We urge caution should government decide to amend legislation to cater for the asserted needs of a specific sector to the detriment of another sector.
Licensing constitutes the general manner in which the use of human creative talent is permitted. The music publishing industry has licensed novel uses in response to technological developments for centuries, from mechanical music boxes through radio to music streaming. However, in order to provide such a licence, it needs to be requested in the first place, with details of the requirements of the AI service provider, including the potential uses of the output generated on the basis of our creative works. Evidently, it is the choice of the creator and/or the rightsholder whether or not to allow specific uses, and under which conditions. Observations on licensing are academic if such licences are not requested in practice.
We note that the consultation asks about the details of potential licenses and the potential level of appropriate remuneration. We are concerned about the potential competition law aspect of this question. In general, licensing conditions depend on the individual creator or rightsholder as well as the musical and literary works in question. Any negotiated licence needs to reflect the actual value of an individual song for the creator and/or rightsholder as well as the individual user.
The requirement for such express permission should not be circumvented by "offshoring" the machine learning process to countries setting themselves up as copyright havens. Government should consider guidance to ensure that tech companies do not manipulate jurisdictional rules to flout domestic copyright requirements. Any such guidance should also consider copyright infringements committed by AI service providers in the past; we note that most available AI applications are based on datasets of creative works which have already been ingested, mostly without permission.
Record keeping is an important element of transparency. AI service providers, including AI developers and mere dataset providers should be obliged to keep records of, or disclose what copyright-protected content was used in the training of AI systems.
Internationally, we note that many governments are looking into national approaches without any clear approach crystallising (in particular given the absence of court decisions). However, we refer government to the many useful ethical guidelines put forward by rights holders' organisations in various creative sectors such as photo libraries or the Human Artistry Campaign.
Authorship and Ownership of Works Generated by AI
We are concerned about the perverse situation that would arise should AI service providers copy musical and literary works without remuneration for the creator or the rightsholder, and then generate copyright-protected works that compete with the original music they have unduly copied.
Questions of authorship or ownership in relation to AI require a clear differentiation between AI-assisted and AI-generated works.
- Creators using AI applications as a tool (AI assisted): the authors are generally the initial owners of the copyright in the works they create. This is based on general copyright concepts.
- Purely AI generated works (i.e., without human intervention) are different. Stating the obvious, artificial intelligence applications apply algorithms to existing datasets to make predictions/inferences for new settings. It is very sophisticated but in no way creative, a purely stochastic process. In fact: neither artificial nor intelligent.
Copyright rewards the expression of human creativity and talent, which self-evidently is lacking without human input. A work generated by an artificial intelligence application without any human input, is currently not protected by copyright.
To our knowledge this is the case everywhere in the world. For example, in India, where an AI application ("Raghav") was registered as co-owner of an AI generated painting ("Suryast"), a human contribution was still required to establish copyright protection in the first place; it was not possible to register the AI application as sole owner of the painting. Based on the philosophical, historical, and legal justifications of copyright (amongst others, as a human right under the Universal Declaration of Human Rights), most jurisdictions do not grant copyright protection for purely AI generated works, among them the United States (Copyright Office memorandum concerning the registration of copyright) and the European Union (CJEU jurisprudence focusing on the author's own intellectual creation expressing their personality).
However, we note the practical challenges in establishing whether a work is created by a human with the assistance of an AI application or generated without any human involvement. A human will invariably be involved at some stage, even if only "prompting" the AI application. The main challenge however is the delineation between AI assisted works and AI generated works.
We suggest that government provide clarity for instance by issuing guidance on the differentiation between AI assisted or AI generated works and their respective copyright situation.
In this context, it is worthwhile referring to the need of adequate labelling of AI generated works as such. This will particularly ensure the protection of consumers to make an informed decision on what product or service they want to acquire. Aside from personal preferences, consumers might reject AI generated works due to their high energy costs and the environmental impact of their production.
Infringement and Liability regarding AI
We note the challenges to establish copyright infringement by AI generated output. In the absence of record keeping, it is impossible for rightsholders to determine whether an AI developer has used their works (to our knowledge no technological approach exists yet to obtain such information a posteriori). The commercial attractiveness of AI generated output depends mainly on high quality datasets of creative works. Record keeping of the musical and literary works ingested in the machine learning process is key in this regard as well. It constitutes good business practice to provide information about the constituent parts of a product or service (similar to the requirements on fair trade clothing, where the source of every part of the final product has to be notified to qualify for fair trade certification).
Infringement by the output of AI applications presumably follows "normal" copyright enforcement rules when identifying infringement of previous works or derivative works. We note that a variety of natural or legal persons can be solely or jointly liable including the developer of a generative AI model, the developer of the system incorporating that model, end users of the system but also third parties providing the datasets; and more generally, the person/entity ultimately benefiting from the AI generated output. Government might address the scope of potentially liable persons and entities.
We suggest that measures which business can take to mitigate risks of liability for infringing AI-generated works consists in simply complying with the law, i.e., by obtaining the required permissions. We note the related discussions in the United Kingdom: original plans to introduce a specific exception for text and data mining for any purpose including machine learning were abandoned in early 2023 and replaced by discussions on a code of good practice between all stakeholders. Such a code of practice can only be successful if participants agree on the overriding principle of compliance with existing laws (including but not limited to copyright).
Comments and Suggestions
We note the importance of personality or publicity rights to address the situation where AI is used to imitate a person's likeness, voice, or image. When a distinctive voice of a professional singer is widely known and is deliberately used in order to sell a product, this constitutes misappropriation. It is important to note that the resulting damages can be economic or otherwise (such as damage to their reputation or goodwill or causing distress).
Intellectual Property Institute of Canada
Technical Evidence
The Intellectual Property Institute of Canada (IPIC) is the professional association of patent agents, trademark agents and lawyers practicing in all areas of intellectual property law. Our membership totals over 1,850 individuals, consisting of practitioners in law firms and agencies of all sizes, sole practitioners, in-house corporate intellectual property professionals, Government personnel, and academics. Our members' clients include virtually all Canadian businesses, universities and other institutions that have an interest in intellectual property (e.g., patents, trademarks, copyright, and industrial designs) in Canada or elsewhere, as well as foreign companies who hold intellectual property rights in Canada.
IPIC is pleased to provide these comments in response to the consultation initiated on October 12, 2023, on Copyright in the Age of Generative Artificial Intelligence (AI), and the accompanying consultation paper issued by the Government of Canada.
As a preliminary comment, IPIC remains of the view that the Government of Canada should navigate the interplay between copyright and AI with moderation and restraint. To the extent that stakeholders respond to the Consultation with evidence to suggest that the current copyright regime may not strike the right balance between copyright holders and users, IPIC encourages the Government to first report the evidentiary findings and to invite comment on possible policy implications and whether legislative change is needed based on the evidence. In IPIC's view, the current copyright regime is sufficiently flexible to allow for fair text and data mining activities, to recognize copyright protection in certain works created with the assistance of AI tools, and to appropriately allocate liability in cases of infringement involving AI. Legislative change appears premature; IPIC does not believe there is sufficient evidence to indicate that the current regime requires amendment at this time. Reflexive approaches that do not take into account the speed with which AI is evolving and the diversity of AI technologies have the potential to calcify rapidly stale-dated statutory schemes and to either create unreasonably broad copyright exemptions or to hamper innovation. Such approaches therefore miss the opportunity for courts addressing specific situations to find the appropriate middle ground, and risk disrupting emerging licensing markets benefitting both AI developers and copyright owners. It is also important to highlight that this conversation is not taking place in an isolated Canadian environment. These are questions that are being discussed globally, and Canada should proceed with caution to ensure that any actions we take are consistent with the evolving global standards and principles. 
Should any amendment be undertaken, it remains important that Canada respects its international treaty obligations, including with respect to the Berne Convention and other international agreements to which Canada is a party, while still making policy decisions that permit Canada to remain competitive internationally, and to remain generally aligned with our trading partners, particularly when it comes to AI.
Text and Data Mining
The provisions contained in the current Copyright Act are likely sufficient to accommodate "fair" TDM activities in connection with copyright-protected works. Introducing specific provisions for TDM now would be premature.
Canadian copyright law has long accommodated and embraced new technologies, including via the doctrines of fair dealing and technological neutrality. These doctrines are designed to balance the rights of owners and users in a predictable and fair manner. Accordingly, IPIC does not believe amendments to the Copyright Act (the "Act") pertaining to generative AI are necessary or appropriate at this time, including the introduction of any exceptions or limitations, such as an exception for TDM from the Act's requirements, just because AI is involved.
IPIC supports preserving the existing legal framework for protection of copyright and related rights, including the doctrine of fair dealing applied in an appropriate manner to determine whether a statutory exception to copyright infringement applies (i.e. the dealing is for an enumerated purpose). In such cases, courts can then determine whether the dealing is "fair" in fact-specific circumstances based on the application of six non-exhaustive jurisprudential factors. While other countries have adopted more specific exception-based systems (e.g. Japan and Singapore), IPIC believes that Canada's existing fair dealing and technological neutrality frameworks are adequately robust and flexible to address novel AI uses predictably and fairly. IPIC predicts it would be difficult to draft a TDM exception into the legislation that properly calibrates various interests, whereas courts are best placed to conduct such analyses based on fact-specific circumstances. IPIC is of the view that there is no demonstrable need at this time for Canadian copyright law to adopt special fair dealing exceptions for AI, including any TDM exception. However, if Parliament is inclined to introduce a TDM exception it must be appropriately and carefully calibrated so that any use does not conflict with a normal exploitation of the work; does not unreasonably prejudice the legitimate interests of copyright owners or undermine the policy objectives of the Act; and does not conflict with Canada's obligations in relation to technological protection measures and rights management information under the WIPO Copyright Treaty, the WIPO Performances and Phonograms Treaty or the Canada-United States-Mexico Agreement.
Another reason to refrain from introducing new exceptions is to prevent bad actors from relying on overbroad TDM exceptions as a pretext for both infringement and the downstream use of infringing works for unauthorized purposes. Further, as discussed below, copyright owners' licensing markets for training AI models have been developing, and amendments to the Act that would broadly exempt certain unauthorized uses could interfere with those emerging, mutually beneficial markets. Further, building conditions into a TDM exception to safeguard emerging models may invite complication.
While AI will continue to raise many interesting and important copyright issues, at present IPIC is not aware of evidence to suggest that the Act and case law interpreting it are ill-suited to address these issues. As new issues arise, Canadian courts are well placed to approach these questions in a thoughtful and careful manner, using the existing flexible framework guided by the doctrines of fair dealing and technological neutrality. Accordingly, IPIC submits that any changes to legislation, if necessary, should support and bolster creators' control over their datasets and/or works, including the marketplace that enables a healthy open AI ecosystem and AI development.
In time, there may be justification for the introduction of exceptions and/or limitations to address very specific fact situations.
Affirmative Consent (Opt-Ins) to the Use of a Copyright Owner's Works for Training Materials
As a basic principle, if fair dealing or another copyright exception does not apply to the use of works for AI training, such use is infringing, and it is the user's obligation to affirmatively obtain consent from the owners to use the owners' works (i.e. opt-in processes). While some AI developers have taken a step in the direction of opt-out processes, proposals for opt-out processes present challenges for the proper implementation of consent systems and for enforcement of copyright.
If it is decided that an opt-out process is workable then careful consideration should be given to the respective onuses of rightsholders and users (e.g. developers).
Copyright Policy Should Support Voluntary Direct Licensing
As a matter of policy, the Act should strive to promote voluntary licensing transactions between copyright owners and prospective users. At this time, there are several emergent examples of copyright owners and companies engaged in training generative AI models and systems entering into voluntary licensing agreements, and therefore government intervention appears unnecessary. In fact, as it relates to certain industries, the emergence of direct voluntary licenses has already occurred, because some copyright owners have already entered into licensing agreements with AI companies. For example, the AI company Bria has a license with Getty Images that gives it rights to the photographs it uses for training. OpenAI has entered into license agreements with Shutterstock to pay for specialized content and with Associated Press to access the news agency's archive of stories.
These types of agreements and policies show that market-based solutions, which both respect copyright owners' rights (and provide creators with market-based compensation), and facilitate the training of generative AI models are continuing to develop. Therefore, voluntary direct licensing is both feasible and desirable for different industries and for a variety of rights and uses. Copyright policy should support, not undermine, voluntary direct licensing schemes as they develop in the market. As such, there appears to be no reason to turn to compulsory licensing or extended collective licensing at this time.
Transparency and Record Keeping on Materials Used to Train AI Models
IPIC sees the benefits in obliging those who provide AI services or systems to the public to maintain appropriate records identifying the materials used to train their models. These records could allow rightsholders to track and license uses of their intellectual property while allowing the public and courts to meaningfully assess the lawfulness, as well as the reliability, of the developers' activities. Maintenance of such records may also be required because of anticipated litigation.
Without such obligations, it may prove difficult for rightsholders to discern whether their works and other protected content were used in the development and training of models in such circumstances as when faced with system outputs suggestive of possible infringement or text and data mining activities of a character suggestive of unfair dealing with copyright-protected works or circumvention of technological protection measures. IPIC believes this deserves specific further study, including to help inform transparency requirements from a rightsholder's perspective to ensure their rights are properly balanced against the interests of those dealing in copyright-protected works for fair text and data mining activities. In order to ensure the safety of these systems and to have functioning copyright and privacy frameworks, AI systems must be accountable for the works they ingest and are trained on. Accordingly, AI developers and creators of training datasets should ideally be required to collect, retain, and when requested by rightsholders, disclose records identifying the material used to train their models.
While there was no consensus on whether amendments to the Act were the better means to achieve these objectives, it was suggested by some that Bill C-27, The Artificial Intelligence and Data Act (AIDA), which is intended to establish requirements for the design, development and use of AI systems, might provide a regulatory vehicle for such obligations. IPIC believes the Government of Canada should be thoughtful about the context and nuances of any recordkeeping requirements to ensure that policies are narrowly targeted to achieve the desired goal. It is also important that any suggested transparency and disclosure requirements not be overbroad in scope. Any record keeping requirements should respect confidentiality when all parties to the transaction (i.e. the owners of the training materials and the AI trainers) agree to maintain confidentiality.
Authorship and Ownership of Works Generated by AI
IPIC does not believe amendments to the Act pertaining to generative AI are necessary or appropriate at this time, including to address questions of authorship and ownership of AI-generated works or AI-assisted works.
Technologies utilizing some form of machine or computational intelligence have existed, and contributed to the creation of original expression, for decades. Recent developments have advanced at a significant pace, but that does not necessarily mean that AI developments will require copyright law to evolve in a dramatically different manner. Developments in AI, like preceding technological advancements, have great potential to enhance, not replace, human creativity. IPIC believes these developments can, and should, co-exist with a copyright system that incentivizes the creation of original expression and protects the rights of copyright owners.
Humans are and will remain at the heart of the creative process, and as such, IPIC believes that consistent with the Act, fully machine generated works should not be copyrightable. At the same time, IPIC believes that AI, including potential uses of generative AI as it continues to develop, can be a powerful tool in the hands of authors. IPIC supports a robust copyright system that facilitates and provides incentives to create copyright protectable works, including by protecting certain works that creators make with the assistance of generative AI – in the same way that such principles apply to uses of other technologies that assist creators in realizing their vision.
Generative AI broadly covers many variations of AI technologies, many of which have been in use for many years and should not raise the copyrightability and authorship issues presented by popular prompt-based tools.
While AI will continue to raise many interesting and important copyright issues, based on its current knowledge of AI technology, including "Generative AI", IPIC believes the Act and case law interpreting it are well-suited to address these issues, and courts should apply existing copyright principles when addressing legal issues arising out of the use of AI technologies. For example, for copyright to subsist in a work, the work must be 'original'. Canadian courts have held that for a work to be 'original' it must result from the exercise of skill and judgment, which necessarily involves intellectual effort. The Supreme Court of Canada has held that the requisite "skill" will involve the use of an author's own "knowledge, developed aptitude or practiced ability in producing the work". The requisite "judgment" will involve the author's "capacity for discernment or ability to form an opinion and evaluation by comparing different possible options in producing the work". The author's exercise of skill and judgment must not be so trivial as to be characterized as a purely mechanical exercise. The copyright regime also affords protection to compilations, which are works that result from the selection or arrangement of data, or of literary, dramatic, musical or artistic works or parts thereof. For a compilation to be original, the author must have exercised skill and judgment in selecting or arranging materials into the compilation. The Canadian standard for originality differs from that in other jurisdictions. No Canadian court has yet applied the originality standard to AI-generated or AI-assisted works. The Canadian standard for originality will result in copyright protection for a work where an author's skill and judgment is exercised, including if AI is used to assist the expressive activity.
Separately, copyright protection would also arise where an author exercises skill and judgment in compiling works or data that have been generated using AI.
IPIC believes that copyright should protect works that result from and reflect human expression, including where AI is a tool or assists in the authorship of original works. Conversely, copyright should not protect subject matter where there is no expressive contribution from a human author. The current Canadian originality standard appears to strike an appropriate balance. The authorship determination should continue to focus broadly on the question of originality, and in particular on the human author's exercise of skill and judgment, and the intellectual effort applied to the author's process and decisions to produce the work (for example, with respect to compilations, how the author selects, arranges, and/or positions elements of the ultimate work). Focusing on these human choices ensures that copyright subsists in original works that are derived from the author's "skill and judgment." The same reasoning can apply to human uses of generative AI. Input material provided to the AI tool (like a drawing or photo), refinements and direction all involve intellectual and creative contributions inseparable from the ultimate work. Creators can employ generative AI systems as tools to enhance the expressive process, just as they have availed themselves of software that renders computer-enhanced imagery, such as Adobe Photoshop, and received copyright protection for their works. The fact that creators produced some parts of a work with the assistance of AI should not render those portions uncopyrightable, provided the author exercised sufficient skill and judgment in the creation of the work to result in the work being original.
While IPIC appreciates that some commentators call for additional clarity regarding the authorship and ownership of works generated by AI to create more certainty in the marketplace, IPIC cautions against introducing any exception to the longstanding principles of copyrightability when humans employ AI as a tool in their creative processes. Doing so at this time would be premature, and considering the rapid pace of development of AI technology and associated global standards and principles, appears likely to have unintended consequences. An approach that attempts to isolate the use of AI and treat it differently than other technologies used to support human-driven expression threatens to impair rather than stimulate and support creativity. The Government of Canada should therefore resist trying to draw definitive conclusions based on limited experience and information.
Infringement and Liability regarding AI
The Existing Liability Provisions Are Sufficient to Address Copyright Infringement
AI technologies present new opportunities for third parties to make unauthorized use of works and to create infringing works that are substantially similar to those works.
At least two potential scenarios implicate traditional principles of copyright law. First, if a pre-existing copyrighted work is copied to train an AI model, and then the AI system outputs a substantially similar copy, that scenario would present a clear case of infringement of the reproduction right. Second, users' prompts to AI systems may result in AI-generated outputs being unauthorized reproductions or adaptations of copyright works.
With that context in mind, IPIC believes the existing liability provisions in the Act establish a general, well-accepted framework for analyzing claims of direct and secondary copyright infringement in the context of new technologies. While a precise determination of which parties are directly or secondarily liable will depend on the facts, courts should be able to apply traditional copyright principles to new technologies, given the robust case law in Canada regarding the principles of fair dealing and technological neutrality. As a result, IPIC does not believe amendments to the existing liability provisions in the Act to specifically address generative AI are necessary or appropriate at this time.
Comments and Suggestions
N/A
The International Alliance of Theatrical Stage Employees ("IATSE")
Technical Evidence
The International Alliance of Theatrical Stage Employees ("IATSE") is the largest trade union representing workers in Canada's entertainment industry. Founded in 1893 (1898 in Canada), IATSE has over 170,000 members - 34,000 of whom are in Canada. Their membership comprises virtually all of the behind-the-scenes workers necessary to the functioning of the entertainment industry - across film & television, animation, live entertainment, conventions, and trade shows. IATSE represents a wide range of creators and highly skilled technicians, including cinematographers, SPFX artists, animators, costume designers, props masters, hair stylists, makeup artists, aerial riggers, scenic carpenters, and many more. On a film set, the majority of the people working will be IATSE members. In a word, they are the crew.
IATSE does not collect copyright protected content, use AI-generated content or develop AI systems. However, IATSE members are directly impacted by the use of AI-assisted and AI-generated content in the entertainment industry. The use of AI systems in the entertainment industry affects both current and future work opportunities and the ability of human workers to protect their livelihoods.
Absent legal safeguards to protect copyrighted works and intellectual property, AI can be used as a tool for sophisticated theft and unauthorized replication of creative works that IATSE members have brought to life. Without legal protections, there will be incentive to train AI technologies to consume creative works for profit to the detriment of workers.
AI is already a tool in use in the entertainment industry, which is grappling with its unchecked proliferation. Realistic replications have been made of creative works using AI, including voices, faces, performances, and sounds. The rapid advances of this technology have led employers in the entertainment industry to consider how AI systems may create opportunities for profit. In both the Writers Guild of America (WGA) and the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) labour disputes in 2023, the use of AI was a key issue.
Without appropriate guard rails, the work of IATSE members can be used to train AI systems to create material without consent or contribution. It is an ongoing concern to IATSE that AI generated content may replace work historically performed by trained professionals including art directors, costume designers, audio visual technicians and others.
The use of AI in the entertainment industry presents a great opportunity to complement existing roles with additional tools, rather than serving as a harmful weapon that devalues creative work. The legal framework must protect the human efforts engaged in creative work.
In light of its role in the industry, IATSE is well positioned to provide a worker's perspective concerning AI and its use in the entertainment industry.
Text and Data Mining
TDM activities in the entertainment industry must be regulated. AI systems should not be used to consume copyrighted material to create works without permission from and compensation to copyright holders.
TDM activities in the entertainment industry rely on the use of copyrighted works to operate. The TDM systems cannot operate without first consuming human-created work such as writings, recordings, photographs or videos. For example, TDM activities can be used to consume entire works of art to create a product such as a screenplay. TDM has been used to create convincing replicas of works, including scripts, audio and visual material.
There must be basic floors and protections included in the legal framework for those who may be impacted by TDM activities. TDM activities must be transparent. Without this transparency, rights holders and creative contributors will find it practically impossible to determine whether and where their works have been consumed. An opt-out system should not be adopted, as it would unduly place the onus on copyright holders to enforce their rights. An opt-out system would create a windfall for technology companies to profit until each copyright holder asserts a right.
Similar to privacy legislation in Canada, the legal framework should include a clear consent requirement. Express consent should be mandatory before copyrighted works are used to train AI systems and generate material. Records must be kept and disclosed in order to permit inspection or investigation of the content that was used in TDM activities. Without such a requirement, it would be practically impossible to investigate or challenge any TDM uses. Rights holders would lack the requisite knowledge to assert or enforce rights.
Developers of AI systems that engage in TDM activities must be required to collect, retain and disclose records relating to the specific materials used to train their models. This material should be searchable in order to provide a practical mechanism by which individuals and trade unions can monitor and enforce rights. The enforcement tools must include "teeth" in order to be effective. The legal framework must include penalties, damages, and rights of action where it can be demonstrated that a work was used in TDM activities without a corresponding record.
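The searchable record-keeping described above could, at its simplest, be a per-work ingestion log. The following is a minimal sketch only; every name in it (`IngestionRecord`, `ProvenanceRegistry`, the field names) is a hypothetical illustration, not drawn from any existing system or from the submission:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IngestionRecord:
    """One record per copyrighted work consumed in a TDM/training run."""
    work_title: str
    rights_holder: str
    source_url: str            # the platform through which the work was accessed
    ingested_on: date
    licence_ref: Optional[str]  # None signals that no licence is on file


class ProvenanceRegistry:
    """Append-only log of ingested works, searchable by rights holder."""

    def __init__(self) -> None:
        self._records: list = []

    def log(self, record: IngestionRecord) -> None:
        self._records.append(record)

    def search(self, rights_holder: str) -> list:
        """Let a rights holder (or union) find every use of their works."""
        return [r for r in self._records if r.rights_holder == rights_holder]

    def unlicensed(self) -> list:
        """Records with no licence reference - candidates for enforcement."""
        return [r for r in self._records if r.licence_ref is None]
```

The design point is the `unlicensed()` view: the submission's proposed penalties attach where a work was used "without a corresponding record", which presupposes exactly this kind of queryable log.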
The legal framework must hold those who create TDM systems accountable for the consumption of data and creative material. The act of consuming the material itself must be fairly compensated as it is this material that is used in TDM activities to train the system. The legal framework must reinforce the bargaining power of labour unions such as IATSE in securing fair compensation for the use of material in TDM activities.
Canada should not model its legislation on that of the European Union, United Kingdom, Japan or other jurisdictions which have, to some extent, passed legislation permitting the consumption of copyrighted works without consent or compensation. For example, Japan allows for the consumption of copyrighted material for use in commercial works and does not require a party to first have legal access to those works. Any broad TDM exemptions, or a failure to require consent and compensation, will devalue labour and destroy any possibility of a basic floor of ethical rights and acceptable uses of this powerful technology.
Authorship and Ownership of Works Generated by AI
The authorship of AI assisted works is a complex question which is not easily answered. For copyright to arise under the current jurisprudence, work must be produced by an author's skill and judgement. The use of AI as a tool at some point in the creation of a work should not in itself wholly disqualify a work from copyright protection.
There are instances where this technology is already ethically in use as a tool by creatives in the process. Where creative professionals use their human labour to use AI as a tool to create work, there must be distinction in the law to recognize and protect that human effort. Copyright law must distinguish between creative works that are largely machine created and those that are a result of human effort, guidance and creative control. Where AI is used as a tool to create a work, while being guided and molded by a human, that human expression ought to receive legal protection. Works that are predominantly AI generated lack a requisite human element and should not be recognized as eligible for copyright.
Infringement and Liability regarding AI
The legal framework must provide sufficient enforcement mechanisms in order to provide tools by which copyright holders can actually assert their rights. Assessing infringement of copyrighted works by AI systems creates certain difficulties in determining the identity of the creator of the infringing work. Greater clarity must be provided concerning where liability lies both in the circumstances of primary and secondary infringement. Liability must not only attach to those creating AI work that infringes copyright, but also where it is further distributed. Record keeping, stored data sets and transparency mechanisms are necessary to understand what work was used or reproduced in the creation of AI generated work. Ultimately, it is humans who must be responsible for the actions of the AI systems they create. Enforcement mechanisms will be meaningless if malicious actors can hide behind the activities of the technology they have enabled. IATSE welcomes clarity in ensuring liability rests with those who infringe copyright protected works.
Comments and Suggestions
N/A
International Confederation of Societies of Authors and Composers (CISAC)
Technical Evidence
The International Confederation of Societies of Authors and Composers (CISAC) welcomes the opportunity to engage with the Canadian Government in its request for comment in the framework of the Consultation on Copyright in the age of generative Artificial Intelligence.
CISAC is the leading worldwide organisation of authors' societies. We represent more than 5 million creators from all geographic areas and all artistic repertoires (including music, audiovisual, drama, literature, and visual arts) through our 225 members. The position of CISAC is not just a reflection of its members, but of its long history centred on defending the livelihood of creators and supporting creativity for future generations.
This submission delves into legal and policy concerns surrounding generative AI and specifically examines three critical areas:
- text and data mining and the use of copyright-protected materials by developing AI models;
- copyright protection for AI-generated content;
- transparency and record-keeping as fundamental requirements for copyright enforcement.
The goal is to offer insights into emerging issues within the framework of foundational copyright principles while pinpointing opportunities to enhance the current state of play.
I. Introduction
Generative AI stands as a double-edged sword for the creative industry, offering immense potential while simultaneously posing significant risks. When leveraged to support human creativity, this technology emerges as a powerful ally that the creative industry should embrace and foster in partnership with the AI development community.
At the same time, generative AI poses existential threats to the creative sector. To improve its capabilities, this technology heavily relies on the use of vast volumes of copyrighted works. The methods employed, such as data scraping and web crawling, often gather content indiscriminately, raising concerns about the unauthorized use of copyrighted material for AI training. This process disrupts the balance of copyright law by favoring AI technologies at the expense of creators' rights and livelihoods. The lack of transparency from AI companies regarding dataset acquisition exacerbates this imbalance. The overreliance of AI companies on the blanket assumption that such a use is "fair" or otherwise subject to an exception without proper validation poses a threat to the essence of copyright law.
Also, while AI serves as a facilitative tool in the creative process, it must be recognized as an extension of human creativity, not as a replacement. This distinction is vital in preserving the integrity of copyright law and the livelihoods of creators.
This submission seeks to chart a prudent course within Canada's copyright landscape and to foster a dialogue that harmonizes technological innovation with the preservation of creators' rights in the evolving AI landscape.
Text and Data Mining
II. Text and data mining
A. The use of copyright in training AI systems
The use of copyright-protected material to train AI systems has become a pivotal aspect within the realm of AI development. A vast amount of copyright protected works – such as images, texts, music, videos, or other creative content – is ingested into AI models and used to identify patterns, generate new content, and/or perform various tasks.
The acquisition of such data often occurs through indiscriminate "data scraping" and "web crawling" methods, without seeking authorization from rightsholders. Further, this process remains mainly unchecked, due to a lack of transparency as to what materials have been used to train AI models to date, and how these materials have been collected and curated.
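One of the few signals a site can send to crawlers today is a robots.txt file, which is voluntary and widely ignored by scrapers - part of the very problem described above. The sketch below only shows what honouring such a signal looks like, using Python's standard-library parser; the bot name and paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt barring a (hypothetical) AI crawler
# from the directory where an artist hosts their works.
robots_lines = """
User-agent: ExampleAIBot
Disallow: /artworks/
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_lines)

# A crawler that honoured the file would skip the disallowed path
# and could still fetch unrestricted pages.
fetch_artwork = parser.can_fetch("ExampleAIBot", "https://example.com/artworks/painting.jpg")
fetch_about = parser.can_fetch("ExampleAIBot", "https://example.com/about.html")
```

Because compliance is entirely at the crawler's discretion, such mechanisms do not substitute for the authorization and transparency obligations discussed in this submission.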
Artists and creators using the online space to promote and disseminate their works often remain unaware that their works have been assimilated into AI models for training. This prevents them from authorizing (or prohibiting) such usage and from asserting their rights to be compensated. It is therefore essential that AI developers keep detailed records of works used, alongside the platform through which they were accessed and make this information available to rightsholders (see section IV A). The obligation to keep accurate records should arise from the start of the development, training, and design phase to provide a full chain of use.
To properly address these challenges, several approaches could be considered, as elaborated below. This includes requiring fair licensing practices between creators and AI developers using copyrighted works for training data as well as imposing transparency reporting requirements on AI developers to disclose their training datasets.
B. Inapplicability of TDM exceptions
The essential balance within copyright law between private rights and public interests sometimes requires the application of exceptions to exclusive rights of creators. These exceptions must operate within defined boundaries to safeguard creators' ability to benefit from their creations.
AI developers tend to rely on exceptions like fair use or text and data mining (TDM) to access and use data from the internet for training AI models. However, extending such exceptions to encompass the commercial use of copyrighted works as training data for AI may undermine creators' rights and may challenge the delicate balance inherent in copyright law. Unlike traditional uses covered by exceptions, AI training goes beyond mere analysis or extraction, and results in the production of content that may compete with or replicate the original works, thus affecting the economic incentives and rewards that creators derive from their works.
This diverges significantly from the intended scope of exceptions to copyright, which according to the principle of the three-step test recognized in the Berne Convention, consists in allowing specific uses that do not conflict with the normal exploitation of the copyright-protected works and do not unduly prejudice creators' ability to benefit from their works.
Introducing new exceptions should be carefully weighed to ensure they do not unduly harm creators' rights or disrupt the balance between private rights and public interests. It also raises concerns about potential breach of international treaties and fundamental principles underlying copyright law. Therefore, we strongly recommend refraining from adopting new TDM exceptions that permit AI systems to commercially exploit copyright works without rightsholders' authorization and remuneration.
C. Licensing of works used for training purposes
The use of copyright-protected works by AI encompasses making copies or reproductions of copyrighted material for analysis and learning purposes. Even if this reproduction is for computational processing rather than human consumption, it still falls within the realm of copyright control.
This emphasizes the importance of obtaining appropriate authorization or licensing to ensure compliance with copyright law. Ensuring fair voluntary licensing practices between creators and AI developers becomes imperative in this context, ensuring that creators are fairly compensated for their contributions to AI-generated outputs.
Collective management organizations (CMOs) hold a unique and advantageous position to facilitate the development of licensing schemes for the use of copyright-protected works by AI. Their comprehensive experience in licensing vast amounts of works for digital uses such as Internet streaming, their efficient infrastructure, and their capacity to balance interests make them well-suited to drive the development of effective and fair licensing schemes for AI use.
Authorship and Ownership of Works Generated by AI
III. Authorship and ownership rights related to AI-assisted and AI-generated content
The issue of copyright protection for AI-generated content is complex and multifaceted, as demonstrated by recent decisions across the world. As a preliminary remark, a distinction needs to be made between the AI-assisted content, which should continue to benefit from authors' right/copyright protection, and the purely AI-generated content, whose protectability seems largely unjustified under the current copyright regime.
In the United States, the US Copyright Office (USCO) took a significant step in that direction by refusing to register several works, particularly on the grounds that the work in question was created by AI without any creative contribution from a human being. A subsequent decision of the US District Court for the District of Columbia (Thaler v. Perlmutter et al, No. 1:2022cv01564 (D.D.C. 2023)) emphasized the significance of human authorship in copyright protection. These rulings stressed that AI-generated content lacking creative input from a human actor does not meet the threshold for copyright protection. Such decisions align with the fundamental principle upheld by US Courts, according to which human authorship forms an essential element for copyright protection.
Similarly, in Europe, the concept of originality is a critical factor for determining copyright protection of AI-generated works. The extent of human involvement in the creative process and the level of autonomy of the AI system are weighed by the courts as key elements in assessing the protectability of AI-generated content under copyright law. This is based on a meticulous case-by-case evaluation to ascertain whether the role of AI is merely supportive or contributes substantially to the creation of an "original" work.
These recent decisions show that the current status quo regarding protection for AI-generated content remains rooted in the principles of human authorship and original expression. Both are pivotal aspects entrenched within the essence of copyright, which is incentivizing and rewarding human creativity. In navigating the challenges posed by AI-related innovations, any consideration should build upon the inherent values of incentivizing and rewarding human creativity and fostering an environment that encourages innovation while safeguarding the rights of creators.
Ongoing legal developments in other jurisdictions and case law involving AI-generated content will provide valuable insights into the nuances and complexities of this rapidly evolving context. It is therefore advisable to refrain from any amendment to the current legal framework in Canada unless future developments expose a clear gap or defect related to the applicability of the existing rules to AI-generated content.
Infringement and Liability regarding AI
IV. Liability for infringement
A. Transparency and record-keeping as fundamental requirements for enforcement
Preserving and enforcing creators' rights in the evolving AI landscape will not be possible unless AI developers are transparent about what data they use to train the AI models. At present, rightsholders are unable to identify whether their copyrighted works have been fed into existing AI models. Transparency and record-keeping requirements are essential to ensure that rightsholders can enforce their rights, negotiate licenses, ensure fair compensation for the exploitation of their works and pursue claims for copyright infringement.
Further, such a requirement serves as a cornerstone for accountability and trust within the AI ecosystem. By mandating transparent record-keeping, developers are held accountable for the origin and usage of the data utilized in training AI models. This transparency establishes a level of trust between creators, rightsholders, users, and developers by ensuring clarity regarding the incorporation of copyrighted material.
Overall, it is imperative that AI developers keep and make readily available to rightsholders detailed and accurate records of the works they use for training, the works' origin, and the existence of any licenses authorizing such use. Such obligation should go back to the start of the development, training and design phase to provide a full chain of use.
AI developers assert that such an obligation with respect to foundation models would constitute a significant burden for the development of AI technologies. We, however, maintain that transparency is a basic standard for any business that operates in the digital environment and processes vast amounts of data every day. AI services already maintain record keeping practices for internal purposes, such as validating records, removing biases or deciding on the need for additional datasets. These transparency standards should be urgently extended to the ingestion of copyright-protected works into the data pool, as well as to the outputs of the generative AI.
The discussions in the European Union around the EU AI Act highlight the significance of transparency and disclosure requirements for AI developers in relation to copyrighted works. The proposed provisions within this legislation emphasize the need for AI developers to disclose relevant information regarding the copyrighted works used during the training of their AI models (see EU AI Act Final Text, Recital 107, Art. 53(d)).
Once a clear and broad principle of transparency and record keeping is recognised, established copyright principles will apply to determine where liability for infringement attaches. The expectation is that Canadian courts will navigate these complexities by applying established copyright principles to ascertain liability, ensuring that copyright protections are upheld in the rapidly evolving landscape of AI technology and content generation.
B. Technological standards for identification of works and authors
As for the future of developing better and more accurate technological standards to aid in the proper identification of works and authors, there is ample precedent for the development of such technologies. For years, CISAC, rightsholder groups, and many other organisations managing rights on behalf of creators internationally, have channelled their resources into developing and improving technical measures and infrastructures for enabling creators to receive their fair share from their creative efforts. This has included the creation of reliable systems such as industry standard ISWC and ISRC identifiers, among other similar tools, which ultimately streamline the task of confirming copyright authorship in works.
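The ISWC (ISO 15707, for musical works) and ISRC (ISO 3901, for recordings) codes mentioned above have fixed printed layouts, which is what makes them usable as machine-checkable keys in rights databases. The sketch below validates only that surface format - it does not verify check digits or whether a code is actually registered - and the example codes and function names are illustrative:

```python
import re

# Format-level patterns only (no check-digit or registry validation).
# ISWC printed form: "T" + 9-digit work identifier + 1 check digit.
ISWC_RE = re.compile(r"^T-\d{9}-\d$")
# ISRC printed form: 2-letter country code, 3-char alphanumeric registrant,
# 2-digit year of reference, 5-digit designation code.
ISRC_RE = re.compile(r"^[A-Z]{2}-[A-Z0-9]{3}-\d{2}-\d{5}$")


def looks_like_iswc(code: str) -> bool:
    """True if the string matches the hyphenated ISWC layout."""
    return ISWC_RE.fullmatch(code) is not None


def looks_like_isrc(code: str) -> bool:
    """True if the string matches the hyphenated ISRC layout."""
    return ISRC_RE.fullmatch(code) is not None
```

Checks of this kind are the entry point for the works-identification infrastructure the paragraph describes: a training-data record that carries a well-formed ISWC or ISRC can be matched automatically against CMO repertoire databases.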
We believe that further collaborative efforts between AI developers and rightsholders are essential to improve the infrastructure surrounding works identification, and that such collaborations should be encouraged and facilitated at the policy level.
Comments and Suggestions
V. Conclusion
The intersection of copyright and AI training requires a delicate balance between fostering innovation and respecting creators' rights. To navigate this evolving scenario, it is paramount to develop comprehensive approaches that ensure legal compliance, ethical treatment of copyrighted content, and preservation of human creativity.
Given the rapid pace of change in this field, CISAC believes that a factual assessment of the issues raised by AI technology must necessarily precede any legislative reform in Canada. Apart from insights gathered from industry stakeholders, the interpretation of fundamental copyright principles by both Canadian and international courts will be essential to inform future adoption of legislative measures.
Respecting copyright law emerges as pivotal for regulators to secure the future of creativity in a fast-evolving technological landscape. Above all, it is imperative that AI technologies be developed and utilized in a controlled manner and do not undermine the overall functioning of the copyright system nor the interest of rightsholders.
Opportunities presented by AI should not come at the expense of creators' rights. Striking a balance between innovation and copyright protection is essential to ensure that AI enriches society without replacing human creativity.
We would once again like to thank the Government of Canada for the overall thought and attention it has given to the intersection of generative AI and copyright. We appreciate the opportunity to submit these comments and look forward to working with the Government and other stakeholders on these issues in the future.
International Trademark Association
Technical Evidence
N/A
Text and Data Mining
N/A
Authorship and Ownership of Works Generated by AI
In June 2023, the AI and 3D sub-committee of the INTA Copyright Committee released a survey report on "Copyright and Neighbouring Rights in Outputs Made by or Made by Means of AI Systems". The report summarized the results of a limited study conducted by INTA in 2022, relating to the protections (if any) afforded to AI outputs in different jurisdictions. Forty-eight countries were surveyed, and 71 respondents provided their input in response to a short questionnaire.
The INTA Survey considered whether (and on what terms) such outputs were (or might be) protected under traditional copyright which has a human authorship component; as a "neighbouring right" (e.g., protections afforded to performer's performances, and makers of sound recordings or broadcasts) which rights arise agnostic to the copyright status of the underlying content; or as a sui generis right. It also considered whether protection afforded to AI outputs give rise to moral rights in the output (such rights arise in tandem with copyright in many jurisdictions, including Canada).
Of note, the INTA Survey distinguished between protections for outputs that are AI-generated versus AI-assisted. The first category — "Generated" outputs — are those for which no human author exists (the only human contribution being "pressing the button"). This category includes outputs produced by autonomous artificial intelligence systems (commonly known as "generative AI" systems), of which ChatGPT is one example.
The second category — "Assisted" outputs — are those generated by one or more humans using one or more AI systems as tools. While the INTA Survey results did indicate jurisdictions might protect such works in theory, protection in these instances would be tied to questions surrounding the requisite quantum of originality and human authorship required for copyright, and whether such conditions were satisfied with respect to the output in question. The INTA Survey did not explore in-depth the question of the level of human participation involved in creating AI-assisted outputs and how different levels of human involvement may affect the copyright analysis.
Survey Results: The INTA Survey found that in jurisdictions without a sui generis right or other specific legislative language, whether the AI output in question is capable of attracting protection under copyright appears to be assessed with respect to the threshold question of "originality". The approach to assessing "originality" of a work for the purposes of copyright is neither internationally settled nor uniform. It varies between jurisdictions. For example, Canada's standard considers the "skill and judgement" of the author in producing the work. This will necessarily involve intellectual effort. As explained by the Supreme Court of Canada, "skill" means the use of an author's own "knowledge, developed aptitude or practiced ability in producing the work". "Judgment" means use of the author's "capacity for discernment or ability to form an opinion and evaluation by comparing different possible options in producing the work". Moreover, "the exercise of skill and judgment must not be so trivial that it could be characterized as a purely mechanical exercise" (such as changing a font to produce "another" work): CCH Canadian Ltd. v. Law Society of Upper Canada, 2004 SCC 13 at para 16. Originality is also relevant to copyright protection for compilations. A compilation takes existing material and casts it in a different form. The Supreme Court has explained that the Copyright Act does not require originality in both the selection and arrangement; rather, originality in one will suffice to attract protection (i.e., originality in selection or originality in arrangement): Robertson v. Thomson Corp., 2006 SCC 43 at paras 35 and 37. Canada's approach differs from the approach taken in other jurisdictions. For example, the United States looks to whether there is "a modicum of creativity" (Feist Publications Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991) at p. 346 (USSC)), while the EU test for originality requires non-copying and the "intellectual creation" of the author.
The INTA Survey indicated that it remains unsettled how the originality test may apply to computer generated works. While human involvement and human originality will likely remain relevant, variabilities in the "originality" standard across jurisdictions may well lead to different results.
AI-Generated Outputs: At the time the Survey was conducted, INTA found that only four countries had specific protections for "computer generated works".
Of those four, only one (1) country — Ukraine — offered a sui generis right outside of copyright. This sui generis right protects non-original subject matter created by software (including AI) — i.e., outputs that differ from other works of a similar type and are created without the participation of humans. Rights arise at the moment of the output's creation. There is no requirement that the output meet the "originality" requirement. Rights last for 25 years. They accrue to the owner or licensed user of the software that generated the output. Rights granted include the right to use, and to authorize (or prohibit) third-party use of the output. No moral rights arise in relation to protected outputs.
The other three (3) jurisdictions found explicitly to protect "computer generated works" — the UK, Ireland, and South Africa — do so under their respective copyright laws. In these jurisdictions, there is a definitional distinction between the AI-generated output (which is specifically deemed to be a protected "work" or "creation" under the legislation) and the "author" of that work, who is a human. The legislative approach here is to identify the "author" of the computer-generated work as the person by whom the arrangements necessary for the creation of the work are undertaken. The term of protection is not tied to the life of the author, but rather to an event — for example, the date the work is first lawfully made available to the public (Ireland), or from the date the work is made (UK), or some combination (South Africa). The remaining jurisdictions surveyed had either not yet dealt with the issue of copyright protection for AI output, or required an actual human creator.
The INTA Survey Results Report identifies ways in which a "neighbouring rights" regime may be an appropriate avenue through which to protect AI-generated outputs. However, no jurisdiction was identified that currently takes this approach.
AI-Assisted Outputs: Conceptually, where AI is used to assist authorship (e.g., as a tool, resource, mechanism or means for human creation and to facilitate human expression) copyright is more likely to be available for the resulting work. By contrast, where work is the result of AI operating with no or limited contribution by a human, the work may not be protectable. In either case, the threshold question of "originality" would need to be assessed.
Possible Application of Canada's Originality Test: As noted above, variabilities in the "originality" standard across jurisdictions may well lead to different resulting protections. As canvassed in the INTA Survey Results Report, the US Copyright Office's February 2023 decision involving "Zarya of the Dawn" led to the refusal of copyright in the visual elements of a comic book created with the assistance of a generative-AI program (Midjourney AI), on the basis that the AI at issue did not render predictable results, and thus the human instructing the AI could not be deemed the "master-mind" behind the resulting images. However, protection was granted to those portions of the comic book entirely attributable to the human author (in that case, the text, and the selection, coordination, and arrangement of the work's written and visual elements). Based on the INTA Survey, and considering the different originality standards, the analysis in Canada would be different and may well lead to a different result. Applying the Canadian test for originality, it is unsettled whether there could be sufficient "skill and judgment" exercised by the human author in generating the visual works that were compiled into the publication, and what level of involvement might be sufficient to shift the output from "unoriginal" (and thus not protected) to "original" (and thus protected). For example, it is unclear whether the human author's exercise of "skill and judgment" in designing the prompt(s) that produce the desired images could be sufficient to render the resulting images "original" (and thus protected), or whether the author's exercise of "skill and judgment" is limited to the prompt design and does not flow through to the AI output produced by executing that prompt.
It is likely, however, that in a case like "Zarya of the Dawn", the human author's compilation of text and AI-generated images would satisfy the Canadian test for originality, leading to a similar result with respect to protection of the work as a compilation as occurred before the USCO. Canadian jurisprudence on compilations has generally held that while the arranger of a compilation does not have copyright in the individual components, the arranger may have copyright in the form represented by the compilation: "It is not the several components that are the subject of the copyright, but the over-all arrangement of them which the plaintiff through his industry has produced": Robertson v. Thompson Corp., 2006 SCC 43 at para 36, citing Slumber-Magic Adjustable Bed Co. v. Sleep-King Adjustable Bed Co. (1984), 3 C.P.R. (3d) 81 (B.C.S.C.), at p. 84; see also Ladbroke (Football) Ltd. v. William Hill (Football) Ltd., [1964] 1 All E.R. 465 (H.L.), at p. 469.
Infringement and Liability regarding AI
N/A
Comments and Suggestions
INTA is available to discuss our comments in more detail. Thank you in advance for considering the views of INTA.
Internet Archive Canada
Technical Evidence
Internet Archive Canada is a not-for-profit digital library based at 330 West Pender in Vancouver, B.C.
For the past nineteen years, Internet Archive Canada has been building a Canadian digital library. We have partnered with organizations across the country–including the University of Toronto, the University of Ottawa, and Library and Archives Canada–to work towards this goal. This has included digitizing more than 650,000 books, government publications, and other works, a great many of which are focused on Canadian cultural heritage.
In recent years, Internet Archive Canada has substantially expanded its footprint and operations, reflecting our pride and confidence in the Canadian library system, our belief in the promise of Canadian innovation, and the opportunity for serving the public good inherent in Canada's balanced and user-focused copyright system. This has included expanding, alongside other organizations, into new physical facilities in Canada (including our home in Vancouver), growing our Canadian workforce, and building new partnerships right here at home.
Internet Archive Canada is committed to using the promise of new technologies to further access to information and benefit the public interest. For example, when we help digitize a book, custom machine learning models automatically suggest page boundaries of scanned materials–making our library processes more efficient and our collections more useful for our patrons. Sometimes, we use these experiences to aid the policymaking process–such as through this submission and our earlier one to the 2021 Consultation. And we try to make sure we are sharing our resources to help the broader community–for example, from November 15-17, 2023, we hosted in partnership with Simon Fraser University and others the 2023 Artificial Intelligence for Libraries, Archives, and Museums Annual Conference at our headquarters in Vancouver.
Internet Archive Canada submits this response to the Government's Consultation on Copyright in the Age of Artificial Intelligence because we care deeply about the future of access to knowledge in Canada and the Canadian library system–and, by extension, copyright law. Our comments are guided by two core principles. First, copyright rules and technological developments should advance the public's right and ability to access knowledge and culture. Second, libraries and other publicly-oriented institutions should be empowered to utilize new technologies to expand access to knowledge.
Text and Data Mining
1. Centering the Public's Interest in Access to Knowledge and Culture
"Copyright law has public interest goals."[1] As the Supreme Court of Canada has explained, "increasing public access to and dissemination of artistic and intellectual works, which enrich society and often provide users with the tools and inspiration to generate works of their own, is a primary goal of copyright."[2] So we were pleased to see the Consultation Paper acknowledge that copyright is at root "a balance between promoting public interest in the encouragement and dissemination of works of the arts and intellect and obtaining a just reward for the creator."[3]
Unfortunately, the Consultation Paper also states that, "[i]n considering possible copyright policy options relating to AI, the Government will aim to balance two main objectives:" to support innovation, and to support Canada's creative industries. While these are certainly important considerations, this balancing framework obscures the public's interest in access to knowledge and culture that is copyright's true objective. The "copyright bargain" is not one to be negotiated between the technology and creative industries; it is "a balance between promoting the public interest in the encouragement and dissemination of works of the arts and intellect and obtaining a just reward for the creator."[4] The relevant copyright balance is not between two competing economic or ministerial interests, but between the public and authors.
We therefore urge the Government, when considering new copyright rules for artificial intelligence or otherwise, to keep the public's interest in access to knowledge and culture at the center of copyright policy. Proposals to amend the Copyright Act to address AI should be evaluated by the impact such new regulations would have on the public's access to information, knowledge, and culture. In cases where proposals would have the effect of reducing public access, they should be rejected or balanced out with appropriate exceptions and limitations. Copyright law should enable, not restrict, the promise of new technology to expand the public's access to knowledge and culture.
---
[1] York University v. Canadian Copyright Licensing Agency, 2021 SCC 32 at para. 91.
[2] Id. at para. 92.
[3] See Théberge v. Galerie d'Art du Petit Champlain inc., [2002] 2 S.C.R. 336 at para. 30
[4] See CIPPIC, Copyright Law, https://cippic.ca/en/FAQ/copyright_law#faq_copyright-bargain
Authorship and Ownership of Works Generated by AI
N/A
Infringement and Liability regarding AI
2. Preserving a Flexible Legal Framework
In our view, a flexible legal framework is the best way to ensure copyright keeps the public's interest at its core while responding to new technological developments like artificial intelligence.
As is often the case with new technologies, artificial intelligence is a rapidly evolving, unpredictable field.[5] And while the consultation paper asks many of the right questions about artificial intelligence today, the answers to those questions—and the relevant questions themselves—may well change tomorrow. Indeed, research and progress in artificial intelligence are moving extremely rapidly.[6] In the circumstances, while legal certainty is no doubt important, the appropriate copyright framework for artificial intelligence must be flexible in order to ensure technological change does not disrupt the copyright balance to the detriment of the public's interest.
Fortunately, Canada is already a global leader in copyright due to its user-centered approach to fair dealing and the doctrine of technological neutrality. The consultation paper rightly recognizes the role fair dealing can play in enabling artificial intelligence work in Canada today. But as the paper also recognizes, the legal situation in Canada is more uncertain than it could be. In particular, increased clarity could be brought to the question of whether training artificial intelligence and machine learning models on copyrighted works gives rise to liability. This could be achieved in two ways.
First, the Government could reaffirm and strengthen fair dealing to provide for an open, fair-use-style approach. Many have called for adopting an explicit "such as" approach to fair dealing in the Copyright Act, making the fair dealing purposes in the Act illustrative.[7] Indeed, the Report of the Standing Committee on Industry, Science and Technology recommended just that in its June 2019 Statutory Review of the Copyright Act.[8] This approach doubles down on the flexibility of fair dealing and, in practice, can provide the certainty needed for technical innovation to thrive.[9]
Second, the Government could take steps directly targeted towards the question whether training artificial intelligence and machine learning models gives rise to copyright liability. To this end, a targeted AI exception could be added to the fair dealing framework.[10] That said, the requisite certainty might also be achieved with a flexible framework without a new targeted exception.
Any reforms must take account of actual marketplace conditions if they are to be efficacious. As it stands, many electronic resources offered to libraries and others are double-locked against text-and-data mining and other similar uses: first, by contractual license terms that purport to override the limitations and exceptions in the Copyright Act, and second, by technological enforcement of those terms through the use of technical protection measures. Thus, for any reform to respond to the actual conditions "on the ground," it should make clear that no exception to copyright can be overridden by contract, and that TPMs can be circumvented for such non-infringing uses.
---
[5] See, e.g., Canada's AI Research Ecosystem, available at https://radical.vc/2021-primer-canadas-ai-research-ecosystem.
[6] E.g., Mike Drolen & Megan Robinson, Global News, Artificial Intelligence: Canada's future of everything, available at https://globalnews.ca/news/10110598/artificial-intelligence-canada-future/.
[7] See Geist, Michael, "Fixing Fair Dealing for the Digital Age," available at https://www.michaelgeist.ca/2019/06/fixing-fair-dealing/
[8] Dan Ruimy et al, "Statutory Review of the Copyright Act," available at https://www.ourcommons.ca/Content/Committee/421/INDU/Reports/RP10537003/indurp16/indurp16-e.pdf
[9] See Corynne McSherry, Fair Use Economics: How Fair Use Makes Innovation Possible and Profitable, available at https://www.eff.org/deeplinks/2016/01/fair-use-economics-how-fair-use-makes-innovation-possible-and-profitable
[10] See, e.g., Canadian Federation of Library Associations and Canadian Association of Research Libraries, Brief to the Government of Canada: Consultation on a Modern Framework for Artificial Intelligence and the Internet of Things, available at https://www.carl-abrc.ca/wp-content/uploads/2021/09/CFLA-CARL-Brief-Artificial-Intelligence-and-the-Internet-of-Things.pdf.
Comments and Suggestions
3. The Need for Human Review
While a detailed examination of machine learning and artificial intelligence techniques is beyond the scope of this submission, it is important to note that it is sometimes necessary to manually review portions of datasets both before and after ingestion.[11] There is increasing understanding that bad datasets lead to bad results.[12] Recent research from Europe confirms that "[t]he fitness of a modern copyright system" vis-a-vis artificial intelligence must therefore be measured, in part, by whether it provides an ability to mitigate potential bad and discriminatory results by permitting "access to the original training data... to scrutinise [it] for mistakes, omissions or bias...."[13]
Humans utilizing large datasets—whether in connection with the machine learning and artificial intelligence technologies available today, or with the as-yet-undeveloped technologies of the future—must therefore be able to review the underlying materials as a part of their work. In our view, one way to do this that both provides the necessary flexibility, and is fully compliant with the Copyright Act today, is through a technique known as controlled digital lending.[14] Controlled digital lending is the digital equivalent of traditional library lending; it permits libraries to digitize books they own and lend them out to their patrons one at a time (just as they would with the physical book, and in place of it). It is a practice that has the support of many libraries, librarians, and others around the world in order to make materials available ethically and within a recognized legal framework.[15] And its potential application here underscores the necessity of flexible copyright exceptions to respond to technological change.
4. Conclusion
Internet Archive Canada greatly appreciates the government's careful consideration and open process with this consultation. Please do not hesitate to contact us if we can be of any further assistance.
---
[11] See, e.g., Margoni, Thomas, & Kretschmer, Martin. (2021). A deeper look into the EU Text and Data Mining exceptions: Harmonisation, data ownership, and the future of technology at 8-9. Zenodo. https://doi.org/10.5281/zenodo.5082012
[12] See, e.g., https://cdt.org/ai-machine-learning/
[13] See Margoni et al., supra.
[14] See Christina De Castell et al., Controlled Digital Lending of Library Books in Canada, available at https://doi.org/10.21083/partnership.v17i2.7100
[15] https://www.ifla.org/publications/ifla-statement-on-controlled-digital-lending/
Internet Society – Canada Chapter
Technical Evidence
CANADIAN COPYRIGHT LAW MUST BE CLARIFIED TO PROMOTE COMPETITION, INNOVATION, AND INVESTMENTS IN CANADIAN COMPUTATIONAL POWER
The organizations that fund, build and operate large-scale computing facilities are closely examining what the next generation of computing will be, and are sensibly examining the regulatory environment to evaluate long-term investments in computing capacity.
Canadians benefit from having computing and networking infrastructure built domestically. It creates jobs, boosts skill sets across the entire economy and creates greater capacity to access information.
Right now, AI is a significant driver for increasing computing capacity. AI requires specialized chips and new technologies that are built-for-purpose. Before making decisions about where to deploy expensive computing capacity, companies will look to where and whether they and their customers can use it in the location of deployment. Any changes to the law that will increase the legal or regulatory risk associated with the use of infrastructure will disincentivize investment in computing capacity in Canada, to the detriment of the socio-economic benefits for Canadians.
Simply put, investment in computing capacity will be influenced by the degree of clarity copyright law provides to potential investors. Canada has an opportunity to glean valuable insights from the experiences of its international partners—for example, jurisdictions where text and data mining (TDM) exceptions have succeeded in providing that clarity.
In Canada, a clear exception for TDM is crucial to encourage competition and innovation from smaller, Canadian players. If copyright law requires licensing of the internet—or the corners of it originating in Canada—before an LLM can be trained, then only the biggest players with the deepest pockets and hoards of proprietary data may be able to innovate. This would be a massive barrier to entry for an upstart company and would prevent Canadian companies with Canada's best interests at heart from competing and innovating.
Text and Data Mining
COPYRIGHT AND THE CREATION OF LARGE LANGUAGE MODELS
Any discussion about copyright and large language models (LLMs) needs to start with a common understanding of how LLMs work. The discussion should also distinguish between the "inputs" of an LLM and the "outputs".
To create an LLM, a large data set is ingested, and the words—or portions of words—are analyzed in terms of their probabilistic distribution. Software examines the data set and encodes what words it contains and what words follow other words, evaluating the context in which particular words appear. The result is a massive table of numerical tokens that represent words, their frequency, and their tendency to appear together. Other patterns in the appearance of words may also be mapped. This creates a word-prediction model, not a data set of the works examined or a copy of the training data itself. Thus, any "copying" of the training data is incidental, ephemeral, and temporary. The creation of an LLM should not trigger any of the exclusive rights of a holder of copyright under s.3(1) of the Copyright Act.
The output of an LLM depends on the model itself and the prompts given to the system by the user. One cannot enter a citation of an article or any other work and ask for a copy of it. The database would not contain the work. ISCC believes that, at present, the Copyright Act can effectively address the outputs and requires no amendment in this regard.
In the event that a clearly constructed prompt results in an output that appears to be a word-for-word copy of an existing text, it is due to the probability that words exist in this particular sequence—not because the original text is stored in the LLM. For example, if you entered the following prompt into an LLM, "finish the sentence: it was the best of times, it was …" the result would probably be "it was the best of times, it was the worst of times," because of the number of times this phrase has been repeated on the internet, not because the LLM contained a copy of "A Tale of Two Cities."
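The distinction the submission draws—statistics about word sequences rather than stored text—can be illustrated with a deliberately simplified sketch. The following is a toy n-gram predictor, not a real LLM (actual models use neural networks over sub-word tokens, not literal lookup tables); the corpus, function names, and context length of four words are all illustrative assumptions chosen to mirror the "Tale of Two Cities" example:

```python
from collections import Counter, defaultdict

# Toy corpus in which the famous phrase recurs, as it does on the internet.
corpus = ("it was the best of times , it was the worst of times . " * 3).split()

# "Training": count which word follows each four-word context.
# The result is a frequency table of word statistics, not a copy of the text.
counts = defaultdict(Counter)
for *ctx, nxt in zip(corpus, corpus[1:], corpus[2:], corpus[3:], corpus[4:]):
    counts[tuple(ctx)][nxt] += 1

def generate(prompt, n):
    """Extend the prompt by n words, always picking the most probable next word."""
    words = list(prompt)
    for _ in range(n):
        words.append(counts[tuple(words[-4:])].most_common(1)[0][0])
    return words

# The model completes the sentence because that continuation is the most
# probable one in its statistics, not because it stored the source text.
print(" ".join(generate("it was the best of times , it was".split(), 5)))
```

Even this trivial table "finishes the sentence" correctly, which is the submission's point: reproducing a familiar phrase can be the output of probability, not retrieval of a retained copy.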
Understanding the difference between inputs and outputs is imperative, as any radical change to the Copyright Act made because an LLM appears to be copying an artistic work would be misplaced. The use of a copyrighted work is not prima facie infringing. Canadian copyright law has never restricted or prohibited learning from a copyrighted work or describing a copyrighted work. Creating a new author's right out of thin air is not consistent with the existing statute or the case law with respect to fair dealing.
To the extent clarity is desirable, Canada should follow the leads of Japan, South Korea and Israel, all of whom have clarified that the input, the training of LLMs, is not a violation of their copyright laws. The following clarification to the Act is recommended:
29.23.1 It is not an infringement of copyright for a person to use a work or multiple works for the purpose of information analysis, including the comparison, classification or other analysis of information pertaining to language, sound, images or other elements constituting information extracted from a work, including the creation of systems and databases to support an artificial intelligence system.
Authorship and Ownership of Works Generated by AI
The output of an LLM depends on the model itself and the prompts given to the system by the user. One cannot enter a citation of an article or any other work and ask for a copy of it. The database would not contain the work. ISCC believes that, at present, the Copyright Act can effectively address the outputs and requires no amendment in this regard.
In the event that a clearly constructed prompt results in an output that appears to be a word-for-word copy of an existing text, it is due to the probability that words exist in this particular sequence—not because the original text is stored in the LLM. For example, if you entered the following prompt into an LLM, "finish the sentence: it was the best of times, it was …" the result would probably be "it was the best of times, it was the worst of times," because of the number of times this phrase has been repeated on the internet, not because the LLM contained a copy of "A Tale of Two Cities."
Understanding the difference between inputs and outputs is imperative, as any radical change to the Copyright Act made because an LLM appears to be copying an artistic work would be misplaced. The use of a copyrighted work is not prima facie infringing. Canadian copyright law has never restricted or prohibited learning from a copyrighted work or describing a copyrighted work. Creating a new author's right out of thin air is not consistent with the existing statute or the case law with respect to fair dealing.
PROTECTING CANADIAN CULTURE, LANGUAGE AND VALUES THROUGH CANADIAN INPUTS TO LLMS
While we are at the beginning of the AI revolution and just starting to see the technology's utility, it is clear that AI is an important way that people seek answers to questions. Thirty years ago, people went to a library. Currently, we use internet search engines. Those search engines are becoming enhanced by AI to understand our questions and match them to appropriate search results. The next step will be the use of AI to match a particular question to an appropriate answer.
In effect, AI will shape our understanding of the world. It is of utmost importance that Canadian data, language, values and culture are reflected in the information-seeking and world-shaping activities powered by LLMs and other generative AI technologies.
If barriers are created that effectively discourage or prohibit the use of Canadian works to build LLMs, the result is that Canadian data will be excluded. These LLMs, therefore, would not be able to provide answers or results that fully reflect Canada. They may produce answers "about Canada," but only from data originating outside of Canada—about us, not by us. LLMs used in Canada, by Canadians, should include data that is relevant and appropriate for Canadians. To do so otherwise would be actively harmful.
Canadian cultural policy generally rests on the concern that Canadian culture and its cultural products may be overwhelmed by those from the United States; moreover, that Canada's bilingual and multicultural legacy may be diluted. Any scenario in which there is mandatory licensing or other restrictions on the use of Canadian content in the creation of LLMs will obscure results about Canada to Canadians and the rest of the world. This would be a bad outcome.
Infringement and Liability regarding AI
See submissions related to TDM.
Comments and Suggestions
Submission to Innovation, Science and Economic Development Canada re: Copyright in the Age of Generative Artificial Intelligence
By Internet Society Canada Chapter
January 15th, 2024
EXECUTIVE SUMMARY
The Internet Society Canada Chapter (ISCC) welcomes the opportunity to provide insight on how to leverage amendments to the Copyright Act to ensure Canadians can harness the full benefits of generative artificial intelligence (AI) and other AI technologies.
ISCC agrees with the objectives of the Copyright Act and believes that it must facilitate the creation of a positive environment for investment in AI development in Canada. Our laws should ensure that Canadian works are reflected—not excluded—from global AI systems and that Canadian creators have the means to protect their works from copyright infringement due to AI-generated outputs in appropriate cases. As AI further influences how information is gathered online, and therefore the knowledge that shapes worldviews, Canadian data, language, values and culture must be included.
More specifically, the ISCC submits that:
Recognition of the difference between the "inputs" and "outputs" of Large Language Models (LLMs) is imperative;
A modern copyright framework for generative AI must ensure that Canadian data, language and values are reflected in the inputs to LLMs to protect Canadian culture and reflect it in world-shaping AI technologies;
Canadian copyright law must be clarified to promote competition, innovation, and investments in Canadian computational power.
ABOUT THE INTERNET SOCIETY CANADA CHAPTER
The Internet Society Canada Chapter (ISCC) is a member-based not-for-profit that advocates for affordable, fair and secure internet access for all Canadians. ISCC engages on legal and policy issues to promote an open internet. Our focus is to bridge the digital divide along all axes to ensure that Canadians reap the socio-economic benefits the internet provides.
We provide Canadians with a proactive voice on all internet issues through various committees, roundtable discussions, conferences and membership meetups, where leaders and experts from governments, the private sector, civil society, academia, the technical community and end-users can discuss key issues, identify common solutions and share resources.
INTRODUCTION
As stated in the consultation paper, the Copyright Act aims "to promote the creation and distribution of content, to foster investment and job creation, promote just rewards for creators, and to create a thriving marketplace that offers consumers choice and access to diverse content." These are all objectives that ISCC agrees with.
While Canada is a global leader in fundamental AI R&D, it continues to be a laggard with regard to commercialization and adoption. This is an issue central to Canada's ability to compete in the global digital economy. It is also critically important that Canadians have the skills needed to commercialize and scale adoption of AI-driven technology. If Canada and Canadians are to reap the economic, innovation and cultural benefits of rapidly evolving AI technologies, our copyright law must facilitate the creation of a positive environment for domestic AI investment, development and commercialization.
Our laws should ensure that Canadian works are not excluded from global AI systems such that they become inaccessible and irretrievable as an unintended consequence of overly restrictive Canadian copyright law.
Canadian copyright law should ensure that Canadian creators have the means to protect their works from copyright infringement due to AI generated outputs in appropriate cases.
ISCC is concerned that amendments to the Copyright Act could effectively place a toll or limit on the ability to use Canadian content to develop and train large language models (LLMs), ultimately harming Canadian consumers and Canadian creators. Copyright law reflects a careful balance between the rights of creators and the rights of users of creative content. Any recalibration of this balance needs to be very carefully considered to avoid unforeseen or unintended consequences.
Canada has generally only made changes to its copyright laws in harmony with our main trading partners. A change to our copyright laws that is out-of-step with the international community will inevitably create barriers and disadvantage Canadians. A mandatory licensing scheme will result in Canadian content being excluded from learning models, and thus not available for Canadians to use.
M
Faith O. Majekolagbe
Technical Evidence
N/A
Text and Data Mining
In addressing the seeming tensions between copyright and text and data mining (TDM) or copyright and artificial intelligence (AI), the Government of Canada should give careful consideration to the existing suite of exceptions and limitations in the Copyright Act, in particular the fair dealing exception. This is to avoid inadvertently shrinking the scope of user rights available to the public in Canada. I recommend examining the extent to which the fair dealing exception and other exceptions in the Copyright Act already permit certain TDM activities, including the use of copyrighted materials for AI training, bearing in mind that the Supreme Court of Canada in a plethora of cases has held that the exceptions must be given large and liberal interpretations (CCH v Law Society of Upper Canada, 2004; Alberta (Education) v Access Copyright, 2012). For example, it is hard to see how the current fair dealing provision in section 29 of the Copyright Act cannot be relied on by researchers and institutions that support research (such as libraries, archives, museums, and educational institutions) to copy, extract, or otherwise use copyrighted works in any way for TDM activities for research purposes.
What is needed in circumstances in which the current framework of copyright law in Canada can be rightly interpreted as exempting certain TDM activities from copyright control, is a clarification by the Government that these activities are indeed covered under the Copyright Act. This would provide certainty to the beneficiaries of the existing exceptions, including but not limited to researchers and institutions that support their research activities, and promote socially beneficial TDM activities that rely on existing copyrighted materials. In addition to a clarification note, it is important to prohibit the use of contractual and technological overrides that undermine existing copyright exceptions and serve as an obstacle to permissible TDM activities. Where libraries have obtained lawful access to copyrighted materials for research purposes, they should not have to negotiate new licences where the needed research purpose involves text and data mining.
In circumstances where the current suite of exceptions cannot be interpreted as permitting a TDM activity, we should consider the nature of TDM activities and whether it is feasible to expect copyright clearances for the vast range of works involved, and whether data mining or the machine learning process that is integral to AI training is different in substance from other acts that we perform in relation to copyrighted works outside the TDM or AI environment. Scholars have argued that mining or machine learning is not in essence different from the act of reading a copyrighted material (Flynn et al, 2020; Sag, 2019). Since the act of reading or learning from a copyrighted material does not require a licence, the argument is that mining or machine learning, which is akin to human reading or learning, should not either (Sag, 2019). This is especially so where the copyrighted materials have been lawfully accessed, whether through purchase, a use-licence, or outside paywalls on the internet. The mere fact that the works have to be reproduced and sometimes adapted into a machine-readable format should not in itself change the nature of the activity, since TDM and machine learning as computational processes are impossible without technological reproduction.
Furthermore, TDM and machine learning are arguably non-expressive uses of copyrighted materials since they often involve taking and using the ideas, patterns, data, methods, etc., underlying the expressions in a work. These ideas, patterns, data, and methods are unprotected and recyclable elements of copyrighted works (Majekolagbe, 2024). It has never been the case that copyright users who have lawful access to copyrighted materials cannot use the ideas, patterns, and information in those materials to generate new ideas, information, and other outputs, in so far as the resulting output is not a whole or substantial copy of the copyrighted works used in the research and creative process. At the heart of copyright law is protection against the communication of an author's original expression to the public without the copyright owner's consent; copyright law does not seek to restrict the communication of the ideas, patterns or facts expressed in the work to the public (Sag, 2019). As such, acts like mining and machine learning that do not communicate the author's original expressions to the public should not be regarded as infringing, and clear exceptions to that effect should be contained in our copyright law. To regard these non-expressive acts as infringing is to undermine the longstanding idea-expression dichotomy in copyright law, which has served to protect the private interests of rightsholders while also securing the interests of the public in using and enjoying copyrighted works. As Sag rightly notes, 'The idea-expression distinction is essential for any copyright system that aims to foster a vibrant and creative intellectual ecosystem.' (Sag, 2019) Permitting TDM and machine learning (non-expressive uses of copyrighted works) without the prior authorization of copyright owners is consistent with the core structure of copyright law in Canada and elsewhere.
The approach to TDM and machine learning activities in Japanese copyright law could provide useful insights in the consideration of this issue in Canada. Japan is known as a "paradise for machine learning" because of the liberal approach it took to addressing the issue, an approach that is consistent with the core structure of copyright law in Canada; the Singaporean approach is also worth considering. The European Union approach could be considered as well, bearing in mind the need to promote innovation and reduce obstacles to the development of AI technologies in Canada, especially socially beneficial technologies. To give more efficacy to any exceptions we may develop in response to TDM and machine learning activities, we should prohibit the use of contractual and technological overrides that might undermine those exceptions.
It should be pointed out that the UK approach to TDM is an example of what not to do in Canada because it limits the scope of permissible text and data mining activities to non-commercial research, whereas our existing law permits fair dealing for both commercial and non-commercial research (Copyright Act, s. 29; SOCAN v Bell Canada, 2012; CCH v Law Society of Upper Canada, 2004).
Lastly, permitting the use of copyrighted works for TDM purposes and AI training should be addressed as a separate and distinct issue from the outputs of generative AI tools. Permitting AI developers or TDM users to use copyrighted works for TDM activities and AI training should not be interpreted as permitting them to produce or generate outputs that are similar or substantially similar to the works used in the TDM activities or AI training process. This difference should inform our approach to law and policy reform.
On the whole, Canada's approach and response to the pressing issue of "unauthorised" use of copyrighted materials for TDM activities and as part of the machine learning process for the development of AI tools should be carefully considered in light of the need to balance the interests of copyright owners in obtaining a just reward for their labour and those of users in disseminating and using copyrighted works for socially beneficial purposes.
Authorship and Ownership of Works Generated by AI
In responding to the issue of the authorship or ownership of AI-assisted and AI-generated works, the Government should be mindful of the fact that humans have long used technologies to create works and other materials that have been the subject of copyright protection in Canada and elsewhere. While generative artificial intelligence tools are novel technologies, they are nonetheless tools that are used in creating works. For example, human beings use the technology of the camera to capture images rather than paint on a canvas or carve a sculpture, and we grant copyright protection over those images in the same way we grant copyright over paintings and sculptures. There is therefore no reason why the current framework for the grant of copyright protection should not suffice to determine when one gets copyright over an AI-assisted or AI-generated work, especially given that these works are generated based on human prompts.
There may, however, be policy reasons why these "original" works should be given a term of protection shorter than the life of the author plus 70 years, in which case our copyright regime would need to be modified accordingly. For instance, some jurisdictions give a shorter term of copyright to photographs, perhaps because of the large role that technology plays in creating photographs and the consequent ease of creating them. This approach to photographs could be adopted for AI-assisted or AI-generated works, given the large role that AI technology plays in creating such works and the consequent relative ease of creating them.
Infringement and Liability regarding AI
There is no concern regarding the existing legal tests for demonstrating that an AI-generated work infringes copyright. There is no reason why the same legal tests for demonstrating that a produced work infringes copyright should not be applied in the context of AI-generated works. This should provide relief to copyright owners that their expressions remain protected under copyright law and that where AI-generated works that have been trained on their copyrighted works include complete reproductions or reproductions of a substantial part of their works, they have recourse under the existing legal framework for copyright protection in Canada. To strengthen the effectiveness of legal remedies, there is however a need to clarify where liability lies when AI-generated works infringe existing copyrighted works.
Comments and Suggestions
N/A
Fenwick McKelvey
Technical Evidence
I am a professor at a non-profit public sector institution. Most recently, I have had to pay ProQuest for the right to data mine newspapers and other articles for not-for-profit research as many data brokers are focusing on commercial markets.
Text and Data Mining
I am expressly concerned that commercial, for-profit research and the training of AI models will undermine my fair dealing rights and the public interest function of academic research. I am opposed to expanding fair dealing exemptions for training commercial, for-profit AI models, usually built by large corporations or start-ups. I see no need to modify our fair dealing exemptions and oppose broad TDM exemptions like Japan's. The fair dealing exemptions focus on uses, not technologies, and should cover the appropriate exceptions for TDM. If anything, better protection of fair dealing needs to be developed to prevent situations such as: https://waxy.org/2022/09/ai-data-laundering-how-academic-and-nonprofit-researchers-shield-tech-companies-from-accountability/
Copyright already seems well suited to deal with infringing uses if a user publishes infringing content from a system. My key concern is training data and its negative effects on publication and public data. Information commons, a concept I wish had been discussed further in the Discussion Paper, need to be protected from commercialisation, lest it undermine public access to knowledge and erode the public internet.
Authorship and Ownership of Works Generated by AI
Ownership of AI output is a major issue, especially in the classroom, where it is unclear who owns primarily AI-generated works. The lack of clear ownership raises issues about intellectual honesty.
Any works generated through models trained off public data should be entered into the public domain without copyright protection.
Infringement and Liability regarding AI
N/A
Comments and Suggestions
I would appreciate greater consideration of commons-based approaches to AI and machine learning, see: https://machineagencies.milieux.ca/ai-commons/
Microsoft Corp
Technical Evidence
Access to data is critical for AI development. Recent groundbreaking advancements in AI require the ability to train AI models on vast amounts of training material. Large amounts of varied data are essential for AI models to perform accurately and without bias. For example, varied data sets enable diverse languages to be adopted and expressed in AI systems, allow varied cultural perspectives to be reflected, and enable AI to more broadly benefit everyone. Businesses, as well as open source AI projects, rely heavily on publicly accessible data to both develop and use AI.
There is a shortfall in the data that companies and organizations have access to in order to benefit from the promise of AI. Despite having access to publicly available data, many organizations are still unable to access the data they need to be able to develop or benefit fully from AI. More open access to data is needed to help organizations of all types take advantage of AI. Open approaches which encourage data sharing can alleviate this problem. For example, governments are now taking steps to increase access to data: [https://eur-lex.europa.eu/eli/dir/2019/1024; https://ec.europa.eu/commission/presscorner/detail/en/ip_20_2102]
Businesses are also recognizing that more should be done to enable greater access to data to drive societal and economic benefits: [https://www.industrydataforsociety.com/]
Unless governments, businesses and civil society adopt an approach that is more focused on data sharing by default, only a limited number of organizations will have access to the data they need to be successful. Measures to prevent the use of publicly available data for AI training and analysis, for example by preventing TDM on publicly available data online, would greatly exacerbate this problem and leave the benefits of AI to the handful of organizations that have the means to acquire data.
In general, training materials can be collected directly by AI developers from public sources, obtained from third parties who collect data and make it available for AI development, acquired through direct agreements with data holders, or generated in house. For example, projects such as the Common Crawl and The Pile play an essential role in making data from publicly accessible sources available for training and using AI, particularly for smaller organizations and researchers that do not have the means to generate or collect massive and varied data themselves. AI developers may also negotiate data acquisition and sharing arrangements directly with data owners for access to proprietary or specialized data.
There is no single approach to collecting and preparing data for training AI models. Different approaches will depend on the model, constraints of the development environment and the intended use of the model. The need to preprocess and curate the data for training continues to change as approaches to technology develop.
Methods for developing AI systems constantly evolve and change as technology develops, but in general, AI models learn by identifying patterns, correlations and concepts across the training data. This process enables new insights from patterns to be gleaned that could otherwise take a lifetime to uncover. A large language model is a highly complex algorithm with billions of parameters. Since AI models are algorithmic functions that read numbers, not text, words are transformed into “tokens” that are represented as numerical vectors. These vectors are generated to represent not just words but information about the semantic and contextual meaning of the words and their relationships to other words in the vocabulary. This enables the model to correlate relationships between words.
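The tokenization step described above can be sketched in a few lines. This is a toy illustration only: the vocabulary, the three-dimensional vectors, and the function names are invented for the example and do not reflect any real model's tokenizer or embedding table, which would hold tens of thousands of tokens and vectors with thousands of dimensions.

```python
# Toy sketch of tokenization: words become integer token ids, and each
# id maps to a numerical vector (an "embedding"). Real models learn
# these vectors so they capture semantic and contextual relationships.

VOCAB = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

# Toy 3-dimensional embedding table: one vector per token id.
EMBEDDINGS = [
    [0.1, 0.0, 0.2],   # "the"
    [0.7, 0.3, 0.1],   # "cat"
    [0.2, 0.9, 0.4],   # "sat"
    [0.0, 0.0, 0.0],   # "<unk>" for out-of-vocabulary words
]

def tokenize(text):
    """Map each word to its integer token id (unknown words -> <unk>)."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

def embed(token_ids):
    """Look up the numerical vector for each token id."""
    return [EMBEDDINGS[i] for i in token_ids]

ids = tokenize("the cat sat")      # [0, 1, 2]
vectors = embed(ids)               # three 3-dimensional vectors
```

From this point on, the model operates only on the numerical vectors; the original text plays no further role in the computation.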
During training, algorithms are trained so that they get better at performing a particular task. When training a large language model (LLM), training may involve improving the model’s ability to predict missing words from sentences it has never seen, based on concepts that it learns. The model stores what it has learned by updating the parameters of the function, referred to as “weights”. To do this, the tokenized training data is read by the model. In an example of self-supervised learning, the data will have some tokens masked, i.e., blanked out. This enables the model to predict the missing tokens, and then remove the mask to determine if it predicted the tokens correctly. At the start of the training process the weights may be set randomly and the initial predictions may work poorly. But the model weights will be updated depending on how accurately the model predicts the blanked-out tokens. As more tokens are seen by the model, the model will continue to learn by updating the weights reflecting patterns and trends which relate to underlying concepts in the training data. It is these patterns and trends that relate to concepts that are stored in the model, not the training data.
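The loop described above (mask a token, predict it, and update the weights based on how the prediction compared to the true token) can be sketched with a toy model. The bigram weight table and the additive update rule here are illustrative stand-ins for a real network and gradient descent; nothing in this sketch reflects an actual LLM implementation.

```python
import random

# Toy sketch of self-supervised training: mask a token, have the model
# predict it, and nudge the weights toward the correct answer.
# The "model" is a bigram weight table: weights[prev][tok] scores how
# likely token `tok` is to follow token `prev`.

VOCAB_SIZE = 4  # toy vocabulary of token ids 0..3

random.seed(0)
# As described above, weights start out random, so initial predictions
# work poorly.
weights = [[random.random() for _ in range(VOCAB_SIZE)]
           for _ in range(VOCAB_SIZE)]

def predict(prev_token):
    """Predict the masked token as the highest-weighted follower of prev."""
    row = weights[prev_token]
    return row.index(max(row))

def train_step(prev_token, target_token, lr=0.5):
    """Reward the correct answer (a crude stand-in for updating weights
    by gradient descent on a prediction loss)."""
    weights[prev_token][target_token] += lr

# Toy training corpus: sequences of token ids.
corpus = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
for sequence in corpus:
    for prev, target in zip(sequence, sequence[1:]):
        train_step(prev, target)  # `target` plays the masked token

# After training, the model predicts followers from the patterns stored
# in its weights; the training sequences themselves are not consulted.
```

The point the sketch makes concrete is the one in the text: what the training process stores is the updated weights, not the training data.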
Once the model has been trained using the data, the model does not use the original training materials or the tokenized training data to perform its tasks. Instead, the trained model makes predictions to guess outputs based on the patterns and concepts it has learned. When the trained model is used, it does not "copy" or "look up" data from a database; it is not recalling and outputting text from a webpage that was contained in the training materials. Fundamentally, these models are tools that analyze data to understand patterns so that they can guess outputs. At their core, AI models are extremely advanced and complex statistical models.
To address the concerns of rightsholders, AI developers may take measures to mitigate the risk of AI tools being misused for copyright infringement. Microsoft incorporates measures and safeguards to mitigate potentially harmful uses across our AI tools. These measures include the use of meta-prompts and classifiers, which are controls that add additional instructions to a user prompt to limit harmful or infringing outputs. For example, Bing Chat will decline to provide song lyrics or provide extracts from books if requested.
Microsoft continues to improve current mitigations and implement new ones in response to our learnings, and encourages rightsholders to help us think through effective industry best practices. GitHub’s recently announced reference feature was developed with engagement and feedback from the developer community. The feature enables developers to choose whether to block code that matches code in public repositories or to allow the code suggestions along with information about the matching public code on GitHub, further placing developers in the driver’s seat when using these tools: [https://github.blog/2023-08-03-introducing-code-referencing-for-github-copilot/] In the rare case where a GitHub Copilot suggestion matches public code, developers can see a list of repositories where that code appears and their licenses, so that they can choose whether to use the code and whether attribution is needed.
Microsoft has also committed to indemnify and defend customers of our commercial Copilot offerings if a third party sues them for using Microsoft’s commercial Copilot offerings or the output generated by these tools, provided that the customer has used the guardrails built into the products:
[https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/]
This Copilot Copyright Commitment reflects Microsoft’s commitment to building responsible, AI-powered products and tools that limit the risk of infringing outputs. It also provides a strong incentive for Microsoft customers to use the guardrails that Microsoft has included in its products to mitigate the risk of copyright infringement. This program helps Microsoft educate users on appropriate uses of AI technology and reinforce how users can respect intellectual property rights.
Microsoft has also introduced new options for webmasters to control use of their web content in responses provided from Bing Chat. Using this feature, content owners can control the results that are identified in search by preventing their content from being provided through the chat interface. This change came from collaboration with rightsholder communities. And Microsoft has offered the ability for living artists to request that their name not be used to generate prompts in Bing Image Creator.
These steps are not required by copyright law, but Microsoft is committed to listening to the concerns of artists and creators and looking for ways to address potential concerns that arise from the use of generative AI.
The degree of involvement of humans is significant in the overall development of AI systems but varies across each stage. In relation to the training of AI models and the handling of training data, recent advancements have shifted the degree of involvement, and the training process is expected to continue to evolve as research and the technology further develop. For example, until recently, for some machine learning methods, it has been necessary to curate and label specific types of data, making these methods labour and resource intensive. Self-supervised methods of machine learning, which may precede other methods like reinforcement learning from human feedback (RLHF), do not require a human to label the data; as a result, self-supervised methods have vastly increased the scale of data that machine learning methods are able to read, giving rise to the enhanced performance that we are seeing now. Developers of large-scale AI models trained with self-supervised methods therefore optimize for the quantity of data, since the more data available to train a model, the better its performance. Narrower data sets and human involvement enable RLHF, and humans are also involved in efforts to develop platform services and applications that leverage models and in the process of mapping, measuring, and managing risks with models and systems.
Text and Data Mining
There is ongoing debate regarding how copyright can be used to prevent the development or use of AI systems when that development or use involves the analysis of a work protected by copyright. This uncertainty discourages the development and use of AI in Canada, and creates uncertainty among rightsholders regarding how their works may be used, particularly with respect to the works they choose to make freely available online. More clarity could be created by issuing guidance explaining that TDM is not a copyright infringement, pointing to existing legislation that permits TDM such as the exceptions in the Copyright Act for fair dealing and temporary reproductions for technological processes. More information could be made available to rightsholders to explain ways in which rightsholders can control the use of their work, for example by technically restricting access to their works. Clarity would be best achieved through legislative changes to copyright law to introduce an explicit copyright exception for TDM such as that introduced in Japan and elsewhere.
TDM activities, such as AI development and use, are more likely to take place in jurisdictions where there is legal clarity that such acts are permitted. Nevertheless, due to the broad nature of TDM to capture other types of data analysis beyond AI training, it is highly likely that TDM activities are taking place in Canada, particularly in relation to data analytics more generally and in relation to the use of AI to analyse large data sets.
Performing TDM does not require a copyright license because performing TDM is not a copyright infringement. It is not a copyright infringement to analyse works and learn concepts and facts. Intermediate copies that may be created in the technical process in order for the machine to analyse the work would be permitted under existing exceptions in Canadian copyright law. Confusion over this point can create challenges for rightsholders when deciding how to build business models for works that they may want to monetize. While TDM is not a copyright infringement that requires licensing, there are other opportunities to build revenue models around data services. New markets and business models for data providers can flourish as a result of AI development, without the need to place prohibitions on permissionless uses, such as reading and training. For example, copyright owners can enable convenient methods of access that eliminate many of the hurdles and expenses associated with large-scale crawling or mass digitization. Owners can include non-public metadata and other materials that make these works more useful than the same content crawled online. While licenses are not required to train AI, rightsholders may choose to provide licenses in relation to the output of a model to enable users to confidently include the rightsholder's work in outputs that would otherwise be a copyright infringement.
If the Government were to amend the Copyright Act to further clarify that the scope of copyright protection does not prevent TDM, Canadian copyright legislation would benefit from a clear TDM exception for both commercial and non-commercial purposes, to address the potential lack of clarity mentioned previously. This will encourage AI development and use in Canada and create more investment in this sector.
Microsoft is committed to transparency to ensure that AI systems are understandable and that their capabilities and intended uses are clear. However, there should not be a requirement for AI developers to keep records of, or disclose, what copyright-protected content was used in the training of AI systems. Training large-scale AI models, often referred to as foundation models, involves analysing vast volumes of data, often from publicly available sources. It would not be feasible to record such information, and any such requirement would inhibit AI development.
Microsoft is committed to an open dialogue with the creative and publishing industries to ensure that we support the ecosystem of content creators in a changing digital environment. The creator industry is, and will continue to be, at the heart of our culture, shaping our identity and values. This is why AI should remain human centric and is best used as a tool to support human creativity. AI will play an important role in creating value and providing benefit across all sectors and industries, including creator industries. AI will enable greater human expression and creativity, empowering even more people to create like never before. It is important to keep in mind that it is not a copyright infringement to learn from copyright-protected works, and the use of AI to read and learn should not require compensation. However, certain applications of AI may generate outputs that impact existing business models of artists, and we should be discussing how to support artists in those situations.
Authorship and Ownership of Works Generated by AI
It is important that human authors be able to secure copyright protection in their works regardless of what types of tools they use in their creative process, whether more traditional tools such as cameras and filters, or more technically advanced tools such as computer aided design software or generative AI. Often, an author will use substantial creativity and exercise skill and judgment to instruct the AI tool to produce the desired result. For decades, authors have used both human and technical assistants to create their works, particularly for large works such as architectural designs or massive murals, but the use of those assistants and tools has never blocked authors from obtaining copyright protection for their original works. The author still controls the creative process and decides on the finished creation.
Consider, for example, software developers that are using generative AI to assist in the generation of code. GitHub Copilot is behind an increasing percentage of lines of code written by developers using the tool, 46% early this year, and predicted to increase to 80% in the coming years: [https://github.blog/2023-02-14-github-copilot-now-has-a-better-ai-model-and-new-capabilities] However, the developer is in control of the entire development process: the structure of the program, how they prompt Copilot for suggestions, how they accept, iterate on, or edit suggestions. And in some ways, developers are in even more control of the work they are creating when using Copilot because they remain focused on the creative process rather than searching for documentation and examples. For this reason, the current guidance from the U.S. Copyright Office to set a threshold level of creativity, and only claim the human contributions, is not feasible to follow and should not be adopted in Canada.
As a policy matter, disallowing copyright protection for works created with the assistance of AI tools risks chilling the adoption of this new technology. Individuals who use these tools as part of their creative process will need certainty that the works they generate will be eligible for copyright protection. Without such assurances, the commercial viability of the works made using AI tools is undermined, which will significantly impact the interests of creators who use these tools. The adoption of these tools will also be impacted, which in turn will discourage the development of these tools, which could ultimately negatively impact the competitive advantage of Canada in the AI space.
No legislative changes are necessary regarding copyright ownership and authorship in light of AI assisted works. AI is a tool like any other tool that artists can use to help them create new and innovative works. Existing copyright law does not prevent a human artist from being an author of a copyright work when AI is used. The AI system should not be considered an artist or author and this is consistent with existing legislation.
Infringement and Liability regarding AI
Canadian copyright law provides rightsholders robust protection against the use of AI to create works that are a substantial taking of a copyrighted work without permission. The existing legal tests in Canada are sufficient.
A rightsholder should not need to know if a work has been accessed by an AI model. This is particularly relevant as technology evolves to increase the effectiveness and efficiency of training. It is conceivable that data will not need to be stored in order to train. Data can already be streamed for training purposes, and it is more energy efficient for algorithms to be close to the data they are trained on. It is possible that algorithms could move across publicly available data in a similar way to how federated learning algorithms are dispersed. Keeping track of all data that has ever been read by an algorithm, which may include real-time sensor data and transient data sources, may add significant processing costs and energy consumption demands, and compromise possible advancements in this area.
One reasonable line of inquiry for rightsholders would be in relation to the circumstances leading to an output of AI that infringes copyright, i.e., that it includes a substantial taking of a copyrighted work without permission. In this case, other factors will be more relevant than the existence of the work in the data that was used for training, such as the way in which the AI was used, including the construction of the prompt which may include the work of the copyright owner. Such an inquiry may include the analysis of confidential information belonging to the user of the AI and should be left to the courts to consider. AI developers may assist in determining whether AI outputs are similar to the original work of a rightsholder, rather than being commonly presented patterns in different sources, by using tools such as code referencing used in GitHub Copilot: [https://github.blog/2023-08-03-introducing-code-referencing-for-github-copilot/]
To address the concerns of rightsholders, AI developers may take measures to mitigate the risk of AI tools being misused for copyright infringement. As previously stated, Microsoft incorporates many measures and safeguards to mitigate potentially harmful uses across our AI tools. These measures include metaprompts and classifiers, controls that add additional instructions to a user prompt to limit harmful or infringing outputs. For example, Bing Chat will decline to provide song lyrics or provide extracts from books that are available online.
Microsoft has also introduced new options for webmasters to control use of their web content in responses provided from Bing Chat. Using this feature, results that are identified in search can be blocked from being provided through the chat interface. This change came from collaboration with rightsholder communities. And Microsoft has offered the ability for living artists to request that their name not be used to generate prompts: [https://www.bing.com/images/create/help?FORM=GENHLP] These steps are not required by copyright law, but Microsoft is committed to listening to the concerns of artists and creators and looking for ways to address potential concerns that arise from the use of generative AI.
Microsoft continues to improve current mitigations and implement new ones in response to our learnings and encourages rightsholders to help us think through effective industry best practices. GitHub’s recently announced reference feature was developed with engagement and feedback from the developer community. It lets developers choose whether to block code that matches code in public repositories or allow the code suggestions with information about the matching public code on GitHub, further placing developers in the driver’s seat when using these tools.
Microsoft has also committed to indemnify and defend customers of our commercial Copilot offerings if a third party sues them for using Microsoft’s commercial Copilot offerings or the output generated by these tools, provided that the customer has used the guardrails built into the products: [https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/] This Copilot Copyright Commitment reflects Microsoft’s commitment to building responsible, AI-powered products and tools that limit the risk of infringing outputs. It also provides a strong incentive for Microsoft customers to adopt responsible practices to mitigate these risks. This program helps Microsoft educate users on appropriate uses of AI technology and reinforce how users can respect intellectual property rights.
Generative AI can be applied to many different scenarios, many of which have nothing to do with the interests of copyright owners, such as drug discovery and materials development. In some cases, however, where AI can be used by a user to create expressive works, AI could be used to create works that infringe copyright. The way in which the user uses the tool is important. Just like other general-purpose technologies, such as the photocopier, computer, search engine, and camera, copyright law places responsibility on the user for infringing activity, and the user must act responsibly when using these tools.
Comments and Suggestions
Microsoft believes that AI has the potential to improve people’s lives in ever-expanding ways. The ability of AI to help advance human knowledge and understanding will lead to improvements in medicine, science, and industry. Organizations and individuals will use AI to innovate, create, obtain critical insights, and address significant societal challenges. AI will power tools that make everyone more productive at work, school, or home. Microsoft is confident in the promise of AI and its capacity to improve the human condition.
Microsoft and its customers use AI systems across a wide range of sectors to solve real world problems by analyzing and understanding processes, methods, information, facts, and insights contained in documents, media, data, and articles (some of which include copyrighted works). This website documents examples of how companies of all sizes are using Microsoft AI in their businesses: [https://www.microsoft.com/en-us/ai/ai-customer-stories] Furthermore, many societal benefits are supported by AI, such as enabling resource management when supporting first aid responders in emergencies, identifying locations that are most vulnerable to climate change, enabling rapid drug discovery, improving public services, and enabling better access to education.
We recognize, however, that some artists, writers, musicians and other creators have questions and concerns around the impact that AI, and especially large-scale “generative” AI, will have on their work and their economic opportunities. These fears echo concerns voiced with innovative technologies in the past: the printing press, the camera, the photocopier, the VCR, the internet. And while these concerns are understandable in a period of technological advancement, we have also heard from other professional artists that generative AI is empowering them to make art more accessible and allowing them to pioneer entirely new artistic mediums. We are committed to continuing to listen to all stakeholders as we strive to develop AI to serve society broadly.
Sarit K. Mizrahi
Technical Evidence
N/A
Text and Data Mining
Clarity surrounding the legitimacy of text and data mining under copyright is undoubtedly necessary to cement future growth in the AI arena. The approach adopted, however, should avoid the kind of technology specific exceptions that have proven harmful to progress in the past. Rather, it should take the form of a generalized exception that is capable of providing a legislative foundation for the development of AI as well as any future copy-based technologies. And doing so requires legislative modifications capable of limiting copyright protection of uses on the basis of their purpose.
Copyright essentially governs uses, full stop. It doesn’t take pains to differentiate between uses that are communicative – and therefore compel authors’ speech, essentially inflicting the harm that the concept of infringement was designed to address – and uses that are not; uses that merely wield works’ material form for purposes that have little to do with their nature as instances of dialogue. These are uses – or rather ‘nonuses,’ as law professor Abraham Drassinower dubs them – whose recognition under copyright quite simply overlooks “that it is not as a pattern of ink on a page, so to speak, but only as a communicative act that a work falls within the purview of copyright law” (Abraham Drassinower, What’s Wrong With Copying (Cambridge: Harvard University Press, 2015) at 13).
Yet, even when copies are clearly not being used for communicative purposes, copyright law has found the need to position itself on such issues. And it oftentimes does so by adopting technology specific exceptions that wind up generating future confusion over the permissibility of copying by other technologies that function somewhat differently. Take, for instance, the temporary reproductions that facilitate our Internet browsing activities, the ‘copies’ that enable the technical process responsible for increased dissemination of and access to knowledge. Involving “use of a work’s material form not as a work but as a tool, not communicatively but technically, […] it cannot be said to compel authors to speak” (Drassinower, ibid at 183). It is, in other words, a paradigmatic instance of nonuse. And yet copyright found it necessary to carve out an exception to permit this kind of ‘use’, leading to the impression that such nonuses might have otherwise qualified as infringement. This provision – and the underlying notion upon which it’s based – have done little more than create ambiguity surrounding what, in fact, amounts to an actionable ‘use’ in the digital age.
The same amendments that limited liability for making temporary digital copies, for instance, left Cloud Service Providers wondering whether the more permanent technical copies necessary to facilitate cloud-based storage might be held as infringing. These are copies that are neither perceptible nor consumptive; they’re created strictly for the purposes of ensuring users’ access to data no differently than the temporary copies recognized as legitimate. And yet providers and their users were faced with uncertainty, unsure of where they stood within copyright’s narrative; skeptical about the legal ramifications involved in the provision and use of these services.
Rather than embracing new technologies that could enhance users’ liberty to generate cultural contributions, copyright often acts as a barrier that diminishes users’ freedom to engage in our creative landscape – all over mere technical ‘uses’ and imperceptible ‘copies.’ And the natural byproduct of this trend is what has inevitably led to the current debates surrounding the legitimacy of the copies upon which machine learning algorithms rely. If mere technical, imperceptible, and non-consumptive copies find themselves within copyright’s ambit, surely there can be no question that artificial intelligence’s extraction of value from copyrighted content falls squarely within the sphere of ‘unauthorized use.’
It's such that, by creating an exception for the paradigmatic instance of nonuse characterized by temporary reproductions for browsing, copyright placed in question the viability of future technologies that similarly rely on ‘the copy’ in a sense that’s less temporally limited but equally technical in nature. And in so doing, copyright has pre-emptively delegitimized an entire novel creative sector; has reduced AI developers to mere infringers, rather than provided the necessary foundation to recognize them as equal and autonomous creators under the law. Instead, developers find their creative choices dictated by the purviews of copyright; their autonomy to choose the quality data upon which they train their algorithms limited by the very nature of copyright itself (the human rights consequences of which have been extensively mapped by other academics within the context of the previous government consultation as well as elsewhere, so I will not repeat them here (see e.g. Amanda Levendowski, “How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem” (2018) 93 Wash L Rev 579)).
And when it comes to generative AI, developers’ use of subpar data – while perhaps less likely to produce significant copyright harms – instead results in tangible harms that are far more detrimental to a society seeking to promote a diverse and inclusive social dialogue. It’s a trade-off that prioritizes the interests of rightsholders at the expense of our fundamental liberties; one that delegitimizes the creative practices of developers in ways that are already proving to inflict widespread societal damage (See e.g. Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (New York: New York University Press, 2018); Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York: Crown, 2016); Ruha Benjamin, Race After Technology: Abolitionist Tools for the New Jim Code (Cambridge: Polity Press, 2019); Anupam Chander, “The Racist Algorithm?” (2017) 115 Mich L Rev 1023; Kate Crawford, Atlas of AI (New Haven: Yale University Press, 2021)).
For these reasons, it is critical to avoid the kinds of pitfalls that previous technology-specific exceptions to copyright have produced lest they place in question the legitimacy of future copy-based technologies. Rather than adopting a text-and-data-mining exception to enable AI development, it would be far more judicious to opt instead for replacing all existing technology-specific exceptions with a single broad and technologically neutral one that permits the use of copyright-protected content’s material form for technological developments, whatever they may be; that recognizes that ‘nonuses’ are not the kind of harm copyright should target. It’s only in so doing that copyright will stop being an impediment to progress.
Authorship and Ownership of Works Generated by AI
Although the uncertainty surrounding authorship or ownership of AI-assisted and AI-generated works does not appear to be impacting the development and use of generative AI, the law’s intervention is necessary to provide clarity about which uses of generative AI will be eligible to enjoy copyright protection. It’s crucial that whatever position is adopted differentiates between the uses of generative AI that advance copyright’s purpose of promoting progress and knowledge pursuit and those that do not, in order to ensure that it doesn’t induce unnecessary copyright expansion. To this end, it’s important to differentiate between fully machine-generated content and works that are AI-assisted.
1. Fully Machine-Generated Content Should Enter the Public Domain
Extending copyright to fully machine-generated content risks hindering knowledge pursuit and negating copyright’s fundamental goals. And it can do so in two ways. First, the speed with which generative AI can produce creative content – and in some cases, all possible combinations of content in a given field – would exponentially increase the number of works under copyright protection. Doing so would distance copyright from its central maxim, essentially making it the rule rather than the exception. And this approach would in turn limit the ways in which downstream creators could engage in their own pursuits by subjecting them to constraints over how they can draw on such content. Second, endowing fully machine-generated content with copyright protection is likely to disincentivize many from developing their own knowledge skills. The ease with which they could create and obtain copyright’s benefits using generative AI would likely lead them to rely more heavily on machines to create in their stead as they would be unable to justify the time and effort required to produce creative works themselves, ultimately reducing human participation in our creative culture and stalling the state of human knowledge.
In light of the foregoing, there is little to support the copyright protection of fully machine-generated content. While its recognition under copyright would irrevocably alter the nature and substance of our social dialogue, negating the goals that copyright as a construct was designed to promote, it is unlikely that preventing the protection of generative AI’s outputs will thwart advances in this arena. There are already extensive uses for generative AI, despite the uncertainty regarding their outputs’ copyright protection. They include generating simple newspaper articles, producing artistic works that can be sold for monetary gain, cybersecurity management, and human-machine collaborations – and these are but a few of the ever-growing uses of generative AI.
2. Considerations Regarding the Protection of AI-Assisted Works
The risks presented by fully machine-generated content, however, are less likely to materialize in cases where generative AI is merely used to assist in the creation of a work. In order to promote the continued use and development of generative AI, and to shape its evolution in ways that are consistent with our cultural and dialogic values, copyright should endow AI-assisted works with protection as long as the human author’s contribution is sufficient and goes far beyond merely introducing prompts into the generative AI system. Whatever regime is devised for the protection of AI-assisted works must be capable of ensuring that the human author’s use of these kinds of systems remains in line with copyright’s values.
First, it should ensure that authors of AI-assisted works do not use generative AI to replace the process of knowledge pursuit. Knowledge forms the building blocks of the kind of creativity that copyright seeks to promote. But how we pursue, create, and proliferate knowledge shifts with each advancement in information technology, forcing us to consider in greater depth whether or how each new development challenges our vision of the social dialogue’s foundation. Generative algorithms are no different. And we must ensure that the integration of algorithms into the creative process does not reduce the act of knowledge pursuit at its core to something more computational and less intellectually stimulating that fails to advance the cultural discourse in any meaningful way.
Incorporating algorithms into the creative process shouldn’t be taken to imply that knowledge isn’t being pursued. If authors use algorithms merely to complement their role as human creator, they may just discover new ways of pursuing knowledge that spur novel forms of creativity. But if algorithms are used to assimilate knowledge in authors’ stead, as opposed to merely enhancing their creativity, it’s crucial to make more pointed inquiries, chief amongst them being whether replacing the traditional building blocks of creativity with ones that are algorithmically generated negates the resulting work’s advancement of the social dialogue. It’s one thing for an author to turn to algorithms for inspiration after having acquired the knowledge necessary to offer ‘useful’ contributions to the existing creative landscape. It’s quite another when an author employs algorithms to avoid engaging directly with pre-existing culture at all.
In effect, pursuing knowledge through the intermediation of an algorithm changes the building-blocks of knowledge that culminate in creativity, and in so doing, it fundamentally challenges our long-standing conceptions surrounding the ontology of authorship. Authors are traditionally considered as simultaneous consumers, remixers, and producers of works of authorship, works that are borne from social dialogue and that seek to further engage it. Creators who use algorithms to enhance their interaction with the works of their predecessors could still be categorized in this traditional fashion. But creators who use algorithms to assimilate knowledge in their stead – who bypass this important stage meant to develop their connection with pre-existing culture – become another sort of creator entirely: they become consumers and remixers of algorithmic output.
But, as law professors Carys Craig and Ian Kerr profoundly observe, “human communication is the very point of authorship as a social practice” (Carys Craig & Ian Kerr, “The Death of the AI Author” (2021) 52:1 OLR 31 at 86). In other words, both the social dialogue and authorship as a construct rely on authors’ engagement with pre-existing authors. This position doesn’t imply that authors’ use of generative algorithms trained on the works of others necessarily precludes their creations from partaking in the social dialogue. Rather, it suggests that some knowledge of the works that make up these training datasets may be a necessary prerequisite to upholding our cultural discourse.
Additionally, we must consider whether it’s at all desirable to allow our creations to be influenced by algorithmic perceptions of content whose origins we’re unaware of. The algorithm could very well be trained on the paintings of the greats, or the doodles of a 5-year-old, or even (as was discovered in the case of the datasets used to train Stable Diffusion’s algorithms, among others) child pornographic material. It seems contrary to progress to foster an environment in which authors adopt algorithmic contributions unquestioningly, allowing computer code to lead their creative works in distinct directions without reflecting on what exactly is motivating those outputs.
Modern-day algorithms have already proven to strip us of “the human expectation of sovereignty over one’s own life and authorship of one’s own experience” (Shoshana Zuboff, The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power (New York: Public Affairs, 2019) at 521). By mediating and reshaping both our perceptions and experiences, they play a significant role in ‘helping’ us formulate our thoughts and opinions, or even ‘assisting’ us in our decision-making processes. With algorithms’ widespread capacity to influence our actions and worldviews having long raised concerns surrounding their very palpable repercussions on our self-determination, copyright law’s recognition of AI-assisted works must take steps to ensure that authors who collaborate with generative AI are provided with the tools necessary to preserve their authorial autonomy and prevent our social dialogue from being invisibly guided by these technologies.
Considering society’s growing tendency to embrace algorithmic suggestions without thinking twice about them, it would be naïve to suggest that authors would be immune to the sort of reflexivity that tends to be fostered by algorithms whose training datasets are unavailable for scrutiny. Authors’ lack of awareness would serve to disempower them, to strip them of the information they need to immerse themselves in the sort of reflection that promotes creative autonomy and enables engagement in the kind of lofty pursuit that contributes to the social dialogue. And authors who aren’t empowered to reflect on the cultural discourse neglect to exert the sort of independent thinking that pursues the kind of progress that copyright was designed to promote.
In his thought-provoking discussion surrounding what separates us from machines, Rabbi Lord Jonathan Sacks observes that humans are defined by their unique capacity to “shape the world; to be active, not merely passive, in relation to the influences and circumstances that surround [them]” (Jonathan Sacks, <http://rabbisacks.org/three-stages-creation-bereishit-5779/>). If we are to maintain this characteristic that has proven so pivotal to our existence, the law must ensure that its recognition of AI-assisted works enhances the agonistic tendencies that inspire our capacity to think critically – and to shape the cultural and creative landscapes as a result – rather than permits them to be constrained by machines.
Infringement and Liability regarding AI
Greater clarity on where liability lies when AI-generated content infringes copyright protected works would certainly be welcome. But so too would it be useful to obtain clarity on what kinds of generative AI outputs are, in fact, infringing. In essence, to avoid unduly hindering progress, it is crucial to carefully assess which harms to authors arising from generative AI’s outputs are, in fact, copyright harms before including them under the umbrella of infringement.
Generative algorithms that are explicitly created to replicate the styles of certain authors have lately been a topic of great debate. While many such services could have significant uses other than to enable infringement (Copyright Act, s 27(2.4)(c)), several question whether these are enough to detract from their ability to so precisely replicate the styles of particular artists. Are their non-infringing uses, in other words, more prominent than the potential harm suffered by these authors?
If the only content used to train an algorithm is that of a single artist, it might be difficult to argue that any output is not, at least in some measure, an infringing reproduction. But where an algorithm learns to emulate style from a dataset of a particular artist, which it then amalgamates with the billions of other images upon which it was trained, it would be difficult to see any resulting output as a direct reproduction, let alone an infringing one. Drawing on the mathematical representations of other images to produce a new image in the style of a pre-existing artist must necessarily be a new work (whether or not this new work ought to be eligible for copyright protection will depend on the extent of input from the human collaborator (see response to above question on generative AI and authorship)). Thus, even if developers of these kinds of AI models were aware that their service was being used in this fashion, and even if they specifically created their generative algorithms to emulate the styles of others, they should not be held liable for infringement. Their goal was not to infringe copyright, but rather to emulate the style of pre-existing artists as so many have done before them.
And despite some debate on the topic, it’s generally held that style is precluded from copyright protection (Jane C Ginsburg, “Exploiting the Artist’s Commercial Identity: The Merchandizing of Art Images” (1995) 19 Colum-VLA JL & Arts 1 at 10). Its exclusion is necessary for achieving the very goal of copyright; for ensuring the promotion of progress. Impressionism, cubism, modern art, and so on, are all styles that were used by numerous artists over the years, each in their own way. If copyright were to encroach on style, the future of culture would hang in the balance. That AI models can generate new images in a pre-existing style in a far more sophisticated fashion than any human should not militate in favour of including this element under copyright’s umbrella.
But while a claim of copyright infringement ought not to be available to authors whose works aren’t directly copied in the generative AI’s output, an original author with a distinctive style that is of some renown might possess a valid claim by virtue of the tort of passing off if she can prove that she has incurred, or is likely to incur, damages. The extension of the tort of passing off to the creative (as opposed to the trademark) context is, however, quite particular. Courts have generally been reluctant to recognize unfair competition arising from creative works that don’t involve any direct copying, as this would indirectly accomplish through a claim of passing off what copyright doesn’t allow directly, effectively extending a protection to style that might otherwise unreasonably limit the speech of others.
The American case of Romm Art v Simcha International (786 F. Supp. 1126 (E.D.N.Y. 1992)) is quite illustrative of this tension. Here, the plaintiff, who published posters of the artist Tarkay, pursued the defendant in passing off for publishing posters by an artist named Patricia that imitated Tarkay’s visual style in the absence of direct copying. The Court concluded that a likelihood of confusion existed because both posters conveyed “the same overall impression” (ibid at 1137) and appeared to exude Patricia’s publisher’s “intention of capitalizing on the plaintiff’s reputation and goodwill and any confusion between his and [plaintiff’s] product” (ibid at 1139). Particular to this case was evidence that some galleries would frame the Patricia posters in a fashion that would conceal her name, which further militated in favour of the potential confusion.
“As applied to the fine arts themselves,” notes law professor Jane Ginsburg, “the decision seems troublesome: many artists’ styles owe a great deal to their predecessors. The Tarkay ‘look’ itself strongly resembles a Matisse crossed with a Modigliani. One would nonetheless be reluctant to suggest that either of these artists’ heirs should be able to enjoin the dissemination of Tarkay’s work, or that Tarkay should pay them royalties” (Ginsburg, supra at 16). But, Ginsburg continues, “there may be a difference between emulation of another’s artistic style on the development of one’s own pictorial expression, and the commercial appropriation of an artist’s identity” (ibid, emphasis added). In other words, it may be acceptable to draw inspiration from pre-existing artists, but it’s certainly not alright to imitate their style in a bid to pass off one’s work as that of another.
What can be drawn from this line of reasoning within the context of stylistic imitations by generative AI? Let’s take the Canadian artist Sam Yang, for example, whose art has been used to train a generative algorithm to produce works in his style. An artist of some renown, he sells his illustrations for educational and commercial purposes while also providing art instruction to nearly one million subscribers through his YouTube channel, Sam Does Arts. Not only does each of the images produced in his style by the generative algorithm look like it could very well have originated from him, but each equally includes the label SamDoesArts. Although the burden of proving goodwill is quite significant, often requiring sufficient notoriety and distinctiveness of a ‘style’ such that it would automatically be associated with the artist’s work, Yang’s sizeable following would likely militate in his favour. Once established, this evidence would go a long way in proving that the AI-generated images in Yang’s style may have the potential to deceive those familiar with his work into believing that he was the source of their creation.
The question is: who would ultimately be liable for passing off in the imitation of Yang’s style? On the one hand, while the developer of the AI model in question may not itself be trading in imitations of Yang’s work, its mere provision of a service that could be used to train algorithms to generate stylistically similar works might lead potential purchasers of Yang’s artwork to quite simply use this technology to replicate his style. But while the likelihood of damage to his goodwill exists, it’s not the AI model itself that creates the risk of confusion; anyone using it remains aware that the images generated didn’t originate from Yang himself. From this perspective, then, it’s those who use these images in public-facing situations, where the potential for confusion as to the provenance of the artwork might occur, that would likely be liable under the tort of passing off. Although this approach may not provide a remedy for those situations in which Yang might lose a sale because ordinary users generate images in his style for personal use rather than purchase one directly from him, it does at least tackle the unfair competition arising from the commercial use of images produced by this algorithm in his style.
I am not, through this analysis, dismissing the uneasiness expressed by many authors and artists who feel as if their creative expressions are being misused by generative AI, nor am I claiming that these issues shouldn’t be tackled by the law. What I am saying is quite simply that copyright isn’t the appropriate legislative vehicle to address all the harms suffered by pre-existing authors arising from the use of generative algorithms; that before jumping to the conclusion that copyright must tackle every single misuse of original works by AI, we must carefully consider whether the harm in question is, in fact, a copyright harm. If it isn’t, we must resist the urge to inappropriately expand the scope of copyright, and quite simply rely on other legal avenues that are better suited to address the harms in question. Doing otherwise risks producing unnecessary and avoidable harms that are far more threatening to our free and democratic society.
Comments and Suggestions
N/A
Heather Morrison
Technical Evidence
In libraries, machine learning AI in the form of recommender systems that rank results by relevance (like Netflix) is in widespread use. Generative AI is in earlier stages of exploration and/or implementation in libraries and information management as a means of further automating and enriching information resource description and classification. On the other hand, the tendency of popular AI tools such as ChatGPT to invent content is raising concerns about the spread of mis- and disinformation, complicating the work of ensuring that the public has access to high-quality, accurate information. In academia, AI is in the early stages of use for the purposes of accelerating research. AI raises both interest and concern with respect to pedagogy. Noteworthy examples of emerging types of applications include language learning supports for students, brainstorming, and automated translation, noting that results to date are best considered as early drafts.
Text and Data Mining
TDM for discovery purposes should be legal across all kinds of materials (e.g. to find songs, films, novels, and stories of interest, not for AGI training). To facilitate the advances AI is making possible in scientific and non-commercial research, TDM for training AGI should be legal for these purposes (following the UK / Switzerland example). One recommended change in copyright law to facilitate AI advances in Canada is to eliminate Section 41 (Technological Protection Measures and Rights Management Information) from the Copyright Act. This section prohibits circumvention even for purposes that are legal under the Act, while being unnecessary for purposes that are already illegal under the Act. AI developers should be required to track and disclose materials used for training purposes. Legislation to this effect at this time would encourage the development of efficient automated processes at an early stage in AI development.
Authorship and Ownership of Works Generated by AI
Rapid growth of AI-generated content demonstrates that concerns about authorship and ownership are not a significant impediment. A Google search for "Amazon ChatGPT self-publishing" retrieves over 21 million results, with how-to books and publishing services at the top of the list. ChatGPT can produce a story "in the style of" a human author such as Margaret Atwood in seconds. This rapid growth raises two types of concerns: 1) for human creators, whose works and identity can easily be used with AI training to create new works that compete with the original creator, and 2) for increasing production and distribution of mis/disinformation when a tool like ChatGPT (described by AI experts as having a tendency to "hallucinate") is used to create nonfiction works without the oversight of human experts.
Infringement and Liability regarding AI
N/A
Comments and Suggestions
Achieving the potential benefits of AI requires TDM exemptions for scientific and non-commercial research following the UK / Switzerland example and the elimination of Section 41 of the Copyright Act (Technological Protection Measures and Rights Management Information). Most potential benefits of AI do not involve the use of others' copyrighted material – for example, companies and individuals using AI to automate or build on their own work. Encouraging AI users to make use of the copyrighted work of human creators raises two concerns: 1) the possibility of training AI using the work and identity of a human creator to capitalize on their identity and compete with them in the marketplace, and 2) the possibility of increasing creation of mis/disinformation in the case of non-fiction works. Concern about AI identity misuse is broader than traditional copyrighted works, for example the use of images of individuals in pornographic works without their knowledge or consent.
Motion Picture Association - Canada
Technical Evidence
Please see MPA-Canada's responses to the other sections of the Consultation Paper.
Text and Data Mining
1) Exceptions/Limitations to Address Generative AI, including an Exception for Text and Data Mining (TDM), Should Not be Introduced
Canadian copyright law has long accommodated/embraced new technologies, including via the doctrines of fair dealing and technological neutrality. These doctrines are designed to balance the rights of owners and users in a predictable and fair manner. MPA-Canada does not believe amendments to the Copyright Act pertaining to generative AI are necessary/appropriate at this time, including the introduction of any exceptions/limitations, such as an exception for TDM from the Copyright Act’s requirements, just because AI is involved.
Commercial uses in the context of training AI models should not be treated differently than other commercial uses. The Government should navigate the interplay between copyright and AI with moderation, restraint, and respect for copyright. Reflexive approaches that do not take into account the speed with which AI is evolving and the diversity of AI technologies have the potential either to create unreasonably broad copyright exemptions or to hamper innovation. Such approaches miss the opportunity to find the appropriate middle ground and risk disrupting emerging licensing markets benefitting AI developers and copyright owners.
MPA-Canada supports the existing international legal framework for protection of copyright and related rights. That framework provides a principled consistency based on global norms while still allowing for differences in national approaches, e.g., Canada relies on “fair dealing” to determine whether a statutory exception to copyright infringement applies (i.e., the dealing is for an enumerated purpose). If a statutory exception applies, the court will then determine whether the dealing is fair in fact-specific circumstances based on the application of six non-exhaustive factors, an approach that is similar to that applied by courts in the U.S. in respect of the “fair use” defence. While other countries have adopted more specific exception-based systems, MPA-Canada believes that in its current state, the existing “fair dealing” and “technological neutrality” frameworks are sufficiently robust to address novel AI uses predictably and fairly. At this time, there is no need for Canadian copyright law to adopt special exceptions to copyright for AI, including a TDM exception.
In contrast to the approach under Canadian law, which applies doctrines designed to balance the rights of owners and users in a predictable and fair manner, other jurisdictions have hastily introduced broad/inflexible TDM exceptions in the name of promoting innovation, e.g., Japan has enacted a “non-enjoyment” exception for TDM, which generally exempts TDM from the requirements of Japanese copyright law, provided: (1) it is “not a person’s purpose to personally enjoy or cause another person to enjoy the thoughts or sentiments expressed in [the copyrighted] work”; and (2) the use does not “unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation.” Singapore’s TDM exception is equally broad and does not provide any ability for rightsholders to opt out. Aiming to address creators’ concerns and to minimize risks of copyright infringement by AI developers and users, the Japanese government is consulting on the relationship between AI and copyright around the scope of the TDM exception, particularly the lack of a lawful access requirement and an opt out for rightsholders.
MPA-Canada submits these types of exemptions are bad policy, and they likely fail to comply with the Berne Convention’s “three-step” test, e.g., bad actors may use overbroad TDM exceptions as a pretext for both infringement and the downstream use of infringing works for any purpose. As discussed below, copyright owners’ licensing markets for training AI models have been developing and amendments to the Copyright Act that would broadly exempt certain unauthorized uses would interfere with those emerging, mutually beneficial markets. MPA-Canada believes the Copyright Act should not be amended to adopt broad exceptions that Japan, Singapore, and other jurisdictions have enacted, and should instead adhere to the fair dealing and technological neutrality frameworks for analyzing particular use cases.
While AI will continue to raise many interesting/important copyright issues, the Copyright Act and case law interpreting it are well-suited to address these issues. As new issues arise, Canadian courts are well placed to approach these questions in a thoughtful/careful manner using the already flexible framework provided under domestic copyright law. Accordingly, the Government should resist trying to draw definitive conclusions for nascent technologies based on limited experience and information. To the extent developers wish for certainty in their use of copyrighted works to train AI systems, they can always seek licenses from the copyright holders. Such licensing will also help avoid other concerns about AI use, because the training inputs are likely to be of higher quality, tailored to the specific AI system’s needs, and supported by the requisite consents, minimizing risks of error, bias, and violations of privacy.
2) Affirmative Consent (Opt-Ins) to the Use of a Copyright Owner’s Works for Training Materials
As a basic principle, if “fair dealing” or another copyright exception does not apply to the use of works for AI training, such use is infringing and it is the user’s obligation to affirmatively obtain consent from the owners to use the owners’ works, i.e., opt-in.
While some AI developers have taken a step in the direction of an opt-out process, proposals for opt-out processes present significant challenges for the proper implementation of consent systems and for the enforcement of copyright. Such proposals may also prove to be unworkable.
At present, there are two main types of opt-out: (1) opting out with the AI developer directly, which some AI developers require to be done for each individual piece of content; and (2) tagging the content with metadata so parties know the owner does not consent to training. Because MPA Members’ libraries include thousands of works and promotional/other material, the sheer scale/volume means these proposed opt-out regimes will likely be insufficient/overly burdensome for the copyright owner. Moreover, such solutions likely will not address the problem of infringing content used as training material.
In general, the AI developer, and not copyright owners, should bear the burden of establishing a high-functioning, accessible, and reliable process for copyright owners to opt-out. This is because using others’ works to train AI models benefits the developer of the model. If opt-out is required, it must at a minimum: (1) allow copyright owners to exercise an opt-out covering all of the works they own without the need to specify every work/location where an owned work might be found on the internet; and (2) require the AI developer to ensure the opt-out is honoured, and not require copyright owners to police the model for individual works/excerpts.
3) Copyright Policy Should Support Voluntary Direct Licensing
Copyright legislation should always promote voluntary licensing transactions between copyright owners and prospective users. Any variations from traditional direct individual licenses should be initiated/tailored to the needs of the particular copyright owner industry.
At this time, there is no reason to believe that copyright owners and companies engaged in training generative AI models and systems cannot enter into voluntary licensing agreements, or that government intervention is necessary. In certain industries, direct voluntary licenses have already emerged because some copyright owners have already entered into licensing agreements with AI companies, e.g., the AI company Bria has a license with Getty Images that gives it rights to the photographs it uses for training. OpenAI has also entered into license agreements with Shutterstock to pay for specialized content and with Associated Press to access the news agency’s archive of stories.
These types of agreements/policies show that market-based solutions, which both respect copyright owners’ rights (and provide creators with market-based compensation) and facilitate the training of generative AI models, are continuing to develop. Voluntary direct licensing is feasible/desirable for different industries and for a variety of rights/uses. Copyright policy should support, not undermine, voluntary direct licensing schemes as they develop in the market. There is no reason to turn to compulsory licensing or extended collective licensing at this time.
4) Transparency and Record Keeping on Materials Used to Train AI Models
MPA-Canada sees the benefits in having those who provide AI services/systems to the public maintain and make available appropriate records regarding the materials used to train their models. These records would allow the public/courts to meaningfully assess the lawfulness/reliability of the developers’ activities. Maintenance of such records may also be required because of anticipated litigation.
MPA’s Members believe the Government should be thoughtful about the context/nuances of any recordkeeping requirements to ensure that policies are narrowly targeted to achieve the desired goal. MPA’s Members appreciate that the questions raised in the consultation are meant to apply only to AI developers rather than copyright owners’ potential use of their own copyrighted works. However, it is important that any suggested transparency/disclosure requirements not be overbroad and not apply to copyright owners’ potential uses of their own copyrighted works, or content routinely generated with their own works with the assistance of AI.
Authorship and Ownership of Works Generated by AI
1) Amendments to the Copyright Act Are Not Needed to Address Questions of Authorship and Ownership of Works Generated by AI
MPA-Canada does not believe amendments to the Copyright Act pertaining to generative AI are necessary or appropriate at this time, including to address questions of authorship and ownership of AI-generated works or AI-assisted works.
Technologies utilizing some form of machine or computational intelligence have existed for decades and have contributed to the creation of original expression. Recent developments have advanced at a significant pace, but that does not necessarily mean that AI developments will require copyright law to evolve in a dramatically different manner. Moreover, AI-specific changes to copyright law would violate principles of technology neutrality.
Developments in AI, like preceding technological advancements, have a great potential to enhance – not replace – human creativity. MPA’s Members further believe these developments can, and should, co-exist with a copyright system that incentivizes the creation of original expression and protects the rights of copyright owners.
Humans are and will remain at the heart of the creative process, and as such, MPA-Canada believes that, consistent with the Copyright Act, fully machine generated works should not be copyrightable. At the same time, MPA’s Members believe that AI, including potential uses of generative AI as it continues to develop, can be a powerful tool in the hands of human artists and those involved in creating motion pictures to enhance and serve the filmmaking process. MPA-Canada supports a robust copyright system that facilitates and provides incentives to create movies, television programs, and other art forms, including by protecting certain works that human creators make with the assistance of generative AI – in the same way that such principles apply to uses of other technologies that assist creators in realizing their vision.
AI is a tool that can, and does, assist creators in the creative process. Given that reality, creators who use AI as a tool to assist them with their creation of original expression do produce human-authored copyrightable works. Generative AI broadly covers many variations of AI technologies, many of which have been in use for many years and should not raise the copyrightability and authorship issues presented by popular prompt-based tools. MPA’s Members may use AI as a production and post-production tool in the hands of human creators to enhance expressive material.
For example, animators and visual effects artists for decades have used a process called “rotoscoping”, which involves manually altering individual frames within a single shot to align live-action and computer-generated images and is incredibly detail oriented and time consuming. Contemporary visual-effects artists now have sophisticated tools, some of which incorporate AI technology, to assist with this type of work, which frees artists to focus their energies on the creative aspects of the visual effects. AI also helps creators realize their vision and enhance the audience experience by making visual effects more dramatic, realistic, and memorable. Creators can use AI for everything from colour correction, detail sharpening, and de-blurring; to removing unwanted objects from a scene; to more involved work like aging and de-aging an actor; or to adjusting the placement of computer-generated images to make sure everything in a scene flows smoothly and aligns properly. Artists have expressed enthusiasm for AI tools that enhance their work, and for continued technological development of these and similar tools. In short, the use of AI technology presents developing opportunities for creators and their audiences. MPA’s Members are optimistic about that future.
While AI will continue to raise many interesting and important copyright issues, based on their current knowledge of AI technology, including “Generative AI”, MPA Members believe the Copyright Act and case law interpreting it are well-suited to address these issues and courts should generally apply existing copyright principles when addressing legal issues arising out of the use of AI technologies. In particular, MPA-Canada believes the authorship determination should focus broadly on the human author’s creative process and decisions, e.g., how to arrange, select, and position elements of the ultimate work. Focusing on these creative choices ensures that copyright subsists in original works that are derived from the author’s “skill and judgment.” The same reasoning can apply to human uses of generative AI: material they provide to the AI tool (i.e., inputs, like a drawing or photo), refinements, direction, and then use of the output can all involve intellectual and creative contributions inseparable from the ultimate work. Creators can employ generative AI systems as tools to enhance the creative process, just as they have availed themselves of cameras and Adobe Photoshop and received copyright protection for their works.
This point is particularly important for MPA’s Members’ works. Perhaps different from the creation process for literary, visual, or even musical works, the motion picture creation process is exceedingly complex in the number and type of creative contributions, across sometimes thousands of individuals. A large motion picture may include dozens of individuals working in writing and story, the art department, camera and electrical, stunts, sound and music, special effects and visual effects, makeup, animation, costume and wardrobe, production, editing, and more. Many of these individuals contribute creative elements to the ultimate motion picture. They may use AI technologies, including using those that potentially qualify as generative AI, as a tool to enhance materials humans create. These elements are then interwoven into a single motion picture work.
The fact that creators produced some parts of the film with the assistance of AI should not render those portions uncopyrightable. The practical result would be untenable. Take a hypothetical example of a superhero motion picture. The movie might be copyrighted, but would a scene involving AI-assisted visual effects depicting a battle in space receive the same protection? Can a studio protect its rights if the underlying characters and scene script are protectable, but the visual output that involves the AI-assisted effects is not?
While MPA-Canada appreciates that there are calls for additional clarity regarding the authorship and ownership of works generated by AI to create more certainty in the marketplace, there also should not be an exception to the longstanding principles of copyrightability when humans employ AI as a tool in their creative processes. An approach that attempts to isolate the use of AI and treat it differently than other technologies used to support human-driven creativity threatens to impair rather than stimulate and support creativity. The Government of Canada should therefore approach these questions in a thoughtful and careful manner and should resist trying to draw definitive conclusions based on limited experience and information.
Infringement and Liability regarding AI
1) The Existing Liability Provisions Are Sufficient to Address Copyright Infringement
MPA’s Members have an obvious interest in robust copyright protection and enforcement and are acutely aware that AI technologies present new opportunities for third parties to make unauthorized use of MPA Members’ works.
The importance of meaningful copyright protection and enforcement cannot be overstated. MPA’s Members make enormous investments in the creation of their copyrighted works. Copyright rights, including the right to license their exclusive rights to others, provide MPA’s Members the opportunity to recoup those investments and to create new content. Unauthorized (and uncompensated) use of MPA Members’ works undermines this entire process.
MPA Members’ copyrighted works likely have been, or will be, the subject of unauthorized reproductions in the context of training AI models. Likewise, users of AI tools have used, and will continue to use, that technology to create unauthorized copies of MPA Members’ works and infringing works that are substantially similar to MPA Members’ intellectual property. It is imperative that the unauthorized exercise of the rights of MPA’s Members, whether through the use of AI or any other technology, remain subject to meaningful enforcement and remedies.
There are at least two potential scenarios that implicate traditional principles of copyright law: (1) if a pre-existing copyrighted work is copied to train an AI model, and then the AI system outputs a substantially similar copy, that scenario would present a clear case of infringement of the reproduction right; and (2) if AI-generated outputs are then distributed or otherwise made available to third parties, those outputs could then infringe not only the reproduction right, but also other exclusive rights under copyright law, including communication to the public, making available, adaptation, and the authorization rights. Additionally, if copyrighted works are used as inputs to an AI system, it is possible that such use infringes the reproduction right, regardless of the ultimate output.
With that context in mind, and provided exceptions and limitations are not introduced, MPA-Canada is of the view that the existing liability provisions in the Copyright Act establish a general, well-accepted framework for analyzing claims of direct and secondary copyright infringement in the context of new technologies. While a precise determination of which parties are directly and/or secondarily liable will depend on the facts, courts should be able to apply traditional copyright principles to new technologies, given the robust case law in Canada regarding the principles of fair dealing and technological neutrality. As a result, MPA-Canada does not believe that amendments to the existing liability provisions in the Copyright Act to specifically address generative AI are necessary or appropriate at this time.
Comments and Suggestions
MPA members’ use of technology exists against the backdrop of a stable legal regime governing the rights and responsibilities of copyright owners and consumers alike. The Canadian copyright system functions well and has successfully accommodated technological change since its inception. Based on their current knowledge of AI technology, including “Generative AI”, MPA’s members believe courts should generally apply existing copyright principles when addressing legal issues arising out of the use of AI technologies and that there should be no blanket exemption from the Copyright Act’s requirements just because AI is involved. To ensure a careful and considered approach to AI and copyright, and to avoid disrupting emerging markets, there are no grounds to deviate from fundamental copyright principles.
Music Canada
Technical Evidence
1. Music Canada is the trade association for Canada’s major labels: Sony Music Entertainment Canada, Universal Music Canada and Warner Music Canada. Our members help Canadian artists reach their creative goals, engage with fans around the globe, and find commercial success. We welcome the opportunity to engage in these Consultations.
Our members constantly invest in and adopt new technologies and innovations, and work with artists to develop and use new tools to advance the creative process. This includes working with AI technologies, from the use of machine learning to better understand user behaviour and preferences, to systems that assist in the creative process. Tools driven by AI play an increasingly important role in the artistic process in the music sector -- including in audio mixing, automated mastering, computer-aided orchestration, and elsewhere. AI is a tool that can help unlock an artist’s creativity or find efficiencies in processes, but it is not a substitute for the human element of creativity.
While these submissions will delve deeper into the specific questions posed by these consultations, the principles of the Human Artistry Campaign (humanartistrycampaign.com) are the fundamental guideposts. We, alongside hundreds of organisations from around the world (including many from Canada) representing music, literary, sports, entertainment, and other sectors, have agreed to these principles. We urge policymakers and AI developers to take these principles into account:
(i) technology has long empowered human expression, and AI will be no different;
(ii) human created works will continue to play an essential role in our lives;
(iii) use of copyrighted works and the use of voices and likenesses of professional performers requires authorization and free-market licensing from all rightsholders;
(iv) governments should not create new copyright or other IP exemptions that allow AI developers to exploit creations without permission or compensation;
(v) copyright should only protect the unique value of human intellectual creativity;
(vi) trustworthiness and transparency are essential to the success of AI and protection of creators; and
(vii) creators’ interests must be represented in policy making.
2.1 Government should be attuned to how systems are trained and how they obtain those works
While sometimes it might look like it, generative AI is not magic. Its outputs are only as good as the creative works it’s trained on. These systems would not be able to generate vocal clones that sound like Drake or produce images that look like the works of Norval Morrisseau if they had not copied and retained a reproduction of the artist’s voice or the painter’s paintings in some digital form. The fact that AI models can produce identical copies of copyrighted lyrics or copyrighted images makes this point clear. For example, a recent lawsuit by The New York Times includes exhibits to underscore that OpenAI and Microsoft’s LLMs, which power ChatGPT, “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style".
We caution the government to pay close attention to the language used by developers in explaining what works their systems train on or how their systems learn. From observing consultations in other jurisdictions, we have noticed a tendency of developers to adopt language which characterises their ingestion and use of works as falling outside the legal understanding of a “reproduction” or language which suggests that a statutory exemption to copyright is in play. They may use words like “learning,” “migration,” “memorization,” or “simulations,” instead of words that describe what their systems do to learn: make “copies” and “reproductions.” They will talk about the speed at which their systems learn and they will point to different digital formats to try to escape the traditional understanding of copyright and how infringements arise. They often use phrases like “publicly available” (which is entirely different from being in the public domain), or assert that their systems use works to merely “educate” the model or drive “research” for further AI systems. In our view, these are attempts to get ahead of copyright infringement suits -- where they will argue that they do not need to license the reproduction, ingestion, and use of the copyrighted works that are essential in training their models.
We also recommend that the government pay attention to how sources of works used to train systems are described. Generative AI systems are typically trained on multiple sources, often from websites whose terms of service forbid scraping or reproducing their information -- and without authorization from the copyright owner. We have noted that developers will often not disclose their sources or, if they do, they do it in nonspecific and vague terms. For example, OpenAI claims that it acquires data to train ChatGPT from “three sources of information: (1) information that is publicly available on the internet, (2) information that [they] license from third parties, and (3) information that [their] users or [their] human trainers provide.”
It is our understanding that generative AI models typically ingest vast amounts of information from the internet. And because low-quality inputs lead to low-quality outputs, they will scrape the web for professional content -- such as high-quality images, music, and text. In order to make the most accurate impersonation of an artist or an artist’s work, they are motivated to scrape the copyrighted work from the highest-quality sources, like digital streaming platforms.
Typically, the commercial relationship between rightsholders and digital platforms will include terms which state that their intellectual property can only be accessed by a user for personal and non-commercial use. But generative AI training accesses these works and makes reproductions outside of what is permitted under these licences. Generative AI systems will also access pirated sources -- but taking content from pirated sources is still stealing and harms the underlying work and creator.
Just because a system is trained on third-party data sets which are purportedly open does not mean that those sets are licensed for AI training purposes. There are wide reports of datasets that are licensed for non-commercial, research, or scientific purposes -- and yet are used for obviously commercial purposes in subscription-based AI models.
Some developers will claim that they only train on “ethical sources,” such as public domain content, stock libraries, etc. -- but in reality, this is often merely their own mistaken belief. Datasets often contain copyright protected works and they are not licensed for AI training. Just because something is “publicly available” on the internet does not mean that it is in the public domain, free of copyright, or that it need not be licensed.
We urge the government to not be lulled by this type of language designed to deflect from the provenance of data and how it is used by models and systems. AI developers and the lucrative industry that has built around them should be held to the same standard as other industries. Their business model should not be built on the backs of creators without their consent, compensation, or credit.
2.2 Not all AI Developers Ignore Creator Rights and Licensed Markets are Emerging
In spite of the concerns flagged above, markets where content is properly licensed and creators are fully compensated are emerging, and government can play a role in helping ensure that these markets continue to develop and flourish.
The major recording labels are leading the way in market-based partnerships where ethical AI is a tool and artists are kept at the centre of the initiatives:
--The launch of YouTube’s AI incubator in partnership with Universal Music Group which brings together a group of leading UMG artists, songwriters and producers, Google DeepMind technologists, and YouTube and UMG experts to explore AI-related musical tools and products. This project is predicated on human creativity and takes into account the interests of creators and copyright holders.
--Sony Music Entertainment’s debut project with David Gilmour (of Pink Floyd), Legacy Recordings, and Vermillio, where the artist invites fans to remix music from the album Metallic Spheres and its cover art, using AI-powered tools (Vermillio is a generative AI platform that empowers creators and protects their work using authenticated AI);
--Warner Music Group’s use of AI technology, with the consent of the estate of legendary French artist Edith Piaf, to recreate the artist’s voice and image for an animated biopic. This project uses AI technology trained on hundreds of voice clips and images spanning decades to help capture her story and impact authentically.
Other industries are similarly making collaborative progress. For example, Getty Images has launched Generative AI by Getty Images, which trains on its library of licensed photos and compensates the artists whose images were used in training. Shutterstock has licensed its image, video, and music libraries to OpenAI for training and established a Contributor Fund that compensates artists for their creative inputs. And other leading tech companies are reportedly negotiating licensing arrangements with rightsholders. In July, The Associated Press reportedly reached a deal with OpenAI to license its archive of news stories. A voluntary market which allows for AI innovation and compensation to creators is emerging and growing.
If the Government wishes to see these types of marketplaces develop and thrive -- ones where creators receive credit and compensation and retain consent and control, while AI developers continue to create exciting and world-changing technology -- then we must ensure that existing intellectual property protections remain robust. How to do that is discussed throughout these submissions.
Text and Data Mining
3.1 Ingestion of copyright works and sound recordings to train systems constitutes infringement
Generative AI can only generate what it has already seen and learned from, and in order for it to produce quality outputs, it needs quality inputs. In many cases, those inputs are sound recordings found on digital platforms. Rightsholders to those sound recordings must be able to choose whether to grant or refuse permission for their use. The IP rights engaged by the use of sound recordings to train AI systems are not uncertain: such use engages a copyright owner’s exclusive rights, which are the building blocks of a fair and competitive market.
While the technology behind AI is new, using recorded music to make new music is not new. The technology of sound recording reproduction has evolved through analog and digital mixing, sampling, and encoding techniques. Generative AI systems built on sound recordings are just another use, and that use too must be licensed.
3.2 Exceptions to copyright infringement must be decided on a case-by-case basis
Some AI developers will argue that copies made during ingestion are “ephemeral” and not subject to copyright. In practice, copies are made throughout the training process: first in compiling and cleaning datasets, and then in model development and fine-tuning. It is often necessary to keep copies of the dataset on hand through different iterations of the process. All of these are unauthorised reproductions.
Some will argue that the use of ingestion copies is fair dealing. In Canada, fair dealing is fact-specific and determined on a case-by-case basis. But unauthorised reproduction of works by AI developers to build models whose AI-generated outputs compete with the input works is presumptively unfair, particularly where a market ready and willing to license these uses already exists. Ultimately, whether an infringement is permitted by fair dealing must be assessed on the facts of each case.
3.6 “Opt out” systems are ineffective and place an impossible burden on artists and rightsholders
AI developers may also advocate for “opt out” systems, whereby copyright owners would have to take positive action to “opt out” of their works being used by systems and models. Such an approach would fly in the face of the foundation of copyright, which is a permissions-based, “opt-in” regime. Copyright owners have certain exclusive rights set by the Act.
Opt-out regimes would place undue burdens on artists and creative rightsholders. It is effectively a game of whack-a-mole, where creators are required to patrol AI datasets and models, most of which lack the requisite transparency and accountability to even begin taking steps for enforcement. Opt-outs burden the wrong party.
3.4 TDM activities can and should be licensed by rightsholders
The recording industry is ready and willing to license the use of its recordings for compensation that matches the value of the use of those works. It has the infrastructure and a proven track record for licensing large catalogues of music.
The music industry has built a marketplace where massive amounts of copyrighted works, performances, and sound recordings are licensed every day to a variety of platforms and technologies. Record labels routinely and directly license new technologies and evolving uses of music – including for sampling in other recordings and for sync licences that permit use in TV shows, films, and advertising. They also license recordings to social media and UGC sites (e.g., YouTube, Snapchat, Instagram, TikTok), fitness services (Apple Fitness+, Peloton), and other emerging and evolving business models. There is no need for compulsory collective licensing to license these works.
3.5 Compensation for TDM activities must be determined in a competitive marketplace
Compensation for TDM activities must be determined in a free and competitive marketplace. Policy choices that remove competition or distort the market – such as compulsory licences, collective licensing through government intervention, or exceptions without proof of market failure – must be avoided.
The recorded music marketplace has a proven track record of negotiating compensation for different uses of music. The majority of record industry revenues are determined in the marketplace. Organisations and individual rightsholders can license on a collective basis, provided that such collective licensing is done without compulsion or intervention and consequently reflects a fair and voluntary marketplace.
3.6 Challenges in licensing TDM activities emanate from AI developers, not creative rightsholders
Rightsholders face challenges in licensing their works for TDM activities primarily because AI developers have built models and systems trained on their copyright-protected content without seeking permission from rightsholders. Some AI developers operate under the mistaken belief that all uses are exempt from copyright liability. Currently, there is almost no way for rightsholders to know if and how their content was used. This makes it extremely difficult for copyright owners to exercise agency over how their works are used and whether they receive compensation or credit.
Some AI developers argue that their systems are trained on such vast quantities and sources of text and data that they could not be burdened with licensing those works. But every day, millions of sound recordings and copyright works containing multiple rights are licensed, and their metadata allows licensing and compensation to be easily administered and tracked. AI training is no different.
Some AI developers argue that the economic requirements of licensing will harm AI innovation. This type of thinking nearly wiped out the music industry 15 years ago, but fortunately licensed streaming services are driving revenues back to creators as music has commercial value again. AI developers’ reluctance to compensate rightsholders to train their systems on high quality recordings does not justify the creative industries subsidising them.
Some developers argue that the value of a license for training owed to an individual artist would be so small that creators are not likely to benefit. Even if negotiated marketplace rates were individually small, there is no justification for why AI companies should get to keep that compensation.
3.7 Approach in other markets
This past year, the UK rejected plans for a broad TDM exception. The Parliamentary DCMS committee recommended that “the Government … not pursue plans for a broad text and data mining exemption to copyright,” and that “[it] should support the continuance of a strong copyright regime in the UK and be clear that licences are required to use copyrighted content in AI. . . . [T]he Government should act to ensure that creators are well rewarded in the copyright regime.” In January of 2024, the UK Government response to that report stated that: “the reproduction of copyright-protected works by AI will infringe copyright, unless permitted under licence or an exception. …[T]he Government instead committed to develop a code of practice on copyright and AI, to enable the AI and creative sectors to grow in partnership. This supports the Government’s ambition to make the UK a world leader in research and AI innovation, while ensuring that the UK copyright framework continues to promote and reward investment in creativity.”
While Japan has implemented a TDM exception, there are clear guardrails to ensure that it does not enable or permit the free and unfettered use of copyright-protected content to develop AI models and to ensure Japan’s compliance with its international treaty obligations. In particular, the use of copyright-protected content is not permitted where the use would “unreasonably prejudice the interests of the copyright owner.” The plain text of the provision and the three-step test enshrined in international treaties make clear that views that the exception permits the wholesale ingestion of copyright works and other protected subject matter for the development or training of an AI system are mistaken.
The EU has two provisions for TDM exemptions. Article 3 provides a TDM exemption for scientific research conducted by non-commercial organisations or cultural heritage institutions. Article 4 provides for TDM exemptions for commercial uses, subject to an opt-out provision requiring a machine-readable opt-out request from the rightsholder. Both of these exemptions apply only where there was lawful access to the copyrighted works in question.
Canada should continue to work with its international counterparts to promote policies that protect copyright and creators internationally. Canada should oppose broad and novel copyright exemptions that would violate international treaty obligations.
3.8 Developers should be required to maintain records of training and ingestion copies
AI developers and creators of training datasets should be required to collect, retain, and disclose records regarding the material used to train their models. Absent such requirements, it is virtually impossible for rightsholders to discern how their works were used in the development and operation of these systems. In order to ensure the safety of these systems, and to have functioning copyright frameworks, AI systems must be accountable for the works they ingest and are trained on.
Complete recordkeeping of copyright works, including how they are used to train AI systems and to generate outputs, is essential. The government should require the maintenance of auditable records, as many AI companies are not currently voluntarily providing sufficient information about the data they have used. Similarly, AI developers and deployers who use third party training datasets or pre-trained models should obtain, keep, and make available necessary data from upstream sources. [Please see pdf of submission for full record-keeping specifics]
Authorship and Ownership of Works Generated by AI
Given that the technology for training AI models at this time necessarily requires making copies, we are not seeking changes to the Copyright Act with respect to authorship or ownership. However, that position may change depending on how the technology and methods of ingestion evolve. Copyright owners may need further clarity confirming that the ingestion of copyright works is an exclusive right of the copyright owner, to exercise or to authorise others to exercise.
4.1 Outputs created with the use of copyright works and sound recordings can be infringing
Ingestion copies used to develop AI models that generate audio outputs are just a new method of processing sound recordings. The output is not a recording of a new performance. While AI systems process sound recordings in a more sophisticated and opaque manner than analog means, the result is still a type of remixing of the copyright owner’s sounds and, as such, can be infringing depending on the circumstances.
In addition to infringement of rightsholders’ exclusive rights through the use of inputs, outputs may also be infringing. When AI outputs are identical to pre-existing sound recordings, or when the outputs include identifiable portions of pre-existing sound recordings, the exclusive right of the copyright owner in the sound recording is engaged. Even where the human ear cannot identify the full or partial copies of the pre-existing recordings in the outputs, the outputs can still be infringing.
4.2 Works generated solely by AI without any human involvement do not warrant copyright protection or new forms of protections
Copyright protection exists to incentivize and reward human creativity, skill, and judgement, and as such, it should not be awarded to content that is generated without any human creativity. Human artistry provides a unique value and remains fundamental to the creation of creative works such as music. Extending copyright to generative AI outputs created without any human authorship risks devaluing human creators and flooding the marketplace with machine-made content, making it harder for consumers to find and support the human-made works from artists they love. Large volumes of AI-generated works are already being dumped into the marketplace. Moreover, this puts human artists at risk of infringing the myriad works made instantaneously by machines.
AI-generated music is already diverting the flow of royalties and fan engagement away from human creators. Our industry is working to combat AI-generated tracks that are followed by bots designed to create “fake listens” on streaming platforms, a method of automated streaming manipulation that siphons royalties away from legitimate creators. Our members are working with digital platforms and others to prevent this issue, but it is another example of resources being expended by creators to combat those who game systems at the expense of human creators.
There are currently more than adequate incentives to encourage the continued development of generative AI, as demonstrated by the extraordinary amount of investment in AI technologies and the companies behind them. Such companies already benefit from patent law, copyright in computer code, and other incentives that spur such extraordinary investments. Fundamental changes in copyright law or new sui generis regimes protecting AI generated works are demonstrably unnecessary.
Infringement and Liability regarding AI
5.1 Concerns about existing legal tests for demonstrating that an AI-generated work infringes copyright
Our copyright framework is generally sufficient to address the copyright related issues that have arisen to date in connection with generative AI. That said, as the technology continues to evolve, the Copyright Act may need to be amended to ensure that copyright owners retain exclusive rights over the ingestion of their copyrighted works or subject matter and authorise others to do the same.
At present, the most significant impediment for rightsholders to determine whether AI-generated work or systems infringe their copyright is the lack of information on when and how their intellectual property is copied and used to train the AI system and models. AI developers must be required to keep detailed records regarding training inputs and other information (as more fully discussed in our submission on TDM issues). Any such measures should also include appropriate disclosure requirements and penalties for non-compliance.
These requirements are not only necessary for transparency as an end goal itself, but are also vital to promoting accountability, facilitating licensing, and enabling effective enforcement which helps to maintain the incentives for creating, distributing, and marketing the new sound recordings and other creative works that Canadians enjoy.
Moreover, record keeping would help the government better understand the potential biases of systems or how other essential rights may have been infringed, such as privacy rights of individuals and whether these systems are safe.
5.2 Need for Labelling of Generative Outputs
Works generated solely with AI should be labelled and identified as such, as should any works that are substantially modified with AI to mimic a sound recording artist’s name, image, voice, or likeness without authorization. Labelling is particularly important where the risk of deception is high or a human is being impersonated or mimicked. It is in the public interest for consumers to understand whether what they see and hear is real and created with human involvement or made purely by machine. This labelling should be tamper-resistant and include digital watermarking and metadata identification.
5.3 Government must be mindful of the risk of AI laundering
Canada, alongside its other major trading partners, has the opportunity to set policies that avoid an international “race to the bottom.” AI developers should not be able to import AI models into Canada that were trained on infringing ingestion copies in a jurisdiction where that training was not prohibited. AI companies should also not be able to claim that their training met a fair dealing exception, but then sell access to those systems to companies who will use them for presumptively unfair commercial purposes.
The EU AI Act has taken a leadership position on this issue, stating that the Act applies to any model placed on the market or put into service in the EU, regardless of whether it was developed or trained inside or outside the EU.
International consistency will help protect the human element of creativity and create a marketplace where ethical AI can flourish. Canada has an opportunity to be a leader on this front.
5.4 Clarity on where liability lies when AI-generated works infringe existing copyright protected works
Who is liable for copyright infringement is, as always, determined on a case-by-case basis. However, anyone who reproduces a substantial part of a copyrighted sound recording, communicates it to the public by telecommunication, or publicly performs it is an infringer. Copyright enforcement regularly contemplates multiple infringers depending on the facts, and in the case of AI, it could include the model developer, the system developer, and the users of the system which generates infringing outputs.
5.5 Existing legal protections for rights of publicity must be modernised to protect Canadians from harmful deepfakes and vocal clones
Generative AI has made such significant advancements that convincing “deepfakes” and vocal clones of artists, athletes, politicians, celebrities, and everyday individuals can be made with little effort, with mimicry so close that it can be hard to tell whether they are real.
For recording artists, unauthorised deepfakes steal and manipulate their voice and image without their knowledge or consent. They create unfair competition and threaten artists’ reputations. An artist’s voice and image are their livelihood. When an artist’s voice or visual likeness is cloned, they are robbed of their identity, reputation, and relationship with their fans, who cannot know whether the images they see or the music they hear is authentic or simply AI-generated. Athletes, celebrities, and public figures face similar threats.
But while the music industry’s concern centres on protecting artists’ reputations and careers from such theft and manipulation, similar digital assaults on government representatives and politicians raise potentially grave national harms. Beyond the theft and manipulation of their voice and image lurk serious risks to national security, impacts on our markets, and threats to our democratic institutions. And everyday Canadians are also susceptible to having their voice and image cloned by bad actors looking to exploit them online.
Canada’s current privacy and publicity laws which offer some shields to these harms were designed for a time when we were worried about a celebrity’s image being used without their permission to promote a product in a magazine. But today, misappropriating one’s voice or visual likeness through AI-generated deepfakes is much more invasive and convincing, and can be carried out on a massive scale, around the world, in a matter of minutes. Virtually anyone can do it at any time – with publicly available code and common hardware. Once one discovers the deepfake of their face or voice, they face a long and slow process in court with an unpredictable and difficult outcome. By then, the damage has been done.
The government has the opportunity to harmonise and solidify existing piecemeal provincial and common law protections by legislating the following:
(i) A federal publicity right so that Canadians can protect how their voice and visual likeness is used. The right can be licensed or assigned where and how they choose. And there can be reasonable limits on this prohibition (e.g., where the use is essential to news, public affairs, parody, satire or other expressly limited purposes).
(ii) A cause of action restricting not only unauthorised deepfakes themselves but also the creation and dissemination of AI systems and models whose main purpose or function is to create misleading deepfakes or vocal clones.
Opponents to a modernised publicity right may argue that such guardrails will stifle innovation or chill free speech. But bad actors putting words or messages in the voice of a human artist or politician against their will or without their knowledge or consent has nothing to do with innovation. And it’s not an exercise in free expression to steal someone else’s face or voice and make them say what you want. Digital disclaimers or labelling will not be enough to reverse the large majority of harms caused by these misappropriations.
Interestingly, in response to the US Copyright Office’s Notice of Inquiry (“NOI”) of August 30, 2023, there was even a consensus among a number of AI tech companies who filed responses to the Inquiry that a new US federal right of publicity ensuring that individuals can control their voice and likeness is necessary to address the unique threats of generative AI.
To allow the positive uses of this technology to continue, deepfakes or vocal clones created with an individual’s consent would continue to be permissible. For example, individuals who wish to preserve their image and voice in the case of medical situations, or who wish to license their image, name or likeness for specific purposes, will still be able to do so where and how they choose.
We encourage the government to closely examine the motives of organizations who seek permissionless “innovation” at the expense of others.
Conclusion
Please note that our submission has also been provided to department officials in pdf form as it contains extensive footnotes citing secondary materials. That pdf delivers the same comments as provided in this online form.
We appreciate the opportunity to contribute to the government’s consultations on generative AI and copyright. We welcome any questions or requests for further information.
Comments and Suggestions
[Please note that we have provided to department officials a pdf of our submissions which includes full footnotes with secondary sources supporting our comments]
We applaud the government’s work to help ensure Canada is a leader in responsible AI and we welcome the opportunity to engage in these Consultations.
Our major label members constantly invest in and adopt new technologies and innovations, and work with artists to develop and use new tools to advance the creative process. This includes working with AI technologies, from the use of machine learning to better understand user behaviour and preferences, to systems that assist in the creative process. Tools driven by AI play an increasingly important role in the artistic process in the music sector -- including in audio mixing, automated mastering, computer-aided orchestration, and elsewhere. AI is a tool that can help unlock an artist’s creativity or find efficiencies in processes, but it is not a substitute for the human element of creativity.
But because music is widely and easily accessible in ways that other works are not, we have also been early targets of unauthorized reproduction of and training on the copyright works and recordings of music labels, publishers, and songwriters, and on the performances of recording artists and their voices. While licensed music services have been employing and continue to develop different technologies to protect their platforms from unauthorized access, web scrapers still manage to bypass these blocks in order to reproduce and train their systems on the music of creators. Music creators have been among the first in the creative industries to feel the harms of generative AI that does not respect their intellectual property and right to protect their agency, voice, and image.
The fake Drake and the Weeknd track that was released last spring made one thing abundantly clear: AI models and systems have already ingested massive amounts of proprietary datasets without authorization from the source of the data or rightsholders. While developers will argue that this is essential to innovation, or that it’s fair because it’s for “research” or to “educate” AI systems -- there is nothing “fair” for the artists whose voices were cloned or their images taken without their permission.
But with the risks of AI, there is also great promise, and it is not too late to set the course for ethical and innovative AI markets. Major labels are working with companies who are willing to respect the rights of artists and creators -- and they’re finding exciting ways to bring world-changing AI to consumers and new tools to help artists achieve career and creative goals. Markets where creative works are licensed and where creators receive commensurate compensation are developing. The government has the opportunity to set policies that encourage the ethical development of this exciting new marketplace, which if managed properly, will elevate our creative industries while also developing AI tools capable of solving some of the world’s most pressing challenges. Innovation in AI and a thriving creative industry can be mutually enriching, rather than mutually exclusive.
Progress in AI innovation and adequate copyright protection that fuel the creative industries are not mutually exclusive. On the contrary, AI processes that depend upon the “input” of protected works or subject matter derive their purpose and value from those works or subject matter. Accordingly, a reduction in the protection of works or sound recordings (for example by broadening or introducing new exceptions to copyright), would in turn reduce incentives for their creation, ultimately harming innovation and investment in this area. Supporting thriving creative sectors through adequate legal frameworks should be a central pillar of any policy aimed at stimulating developments in AI. In short, our members must retain control over whether and how their recordings are used in AI processes.
As the Government develops policies around generative AI and copyright, we urge it to ensure that artists and the businesses that invest in them are incentivized to continue producing creative works that are essential to the human experience and the quality of our lives.
Thank you again for the opportunity to participate in these consultations.
Music Publishers Canada
Technical Evidence
1. INTRODUCTION
Music Publishers Canada (“MPC”) is a membership-based organization that ensures the views of music publishers working in Canada are heard. It is our mission to create business opportunities for our members and to promote their interests and those of their songwriting partners through advocacy, communication, and education. Music publishers invest in thousands of Canadian songs and songwriters that are heard daily on the radio, on streaming services, in video games, and in film, television, and other screen-based productions around the world.
MPC recognizes that artificial intelligence has the potential to be enormously beneficial when it is implemented in a responsible and ethical manner, and MPC embraces that potential. In the music space, AI has the potential to support the valuable work of human creators, which in turn enriches Canadian culture and society. Our members are already exploring the benefits of this new technology.
However, the astonishing rate of both acquisition (or sometimes appropriation) of copyright-protected datasets and content on the input side, together with the development of generative AI models on the output side, pose serious risks for Canada’s creators and the companies that invest in them.
Copyright is the key protection that allows MPC’s members to control and be paid for the use of their music. Copyright ensures that our members can share in the value if a third party wishes to use their music and ensure that the value is not appropriated solely by the user.
When an AI company uses music that has been scraped from the Internet without authorization, whether for training or other purposes, it prevents rights holders from controlling and realizing value for the use of their works. It also contributes to the destruction of a developing market for the licensing of copyright-protected content to AI developers before that market can flourish. Further, the development and commercialization of unlicensed AI model inputs and generative AI products can create—and, in many ways, already are creating—serious market distortions, raising concerns about fair competition.
Human creation and expression, and their contributions to Canadian culture, must not be sacrificed at the altar of rapid technological progress. To strike the appropriate balance, Canada must approach generative AI in ways that respect creators and copyright and incentivize human expression. AI companies, like other commercial users, require permission from copyright owners to use copyright-protected content through negotiated licences.
The development of public policy surrounding AI is in its infancy. That gives the Government an important opportunity to lead the world in maintaining strong respect for copyright and the rights of creators.
2. THE USE OF AI IN THE MUSIC INDUSTRY
MPC’s members embrace technological change and invest in innovation. We know that there are already uses of AI in the music space. For example, AI is used to analyze and predict the audiences for an artist’s music and to identify and target emerging artists. AI technology has also long played a role in the recording studio, including automation tools that can augment human-mixed recordings or even assist in the process of creating brand-new audio mixes.
But a new market is also developing for the licensing of music and other works to be used as training materials for generative AI models. Reported examples of the licensed use of copyright-protected content by AI companies include the following:
a) Meta’s MusicGen tool was trained on 20,000 hours of licensed music from Shutterstock and Pond5 [https://techcrunch.com/2023/06/12/meta-open-sources-an-ai-powered-music-generator];
b) Stability AI’s new generative audio model, Stable Audio, was trained on a dataset provided under a licensing deal with a provider of stock music [https://stability.ai/stable-audio];
c) Universal Music Group (UMG) announced a collaboration with YouTube to experiment with generative AI tools predicated on human creativity and that account for the important interests of creators and copyright holders [https://www.latimes.com/business/technology/story/2023-08-21/youtube-universal-music-ai-artificial-intelligence];
d) UMG has also announced collaborations with generative AI developers to explore how their technology can promote and enhance the creative process, such as with Endel, an AI tool that allows artists to generate content from their own sound recordings, and with Bandlab, the world’s largest social music creation platform;
e) The generative AI imagery used during U2’s new live show at the Sphere reportedly uses legally-ingested artwork by Es Devlin to spectacular effect.
As we discuss in more detail in the text and data mining section of this submission, large-scale licensing of copyright-protected works can be practicable and effective.
3. AI DEVELOPERS USE EXISTING WORKS AS TRAINING MATERIAL WITHOUT AUTHORIZATION
AI developers obtain training material from multiple sources, including by scraping vast troves of content from the Internet and via pre-existing data sets. In many cases, the content is obtained without authorization from the rights holders and the AI developer does not disclose the source of the content.
This poses unique challenges in relation to musical works, which can be taken from the Internet in a variety of formats, including digital audio and audiovisual files containing musical works, song lyrics in text format, musical tablature and sheet music, MIDI files, and more.
Web-scraping occurs on a vast and indiscriminate scale and inevitably yields vast amounts of copyright-protected content. Scraping typically occurs without authorization of rights holders. In fact, it can be performed on webpages that, themselves, host and make available pirated content.
For example, it is reported that the dataset used to train Google’s T5, Meta’s LLaMA, and other generative AI models included content scraped from the web, including from Scribd, a subscription-based digital library [https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/]. OpenAI, a leader in generative AI, has also stated that its experience training AI models has involved “the use of large, publicly available datasets that include copyrighted works” [https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf].
AI developers may also acquire training material from datasets collected and made available by third parties, some of which contain copyright-protected content obtained through large-scale web-scraping. A notable example is the Common Crawl dataset [https://commoncrawl.org/], which is a publicly available collection of large-scale web data used as the primary training corpus for most major LLMs. OpenAI has reported that 60% of the training data for its GPT-3 model was drawn from Common Crawl [https://arxiv.org/pdf/2005.14165.pdf].
From a copyright perspective, copying works to create a dataset, making the dataset available to be viewed and downloaded on the Internet, and exploiting the dataset to train, re-train, and fine-tune an AI model are each a separate and distinct exercise of a protected right.
4. LICENSING AND TRANSPARENCY CAN MITIGATE LIABILITY
The Consultation asks, “What measures are taken to mitigate liability risks regarding AI-generated content infringing existing copyright-protected works?”
The best and most appropriate way to mitigate liability risk is for AI developers and dataset aggregators to obtain prior permission to exploit rights holders’ works, in accordance with Canadian copyright law and policy. The Government can incentivize that in two important ways.
First, a market is already developing for the licensing of music to AI companies. The growth and maturation of that market ought to be encouraged. Licensing large catalogues of music, including to new and disruptive technology companies, is what music publishers, copyright collective societies, and other rights holders do. MPC’s members, and the collective societies that represent them, have extensive experience negotiating bespoke licence agreements for the use of their repertoires by technology companies. MPC implores the Government to permit the nascent market to develop and flourish, and not to eradicate it by introducing new or modified copyright exceptions for text and data mining or other AI activities.
Second, AI developers and data aggregators involved in any stage of training or testing AI models should also be required to disclose the dataset used to train or test the models and maintain complete and detailed records of that data. This requirement would ensure transparency and promote a functional licensing market, disincentivize unauthorized use of copyright-protected works, and restore the appropriate copyright balance by enabling rights holders to be compensated for the use of their works and to pursue enforcement options against infringers.
Certain market developments underscore the importance of ensuring that AI companies comply with existing copyright laws and norms. OpenAI, which as noted above has acknowledged training its models using vast amounts of protected content, is receiving massive investment from Microsoft, reportedly in the amount of $10 billion. In turn, Microsoft, Google, and other deployers of AI products have announced a commitment to indemnify users of certain of their AI products if the users are sued for copyright infringement in connection with the use of those products [https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/]. These developments, combined with the prospect of AI companies seeking to perform TDM activities in territories with the weakest copyright protections, suggest that, left unchecked, AI companies will continue to seek a competitive advantage built on a rampant disregard for the rights of creators and rights holders.
Text and Data Mining
1. TDM SHOULD BE LICENSED, NOT EXEMPT
The Copyright Act is intended to achieve a balance between promoting the public interest in the encouragement and dissemination of original works and obtaining a just reward for the creator or, more accurately, to prevent someone other than the creator from appropriating whatever benefits may be generated [Théberge v Galerie d’Art du Petit Champlain inc, 2002 SCC 34 at para 30].
This balance requires that AI developers obtain permission and pay for the use of copyright-protected works as AI training materials. TDM activities should not be given special status by introducing new copyright exceptions or modifying existing ones. In fact, a TDM exception would likely put Canada in breach of its international treaty obligations.
Copyright works add value to the AI training process. There is no legal or factual reason to allow AI developers to appropriate that value exclusively to themselves, especially by scraping online content on a vast and indiscriminate scale. To derive fair value for the use of their repertoires, music publishers routinely grant licences to technology companies. AI developers should be no different. The nascent market for licensing music to AI developers should be encouraged, including by requiring AI companies to disclose, and maintain records of, all their training data.
Finally, MPC urges the Government to reject any suggestion that TDM should engage a mere right of remuneration or be subject to an opt-out model.
2. NO NEW EXCEPTION FOR TDM
To maintain the proper copyright balance, the Government must reject calls for a categorical copyright exception for TDM. Rights holders must be able to control, and realize value for, the use of their works as AI training material, in accordance with Canadian copyright law and policy. Indeed, the ability to grant licences is central to the livelihood of creators and rights holders, and it is “a hallmark of copyright” [Euro-Excellence Inc v Kraft Canada Inc, 2007 SCC 37 at para 117].
Copyright content is a particularly valuable form of training material for an AI model. The quality of an AI model’s output is proportional to the quality and quantity of its training materials. Copyright works are the products of human skill and judgment and the investment of time and resources. As such, many copyright works feature qualities that make them particularly valuable for use as AI training materials: nuance, richness, contemporary relevance, reduced “noise”, integrity, reliability, and formatting consistency. All of this helps AI models turn out high-quality content that will attract and retain users. OpenAI has acknowledged that, if protected works are not used for training purposes, it would “lead to significant reductions in model quality.” [https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf, at footnote 33.]
Rights holders must be entitled to control the use of their works as AI training material and share in the value created when they are used. That is best achieved through copyright protection and licensing.
Calls for a new copyright exception for TDM must be rejected. A new exception would eliminate the nascent licensing market before it can flourish and deprive rights holders of value for the use of their works as AI training material. An exception would appropriate the entirety of that value for the benefit of AI developers and data aggregators.
A categorical TDM exception would also violate the three-step test of the Berne Convention and thereby breach Canada’s international treaty obligations. This test limits permissible exceptions to “certain special cases” that “do not conflict with a normal exploitation of the work” and “do not unreasonably prejudice the legitimate interests of the author” [Berne Convention for the Protection of Literary and Artistic Works (1979) at art 9(2)]. A categorical exception for TDM would not be limited to “special cases”. It would also eradicate the developing market for licensing works as AI training material, thus interfering with the normal exploitation of works and prejudicing the legitimate interests of rights holders.
Affording special status to TDM would also violate the principle of technological neutrality, which requires that copyright law operate consistently, and not favour or discriminate against any particular form of technology [Canadian Broadcasting Corp v SODRAC 2003 Inc, 2015 SCC 57, para 66].
The Consultation Paper notes that the existing exceptions for fair dealing and temporary reproductions for technological processes could potentially apply to TDM. While it is doubtful that these exceptions would apply to TDM, due in part to the application of the Berne Convention three-step test, the potential application of these exceptions is highly fact-dependent. It would not be prudent to attempt to address the potential application of these exceptions to TDM on a general or presumptive basis. That is a matter best addressed by the courts.
3. LICENSING MUSIC IS NOT AN INSURMOUNTABLE CHALLENGE
The music business is a licensing business. Any argument that licensing is impractical due to the quantity of data involved must be rejected.
Rights holders are experienced in licensing and administering large catalogues of works, including to technology companies. Canadian copyright collective societies like CMRRA and SOCAN process and license billions of lines of music data, or individual performances, each year. Any challenges that might arise are foreseeable and not insurmountable.
It also cannot be assumed that all AI models are trained on the same size of datasets. Introducing a TDM exception based on perceived challenges for AI models that train on massive datasets would destroy licensing markets for other AI training methods, such as the use of smaller and more carefully curated datasets.
In any event, any licensing challenges that may exist cannot justify the eradication of an exclusive right or the deprivation of a rights holder’s ability to realize value for the use of its works.
4. RECORD-KEEPING AND DISCLOSURE IS CRITICAL
Transparency is critical to protect creators and rights holders and to strike an appropriate balance between fostering innovation in new technologies, on one hand, and protecting the legitimate interests and exclusive rights of rights holders, on the other. AI developers and deployers should be required to keep, and make readily available to rights holders, detailed and accurate records of their training data, the sources of that data, and the existence of any licences authorizing its use. Without those obligations, it will be extremely difficult, if not impossible, for rights holders to detect infringement and pursue enforcement options.
Further discussion on this topic can be found in our response to the section of the Consultation addressing infringement and liability.
5. THERE SHOULD BE NO COMPULSORY LICENCE OR OPT OUT SYSTEM FOR TDM
The Consultation Paper asks what “level of remuneration would be appropriate for the use of a given work in TDM activities.” The appropriate level of remuneration should be determined in the developing market for the licensing of copyright-protected works for AI training and other uses, not by the Government.
To be clear, the Government should not entertain any suggestion that would eliminate a rights holder’s exclusive right of reproduction in favour of either a compulsory licensing system or an “opt-out” system. Since a functional market for the licensing of musical works to technology platforms already exists, and is adapting rapidly to the needs of AI companies, there is no reason for the Government to impose either approach.
Compulsory licensing would be an extreme and prejudicial response to a non-existent problem. Among other things, it would (i) deprive rights holders of their right to contract freely in the market, preventing them from assessing and capturing fair value for the use of their works; (ii) prevent rights holders from choosing how their works are used and by whom, forcing them instead to allow the copying of their works for uses they cannot control or anticipate; and (iii) impose significant administrative burdens, including the creation of a needless and complicated infrastructure to administer and enforce the regime. Quashing an exclusive right of reproduction in favour of a right of remuneration would also raise serious concerns under Canada’s international treaty obligations.
An opt-out system would also be antithetical to Canadian copyright law and policy and would drastically shift the copyright balance away from rights holders. In Canada, copyright is an opt-in system: prospective users of protected works must obtain advance permission from the copyright owner, who has an exclusive right to authorize—or refuse to authorize—the use. To depart from these fundamental principles would lead to an unwarranted erosion of copyright protection in Canada. It may also be contrary to the Berne Convention, which prohibits conditioning copyright protection on any formality requirement [Berne Convention, art 5(2)].
An opt-out system would place a disproportionate burden on creators and rights holders. It would require them to investigate and implement affirmative steps to prevent the infringement of their rights and to monitor compliance on a user-by-user basis. Rights holders who lack the legal or technological sophistication or resources to do so would be treated inequitably as unwitting licensors. In addition, because many rights holders do not control the websites on which their works appear, they would be unable to directly access the website’s code to exercise an opt-out right. The same is true for online piracy sites that rights holders might not be aware of.
Finally, an opt-out approach would impose an all-or-nothing assumption on rights holders who may instead be willing to license their works for specific purposes under certain conditions, including fair remuneration.
Authorship and Ownership of Works Generated by AI
Consistent with our submissions to the Government’s “Consultation on a Modern Copyright Framework for Artificial Intelligence and the Internet of Things”, MPC submits that Canada’s existing copyright law framework is sufficiently robust and flexible to address issues raised by AI.
The Copyright Act is intended to incentivize human creativity and expression. For example, several provisions of the Copyright Act indicate that an author must be a human: sections 6, 7(1), and 9 link the term of copyright to “the life of the author”, while section 14(1) imposes limitations on an author who is the first owner of the copyright based on “the death of the author”. These provisions suggest that some degree of human involvement is necessary for a work to attract copyright protection.
Courts have confirmed on several occasions that authorship requires human involvement [see, for example, PS Knight Co Ltd v Canadian Standards Association, 2018 FCA 222 at para 147 <https://canlii.ca/t/hwj3l#par147>; Setana Sport Limited v 2049630 Ontario Inc (Verde Minho Tapas & Lounge), 2007 FC 899 at para 4 <https://canlii.ca/t/1sxmt#par4>]. They have also made clear that originality—the sine qua non for copyright protection—requires an author to exercise skill and judgment that is more than a purely mechanical exercise [see CCH Canadian Ltd v Law Society of Upper Canada, 2004 SCC 13 at paras 16, 25 <https://canlii.ca/t/1glp0#par16>]. That, too, strongly suggests that a human author must be involved in the creation of a protected work.
The determination of copyright authorship and ownership rights related to AI-generated and AI-assisted works is highly fact-dependent. It should be made on a case-by-case basis. In the United States, the courts are now considering several cases that will help clarify the boundaries of copyright law as it applies to generative AI content.
Infringement and Liability regarding AI
1. THERE ARE SIGNIFICANT BARRIERS TO DETECTING INFRINGEMENT AND ENFORCING RIGHTS
The Consultation Paper asks, “What are the barriers to determining whether an AI system accessed or copied a specific copyright-protected content when generating an infringing output?”
As noted in the Consultation Paper, to establish infringement by reproduction, a rights holder must establish that the defendant had access to the original copyrighted work, that the original work was the source of the copy, and that all or a substantial portion of the work was reproduced. A court may infer copying if the defendant had access to a plaintiff’s work and there is substantial similarity between the works [Pyrrha Design Inc v Plum and Posey Inc, 2022 FCA 7, paras 38 & 48 <https://canlii.ca/t/jlqcs#par38>].
Without appropriate transparency, large-scale infringement might go undetected by rights holders. Rights holders have access only to the output of an AI system, from which it is nearly impossible to identify which copyrighted works were included in the dataset used to train the system. Even if a rights holder suspects infringement, it would be equally difficult, if not impossible, for that rights holder to establish that its work was used to train the AI model. This would create significant barriers to the rights holder’s ability to obtain a remedy—and a right without a remedy is no right at all.
Thus, full transparency, including robust record-keeping and disclosure obligations, is necessary for rights holders to protect their intellectual property. Disclosure and record-keeping requirements would enable rights holders to know whether their works have been used. Standard copyright liability principles can then be applied to determine whether there has been an infringement, identify the infringer, and assess the resulting damages.
In addition, record-keeping and disclosure of the materials used in AI training and testing activities would accomplish three key objectives:
(i) promoting the development of a functional licensing market by incentivizing AI developers to seek authorization before using works and by disincentivizing unauthorized uses;
(ii) reinforcing the rights of creators and rights holders to control the use of their copyright-protected works and to obtain fair remuneration for such use; and
(iii) ensuring that the remedies in the Copyright Act are not nullified by the practical impossibility of detecting infringement and pursuing enforcement options.
Transparency would also serve consumer interests. While consumers, unlike rights holders, may not need to know exactly what data was fed into the AI system they are using, they should not have to guess whether the system was trained on legitimate, authorized copyrighted works rather than infringements or fakes.
For these reasons, developers and deployers of generative AI systems should be required to keep and make readily available detailed and accurate records of the data they have used for training, the source of that data, and the existence of any licences authorizing the use of that data. The records should be understandable to a layperson and detailed enough to identify (i) each specific work used in training, retraining, refining, or testing the AI model, or any similar use; (ii) any metadata associated with each work (e.g., title, author, owners); (iii) the immediate source of each work; (iv) the purposes for which each work has been used; and (v) whether a licence has been obtained for each work.
Record-keeping and disclosure obligations should apply to every person involved at each stage of the training, retraining, refining, testing, and other development of the AI model, including dataset aggregators.
2. WITH RECORD-KEEPING AND DISCLOSURE OBLIGATIONS IN PLACE, THE CURRENT COPYRIGHT ACT WOULD BE SUFFICIENT TO ADDRESS AI-SPECIFIC ISSUES
The Consultation Paper asks, “Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?”
Requiring detailed logs of data used by an AI model is a necessary best practice. As an example of transparent disclosure obligations, the European Union’s draft Artificial Intelligence Act would require automatic logging of events while high-risk AI systems are operating. The logging capabilities must address, at a minimum, “(i) the recording of the period of each use of the system; (ii) the reference database against which input data has been checked; (iii) the input data for which the search has led to a match; and (iv) the identification of the natural persons involved in the verification of the results” [European Commission, Proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union Legislative Acts (“EU Proposal”), at art. 12 <https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206>].
At the same time, the EU is also contemplating imposing similar record-keeping and transparency obligations on providers of so-called “foundation models”, which are defined as AI models that are trained on broad data at scale, are designed for generality of output, and can be adapted to a wide range of distinctive tasks. Also under discussion are provisions that would require providers of so-called “general-purpose AI systems” to maintain detailed technical documentation for at least 10 years. That would include “data requirements in terms of datasheets describing the training methodologies and techniques and the training data sets used, including information about the provenance of those data sets, their scope and main characteristics; how the data was obtained and selected; labelling procedures (e.g. for supervised learning), data cleaning methodologies (e.g. outliers detection).” [EU Proposal, at art. 50 and Annex IV <https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206>].
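The four minimum logging fields described in the draft Act can be illustrated with a simple log-entry structure. Again, this is only a sketch under stated assumptions: the draft Act prescribes no data format, and every name here is hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UsageLogEntry:
    """Hypothetical log entry covering the four minimum fields in the draft EU AI Act, art. 12."""
    start: datetime                 # (i) period of each use of the system: start
    end: datetime                   # (i) period of each use of the system: end
    reference_database: str         # (ii) reference database against which input data was checked
    matched_inputs: list[str]       # (iii) input data for which the search led to a match
    verifiers: list[str]            # (iv) natural persons involved in verifying the results

    def duration_seconds(self) -> float:
        """Length of the logged use, derived from the recorded period."""
        return (self.end - self.start).total_seconds()
```

Logs of this shape, retained alongside the training-data records discussed above, would give rights holders and regulators a concrete audit trail.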
Given that record-keeping by AI companies is important for purposes that extend beyond copyright protection, record-keeping obligations could be enacted outside the Copyright Act (for example, in the Artificial Intelligence and Data Act).
Importantly, AI developers should not be able to argue that they are shielded from liability for infringement because they do not retain copies of their training material once training is complete, or because they did not compile the training data themselves. Authorization from copyright owners is required before assembling and curating datasets that include copyright-protected works, and certainly before training AI models on those datasets. Whether, or for how long, the copies are retained is irrelevant.
With proper record-keeping and disclosure obligations in place, the current Copyright Act will be sufficient to address issues specific to AI. Liability could potentially arise for primary or secondary copyright infringement, moral rights infringement, removal of digital rights management information, and circumvention of technological protection measures.
Comments and Suggestions
To strike the appropriate copyright balance, it is imperative that Canada approach generative AI in a manner that respects creators and copyright and incentivizes human expression. AI companies, like all technology companies, require permission from copyright owners before using copyright-protected content, whether to curate and assemble datasets or to train AI models on those datasets once assembled. That permission can and should be obtained through negotiated licences, not rendered moot by copyright exceptions, remuneration rights, or an opt-out system. Human creation and expression, and their contributions to Canadian culture, must not be sacrificed at the altar of rapid technological progress.
Canada should lead the international community in respecting creators. The development of public policy surrounding AI is in its infancy. That presents the Government with an important opportunity to lead the world in maintaining strong respect for copyright and the rights of creators.
Canada should not follow any international approaches to AI and copyright that would exempt or limit the scope of copyright protection in relation to TDM activities. MPC acknowledges and endorses commitments made by the G7, which broadly emphasize “multi-stakeholder” participation in the development of AI standards that prioritizes fairness, transparency, and adherence to existing law; commitment to “human-centric and trustworthy AI”; and continued discussion and analysis of how best to safeguard copyright and other IP rights [European Commission, “Hiroshima AI Guiding Principles and Codes of Conduct” <https://digital-strategy.ec.europa.eu/en/library/hiroshima-process-international-code-conduct-advanced-ai-systems>; Government of Canada, “G7 Hiroshima Leaders’ Communiqué” <https://www.international.gc.ca/world-monde/international_relations-relations_internationales/g7/documents/2023-05-20-hiroshima-leaders-communique-dirigeants.aspx?lang=eng>].