Consultation on Copyright in the Age of Generative Artificial Intelligence: Submissions U-W

The information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages and privacy requirements.

U

UNB Libraries and the UNB Legal Innovation Laboratory (UNB Law)

Technical Evidence

The University of New Brunswick Libraries (UNB Libraries) respond to the learning and research needs of the UNB community by developing collections and providing access services, including online databases for discovery and document delivery. We currently maintain access to collections of over 2.5 million items in multiple formats and are committed to providing the broadest possible access to the resources that this community needs. UNB is a comprehensive university and the premier research institution in New Brunswick, responsible for 75% of publicly funded research. As a public institution, UNB Libraries’ mandate includes serving as a resource to the wider community in New Brunswick, the Atlantic Canadian region, and beyond, through such services as an institutional repository for published works, the digitisation of historic New Brunswick newspapers and other documents, and the collection of New Brunswickana.

Recognizing the importance of access to knowledge for all people across Canada, UNB Libraries are positioned as a pivotal resource for the intellectual, scholarly, and creative communities of our province. We also invest specifically in collecting and preserving New Brunswick’s creative and scholarly output for the benefit of all Canadians. We take pride in being a leader in the support of Canadian publishing, specifically from New Brunswick and the Atlantic region. Our Centre for Digital Scholarship hosts 22 journals, all of which contribute to the international publishing market. Furthermore, University Archives and Special Collections, a unit of UNB Libraries, purchases the works and papers of provincial and regional authors with a commitment to acquiring and preserving copies of New Brunswick’s literary heritage, including works produced in English, French, and Indigenous languages of our region.

UNB Libraries have longstanding licensing relationships with aggregators and vendors of vast databases of copyrighted materials. We regularly negotiate licensing agreements on behalf of researchers and learners to provide legitimate access for teaching, innovation, and discovery. While these digital collections have expanded the quantity of our holdings, the move to electronic access has evolved in ways that undermine many fundamental user rights encoded in the Copyright Act. Although UNB Libraries do not advocate for legislation that would eliminate incentives for a viable market for electronic books and other digital information resources, we are concerned about an ever-increasing number of constraints in licensing agreements proposed by aggregators and vendors. These constraints extend to cover non-consumptive uses of works by our researchers, creating challenges and uncertainty about our ability to include in training datasets copyright-protected content that we lawfully own or have access to.

Over the past two years, UNB Libraries have investigated and approached multiple vendors that market datasets for text and data mining (TDM) and other computational uses, exploring the available licensing options. Such products are typified by inconsistency, which is alarming given the lack of clarity around copyright. We acknowledge that computational methods of research and discovery are in their infancy in many areas of study. Meanwhile, our experience points to fundamental contradictions in relying on market models and licensing regimes for access to textual data across multiple platforms, especially platforms that are not interoperable; one example is the requirement that all TDM analysis be conducted on a publisher's platform for an extra fee. Moreover, this lack of interoperability creates barriers to accessing diverse datasets, ultimately producing non-representative training data for Generative AI systems.

Libraries must keep up with technological advances to manage collections, across various formats, that are outstripping human resource capacities. UNB Libraries have been utilizing AI to enable access to the more than 2.5 million items catalogued in our collection. UNB Libraries’ computational experts are exploring the potential of Generative AI tools in the management and usage of these collections, specifically in the development of applications that will support user inquiries particular to our collections. Access to quality pre-trained large language model (LLM) tools is essential to the effectiveness of ‘fine-tuning’ the task-specific layers for our application. Library collections are curated sources of invaluable insights and can be the foundation for further development of Generative AI. It is equally important for UNB Libraries to enable legitimate access to resources unique to the Atlantic Canada region and to provide opportunities to inform and enrich analytical research methodologies with their data.

UNB is home to leading experts in Data Science and AI, who conduct research across our campuses, including at the Research Institute in Data Science and Artificial Intelligence, the Canadian Institute for Cybersecurity, the New Brunswick Institute for Research, Data and Training, the Institute for Biomedical Engineering and the SPECTRAL laboratory for spatial computing. Human researchers, who include faculty members, graduate and undergraduate students, and other members of the UNB community, are involved in the development of AI systems. Members of our university community are leading research projects in all the above institutes, involving multidisciplinary teams working in areas such as natural language processing, computational linguistics, health, cybersecurity, mapping and positioning, manufacturing, finance, and transportation. Some of these experts lead partnerships with industry and other private actors in the region and beyond. In their respective projects, these teams use training datasets to develop AI systems. For instance, experts in the Faculty of Computer Science and CIC are working to build language technologies to improve the identification of spam and phishing emails across multiple languages.

UNB Libraries are committed to supporting our community and their partners in curating datasets for the development of AI systems that adhere to stewardship principles, developing findable, accessible, interoperable, and reusable data and metadata in both official languages of the province and in Indigenous languages. For Generative AI applications specifically, the quantity and quality of data are equally important, making stewardship frameworks and other policies designed to ensure data quality all the more critical. (See Wong, Janis. "Data Practices and Data Stewardship." Interactions 30.3 (2023): 60-63, https://doi.org/10.1145/3589133; Kahana, Eran. "A Data Stewardship Framework for Generative AI" (March 9, 2023), https://law.stanford.edu/2023/03/09/a-data-stewardship-framework-for-generative-ai/)

Furthermore, we are committed to supporting the UNB community of researchers, instructors, and students in their uses of Generative AI for both research and learning purposes. We support our users in navigating the complexities of copyright and licensed access to resources. This includes providing guidance and support in accessing copyrighted works through the UNB Libraries catalogues and services, as well as in making and curating datasets for research purposes. To mitigate liability risks regarding AI-generated content, UNB Libraries organize workshops and provide information online. Our staff members are also trained to assist with questions around data protection and the ethical use of data. Furthermore, UNB’s Research Ethics Boards (REBs) ensure compliance with Tri-Council and UNB policies for all research projects at UNB that involve collecting data from humans.

Overall, as stewards of information and a resource for the responsible use of protected works through our institution, UNB library professionals require more clarity to confidently provide guidance in evolving methods of research and inquiry. Without this clarity and absent clear exceptions and limitations that have traditionally allowed libraries to perform their public service role, our work will be jeopardized.

Text and Data Mining

UNB Libraries have long advocated for clarity in Canada’s copyright framework, especially concerning text and data mining (TDM) activities. Recent developments in Generative AI technology further highlight the need for clarity. TDM, as a form of analysis involving the automated identification of patterns within extensive datasets, is an example of a non-consumptive use of material: it extracts patterns from expressed facts and ideas. Generative AI exemplifies an application of TDM, enabling users to generate machine-produced content. The social and scientific impact of tools like Generative AI depends, among other factors, on fair access to the collective knowledge within published works, including via TDM. UNB Libraries recognize that obstacles to TDM analysis jeopardize hard-earned user rights. Libraries need more clarity as to whether, and under which circumstances, copying of works for TDM uses can constitute infringement, and as to how statutory damages for infringing activities are calculated. The lack of clarity around TDM is illustrative of larger issues that fundamentally question access to and use of information in digital format.
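To illustrate the non-consumptive character of TDM described above, the following sketch (a minimal example using only the Python standard library; the two sample sentences are hypothetical stand-ins for licensed full texts) derives aggregate word co-occurrence statistics from a corpus without reproducing any expressive passage:

```python
import re
from collections import Counter

def mine_cooccurrences(texts, window=5):
    """Count how often word pairs appear near each other across a corpus.

    Only aggregate statistics leave this function; the texts themselves
    are discarded, which is what makes the use non-consumptive.
    """
    pairs = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                pairs[tuple(sorted((tokens[i], tokens[j])))] += 1
    return pairs

# Hypothetical sample corpus standing in for licensed full texts.
corpus = [
    "The copyright framework balances creators and users.",
    "A balanced copyright framework serves users and creators alike.",
]
patterns = mine_cooccurrences(corpus)
print(patterns[("copyright", "framework")])  # → 2 (the pair co-occurs in both texts)
```

The output is a table of facts about the corpus, not the expression itself, which is the distinction the submission draws between analysis and reproduction.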

We are concerned about the negative impact on university research if TDM licensing packages become additional costs required to support access to information. These packages are derived from materials to which our researchers and learners already have legitimate access, and in certain cases we require further legislative support to use these resources without additional cost. University libraries cannot be expected to pay additional costs for access to datasets of information for which legitimate access already exists. UNB Libraries are confident that the Copyright Act maintains the balance necessary to ensure the economic interests of the copyright owner are protected. What is needed is to secure user rights by way of an explicit exception for TDM and by limiting contractual override of user rights.

With decades of experience managing diverse formats and licensing models from numerous vendors, UNB Libraries allocate over $5 million annually, constituting half of their operating budget, to ensure access to essential research and learning materials for a campus population of fewer than 9,000 full time equivalent students. Maintaining this access is a continuing challenge, buffeted by market influences, including threats posed by the consolidation of the publishing industry and volatile rates of exchange for the Canadian dollar.

In some cases, UNB Libraries partner with other libraries and negotiate consortia packaging arrangements. These practices tend to increase our negotiating power to reflect the bargaining leverage of bigger institutions. Such consortia arrangements have helped us reclaim user rights to a certain extent. An example is the academic library industry standard, the Canadian Research Knowledge Network Model License, in which Section 3.9 identifies users' rights for TDM research across publishers' works. Consortia packaging arrangements are not always possible, however, nor a reliable or sustainable way to support the needs of our community. We regularly find ourselves independently negotiating many licenses that often come with burdensome and confusing terms of use along with non-disclosure clauses. Smaller and regional institutions like ours lack the negotiating influence to mitigate market forces, placing our researchers and learners at a disadvantage. For example, we recently invested more than $50,000 piloting specific TDM licensing packages with two publisher vendors. Our experience thus far demonstrates that such licensing packages do not satisfy our colleagues' research needs. It is, again, unsustainable to cover current and future TDM needs by maintaining similar licenses on a case-by-case basis, let alone on an ongoing one.

We posit that, as it currently stands, the language of the Copyright Act can enable licensing terms that override user rights and clear jurisprudence. From our perspective, enabling or creating additional access barriers to works in a digital format is contrary to Canada’s commitment to technological neutrality and to modernizing the balance of rights for creators and users. Many jurisdictions have recognized and implemented TDM provisions and exceptions in their legal frameworks; for instance, the European Union with Article 3 of EU Directive 2019/790 on Copyright in the Digital Single Market. The United States has established a strong foundation for non-consumptive use of copyrighted materials within its fair use framework. Moreover, scholars have convincingly argued that copyright reform clearly allowing TDM practices will benefit research. See Fiil-Flynn, S. M., Butler, B., Carroll, M., Cohen-Sasson, O., Craig, C., Guibault, L., ... & Contreras, J. L. (2022). Legal reform to enhance global text and data mining research. Science, 378(6623), 951-953, available at https://www.science.org/doi/10.1126/science.add6124

While expanding Canada’s fair dealing framework is essential for a flexible and balanced approach to developing tools of textual analysis, UNB Libraries would benefit from a more direct approach, such as the explicit exceptions implemented in Japan and Singapore and recommended by the Australian Law Reform Commission. See

Japan Copyright Act https://www.japaneselawtranslation.go.jp/en/laws/view/1980/en

Singapore Copyright Act 2021 S244(2)(d). https://sso.agc.gov.sg/Acts-Supp/22-2021/Published/

Australian Law Reform Commission (2013) https://www.alrc.gov.au/publication/copyright-and-the-digital-economy-dp-79/8-non-consumptive-use/non-consumptive-uses-and-fair-use/

It is important to acknowledge that the library community’s support for applications of this technology is not concerned with, or attempting to encroach on, the original expression of the work, but with extracting and understanding the patterns, information, and correlations (essentially the facts and ideas) behind these works. The non-expressive nature of these uses of works is an important concept to build into any technologically durable copyright policy. Acknowledging the separation of the economic interests vested in the original expression, on the one hand, and the user rights to the information behind the work, on the other, further supports a TDM exception for both commercial and non-commercial applications. While the application of TDM in certain areas, like Generative AI, has not been tested in court, TDM itself does not impede the original publication's incentives or rewards. Simultaneously, advancements in computing capacity have allowed for the storage and collection of copies of published material, leading to the development of various commercial opportunities, including platforms like Google Books, HathiTrust, CCC, Elsevier, etc. Finally, enabling TDM from legitimately accessed published works safeguards against unfair monopolies and unnecessary enclosure of information.

As a university library, we advocate for the widest possible access to data and knowledge for our community of researchers, instructors, and students, to enable them to study and build unbiased and ethical AI applications. The changes we advocate for in Canada's legislation, outlined below, aim to modernize the copyright framework and re-establish checks and balances. This would align with the legislative approaches of our most influential trading and research partners globally and satisfy current research demands. Future-proofing copyright legislation so that libraries can maintain their public service role in the age of AI is key.

Recommendations for changes to the Copyright Act:    

1. Make fair dealing purposes illustrative. UNB Libraries support the adoption of an illustrative rather than an exhaustive list of purposes, through the addition of the words ‘such as’ to the fair dealing framework.

2. Create a specific exception for Text and Data Mining. To meet the needs of UNB’s research and innovation goals this exception should be for both commercial and non-commercial applications. 

3. Strengthen Canada’s commitment to technological neutrality by implementing measures to safeguard libraries and their users from licenses and terms of use that supersede essential copyright exceptions. Many of these license restrictions are enforced through technological protection measures (TPMs).

Authorship and Ownership of Works Generated by AI

UNB Libraries submit that any uncertainty around this matter should not impact the development and adoption of responsible AI technologies. This is appropriate given the current state of the technology and the national and international debates around authorship or ownership of AI-assisted and AI-generated works. Generative AI, in particular, is already widely used in various stages of creative production. While AI-assisted and AI-generated works are proliferating, AI systems and models, and the ways in which creators experiment with them, are still evolving.

Therefore, we believe that there is no imminent need for the Government to propose any clarification or modification of the copyright ownership and authorship regimes in light of AI-assisted or AI-generated works. Most jurisdictions have taken this approach, requiring human authorship for copyright protection. The Copyright Act supports and rewards human authorship and creativity. It is unclear whether there are any benefits in treating AI-assisted or AI-generated output as potentially copyrightable content. In fact, at this stage UNB Libraries support the position that AI-generated works should not be protected by copyright, to avoid further erosion of the public domain. Our position aligns with the IP Scholars' 2021 joint submission to the Canadian Government consultation “A Modern Copyright Framework for Artificial Intelligence”.

See Carys Craig et al, Submission on Artificial Intelligence from IP Scholars to the Minister of Innovation, Science, & Industry & the Minister of Canadian Heritage (26 September 2021) for the Consultation on a Modern Copyright Framework for Artificial Intelligence and the Internet of Things, [unpublished, archived at Schulich Law Scholars, Dalhousie University], available at https://digitalcommons.schulichlaw.dal.ca/reports/70/

Infringement and Liability regarding AI

In the case of AI-assisted or AI-generated content, it is unclear to us how existing legal tests for demonstrating copyright infringement apply at the input or output phase. We understand that Generative AI models use statistical analysis to learn patterns from large amounts of data. Looking at input: is accessing and analyzing data an act of copyright infringement? We understand that it is not. At this phase it is unclear to us whether potential liability could be tied to any specific act of reproduction, and specifically to temporary copies. It is also unclear to us whether liability could be established in the absence of output substantially similar to an existing work. Furthermore, in the case of substantially similar output, it is unclear whether that is a result of the system memorizing and reproducing training data, of the end-user's prompts, or both. It is also unclear how opt-in/opt-out regimes or voluntary collective licensing schemes might affect liability questions. An additional point of concern relates to the exercise of moral rights by rightsholders and how to determine or enforce infringement of moral rights in the context of AI-generated content. We understand that questions around copyright infringement and liability remain unclear in most jurisdictions, and we follow global developments, which include legislative efforts and litigation. See Andres Guadamuz, A Scanner Darkly: Copyright Liability and Exceptions in Artificial Intelligence Inputs and Outputs, GRUR International, 2024, ikad140, https://doi.org/10.1093/grurint/ikad140

Explicit copyright exceptions, including for TDM, would remedy the current lack of clarity around liability. Canada’s unique orphan works framework (Copyright Act s77) might also be a solution to some of the uncertainty surrounding copyright liability issues with AI-generated outputs, especially in cases of commercial output. Training data for transformer-based Generative AI almost certainly contains both copyrighted material and orphan works. A user who wants to publish or fix a form of generative output could follow the orphan works framework by making reasonable efforts to find a rights holder, then applying for an orphan works license through the Copyright Board. If this were successful, the user would pay a royalty to a collective society and be granted a non-exclusive license to use the output. Assuming the process were efficient and simple enough to meet market and general demands, this could reduce uncertainty and provide some recompense to a potential copyright holder if the work turned out to be substantially similar to an existing work and the copyright holder came forward. Even so, particularly for non-commercial uses, the administrative costs and inefficiencies are worrisome. This discussion focuses on infringing output, not on the computational analysis of data (input), which, we posit, should always be covered by a clear TDM exception. Finally, besides robust exceptions or regulatory schemes such as the orphan works framework, technical solutions such as watermarking data sources might also prove helpful in detecting and deterring copyright infringement, so long as copyright liability is clearer and exceptions, including fair dealing, are sustained.

Currently we have little knowledge of the training datasets that feed Generative AI models. Lack of transparency around training datasets is another barrier to determining copyright infringement: without knowledge of the input data, we cannot determine whether specific copyright-protected content has in fact been used. Trade secret protections and other intellectual property protections, such as copyright or patent protection of algorithms, could act as additional barriers. Thus, being able to audit both the training dataset and the algorithm, especially in cases of infringing output, might prove important. Last but not least, recent revelations regarding illegal and unethical content included in popular training datasets confirm the urgent need for more transparency around training data. (Catherine Thorbecke, Hundreds of images of child sexual abuse found in dataset used to train AI image-generating tools, CNN, December 21, 2023, at https://edition.cnn.com/2023/12/21/tech/child-sexual-abuse-material-ai-training-data/index.html, citing studies such as: Thiel, D. (2023). Identifying and Eliminating CSAM in Generative ML Training Data and Models. Stanford Digital Repository. Available at https://purl.stanford.edu/kh752sm9123)

UNB Libraries believe that libraries are best positioned to engage and assist their users in testing computational uses of data extracted from non-infringing copies of lawfully acquired or licensed content. This includes the use of Generative AI tools and the development of Generative AI applications, because the quality of the output of such tools and applications depends on the quality of their data input. Furthermore, dataset creation and curation involve careful decision-making about the data that should be included in a training dataset, and these decisions can shape a model’s outputs. See Lee, Katherine and Cooper, A. Feder and Grimmelmann, James, Talkin’ ‘Bout AI Generation: Copyright and the Generative-AI Supply Chain (July 27, 2023). Forthcoming, Journal of the Copyright Society 2024, available at SSRN: https://ssrn.com/abstract=4523551 or http://dx.doi.org/10.2139/ssrn.4523551. The data scientists employed by UNB Libraries have the expertise and training to create and curate training datasets and to develop standards for appropriate computational uses of those datasets by their users. We have the expertise to create datasets, but not additional funds to essentially buy our resources again in a new format. We believe that copyright law was not intended to restrict new ways of learning from or analysing content that is lawfully accessed by or within a library, nor to chill innovation within research libraries and universities.
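The curation decisions described above can be sketched in miniature. The records, licence labels, and thresholds below are hypothetical; the point is only that each filtering rule visibly shapes what enters a training corpus:

```python
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    licence: str   # hypothetical labels, e.g. "public-domain", "all-rights-reserved"
    source: str

def curate(records, allowed_licences, min_length=20):
    """Illustrative curation pass: keep only records whose licence permits
    computational use, drop near-trivial fragments, and deduplicate.

    A real pipeline would encode an institution's actual licensing terms.
    """
    seen = set()
    kept = []
    for r in records:
        key = " ".join(r.text.lower().split())
        if r.licence not in allowed_licences:
            continue  # licensing decision shapes the corpus
        if len(r.text) < min_length:
            continue  # quality threshold shapes the corpus
        if key in seen:
            continue  # deduplication reduces memorization risk
        seen.add(key)
        kept.append(r)
    return kept

sample = [
    Record("A public-domain New Brunswick newspaper article ...", "public-domain", "archive"),
    Record("A public-domain New Brunswick newspaper article ...", "public-domain", "mirror"),
    Record("An all-rights-reserved novel excerpt .............", "all-rights-reserved", "vendor"),
    Record("short", "public-domain", "archive"),
]
curated = curate(sample, {"public-domain", "CC-BY"})
print(len(curated))  # → 1: the duplicate, unlicensed, and too-short records are dropped
```

Each `continue` branch is a curation decision of the kind the submission argues shapes model outputs downstream.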

In view of the above, and to better serve the needs of the UNB community, UNB Libraries would welcome more clarity around attribution and liability, which would be achieved by robust exceptions. In addition, and as discussed in our answer to the previous question, an explicit exception for TDM and making the fair dealing purposes illustrative would help UNB Libraries better serve the needs of the UNB community without risking primary or secondary infringement, even in the rare case of substantially similar output. We also endorse legislative approaches such as the one taken by the EU’s AI Act, whose latest drafts include data disclosure requirements, such as requirements to describe the training datasets used, how the data was obtained and selected for training purposes, labelling procedures, data cleaning methodologies, and the training methodologies and techniques.

Comments and Suggestions

Further suggestions:

FAIR Principles for data management and data stewardship frameworks to ensure high-quality data: UNB Libraries would welcome support to formalize and promote FAIR Principles for data management in the space of Generative AI. Recognizing the transformative potential of AI tools, UNB Libraries point to the key role of government in supporting standards in data (including metadata) management and stewardship, such as the FAIR (Findable, Accessible, Interoperable, Reusable) principles. See FAIR Guiding Principles for scientific data management and stewardship, https://www.go-fair.org/fair-principles/

This emphasis on findable, accessible, interoperable, and reusable data is essential to lay a sturdy foundation for future data management and can be critical in current efforts to develop explainable AI models. For our purposes, the successful implementation of these standards is crucial, as it directly influences the effectiveness of AI tools in facilitating access to, and enabling growth of, library collections. This position resonates with that of the broader Library, Archive, and Museum community, emphasizing the need for cohesive efforts in ensuring robust metadata practices. Information specialists continuously advocate for and adhere to standards that strive to create best practices and guidelines for the responsible management of information. See, for example, the important work of the National Information Standards Organization (NISO) https://www.niso.org/
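As an illustration of what a minimally FAIR metadata record might look like, the sketch below uses hypothetical field names loosely inspired by common library metadata practice (e.g. Dublin Core and DataCite); it is not a prescribed schema, and the identifier and URL are placeholders:

```python
# A hypothetical FAIR-style metadata record for a library-curated dataset.
record = {
    "identifier": "doi:10.0000/example-dataset",   # Findable: persistent identifier
    "title": "Historic New Brunswick Newspapers (sample)",
    "access_url": "https://example.org/dataset",   # Accessible: standard retrieval protocol
    "format": "text/plain",                        # Interoperable: open, common format
    "licence": "CC-BY-4.0",                        # Reusable: clear usage terms
    "language": ["en", "fr"],
    "provenance": "Digitized by UNB Libraries",
}

# One illustrative field backing each FAIR principle.
REQUIRED = {"identifier", "access_url", "format", "licence"}

def is_fair_minimal(rec):
    """Check that the minimum fields backing each FAIR principle are present and non-empty."""
    return REQUIRED <= rec.keys() and all(rec[f] for f in REQUIRED)

print(is_fair_minimal(record))  # → True
```

A record failing any of these checks would be hard to find, retrieve, combine, or lawfully reuse, which is exactly the failure mode the FAIR principles guard against.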

Overall, we believe that libraries are best positioned to engage and assist their users in testing computational uses of data extracted from non-infringing copies of lawfully acquired or licensed content. This includes the use of Generative AI tools and the development of Generative AI applications. The data scientists employed by UNB Libraries have the expertise and training to develop standards for appropriate computational uses of those datasets by their users, and are in a position to collaborate with university experts towards establishing fair and ethical uses of AI tools and other computational methods.

Union des artistes

Technical Evidence

This submission constitutes a common position of the following associations and society:

Artisti, a Canadian collective management society representing various performers for the collective administration of their right to equitable remuneration and their right to remuneration arising from private copying, as well as all or part of their exclusive rights;

The Union des artistes (UDA), a professional union representing artists in several disciplines working in French or in any other language, with the exception of productions made and performed in English;

The Guilde des musiciens et musiciennes du Québec (GMMQ), an artists' association legally recognized in Quebec to represent professional musicians, notably in the negotiation of collective agreements governing their working conditions and remuneration.

These organizations see the potential of AI as a creative tool: many of their members already use it as an instrument enabling them to deliver a performance. It is nevertheless essential to regulate the use of the technology, particularly in the context of text and data mining ("TDM"), since performers' performances are currently being used for this purpose without their knowledge and without compensation.

Text and Data Mining

Greater clarity and transparency would allow a better understanding of how TDM operates, including the way performers' performances are used, as well as the roles and responsibilities of the various stakeholders. This would also make it possible to determine: (i) in which context(s) informational analysis is, or is not, authorized under the current Canadian copyright regime, and accordingly (ii) which licences must be obtained and what remuneration must be paid to the holders of performers' performances.

TDM activities are currently being carried out in Canada to train algorithmic models. The development and training of AI systems may involve the reproduction of copyright-protected content, including performers' performances or their voice and image outside of a performance, without their consent or fair remuneration. This is obviously problematic and must be remedied. Moreover, it is essential that performers' authorization (on an "opt-in" rather than "opt-out" basis) be obtained prior to any reproduction of their performances, voice or image, and that fair and equitable remuneration be paid to them in return for this use. Obtaining these consents must take into account the particularities of each reproduced content. For example, in the case of fixed performances, a separate consent must be obtained from performers if the authorization initially granted to producers does not cover TDM, which is currently the case.

In this regard, it is also important to recall that performers who have consented to the incorporation of their performances into a cinematographic work currently cannot exercise their rights under section 15(1), given section 17(1) of the Copyright Act, and that they also cannot benefit from moral rights in respect of these audiovisual performances. To resolve these issues, Canada should ratify the Beijing Treaty, which would allow audiovisual performers to exercise better control over their performances incorporated into cinematographic works.

Rights holders face challenges with respect to the granting of licences for TDM activities. In addition, it is difficult for performers to determine which content is used in the context of TDM and the extent of that use. To address this gap, a transparency or record-keeping obligation could be imposed on entities developing and training AI systems.

Various licences are available for TDM activities involving the exercise of a right reserved to copyright holders, namely reproduction. These licences can be negotiated directly with copyright holders, including performers, or obtained through a collective management society.

Indeed, with respect to performers, the ability to reproduce their performances for TDM purposes is generally not included in the authorizations they have granted to producers of sound recordings or cinematographic works, as those authorizations essentially cover the commercial exploitation of the sound recordings and cinematographic works. Authorizations for TDM purposes would therefore need to be obtained from performers or their collective management society, which would be entirely capable of issuing them.

As a reminder, these licences do not currently appear to be obtained by those carrying out TDM activities. This obviously creates a shortfall, notably for performers, who struggle to obtain fair compensation for the use of their content.

Since the performances reproduced for TDM purposes are performances fixed in sound or audiovisual recordings, and are performances of works, several mechanisms could compensate the rights holders concerned for this use of their performances, including the introduction of a right to equitable remuneration for TDM, or a right to remuneration through a mechanism similar to that for private copying.

The reproduction of performers' performances for TDM purposes is not covered by the contractual provisions that governed the fixation of those performances. For this activity, then, the performer's authorization should systematically be obtained. It must not be forgotten that reproducing a performance for TDM purposes will often involve reproducing the performer's voice and image (biometric data), which are attributes of their personality protected by personality rights, the right to privacy, and personal data protection legislation.

Given these various legislative protections, it therefore seems impossible to contemplate an exception permitting the use of performances involving an artist's voice or image for TDM purposes, since a range of other legislative provisions would thwart and contradict the introduction of such an exception.

We are therefore not in favour of adopting a general exception permitting TDM, which would, moreover, also be contrary to Canada's commitments under various international treaties, such as the Berne Convention, TRIPS and CUSMA, which specify that any limitation or exception to which Canada intends to subject a copyright must be confined to certain special cases that do not conflict with the normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the author.

Thus, if the government nevertheless decided to adopt a TDM exception (which we do not recommend), it must ensure compliance with its international commitments, for example by ensuring that the exception is: (i) limited to specific cases (for example, research purposes); (ii) subject to strict conditions of application (for example, access to the work or subject matter of copyright must be lawful); and (iii) accompanied by the payment of fair compensation to copyright holders, as well as an opt-out mechanism for copyright holders.

Finally, such an exception should not apply to moral rights, but only to so-called "economic" rights.

We strongly recommend record keeping and disclosure of the copyright-protected content used to train AI systems. This is an essential obligation that should be incorporated into the Copyright Act. Transparency is one of the fundamental principles that should guide AI system developers at all times.

The appropriate level of remuneration for the use of a work or subject matter of copyright in TDM activities must be fair and equitable, based on the uses made of the protected content. In all cases, remuneration should be aligned with the authorizations obtained and should take into account the particularities of each piece of reproduced content. For example, in the case of fixed performances, separate remuneration must be paid to performers if the authorization initially granted to the producers of the reproduced content did not cover TDM.

As set out earlier, we do not recommend introducing a TDM exception in Canada. On the contrary, it is essential to ensure compliance with the Copyright Act and with the other legislative provisions currently applicable (such as personality rights and personal information protection rights, which come into play when performers' performances are reproduced for purposes other than those initially consented to, or when their voice and image are reproduced outside of a performance), by ensuring that performers' authorization is obtained and that fair and equitable remuneration is paid to them when their content is used for TDM purposes.

We also recommend that a transparency or record-keeping obligation be imposed on researchers and developers of generative AI systems in the context of TDM.

If, however, Canada wished to introduce a TDM exception, despite our recommendations and in contravention of the personality rights, privacy rights and personal information protection rights that protect performers' voice and image, it must ensure that the exception respects international parameters, is limited in application, and is accompanied by an opt-out mechanism for copyright holders.

To that end, the Canadian government could examine the situation prevailing in the European Union, Switzerland and the United Kingdom.

Authorship and Ownership of Works Generated by AI

The uncertainty surrounding the authorship or ownership of works and other subject matter of copyright produced by or with AI has repercussions, in particular, on the remuneration of artists such as musicians, whose content is "diluted" on platforms such as Spotify. To the extent that "artificial" content floods streaming platforms, genuine content will be drowned in that sea of "artificial" content, which could capture a share of the royalties that would otherwise go to real performers.

The lack of protection for "artificial" content also has an impact on the protection of genuine performers' performances.

In addition, there is uncertainty surrounding the ownership of, and remuneration related to, an "artificial" performance incorporating a performer's voice, image or likeness where the performer has not authorized such incorporation.

Finally, under the Copyright Act, a "performance" is protected only if it is tied to a work. Consequently, performers' rights could be jeopardized if they perform "artificial" content that is not protected by copyright.

We therefore recommend that performances be protected regardless of whether the artists perform or execute "artificial" content not protected by copyright. After all, performers' performances are protected even when they relate to a public domain work that no longer benefits from copyright protection. It would therefore be possible to extend the protection of performances so that the performance of "artificial" content is protected in the same way as the performance of a work, especially since Article 9 of the Rome Convention provides that "Any Contracting State may, by its domestic laws and regulations, extend the protection provided for in this Convention to artists who do not perform literary or artistic works."

Better protection of performers' rights could therefore be achieved by (i) revising the definitions of "performer's performance" and "performer" in the Copyright Act; (ii) introducing moral rights for audiovisual performers (for example, through ratification of the Beijing Treaty); and (iii) introducing presumptions of infringement of performers' economic and/or moral rights where their performances (or components thereof, such as their voice or image) are reproduced in a generative AI context without their knowledge.

If no human contribution can be identified in connection with an "artificial performance," we do not recommend protecting it. However, where a human contribution is identifiable in connection with an artificial performance, whether (i) through the incorporation of a performer's performance, voice, image or likeness, or (ii) through a human's use of artificial intelligence in a manner comparable to that of an instrumental musician, that human contribution should be protected.

The government could also specify that a "performer," for the purposes of the Copyright Act, must be a human being. It could likewise establish a presumption that a performer's voice and image constitute a substantial part of their performance.

This would provide greater certainty as to the application of the Copyright Act.

Finally, we also recommend amending the definitions of "performer" and "performer's performance" in the Copyright Act so that a performance is no longer tied exclusively to works. The fact that a performance is of an "artificial intelligence product" rather than of a work should not stand in the way of its protection.

There are no illuminating approaches in other countries. While the legislation of the United Kingdom, Ireland and New Zealand attributes ownership of computer-generated works to the person who made the arrangements necessary for the creation of the work, we do not recommend taking that path, as those provisions were introduced in a context foreign to generative AI, and this technology raises far more complex questions. Moreover, we are not aware of any jurisdiction that has addressed the specific question of ownership of a performer's performance, voice and image in a generative artificial intelligence product.

Infringement and Liability Regarding AI

As to whether there are concerns about the application of existing legal tests, we submit that it can be difficult for a performer:

(a) to identify the person or persons responsible for an infringement of their rights or for an unauthorized copy of their performance; and

(b) to establish that the party who used their voice, image or likeness had access to a pre-existing performance (rather than merely their voice or image outside of a performance), that the performance (and not merely the voice or image outside of a performance) was the source of the copy, and that a substantial part of the performance was reproduced.

Moreover, since the Copyright Act contains no presumption that a performer's voice or image constitutes a substantial part of their performance, the existing legal tests may not make it possible to show that an artificial intelligence product using a performer's voice or image infringes the copyright the performer holds in their performances.

The multiplicity of actors, the opacity of AI systems, and the pixelation of certain performances, as well as of performers' voices and images, which makes the original performances difficult to identify, are all obstacles to determining whether an AI system accessed or copied specific content protected by a performer's copyright or other rights when generating an output that infringes those rights.

We are not aware of companies that commercialize AI applications taking measures to mitigate the risks of copyright infringement by AI-generated works. To prevent AI products from infringing performers' copyright in their performances, the necessary authorizations could be obtained upstream of the uses by way of licences.

The option of using public domain performances would not always avoid infringing rights other than copyright, such as the right to one's voice or image, since a performer may outlive the term of protection of their performances. In that case, the unauthorized use of a public domain performance incorporating their voice or image would nevertheless still infringe their personality rights.

As to whether liability should be further clarified in cases where an AI-generated work infringes the rights in a work already protected by copyright, we answer that the Copyright Act has sufficient mechanisms to determine liability in cases of copyright infringement.

That said, it would nevertheless be desirable, for the purposes of determining what constitutes infringement of a performance, to recognize, through the introduction of a presumption, that a performer's voice or image constitutes a substantial part of their performance.

Finally, Canada could impose a transparency or record-keeping obligation on entities developing and training AI systems.

As for approaches in other countries that could inform consideration of this issue in Canada, we note that in its draft "AI Act" regulation, the European Parliament introduced a transparency obligation under which entities developing AI systems will have to publish a sufficiently detailed summary of their use of "training data protected under copyright law," as well as appropriate, clear and visible information distinguishing generated content from the original. This approach seems commendable to us, but Canada should go even further. In particular, the Canadian transparency obligation should also apply to performances and their components (the performer's voice, image and likeness), as well as to outputs generated by or with AI.

Comments and Suggestions

Our organizations welcome this public consultation, seeing in it the government's willingness to clarify the implications of AI for copyright. Our organizations do not wish to hold back the advancement of AI, but they do wish to preserve the balance underlying the Copyright Act by safeguarding Canadian culture, human creativity and the interests of copyright holders.

To that end, we recommend that the principles grouped under the acronym "A.R.T." (Authorization, Remuneration and Transparency) guide the government's actions in the context of this public consultation and any resulting amendments to the Copyright Act.

It is also important that the public consultation not be limited to the interests of authors and other holders of copyright in works, but that it also cover performers' interests in their performances, as well as in their voice, image and likeness.

Generative AI is deeply disrupting these creators, particularly in the context of deepfakes. In this regard, audiovisual performers do not have sufficient rights to protect their performances, including in the context of generative AI and deepfakes. To remedy this situation, we recommend extending these artists' exclusive and moral rights, for example by ratifying the Beijing Treaty.

Union des écrivaines et des écrivains québécois (UNEQ)

Technical Evidence

The demands of the artistic and cultural community are in no way intended to condemn the use of AI. We recognize the significant advantages and advances this technology represents in many sectors, but its use and deployment must be governed by a framework that is transparent, equitable and respectful of artists and the cultural industry.

Common use depending on the information and communication medium.

A creative tool complementary to artistic practice.

Research and documentation.

Text and Data Mining

Clarity, or transparency, remains a major problem, but it would be a mistake to make it the sole focus of reflection. Reducing the objective of the consultation to clarity could lead to provisions being entrenched in the Copyright Act to the detriment of rights holders.

The need for clarity and transparency concerns first and foremost how AI works, not the application of the Act. The Union des écrivaines et des écrivains québécois (UNEQ) highlights the risk posed by a standardization and simplification of international laws that could lead to exceptions threatening copyright.

In concert with other cultural-sector associations, UNEQ already recommends, as part of the review of the Copyright Act, that the questions of fair dealing and of the exception for educational purposes be defined more clearly as a priority. The growing practice of text and data mining (TDM) only increases the need to better define the exceptions in the Act and to circumscribe their scope in order to protect literary and artistic works.

TDM raises many issues arising from both the inputs (the texts and data used by generative AI) and the outputs (the products generated by AI). UNEQ recommends that a working group of experts be established to address the legal, ethical and economic issues that generative AI raises for rights holders. An overly hasty review of the Copyright Act would risk not only sidestepping these issues without resolving them, but also creating additional, irreversible problems.

UNEQ believes that copyright licences must be put in place to govern AI's use of works protected by the Copyright Act. Analyzing existing structures for traceability, authorizations, and compensation or indemnification measures under collective management remains a promising avenue that deserves to be explored and adapted to address the various issues related to TDM.

UNEQ believes that the recommendations arising from the 2021 parliamentary review of the Act ("broaden the permitted purposes under the fair dealing exception to include TDM; amend the fair dealing exception to make it as open as American fair use; amend the Act's exception for temporary reproductions for technological processes to cover TDM; and create a new exception dedicated specifically to TDM") pose real dangers to the future of literary and artistic creation.

UNEQ wishes to recall that the recommendations made by rights holders during that review remain just as relevant, if not more so, in the context of TDM: clarify fair dealing, particularly in the field of education; raise the ceiling on statutory damages; ensure that Canada meets its obligations under international treaties; and promote the effective operation of the Copyright Board.

UNEQ recommends that AI system developers keep records of, and declare, the protected works used to generate content and to feed their systems.

UNEQ invites the government to entrust recommendations and remuneration adjustments to cultural-sector actors. Setting tariff scales and compensation schedules should not fall to political bodies but should result from negotiations between rights holders and users (or their representatives). The government's role should be limited to establishing the obligation to remunerate and ensuring that negotiations between the stakeholders are conducted in good faith.

Foreign approaches to TDM and copyright can certainly inspire reflection and analysis, but "let us recall that Canada, the first signatory of UNESCO's 2005 Convention on the Protection and Promotion of the Diversity of Cultural Expressions, has traditionally played a leadership role in this area. It must now continue along this path."

(Comments of the Coalition pour la diversité des expressions culturelles in the context of the Consultation on the development of a Canadian code of practice for generative artificial intelligence systems, submitted to Innovation, Science and Economic Development Canada, September 14, 2023.)

Authorship and Ownership of Works Generated by AI

The Union des écrivaines et des écrivains québécois (UNEQ) notes the uncertainties surrounding the authorship and ownership of AI-generated content.

For the sake of consistency and efficiency, it recommends that the protection of AI-generated content be assessed case by case, taking into account the circumstances specific to each production. It is important to support creators who wish to use AI as a tool, so that our culture can develop and flourish within a legislative framework grounded both in today's digital environment and in full recognition of human creativity.

For the time being, UNEQ believes the Act is sufficiently neutral to be applicable in determining which works are or are not protected by copyright. It nevertheless recommends an in-depth study of criteria that strike a balance between an overly inclusive assessment (automation of functions) and an overly restrictive one (which could discourage creators from using technologies in service of their practice).

Infringement and Liability Regarding AI

The Union des écrivaines et des écrivains québécois (UNEQ) is very concerned about the management of disputes that could arise from the unauthorized and unremunerated use of works protected by the Act: rights holders should be able to rely on clear provisions designed to protect creators.

The obstacles are numerous and considerable: volume, speed, lack of transparency, traceability, the "substantial part" test, and so on. The Copyright Act is nevertheless fit for purpose: no amendment is needed to accommodate generative AI.

Considering that AI-generated content could be protected only if it meets the criteria of the Copyright Act, and that UNEQ considers that copyright protection should not be extended to AI-generated content without an original human contribution, no additional measures should be necessary.

UNEQ believes the government should reaffirm its commitment to supporting and encouraging creation through a regulatory framework that includes penal provisions for infringement and assistance measures to help resolve disputes with the AI giants.

Comments and Suggestions

Since the launch of the Canadian government's consultation on copyright and artificial intelligence (AI), the Union des écrivaines et des écrivains québécois (UNEQ), like many cultural organizations and professional associations, has worked to make heard the voices of artists and rights holders, whose demands run counter to those of AI system developers. Need we recall that "copyright laws and regulations in Canada are designed to ensure recognition for creators and other copyright holders" and that "effective copyright protection is essential to cultural expression, citizen engagement and the economic growth driven by the rise of the knowledge economy"? In light of its own commitments, we caution the Canadian government against the considerable dangers and risks that amendments to the Act could entail.

**On the use of copyright-protected works to train AI systems**

The cultural sector, already weakened by the exceptions added to the Copyright Act in 2012, asks the government not to broaden their scope and not to add any further exception regarding the use of works protected by the Act by generative AI systems. More than that, the application of the existing exceptions must be clarified and circumscribed so that overly broad and erroneous interpretations do not allow developers and users to invoke them and use protected works without authorization or remuneration. In UNEQ's view, respect for copyright in no way hinders technological development; on the contrary, rich and diverse creation, recognized and protected by an effective legislative framework, can only benefit technological, cultural and social development.

**On the authorship and ownership of rights in AI-generated content**

The criteria for determining whether a work can be protected by the Copyright Act are already clearly set out, and content generated by an algorithm cannot meet them. Providing for a thorough analysis of the content-generation process where an artist incidentally uses AI within their own creation: yes. Amending the protection criteria so that AI can be treated on the same footing as a creator: no.

**On liability, particularly where AI-generated content infringes the copyright in existing works**

Most of the artists who could be affected by infringement of their copyright are far from having the resources needed to resolve disputes, especially against companies such as Microsoft, Google or Amazon. Presuming themselves untouchable, the giants currently guarantee their users full indemnity in the event of a complaint or lawsuit for copyright infringement. Far from suggesting that liability for infringement does not rest with developers, UNEQ nevertheless believes that users should also be made more aware and informed. To that end, the Canadian government should provide for exemplary penal provisions and guarantee their enforcement by concretely supporting artists and the cultural industry in cases of copyright infringement.

Universities Canada

Technical Evidence

Copyright Implications of Generative Artificial Intelligence

Introduction

This submission constitutes Universities Canada’s perspective on the issues identified in the Government of Canada’s discussion paper entitled Consultation on Copyright in the Age of Generative Artificial Intelligence. On behalf of our member institutions, Universities Canada thanks the Government of Canada for its engagement on these important topics.

About Universities Canada

Universities Canada is the voice of Canada’s universities at home and abroad. As a membership organization, we represent 97 public and private not-for-profit Canadian universities.

Our members are home to copyright owners, creators, buyers, sellers and users. University teachers and researchers are the creators of most content used in the classroom. Maintaining a balanced approach to copyright is critical to nurturing a higher education ecosystem that delivers the highest quality education for students, making use of the most innovative and up-to-date materials and approaches, while also ensuring that copyright owners and creators are remunerated for their work.

Text and Data Mining

Recommendations

Universities Canada has consulted with our members and offers the following recommendations for the Government of Canada as it considers changes to the Copyright Act in light of generative artificial intelligence.

1. Do not amend the Copyright Act without substantial further consultation with stakeholders, including the post-secondary sector.

Canada’s universities urge the government to move slowly and deliberately with any changes to the Copyright Act. The current consultations are a productive first step toward gathering evidence of the ways that the digital transformation of teaching and learning is impacting long-standing copyright principles. However, noting that the Copyright Act has an upcoming deadline for a legislated review, the government should not proceed with amendments until a more substantive review of the Act has occurred.

2. Do not make changes to the Copyright Act that would adversely impact the ability of students and teachers to access the best course materials.

Canada’s universities have championed a balanced approach to copyright that includes the essential role of fair dealing. Fair dealing is essential because it recognizes that users have rights to access content for purposes in the public interest, including education and research. It is important to emphasize that fair dealing is increasingly rare due to the changing nature of digital access to course materials and the associated licensing practices that university libraries employ. However, it remains a fundamental principle. Encroaching on the fair dealing rights of students and teachers would be harmful and would upend decades of jurisprudence on the spirit of copyright law.

3. Do not make changes to the Copyright Act that would harm Canada’s competitiveness in AI research and development.

Canada’s universities also urge the government to prioritize Canada’s international competitiveness when considering any changes to the Copyright Act targeting generative artificial intelligence.  Universities Canada has heard concerns among AI researchers, librarians, and instructors that Canada’s current fair dealing and anti-circumvention exceptions may be insufficient to enable the development and deployment of generative AI for either commercial or non-commercial purposes. While some argue that text and data mining activities are covered by the fair dealing exception for research, others suggest that this is not self-evident to researchers who are not copyright experts. We note that Canadian courts have not yet had the opportunity to adjudicate this, and may well do so, following jurisprudential developments in the United States and elsewhere. There is also significant concern that license owners may utilize contract terms to override user rights that the Supreme Court of Canada has said are “essential parts” of the Copyright Act. We encourage the government to clarify that contracts cannot override user rights like fair dealing.

As in Canada, members of the AI research community in other jurisdictions have advocated for measures to limit legal uncertainty and avoid the risk of stifling AI research, particularly early-stage exploratory research where various datasets may be tested. In light of this, some jurisdictions, including the United Kingdom and Japan, have enacted specific fair dealing and anti-circumvention exceptions for text and data mining to ensure the technology can flourish.

Though there is a strong foundation that underlies AI research in Canada, we risk falling behind as other countries work to reduce barriers to entry for innovation and attract talent. We therefore urge the government to find positive ways to support institutions in these efforts, and to avoid changes to the Copyright Act that would add to existing legal uncertainty or create a chilling effect on AI research.

Authorship and Ownership of Works Generated by AI

Canada’s universities and generative AI

Despite current challenges facing the research ecosystem, Canada has been a leader in AI research for decades. Years of sustained investment in fundamental science in Canada enabled the creation of the artificial intelligence technologies that have led to the proliferation we are witnessing today. The Government of Canada should be proud of its accomplishments in funding AI research at a time when other funders scaled back investments in the face of significant obstacles, and it should continue to fund fundamental research as a core pillar of federal economic policy.

As the institutions on the front lines of the development of generative AI, Canada’s universities are leading in the deployment of these technologies as a tool for research, instruction, and scholarly communications. Universities are taking a range of approaches, developing best practices, creating forums for faculty to share experiences and insights, and empowering instructors to responsibly utilize generative AI in the classroom.

Some universities are deploying library services to assist faculty and students. Some, such as OCAD U through its Cultural Policy Hub, are creating networks across the country to lead on issues of bias, inclusion, decolonization, and other important considerations that can be significantly impacted by artificial intelligence. Still others have created new positions to explore the implications of AI, such as the first Chief AI Officer in Canada at the University of Western Ontario. All of these are examples of universities responding to the unique needs of their communities while working collaboratively to address shared challenges.

Use of AI as a tool for research and education

As autonomous institutions, universities are best positioned to lead according to their unique community needs. Universities that choose to deploy AI as a tool for instruction follow currently established best practices, which are themselves rapidly evolving. These include establishing guidelines for the use of AI tools; proscribing some tools and encouraging others based on evaluations of the processes involved in their development, their dependability and usability, and other factors; and creating iterative feedback processes for faculty and staff to report on and seek advice about the use of tools. Most of the tools in use today rely on large language models, like those that power OpenAI’s ChatGPT.

Universities must stay abreast of the latest developments in AI technologies to ensure their responsible use. Members of university communities have various concerns about the full adoption of AI in the classroom. Some fear that issues around racial and gender bias in large language models have not been adequately addressed, while others have expressed concern about academic integrity. However, there is wide consensus that universities must embrace AI within certain parameters, and that it would be premature for the federal government to pre-empt these decisions. As the technology develops, it is important that the government not take steps that unduly restrict innovation in education.

Generative AI as a research and economic enterprise

Canada competes for research talent on the global stage, and there is significant concern among universities that efforts to restrict the development of large language models relying on fair dealing rights for research, or to impose new technological protection measures that prevent text and data mining processes, would hamper Canada’s competitiveness for top research talent in an area of science that is transforming global society. Canada cannot afford to lose out on this knowledge or on access to this talent.

These concerns are part of why the United States, the European Union, the United Kingdom, and Japan have all embraced varying degrees of fair dealing and anti-circumvention exceptions for text and data mining activities. Canada’s universities recommend that the government carefully study the approaches in these jurisdictions to better understand how such changes could impact Canada’s competitiveness.

Infringement and Liability regarding AI

Authorship, ownership, and infringement

A long-standing principle underlying copyright is that authorship requires creative human origin. Extending the idea of authorship to works generated entirely by artificial intelligence would be a departure from that principle. At the same time, the principle of technological neutrality is also fundamental, and treating works that are generated with AI assistance but with ultimate human origin differently than works generated using other technological tools may also be problematic under that principle. In any case, we believe it is too early in the development of this technology for the government to make definitive judgements about authorship and ownership.

In cases of genuine copyright infringement, the Copyright Act is sufficient as it stands to ensure that creators are properly compensated and liability for infringement is properly assigned. However, it is important to remember that the likelihood of genuine infringement occurring as a result of text and data mining is low, and efforts to restrict TDM to avoid infringement would have negative unintended consequences for Canada’s competitiveness.

Comments and Suggestions

Conclusion

Canada’s universities thank the government for its attention to this important issue. We also urge the government to avoid taking steps that would harm Canada’s international competitiveness and unfairly restrict the research enterprise in Canada. We ask that the government commit to a deeper level of engagement with affected parties before pursuing any amendments to the Copyright Act. We also ask that the government ensure any measures it does adopt do not adversely impact the user rights of students and teachers by preserving the Copyright Act’s fair dealing exceptions.

University of Toronto Libraries (UTL)

Technical Evidence

N/A

Text and Data Mining

The University of Toronto Libraries (UTL) thanks ISED for the opportunity to share our experiences and comments for this consultation on copyright and generative AI. The University of Toronto (U of T) is consistently ranked among the top 10 public universities worldwide. U of T is one of the world’s most highly regarded centres for advanced research at the master’s and doctoral level and is home to some of the world’s most talented thinkers, inventors, innovators, and educators, who are advancing knowledge and making critical discoveries for a healthier, more sustainable, prosperous, and secure future. UTL is at the heart of this research enterprise; we build and sustain Canada’s most comprehensive research collection, which serves as the raw material for research and a safe and trusted repository for scholarly outputs, and contributes to the preservation of Canada’s cultural and historical record. UTL spends over $28 million CAD on licensed electronic materials and provides text and data mining (TDM) services, which include consultation, instruction, the provision of data, and technical support.

UTL is in the early days of understanding how AI systems will be used for teaching and research. However, instructors and researchers have experienced long-standing uncertainty regarding the extent to which exceptions in the Copyright Act can be applied to TDM activities. Because TDM is a critical building block for machine and deep learning, how the Copyright Act addresses TDM affects AI-related research as well. Uncertainty about how the Copyright Act applies to TDM creates confusion, chills research activities, and risks discouraging use and exploration, thereby limiting researchers’ ability to better understand AI through machine and deep learning.

This uncertainty means that rightsholders hold the power to determine how researchers can access and use content. Unless clear terms allow for TDM, researchers may require permission from the rightsholder. Obtaining permissions to conduct TDM from a rightsholder can be an incredibly lengthy process, if it can be obtained at all, leaving research at a standstill and the status of projects uncertain. 

Permission to conduct TDM with library-licensed content alone is insufficient to practically enable many research projects. Many terms of use introduce technical limitations that stymie access to usable content, such as setting very low thresholds for the number of downloads per minute or prohibiting systematic or automated downloading. In some cases, a publisher or rightsholder will provide API access to the content to facilitate downloading, but some charge extra for API access even when the content has already been licensed. License terms may also limit access timeframes or require destruction of content at the end of an agreement period, and may determine if and how researchers can report their findings by limiting how much data can be published and whether it must be published in derivative form. These technical and retention limitations can complicate or impede research projects and dissemination. Even where TDM is permitted, it may nonetheless be impossible to use the content in an AI system: a growing number of publishers are including license terms that restrict the use of content in AI systems, limiting the types of research that can be conducted with paid-for content.

There is evidence of growing frustration and confusion among researchers working in international research groups about copyright provisions around TDM and possible jurisdictional collisions. In “Legal Literacies for Text and Data Mining – Cross-Border (‘LLTDM-X’): White Paper” (2023), Rachael Samberg et al. describe widespread confusion among researchers in the United States when working in international teams. Their research finds that many practitioners felt copyright concerns blocked their projects, despite fair use provisions, and overlooked the impact of contract law on their research projects and on their ability to share content with international collaborators (page 12). Researchers are also confused by the differing affordances of different jurisdictions. Recently, a researcher contacted UTL about performing TDM on a licensed eResource, hoping to build on research conducted by European colleagues who performed non-consumptive textual analysis of public web content. Since the website’s terms of use indicated they could not scrape the content, they turned to the library to find a way to legally move ahead with this research. Despite UTL having paid for access to this content, contractual terms restricted TDM, requiring additional written permission from the publisher to perform this activity. The publisher has so far been unresponsive, and the researcher remains unable to move ahead with their work.

Recommendations:

This example highlights issues and constraints in Canada’s current copyright framework that the library community has been vocal about for many years. UTL strongly believes that authors should be appropriately compensated for the use of their work; however, the updates recommended below are necessary to help maintain the balance between creators and the larger public interest, affording Canadian researchers the same advantages their international peers hold. Licensing is not an appropriate solution for TDM; it is simply not practical to seek permission for the sheer quantity of content that is often needed for TDM methodologies.

UTL endorses the recommendations outlined in the Canadian Association of Research Libraries (CARL) submission to the Consultation on Copyright in the Age of Generative AI (2023). These include:

Revising the fair dealing exception (Section 29) to make its purposes illustrative rather than exhaustive. This would provide the clarity the educational community and industry need to move ahead more confidently with their work. It follows the model used in the United States, where the fair use exception provides a much more solid legal basis for non-consumptive research.

The addition of a new provision that clarifies no exception afforded to a user in the Act can be overridden by a contract. This would ensure that users are able to fully utilize the rights granted to them in the Act, helping to create a more balanced environment. 

The addition of a provision that permits the circumvention of technological protection measures for non-infringing purposes.

Authorship and Ownership of Works Generated by AI

Given the ramifications that copyright protection for autonomously created work with little human intervention would have on creative industries, it is recommended that the IP Scholars’ Joint Submission to the Canadian Government Consultation (September 26, 2021) be further consulted. This submission recommends that copyright protection only subsist in the work of a human.

Developing best practices, as the US Copyright Office has done in their recently published Copyright Registration Guidance for Works Containing AI-Generated Materials, is recommended. While still undergoing public consultation, this evolving guidance sets parameters for disclosure of the use of AI-generated content and thresholds for when works will not be offered copyright protection.

Infringement and Liability regarding AI

As the Copyright Act already addresses infringement and liability, it is prudent to allow time to consider the issues that will eventually emerge and other policies that may be better suited to AI systems.

Comments and Suggestions

N/A

University of Waterloo

Technical Evidence

The University of Waterloo Context

We appreciate this opportunity to share our experiences and views on the current challenges and opportunities presented by copyright and generative artificial intelligence. As a premier comprehensive research institution in Canada, the University of Waterloo is committed to supporting inquiry through research and learning. The University enrolls more than 35,000 students across its six faculties and is home to the world’s largest co-operative education system of its kind.

The University of Waterloo has consistently been ranked Canada’s top comprehensive research university and has also been recognized as Canada’s most innovative university for over a quarter of a century. The University’s uniquely entrepreneurial culture encourages experimentation and innovation. This culture is supported by the University mission to advance learning and knowledge through teaching, research, and scholarship, nationally and internationally, in an environment of free expression and inquiry.

Balanced copyright law is a necessary component of achieving our mission. The University of Waterloo has demonstrated a balanced approach through its IP policy, under which, in many cases, the author owns copyright in works created through research and teaching.

We urge you to use this opportunity to reinforce the foundation of the Copyright Act to facilitate the increase of access to information, the advancement of knowledge, and the continued technological growth of Canadian society.

Development of AI systems

The way AI systems are used and designed is heavily context dependent. At the University of Waterloo, researchers approach the development of AI systems through a wide variety of lenses, using approaches based on their discipline, research question, and the availability of data. Research on AI takes place across our campus, from Computer Science to Economics, from Engineering to English Language and Literature.

Comfort with copyright risk and/or ability to purchase permission for use heavily influences the kinds of systems that can be developed and by whom. Researchers might choose a method of sourcing content based on the kind and/or focus of the system being designed, the amount of funding dedicated to the project, and/or the kind of hardware or software they have available to them. Given copyright limitations and variance between national copyright laws, some researchers will even limit use of works based on the countries in which their collaborators reside.

When accessing information for training datasets, some researchers rely on web scraping to gather publicly available information. Others rely solely on information that is openly licensed (for example, through Creative Commons) or in the public domain. Others still are able to rely on information licensed by the institution for text and data mining purposes. Those who rely on open, public domain, or institutionally licensed content necessarily have smaller training datasets, which limits the capabilities and outputs of the final system. For example, a system built only on public domain data would not be able to surface information about COVID-19. Those who rely on institutionally licensed data are limited by the licence conditions, which may allow training of a system but not allow generative output of more than a few hundred words.

How this data is used will differ depending on the kind of system being built. For example, a researcher designing a generative system will use the underlying data to train the system so that it can find patterns and create output, such as new images, text, or code. A researcher designing a recommender or classifier system will use the underlying data to train the system so that it can make a recommendation or provide a classification for a user. The generative system faces copyright challenges at both the input and output stages, whereas the recommender/classifier system faces copyright issues mainly at input.
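
The distinction above can be illustrated with a toy sketch: the same tiny corpus is used once to train a minimal “generative” bigram model that emits new text, and once to build a crude classifier that only labels input. The corpus, labels, and logic below are invented for illustration and bear no relation to any real system.

```python
from collections import defaultdict

corpus = [
    ("the cat sat on the mat", "animals"),
    ("the dog ran in the park", "animals"),
    ("the court ruled on the act", "law"),
]

# Generative use of the data: learn bigram continuations, then emit new text.
bigrams = defaultdict(list)
for text, _ in corpus:
    words = text.split()
    for a, b in zip(words, words[1:]):
        bigrams[a].append(b)

def generate(start, n=4):
    out = [start]
    for _ in range(n):
        nxt = bigrams.get(out[-1])
        if not nxt:
            break
        out.append(nxt[0])  # deterministic: take the first continuation seen
    return " ".join(out)

# Classifier use of the same data: score a query against each label's vocabulary.
vocab = defaultdict(set)
for text, label in corpus:
    vocab[label].update(text.split())

def classify(query):
    words = set(query.split())
    return max(vocab, key=lambda label: len(words & vocab[label]))

print(generate("the"))            # emits new text assembled from learned patterns
print(classify("a dog on a mat"))  # only labels the input; produces no new content
```

The generative path produces output that did not exist in the corpus, which is why it raises copyright questions at both input and output; the classifier only returns a label, so its copyright exposure is concentrated at the input stage.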

Any changes to the Copyright Act should be made with an understanding that there is a wide range of users and creators of AI systems, with a diverse range of use cases.

Use of AI generated content

In day-to-day operations, the University is exploring how generative AI can be used to improve services and processes. For example, staff are testing AI tools’ capacities for assisting with basic writing and evaluation tasks and for deploying chatbots to provide frontline user support.

The University is only beginning to understand how the use of generative AI will impact the way instructors teach and students learn. As with many other institutions, the University is taking a varied approach to the use of AI by students, encouraging instructors to be clear with students about their class policy. Several instructors are actively engaging with AI services in the classroom, even incorporating them into assessments. Many support staff have spent a great deal of time working to clarify what AI systems can and cannot do, when they can be used in a pedagogically sound way, and how instructors can maintain the integrity of their courses. The University has also provided clarity around copyright-related risk in using AI-generated content in teaching, guiding instructors to make risk-informed use of content connected to learning outcomes.

Concerns about the nature of the tools available and their capacity for infringement have limited uptake in some areas. Changes to the Copyright Act that address AI systems’ potential for infringement and user liability would increase confidence in their use.

Text and Data Mining

TDM Activities and factors

Informal observation suggests that research involving TDM is being conducted across many Canadian universities, including the University of Waterloo. Researchers appear to take several approaches, including relying on the fair dealing exception, working through licensing agreements, or web-scraping information from the publicly accessible Web. How researchers engage with these projects seems to be influenced by what is understood as common practice in their field; for example, researchers may look to widely used web-scraping practices in the US and EU. In each of these cases, researchers must engage in a careful risk analysis regarding their planned use and any potential conflict with contracts from various providers. In many cases, the researcher could make a good case for TDM under the Copyright Act, but institutional licences or website terms of use would prohibit them from proceeding with their research. In addition, given the scale of data collection needed, analyzing the terms of use of each website to determine whether it could be used would not be feasible.

To provide clarity for AI developers and users and enable Canada to be competitive in AI research and development, we recommend that the fair dealing exception be made illustrative (i.e., through the use of “such as” language) and extended to include informational analysis. Fair dealing is currently limited to the eight enumerated purposes in the Act. Although the Supreme Court has encouraged a large and liberal interpretation of these purposes, adding illustrative language and the specific purpose of informational analysis would give greater confidence to researchers developing AI systems. Explicit inclusion of informational analysis would provide clarity and reduce complications in relying on fair dealing for TDM research; rather than guessing that TDM might fit under a research purpose, researchers would be confident their work fits under informational analysis. Illustrative language would have the effect of legislating the Supreme Court’s guidance on large and liberal interpretation of the purposes. Use of works for this purpose would retain all the safeguards of fair dealing, requiring the use to be fair as tested against the six-factor framework outlined by the Supreme Court in CCH Canadian Ltd. v. Law Society of Upper Canada, 2004 SCC 13 (CanLII). While commercial uses would not necessarily be precluded, the six-factor test would act as a safeguard against unchecked commercial usage. A further safeguard is the precondition that the content be accessed legally before fair dealing can apply. This expansion of the fair dealing exception would be most effective if accompanied by the addition of a contract override provision and a revision allowing circumvention of technological protection measures (TPMs) for non-infringing purposes.

Many uses of content are limited by contracts of various kinds, whether the terms of use of a website or streaming platform, or an institutional licence (e.g., the contracts libraries sign to gain access to databases). Contract override ensures that users are not prevented from exercising their user rights under the Act by signing those rights away via contract. Note that contract override does not mean that users do not have to pay for access; rather, it means that once they have access, all users have the same rights to reuse. In this way, contract override works to ensure a more democratic ability to use information and respects the technological neutrality goal of the Act. The ability to exercise one’s user rights under the Copyright Act would no longer be tied to negotiating power. Currently, large companies or user groups with more resources are more likely to be able to negotiate a licence with favourable terms than a single person, who may be saddled with the standard terms and conditions of a website with no option to negotiate. With contract override, this would not be an issue. We encourage the Government to explore the contract override provision in Ireland’s Copyright and Related Rights Act, 2000, section 2(10).

The current TPM language in the Act does not allow users to make non-infringing uses of a work (aside from limited carve-outs, e.g., accessibility uses) when a TPM is in place. This means users are denied the ability to exercise their user rights for content that is digitally locked down. This violates principles of technological neutrality; for example, a user may be able to copy a short excerpt from a print book under fair dealing, but not from the same title in eBook form. In terms of AI usage, if a user wanted to incorporate video content stored on DVDs in a training dataset, digital rights management on the DVD might prevent them from copying it off the original medium. The current language would not allow a user to break the digital lock to use the content for fair dealing purposes. We recommend that the TPM provisions in the Act be revised to allow users to exercise their user rights (e.g., fair dealing) regardless of TPMs.

A multifaceted approach would enable development of AI while providing opportunities for creators to be compensated where appropriate.

Licenses available

The University of Waterloo Library licenses information from a wide variety of publishers. The bulk of our licences are with foreign information providers, which influences the copyright law referenced in our contracts and our ability to rely on Copyright Act exceptions for this content. Only 14% of our licences permit any kind of TDM. Of those that do allow TDM, most come with heavy restrictions on reuse of the data that would make incorporation into AI systems impractical, such as restrictions on the number of words that can be included in any published extract. Some publishers that do not allow TDM as part of their standard licence will offer it as an added service, but often at prices researchers consider prohibitive. Traditionally, TDM allowances in Library licences have been directed at enabling research that looks for patterns within a corpus to answer a specific research question; these patterns may have been found programmatically, but not through the building of an AI system or service. In recent months, publishers have started to introduce new clauses in our contracts that aim to forestall the use of content as training material for AI systems, further clarifying their position that the TDM clauses in these licences were not designed with AI in mind.

Obligations for AI systems

The University understands AI development from the perspective of researchers and learners as developers of AI systems. To strike a balance between the attribution required to respect creators’ moral rights and the complexity of AI systems, the government could use language similar to the non-commercial user-generated content exception (s. 29.21). Under paragraph 29.21(1)(b), the user of a work is required to mention the source where it is reasonable to do so. Using similar language when developing provisions for AI systems would encourage attribution while remaining flexible. That said, there are two sets of copyright and attribution issues with AI systems: input and output. Regarding input, a compromise between traditional respect for moral rights (attribution) and feasibility in an AI development environment would be for developers to maintain records of the content used to train their systems. Developers should be able to tell users where content was sourced, with the understanding that it is extremely difficult, if not impossible in many cases, to identify each copyright owner or to provide a complete list of copyright-protected content used to train a system. Although a great deal of content is used and maintaining a list of sources is time-consuming, this requirement would help creators understand where their works are being used. Regarding output, at this early stage in the development of AI systems, it seems likely that generated outputs contain only insubstantial amounts of the training material: amounts so small that the Copyright Act does not require attribution or permission.
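
A record-keeping regime of the kind suggested above could be as simple as an append-only provenance log maintained as content enters a training set. The sketch below assumes hypothetical field names and sources; it illustrates the compromise, not a prescribed format.

```python
import hashlib
import json

# Hypothetical provenance log: one record per item added to a training set,
# so developers can later report where content was sourced without having
# to identify every individual copyright owner.
provenance = []

def add_to_training_set(text, source, license_name):
    record = {
        "sha256": hashlib.sha256(text.encode()).hexdigest(),  # fingerprint of the item
        "source": source,
        "license": license_name,
    }
    provenance.append(record)
    return text  # in a real pipeline, the text would continue on to tokenization

add_to_training_set("An example passage.", "example.org/article-1", "CC-BY-4.0")
add_to_training_set("Another passage.", "example.org/article-2", "CC0-1.0")

# The log can be serialized alongside the model to document its sources.
print(json.dumps(sorted({r["source"] for r in provenance})))
```

Keeping hashes rather than the content itself means the log can be published without redistributing the training material.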

Authorship and Ownership of Works Generated by AI

We recommend that the status quo be maintained. The current requirements for copyright protection, namely originality, fixation, and the exercise of skill and judgement, work well. We understand these requirements to mean that content generated by AI is not protected unless an original work is created with a sufficient addition of human skill and judgement. We support the approach taken by the United States Copyright Office in its document, Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, and recommend the work of Dr. Carys Craig, “The AI-Copyright Challenge: Tech-Neutrality, Authorship, and the Public Domain,” as two slightly different but interconnected ways to approach this issue.

Infringement and Liability regarding AI

At present, the way many AI systems are trained relies on encoding content: changing bits of text into numbers that relate to each other (the Stephen Wolfram Writings article “What Is ChatGPT Doing … and Why Does It Work?” may be helpful for understanding this). When a user enters a prompt, the system looks to the training data for patterns in those small bits to generate new content. Currently, those small bits of information are not encoded with attribution information, so the resulting system is not able to provide the end user with information about the source material. This has implications for testing infringement and for considering liability. Regarding infringement, the end user does not know what sources were used to generate the content, and so is limited in their ability to understand whether permission is needed. They could compare the result to existing content if they had started with a reference, but lacking that, they may have no way of knowing whether something is infringing, especially given the breadth and depth of the training models. Regarding liability, we again face the issue of dual copyright concerns related to input and output. When it comes to outputs, things are more complicated and depend on the promises made by developers to users. If a developer promises that their content is copyright-cleared, the user would have a reasonable expectation of being able to use the service without issue. Both infringement and liability would be better addressed by reducing the risk of infringement through the expansion of the fair dealing exception and the addition of a contract override provision.
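
A minimal sketch of the encoding step described above: words are mapped to integer IDs, and once encoded, nothing in the numbers records which source a token came from. The texts and the word-level scheme are invented for illustration; real systems use learned subword tokenizers and embeddings.

```python
# Build a toy vocabulary from two "sources"; the encoded form keeps no
# record of which source contributed which token.
source_a = "the quick brown fox"
source_b = "the lazy brown dog"

vocab = {}

def encode(text):
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)  # assign the next free integer ID
        ids.append(vocab[word])
    return ids

ids_a = encode(source_a)
ids_b = encode(source_b)
print(ids_a)  # [0, 1, 2, 3]
print(ids_b)  # [0, 4, 2, 5]
# "brown" encodes to the same ID (2) in both texts; the ID alone cannot say
# whether the pattern was learned from source_a, source_b, or both.
```

This is why a system built on such encodings cannot, after the fact, tell an end user which training sources contributed to a given output.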

Comments and Suggestions

Copyright is just one tool in the toolbox for providing appropriate controls on use of content in AI systems. A multifaceted, nuanced approach will be necessary to enable AI to be an opportunity rather than a hindrance to society. Changes to law, policy, and regulations should be done with an eye to equitable access and use of this new technology across our society.

V

Vector Institute for Artificial Intelligence

Technical Evidence

How does your organization access and collect copyright-protected content, and encode it in training datasets?

Vector Institute (“Vector”) has worked with its legal counsel to develop a decision tool: a grid showing the alignment of common licenses and their terms. This tool enables the review of licenses associated with data that Vector AI practitioners would like to use for training AI models, and also enables the coordination of terms when licensing products that may contain other copyright-protected content.

This tool ensures that Vector meets the terms of the original license(s), and flows those terms forward in the new license by clearly demonstrating which initial datasets can be used in the final product based on the original license terms.

Vector works with staff to ensure only original data with license terms that align with the usage and final product license are used.

As an example of how Vector builds a training dataset, Vector’s AI Engineering workstream has created an AI and machine learning (ML) training dataset composed of a number of open-source datasets. For this training dataset, Vector used a Creative Commons License and only included open-source datasets with existing licenses that fit within the Creative Commons License that was chosen for the finished product. In doing so, Vector avoided contaminating the resulting training dataset with the terms and conditions of a more restrictive open-source license.
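A decision grid of this kind can be sketched as a simple compatibility lookup. The license names and compatibility entries below are illustrative assumptions, not Vector's actual tool:

```python
# Hypothetical license-compatibility grid: which product licenses a dataset
# under a given license may feed into. Entries are illustrative only.
COMPATIBLE_WITH = {
    "CC0-1.0":      {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "CC-BY-NC-4.0"},
    "CC-BY-4.0":    {"CC-BY-4.0", "CC-BY-SA-4.0", "CC-BY-NC-4.0"},
    "CC-BY-NC-4.0": {"CC-BY-NC-4.0"},  # non-commercial terms must flow forward
}

def can_include(dataset_license, product_license):
    """Return True if a dataset under dataset_license may be included
    in a product released under product_license."""
    return product_license in COMPATIBLE_WITH.get(dataset_license, set())

def check_datasets(datasets, product_license):
    """Flag any dataset whose license would contaminate the product."""
    return [name for name, lic in datasets.items()
            if not can_include(lic, product_license)]

datasets = {"corpus_a": "CC0-1.0", "corpus_b": "CC-BY-NC-4.0"}
blocked = check_datasets(datasets, "CC-BY-4.0")  # corpus_b is incompatible
```

Checking every input dataset against the intended product license before training is what prevents a more restrictive license from contaminating the finished product.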

Vector also ensures that original input datasets are acknowledged. This is done by Vector staff when the final product is published, by entering the acknowledgement information in the Creative Commons License.

Vector collects and processes diverse datasets from various sources, including closed-source and publicly available texts, under strict adherence to legal and ethical standards, to train and continually improve AI models. These standards can be specific to each dataset.

In collecting data, Vector staff navigates relevant copyright laws and regulations, among other considerations for appropriate access to the data, which are specific to each dataset.

This often involves using content that is publicly available, licensed for use, or covered by fair use exemptions; more restricted datasets are generally avoided, because the associated complications, administrative burden, and economic and resource obligations do not make sense from an ROI perspective.

Vector also considers the ethics and potential impact of its data collection and utilization practices on data creators and rights holders, in addition to its legal obligations, as a reputational risk.

For closed-source data, Vector employs a rigorous approach to ensure compliance with data provider and/or legal requirements, and ethical integrity (e.g., Research Ethics Board [REB] approvals).

This approach includes obtaining explicit permissions or licenses from data owners, and ensuring that the use of the data aligns with the specific terms and conditions set forth by the providers.

In handling closed-source data, Vector staff is especially cautious about privacy, intellectual property rights, and any contractual obligations associated with use of the data for AI inquiry since there is no standard model of terms and conditions associated with such datasets, and the risks associated with holding them are far greater than using open-source datasets.

How does your organization use training datasets to develop AI systems?

The process of acquiring and utilizing training datasets for AI systems is multi-faceted and rigorous:

- Vector staff begins with the ethical and legal acquisition of data; this includes REB reviews where appropriate, and obtaining necessary licenses

- Data are then processed to prepare for AI applications; this includes cleaning, curating, and categorizing the data (NB: de-identification and verification of the same is a necessary condition for transfer of data into Vector data and analytics environments)

- Processed data are then used to train AI models through an iterative learning process, where the models progressively improve in recognizing patterns and analyzing data

- The models are evaluated and refined for enhanced accuracy and functionality using an iterative process, and once they achieve satisfactory performance, they are deployed for practical applications (NB: the models themselves are deployed, but the corresponding training data are not released if they are under any restriction)

- Regular updates with new data or model improvements are also part of maintaining the effectiveness and relevance of AI systems
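The de-identification gate noted in the second step above could be sketched as follows; the field names and rules are illustrative assumptions, not Vector's actual checks:

```python
# Illustrative de-identification gate: block a transfer into the analytics
# environment if any record still carries a direct identifier.
DIRECT_IDENTIFIERS = {"name", "email", "health_card_number"}

def verify_deidentified(records):
    """Reject a transfer if any record still carries a direct identifier."""
    for record in records:
        leaked = DIRECT_IDENTIFIERS & set(record)
        if leaked:
            raise ValueError(f"transfer blocked: identifiers present {sorted(leaked)}")
    return True

def prepare_for_training(records):
    """Clean and curate: drop empty records, then gate on de-identification."""
    cleaned = [r for r in records if r]
    verify_deidentified(cleaned)
    return cleaned

safe = prepare_for_training([{"age_band": "40-49", "diagnosis_code": "E11"}])
```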

Some of Vector’s work is done in collaboration with data providers; for these projects, Vector implements Data Transfer Agreements that include specific terms and dictate how each party’s contributions are recognized

In your area of knowledge or organization, what measures are taken to mitigate liability risks regarding AI-generated content infringing existing copyright-protected works?

Dataset Curation and Compliance: As above, Vector’s approach includes careful curation of training datasets and ensuring alignment with copyright laws; where private data are used for model training, Vector implements agreements providing explicit approval and/or appropriate licenses for their use, favouring the Creative Commons License or open-source datasets with similarly permissive terms and conditions.

Regular Model Updates and Training: Vector updates and re-trains its AI models to enhance their quality, and ensures adherence to copyright regulations in that process - particularly for generative models - carefully monitoring to ensure that there is no copyright violation.

Guidelines for Users: Vector provides guidelines for the responsible and lawful use of AI-generated products; users are required to acknowledge these terms through a data user agreement for training data and through Vector’s Code of Conduct, which includes trust and safety principles, and adhere to these guidelines when utilizing Vector’s AI tools.

Consultation with Legal and Ethics Experts: Vector engages with legal and ethics experts to remain compliant with the evolving landscape of copyright laws and ethical standards in AI, both as a matter of due diligence and as part of its mission to work together with its AI partners in other parts of Canada to ensure that they have the people, skills, and resources to be best in class at the use of AI.

In your area of knowledge or organization, what is the involvement of humans in the development of AI systems?

Vector leverages algorithmic techniques and AI models guided by human expertise in research, data curation, ethical oversight, and quality control.

Vector AI practitioners are conducting leading-edge research on mitigating bias in data and models to ensure high quality, trustworthy, and safe outputs.

Data generated from individuals (Personal Information, Personal Health Information) may be used for projects. Therefore, to mitigate associated risks of doing so, Vector requires that such data be de-identified before being transferred into Vector’s data and analytics environments, and confirms this with the data provider prior to transfer.

How do businesses and consumers use AI systems and AI-assisted and AI-generated content in your area of knowledge, work, or organization?

Vector’s AI solutions are applied by its industry sponsors, institutional partners, and small- and medium-sized enterprises across domains such as health, financial services, and manufacturing, to support functions such as customer support, analytics and insights, personalized training, and research and development.

While automation and generative AI tools can offer efficiency and scalability, Vector emphasizes human-centred, carefully controlled processes to maintain high standards of accuracy, innovation, and ethical compliance.

Text and Data Mining

What would more clarity around copyright and text and data mining (TDM) in Canada mean for the AI industry and the creative industry?

More clarity, and specifically definitions and limitations on liability related to the use of training data for generative AI, could foster a more collaborative, innovative, and legally secure environment for both the AI and creative industries. Such definitions and limitations must be at least as permissive and clear as those of Canada’s closest major competing AI jurisdiction: the US. There will be a public policy push to balance the interests of technology advancement with the protection of intellectual property, carving out appropriate exceptions where overly burdensome requirements for protecting IP would stifle innovation and discovery. However, if that exercise results in a regime that is not competitive with the US (Canada’s closest competitor, which has greater funding, greater market access, and greater VC capacity), it will weaken Canada’s international standing in AI.

Are TDM activities being conducted in Canada? Why or why not?

TDM activities are being conducted in Canada, with their application spanning numerous sectors including academic research, market analysis, and biomedical research, as well as in the development of AI/ML models. While TDM is a routine practice in domains like social media and natural language processing (NLP), its use in the health sector is comparatively less frequent, particularly when dealing with patient data.

Are rights holders facing challenges in licensing their works for TDM activities? If so, what is the nature and extent of those challenges?

Designing fair and clear licensing agreements for TDM is complex, as these must address factors like scope of use, duration, content type, intellectual property, and compensation, concepts that, in their current forms, do not always map neatly onto AI/ML use cases. The mismatch originates in the fact that traditional processes require a hypothesis to be matched to a dataset in order to receive approval: the researcher has a signal they are looking for or a problem they are trying to solve. These processes are less effective when mining data using ML techniques, since the researcher may not know at the outset the pattern they are looking for or the problem they are trying to solve.

The evolving legal framework for TDM adds uncertainty, especially with varying laws across jurisdictions, leading to concerns among rights holders about the legal implications of licensing their works; companies may be inclined to move to a jurisdiction with clearer limits on definitions and liability, even if the jurisdiction is less competitive in terms of AI talent generation needed to expand the company.

There is also apprehension about potential misinterpretation, misuse, or reputational damage from how the extracted data might be used. Such reputational damage surfaces only after the fact: an unclear definition of liability may lead an entity to take on a risk it could not properly assess or, conversely, to make a risk-managed decision that is later found to be problematic because the definitions were unclear.

What kind of copyright licenses for TDM activities are available, and do these licenses meet the needs of those conducting TDM activities?

- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0): This license allows free use, sharing, and modification of the work, provided that: the original creator is credited; it is not used for commercial purposes without permission; and any adaptations are distributed under the same license.

- Creative Commons Attribution 4.0 International (CC BY 4.0): This license permits the distribution, remixing, adaptation, and building upon existing work, even commercially, as long as the original creator is properly attributed. Users are free to use, share, and modify the work with credit to the author.

- Creative Commons Attribution-ShareAlike 3.0 License: Users are free to use, share, and adapt the licensed work, with the condition of attributing the original creator and distributing any modifications under the same license.

- Creative Commons Zero (CC0): This tool allows creators to place their works into the public domain, removing copyright restrictions. It enables unrestricted use, modification, and distribution of the work without requiring attribution or imposing any conditions.

- GNU General Public License (GPL): This free software license allows users to use, modify, and distribute software, while protecting software freedom and the rights of users. Version 3.0 (GPLv3) is the most recent version of this license.

- Creative Commons Zero 1.0 Universal Public Domain Dedication (CC0 1.0): This public domain dedication tool allows creators to waive their copyrights, dedicating their works to the global public domain for unrestricted use, modification, and distribution without needing attribution.

If the Government were to amend the Act to clarify the scope of permissible TDM activities, what should be its scope and safeguards? What would be the expected impact of such an exception on your industry and activities?

Amendments should focus on defining the types of data (e.g., health, social media, etc.) that can be mined, appropriate purposes for TDM, and should ensure robust safeguards for privacy and intellectual property that balance the big data needs for training high quality, accurate models, with the protection of individual and corporate rights. This may require new business models for how content creators license their outputs and data.

Should there be any obligations on AI developers to keep records of or disclose what copyright-protected content was used in the training of AI systems?

Yes, AI developers should keep and disclose records of copyright-protected content used in training AI systems to ensure transparency and adherence to copyright laws, support ethical practices, build trust, facilitate accountability, and contribute to research and development in the field; however, this requirement needs to be balanced with protecting trade secrets and privacy. It is worth noting that while records may be generated automatically, the scale of the datasets used to train LLMs and other large models is such that those records are likely far larger than could ever be reviewed by a human.
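Such automatically generated records could, for instance, capture per-item provenance at ingestion time. The schema and URL below are a hypothetical sketch, not a description of any existing system:

```python
# Hypothetical per-item provenance record captured at ingestion time:
# what was ingested and under what license, without storing the text itself.
import hashlib

def provenance_record(source_url, text, license_tag):
    """Summarize one ingested item: source, content hash, size, license."""
    return {
        "source": source_url,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "chars": len(text),
        "license": license_tag,
    }

# At LLM scale this log would hold billions of entries: machine-auditable,
# but far too large for line-by-line human review.
log = [provenance_record("https://example.org/doc1", "some ingested text", "CC-BY-4.0")]
```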

What level of remuneration would be appropriate for the use of a given work in TDM activities?

The following factors should be considered for the use of a given work in TDM activities:

- The type and significance of the work being used (academic or research-oriented works might have different valuation compared to commercial or proprietary content);

- Whether the outputs of the activity will be used commercially (e.g., commercial uses typically necessitate higher remuneration than non-commercial, educational, or research purposes);

- The prevailing market rates in competing jurisdictions for similar types of content and TDM activities, which should be applied to ensure fairness and competitiveness; and

- Pre-existing agreements or standard industry practices around licensing content for TDM.

Authorship and Ownership of Works Generated by AI

Is the uncertainty surrounding authorship or ownership of AI-assisted and AI-generated works and other subject matter impacting the development and adoption of AI technologies? If so, how?

Businesses and content creators are hesitant because of the uncertainty surrounding ownership rights and the potential for monetization of AI-generated content, especially in the legal and intellectual property domains. Clarity is necessary for AI development, and there are ethical questions about giving credit to AI vs human creators, which can hinder collaboration and data sharing. Users may be reluctant to interact with AI-generated content whose legal status is unclear, which has an adverse effect on consumer trust and market dynamics.

Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?

If this exercise results in a regime that is not competitive with the US (Canada’s closest competitor, which has greater funding, greater market access, and greater VC capacity), it will weaken Canada’s international standing in AI.

Infringement and Liability regarding AI

What are the barriers to determining whether an AI system accessed or copied a specific copyright-protected content when generating an infringing output?

The details of training commercial models are typically restricted, and the trend is toward even more closed systems. The market environment is competitive, and model providers will resist releasing details on training data composition. Moreover, the move towards custom models (fine-tuning or retrieval augmented generation) means that there are many more models proliferating that are created by small teams without oversight.

When commercializing AI applications, what measures are businesses taking to mitigate risks of liability for infringing AI-generated works?

More businesses are turning towards local or self-hosted models, some of which are trained on base models with transparent training processes.

Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?

The most recent development of international interest was a commitment from OpenAI on November 9, 2023 that it will cover the legal costs of users who are sued for copyright infringement; the company now has a regime in place to provide such guarantees. To the extent that the US legal regime and the company’s position allow it to make this guarantee, one could speculate that any company or jurisdiction unable to provide a competitive, equivalent assurance will lose ground to the companies and jurisdictions that can.

Comments and Suggestions

N/A

Verto Health

Technical Evidence

For training datasets, we use industry-specific datasets available in the public domain, or internally produced synthetic data based on statistical representations of data attributes.

Training data sets are used to train ML algorithms. We might also extract subsets of data to create test data sets that are used to validate the trained ML algorithms. The content of these extracts supports the development of net-new data that does not belong to a single entity; in many cases this training data can be randomly generated to approximate real-world data.
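Randomly generated synthetic data of this kind can be sketched as sampling from per-attribute summary statistics rather than copying any real record. The attribute names and values below are illustrative assumptions:

```python
# Illustrative synthetic-data generation: summarize real records into
# per-attribute value frequencies, then sample new records from those
# frequencies so no real record is reproduced.
import random

def summarize(real_records, attribute):
    """Statistical representation: value frequencies for one attribute."""
    counts = {}
    for r in real_records:
        counts[r[attribute]] = counts.get(r[attribute], 0) + 1
    return counts

def sample_synthetic(stats_by_attr, n, seed=0):
    """Generate n records by sampling each attribute independently."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        record = {}
        for attr, counts in stats_by_attr.items():
            values, weights = zip(*counts.items())
            record[attr] = rng.choices(values, weights=weights)[0]
        out.append(record)
    return out

real = [{"sex": "F", "age_band": "30-39"}, {"sex": "M", "age_band": "30-39"}]
stats = {a: summarize(real, a) for a in ("sex", "age_band")}
synthetic = sample_synthetic(stats, 5)
```

Because attributes are sampled independently, the synthetic records approximate the real data's marginal distributions without belonging to any single source record.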

To mitigate any liability risks related to infringement of data used for training or validation, we respect all licensing, use, and access policies required by the owner/publisher of the training data sets. For example, we would complete registration and/or credentialing requirements for access and use of the data set. Data sets are used for training ML algorithms; data are not embedded in our ML algorithms. Data may also be used to validate the trained algorithms. For internally generated synthetic data, we take precautions to ensure that the data are not embedded in our algorithms (preventing leakage and the potential for infringement of our data by others), nor do we publish our internally produced data sets.

Further, we require that humans be involved in all steps of the coding/creation, training, and validation of ML algorithms. Humans must create the bulk of a work product (source code, ML algorithm, script, etc.).

Within our organization, we are using generative AI in limited ways to: a) generate and/or optimize source code; and b) support general business productivity activities, such as generating base templates for agendas, synthesizing internally generated notes or documents to extract themes or actions, identifying industry-standard responsibilities for specific role descriptions, or trivial activities such as suggesting ice-breaker activities.

With respect to the use of generative AI for source code, our use is restricted to: a) generation of non-core source code and/or code that is easily replaced; b) generation of small snippets of code only (i.e., the generated code cannot form a substantial part and/or the full body of the code); and c) forming the basis for a section of code. In all cases, humans must review and integrate generated code and must create the majority of the content and/or any content that is novel, core, or unique.

Within our industry domain (healthcare), AI is increasingly used to analyze, identify and/or predict patient and population care, outcomes or clinical recommendations. AI is also used to synthesize and/or codify demographic, clinical and other information. ML algorithms can be very specific to condition or clinical workflow. In our organization we are using AI to identify patient and population clinical pathways, to identify population cohorts and clusters and to identify and predict patients or populations deviating from pathways that are based on clinical concepts. To-date, the focused approach to these algorithms is a probabilistic one as opposed to generative. In the future, there may be an opportunity for generative tasks but that is not the case to-date.

Text and Data Mining

Clarity around copyright and TDM would include a plain language explanation of the Act and implications for TDM with industry specific examples. Clarity would also include definitions of authorship, where and when use of generative AI in TDM constitutes infringement and clarity on liability and protection strategies.

With respect to TDM activities, we are proceeding with caution. Internal guidance includes requesting citations and references with any query; separately validating any information returned; and identifying and respecting any licensing, copyright, or other ownership requirements. We are restricting use of ChatGPT to closed versions, available only to our organization and only for certain activities (described in the prior section). We have an acceptable use policy which will undergo legal review to ensure we are considering and safeguarding against infringement and liability, and protecting any IP or copyright (ours or others’).

With respect to keeping records of use, we require that AI-generated content be validated (references, citations, licenses) where possible and that the work be properly attributed. With respect to the use of data sets, we require that our internal users abide by registration, certification, and use policies for access and use of those data sets. As we move into using generative AI for generation of code snippets and/or optimization of code, we require that the sections of source code that were generated be tagged/identified within the source code.
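Tagging generated sections within source code can be as simple as wrapping each snippet in provenance markers. The marker format below is a hypothetical sketch of one possible convention:

```python
# Hypothetical convention for tagging AI-generated snippets in source files,
# plus a helper to count how many lines sit inside generated sections (so a
# human can verify generated code stays a small fraction of the file).
GEN_BEGIN_PREFIX = "# --- AI-GENERATED: begin"
GEN_END = "# --- AI-GENERATED: end ---"

def tag_generated(snippet, tool, date, reviewer):
    """Wrap a generated snippet in markers recording its provenance."""
    begin = f"{GEN_BEGIN_PREFIX} ({tool}, {date}, reviewed by {reviewer}) ---"
    return "\n".join([begin, snippet, GEN_END])

def generated_line_count(source):
    """Count lines lying between begin/end markers in a source file."""
    inside, count = False, 0
    for line in source.splitlines():
        if line.startswith(GEN_BEGIN_PREFIX):
            inside = True
        elif line.startswith(GEN_END):
            inside = False
        elif inside:
            count += 1
    return count
```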

With respect to amendments to the Act to clarify the scope of permissible TDM activities, we do not have sufficient expertise to comment on how the Act should be amended (scope and safeguards) and/or the expected impact of such an exception on our industry and activities.

At present we do not have the expertise or sufficient experience to comment on remuneration or on approaches from other jurisdictions that could inform policy for Canada.

In areas where we lack expertise we are responding by limiting or restricting use, proceeding within known constraints and best practices and seeking legal counsel for clarification and guidance.

Authorship and Ownership of Works Generated by AI

We are proceeding with caution on the use of AI and seeking legal guidance for appropriate use and to better understand terms and conditions for generative AI tools with respect to liability, infringement and protection.

When using generative AI we have found that it is not always easy to identify attribution and/or ownership. Citations/references returned by generative AI do not always correlate to actual sources of information.

Infringement and Liability regarding AI

We do have concerns related to the potential for infringement of copyright. With respect to infringement of our copyrighted works by others we have varying degrees of protection including operational agreements (NDAs, employee agreements etc.), information classification and use and disclosure policies, and maintaining privacy and control over works such as source code. Regardless, as the use of generative AI evolves, it is unclear how publicly available information could be incorporated into works being authored outside of our organization and how we would be aware of such infringement and/or detect it.

With respect to infringement on works authored by others, we have spoken to the fact that it is difficult to attribute works of content generated by AI. As such we are very careful about how we use AI to protect ourselves.

With respect to commercialization of our AI applications, we mark our source code and other materials with Copyright, we have agreements in place (internal and external) and we also develop and seek patent protection for novel works.

Similar to prior comments, plain language guidance that is industry specific would be helpful.

We do not have sufficient expertise to comment on how approaches in other jurisdictions can inform a Canadian policy.

Comments and Suggestions

As a start-up/scale-up in digital health with a number of innovations in various stages of patent protection, we engage legal counsel for activities related to information protection, liability, disclosure, etc. These activities are expensive but are important to protect us. As we move into areas such as generative AI, we have found that we are required to engage legal counsel for guidance and advice. These services are expensive for small, growing organizations to undertake, but are vital to ensuring that innovation, IP, and copyright can be undertaken and advanced, and that we remain productive through the use of available resources. Support from government, in the form of information, guidance, considerations, and/or funding, would help emerging innovations and organizations continue to be protected and allow organizations to leverage these technologies to their advantage. The financial impact of proceeding in alignment with legal and ethical concerns, while protecting our works, is substantial.

Another area of consideration is how other legislation and regulation impacts the use of these technologies in our domain. For a healthcare organization specifically, legislation such as PIPEDA and PHIPA (in Canada) may conflict with generative AI: the use of tools may lead to inadvertent leakage of information or inappropriate context for specific jurisdictions, and healthcare organizations may lose trust in systems or algorithms if the underlying technology is not clear in its attribution and/or best practices are not followed. We think about these requirements, as well as guidelines for the ethical use of AI in healthcare, and the potential areas of impact when determining how and where we will use generative AI. In many areas we have proceeded with an abundance of caution, which may impact our ability to innovate and/or be productive.

W

Writers’ Guild of Alberta

Technical Evidence

N/A

Text and Data Mining

The issue in my sector with TDM for the development of AI is that artists' work is being scraped and used for gen AI models that compete against the artists themselves. There should be a distinction between TDM for research purposes and for commercial purposes. AI companies should not be able to mine artists' work to create generative models that the companies alone benefit from commercially while removing financial compensation from the artists.

Authorship and Ownership of Works Generated by AI

I'm concerned that establishing copyright ownership and authorship for AI-generated works disproportionately benefits the AI companies while devaluing the rights of artists. If AI art is generated by a machine with only some prompt input by a human, why should the prompter hold copyright when anyone can use the same AI program and get it to generate something similar? Who owns the copyright of an AI output if it's not distinguishable from the work of an artist that was scraped to create the AI model? If granting copyright to AI prompters or companies means artists lose any rights at all, I'm not in favour of allowing any copyright of AI works.

Infringement and Liability regarding AI

Businesses should not receive any liability protection for scraping the work of artists to compete against the scraped artists. I don't see any benefit to commercial use of AI-generated artworks and the use of datasets in the arts should be for research purposes only.

Comments and Suggestions

I don't think there is any ethical way to give copyright protection to AI works. The priority must be protecting the copyright of artists that are feeding the dataset. If this results in gen AI art not being commercially viable, so be it. Many artists are losing jobs in the arts sector, which indicates that gen AI art is not creating jobs, it's eliminating them.

Writers Guild of Canada

Technical Evidence

IN YOUR AREA OF KNOWLEDGE OR ORGANIZATION, WHAT MEASURES ARE TAKEN TO MITIGATE LIABILITY RISKS REGARDING AI-GENERATED CONTENT INFRINGING EXISTING COPYRIGHT-PROTECTED WORKS?

Pursuant to section 13(1) of the Copyright Act of Canada (the Act), the general rule under the Act is that the “author” of a work is the first owner of the copyright therein.

Screenwriters are the authors and first owners of the copyright in their scripts. This is expressly recognized in the current Independent Production Agreement (IPA), which is the collective agreement currently in force between the Writers Guild of Canada (WGC) and the Canadian Media Producers Association (CMPA). (Please see: https://www.wgc.ca/screenwriters/resources/agreements/search_agreements/index)

Specifically, Article A701 of the IPA states: “All rights negotiated under this Agreement or in any individual contract between a Writer and a Producer shall be in the form of a license from the Writer to the Producer for a specific use during a specified term of whatever right is in question. The Writer’s copyright shall not be assigned. The copyright herein referred to is the copyright in the Writer’s Script Material, which is separate and distinct from the copyright in the Feature Film or program.”

In this context, the IPA includes the following warranties and indemnities.

Article A709(a): “The Writer or Story Editor warrants that, to the best of his/her knowledge, information and belief the Script Materials to be provided by him/her hereunder: i) are original to the Writer or Story Editor; ii) do not infringe the copyright of any person; iii) do not defame any person; and iv) do not invade the right to privacy of any person. The foregoing warranty does not apply to material included in the Script Materials supplied to the Writer or Story Editor by the Producer, or in respect to any claim or action that arises from any change made in the Script Materials delivered by the Writer or Story Editor to the Producer after such delivery.”

Article A709(b): “The Producer warrants that, to the best of the Producer’s knowledge, information and belief, any material supplied to the Writer by the Producer for the Writer or Story Editor to incorporate in the Script Materials to be provided by the Writer or Story Editor hereunder: i) do not infringe the copyright of any person; ii) do not defame any person; and iii) do not invade the right to privacy of any person; and covenants that no Script Material supplied by the Writer or Story Editor to the Producer shall be used by, or with the approval of, the Producer in such a manner as to defame any person or to invade the right to privacy of any person or to violate the provisions of the Criminal Code of Canada in with respect to child pornography, or obscenity or any like offenses.”

While these provisions predate the emergence of generative AI as a viable source of script materials, the WGC takes the view that Articles A709(a)(i) and (ii), and Article A709(b)(i) are applicable in this context. Subject to the state of the law of copyright in Canada, these articles limit or prevent the use of AI-generated materials by both screenwriters and producers under the IPA.

As of the date of this submission, the WGC is in the process of bargaining the terms of the next IPA, and is naturally aware of the potential impacts of recent developments of generative AI in this context.

Text and Data Mining

WHAT WOULD MORE CLARITY AROUND COPYRIGHT AND TDM IN CANADA MEAN FOR THE AI INDUSTRY AND THE CREATIVE INDUSTRY?

Broadly speaking, TDM sits at the front end of a pipeline in which valuable human-created material is extracted, used to train generative AI systems, and then those systems are used to create value for their developers and users. In other words, it is a process that extracts something of value from the work of human creators as “inputs”, often for little or no compensation, and then conveys that value to developers and users in the form of generative AI “outputs”. Value is transferred from one entity or entities to another entity or entities, often without the knowledge, consent, credit, or compensation of the former.

In this sense, the “M” in “TDM”—i.e. the “mining”—is a deeply misleading term. In traditional mining, mineral wealth exists in the ground due entirely to natural processes, and not in any way due to human effort. Nobody “made” the iron ore in an iron mine; it existed there long before humans did, and the human effort in the iron value chain begins with its discovery and extraction from the earth.

This is fundamentally not the case with “text and data mining”. In the case of TDM, the “T” and the “D” are only available for the “M” because human beings created the text and data in the first place. Moreover, they did so through expending effort, and often significant effort. Indeed, in complete contrast to the mining of minerals, the human effort involved in creating a given work under copyright is much more significant than that involved in scraping it from the Internet.

At the same time, there is an enormous asymmetry of information between AI developers and the creators whose works are being “mined”. In many cases, if not most, the asymmetry is total, with AI developers having ALL the knowledge available on what works are being mined, and how that information is subsequently used, while creators have NONE of that knowledge. Many creators do not even know whether or how their work has been used in TDM in the first place.

Given all this, there is an enormous need for transparency in TDM, and this is a place for copyright law and policy to start. AI developers must have certain basic obligations for disclosure and reporting, to holders of copyright, to the public, or both. AI platforms should be required to comply with transparency requirements, including, but not limited to, publishing records of the copyright-protected works that were ingested into the platform.

This clarity would allow creators and rights holders, once they know whether and how their work is being used, to advocate and/or negotiate for fair compensation for its use, or to consent to its use in the first place. As noted above, creators have made something of value to AI developers. We know it is valuable, because AI developers are using it through TDM, and generative AI is attracting significant financial investment and user activity. Greater clarity around TDM would allow that value chain to operate in a way that is beneficial to everybody along it, and not just those at the tail end.

Broadly speaking, the WGC believes that TDM activities should be subject to the “three C’s” of consent, credit, and compensation for rightsholders and authors.

Further, the WGC believes that TDM for the purposes of training generative AI does not and should not constitute fair dealing, either now or in the future.

ARE TDM ACTIVITIES BEING CONDUCTED IN CANADA? WHY OR WHY NOT?

As noted above, given the near-complete information asymmetry with respect to TDM, the WGC cannot say whether TDM activities are being conducted in Canada. Given the apparent ubiquity and pervasiveness of the practice, however, we would have every reason to believe that they are indeed being conducted in Canada.

See the WGC’s further comments elsewhere here on the need for transparency on TDM activities.

IF THE GOVERNMENT WERE TO AMEND THE ACT TO CLARIFY THE SCOPE OF PERMISSIBLE TDM ACTIVITIES, WHAT SHOULD BE ITS SCOPE AND SAFEGUARDS? WHAT WOULD BE THE EXPECTED IMPACT OF SUCH AN EXCEPTION ON YOUR INDUSTRY AND ACTIVITIES?

Please see our comments above regarding transparency and information asymmetry in this context. In particular, AI platforms should be required to comply with transparency requirements, including, but not limited to, publishing records of the copyright-protected works that were ingested into the platform. Broadly speaking, the WGC believes that TDM activities should be subject to “the three C’s” of consent, credit, and compensation for rightsholders and authors. Further, the WGC believes that TDM for the purposes of training generative AI generally does not and should not constitute fair dealing, either now or in the future.

Given the above, the WGC feels strongly that the Government should not create any new copyright exceptions to facilitate TDM, whether as a fair dealing exception or otherwise. Exceptions that allow AI companies to freely use copyrighted works for AI training purposes would erode the objectives of the “three C’s” of consent, credit, and compensation for rightsholders and authors. Such exceptions would eliminate the “consent” element completely, and would further undermine the other two, at the very least, either by providing no compensation of any form, as with a full fair-dealing exception, or through compelled licensing, which would undermine rightsholders’ ability to negotiate compensation or withhold their rights if a deal cannot be struck. Removing the latter option naturally hobbles rightsholders in negotiations with AI companies. The price of a thing is grounded in the fact that the potential buyer doesn’t get to have it if the seller doesn’t agree to the price. Taking that away benefits the potential buyer and hurts the potential seller. Forcing rightsholders to make works available for TDM benefits AI companies and hurts rightsholders and authors.

The WGC also opposes an “opt-out system” for the use of copyrighted works in AI training. An opt-out model would place an enormous burden upon rightsholders, some of whom are individual creators with limited resources, requiring them to monitor multiple AI platforms—or all AI platforms—and then send notice to each advising that the rightsholder has chosen to opt out of the exception. What happens with regard to any copying that took place before they opted out? This would be a significant burden to place on copyright owners—again, some of whom are individual artists—vis-à-vis typically much larger and more powerful corporations, and is disproportionate both to the problem and to the relative bargaining power of the parties.

SHOULD THERE BE ANY OBLIGATIONS ON AI DEVELOPERS TO KEEP RECORDS OF OR DISCLOSE WHAT COPYRIGHT-PROTECTED CONTENT WAS USED IN THE TRAINING OF AI SYSTEMS?

Yes. Generative AI platforms should be required to comply with transparency requirements, including, but not limited to, publishing records of the copyright-protected works that were ingested into the platform.

The WGC is not currently in a position to provide further detail on exactly what that transparency would look like, given the near-total information asymmetry discussed above. We presume that such details would be addressed in subsequent steps of this or a later government consultation process. But the principle of such transparency, and the further development of how it would work, are vital.

WHAT LEVEL OF REMUNERATION WOULD BE APPROPRIATE FOR THE USE OF A GIVEN WORK IN TDM ACTIVITIES?

Given the information asymmetry discussed above, the WGC cannot effectively answer this question, as we lack the information upon which to do so. We do not know which copyright-protected works may have been used in the training of a given AI system, and we do not have quantitative data on the value generated by the AI system based on the use of those works. In addition, see our comments above that the Government should not create any new copyright exceptions to facilitate TDM. Exceptions that allow AI companies to freely use copyrighted works for AI training purposes would presumably upend the objectives of the “three C’s” of consent, credit, and compensation for rightsholders and authors or, at the very least, the “consent” element.

Further, we question the premise of the question that the Government should set, or be involved in setting, the level of remuneration that would be appropriate for the use of a given work in TDM activities in the first place. Consistent with “the three C’s”, and in conjunction with robust transparency obligations, it should be up to rightsholders and authors to negotiate with AI developers on whether, how, and for what remuneration their works can be used to train AI systems. In this sense, we submit that the Government should be empowering creators to engage with AI developers on a level playing field, rather than setting the price itself, or compelling the transaction in the first place.

ARE THERE TDM APPROACHES IN OTHER JURISDICTIONS THAT COULD INFORM A CANADIAN CONSIDERATION OF THIS ISSUE?

The WGC is not in the position at this time to provide a comprehensive review of the approaches in other jurisdictions on this issue. Broadly speaking, inter-jurisdictional comparisons of law and policy are highly complex and, to be done properly, involve a deep understanding of the economic, social, and political contexts of those jurisdictions and how they compare to those of Canada. Ideally, they would also consider what the outcomes of those approaches have been. In the case of generative AI, it is very early in the process and such outcomes are unlikely to be well understood, or known at all.

Authorship and Ownership of Works Generated by AI

SHOULD THE GOVERNMENT PROPOSE ANY CLARIFICATION OR MODIFICATION OF THE COPYRIGHT OWNERSHIP AND AUTHORSHIP REGIMES IN LIGHT OF AI-ASSISTED OR AI-GENERATED WORKS? IF SO, HOW?

As stated in the Government’s Consultation Paper, “Canadian copyright jurisprudence suggests that ‘authorship’ must be attributed to a natural person who exercises skill and judgment in creating the work, reflective of the fact that the Act ties the term of protection to the life and death of an author.”

The WGC believes that this is correct, both as an expression of the jurisprudence and that the jurisprudence has reached the correct conclusion. It is entirely consistent with the words of the Act, as well as the policy rationales for the existence of copyright in the first place, for authorship to be attributed to natural persons—to human beings—alone, and not to generative AI, nor to any other type of non-human source.

There are two generally accepted policy rationales for the existence of copyright. One sees copyright from the perspective of users, as a means to incentivize and promote the creation of works that ultimately benefit societies at large. The other sees copyright from the perspective of authors, as a natural right of a person to the fruits of their labours in the exercise of their skill and judgement. In both cases, these rationales are underpinned by the word used to describe what is copyrightable under the Act, namely, “works”. “Works” naturally involve work—human effort, without which such works don’t exist. This is fundamental to any reasonable policy rationale for copyright.

Neither of copyright’s rationales justifies copyrightability being vested in AI-generated “works”. Importantly, AI-generated outputs involve virtually no effort on the part of the user to create. Currently, a user simply enters basic text prompts into the generative AI and receives back complex text, visual, audio, or audiovisual outputs in return. Other types of non-text inputs may exist now or in the future, but the crucial fact remains that these inputs represent the tiniest fraction of the effort that would be required to create a similar copyrightable work by non-AI-generated means. For example, a complete novel that might require a year or more for a human to write can be spat out by a generative AI in mere minutes, or even seconds. The difference is one of orders of magnitude.

Given this, there is no reason for copyright law to protect such outputs for the benefit of the user, whether based on the rationale of incentivizing creation for the benefit of society, or the rationale of protecting the right of a person to the fruits of their labour. In the latter case, there is no meaningful “labour” to protect, and in the former case, there is no shortage of AI-generated works, and thus nothing in need of incentivizing.

Similarly, as it pertains to the developers of generative AI, there is clearly no need to incentivize their work under copyright either, as demonstrated by the existing fact that copyright is currently not ascribable to AI-generated works in key jurisdictions like the United States, yet billions have already poured into AI development, and not from any reasonable expectation that copyrightability of resulting outputs is somehow on the horizon.

Given this, there is an opportunity for the Government to amend the Copyright Act to make it crystal clear that an “author” is, indeed, a natural person—a human being—and not a machine. We recommend that the Government do so. (This will be particularly important if the Government chooses to clarify that performers are human beings, as it will then be inconsistent for the Act to make that clarification, but not clarify the same issue with respect to authors.) To reiterate, however, the WGC believes it is clear that the Act as currently drafted already requires that authors are human beings, and AI cannot be an author.

In addition, as stated in the Consultation Paper, “A human may contribute sufficient skill and judgment in a work produced with the assistance of AI technologies to be considered the author of the work.” We submit that the Government should amend the Act to specify the standard that a contribution of “sufficient skill and judgement” must meet in the context of AI, and that this should be a high standard—or, at the very least, a higher standard—requiring a significant contribution of human input.

The WGC is particularly concerned about the threat of a practice we call “copyright laundering”. Copyright laundering may be a particular risk for creators such as screenwriters, who work in an expensive and highly commercialized medium—in our case, film and television—and who collaborate with producers and others at an intermediary stage in the larger creative process towards a final production.

In film and television production, screenwriters work with producers and/or content commissioners like broadcasters or streamers. The process begins with an original idea or existing source material, like a novel, which is then developed into a script that ultimately goes into production to become a film or television show. This development process is a creative process in and of itself, attracting both remuneration for the work done by the screenwriter and recognition for the creativity involved, including in the form of credit on the production and critical acclaim.

Copyright laundering would occur when a producer or content commissioner approaches a screenwriter with material from a generative AI and asks the screenwriter to rework that material to such a degree that it becomes copyrightable. Under a legal framework like the current one, the producer or content commissioner would likely know that simply producing the script generated by AI without sufficient skill and judgement from a human writer would put them at significant risk of not having a copyrightable film or television show in the end, and therefore not being able to effectively commercialize a significant investment in its production. But if the standard for “sufficient skill and judgement” from a human screenwriter is low enough, the producer or content commissioner could generate a script using AI for extremely low or no cost, and have a human writer “launder” the script, seeking to pay the screenwriter significantly less, based on the (specious) argument that they “didn’t do as much work” as if they were working from an original idea. (And without having to benefit other human artists or rightsholders through the purchase of the rights to human-created source material, such as a novel.)

Such a practice would not necessarily eliminate the role of human screenwriters altogether, but it could reduce the amount which screenwriters are paid, threatening the economic viability of screenwriting as a profession. At the same time, it could also diminish the creative status of human screenwriters, as audiences and others may question just how much the screenwriter—or any other artists working on the production involving AI, for that matter—actually contributed to the final work. Indeed, whether accurately or not, producers, content commissioners, and/or audiences could come to see screenwriters not as artists and creators, but as mere formalistic legal requirements for a copyrightable production, whose skill and ideas are worth less than their mere existence as human beings.

The issue of the economic and creative status of screenwriters and other artists must also be considered in light of the development and maintenance of a talent pool. It is possible that encroachment of generative AI into creative fields would not eliminate all relevant creative roles immediately. Many senior, established artists may remain in demand for a number of reasons, such as their name recognition, track record, and/or unique individual style. But what about more junior and mid-level creators who are trying to establish themselves? Creative skill and talent are rarely things that artists are simply born with, fully formed. They are developed over the course of a career. Like any skill, creative work needs to be practiced and honed. A unique creative voice is something that an artist often finds within themselves after significant effort to unearth it. That is a process that takes years, and is often only possible if and when the artist can financially sustain themselves while it happens, through the process of making art itself. Established creative industries provide incubators for talent, and a pipeline for younger or newer artists to earn a living through creativity while they hone their craft.

If generative AI is allowed to disrupt that pipeline by rendering less experienced creators unnecessary, the short-term impacts may only be felt by those creators. But the long-term impacts will be felt by everybody, as the conveyor belt that develops and delivers talent into more senior roles shuts down. And then we will all be worse off as a result.

ARE THERE APPROACHES IN OTHER JURISDICTIONS THAT COULD INFORM A CANADIAN CONSIDERATION OF THIS ISSUE?

Broadly speaking, inter-jurisdictional comparisons of law and policy are highly complex and, to be done properly, involve a deep understanding of the economic, social, and political contexts of those jurisdictions and how they compare to those of Canada. Ideally, they would also consider what the outcomes of those approaches have been. In the case of generative AI, it is very early in the process and such outcomes are unlikely to be well understood, or known at all.

That said, while the UK may at first glance appear to be a comparable jurisdiction, its approach to the authorship and ownership of AI-generated works is a global outlier position—comparable to its approach to authorship of cinematographic works, which is also a global outlier—is inconsistent with Canadian copyright law, and should not be followed in Canada.

Infringement and Liability regarding AI

WHAT ARE THE BARRIERS TO DETERMINING WHETHER AN AI SYSTEM ACCESSED OR COPIED A SPECIFIC COPYRIGHT-PROTECTED CONTENT WHEN GENERATING AN INFRINGING OUTPUT?

Please see our comments elsewhere in this consultation regarding transparency and information asymmetry in this context. This asymmetry is virtually total, with AI developers having ALL the knowledge available on what works are being mined, and how that information is being subsequently used, while creators have NONE of that knowledge. Many creators do not even know if or how their work has been used in TDM in the first place. This is a significant barrier.

Comments and Suggestions

N/A

The Writers' Union of Canada

Technical Evidence

N/A

Text and Data Mining

Question: What would more clarity around copyright and TDM in Canada mean for the AI industry and for the creative industry?

Clarity in the Act around the responsibilities of industrial users of professionally created content is essential for the maintenance of existing markets and the creation of valuable new ones.

We have seen how lack of clarity around education as a category of fair dealing has led only to over-reach, costly litigation, greater confusion, and a broken market for published work in educational settings.

Text and Data Mining (TDM) activity requires industrial scale copying of created works. It must be regulated by strong and clear guidelines stemming from the Copyright Act.

Question: Are TDM activities being conducted in Canada? Why is it the case or not?

It is very difficult to answer this question authoritatively, because as we’ve seen from the prominent TDM activities resulting in AI services, much of that TDM and training work is done unadvertised and quietly without engaging with or even informing the creators whose work is being used.

That said, we are reasonably sure TDM activities do take place in Canada as Large Language Model research has been happening at Canadian universities for a very long time.

Question: Are rights holders facing challenges in licensing their works for TDM activities? If so, what is the nature and extent of those challenges?

The main challenge facing rightsholders around licensing of work for TDM is that there is simply no initiative from within the TDM sector to engage in licensing, or even to inform the creative sector about their work. The Books3 dataset was revealed by an investigative report in the press, but otherwise the training and datasets used for the training are essentially behind a curtain.

Creative professionals can be reasonably sure their work is being used because the outputs of AI Chat services show clear and deep familiarity with the work, but the lack of transparency in TDM does not indicate good faith engagement.

There are mechanisms in place that could have been used by TDM companies to engage with and seek permission from professional creators. Licensing collectives and the Copyright Board have become too easy to avoid or circumvent through vague and bad faith claims of fair dealing. This points to a fundamental weakness in Canada’s Copyright Act. The fence that is supposed to define and protect our intellectual property has been so riddled with exceptions that it no longer functions as a fence.

Over a decade of undermining the Canadian market for rights under copyright has created the impression that there is no marketplace for rights, and that they can simply be ignored.

Question: What kind of copyright licenses for TDM activities are available, and do these licenses meet the needs of those conducting TDM activities?

There are both direct and collective licensing models already in existence, and the potential for TDM-specific licenses to emerge. The Writers’ Union of Canada has developed new contract terms to ensure our members reserve their rights around TDM and Artificial Intelligence. The creative sector is nimble and able to adapt to new technology and new uses of our work, and we are willing participants in most functioning markets. With good faith negotiation and bargaining, there is a zero percent chance a licensing structure that works for all parties would not emerge.

Question: If the Government were to amend the Act to clarify the scope of permissible TDM activities, what should be its scope and safeguards? What would be the expected impact of such an exception on your industry and activities?

Any new exception for TDM and/or AI training will have a negative impact on creative professionals. Such an exception would further undermine exclusive rights conferred by the Act, would remove any potential for emerging markets, and would damage existing models for monetization and control of cultural work.

TWUC opposes any amendment to the Copyright Act that introduces new exceptions to the exclusive rights conferred by the Act. TDM and AI activities using creative work are acts of copying, and as such fall under the exclusive right of creators. There exists an evolving market for these rights, and an exception would disrupt that natural evolution and destroy a market.

Furthermore, TWUC believes the Copyright Act must now take on the question of AI outputs that directly compete with the work of human creative professionals. We believe the Act should privilege the original work of human creators in all instances of conflict with AI outputs.

If this work to clarify the Copyright Act is not done, Canada is inviting yet another round of costly, time-consuming, and ultimately inconclusive litigation similar to what we’ve seen around educational copying. As was shown with the SCC decision in Access Copyright v. York University, the Copyright Act must be clear and unequivocal in its definitions, and regulators such as the Copyright Board must be given functional authority, or the whole rights market breaks down under legal challenge.

Question: Should there be any obligations on AI developers to keep records of or disclose what copyright-protected content was used in the training of AI systems?

It is essential for transparency both from a market perspective and a moral rights perspective that TDM and AI developers be obligated to keep comprehensive records and disclose their use of copyright-protected materials.

We are already seeing allegations that infringed work was accessed from pirate sites in some cases of AI training. To not require complete transparency only encourages further infringing activities.

A functioning market for TDM and AI licensing depends on the tracking of use and value. Given the technological complexity and sophistication of such operations it would be disingenuous to argue that comprehensive record-keeping and disclosure are burdensome requirements.

Question: What level of remuneration would be appropriate for the use of a given work in TDM activities?

Fair and reasonable price points evolve through market functionality. Without fully understanding the details of the use of a work and the value generated by that use, it is impossible to set an accurate price. A level playing field with effective regulation will foster negotiation and bargaining that will arrive at price points, and allow for the evolution of those price points as uses change.

What is inappropriate is allowing undeclared, unpermitted, and uncompensated use of intellectual property at industrial scale.

Question: Are there TDM approaches in other jurisdictions that could inform a Canadian consideration of this issue?

As with educational copying, TWUC considers the UK model for TDM licensing and permissions to be a workable example when building a framework in Canada. The UK model allows for growth of a rights market around TDM and AI, and has shown no signs of inhibiting the growth of AI development.

Authorship and Ownership of Works Generated by AI

Question: Is the uncertainty surrounding authorship or ownership of AI-assisted and AI-generated works and other subject matter impacting the development and adoption of AI technologies? If so, how?

We do not see any inhibition on the rapid development and growth of AI systems. It would appear that questions of copyright and ownership of intellectual property are not of a high priority for AI developers.

Should they be a high priority? Absolutely. Market considerations around AI inputs and outputs are not restricted to the provision of the service. A market for AI cannot be allowed to develop without consideration for the IP used in both inputs and outputs.

Question: Should the Government propose any clarification or modification of the copyright ownership and authorship regimes in light of AI-assisted or AI-generated works? If so, how?

TWUC believes it is a foundational principle of copyright protection that it be granted to works demonstrating individual human judgement and skill. Allowing copyright protection for AI outputs with minimal to no human creativity involved would quickly concentrate market power for content in the hands of the few largest AI firms, much in the same way advertising revenue was concentrated in just a few tech platforms due to a lack of effective regulation around the sharing of news content.

Governments are now in the position of trying to pull back revenue and control for media rightsholders from an intransigent tech sector. We must learn from that lesson and place proper guardrails and regulations that privilege human creation.

Question: Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?

Professionally created Canadian cultural works play an outsized role in defining Canadian reality and values to the world. To safeguard that work, and keep it from being obscured by an avalanche of AI-generated content that purports to define this country, Canada should put in the work to develop our own strong policy. The UK example of granting copyright protection to computer-generated works should not be followed. Ongoing conversations in the US and UK around privileging human creation are instructive, but only a starting point, and suffer from being reactive to a status quo in which AI-generated content may be considered equal to human-made content. TWUC believes this conversation should start from an agreement that human-generated content is the intended target for rights under copyright.

Infringement and Liability regarding AI

Question: Are there concerns about existing legal tests for demonstrating that an AI-generated work infringes copyright (e.g., AI-generated works including complete reproductions or a substantial part of the works that were used in TDM, licensed or otherwise)?

TWUC is concerned that any Canadian test for infringement is inadequate without complete transparency in the TDM or AI process. TDM and AI developers must be subject to transparency requirements, and providers must be required to keep comprehensive records of inputs and outputs for any rightsholder or court to have a reasonable chance of determining infringement.

Without transparency requirements and discoverable source records, original creators are disadvantaged in protecting and exploiting their rights. One can anticipate the creation of a book series that is strikingly similar to a human-created series, and that then competes for market share without ever revealing that it is directly derivative of the human work. Under copyright as humans have designed it, that scenario is an actionable infringement, but if the infringing input cannot be proven, the human rightsholder is unfairly disadvantaged.

Question: What are the barriers to determining whether an AI system accessed or copied a specific copyright-protected content when generating an infringing output?

Lack of transparency. Lack of record-keeping.

Question: When commercialising AI applications, what measures are businesses taking to mitigate risks of liability for infringing AI-generated works?

TWUC cannot comment on other business practices, but we are recommending that all authors insist on AI-specific rights definition and reservation in author contracts and agreements.

Question: Should there be greater clarity on where liability lies when AI-generated works infringe existing copyright-protected works?

TWUC believes infringement in the AI process could take many forms throughout both the input and output stages. We believe rightsholders should be able to seek joint and several liability among all actors in the AI process: the LLM creator, the AI platform, the application provider, and the end user of AI-generated content. This scenario should be no different from the various liabilities at any stage of any other instance of copyright infringement.

Question: Are there approaches in other jurisdictions that could inform a Canadian consideration of this issue?

As above, Canada should put in the work to develop our own strong policy. The UK example of granting copyright protection to computer-generated works should not be followed. Ongoing conversations in the US and UK around privileging human creation are instructive, but only a starting point, and suffer from being reactive to a status quo in which AI-generated content may be considered equal to human-made content. TWUC believes this conversation should start from an agreement that human-generated content is the intended target for rights under copyright.

Comments and Suggestions

New Exceptions would Damage Existing and Emergent Markets

The Writers’ Union of Canada (TWUC) believes that the creative content uses required by AI must be negotiated within a market for licensing rather than through a damaging new exception (or exceptions) to copyright protection. To be clear, TDM activities involving copyright works infringe the rights of copyright holders under the Copyright Act unless they are specifically permitted by the rightsholder. Any call for a new exception for TDM and AI activities is in fact a request to excuse infringements that have already happened, and continue to happen, in TDM and AI training.

TWUC is concerned that with each new technologically driven “use” of creative content comes a call for a new exception to the exclusive rights of those who create that content. This exception-focus throws the traditional purpose of copyright completely out of balance and privileges the desires of industrial users over the rights of creators. The end result is what has been described as a Copyright Act “like a pasta strainer”: legislation so full of exceptions and loopholes that there remains simply no motivation on the part of users to seek permission or pay for the content they use.

The reflex to create ever greater exceptions is a market destroyer, as has been amply proven over the last decade by Canada’s ill-advised adoption of education as a fair dealing category. That change to the Act has done nothing to provide students or their instructors with more affordable access, but it has transferred hundreds of millions of dollars of earned income away from the cultural sector.

The content uses required by developers of artificial intelligence must be negotiated in the context of a respectful market for licensing. Though they often hide their motivation behind claims of serving the public interest, these are enormously powerful and wealthy corporate entities primarily driven by a profit motive. They can and should be expected to operate in a licensing environment.

Licensing is not a barrier to innovation. Innovation does not mean free riding on the work of others, nor should it; yet it would under a new exception. Licensing preserves the integrity of copyright by giving creators an element of control over when and how their works are used. Licensing also provides legal certainty to good-faith users, and it keeps disagreements at the bargaining table where they belong, not in court.

Annex: Detailed Questions