Microsoft and GitHub

September 17, 2021

Submission of Microsoft and GitHub to Innovation, Science and Economic Development Canada on the Government of Canada’s “Consultation on a Modern Copyright Framework for Artificial Intelligence and the Internet of Things”

Microsoft,Footnote 1 together with its subsidiary GitHub,Footnote 2 welcomes the opportunity to participate in the Government of Canada’s consultation to implement a modern copyright framework to foster innovation and investment in an era of rapidly developing new technologies, such as artificial intelligence (AI) and the Internet of Things (IoT).

In this response to the Government’s accompanying consultation paper (the “Consultation Paper”), our submissions focus on AI, which is among the most powerful technologies of our time and is having a transformational impact on our – increasingly digital – lives in the 21st century. Canada’s copyright laws must be fit for the modern age to: (i) propel innovation to achieve Canada’s AI ambitions; and (ii) achieve the right balance of legitimate interests across the spectrum of uses of AI.

These submissions build on Microsoft’s prior submissions in the 2018-2019 parliamentary review of the Copyright Act by the Standing Committee on Industry, Science and Technology (INDU). We reiterate the importance of the Copyright Act being amended to include an express exception that permits text and data mining (TDM) of lawfully accessed copyright-protected works and other subject matter for the purpose of informational analysis by all users.

It is critical that the Government of Canada encourage and foster an environment where AI is not burdened by unnecessary copyright regulation and remains accessible to all entities and individuals across the AI ecosystem. To do this, Canada should join other countries that have unequivocally established sufficient protections to ensure that AI-related activities, such as TDM, are widely adopted and used to their full potential.

TDM drives machine learning and is the backbone for AI

TDM is essential for the research and development of AI technologies, which rely on machine learning to analyze vast amounts of data and content to identify insights, patterns, and relationships.
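
As a rough illustration of what such analysis can look like in practice, the following minimal Python sketch counts which pairs of terms tend to appear in the same document. It is an assumption-laden toy, not a description of any particular TDM system: the function name `mine_term_pairs`, the simple lowercase word tokenization, and the sample documents are all hypothetical. The point it illustrates is that the texts are read only to compute aggregate statistics; nothing of their expression is kept in the result.

```python
import re
from collections import Counter


def mine_term_pairs(documents, top_n=3):
    """Return the term pairs that co-occur in the most documents.

    Hypothetical helper for illustration: each document is reduced to its
    set of lowercase words, and only pair counts are retained.
    """
    pair_counts = Counter()
    for text in documents:
        # Unique words in this document, sorted so each pair has one canonical order.
        words = sorted(set(re.findall(r"[a-z]+", text.lower())))
        for i, first in enumerate(words):
            for second in words[i + 1:]:
                pair_counts[(first, second)] += 1
    return pair_counts.most_common(top_n)


# Made-up sample corpus standing in for a large body of lawfully accessed works.
sample_documents = [
    "pandemic virus spread",
    "virus spread model",
    "wildlife trade",
]
top_pairs = mine_term_pairs(sample_documents, top_n=1)
```

On this sample, the only pair that appears in more than one document is ("spread", "virus"), which is the kind of pattern or relationship that larger-scale analysis surfaces across far bigger collections.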

The ability to harness the benefits of AI exists for everyone – individuals or organizations, large or small, public or private, commercial or non-commercial. All can readily use AI to develop innovative projects benefitting the public.Footnote 3 Startups, research groups, academics, not-for-profits, governments, and businesses are increasingly using machine learning, driven by processes like TDM, to develop algorithms to learn from data and to understand trends, research new and existing markets, and develop new technologies and applications.Footnote 4

The impact of machine learning can be revolutionary. As an example, GitHub recently announced a technical preview of an AI-powered tool for software developers called “GitHub Copilot”.Footnote 5 GitHub Copilot uses a large AI language model, developed in collaboration with the AI research lab OpenAI, to generate suggestions of code that developers can choose to integrate into their programs. Developers can simply describe the function they want in plain English and GitHub Copilot will generate a suggestion. Whereas no individual developer could read all the code, books, or information in the world, TDM can harness collective intelligence at scale: GitHub Copilot generates outputs from a model trained on billions of lines of English language text and source code from publicly available sources.

GitHub Copilot and other AI systems that generate code hold promise to accelerate innovation. These systems continue the trajectory of previous developer tools: just as compilers permitted developers to focus on higher-order tasks instead of writing direct instructions for machine hardware, GitHub Copilot allows developers to spend less time on routine tasks. With this AI-empowered up-levelling, developers have an expert assistant they can work with and learn from. Programming becomes more accessible, empowering more people to become developers in the first place. AI systems for code generation hold promise to increase developer productivity and, as developers write useful software, to accelerate the pace of innovation.

The Government’s Consultation Paper acknowledges that “TDM is currently demonstrating benefit in many socially valuable applications”. Among the many other examples are:

  • EcoHealth Alliance: using AI-based natural language processing of scientific literature to lead research into public health and environmental science with the goals of preventing pandemics and promoting conservation;Footnote 6
  • World Wildlife Fund / Microsoft’s Global Hackathon: training an image recognition model to identify and prevent the trade in pangolin and other endangered species products with the goal of ending wildlife trafficking online;Footnote 7 and
  • Computational Creativity for Games and Worlds – Microsoft Research: using AI to help facilitate the development of new creative works.Footnote 8

Continuing Canada’s leadership position in AI depends upon broad access to data and content for all users in the AI ecosystem

Canada has emerged asFootnote 9 – and continues to beFootnote 10 – a global innovation leader in AI. A recent report by the University of Toronto summarized data showing that Canada has “become a powerhouse of AI innovation, producing the most AI patents per million people among G7 nations and China, while Toronto has attracted the densest cluster of AI startups in the world.”Footnote 11

The Government of Canada has recognized the importance of AI to the economy and the opportunity to translate Canada’s leadership position in AI into jobs and economic activity:

  • In 2017, the Government asked the Canadian Institute for Advanced Research (CIFAR) to develop a $125 million Pan-Canadian AI Strategy (PCAIS) focused on attracting, retaining and training AI researchers in the country.Footnote 12 Since then, CIFAR’s investments have supported innovative new research, including in the fight against COVID-19.Footnote 13 The Government recently committed to renewing and expanding the PCAIS in the 2021 federal budget.Footnote 14
  • Also in 2017, the Government announced $950 million of funding for innovation superclusters, including the Digital Technology SuperclusterFootnote 15 and the AI-Powered Supply Chains Supercluster (Scale AI),Footnote 16 each of which involves analyzing large datasets to advance machine learning and AI solutions.

By ensuring that intellectual property laws are not interpreted to unnecessarily restrict the use of data for the purposes of innovation, Canada can maintain its competitiveness in AI innovation and attract human talent, financial capital, and commercialization opportunities. The role data plays to train machines to recognize objects, speak, listen, interpret, and more, so that they can look for patterns, relationships, and insights – all keys to innovation – cannot be overstated. In July 2021, the Global Partnership on Artificial Intelligence (GPAI) released its “Applied Research Agenda for Data Governance for AI”, which highlights the importance of TDM to AI innovation and the need for global harmonization of TDM exceptions.Footnote 17

Global trends in extending legal protections for TDM

On the legislative front, several jurisdictions have implemented legal protections for broad machine learning techniques, including TDM. Examples include:

  • Japan: Japan amended its Copyright Act in 2009 to allow reproductions of copyrighted works that are created as part of “computerized data processing”. In 2018, it expanded this law to allow for the “exploitation” of any copyrighted work for non-consumptive purposes, including “data analysis”. In addition to creating a general-purpose exception for non-consumptive uses of copyrighted works, the 2018 amendments also authorize beneficiaries of the information processing exception to make limited public uses of the underlying works, such as the display of snippets, to validate AI results and enable transparency about the data used in specific AI models.
  • European Union: In 2019, the EU amended its copyright laws by adding two broad exceptions that authorize non-commercial and commercial AI researchers to reproduce copyrighted works for computational analysis as necessary for machine learning.
  • Singapore: Earlier this year, Singapore introduced a Copyright Bill that includes an exception to copyright infringement for the purposes of TDM. The Copyright Bill was recently passed by the Parliament of Singapore on September 13, 2021, and is expected to come into effect later this year. The law contains several key safeguards to protect copyright owners’ legitimate interests:
    1. Entities who wish to engage in computational data analysis of copyrighted works must first obtain “lawful access” to the works.
    2. Entities performing computational data analysis cannot use copyright works that are infringing or were “obtained from an online location that is being or has been used to flagrantly commit or facilitate rights infringements”.
    3. Copying is only permitted to the extent of performing computational data analysis and communications to the public that are necessary for the purposes of: (i) verifying the results of the computational data analysis or (ii) collaborative research and study relating to the purpose of the computational data analysis.

Unsurprisingly, these countries are also at the forefront of attracting investment and business initiatives in the AI sector.Footnote 18

Copyright should not be a barrier to TDM

Just as copyright has never controlled how people understand, research, or analyze the books they read, copyright should not impede the use of technology to enhance the perception and analysis by machines of lawfully accessed works.

With machine learning through TDM, it is often necessary to make copies of lawfully acquired works. These copies are not read by humans, nor are they consumed or redistributed for their creative expression. They do not substitute for or displace the markets for the original works.Footnote 19

Most researchers and innovators would not expect that their machine learning projects involving lawful access to works would be impeded by copyright. However, merely because machine learning techniques (such as TDM) may require incidental copying of works, doubt or uncertainty exists around the legality of various machine learning techniques under applicable copyright regimes, including in Canada.Footnote 20

Fear, uncertainty, and doubt (FUD) can have a powerful chilling effect on AI research. Providing clarity around TDM and ensuring that such machine learning techniques are available to all removes legal ambiguity around TDM, breaks through the FUD, and helps to unlock even greater potential for innovative research by both the public and private sectors.

Closing the data divide and stimulating Canada’s data economy

Canada has a role to play in addressing the looming data divide. Policy-makers should explore ways to facilitate enhanced access to data in a manner that is responsible and scalable, enabling researchers, developers, data scientists and organizations to drive data-driven innovation and spur the data economy. The inclusion of a copyright exception for TDM is one way for policy-makers to achieve these objectives and to ensure that Canada remains competitive with countries around the world in harnessing the power of machine learning and data to solve complex problems, while encouraging innovation and development in AI. We believe everyone can benefit from opening, sharing, and collaborating around data to make better decisions, improve efficiency and tackle pressing societal challenges.Footnote 21

Propelling innovation

There are two key AI-related changes to modernize Canada’s copyright law that may best achieve this Consultation’s stated objective of supporting the Government’s strategic priorities in the post-pandemic recovery of creating jobs and growth, supporting Canadian researchers and innovators, and ensuring that Canada’s economy takes advantage of the opportunities ahead:

  • An express exception for TDM:
    • The Government of Canada should introduce a targeted TDM exception in the Copyright Act to permit the use of lawfully accessed works or other subject matter for the purpose of informational analysis. Our submissions on the scope, safeguards, and limitations of such an exception are discussed in further detail below.
    • Enacting a new TDM exception would fulfill Recommendation 23 by the Standing Committee on Industry, Science and Technology (INDU) following the 2018-2019 parliamentary review of the Copyright Act.Footnote 22
  • Open government data:
    • The Government of Canada should promote and implement a legal regime that supports open government data.
    • Canada maintains a Crown copyright system, whereby works and other subject matter that have been “prepared or published by or under the direction or control of Her Majesty or any government department” are protected by copyright.Footnote 23 For Government of Canada works, permission from the Government is required for acts of revision, adaptation or translation, irrespective of whether the use is personal or non-commercial. Permission is also required where the government work is reproduced for commercial purposes.Footnote 24
    • While Microsoft and GitHub applaud Canada’s efforts to make data more open, via initiatives such as Canada’s Open Data Portal,Footnote 25 there are strong policy arguments that data and works generated, collected or acquired by the public sector and funded by the Canadian taxpayer should be made open to the fullest extent possible, and freely available without restrictions, for the democratization of access to AI.
    • Implementing changes toward open government data would fulfill the recommendation by the Standing Committee on Industry, Science and Technology (INDU) following the recent parliamentary review of the Copyright Act: “[t]hat the Government of Canada improve Crown copyright management policies and practices by adopting open licences in line with the open government and data governance agenda…”.Footnote 26

Balancing legitimate interests

Modernizing Canada’s Copyright Act requires the careful balancing of various interests of stakeholders to ensure that copyright law is used to incentivize and support – not hinder – innovation, while also respecting legitimate rights holder interests.

The Consultation Paper discusses several areas of review of existing copyright law, including liability and authorship. Microsoft and GitHub offer the following comments:

  1. Infringement and liability regarding AI:

    This Consultation contemplates two areas of copyright review for AI-generated works: first, in the inputs that may require the reproduction of copyright-protected works; and second, in the outputs that may infringe existing copyright-protected works.
    1. Inputs:

      An express TDM exception in the Copyright Act will help propel Canada’s AI ambitions by removing uncertainty. Canada can and should clarify its copyright framework to expressly permit the reproduction of lawfully accessed copyrighted works for informational analysis, which is a foundational activity that underpins the research and development of AI technologies.

      An unequivocal legal exception creates clarity and will spur innovation. Other countries that have reviewed their copyright regimes have accomplished this goal by introducing an express TDM exception or through court interpretation and guidance on robust fair use exceptions (see e.g., Japan, EU, Singapore, United States). Microsoft and GitHub support an express TDM exception in the Canadian Copyright Act that permits the use of copyrighted works and other subject matter for the purpose of informational analysis.

      A TDM exception should allow all users in the AI ecosystem to harness the power of AI. Microsoft and GitHub support exceptions that do not place unnecessary restrictions on the nature of the users or their uses (e.g., commercial vs. non-commercial). Modern AI research involves a wide spectrum of users – researchers, universities, developers, small, medium, and large business enterprises, and governments – who often collaborate on projects to unlock the power of data and achieve the benefits flowing from informational analysis. A TDM exception without such unnecessary restrictions would align Canada’s copyright laws with those of other modern economies and enable Canada’s users across the spectrum to participate in the global AI economy.

      Copyright owners whose works are lawfully acquired by users are not harmed by any approach that clarifies that copyright cannot be used to restrict TDM activities. Nothing would limit the rights holders’ use of non-copyright measures to restrict access to their works – for example, by placing the works behind a paywall or using other access credentials to limit access. These access restrictions are not onerous; nearly all copyright owners whose works have significant commercial value – including books, music, film, television, video games, and software – implement measures to control access to their works as part of their normal course of business to monetize and protect expressive use of their works.

    2. Outputs:

      The existing copyright regime in Canada has sufficient mechanisms to address the infringement and liability issues raised by outputs of AI. As a result, any efforts in this Consultation should be focused on ensuring that legitimate, excepted, and similar “fair uses” that occur in the AI context enjoy the same legal certainty as when such works are used outside the context of AI. In particular, the ability to display or distribute relevant portions of data inputs, to the extent doing so would engage copyright, should be expressly permitted. In AI, being able to replicate or evaluate the data inputs used to train AI is often necessary to detect and address problems, eliminate bias, and promote trust and transparency for AI applications.

      In addition, the extent to which AI outputs contain reproductions of the copyrighted content used in training them may be very low. For example, prior to introducing the GitHub Copilot AI-powered programming tool described above (currently in a limited technology preview), GitHub conducted research to understand how the tool may output code that also appears in the training set.Footnote 27 The analysis found that outputs matched code snippets from the training set in approximately 0.1% of cases. Subsequent product decisions sought to decrease this likelihood by, for example, preventing outputs in empty files, where the analysis observed such matches, and instead limiting outputs to files with more developer-entered context. The team is also developing an origin tracker to help detect and prevent the rare cases where outputs may be repeated from the training set.

  2. Authorship and ownership of works generated by AI:

    The existing law in Canada is sufficient to address questions of copyrightability, authorship and ownership. Additional laws or new provisions to existing laws are not required to incentivize creation. While AI tools may enable the expression, the law should require that a human is still providing creative input as a prerequisite for affording copyright protection.
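
The kind of output-matching analysis mentioned under “Outputs” above can be pictured with a short sketch. This is not GitHub’s actual methodology, which this submission does not describe. The whitespace tokenization, the five-token window, and the function names (`token_ngrams`, `build_index`, `match_rate`) are all assumptions chosen for illustration. The sketch flags an output when it shares any run of five consecutive tokens with a training text, then reports the share of flagged outputs.

```python
def token_ngrams(text, n):
    """Return the set of n-token windows in a text (naive whitespace tokens)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(training_texts, n):
    """Collect every n-token window seen anywhere in the training texts."""
    index = set()
    for text in training_texts:
        index |= token_ngrams(text, n)
    return index


def match_rate(outputs, training_texts, n=5):
    """Fraction of outputs sharing at least one n-token window with training data.

    Assumption: n=5 is an arbitrary illustrative threshold, not a known standard.
    """
    if not outputs:
        return 0.0
    index = build_index(training_texts, n)
    flagged = sum(1 for out in outputs if token_ngrams(out, n) & index)
    return flagged / len(outputs)


# Made-up data: one output repeats the training snippet, the other does not.
training = ["def add(a, b): return a + b"]
generated = [
    "def add(a, b): return a + b",
    "print('hello world') # greet the user now",
]
rate = match_rate(generated, training)
```

A real analysis would need language-aware tokenization and a much larger index, and would have to decide how long a shared run must be before it counts as a match; those choices directly affect the reported rate.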

Summary of recommendations

  • The Government of Canada should introduce an express TDM exception in the Copyright Act to permit the use of lawfully accessed works or other subject matter for the purpose of informational analysis by all users in the AI ecosystem.
  • The Government of Canada should promote and implement a legal regime that supports open government data within the existing Crown copyright system.
  • The existing copyright regime in Canada has sufficient mechanisms to address the infringement and liability issues raised by outputs of AI, and to address questions of copyrightability, authorship and ownership.

About Microsoft

Microsoft (Nasdaq “MSFT” @microsoft) enables digital transformation for the era of an intelligent cloud and an intelligent edge. Its mission is to empower every person and every organization on the planet to achieve more.

About GitHub

GitHub is the developer company. As the home to more than 65 million developers from across the globe, GitHub is where developers can create, share, and ship the best code possible. GitHub makes it easier to work together, solve challenging problems, and create the world’s most important technologies.

Nadine Letson
Assistant General Counsel
Microsoft Canada Inc.

Mike Linksvayer
Head of Developer Policy
GitHub, Inc.