Product description: patent bibliographic and full text

File format

Patent bibliographic and full text data is provided in XML format, with one XML file per patent. The XML file structure is governed by the World Intellectual Property Organization (WIPO) Standard ST.36 for patent data.

Record and data content

Records for both patent applications and granted patents are included. Patent data is open to public inspection after a confidentiality period of up to 18 months from the earliest filing date of an application. As a result, each patent file contains either a patent application or a patent that has been issued or granted.

Each XML file includes the following types of information regarding the granted patent or patent application:

  • Bibliographic data (names and addresses of inventors, assignees and agents; critical dates such as filing, publication and issue dates; classification codes; etc.)
  • Title and abstract text (a short description of the invention)
  • Claims (a legal description of the scope of the patent, defining the specific attributes of the invention that are protected from infringement)
  • Disclosure/description (the full description of the invention)
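As a rough illustration of how these sections might be inspected, the following Python sketch lists the top-level elements of a single record and counts how often each occurs. It deliberately avoids relying on specific element names: the names in the sample record (patent-document, bibliographic-data, abstract, claims, description) are hypothetical placeholders, and the element names in the delivered files should be taken from the ST.36 schema and the files themselves.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def summarize_sections(xml_text):
    """Return a Counter mapping each top-level element name to its count."""
    root = ET.fromstring(xml_text)
    return Counter(local_name(child.tag) for child in root)


# Hypothetical miniature record; real files are larger and may use other names.
SAMPLE = """<patent-document>
  <bibliographic-data><invention-title>Example widget</invention-title></bibliographic-data>
  <abstract><p>A short description.</p></abstract>
  <claims><claim num="1"/><claim num="2"/></claims>
  <description><p>Full description.</p></description>
</patent-document>"""

if __name__ == "__main__":
    for name, count in summarize_sections(SAMPLE).items():
        print(name, count)
```

Running the same function over the text of a delivered file is a quick way to confirm which of the four information types a given record actually carries before writing a more specific parser.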

Production schedule: weekly and annually

Weekly production

On a weekly basis, a collection of XML files is produced for all patent applications and granted patents that are new or have been updated. These collections of new and updated files are provided for the current calendar year. This results in 52 weekly collections per year, each ranging from 50 MB to 120 MB in size depending on the volume of activity.

The naming convention for XML files includes the patent number and the extraction date. In addition to the XML files, a "Log" file and "OpStat" file are produced for both new and updated patents. The "Log" file contains a list of all patents extracted and the "OpStat" file provides statistics related to the extraction process.
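A simple sanity check on a downloaded weekly collection could look like the sketch below. It assumes the collection has already been unpacked into a local folder and that the patent records use a .xml extension; neither detail is specified above. Any other files found, such as the "Log" and "OpStat" files if they use a different extension, are simply listed by name. The function name inventory_collection is illustrative only.

```python
from pathlib import Path


def inventory_collection(folder):
    """Summarize one unpacked collection folder.

    Returns a dict with the number of XML files, their combined size in
    bytes, and the names of all non-XML files found (assumption: the Log
    and OpStat files would appear here if they are not .xml files).
    """
    xml_count = 0
    xml_bytes = 0
    other_files = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() == ".xml":
            xml_count += 1
            xml_bytes += path.stat().st_size
        else:
            other_files.append(path.name)
    return {
        "xml_count": xml_count,
        "xml_bytes": xml_bytes,
        "other_files": other_files,
    }
```

Comparing xml_count with the number of patents listed in the "Log" file is one way to confirm that a collection was transferred completely, although the layout of the "Log" file would need to be checked first.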

Annual production

On an annual basis, a complete, refreshed collection of XML files of patent bibliographic and full text data is produced. This includes all patents from 1869 to the most recently completed calendar year. Collections are organized by year, based on the issue date for granted patents or the file update date for applications.

As of 2017, a collection of refreshed patent data consisted of 236 folders totaling approximately 28 GB. The naming convention for XML files includes the patent number and the extraction date.