Product description: biological sequence listings data

File format

The official communication regarding the biological sequence listing (BSL) associated with a patent is provided in a TXT file. The file structure is governed by World Intellectual Property Organization (WIPO) Standard ST.25, which covers the presentation of nucleotide and amino acid sequence listings in patent applications. The Canadian Intellectual Property Office (CIPO) also generates two additional file types, PEP and SEQ, as working files. These working files may be incomplete and are not to be considered official communication.

Record and data content

Records for both patent applications and granted patents are included. Patent data is open to public inspection after a confidentiality period of up to 18 months from the earliest filing date of an application. As a result, each patent file relates to either a patent application or a patent that has been issued (granted).

Each TXT file includes the following types of information regarding the granted patent or patent application:

  • General information (applicant names, title of invention, critical application processing dates)
  • Sequence listing information (ID number, length, type, organism, feature, name/key, location)
  • Relevant journal publications (journal reference information)
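The information types listed above correspond to the numeric identifiers that ST.25 places at the start of each data element (for example, <110> for applicant name, <210> for the sequence ID number and <303> for journal). The following Python sketch shows one way such a file could be split into general information and per-sequence records. It assumes each data element starts on its own line with a three-digit identifier in angle brackets and that unmarked lines continue the previous element. The function names, the short label table and the sample text are illustrative only and are not taken from CIPO files, which may differ in layout.

```python
import re
from collections import defaultdict

# Subset of ST.25 numeric identifiers; labels are paraphrased for illustration.
ST25_LABELS = {
    "110": "applicant name",
    "120": "title of invention",
    "210": "sequence ID number",
    "211": "length",
    "212": "type",
    "213": "organism",
    "220": "feature",
    "221": "name/key",
    "222": "location",
    "303": "journal",
}

# A data element is assumed to start with an identifier such as <210>.
TAG = re.compile(r"^\s*<(\d{3})>\s*(.*)$")


def parse_st25(text):
    """Split ST.25-style text into (general, sequences).

    general: identifier -> list of values found before the first <210>.
    sequences: one dict per <210> block, identifier -> list of values.
    """
    general = defaultdict(list)
    sequences = []
    current = general
    last_tag = None
    for line in text.splitlines():
        match = TAG.match(line)
        if match:
            tag, value = match.group(1), match.group(2).strip()
            if tag == "210":
                # Each <210> opens a new sequence record.
                current = defaultdict(list)
                sequences.append(current)
            current[tag].append(value)
            last_tag = tag
        elif line.strip() and last_tag is not None:
            # Unmarked line: treat it as a continuation of the previous element.
            joined = current[last_tag][-1] + " " + line.strip()
            current[last_tag][-1] = joined.strip()
    return dict(general), [dict(s) for s in sequences]


def labelled(record):
    """Replace known identifiers with readable labels; keep unknown ones as-is."""
    return {ST25_LABELS.get(tag, tag): values for tag, values in record.items()}


# Invented sample text, for illustration only.
sample = """<110> Example Applicant Inc.
<120> Example title
<160> 1
<210> 1
<211> 12
<212> DNA
<213> Artificial Sequence
<400> 1
atgcatgcat gc                                                12
"""

general, sequences = parse_st25(sample)
```

Keeping the raw identifiers as dictionary keys and applying labels only on output means elements that are not in the label table are still preserved.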

Production schedule: weekly and annually

Weekly production

On a weekly basis, a collection of TXT, PEP and SEQ files is produced for all patent applications and granted patents that include a biological sequence. These collections of new and updated files are provided for the current calendar year. This results in 52 weekly collections that range from 1 MB to 350 MB, depending on the volume of activity. Each weekly collection includes a report that lists all the patent files included in that week's extract.

The naming convention for BSL files includes the patent number and the production date. Each folder should include at least a TXT file.
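The expectation that each folder holds at least a TXT file can be checked after download. The Python sketch below assumes a collection has been unpacked into a directory whose immediate subfolders each hold the files for one patent, and that the official file uses a .txt extension in any letter case. Both the layout and the function name are assumptions, not something this description specifies.

```python
from pathlib import Path


def folders_missing_txt(root):
    """Return the names of immediate subfolders of root that contain no .txt file."""
    missing = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Only regular files directly inside the folder are considered.
        has_txt = any(
            f.is_file() and f.suffix.lower() == ".txt" for f in folder.iterdir()
        )
        if not has_txt:
            missing.append(folder.name)
    return missing
```

A folder reported here would contain only working files (PEP or SEQ), which are not official communication.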

Annual production

On an annual basis, a complete refreshed collection of BSL files is produced. This includes all patents from 2003 to the most recently completed calendar year. Collections are organized by year.

As of 2017, a collection of refreshed patent data consisted of 14 folders of approximately 4 GB. The naming convention for BSL files includes the patent number and the production date.