A New Report on Patent Data

Since the U.S. Open Data Policy was established two years ago, federal agencies have been making their data resources more accessible and useful to the public. To help agencies prioritize this work, the Center for Open Data Enterprise runs a series of Open Data Roundtables that bring together federal agencies with the businesses, nonprofits, and individuals who use their data. These facilitated dialogues are designed to help identify high-value datasets, find solutions to data problems, and establish new collaborations. The Roundtables were originally launched as an initiative of The GovLab at NYU.

Last December, The GovLab and the U.S. Patent and Trademark Office (USPTO) hosted a Roundtable to give the Office critical feedback on its open data plans. It focused on a USPTO Roadmap for open data initiatives that was designed to provide better access to USPTO’s data assets and engage its data users more effectively. This Roadmap is an 18-month strategy that outlines a plan for data delivery, community engagement, and private sector collaborations. At the Roundtable, representatives from companies, nonprofits, and academic institutions convened with USPTO staff to discuss the Roadmap, give their feedback, and identify other open data opportunities and solutions to data problems.

The public report on this Roundtable, being issued today, contains a number of recommendations to improve patent data – a key resource for U.S. and international businesses of all kinds. Some highlights are:

Increasing bulk access to patent data
Top priorities for information dissemination include access to bulk Public PAIR (Patent Application Information Retrieval) and CPC (Cooperative Patent Classification) data, data on the status and expiration of patents, and increased notification of changes affecting data classification, access, and availability.

Improving data quality along the data chain
The USPTO faces significant challenges in improving the quality of its data. First, statutory requirements for the submission of patents make fixes difficult when errors are detected after the patents have become legal documents. Second, patent applicants are often reluctant to provide unambiguous information: there are market incentives to obfuscate data about patent ownership. Suggestions included strategies for addressing errors early in the patent application process, and market-based approaches for finding and fixing errors.

The Roundtable added insight to an ongoing discussion of the best ways to ensure that patent information can be published as high-quality, searchable open data. A recent post from the Data Transparency Coalition, for example, suggested that Congress mandate electronic filing of patent applications – a move from paper-based or PDF systems to an e-filing system with an all-digital text-searchable format.

Engaging users of the patent system
The Roundtable reinforced that USPTO, like other federal agencies, should think of its data stakeholders – both current and potential – as “customers” for its data. The USPTO’s stakeholders include not only users of patent data but also users of other aspects of the patent system: inventors, patent experts and attorneys, patent examiners, and the USPTO itself. The Roundtable attendees had a number of recommendations for ways to enable continued outreach and user feedback. Options include establishing a public-facing point of contact for all data queries, such as a data librarian, concierge, or “contentist”; aligning events such as hackathons or data jams with open data releases; forming affinity groups both within the USPTO and externally; and fostering a developer community.

The report is released as a public document with the hope that it will encourage further input, dialogue, and commitments. We invite you to review the full report and offer your feedback and suggestions.