Using Artificial Intelligence to Build Automated Patent Search Engines
=============================================================
Google has developed a dataset of phrases to help train search models for patent searches. Note, however, that Google Patents Search itself does not offer a downloadable "dataset of phrases" for this purpose.
Instead, users can access patent text through Google Patents Search (patents.google.com), which indexes millions of patents from over 100 patent offices worldwide. To build a phrase dataset for training, users typically run keyword- or classification-based searches, extract text (titles, abstracts, claims, descriptions) from the retrieved patents, and then process and curate that text into a training corpus, either manually or with custom scripts.
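A minimal sketch of the extract-and-curate step, assuming the search results have already been exported to local records. The record layout (`title`, `abstract`, `claims`) and all function names are hypothetical, and the code does not call any Google Patents API. It strips claim numbers, splits text into sentences, normalizes case and whitespace, and drops duplicates so that wording repeated across patents appears only once in the corpus.

```python
import re
from typing import Iterable

# Hypothetical record layout: each exported patent is a dict with "title" and
# "abstract" strings and a "claims" list of strings.
TEXT_FIELDS = ("title", "abstract")


def strip_claim_number(claim: str) -> str:
    # Drop a leading claim number such as "1. " so it is not split off as a sentence.
    return re.sub(r"^\s*\d+\s*\.\s*", "", claim)


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", ";" or ":" when followed by whitespace.
    parts = re.split(r"(?<=[.;:])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace to single spaces.
    return re.sub(r"\s+", " ", text).strip().lower()


def build_corpus(records: Iterable[dict]) -> list[str]:
    # Collect normalized sentences in first-seen order, skipping duplicates.
    seen: set[str] = set()
    corpus: list[str] = []
    for record in records:
        texts = [record.get(field, "") for field in TEXT_FIELDS]
        texts += [strip_claim_number(c) for c in record.get("claims", [])]
        for text in texts:
            for sentence in split_sentences(text):
                key = normalize(sentence)
                if key not in seen:
                    seen.add(key)
                    corpus.append(key)
    return corpus


# Made-up sample records for illustration.
records = [
    {
        "title": "Spherical recreation device",
        "abstract": "A spherical recreation device is disclosed. The device is inflatable.",
        "claims": ["1. A spherical recreation device comprising an inflatable bladder."],
    },
    {"title": "Spherical recreation device", "abstract": "An inflatable cover is disclosed.", "claims": []},
]
corpus = build_corpus(records)
```

The sentence splitter is deliberately crude; patent claims often run to hundreds of words, so a real pipeline might split on claim elements or use a proper tokenizer instead.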
Patent language is often non-standard, which makes relevant patents hard to find. For example, a patent might describe a soccer ball as a "spherical recreation device," so a search for "soccer ball" could miss it. The dataset developed by Google addresses this with labels that denote how two phrases relate to one another, such as synonym, exact match, or unrelated; these labels are intended to improve the accuracy of patent search results.
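To illustrate how relation labels can feed into search, here is a minimal sketch of label-driven query expansion. The phrase pairs, the label names, and the choice to expand only on "synonym" and "exact" are hypothetical and do not reflect the actual dataset's schema; a production system might instead train a similarity model on such labels rather than use a lookup table.

```python
from collections import defaultdict

# Hypothetical labeled phrase pairs: (anchor, target, relation). The label
# names are illustrative and are not the dataset's actual schema.
LABELED_PAIRS = [
    ("soccer ball", "spherical recreation device", "synonym"),
    ("soccer ball", "soccer ball", "exact"),
    ("soccer ball", "hydraulic pump", "unrelated"),
]

# Assumption: only these relations are trusted for query expansion.
EXPANDABLE = {"synonym", "exact"}


def build_expansions(pairs) -> dict[str, set[str]]:
    # Map each phrase to the phrases it may be swapped with, in both directions.
    expansions: dict[str, set[str]] = defaultdict(set)
    for anchor, target, label in pairs:
        if label in EXPANDABLE:
            expansions[anchor].add(target)
            expansions[target].add(anchor)
    return expansions


def expand_query(query: str, expansions: dict[str, set[str]]) -> set[str]:
    # The query always matches itself, plus any labeled equivalents.
    return {query} | expansions.get(query, set())


def search(query: str, documents: dict[str, str], expansions: dict[str, set[str]]) -> list[str]:
    # Naive substring match of any expanded term against lowercased document text.
    terms = expand_query(query.lower(), expansions)
    return [doc_id for doc_id, text in documents.items()
            if any(term in text.lower() for term in terms)]


# Made-up documents for illustration.
documents = {
    "US-A": "A spherical recreation device having an inflatable bladder.",
    "US-B": "A hydraulic pump for agricultural machinery.",
}
expansions = build_expansions(LABELED_PAIRS)
hits = search("soccer ball", documents, expansions)  # ["US-A"], found via the synonym label
```

Without the expansions, the same query returns nothing, which is exactly the gap the labels are meant to close. Substring matching keeps the sketch short; a real engine would tokenize and rank results.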
Keep in mind that Google Patents Search itself is not a source of bulk training data. If you need large-scale training data for patent search models, look beyond it to official patent bulk data releases or specialized patent data providers. Some patent office services, such as the USPTO and Espacenet, may offer data downloads or bulk access better suited to building datasets.
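As a hedged sketch of the bulk-data route: full-text bulk downloads such as the USPTO's are often distributed as XML. The code below assumes, and you should verify this against the documentation for the release you download, that one file concatenates many documents, each with its own XML declaration, and that the element names include `invention-title`, `abstract`, and `claim`. The sample input is made up, and real files may also carry DOCTYPE lines and entity references that need extra handling.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the bulk file concatenates many XML documents, each opening with
# its own "<?xml ...?>" declaration, so it cannot be parsed as a single tree.
XML_DECL = re.compile(r"<\?xml[^>]*\?>")


def split_documents(raw: str) -> list[str]:
    # Split on each declaration and drop the empty chunk before the first one.
    return [chunk.strip() for chunk in XML_DECL.split(raw) if chunk.strip()]


def text_of(element) -> str:
    # Join all nested text and collapse whitespace; empty string if missing.
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def extract_record(doc: str) -> dict:
    # Assumed element names, modeled on USPTO grant XML; verify against the DTD.
    root = ET.fromstring(doc)
    return {
        "title": text_of(root.find(".//invention-title")),
        "abstract": text_of(root.find(".//abstract")),
        "claims": [text_of(c) for c in root.findall(".//claim")],
    }


# Made-up two-document sample in the assumed layout.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant><invention-title>Spherical recreation device</invention-title>
<abstract><p>An inflatable   ball.</p></abstract>
<claims><claim><claim-text>1. A ball.</claim-text></claim></claims></us-patent-grant>
<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant><invention-title>Hydraulic pump</invention-title>
<abstract><p>A pump.</p></abstract><claims></claims></us-patent-grant>
"""

records = [extract_record(doc) for doc in split_documents(SAMPLE)]
```

Bulk files can be large, so reading and splitting them in streamed chunks rather than all at once may be necessary in practice.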
[1] Google Patents: https://patents.google.com/
[2] USPTO: https://www.uspto.gov/
[3] Espacenet: https://worldwide.espacenet.com/
[4] World Intellectual Property Organization (WIPO): https://www.wipo.int/
[5] Patent repositories for data mining and training models: https://www.researchgate.net/publication/324306544_Patent_Data_Mining_and_Text_Mining_for_Patent_Analysis
- Google's dataset, which labels how phrases relate to one another, can help AI-based patent search systems return more accurate results.
- For large-scale training data for patent search models, official patent bulk data releases (for example, from the USPTO or Espacenet) and specialized patent data providers are worth investigating beyond Google Patents Search.