Query Expansion for Patent Searching using Word Embedding and
Professional Crowdsourcing
by
Arthi Krishna, Ye Jin, Christine Foster, Greg Gabel, Britt Hanley and
Abdou Youssef
2019
Abstract
The patent examination process includes a search of previous work to verify
that a patent application describes a novel invention. Patent examiners
primarily use keyword-based searches to uncover prior art. A critical part of
keyword searching is query expansion, which is the process of including
alternate terms such as synonyms and other related words, since the same
concepts are often described differently in the literature. Patent terminology
is often domain-specific. By curating technology-specific corpora and training
word embedding models based on these corpora, we are able to automatically
identify the most relevant expansions of a given word or phrase. We compare the
performance of several automated query expansion techniques against
expert-specified expansions. Furthermore, we explore a novel mechanism that
extracts related terms based not on a single input term but on several terms
in conjunction, by computing the centroid of their embeddings and identifying
the nearest neighbors of that centroid. Highly skilled patent examiners are
often the best and most reliable source for identifying related terms. By
designing a user interface that allows examiners to interact with the word
embedding suggestions, we are able to use these interactions to power
crowdsourced models of related terms. Learning from users allows us to
overcome several challenges, such as identifying bleeding-edge terms that have
not yet appeared in the corpus. This paper studies the effectiveness of word
embedding and crowdsourced models across 11 disparate technical areas.
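The two automated expansion modes described in the abstract (nearest neighbors of one term, and nearest neighbors of the centroid of several terms) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes cosine similarity as the similarity measure, and it replaces the paper's trained, technology-specific embedding models with a tiny hand-made table (TOY_EMBEDDINGS, hypothetical values). The function names are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical 3-dimensional vectors, invented for illustration only.
# A real system would load embeddings trained on a technology-specific corpus.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "battery": [0.9, 0.1, 0.0],
    "cell": [0.85, 0.2, 0.05],
    "accumulator": [0.8, 0.05, 0.1],
    "charger": [0.6, 0.6, 0.0],
    "wireless": [0.1, 0.9, 0.1],
    "antenna": [0.05, 0.85, 0.2],
    "inductive": [0.4, 0.7, 0.1],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of the input vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def expand(
    terms: Sequence[str], embeddings: Dict[str, Vector], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k words closest (by cosine) to the centroid of `terms`.

    With a single input term, the centroid is simply that term's vector,
    so this covers both single-term and multi-term expansion.
    The input terms themselves are excluded from the suggestions.
    """
    query = centroid([embeddings[t] for t in terms])
    scored = [
        (word, cosine(query, vec))
        for word, vec in embeddings.items()
        if word not in terms
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Single-term expansion: neighbors of "battery" alone.
    print(expand(["battery"], TOY_EMBEDDINGS, k=2))
    # Multi-term expansion: neighbors of the centroid of "battery" and "wireless".
    print([w for w, _ in expand(["battery", "wireless"], TOY_EMBEDDINGS, k=2)])
    # prints ['charger', 'inductive']
```

In this toy table, "battery" alone pulls in near-synonyms such as "cell" and "accumulator". The centroid of "battery" and "wireless" instead surfaces terms like "charger" and "inductive" that relate to the combination, which is the motivation for querying with several terms in conjunction.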
arXiv: 1911.11069v1