Effective Reformulation of Query for Code Search using Crowdsourced
Knowledge and Extra-Large Data Analytics
by
Mohammad Masudur Rahman, Chanchal K. Roy
2018
Abstract
Software developers frequently issue generic natural language queries for
code search while using code search engines (e.g., GitHub native search,
Krugle). Such queries often do not lead to any relevant results due to
vocabulary mismatch problems. In this paper, we propose a novel technique that
automatically identifies relevant and specific API classes from the Stack Overflow
Q&A site for a programming task written as a natural language query, and then
reformulates the query for improved code search. We first collect candidate API
classes from Stack Overflow using pseudo-relevance feedback and two term
weighting algorithms, and then rank the candidates using Borda count and
semantic proximity between query keywords and the API classes. The semantic
proximity has been determined by an analysis of 1.3 million questions and
answers of Stack Overflow. Experiments using 310 code search queries report
that our technique suggests relevant API classes with 48% precision and 58%
recall, which are 32% and 48% higher, respectively, than those of the
state-of-the-art. Comparisons with two state-of-the-art studies and three
popular search engines (Google, Stack Overflow, and GitHub native search)
report that our reformulated queries (1) outperform the queries of the
state-of-the-art, and (2) significantly improve the code search results
provided by these contemporary search engines.
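
The rank-aggregation step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes two already-ranked candidate lists (one per term-weighting algorithm) and merges them with a plain Borda count. The candidate lists, the function names `borda_count` and `reformulate`, and the choice to reformulate by appending the top-ranked classes to the query are all illustrative assumptions, not details taken from the paper. The semantic-proximity component is omitted.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge several ranked lists into one using a plain Borda count.

    Each list awards (len(list) - position) points to the item at
    0-based `position`; an item missing from a list gets 0 points from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


def reformulate(query, api_classes, top_k=3):
    """Hypothetical reformulation: append the top-k API classes to the query."""
    return " ".join([query] + api_classes[:top_k])


# Hypothetical candidate lists, one per term-weighting algorithm.
by_weighting_a = ["HttpURLConnection", "URL", "InputStream", "BufferedReader"]
by_weighting_b = ["URL", "HttpURLConnection", "BufferedReader", "Scanner"]

ranked = borda_count([by_weighting_a, by_weighting_b])
expanded = reformulate("send http get request", ranked)
print(ranked)
print(expanded)
```

Classes that both lists rank highly rise to the top of the merged ranking, while a class that appears in only one list collects points from that list alone.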
arXiv:1807.08798v1