Self Learning from Large Scale Code Corpus to Infer Structure of Method
Invocations
by
Hung Phan
2019
Abstract
Automatically generating code from a textual description of a method invocation
is challenging. There are two current research directions for this problem.
One direction treats the textual description of a method invocation as a
separate natural-language query and does not consider the surrounding code
context. The other takes advantage of a large-scale practical code corpus to
train a machine translation model that generates code. However, this direction
achieves very low accuracy. In this work, we address these drawbacks by
proposing MethodInfoToCode, an approach that embeds context information and
improves the learning ability of the original Phrase-based Statistical Machine
Translation (PBMT) from NLP to infer the implementation of a method invocation
given the method name and other context information. We build expression
prediction models learned from 2.86 million method invocations drawn from a
high-quality corpus of practical GitHub projects that use 6 popular libraries:
JDK, Android, GWT, Joda-Time, Hibernate, and XStream. Our evaluation shows that
if developers write only the method name of a method invocation in a method
body, MethodInfoToCode predicts the generated expression correctly with an F1
score of 73%.
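To make the reported metric concrete, the following is a minimal sketch of how an F1 score could be computed for one predicted expression against its expected form. The token-level matching, the tokenizer, and the example expressions are all assumptions made for illustration; the abstract does not specify how the paper matches predictions to references.

```python
import re
from collections import Counter


def tokenize(expression):
    # Illustrative tokenizer (an assumption): identifiers stay whole,
    # every other non-space character becomes its own token.
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\S", expression)


def token_f1(predicted, expected):
    # Count tokens as multisets so repeated tokens are matched at most
    # as many times as they occur on both sides.
    pred = Counter(tokenize(predicted))
    gold = Counter(tokenize(expected))
    overlap = sum((pred & gold).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical expressions, not taken from the paper's data.
expected = "list.add(item)"
predicted = "list.add(value)"
score = token_f1(predicted, expected)
print(round(score, 3))
```

Here five of the six tokens agree, so precision and recall are both 5/6 and the F1 score is the same value. A corpus-level figure such as the 73% above would aggregate over many invocations, and the aggregation scheme is likewise not stated in the abstract.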
arXiv: 1909.03147v1