Application of Seq2Seq Models on Code Correction

by Shan Huang

Released as an article (2020).

Abstract

We apply various seq2seq models to programming language correction tasks on the Juliet Test Suite for C/C++ and Java from the Software Assurance Reference Dataset (SARD), achieving 75% (C/C++) and 56% (Java) repair rates on these tasks. We introduce a Pyramid Encoder into these seq2seq models, which greatly improves computational and memory efficiency while maintaining repair rates similar to those of the non-pyramid counterparts. We also successfully carry out an error-type classification task on the ITC benchmark examples (with only 685 code instances) using transfer learning with models pre-trained on the Juliet Test Suite, pointing to a novel way of processing small programming language datasets.
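The Pyramid Encoder referenced in the abstract reduces the sequence length between stacked encoder layers by merging groups of adjacent timesteps, so each higher layer processes a shorter sequence. A minimal sketch of that downsampling step, assuming a merge factor of 2 with zero-padding (the function name and details here are illustrative, not the paper's implementation):

```python
import numpy as np

def pyramid_downsample(h, factor=2):
    """Merge each group of `factor` adjacent timesteps into one wider
    timestep, shrinking sequence length by `factor`.

    h : array of shape (T, d) -- hidden states from one encoder layer.
    Returns an array of shape (ceil(T / factor), factor * d).
    Illustrative sketch only; zero-pads when T is not divisible by factor.
    """
    T, d = h.shape
    pad = (-T) % factor
    if pad:
        h = np.vstack([h, np.zeros((pad, d))])
    return h.reshape(-1, factor * d)

# 6 timesteps of dimension 2 become 3 timesteps of dimension 4.
h = np.arange(12, dtype=float).reshape(6, 2)
out = pyramid_downsample(h)
print(out.shape)  # (3, 4)
```

Each downsampling halves the length seen by the next recurrent (or attention) layer, which is where the quadratic-in-length computation and memory savings come from.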

Archived Files and Locations

application/pdf  2.5 MB
web.archive.org (webarchive)
arxiv.org (repository)
Type  article
Stage   submitted
Date   2020-08-04
Version   v2
Language   en
arXiv  2001.11367v2