[2406.18181] On the Evaluation of Large Language Models in Unit Test Generation
December 30, 2024


View a PDF of the paper titled “On the Evaluation of Large Language Models in Unit Test Generation” by Lin Yang and 10 other authors


Abstract: Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of large language models (LLMs) offers a new direction for automated unit test generation. Existing research has mainly focused on closed-source LLMs (such as ChatGPT and Codex) with fixed prompting strategies, leaving the capabilities of advanced open-source LLMs under various prompting settings unexplored. In particular, open-source LLMs offer advantages in data privacy protection and have demonstrated superior performance on some tasks. Moreover, effective prompting is crucial for maximizing an LLM's capabilities. In this paper, we conduct the first empirical study to fill this gap, based on 17 Java projects, five widely used open-source LLMs with different structures and parameter sizes, and comprehensive evaluation metrics. Our results highlight the significant impact of various prompt factors, show the performance of open-source LLMs compared with the commercial GPT-4 and the traditional tool EvoSuite, and identify limitations of LLM-based unit test generation. We then derive a series of implications from our study to guide future research and the practical use of LLM-based unit test generation.
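The abstract stresses that prompting choices strongly affect how well an LLM generates unit tests. As a purely illustrative sketch (the class name PromptSketch, the clamp focal method, and the prompt wording are hypothetical and not taken from the paper), a prompt for a single focal method might be assembled like this in Java:

```java
// Hypothetical illustration of assembling a unit-test-generation prompt.
// The study evaluates several prompt factors (e.g., how much context about
// the focal method and its class to include); the exact templates used in
// the paper are not reproduced here.
public class PromptSketch {
    public static void main(String[] args) {
        // Source of the focal method the LLM should write tests for.
        String focalMethod = """
                public static int clamp(int value, int min, int max) {
                    if (value < min) return min;
                    if (value > max) return max;
                    return value;
                }
                """;

        // A simple natural-language instruction plus the focal method's code.
        String prompt = """
                You are a Java developer. Write JUnit 5 unit tests for the
                following method. Cover normal, boundary, and invalid inputs.

                %s
                """.formatted(focalMethod);

        System.out.println(prompt);
    }
}
```

In practice, such a prompt would be sent to the chosen open-source LLM, and the returned test class would then be compiled and executed to compute correctness and coverage metrics.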

Submission history

From: Lin Yang
[v1] Wed, 26 Jun 2024 08:57:03 UTC (219 KB)
[v2] Wed, 25 Sep 2024 06:47:10 UTC (3,326 KB)

