【Essay Speed Reading】| LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing (2024)

Paper: LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing

Authors: Hongxiang Zhang, Yuyang Rong, Yifeng He, Hao Chen

Affiliation: University of California, Davis

Keywords: fuzz testing, large language models, binary structured data

Original link:

https://arxiv.org/pdf/2406.07714v2.pdf

Open source code: Not available

Introduction:

Gray-box fuzzing has been successful in revealing vulnerabilities in programs. However, its random mutation strategy limits the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but they require additional grammar engineering and suffer from low throughput.

This paper explores a method of using large language models (LLMs) to enhance gray-box fuzzing. It leverages the pre-trained knowledge of LLMs to generate new valid inputs, and further fine-tunes the model on paired mutation seeds so that it efficiently learns structured formats and mutation strategies. The LLM-enhanced fuzzer, LLAMAFUZZ, was evaluated on the standard bug benchmark Magma and a variety of real-world programs, where it outperformed its top competitor by 41 bugs on average and found 47 unique bugs across all trials. In addition, LLAMAFUZZ demonstrated consistent performance in both reaching and triggering bugs.

Research Objectives:

This study aims to use large language models (LLMs) to improve the performance of gray-box fuzzing on structured data. Traditional gray-box fuzzing is inefficient at generating structured data, while LLMs carry pre-trained knowledge of data transformation and formats that can be used to generate new valid inputs. By fine-tuning the LLM to learn structured formats and mutation strategies, the fuzzer's performance can be enhanced and more vulnerabilities can be discovered.

Research Contributions:

1. An LLM-enhanced mutation strategy is proposed that applies to both binary and text data formats.

2. It provides a middle ground between general-purpose fuzzers and specialized fuzzers, learning the patterns of structured seeds and mutating them accordingly.

3. Experimental evidence is provided that LLMs can enhance the mutation process and improve code coverage.

4. Experiments are used to explain how LLMs enhance the fuzzing process.

5. A lightweight asynchronous approach was designed that combines the LLM with the fuzzer, enabling LLAMAFUZZ to be deployed easily on a single GPU or multiple GPUs.

Fuzzing is an automated software testing technique that uses test seeds to find vulnerabilities in a target program or application. Over the past few years, gray-box fuzzing has gained attention for its effectiveness in discovering new vulnerabilities. As the complexity of software systems continues to increase, the need for adaptive test inputs becomes increasingly important. While random mutation has had some success, it hits a bottleneck when generating structured data. Generic gray-box fuzzers achieve high throughput with bit-level mutations, but when dealing with applications that require structured input, blind random bit-level mutation tends to break the integrity of the data format, producing inefficient seeds.
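
To make this concrete, here is a minimal sketch (ours, not from the paper) of why blind bit-level mutation is wasteful on structured formats: a single random bit flip in a PNG seed can land in the 8-byte magic header, after which the parser rejects the input before any deep code is reached.

```python
import random

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # every valid PNG starts with these 8 bytes

def bit_flip(data: bytes, rng: random.Random) -> bytes:
    """Flip one random bit: the core primitive of generic gray-box mutators."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf) * 8)
    buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)

rng = random.Random(0)
seed = PNG_MAGIC + b"...rest of a hypothetical PNG file..."
mutated = bit_flip(seed, rng)

# If the flipped bit lands in the magic bytes, most PNG parsers reject the
# input immediately, so execution never reaches the deeper parsing code.
print("magic intact:", mutated.startswith(PNG_MAGIC))
```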

To speed up this process, honggfuzz introduced a shared file corpus to support multi-process and multi-threaded operation, which improves the throughput of generating test cases. However, simply increasing throughput and adding more random mutation strategies still creates a bottleneck when processing structured seeds: AFL++ and honggfuzz require many attempts to mutate a seed into a valid structured input. In addition, fuzzers that rely on a random strategy produce inconsistent results, so multiple repeated experiments are required for fair comparison. We propose to use large language models (LLMs) to augment the mutation process in fuzzing: pre-trained LLMs understand data transformation and format information and can generate new valid inputs, while fine-tuning lets them learn the patterns and mutation strategies of specific structured seeds, striking a balance between general-purpose fuzzers and specialized fuzzers.

Gray-box fuzzing has attracted attention for its effectiveness in finding vulnerabilities in many real-world programs. However, as software grows more complex, many programs consume highly structured data formats, which poses significant challenges to traditional fuzzing techniques. Traditional fuzzing mutates mainly at the bit level, and it takes a lot of trial and error to mutate such structured data effectively. Grammar-based fuzzing offers a way to generate well-formed seeds from human-specified grammars, guaranteeing that the generated inputs are syntactically valid and diverse. However, grammar-guided fuzzing requires additional domain knowledge, which limits its widespread use.

LLAMAFUZZ's research methodology consists of three main phases:

1. Fine-tuning preparation: First, we collected data from FuzzBench and AFL++ experiments to create a diverse training set. To ensure that the LLM can handle a wide range of data formats, we introduced a data transformation that converts binary input files into a uniform hexadecimal representation. This not only enables the LLM to understand and process different data formats, but also ensures the diversity and validity of the training data. A sketch of this conversion follows below.
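
The sketch below illustrates the hexadecimal conversion step. The paper describes a uniform hexadecimal representation; the specific token format here (space-separated two-character hex tokens) is our assumption, not necessarily the paper's exact encoding.

```python
def bytes_to_hex(data: bytes) -> str:
    """Render a binary seed as a space-separated hex string for the LLM."""
    return " ".join(f"{b:02x}" for b in data)

def hex_to_bytes(text: str) -> bytes:
    """Invert the conversion so LLM output can be executed by the fuzzer."""
    return bytes(int(tok, 16) for tok in text.split())

seed = b"\x89PNG\r\n\x1a\n"              # first 8 bytes of a PNG seed
encoded = bytes_to_hex(seed)             # "89 50 4e 47 0d 0a 1a 0a"
assert hex_to_bytes(encoded) == seed     # round-trip is lossless
```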

2. Fine-tuning LLMs for mutation: In this phase, we fine-tune the pre-trained LLM so that it learns specific structured seed patterns and mutation strategies. Through supervised fine-tuning on structured data, the LLM adjusts its weights to accurately understand the input syntax and generate effective mutations. We used a step-by-step prompting approach to guide the LLM to emit mutated output in the expected format, as sketched below.
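
The sketch below shows how one supervised fine-tuning example might be assembled from a pair of seeds (a parent and its mutated child) collected from fuzzer runs. The prompt wording and the `make_training_example` helper are our assumptions, not the paper's exact template.

```python
def to_hex(data: bytes) -> str:
    """Space-separated hex encoding, as in the preparation phase above."""
    return " ".join(f"{b:02x}" for b in data)

def make_training_example(parent: bytes, child: bytes, fmt: str) -> dict:
    """One supervised pair: the prompt carries the parent seed, the target
    completion carries the mutated child seed, both in hex."""
    prompt = (
        f"Below is a {fmt} seed in hexadecimal. "
        "Mutate it and reply with the mutated seed in the same format.\n"
        f"Seed: {to_hex(parent)}"
    )
    return {"prompt": prompt, "completion": to_hex(child)}

# Toy pair drawn from fuzzer history: the child differs in its final byte.
example = make_training_example(b"\x89PNG\r\n\x1a\n", b"\x89PNG\r\n\x1a\x00", "PNG")
print(example["prompt"])
```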

3. Integrating the fuzzer and the LLM: To resolve the tension between the LLM's slow generation speed and the high throughput of gray-box fuzzing, we designed an asynchronous communication scheme that integrates the two. The current seed is converted to its hexadecimal representation and sent to the LLM, which mutates it and returns the new seed to the fuzzer. Asynchronous processing ensures that the fuzzer keeps processing other tasks while waiting for the LLM to generate mutated seeds, improving overall testing efficiency.
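
A minimal sketch of this asynchronous hand-off, using Python threads and queues. `llm_mutate` and `havoc_mutate` are hypothetical stand-ins for the fine-tuned model call and AFL++'s ordinary mutations; the paper's actual interface is not reproduced here.

```python
import queue
import threading

def llm_mutate(seed: bytes) -> bytes:
    # Placeholder for the fine-tuned model call (assumption, not the
    # paper's real interface); here we just reverse the bytes.
    return seed[::-1]

def havoc_mutate(seed: bytes) -> bytes:
    # Placeholder for AFL++'s ordinary bit-level mutations.
    return seed + b"\x00"

requests_q: "queue.Queue[bytes]" = queue.Queue()
results_q: "queue.Queue[bytes]" = queue.Queue()

def llm_worker() -> None:
    """GPU-side loop: consume seeds, produce LLM-mutated seeds."""
    while True:
        results_q.put(llm_mutate(requests_q.get()))

threading.Thread(target=llm_worker, daemon=True).start()

def fuzz_step(seed: bytes) -> bytes:
    """One fuzzer iteration: enqueue work for the LLM but never block on it."""
    requests_q.put(seed)
    try:
        return results_q.get_nowait()   # use an LLM seed if one is ready
    except queue.Empty:
        return havoc_mutate(seed)       # otherwise keep normal throughput
```

The point of the fallback branch is that throughput never drops below plain AFL++: LLM-generated seeds arrive opportunistically rather than sitting on the fuzzer's critical path.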

To evaluate how well LLMs address the limitations of traditional fuzzing in processing structured data, LLAMAFUZZ was implemented by extending AFL++ and evaluated on two benchmarks: the Magma bug benchmark and a set of real-world open-source programs. The results show that LLAMAFUZZ outperforms existing fuzzers in both the number of bugs discovered and code coverage, demonstrating its advantage on structured data. Specifically, LLAMAFUZZ found 41 more bugs on average than its top competitor on the Magma benchmark, for a total of 47 unique bugs. On the real-world open-source programs, LLAMAFUZZ showed a significant increase in code coverage on 10 of the 15 fuzzing targets, with an average increase of 27.19%.

Fuzz testing is an automated, randomized software testing technique used to find vulnerabilities in a target program or application. Traditional fuzzing methods fall into black-box, white-box, and gray-box fuzzing. Black-box fuzzing has no knowledge of the program structure and mainly relies on a high volume of randomly generated test inputs, so its effectiveness is limited. White-box fuzzing uses program analysis to improve code coverage, but it is time-consuming. Gray-box fuzzing combines the advantages of both, using a feedback mechanism to generate more valuable test seeds and improve testing efficiency.

In this paper, we propose a method to enhance gray-box fuzzing using large language models (LLMs), pre-training and fine-tuning the LLM to generate and mutate structured data effectively. Experimental results show that LLAMAFUZZ beats existing fuzzers in both the number of bugs found and code coverage, demonstrating its advantage on structured data. The success of LLAMAFUZZ validates the potential of LLMs to improve fuzzing efficiency and vulnerability discovery, and suggests broad application prospects.

Original author: Paper Interpretation Agent

Proofreading: Little Coconut Wind
