SymPcNSGA-Testing: A Hybrid Approach to Mitigate Path Explosion in Software Programs ()
1. Introduction
The path explosion problem poses a significant barrier in the domain of software testing, making it nearly impossible to exhaustively explore all execution paths in large or complex software. Many techniques have been used, such as Symbolic Execution, which explores multiple execution paths by interpreting program inputs as symbolic variables and resolving constraints using SMT solvers. This technique has been adopted in several widely used tools, such as KLEE, due to its ability to systematically and automatically uncover hard-to-find execution paths [1]. However, as software systems grow in complexity, so does the number of potential execution paths, which makes exhaustive exploration increasingly impractical.
One of the main obstacles encountered in symbolic execution is the path explosion phenomenon, in which the number of possible execution paths grows exponentially as the complexity of the program increases. This explosion renders full exploration infeasible, even for relatively small programs, and limits the scalability and effectiveness of symbolic analysis. As a result, some tools find it difficult to achieve high branch coverage efficiently, both in terms of time and computational resources. This leads us to a critical research question: How can we efficiently reduce the number of paths to be analyzed while preserving maximum branch coverage?
Many strategies have been proposed to mitigate path explosion. Tools such as KLEE implement search heuristics like Depth-First Search (DFS), Breadth-First Search (BFS), and various NURS strategies to guide path exploration. Other techniques include state merging, where similar program states are combined to reduce redundancy [2], and state pruning, which discards states deemed less promising [2]. Meanwhile, Search-Based Software Testing (SBST) introduces metaheuristics like genetic algorithms to explore input domains and maximize coverage [3]. More recent approaches include machine learning-guided exploration [4] and concolic testing [2].
While these techniques provide valuable insights, they each have significant limitations. Heuristic-based search strategies may suffer from local optima, failing to explore diverse paths. State merging and pruning risk losing critical execution behaviors. Search-Based Software Testing techniques, like genetic algorithms, although powerful in searching input spaces, often lack structural awareness of symbolic paths and do not exploit similarities among them. Furthermore, few existing works systematically integrate path similarity reduction prior to optimization, resulting in redundancy and suboptimal selection of test paths.
To address these shortcomings, we introduce SymPcNSGA-Testing (Symbolic execution, Path clustering and NSGA-II Testing), a novel hybrid methodology for mitigating path explosion in symbolic execution. Our approach consists of three phases:
1) Symbolic execution using KLEE to generate a diverse set of execution paths;
2) Clustering of symbolic paths using the Ktest-cluster algorithm to group structurally similar paths;
3) Multi-objective optimization via NSGA-II to select a minimal, representative subset of paths that ensures maximum branch coverage.
The key research questions we explore are:
RQ1: How can the combination of symbolic execution, a path clustering strategy, and multi-objective evolutionary algorithms improve the efficiency of software testing in the face of path explosion?
RQ2: How effective is SymPcNSGA-Testing in achieving maximum branch coverage while mitigating path explosion?
2. Related Work
The path explosion problem is one of the most critical challenges in symbolic execution and search-based software testing. A wide range of techniques has been proposed to address it, including parallelization, heuristics, clustering, and evolutionary multi-objective algorithms. This section reviews existing approaches and highlights their contributions and limitations in relation to our proposed SymPcNSGA-Testing methodology.
2.1. Parallel and Distributed Symbolic Execution
Several works have focused on parallelizing symbolic execution to reduce exploration time. Staats and Păsăreanu [5] introduced Simple Static Partitioning, a method that partitions the symbolic execution tree using preconditions and assigns partitions to different processors. This method achieved significant speedups (up to 90×), but its static nature may leave relevant paths unexplored.
Cloud9 [6], proposed by Bucur et al., tackles path explosion by distributing symbolic execution over clusters of commodity machines. Cloud9 offers linear scalability and integrates well with existing test suites. However, it does not reduce the number of paths explored, nor does it include any strategy to eliminate redundant or similar paths.
2.2. Heuristics and Search-Guided Execution
Heuristics have been introduced to guide symbolic execution toward unexplored or critical paths. van Vliet [7] extended the OOX verification engine with several heuristics, including minimal-distance-to-uncovered, random-path, and round-robin combinations. While they improve coverage in some cases, these heuristics do not eliminate redundant paths after execution.
Xiao et al. [8] addressed path explosion using heuristic learning, search strategies, and function abstraction to improve vulnerability discovery. Yet, the absence of empirical comparisons and a lack of path reduction mechanisms limit the scope of their contributions.
Zhang et al. [9] proposed MuSE (Multiplex Symbolic Execution), a technique that integrates constraint solving directly into symbolic path exploration, enabling concurrent exploration of multiple paths. MuSE achieved significant speedups across multiple solvers. Nonetheless, it does not reduce the total number of paths or apply selection heuristics such as genetic algorithms.
Bessler et al. [10] introduced Metrinome, a tool that statically estimates the potential severity of path explosion based on structural complexity metrics (e.g., cyclomatic complexity, loop depth). Although useful for planning symbolic analysis, Metrinome does not propose any execution or optimization mechanism to reduce the number of generated paths.
2.3. Evolutionary and Feedback-Based Approaches
Genetic algorithms have been employed in several works to improve test generation. Gong et al. [11] proposed ATCG-PC, a multi-objective genetic algorithm where paths are grouped and optimized in parallel using subpopulations. While this improves coverage, no selection mechanism is applied to extract a minimal subset of relevant paths.
Zhu et al. [12] improved test generation by grouping paths based on similarity, identifying common prefix paths, and guiding symbolic execution with reduced constraints. This results in performance gains but still does not explicitly minimize the number of selected paths for testing.
Semujju et al. [13] introduced a feedback-driven approach that removes path groups from the search space if they show no improvement across generations. This mechanism prevents wasting search efforts on unreachable paths. However, this strategy may inadvertently drop critical paths if the dropout condition is too strict.
2.4. State Reduction Techniques
State merging and pruning are also explored to mitigate path explosion. Copeland [2] combined dynamic state merging and pruning in a KLEE prototype, outperforming the baseline on several Coreutils programs. However, merging can complicate constraint tracking and does not address test selection or redundancy reduction.
3. Motivating Example
Let consider the following Figure 1, which represents the C code of a program and its Control Flow Graph. It contains 101 independent conditions, each controlling the increment of a counter if a specific value is met. The critical state (ERROR) is reached only if all 100 conditions are met simultaneously. Each if(inp[i] == i + 1) condition generates two possible branches: either the condition is true or it is false. Given that there are 101 independent conditions, the total number of execution paths is: Number of paths = 2101 ≈ 2.53 × 1030. The critical state (ERROR) is reached only if all conditions are met—in other words, only 1 path out of 2101 leads to the error. This makes the probability of reaching this path by chance almost zero and reinforces the importance of optimization or path selection methods. This image clearly highlights the problem of path explosion encountered during software testing.
Figure 1. Motivating example of path explosion problem.
4. Problem Formulation
The path explosion problem in testing leads to a combinatorial explosion of execution paths, making exhaustive analysis intractable. The goal of SymPcNSGA-Testing is to mitigate this explosion while ensuring representative and diverse test coverage. To this end, we formulate the selection of execution paths as a multi-objective optimization problem addressed using NSGA-II.
4.1. Aim
Given a set of execution paths
partitioned into
clusters
obtained using a clustering algorithm, we aim to construct a subset
(an individual) such that:
At least one path is selected from each cluster:
,
.
The selected subset
maximizes branch coverage across all paths it contains.
The number of selected paths is minimized to reduce redundancy.
4.2. Decision Variables
Each individual
is a combination of execution paths selected from the original path set
. The size
of each individual is variable but must satisfy the coverage constraint for all clusters.
4.3. Objectives Functions
We define the problem as a bi-objective minimization problem:
Objective 1 (Maximize Coverage):
(1)
Objective 2 (Minimize Path Set Size):
(2)
Objective 1 aims to maximize branch coverage. Although NSGA-II is traditionally formulated for minimization problems, the algorithm is used here in a configuration that directly supports maximization objectives. Consequently, individuals exhibiting higher coverage values are systematically favored during the selection process.
4.4. Constraints
To ensure representativity across all clusters, we introduce the constraint:
(3)
This guarantees that each individual includes at least one path from every cluster, promoting diversity and representativity.
4.5. Optimization Strategy
The NSGA-II algorithm is applied to evolve a population of such individuals, balancing the trade-off between maximizing branch coverage and minimizing the number of selected paths. The non-dominated sorting and crowding distance mechanisms of NSGA-II ensure the preservation of diversity and the convergence toward a Pareto-optimal front of path subsets.
5. Methodology
In this section, we describe the proposed SymPcNSGA-Testing methodology, which aims to mitigate the path explosion problem while maximizing branch coverage during test phase. This combined methodology brings together symbolic execution, clustering of execution paths, and optimization using genetic algorithms in a structured and complementary sequence of steps. The entire process comprises three main phases, which can be seen in Figure 2.
5.1. Symbolic Execution with KLEE
The first step of the SymPcNSGA-Testing methodology is based on the use of symbolic execution using the KLEE 3.2 tool. KLEE 3.2 is an open-source symbolic execution engine that automatically generates test cases by exploring all possible execution paths of a program, based on symbolic inputs. During this phase, the program’s inputs are rendered symbolic so that KLEE can follow all conditional branches encountered. At each conditional branch, KLEE generates a new path based on the accumulated constraints.
Figure 2. SymPcNSGA-testing architecture.
For each path explored, KLEE generates a .ktest file, which contains the concrete values of the inputs used to follow that specific path. It also generates a .path file representing the decision sequence taken through branches. These two types of generated files will be essential in the next steps of our methodology. The process can be described by the following algorithm (Algorithm 1).
5.2. Path Clustering with Ktest-Cluster
The second step of our methodology aims to group execution paths into sets to reduce redundancy, facilitate selection, and limit the impact of combinatorial explosion. To this end, we propose KTest-Cluster (K, the number of cluster, that we will produce), an innovative clustering algorithm based not on symbolic paths or executed branches, but directly on the .ktest files produced by KLEE.
Algorithm 1. Symbolic execution with KLEE.
Unlike traditional approaches that analyze paths from concrete values generated, KTest-Cluster leverages input values contained in .ktest files, which represent the test cases associated with each execution.
Step 1: Data Preparation
Each .ktest file contains the concrete inputs associated with a given path. These files are first processed to extract the input data in a structured manner using KLEE’s ktest-tool. Each .ktest file is then transformed into a numerical vector, where each component represents an input variable.
Step 2: Vector Normalization
Before clustering, the vectors are normalized. The vectors are padded with zeros at the end so that they all have the same maximum length. This is to prevent certain large dimensions from overwhelming others in the distance calculation.
Step 3: Applying K-means with Euclidean Distance
The normalized vectors are then clustered using the K-means algorithm, using Euclidean distance as the similarity measure. In this work, K-means clustering with Euclidean distance was selected due to the numerical and fixed-length representation of execution paths, where each path is encoded as a vector of branch coverage features and execution characteristics. Euclidean distance provides an intuitive measure of similarity in this vector space by capturing the overall geometric proximity between paths in terms of covered branches and execution. K-means was chosen for its computational efficiency and scalability, which is particularly important given the large number of paths generated by symbolic execution.
Alternative distance metrics (e.g., cosine or Jaccard distance) and clustering algorithms (e.g., hierarchical or density-based clustering) could also be considered. However, these approaches either introduce higher computational costs or require additional parameters that are difficult to tune in large-scale symbolic execution settings. In this study, the choice of K (the number of clusters) is based on the Rule of thumb [14] technique in which
where
is the number of data points. In our case,
is the number of total path executions produced by Symbolic Execution. The result of this process is a partitioning of the paths into coherent groups of similar input behaviors. A systematic comparison of clustering techniques is left as future work (Algorithm 2).
Algorithm 2. Ktest-Cluster algorithm.
5.3. Optimization with Non-Dominated Sorting Genetic Algorithm-II (NSGA-II)
The final phase of the SymPcNSGA-Testing methodology focuses on reducing path explosion through a multi-objective optimization using NSGA-II (Non-dominated Sorting Genetic Algorithm-II). This step seeks to balance two conflicting goals: 1) maximize branch coverage and 2) minimize the number of selected execution paths. Each individual in the population represents a subset of symbolic paths and is evaluated by a fitness pair
where:
Branch coverage was computed using KLEE’s branch coverage instrumentation, where each conditional branch is represented as a binary value indicating whether it was exercised by at least one execution path, and coverage is reported at the source-level branch granularity as the percentage of covered branches over the total number of branches.
NSGA-II applies non-dominated sorting to classify individuals into Pareto fronts. An individual
is said to dominate another
(denoted
) if:
with at least one strict inequality. This mechanism identifies optimal trade-offs between coverage and cost.
Evolutionary operators such as crossover and mutation are applied to ensure that each individual maintains at least one path per cluster, thus preserving structural diversity.
The algorithm iterates over a fixed number of generations, gradually improving the population. At the end of the process, the first Pareto front offers a set of non-dominated solutions representing various trade-offs. These allow testers to select configurations with high coverage, minimal redundancy, or a balanced compromise, depending on testing goals and resources (Algorithm 3).
Algorithm 3. SymPcNSGA-II optimization.
6. Experimental Setup
To evaluate the effectiveness of our proposed methodology, SymPcNSGA-Testing, we conducted a series of experiments on 10 real-world programs from the GNU Coreutils 9.5 suite. This section presents the experimental environment, datasets used, and parameter settings.
6.1. Environment Configuration
All experiments were conducted on a machine running Linux Mint 21 with the following hardware specifications:
CPU: Intel® Core™ i5-11th generation @ 2.90 GHz (8 cores).
RAM: 16 GB.
Storage: HDD 1 TB.
KLEE Version: 3.2 with LLVM 14.0.0, Clang 14.0.0, GCC 11.4.0.
Python Version: 3.10.
Scikit-learn Version: 1.2.0.
Clustering: Implemented via Scikit-learn (K-means).
Genetic Algorithm: Implemented in Python.
6.2. Programs under Test
To evaluate the SymPcNSGA-Testing methodology, we selected 10 diverse programs from GNU Coreutils 9.5: basename, cut, csplit, dirname, echo, expr, printf, sort, tr, uniq. These programs present a variety of input/output behaviors and logical complexities.
Each program was executed 20 times, with a single execution taking approximately 2 hours, totaling:
20 runs × 10 programs × 2 hours = 400 hours of experimentation
6.3. Methodology Parameters
This subsection presents the parameters used at each stage of the SymPcNSGA-Testing methodology. Each parameter is crucial for optimizing the performance of the genetic algorithm, clustering, and symbolic execution.
6.3.1. Symbolic Execution Parameters
The following Table 1 presents the different parameters that we have used for Symbolic Execution.
Table 1. Symbolic execution parameters.
Parameter |
Description |
––libc=uclibc |
Uses the minimal uClibc library for standard C functions |
––posix-runtime |
Enables POSIX-like environment simulation for command-line arguments, stdin, and file I/O |
––sym-arg N |
Declares one symbolic argument of up to N characters |
6.3.2. Clustering Parameter
The following Table 2 presents the description of the parameter k that is essential for clustering step.
Table 2. Clustering parameter.
Parameter |
Description |
Value Used |
Number of clusters
|
Specifies the number of groups formed from symbolic execution paths |
, where
is the number of paths |
6.3.3. NSGA-II Parameters
The following Table 3 presents the different parameters that we have used for running NSGA-II algorithm.
Table 3. NSGA-II parameters.
Parameter |
Description |
Value |
Population size |
Number of individuals per generation |
50 |
Number of generations |
Number of evolutionary iterations |
100 |
Selection method |
Based on Pareto ranking and diversity |
Binary Tournament |
Crossover type |
Recombination of individuals |
One-Point Crossover |
Crossover rate |
Probability of crossover per pair |
100% |
Mutation type |
Replacement of a path with another from the same cluster |
Intra-cluster replacement |
Mutation rate |
Probability of mutation per individual |
10% |
Termination criteria |
Stopping condition |
100 Generations |
Individual size |
Variable size; each individual includes at least one path per cluster |
|
The NSGA-II algorithm was configured with a population size of 50 individuals and executed for 100 generations. Each individual represents a variable-length combination of execution paths, with the constraint that it must be greater than and at least one path is selected from each cluster to ensure representativeness.
One-point crossover operator was used with a probability of Pc = 100%, enabling the recombination of path subsets between individuals. Mutation was applied with probability Pm = 10%, where mutation consists of randomly adding, removing, or replacing a path within a cluster. These operators were selected to preserve cluster coverage while maintaining population diversity.
Parameter values were chosen empirically based on preliminary experiments and commonly adopted settings in SBST literature, providing a trade-off between convergence speed and solution diversity.
6.4. Comparative Tools
The following Table 4 lists the different KLEE configurations used for comparison with our proposed methodology. Each configuration aims to tackle the path explosion problem using various strategies.
Table 4. Comparative KLEE techniques.
No. |
Technique |
Purpose |
KLEE Parameters |
0 |
KLEE-BASE |
Standard configuration with symbolic execution using POSIX system calls |
––posix-runtime ––libc=uclibc |
1 |
Covering-New |
Reduces redundant outputs by generating tests only for new code coverage |
––only-output-states-covering-new |
2 |
NURS-Depth |
Explores longer execution paths first to reach deep branches |
––search=nurs:depth |
3 |
BFS + Merge |
Combines breadth-first search with state merging and new coverage focus |
––search=bfs – use-merge ––only-output-states-covering-new |
4 |
Random-Path |
Introduces randomness in path selection for diversity |
––search=random-path |
5 |
DFS |
Depth-first search explores long paths first, may ignore breadth |
––search=dfs |
6 |
State-Merge |
Merges execution states to limit the number of paths explored |
––use-merge |
7 |
Covering-New + Merge |
Enhances coverage while merging redundant states |
––only-output-states-covering-new ––use-merge |
8 |
MD2U |
Prioritizes paths closer to uncovered instructions to guide search |
––search=nurs:md2u ––only-output-states-covering-new |
9 |
NURS CovNew + DFS |
Combines depth-first andcoverage-guided heuristics |
––search=nurs:covnew ––search=dfs |
7. Results and Discussion
7.1. Answer to RQ1
The first research question (RQ1) aims to analyze the contribution of our SymPcNSGA-Testing methodology compared to different KLEE techniques in the context of generating execution paths. More specifically, we seek to evaluate whether SymPcNSGA-Testing better manages the combinatorial explosion of paths in the tested program.
The analysis of our experimental results, illustrated in Table 5, shows that SymPcNSGA-Testing systematically generates fewer paths than KLEE-BASE and its different advanced techniques. This reduction in the number of paths demonstrates the ability of our approach to control path explosion, a major challenge in symbolic execution.
Table 5. Number of paths generated by each KLEE input and by SymPcNSGA-Testing.
Programs |
KLEE Input |
Our Approach |
KLEE-BASE |
A |
B |
C |
D |
E |
F |
G |
H |
I |
SymPcNSGA-Testing |
basename |
110 |
27 |
34 |
23 |
27 |
29 |
30 |
30 |
30 |
33 |
39 |
csplit |
5306 |
45 |
48 |
39 |
44 |
46 |
47 |
48 |
47 |
51 |
64 |
cut |
1244 |
56 |
69 |
51 |
57 |
55 |
59 |
57 |
68 |
68 |
62 |
dirname |
2314 |
22 |
34 |
22 |
22 |
21 |
23 |
23 |
24 |
34 |
66 |
echo |
3363 |
7 |
13 |
9 |
10 |
10 |
11 |
11 |
11 |
31 |
68 |
expr |
1375 |
25 |
37 |
23 |
25 |
25 |
25 |
25 |
27 |
35 |
69 |
printf |
10593 |
133 |
129 |
96 |
115 |
109 |
129 |
129 |
120 |
133 |
22 |
sort |
6186 |
75 |
77 |
61 |
64 |
59 |
86 |
85 |
92 |
87 |
62 |
tr |
5286 |
36 |
42 |
31 |
36 |
40 |
39 |
39 |
42 |
44 |
50 |
uniq |
3530 |
5 |
7 |
5 |
10 |
5 |
5 |
5 |
5 |
6 |
68 |
A: Covering-New; B: Covering-New + NURS-Depth; C: Covering-New + BFS + Merge; D: Covering-New + Random; E: Covering-New + DFS; F: Covering-New + Merge; G: Covering-New + NURS-CovNew + Merge; H: Covering-New + NURS-MD2U; I: Covering-New + NURS-CovNew + DFS.
Among the ten tested programs, SymPcNSGA-Testing produces fewer paths than KLEE-BASE in all cases:
basename: 39 paths vs 110 (KLEE-BASE).
csplit: 64 paths vs 5306 (KLEE-BASE).
cut: 62 paths vs 1244 (KLEE-BASE).
dirname: 66 paths vs 2314 (KLEE-BASE).
echo: 68 paths vs 3363 (KLEE-BASE).
expr: 69 paths vs 1375 (KLEE-BASE).
printf: 22 paths vs 10593 (KLEE-BASE).
sort: 60 paths vs 6186 (KLEE-BASE).
tr: 50 paths vs 5286 (KLEE-BASE).
uniq: 68 paths vs 3580 (KLEE-BASE).
Comparison with Advanced KLEE Strategies
In most cases, SymPcNSGA-Testing generates fewer paths than advanced KLEE techniques using specific strategies such as NURS, BFS, DFS, random-path, and use-merge.
For example:
On expr, SymPcNSGA-Testing generates 69 paths while Covering-New + NURS-Depth produces 37 and Covering-New + DFS generates 35. SymPcNSGA-Testing is more efficient in reducing the number of paths.
On sort, SymPcNSGA-Testing generates 60 paths, fewer than most techniques except Covering-New + DFS (59 paths).
On all the programs, Sampans-Testing outperforms KLEE-BASE.
Case Study: basename—Drastic Reduction of Path Explosion
Interpretation: basename processes text inputs and includes multiple branches based on arguments, leading to significant path growth. SymPcNSGA-Testing eliminates redundant paths while maintaining sufficient coverage. This illustrates the effectiveness of clustering and genetic algorithms.
Case Study: cut—Drastic Reduction of Path Explosion
Interpretation: cut manipulates structured inputs and contains many conditions. SymPcNSGA-Testing reduces the number of execution paths by over 95%, while maintaining branch representativity.
Case Study: printf—Exceptional Path Reduction
Interpretation: printf includes multiple formats and features, leading to exponential path explosion. Our approach reduces the number of paths by more than 99% while retaining the essential test cases. This confirms its suitability for highly combinatorial programs.
These results confirm that SymPcNSGA-Testing is a robust and effective approach for mitigating the path explosion problem in symbolic execution while maintaining sufficient test coverage.
7.2. Answer to RQ2
The second research question (RQ2) aims to evaluate whether reducing the number of paths tested compromises the quality of the generated tests and, consequently, the reliability of the tested software. The goal is to examine whether the SymPcNSGA-Testing methodology maintains sufficient test coverage while optimizing path exploration to limit combinatorial explosion.
Among the ten tested programs, SymPcNSGA-Testing achieves branch coverage rates comparable to those obtained with KLEE-BASE and its different techniques. Table 6 presents these results. For example:
Table 6. Comparison of branch coverage (C) generated by KLEE inputs and SymPcNSGA-Testing for each program.
Program |
Metric |
KLEE-BASE |
A |
B |
C |
D |
E |
F |
G |
H |
I |
SymPcNSGA-Testing |
basename |
C (%) |
73.19 |
72.96 |
73.19 |
72.96 |
72.96 |
72.97 |
72.96 |
72.96 |
72.96 |
73.19 |
73.19 |
csplit |
C (%) |
84.70 |
81.03 |
75.04 |
74.12 |
79.40 |
78.12 |
80.21 |
80.28 |
80.82 |
79.76 |
84.28 |
cut |
C (%) |
85.12 |
79.27 |
78.76 |
78.52 |
79.85 |
78.90 |
78.82 |
78.86 |
83.61 |
78.80 |
85.07 |
dirname |
C (%) |
73.03 |
81.03 |
75.04 |
74.12 |
79.40 |
78.12 |
80.21 |
80.28 |
80.82 |
79.76 |
73.00 |
echo |
C (%) |
70.05 |
71.50 |
59.87 |
74.91 |
56.14 |
59.85 |
59.80 |
59.80 |
59.80 |
67.46 |
69.40 |
expr |
C (%) |
75.40 |
75.28 |
75.28 |
75.28 |
75.28 |
74.92 |
74.92 |
75.24 |
75.28 |
75.25 |
75.28 |
printf |
C (%) |
76.36 |
74.01 |
71.86 |
77.13 |
73.41 |
77.71 |
71.86 |
71.87 |
71.92 |
71.86 |
73.00 |
sort |
C (%) |
92.09 |
79.63 |
89.65 |
80.48 |
75.30 |
76.92 |
70.53 |
70.59 |
70.19 |
75.37 |
86.03 |
tr |
C (%) |
84.93 |
76.53 |
79.64 |
74.18 |
76.99 |
80.55 |
79.47 |
79.42 |
80.36 |
79.32 |
84.44 |
uniq |
C (%) |
72.56 |
72.02 |
72.22 |
72.60 |
72.43 |
72.13 |
72.02 |
72.02 |
72.16 |
72.12 |
72.56 |
A: Covering-New; B: Covering-New + NURS-Depth; C: Covering-New + BFS + Merge; D: Covering-New + Random; E: Covering-New + DFS; F: Covering-New + Merge; G: Covering-New + NURS-CovNew + Merge; H: Covering-New + NURS-MD2U; I: Covering-New + NURS-CovNew + DFS.
csplit: 84.28% for SymPcNSGA-Testing compared to 84.70% for KLEE-BASE, which achieves the best coverage. The difference is only 0.42%.
cut: 85.07% for SymPcNSGA-Testing compared to 85.12% for KLEE-BASE. The difference is only 0.05%.
dirname: 73.00% for SymPcNSGA-Testing compared to 73.03% for KLEE-BASE and 81.03% for Covering-New, which achieves the best coverage. Differences of 0.03% and 8.03% respectively.
echo: 69.40% for SymPcNSGA-Testing compared to 70.05% for KLEE-BASE and 71.50% for Covering-New. Differences of 0.65% and 2.1%.
printf: 73% for SymPcNSGA-Testing compared to 76.36% for KLEE-BASE and 77.71% for Covering-New + DFS. Differences of 3.36% and 4.71%.
sort: 86.03% for SymPcNSGA-Testing compared to 92.09% for KLEE-BASE, which achieves the best coverage.
These results show that although SymPcNSGA-Testing explores far fewer paths than KLEE, the difference in branch coverage remains minimal.
Specific Cases
Program uniq:
KLEE-BASE explores 3530 paths and achieves 72.56% coverage (best).
Covering-New + BFS + Merge explores 5 paths and achieves 72.60% coverage.
SymPcNSGA-Testing retains 68 paths and achieves 72.56% coverage.
Interpretation:
98.07% reduction in the number of tested paths.
Minimal drop in coverage (−0.01%).
SymPcNSGA-Testing achieves the same coverage as KLEE-BASE despite drastically reducing the number of paths. This confirms the effectiveness of SymPcNSGA-Testing in generating fewer, but more representative paths.
Program basename:
KLEE-BASE explores 110 paths with 73.19% coverage.
Covering-New + NURS-Depth and Covering-New + NURS-CovNew + DFS both achieve 73.19% coverage with only 34 and 33 paths, respectively.
SymPcNSGA-Testing retains 39 paths and achieves 73.19% coverage.
Interpretation:
65% reduction in tested paths.
Identical coverage rate compared to other KLEE techniques.
This demonstrates that SymPcNSGA-Testing not only reduces the number of paths but also maintains high test quality through equivalent branch coverage.
Program expr:
Interpretation:
Program tr:
Interpretation:
99.05% reduction in tested paths.
Minimal coverage drop (−0.49%).
SymPcNSGA-Testing achieves high coverage while significantly reducing the number of paths.
8. Conclusions
Our methodology demonstrated a significant reduction in the number of paths analyzed, thus limiting the combinatorial explosion inherent in symbolic execution. The integration of the genetic algorithm optimized path selection by favoring those with a high impact on branch coverage. Moreover, SymPcNSGA-Testing proved more scalable than traditional approaches, especially for programs where KLEE quickly reaches memory saturation.
The experimental evaluation was conducted on the GNU Coreutils benchmark, which is widely adopted in the symbolic execution and software testing literature, ensuring both reproducibility and comparability of the results. While the evaluation relies on this standard experimental framework, the proposed methodology remains generic and independent of the specific characteristics of the analyzed programs.
These results highlight the potential of SymPcNSGA-Testing as an effective solution to the path explosion problem in symbolic execution. By combining clustering and evolutionary optimization, this methodology enables better management of the exploration space while ensuring high branch coverage. These advancements offer promising prospects for improving structural testing, making symbolic execution applicable to larger-scale programs.
Although the SymPcNSGA-Testing methodology has effectively mitigated the path explosion problem, several directions for improvement and extension can be considered to further optimize its performance and broaden its applicability. These include the integration of machine learning techniques, resource optimization, parallel execution, and evaluation on industrial-scale codebases.