Algorithm Z1 Accelerated and Parallel Generation of Integer Partitions in Standard Representation Form
—Improving the Fastest Existing Algorithm ZS1 ()
1. Introduction
Mr. Zoghbi’s Master’s thesis, Algorithms for Generating Integer Partitions (University of Ottawa, 1993) [1], together with the subsequent paper Fast Algorithms for Generating Integer Partitions (Zoghbi & Stojmenović, International Journal of Computer Mathematics, 1998, Vol. 70, pp. 319-332) [2], introduced the ZS1 and ZS2 algorithms. These algorithms quickly gained widespread recognition within the combinatorics community due to their simplicity, efficiency, and strong computational performance [3]-[6]. In particular, Algorithm ZS1 has been applied in a broad range of domains, including combinatorics [7]-[14], networking, agriculture [15], music [6], gaming [16], healthcare applications such as DNA analysis [17] [18], civil engineering [19], and other areas.
The network graphs shown in Figure 1 illustrate representative partition structures of the integer 10, where nodes correspond to parts and edges represent their associated values. For clarity, only a subset of the complete set of partitions is displayed.
Figure 1. Mapping graphs into partitions.
This paper presents several contributions:
Algorithm Z1, which efficiently processes partitions whose rightmost part is 3 followed, when applicable, by a sequence of 1’s, thereby eliminating iterative processing for a substantial subset of cases.
A direct concatenation strategy in Algorithm Z1 for subset cases where the rightmost part equals 3, thereby avoiding unnecessary intermediate steps and reducing computational overhead.
Algorithm Z3, designed to generate fairly balanced and well-distributed subsets of partitions.
A batch-processing mechanism that automatically initiates and manages parallel threads.
Algorithm Z1 (formerly AZ1), implemented in standard representation form and anti-lexicographic order as described in Section 6, achieves performance improvements of up to 36% in single-threaded execution and up to 89% in multi-threaded execution on identical hardware, compared with Algorithm ZS1.
Algorithms ZS1 and Z1 were evaluated on a single system using a multi-threaded execution model in which the partition space was divided into disjoint subsets according to ranges of the largest part.
All reported performance measurements exclude output operations.
2. Partitions with Parts 2 and 3
The number of integer partitions P(n) grows exponentially as n increases. As shown in Table 1, the proportion of partitions whose leftmost part is 2 or 3 followed, when applicable, by a sequence of 1 s, also increases steadily with n.
Table 1. Count_2 & 3 (x ≤ 3) percentage released from iterations.
n |
# of partitions P(n) |
Count_2 (ZS1) |
Count_2% (ZS1) |
Count_3 (Z1) |
Count_3 (x ≤ 3) (Z1) |
Count_2 & 3 (x ≤ 3) (Z1) |
Count_2 & 3 (x ≤ 3)% (Z1) |
10 |
42 |
22 |
52.38% |
8 |
6 |
28 |
66.67% |
50 |
204,226 |
147,273 |
72.11% |
35,620 |
16,942 |
164,215 |
80.41% |
100 |
190,569,292 |
150,198,136 |
78.82% |
28,579,511 |
10,681,693 |
160,879,829 |
84.42% |
120 |
1,844,349,560 |
1,482,074,143 |
80.36% |
263,565,625 |
92,092,200 |
1,574,166,343 |
85.35% |
150 |
40,853,235,313 |
33,549,419,497 |
82.12% |
5,479,813,969 |
1,759,520,420 |
35,308,939,917 |
86.43% |
160 |
107,438,159,466 |
88,751,778,802 |
82.61% |
14,137,782,672 |
4,428,070,930 |
93,179,849,732 |
86.73% |
170 |
274,768,617,130 |
228,204,732,751 |
83.05% |
35,500,619,847 |
10,860,692,344 |
239,065,425,095 |
87.01% |
Breakdown of the cases in which the leftmost part equals 3, followed, when applicable, by a sequence of 1 s. In Table 2, x denotes the number of 1 s following part 3 in a given partition.
Table 2. Leftmost part 3 Breakdown.
n |
Count_3 |
x = 0 |
x = 1 |
x = 2 |
x = 3 |
x > 3 |
x > 3% |
Count_3 (x ≤ 3) |
10 |
8 |
2 |
2 |
1 |
1 |
2 |
25.00% |
6 |
50 |
35,620 |
5237 |
4510 |
3872 |
3323 |
18,678 |
52.44% |
16,942 |
100 |
28,579,511 |
3,134,927 |
2,803,342 |
2,505,329 |
2,238,095 |
17,897,818 |
62.62% |
10,681,693 |
120 |
263,565,625 |
26,708,042 |
24,089,760 |
21,719,314 |
19,575,084 |
171,473,425 |
65.06% |
92,092,200 |
150 |
5,479,813,969 |
503,415,815 |
458,522,902 |
417,514,127 |
380,067,576 |
3,720,293,549 |
67.89% |
1,759,520,420 |
160 |
14,137,782,672 |
1,262,212,193 |
1,152,724,706 |
1,052,460,914 |
960,673,117 |
9,709,711,742 |
68.68% |
4,428,070,930 |
170 |
35,500,619,847 |
3,085,272,563 |
2,824,542,959 |
2,585,233,667 |
2,365,643,155 |
24,639,927,503 |
69.41% |
10,860,692,344 |
3. Algorithms ZS1 and Z1
3.1. General Overview
For all performance evaluations presented in this document, including both single-threaded experiments in Section 3 and the multi-threaded experiments in Section 5, the following methodology was applied:
Each benchmark was executed at least three times, and the reported results correspond to the average execution time.
Execution time was measured in nanoseconds by recording timestamps at the beginning and end of each run.
All programs were compiled exclusively using the MSYS MinGW64 compiler.
Benchmarks were performed on an AMD Ryzen 7 5800HS system (8 cores, 16 threads, 16 GB RAM).
Output operations were disabled during benchmarking by commenting them out using “//” in C language.
Performance improvements were determined by comparing the measured execution times.
Algorithms ZS1 and Z1 generate each partition exactly once in standard representation form and anti-lexicographic order.
Algorithms ZS1 and Z1 generate partitions within the bounded range [X1, X2], where n ≥ X2 ≥ X1 ≥ 1 for any given integer n.
3.2. Algorithm ZS1
Input integer -> n;
Input lowest value of leftmost part -> x1;
Input highest value of leftmost part -> x2;
If (x1 = 1) then {x1 ⇓ 2};
For I ⇓ 0 to n do a[i] ⇓ 1;
A[1] ⇓ x2+1; h ⇓ 1; m ⇓ n – x2; x ⇓ 0;
A[1] ⇓ n; h ⇓ 1; m ⇓ 1; x ⇓ 0;
While(a[1]>=x1) do {
If (a[h] = 2) then {m ⇓ m + 1, a[h] ⇓ 1; h ⇓ h – 1;}
else {
t ⇓ m – h + 1; a[h] ⇓ a[h] – 1; r ⇓ a[h];
while (t >= r) do {h ⇓ h + 1; a[h] ⇓ r; t ⇓ t – r;}
if (t < 2) then {m ⇓ h + t;}
else { m ⇓ h + 1; h ⇓ h + 1; a[h] ⇓ t;}
}
if (a[1] >= x1) then {for i ⇓ 1 to m do output a[i];}
}
if (a[1] == 1) then {for i ⇓ 1 to n do output a[i];}
Algorithm ZS1 was extended to allow users to specify explicit lower [X1] and upper [X2] bounds on the leftmost part of a partition, thereby restricting the generation process to a selected subset of the integer partitions of n. The algorithm processes partition whose rightmost nonunit part equals 2 with exceptional efficiency: in such case, the successor partition is generated using only three elementary operations. This behavior confirms that Algorithm ZS1 satisfies a constant average delay property established in Fast Algorithms for Generating Integer Partitions.
As a result, Algorithm ZS1 remains the fastest known algorithm in its class and continues to be widely used. furthermore, as n increases, the proportion of partitions containing at least one part equal to 2 followed, when applicable by a sequence of 1’s grows significantly, as illustrated in Table 1.
For n = 50, 72.11% of the partitions end with a 2 followed, when applicable by a sequence of 1 s.
For n = 170, this proportion increases to 83.05%.
Consequently, as n becomes larger, this behavior helps explain the strong empirical performance and scalability of Algorithm ZS1.
3.3. Algorithm Z1
Algorithm Z1 is an optimized variant of ZS1, designed to accelerate the generation of integer partitions in standard representation form and anti-lexicographic order. It supports explicit lower [X1] and upper [X2] bounds on the leftmost part, enabling restricted-range generation without altering the core structure of the algorithm. Its principal enhancement is a specialized fast path for partitions whose rightmost part is 3 followed, when applicable, by a sequence of 1 s. For the subset of cases 3 (x ≤ 3), illustrated in Table 1 and Table 2, this optimization eliminates the iterative redistribution loop used in Algorithm ZS1.
Algorithm Z1 handles such partitions by directly redistributing the rightmost 3 and propagating the transformation until all 3 s and any intermediate 2 s are reduced to 1 s. For example, for n = 10, the partition (4, 3, 3), triggers case x = 0, producing the sequence:
(4,3,2,1), (4,3,1,1,1),
In the final partition, the trailing 3 reappears immediately before the block of 1 s, triggering case x = 3. Algorithm Z1 then directly generates:
(4,2,2,2), (4,2,2,1,1), (4,2,1,1,1,1), (4,1,1,1,1,1,1).
Algorithm Z1
Input integer -> n;
Input lowest value of leftmost part -> x1;
Input highest value of leftmost part -> x2;
If (x1 = 1) then {x1 ⇓ 2;};
For I ⇓ 0 to n do a[i] ⇓ 1;
A[1] ⇓ x2 + 1; h ⇓ 1; m ⇓ n – x2; x ⇓ 0;
While(a[1]>=x1) do {
While(a[h] = 2) do {
m ⇓ m+1, a[h] ⇓ 1; h ⇓ h – 1;
for i ⇓ 1 to m do output a[i];
}
if (a[h] = 3) then {
x ⇓ m – h; a[h] ⇓ 2;
if (x > 3) then {
while (x > 0) do {h ⇓ h+1; a[h] ⇓ 2; x ⇓ x – 2;}
if (x=0) then {m ⇓ h + 1;}
else {m ⇓ h;}
}
else if (x = 0) then { // m=h
m ⇓ m + 1;
for i ⇓ 1 to m do output a[i];
a[h] ⇓ 1; h ⇓ h – 1; m ⇓ m + 1;
}
else if (x = 1) then {
for i ⇓ 1 to h do output a[i]; output “ 2”;
for i ⇓ 1 to h do output a[i]; output “ 1 1”;
a[h] ⇓ 1; h ⇓ h – 1; m ⇓ m + 2;}
else if (x =2) then {
for i ⇓ 1 to h do output a[i]; output “ 2 1”;
for i ⇓ 1 to h do output a[i]; output “ 1 1 1”;
a[h] ⇓ 1; h ⇓ h – 1; m ⇓ m + 2;}
else { // x=3
for i ⇓ 1 to h do output a[i]; output “ 2 2”;
for i ⇓ 1 to h do output a[i]; output “ 2 1 1”;
for i ⇓ 1 to h do output a[i]; output “ 1 1 1 1”;
a[h] ⇓ 1; h ⇓ h – 1; m ⇓ m + 2;
}
// for all cases x=0, x=1, x=2, x=3
for i ⇓ 1 to m do output a[i];
} /* a[h] =3 */
else if (a[h] > 3) then {
t ⇓ m – h + 1; a[h] ⇓ a[h] – 1; r ⇓ a[h]
while (t >= r) do {h ⇓ h+1; a[h] ⇓ r; t ⇓ t – r}
if (t <2) then {m ⇓ h + t;}
else { m ⇓ h + 1; h ⇓ h + 1; a[h] ⇓ t;}
for i ⇓ 1 to m do output a[i];
} /* a[h] > 3 */
} /* while a[1] >= x1 */
Each output partition is constructed by concatenating a fixed prefix with suffixes obtained through redistribution of the rightmost 3. For example, when n = 11, the partition (4, 3, 2, 2), has prefix (4, 3) which is then combined with all partitions generated from redistributing (2, 2), such as (2, 1, 1) and (1, 1, 1, 1), yielding:
(4, 3, 2, 1, 1), (4, 3, 1, 1, 1, 1).
This direct concatenation mechanism eliminates repeated state reconstruction and reduces several inner-loop transitions, resulting in an additional efficiency gain of approximately 5%. For n = 170, this increases the overall performance improvement to approximately 89%.
By explicitly handling rightmost 3 (x ≤ 3) structures, algorithm Z1 removes the most computationally expensive iterative steps present in algorithm ZS1, leading to significantly lower average delay and improved empirical performance. As a result, only rightmost 3 (x > 3) structures remain within the iterative process.
Algorithm Z1 further accelerates partition generation by optimizing cases in which the rightmost structure consists of a 2 or 3 (x ≤ 3), followed by any number of 1 s.
For n = 170, Algorithm Z1 produces:
207,202,977,471 partitions where the rightmost part is 2 before the trailing block of 1 s.
31,862,447,624 partitions where the rightmost part is 3 (x ≤ 3) before the trailing block of 1 s. Table 1 shows that Count_3 (x ≤ 3) = 10,860,692,344 which are processed in batches to generate 31,862,447,624 partitions.
In comparison, Algorithm ZS1 generates 228,204,732,751 such partitions ending in 2. The difference
228,204,732,751 − 207,202,977,471 = 21,001,755,280
corresponds to partitions in which a 2 is created exclusively through the processing of rightmost 3 (x ≤ 3) structures. Algorithm Z1 handles these cases collectively whenever a rightmost 3 (x ≤ 3) is encountered, thereby improving efficiency. For clarification (21,001,755,280 + 10,860,692,344 = 31,862,447,624).
For n = 150, Algorithm Z1 increases the proportion of special-case partitions from 82.12% (ZS1) to 86.43% (Z1).
For n = 170, this proportion rises from 83.05% (ZS1) to 87.01% (Z1). Refer to Table 1.
These results show that as n increases, partitions whose rightmost parts equal to 2 or 3 (x ≤ 3) constitute an increasingly large share of all partitions.
For n = 170, as shown in Table 3, algorithm ZS1 required 1099.354004 seconds in a single-threaded execution, whereas algorithm Z1 completed the same task in 696.825012 seconds, representing a 36.38% reduction in execution time.
Table 3. Z1 Time saving seconds percentage.
Integer |
ZS1 Time Sec |
Z1 Time Sec |
Z1 Time Reduction (Sec) % |
Z1 Time Saving Sec % |
50 |
0.000000 |
0.000000 |
0.00% |
0.00% |
100 |
0.734000 |
0.508000 |
69.21% |
30.79% |
150 |
151.744003 |
104.516998 |
68.88% |
31.12% |
170 |
1099.354004 |
696.825012 |
63.38% |
36.62% |
When evaluated using the MSYS MinGW64 compiler on an IntelTM Core i7-12700H (2.3 GHz), performance tests for n = 150 show that algorithm ZS1 required 522.903992 seconds in a sequential run, whereas algorithm Z1 completed the same task in 297.003902 seconds, representing 43.20% reduction in execution time.
These results indicate that the performance advantages of Algorithm Z1 are influenced by both hardware configuration and compiler environment.
Constant Average Delay Property for Algorithm Z1
Algorithm Z1 is organized into three components for generating the partitions of a given n:
Generation of partitions with leftmost part equal 2 (followed, if applicable, by 1 s)
Generation of partitions with leftmost part equal 3 (followed, if applicable, by 1 s)
Generation of partitions with leftmost part greater than 3
For partitions whose leftmost part is 2, the generation procedure is identical to that of the original Algorithm ZS1 and requires no modification.
For partitions with leftmost part greater than 3, consider for example n = 17 with the partition
(4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1).
The next partition generated is
(3, 3, 3, 3, 3, 2).
In this case, the loop redistributes the 1 s t = n − p = 17 − 4 = 13, using parts of size p = 4 − 1 = 3, followed by a minor adjustment. The number of iterations is therefore approximately ⌊13/3⌋, plus a constant term, resulting in about five iterations. As the size of the redistributed part p increases, the number of required iterations decreases.
For example, when n = 17, if the leftmost part is 10 instead of 4, the partition
(10, 1, 1, 1, 1, 1, 1, 1)
is transformed into
(9, 8),
requiring only a single iteration. The leftmost part 9, is obtained by decrementing 10, by 1. Each iteration of the while loop performs a constant number of operations, approximately three.
The most computationally expensive iterations occur when a part of size p = 4 is redistributed into parts of size 3, In this situation, the number of loop iterations is on the order of
W = ⌊(t + 1)/(p − 1)⌋ + S).
where, S = 0 if (⌊(t + 1)/(p − 1)⌋ mod 0) = 0,
S = 1 if (⌊(t + 1)/(p − 1)⌋ mod 0) > 0
Consequently, the total number of instructions C required to generate the next partition, including loop execution, termination, and trailing conditional operations, can be expressed as follows:
U = 3 (instructions per partition generation),
Q = 3 (termination and trailing conditions)
C = ((⌊(t + 1)/(p − 1)⌋ + S) * U + Q
or equivalently,
C = W * U + Q.
For partitions whose leftmost part is 3, we consider the cases 3 (x ≤ 3), where x denotes the number of trailing 1 s. In these cases, the iterative procedure is replaced by a constant number of instructions (at most ten), including the conditional branching to determine the correct generation path. Each case produces a small batch of partitions via direct concatenation, without arithmetic iteration. When x = 3, four partitions are generated; when x = 2 or x = 1, three partitions are generated; and when x = 0, two partitions are generated. This yields an upper bound of C = 10 operations per generated partition.
When x > 3, the iterative procedure is again required, and the same expression C = W * U + Q applies.
Therefore, the most computational expensive iterations occur when t = n − 3 and p = 3, giving
C = (⌊((n − 3) +1)/2⌋ + S) * 3 + 3.
For example, when n = 15, the partition
(3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
generates the partition
(2, 2, 2, 2, 2, 2, 2, 1), using
C = ((12 + 1)/2 + 1) * 3 + 3. C = 7 * 3 + 3. C = 24.
Let RP(n, m) denote the number of restricted partitions of n using at most m parts.
Lemma 1 and Theorem 1 are taken from my paper “Fast Algorithms for Generating Integer Partitions”. [2]
Lemma 1. RP(n, L2) ≥ n2/12 for L2 ≥ 3.
Proof. Since RP(n, L2) ≥ RP(n, 3) for L2 > 3, it is sufficient to prove RP(n, 3) ≥ n2/12. In the multiplicity representation, the partitions RP(n, 3) are the following kind: n = 3c1 + 2c2 + 1c3 (i.e. y1 = 3, y2 = 2, y3 = 1). The number of such partitions is equal to the number of the solutions of the above equation. Clearly 0 ≤ c1 ≤ ⌊n/3⌋. For each c1 in the interval we solve the equation 2c2 + c3 = n – 3c1. This equation has a unique solution c3 for each c2, 0 ≤ c2 ≤ ⌊n − 3c1/2⌋. Therefore, for fixed c1, the number of solutions is (⌊n − 3c1)/2⌋ + 1 ≥ (n − 3c1)/2. Taking values of c1 into account, the number of solutions is ≥n/2 + (n − 3)/2 + (n − 6)/2 + … + (n − 3⌊n/3⌋)/2 = (⌊n/3⌋ + 1)n/2 − ⌊n/3(n⌊/3⌋ + 1)3/4 = (⌊n/3⌋ + 1)(n/2 − ⌊n/3⌋3/4) ≥ (⌊n/3⌋ + 1)(n/2 − n/4) ≥ n2/12.
Theorem 1. Algorithms ZS1 and Z1 generate unrestricted integer partitions in Standard representation with constant average delay, exclusive of the output.
Proof. Consider xi ≥ 3 in the current partition. It received its value after a backtracking search (starting from the last part) was performed to find an index j ≤ i, called the turning point, that should change its value by 1 (increase/decrease for anti-lexicographic order) and to update values xi for j ≤ i. The time to perform both backtracking searches is O(rj) where rj = n − x1 − x2 − … − xj is the remainder to distribute after first j parts are fixed. We decided to change the cost of the backtrack search evenly to all “swept” parts, such that each of them receives constant O(1) time. Part xi will be changed only after a similar backtracking step “swept” over i-th part or recognized i-th part as the turning point (note that i-th part is the turning point at least one of the two backtracking steps). There are RP(ri, xi) such partitions which all keep xi intact. For xi ≥ 3 the number of such partitions, according to Lemma 1, is ≥ri2/12. Therefore, the average number of operations that are performed by such part i during the “run” of RP(ri, xi), including the change of its value, is O(1)/RP(ri, xi) ≤ O(1)/ri2 = O(1/ri2) < qi/ri2 where qi is a constant. Thus the average number of operations for all parts of size 3 is ≤q1/r12 + q2/r22 + …+ q3/r32 ≤ q(1/r12 + … + 1/r32) < q(1/n2 + 1/(n − 1)2 + … + 1/12) < 2q (the last inequality can be obtained easily by applying integral operation on the last sum), which is a constant. The case in which xi ≤ 2 was not considered. In this case, however, both algorithms ZS1 and Z1 perform a constant number of steps per partition on all such parts. Therefore, the algorithms ZS1 and Z1 have overall constant time average delay.
4. Generating Balanced Distribution Partitions
4.1. Algorithm Z3, 8 Threads, n = 150
Algorithm Z3, a refinement of algorithm Z1, as illustrated in Figure 2, generates a balanced distribution of partition subsets by:
Figure 2. Example of algorithm Z3 input/output.
Dividing the full set of partitions into disjoint subsets.
Ensuring that all partitions sharing the same leftmost part are grouped within the same subset.
Organizing these subsets to enable efficient parallel or distributed execution.
The Partitions Factor parameter improves load balancing by enabling a more uniform distribution of partitions across subsets. Each subset is assigned based on the total number of partitions and the number of available threads. Since all partitions sharing the same leftmost part must remain grouped together, any subset boundary that would otherwise divide such a block instead assigns the entire block to the next subset. As a result, the subsets do not contain identical numbers of partitions.
To facilitate efficient parallel execution, Algorithms Z1 and ZS1 were also adapted to accept input parameters directly from a batch file.
int main(int arg1, char *argv[]) {
n = atoi (argv[3]); x1 = atoi (argv[4]); x2 = atoi (argv[5]);
as an example in the batch file Z1a190.16.bat below: n = 190, x1 = 28, x2 = 29 etc….
Example of batch file: Batch file name: Z1a190.16.bat
echo Starting launching All scripts in parallel
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 28 29
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 26 27
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 31 32
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 23 24
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 52 190
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 33 34
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 39 41
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 42 45
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 46 51
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 18 20
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 21 22
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 35 36
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 37 38
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 25 25
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 30 30
start C:/msys64/mingw64/code/Z1a 3 Z1a 190 1 17
echo Finish launching All scripts in parallel
Tasks are prioritized according to partition count and estimated execution time, with larger workloads scheduled first in order to reduce startup imbalance and ensure a more even distribution of computational load across threads.
4.2. Hardware Capacity
Algorithm Z3 computes P(n) using a single array. For example, for n = 200, it produces P(200) = 3,972,999,029,388. However, due to hardware and compiler constraints, generating all partitions for this value in a single sequential run is impractical. Empirical tests on an AMD Ryzen 7 5800HS system (8 cores, 16 threads, 16 GB RAM) indicate that the largest value that can be reliably handled on a single system is approximately n = 170, although this limit ultimately depends on the available hardware resources.
For n = 190, a complete single-run execution of Algorithm ZS1 resulted in system instability. While sequential execution of individual subsets completed successfully in 6139 seconds, parallel execution of all subsets was not feasible.
Similarly, for Algorithm Z1 at n = 190, a complete single-run execution also proved unstable. However, sequential execution of the subsets completed successfully in 4174 seconds, and parallel execution of all subsets was achievable representing a 32% reduction in total execution time compared with ZS1 (4174 seconds versus 6139 seconds).
For n = 200, generating all partitions in a single run using Algorithm ZS1, was not possible. Although sequential execution of the individual subsets remained feasible, full parallel execution again resulted in instability.
For Algorithm Z1 at n = 200, single-run generation was likewise not achievable; however, sequential execution of the subsets remained stable and provided approximately a 34% performance improvement over Algorithm ZS1. For example, the subset 60 - 200 required 596 seconds under Algorithm ZS1, compared with 393 seconds under algorithm Z1. Full parallel execution of all subsets within a single batch was not stable; consequently, parallel processing was curried out across three batches, although occasional instability was still observed.
5. Empirical Analysis
5.1. Single-Threaded and Multi-Threaded Parallel Execution on the Same Hardware System
Performance of Algorithms ZS1 and Z1 was evaluated under multi-threaded execution model, where multiple instances of each program were executed concurrently, with each instance responsible for generating a distinct subset of the integer partition space.
Partition subsets were assigned to threads through iterative tuning to obtain an approximately optimal balance of computational workload across threads.
5.1.1. Algorithm ZS1, 16 Threads, n = 170
For n = 170, as illustrated in Table 4, execution using sixteen-thread reduced the runtime from 1099.354 seconds in sequential mode to 157.539 seconds, corresponding to 14.33% of the original runtime and representing a reduction of 85.67%. CPU utilization reached 100% at the start of execution and subsequently stabilized at approximately 97%.
5.1.2. Algorithm Z1, 16 Threads, n = 170
Under identical experimental conditions, as illustrated in Table 5, Algorithm Z1 was executed using sixteen parallel threads on the same hardware platform, with each thread assigned a distinct subset of the integer partition space. The parallel
Table 4. Algorithm ZS1, 16 Threads, n = 170.
n = 170 |
Partitions from |
Partitions to |
P(n) |
Single run |
Muti-threaded |
Muti-threaded |
Muti-threaded |
Time Sec |
Time Sec |
Time Start |
Time End |
ZS1 |
1 |
170 |
274,768,617,130 |
1099.354004 |
|
|
|
ZS1 |
49 |
170 |
19,193,773,961 |
74.348999 |
140.298004 |
8:27:10 |
8:29:31 |
ZS1 |
43 |
48 |
17,659,663,004 |
67.250000 |
127.806999 |
8:27:10 |
8:29:18 |
ZS1 |
39 |
42 |
19,035,618,336 |
71.633003 |
138.615000 |
8:27:10 |
8:29:29 |
ZS1 |
36 |
38 |
19,456,938,801 |
72.300003 |
140.289993 |
8:27:10 |
8:29:30 |
ZS1 |
34 |
35 |
15,826,760,953 |
58.849998 |
116.635002 |
8:27:15 |
8:29:11 |
ZS1 |
32 |
33 |
18,218,831,388 |
67.400002 |
133.645004 |
8:27:10 |
8:29:24 |
ZS1 |
30 |
31 |
20,511,517,482 |
75.767998 |
147.029007 |
8:27:10 |
8:29:36 |
ZS1 |
28 |
29 |
22,438,541,109 |
82.203003 |
157.539993 |
8:27:09 |
8:29:47 |
ZS1 |
27 |
27 |
11,732,853,219 |
42.695999 |
88.816000 |
8:27:15 |
8:28:43 |
ZS1 |
26 |
26 |
11,913,686,128 |
43.589001 |
90.138000 |
8:27:15 |
8:28:45 |
ZS1 |
25 |
25 |
11,941,084,517 |
43.752998 |
90.412003 |
8:27:15 |
9:28:45 |
ZS1 |
24 |
24 |
11,791,495,467 |
42.509998 |
89.194000 |
8:27:15 |
8:28:44 |
ZS1 |
22 |
23 |
22,340,653,336 |
81.047997 |
156.395996 |
8:27:10 |
8:29:46 |
ZS1 |
20 |
21 |
19,325,817,959 |
69.013000 |
138.604004 |
8:27:10 |
8:29:28 |
ZS1 |
17 |
19 |
20,544,577,090 |
71.592003 |
144.625000 |
8:27:10 |
8:29:34 |
ZS1 |
1 |
16 |
12,836,804,380 |
43.292999 |
92.642998 |
8:27:15 |
8:28:48 |
TOTAL |
|
|
274,768,617,130 |
1007.247001 |
157.539993 |
8:27:09 |
8:29:47 |
Table 5. Algorithm Z1, 16 Threads, n = 170.
n = 170 |
Partitions from |
Partitions to |
P(n) |
Single run |
Muti-threaded |
Muti-threaded |
Muti-threaded |
Time Sec |
Time Sec |
Time Start |
Time End |
Z1 |
1 |
170 |
274,768,617,130 |
696.825012 |
|
|
|
Z1 |
49 |
170 |
19,193,773,961 |
51.344002 |
113.330002 |
8:15:05 |
8:16:58 |
Z1 |
43 |
48 |
17,659,663,004 |
46.152000 |
101.664001 |
8:15:05 |
8:16:49 |
Z1 |
39 |
42 |
19,035,618,336 |
49.728001 |
111.119003 |
8:15:10 |
8:17:01 |
Z1 |
36 |
38 |
19,456,938,801 |
50.187000 |
113.459999 |
8:15:05 |
8:16:58 |
Z1 |
34 |
35 |
15,826,760,953 |
40.425999 |
94.254997 |
8:15:05 |
8:16:39 |
Z1 |
32 |
33 |
18,218,831,388 |
46.551998 |
105.808998 |
8:15:05 |
8:16:51 |
Z1 |
30 |
31 |
20,511,517,482 |
53.355000 |
119.105003 |
8:15:05 |
8:17:04 |
Z1 |
28 |
29 |
22,438,541,109 |
56.541000 |
124.427002 |
8:15:05 |
8:17:09 |
Z1 |
27 |
27 |
11,732,853,219 |
29.278999 |
71.709000 |
8:15:05 |
8:16:17 |
Z1 |
26 |
26 |
11,913,686,128 |
29.958000 |
72.875000 |
8:15:06 |
8:16:19 |
Z1 |
25 |
25 |
11,941,084,517 |
29.827999 |
72.261002 |
8:15:06 |
8:16:18 |
Z1 |
24 |
24 |
11,791,495,467 |
29.166000 |
69.338000 |
8:15:06 |
8:16:15 |
Z1 |
22 |
23 |
22,340,653,336 |
54.713001 |
119.942000 |
8:15:10 |
8:17:09 |
Z1 |
20 |
21 |
19,325,817,959 |
47.192000 |
112.999000 |
8:15:10 |
8:17:03 |
Z1 |
17 |
19 |
20,544,577,090 |
48.462000 |
115.941002 |
8:15:11 |
8:17:07 |
Z1 |
1 |
16 |
12,836,804,380 |
29.027000 |
76.518000 |
8:15:11 |
8:16:27 |
TOTAL |
|
|
274,768,617,130 |
691.909999 |
124.427002 |
8:15:05 |
8:17:09 |
execution completed in 124.427 seconds, compared with 696.825 seconds for the sequential execution. This corresponds to 17.86% of the sequential runtime (124.427 seconds, versus 696.825 seconds) and represents an overall reduction of 82.14% in execution time. Similarly, CPU utilization reached 100% at the start of execution and subsequently stabilized at approximately 97%.
Under sixteenthread parallel execution on the same hardware platform, Algorithm Z1 achieved 78.98% of the runtime of Algorithm ZS1 (124.427002 seconds versus 157.539993 seconds), corresponding to a performance improvement of 21.02%.
6. Summary
The performance evaluation of the proposed algorithms is presented in Table 6, which reports sequential and multithreaded execution times, percentage reductions in runtime, estimated time savings, and average CPU utilization across multiple values of n and varying thread counts. Across all experiments, Algorithm Z1 consistently outperforms Algorithm ZS1 under both sequential and parallel execution settings.
Table 6. Summary measurement results Algorithm ZS1 vs Algorithm Z1.
Algorithm |
n |
Parallel Runs |
Single-threaded |
Single-threaded |
Muti-threaded |
Muti-threaded |
Muti-threaded |
Single execution Time Sec/Sec per Partition |
Sequential Runs Time Sec/Sec per Partition |
Time Sec/Sec per Partition |
Time Saving |
CPU Usage |
ZS1 |
150 |
4 |
154.584002 |
154.434006 |
62.945999 |
59.28% |
28% |
|
|
|
3.783886E−09 |
3.780215E−09 |
1.540784E−09 |
|
|
Z1 |
150 |
4 |
108.193001 |
106.355001 |
47.875999 |
55.75% |
28% |
|
|
|
2.648334E−09 |
2.603343E−09 |
1.171902E−09 |
69.03% |
|
ZS1 |
150 |
17 |
151.744003 |
150.232000 |
23.315001 |
84.64% |
97% |
|
|
|
3.714369E−09 |
3.677359E−09 |
5.707015E−10 |
|
|
Z1 |
150 |
17 |
105.403000 |
103.927000 |
18.730000 |
82.23% |
97% |
|
|
|
2.580040E−09 |
2.543911E−09 |
4.584704E−10 |
87.66% |
|
ZS1 |
170 |
16 |
1099.354004 |
1007.247001 |
157.539993 |
85.67% |
97% |
|
|
|
4.001017E−09 |
3.665801E−09 |
5.733551E−10 |
|
|
Z1 |
170 |
16 |
696.825012 |
691.909999 |
124.427002 |
82.14% |
97% |
|
|
|
2.536043E−09 |
2.518155E−09 |
4.528428E−10 |
88.68% |
|
ZS1 |
190 |
16 |
0.000000 |
6250.526956 |
0.000000 |
0.00% |
97% |
|
|
|
0.000000E+00 |
3.747931E−09 |
0.000000E+00 |
|
|
Z1 |
190 |
16 |
4174.296875 |
4127.428999 |
763.338939 |
81.71% |
97% |
|
|
|
2.502985E−09 |
2.474882E−09 |
4.577121E−10 |
87.79% |
|
For the highest integer tested, n = 170 on an AMD Ryzen 7 5800HS system (8 cores, 16 threads, 16 GB RAM), Algorithm Z1 reduced the sequential execution time by 36.62% compared with the original sequential implementation of Algorithm ZS1 (696.825 seconds versus 1099.354 seconds). Similarly, the parallel implementation of Z1 achieved a 21.02% reduction in execution time relative to the parallel implementation of ZS1 (124.427002 seconds versus 157.539993 seconds). Furthermore, comparing the parallel execution time of Z1 with the original sequential baseline of ZS1 yields an overall reduction of 88.68% (124.427002 seconds versus 1099.354 seconds). For comparison, at n = 150, the 17-thread implementation achieved a time saving of 87.66%. These results confirm the superior efficiency and scalability of Algorithm Z1 over Algorithm ZS1.
For clarification:
82.14% = (100 − (124.427002/696.825012)) %;
88.68% = (100 − (124.427002/1099.354004)) %
7. Conclusions
Using MSYS MinGW64 on an AMD Ryzen 7 5800HS system (8 cores, 16 threads, 16 GB RAM), Algorithm Z1 consistently outperforms algorithm ZS1 under both sequential and parallel execution conditions. Although both algorithms maintain the constant average delay property, their practical performance is influenced by system architecture and compiler behavior. The performance advantage of Algorithm Z1 becomes particularly significant for values of n > 100.
Empirical results demonstrate that parallel execution on a single hardware system yields substantial runtime reduction. Using four threads produces an approximate 69% decrease in execution time, while sixteen threads achieve reductions of nearly 89%. Notably, subdividing the generation process improves performance even in the absence of parallel execution.
In practice, executing a single subset is more efficient than launching multiple subsets concurrently, since parallel initialization introduces overhead that increases with the number of simultaneously active subsets. A key advantage of Algorithms ZS1 and Z1 is their support for selective partition generation through parameter adjustment alone, without requiring structural modifications. This distinguishes them from alternative standard-form generation methods, particularly those outside the anti-lexicographic and lexicographic frameworks, which typically rely on additional conditional checks that significantly increase computational cost.
Another important observation is that generating the full partition set P(n) through multiple sequential executions of complete subsets is more efficient than performing a single-threaded monolithic run. For example, when n = 170, a full execution of Algorithm Z1 required 696.825 seconds, whereas sequential execution of sixteen subdivided runs required a total of 691.909 seconds. This consistently observed behavior suggests that subdivision reduces system load and improves performance even in the absence of parallelism.
The ability to execute subsets sequentially or in parallel and subsequently combine their outputs makes it feasible to generate partitions for significantly larger values of than would be practical in a single run. For instance, this approach enables full computation of partitions for n = 190 using Algorithm ZS1.
For sufficiently large n, a trade-off emerges between executing subsets sequentially or in parallel on a single system and distributing them across multiple machines, each handling a distinct subset. The case n = 200 highlights these considerations and illustrates the resource constraints inherent in largescale partition generation.
For large value of n, the most efficient strategy is to distribute subsets across multiple machines using multi-threaded execution, ensuring that the number of subsets assigned does not exceed the available threads per machine and that total CPU utilization on each machine remains below approximately 97%.
All performance measurements reported in this study were obtained with output operations disabled.
8. Conclusions: Algorithm Z1 vs Existing Parallel Algorithms
Theoretical analysis indicates that Algorithm Z1, as illustrated in Table 7, operates as a high-efficiency sequential baseline, characterized by strong per-core computational throughput and minimal control-flow overhead. These properties allow it to remain competitive with parallel partition generation methods, particularly in scenarios where synchronization, communication, and thread-management overhead contribute significantly to total execution time.
Table 7. Cross-model comparative summary.
Criterion |
Algorithm Z1 |
Distributed Parallel |
Multi-Threaded Parallel |
Per-Core Efficiency |
Very High |
Low-Moderate |
Moderate |
Absolute Throughput |
Low-Moderate |
Very High |
High |
Scalability |
Low |
Very High |
Moderate |
Execution Overhead |
Minimal |
High |
Moderate |
Implementation Complexity |
Low |
High |
Moderate |
Framework Conclusion
Algorithm Z1 serves as a highly efficient sequential baseline, characterized by strong runtime performance and minimal execution overhead. Although some parallel algorithms may offer higher throughput or improved scalability, these gains are often accompanied by additional coordination, communication, and synchronization costs. Consequently, the practical advantages of parallelism depend heavily on the structure and characteristics of the workload. In this context, Algorithm Z1 remains a competitive approach, particularly in environments where overhead limitations reduce the effectiveness of parallel execution.
9. Open Discussion and Challenges
Develop an algorithm for generating all integer partitions of large n on a single hardware system. For n = 510, P(510) = 3,991,268,667,606,861,086,720 which necessitates an efficient enumeration strategy. The proposed method employs a structured subdivision scheme that decomposes the solution space into subsets organized vertically and horizontally according to the Algorithm Z1 framework. Each complete partition is formed by concatenating component partitions drawn from these subsets. The independence of the subsets enables parallel execution and also supports distribution across multiple hardware systems, thereby improving scalability and computational efficiency.
For n = 170, executing 16 subsets on a 16-thread processor reduced computation time by approximately 89% with average CPU utilization below 98%. Similar experiments on 32- and 64-thread systems, using 32 and 64 subsets respectively, are expected to achieve comparable or greater speedup while maintaining CPU utilization under 98%.
A comparative evaluation framework should also be developed to assess Algorithm Z1 against existing parallel algorithms. The evaluation should consider performance, scalability, runtime efficiency, and execution overhead under two deployment models:
10. Standard Form Partitions
For n = 10, the Standard form of the Partitions is:
11. C Code: ZS1, Z1 and Z3
ZS1 - C code
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int m,n,i,h,t,r,x1,x2,y1,y2,z1,z2,z3;
long long int v,w;
clock_t clock1, clock2;
time_t tstart, tfinish;
float cpu_time_used;
int main(void) {
printf("\n Algorithm ZS1 - Restricted \n \n \n ");
printf("\n please enter the value of integer N : ");
scanf ("%d", &n);
printf("\n please enter the lowest value of the leftmost part: ");
scanf ("%d", &x1);
printf("\n please enter the highest value of the leftmost part: ");
scanf ("%d", &x2);
if (x1 == 1) {x1 = 2;}
printf("\n \n \n Generating Standard Antilexicographic Partitions for integer %d \n \n", n);
int a[n+1];
for (i=0; i<=n; ++i) a[i]=1;
a[1] = x2 + 1; h = 1; m = n - x2;
time(&tstart);
printf("\n Start: %s", ctime(&tstart));
clock1 = clock();
while (a[1] >= x1){
if (a[h] == 2) { m += 1; a[h] = 1; --h; }
else {
t = m-h+1; a[h] -=1; r = a[h];
while (t >= r) { h +=1; a[h] = r; t -=r;}
if (t < 2) {m = h + t;}
else {m = h + 1; ++h; a[h] = t;}
}
if (a[1] >= x1) {for(i = 1; i <= m; ++i) (printf(" %d", a[i])); printf(" \n");}
} // while (a[1] >= x1)
if (a[1] == 1) {for(i = 1; i <= n; ++i) (printf(" %d", a[i])); printf(" \n");}
clock2 = clock();
cpu_time_used = ( (float) (clock2-clock1))/CLOCKS_PER_SEC;
printf("\n cpu_time_used : %lf seconds \n", cpu_time_used);
printf(" ( CLOCKS_PER_SEC:%d) \n",CLOCKS_PER_SEC);
printf("\n Start : %s", ctime(&tstart));
time(&tfinish);
printf("\n Finish : %s", ctime(&tfinish));
scanf ("%d", &i);
return 0;
}
Z1 – C code
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
int m,n,i,h,t,r,x,x1,x2;
long long int v,v0,v1,v2,v3,v23,v4,w,y,t0,t1,t2;
clock_t clock1, clock2;
time_t tstart, tfinish;
float cpu_time_used;
float vp=0.00;
int main(void) {
printf("\n Algorithm Z1 - Restricted \n \n \n ");
printf("\n please enter the value of integer N : ");
scanf ("%d", &n);
printf("\n please enter leftmost part from: ");
scanf ("%d", &x1);
printf("\n please enter leftmost part to : ");
scanf ("%d", &x2);
if (x1 == 1) {x1 = 2;}
printf("\n \n \n Generating Standard Antilexicographic Partitions for integer %d \n \n", n);
int a[n+1];
for (i=0; i<=n; ++i) a[i]=1;
a[1] = x2 + 1; h = 1; m = n - x2; w=0;
time(&tstart);
printf("\n Start : %s", ctime(&tstart));
clock1 = clock();
while (a[1] >= x1){
while (a[h] == 2){
++m; a[h] = 1; --h;
for (i = 1; i <= m; ++i) (printf("%d ", a[i]));
printf(" \n"); ++w;}
if (a[h] == 3) {
x = m - h; a[h] = 2;
if (x > 3) {
while (x > 0){++h; a[h] = 2; x -=2;}
if (x == 0){m = h + 1;}
else {m = h;}
} // x>3
else if (x == 0) { //m = h
++m;
for (i = 1; i <= m; ++i) (printf("%d ", a[i])); printf("\n");
a[h] = 1; --h; ++m; ++w;}
else if (x == 1) {
for (i = 1; i <= h; ++i) (printf("%d ", a[i]));
printf(" 2"); printf(" \n");
for (i = 1; i <= h; ++i) (printf("%d ", a[i]));
printf(" 1 1"); printf("\n");
a[h] = 1; --h, m += 2; w += 2;}
else if (x == 2) {
for (i = 1; i <= h; ++i) (printf("%d ", a[i]));
printf(" 2 1"); printf(" \n");
for (i = 1; i <= h; ++i) (printf("%d ", a[i]));
printf(" 1 1 1"); printf(" \n");
a[h] = 1; --h; m += 2; w += 2;}
else { // x=3
for (i = 1; i <= h; ++i) (printf("%d ", a[i]));
printf(" 2 2"); printf("\n");
for (i = 1; i <= h; ++i) (printf("%d ", a[i]));
printf(" 2 1 1"); printf(" \n");
for (i = 1; i <= h; ++i) (printf("%d ", a[i]));
printf(" 1 1 1 1"); printf("\n");
a[h] = 1; --h; m += 2; w += 3;}
++w;
for (i = 1; i <= m; ++i) (printf("%d ", a[i])); printf("\n");
} // a[h] == 3)
else if (a[h] > 3) {
t = m-h+1; a[h] -=1; r = a[h];
while (t >= r){++h; a[h] = r; t -=r;}
if (t < 2) {m = h + t;}
else {m = h + 1; ++h; a[h] = t;}
for (i = 1; i <= m; ++i) (printf("%d ", a[i])); printf("\n");
++w;
} // (a[h] > 3)
} // while a[1] >= x1
if (a[1] < x1) {
if (a[1] > 1) { --w;}}
clock2 = clock();
cpu_time_used = ( (float) (clock2-clock1))/CLOCKS_PER_SEC;
printf("\n cpu_time_used : %lf seconds \n", cpu_time_used);
printf(" (CLOCKS_PER_SEC:%d) \n",CLOCKS_PER_SEC);
printf("\n Start : %s", ctime(&tstart));
time(&tfinish);
printf("\n Finish : %s", ctime(&tfinish));
printf("\n partitions_W: %lld\n", w);
scanf ("%d", &i);
return 0;
}
Z3 – C code
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
int c,c1,m,n,i,h,k,t,r,x,x1,x2,x3,y1,y2;
long long int w,w1,w2,w3,w4,v;
float x4,u1,u2,u3,u4;
clock_t clock1, clock2;
time_t tstart, tfinish;
float cpu_time_used;
int main(void) {
printf("\n Algorithm Z3 - Multi-Thread setting \n");
printf("\n Please enter the value of integer N : ");
scanf ("%d", &n);
printf("\n Please enter the lowest leftmost part : ");
scanf ("%d", &x1);
printf("\n Please enter the highest leftmost part : ");
scanf ("%d", &x2);
printf("\n Please enter the number of Multi-Threads: ");
scanf ("%d", &x3);
printf("\n Please enter the Partitions factor : ");
scanf ("%f", &x4);
printf("------------------ Finding Threads with Partitions Borders ------------------\n");
if (x1 == 1) {x1 = 2;}
int a[n+1];
for (i=0; i<=n; ++i) a[i]=1;
a[1] = x2 + 1; h = 1; m = n - x2;
time(&tstart);
clock1 = clock();
long long int p[n+1];
for (i=0; i<=n; ++i) p[i]=0;
// calculating P(n)
p[0] = 1;
for (k=1; k<=n; ++k) {
for (i=k; i<=n; ++i) {p[i] += p[i-k];}}
clock2 = clock();
w = p[n];
printf("\n P(n): %lld ", w);
cpu_time_used = ( (float) (clock2-clock1))/CLOCKS_PER_SEC;
printf(" cpu_time: %lf s", cpu_time_used);
printf(" (CLOCKS_PER_SEC:%d) \n\n",CLOCKS_PER_SEC);
// finding partitions borders of threads
for (i=0; i<=n; ++i) a[i]=1;
a[1] = x2 + 1; h = 1; m = n - x2; w1 = 0; c = 0;
v = w / x3; v = v * x4;
printf("Maximum Partitions per one Thread %lld \n\n", v);
c1 = a[1];
while (a[1] >= x1){
if (v > w1) {
if (a[1] < c1) {w2 = 0, c1 = a[1];}
while (a[h] == 2) {++m; a[h] = 1; --h; ++w1; ++w2;}
if (a[h] == 3) {
x = m - h; a[h] = 2;
if (x > 3) {
while (x > 0) {++h; a[h] = 2; x -=2;}
if (x == 0){m = h + 1;}
else {m = h;}}
else if (x == 0) {++m; a[h] = 1; --h; ++m; ++w1; ++ w2;} //m = h
else if (x == 1) {a[h] = 1; --h, m += 2; w1 += 2; w2 += 2;}
else if (x == 2) {a[h] = 1; --h; m += 2; w1 += 2; w2 += 2;}
else {a[h] = 1; --h; m += 2; w1 += 3; w2 += 3;} // x = 3
++w1; ++w2;
} // a[h] == 3)
else if (a[h] > 3) {
t = m-h+1; a[h] -=1; r = a[h];
while (t >= r){++h; a[h] = r; t -=r;}
if (t < 2) {m = h + t;}
else {m = h + 1; ++h; a[h] = t;}
++w1; ++w2;
} // (a[h] > 3)
} // ( v > w1)
else {
clock2 = clock();
cpu_time_used = ( (float) (clock2-clock1))/CLOCKS_PER_SEC;
++c; y1 = a[1] + 1; y2 = x2;
if (n > x2) {y2 = x2 - 1;}
if (y1 > y2) {y1 = y2;}
printf("Thread - %d - ", c); printf("Lower %d - ", y1); printf("Higer %d - ", y2);
w3 = w1 - w2 - 1; w4 = w4 + w3;
printf("Partitions: %lld ", w3); printf("cpu_time : %lf s \n\n", cpu_time_used);
for (i=0; i<=n; ++i) a[i]=1;
x2 = y1; y1 = 0; y2 = 0; w1=0; w2=0; a[1] = x2; h = 1; m = n - x2 + 1;
}
} // while (a[1] > x1)
clock2 = clock();
cpu_time_used = ( (float) (clock2-clock1))/CLOCKS_PER_SEC;
++y1; --x2; ++c;
printf("Thread - %d - ", c); printf("Lower %d - ", y1); printf("Higer %d - ", x2);
if (c == 1) { --w1; }
printf(" Partitions: %lld ", w1); printf(" cpu_time : %lf s \n\n", cpu_time_used);
w4 = w4 + w1;
printf(" Total Partitions: %lld ", w4); printf("\n Start : %s", ctime(&tstart));
time(&tfinish);
printf("\n Finish : %s", ctime(&tfinish));
scanf ("%d", &i);
return 0;
}