# 27-1 Implementing parallel loops using nested parallelism

Consider the following multithreaded algorithm for performing pairwise addition on $n$-element arrays $A[1..n]$ and $B[1..n]$, storing the sums in $C[1..n]$:

`SUM-ARRAYS(A, B, C) parallel for i = 1 to A.length C[i] = A[i] + B[i]`

a.Rewrite the parallel loop in $\text{SUM-ARRAYS}$ using nested parallelism (spawnandsync) in the manner of $\text{MAT-VEC-MAIN-LOOP}$. Analyze the parallelism of your implementation.Consider the following alternative implementation of the parallel loop, which contains a value $grain\text-size$ to be specified:

`SUM-ARRAYS'(A, B, C) n = A.length grain-size = ? // to be determined r = ceil(n / grain-size) for k = 0 to r - 1 spawn ADD-SUBARRAY(A, B, C, k * grain-size + 1, min((k + 1) * grain-size, n)) sync`

`ADD-SUBARRAY(A, B, C, i, j) for k = i to j C[k] = A[k] + B[k]`

b.Suppose that we set $grain\text -size = 1$. What is the parallelism of this implementation?

c.Give a formula for the span of $\text{SUM-ARRAYS}'$ in terms of $n$ and $grain\text-size$. Derive the best value for grain-size to maximize parallelism.

**a.** See the algorithm $\text{SUM-ARRAYS}(A, B, C)$. The parallelism is $O(n)$ since it's work is $n\lg n$ and the span is $\lg n$.

**b.** If grainsize is $1$, this means that each call of $\text{ADD-SUBARRAY}$ just sums a single pair of numbers. This means that since the for loop on line 4 will run $n$ times, both the span and work will be $O(n)$. So, the parallelism is just $O(1)$.

```
SUM-ARRAYS(A, B, C)
n = floor(A.length / 2)
if n == 0
C[1] = A[1] + B[1]
else
spawn SUM-ARRAYS(A[1..n], B[1..n], C[1..n])
SUM-ARRAYS(A[n + 1..A.length], B[n + 1..A..length], C[n + 1..A.length])
```

**c.** Let $g$ be the grainsize. The runtime of the function that spawns all the other functions is $\left\lceil \frac{n}{g} \right\rceil$. The runtime of any particular spawned task is $g$. So, we want to minimize

$$\frac{n}{g} + g.$$

To do this we pull out our freshman calculus hat and take a derivative, we have

$$0 = 1 − \frac{n}{g^2}.$$

To solve this, we set $g = \sqrt n$. This minimizes the quantity and makes the span $O(n / g + g) = O(\sqrt n)$. Resulting in a parallelism of $O(\sqrt n)$.