PageRank parallel solutions

Assignment 4 (70 Points)
Due: Friday by 11:59pm · Points: 70
Submitting: a file upload · Available Oct 4 at 12am - Dec 24 at 11:59pm
Due Friday October 11 @ 11:59PM

In this assignment, we will improve the parallel solutions of PageRank from Assignment 3. You will implement different task decomposition and mapping strategies and observe their effects. You can use the same serial versions of the pull-based and push-based PageRank from Assignment 3 to check the correctness of your results. You should use your parallel solutions "page_rank_pull_parallel.cpp" and "page_rank_push_parallel_atomic.cpp" as the starting point of this assignment. You do not need to download a separate tarball for this assignment, since everything you need was provided in Assignment 3.

You should have completed the Slurm Tutorial (https://canvas.sfu.ca/courses/84236/pages/slurm-tutorial), which walks you through how to use our servers for your code testing. If you still have trouble with Slurm, please post questions on the Canvas discussion board. It is strongly recommended to use Slurm for this assignment instead of the CSIL machines, since the graphs you need for checking your programs are on the /scratch/ drive on the cluster's compute nodes.

General Instructions

1. You are provided with the serial versions and some other tools in the same tarball from Assignment 3.

3. You will be asked to print the time spent by different threads on specific code regions. The time spent by any code region can be computed as follows:
timer t1; 
t1.start(); 
/* ---- Code region whose time is to be measured --- */ 
double time_taken = t1.stop(); 
4. If you need to time a sub-section inside a loop, you can do that as follows:

double time_taken = 0.0;
timer t1;
while (true) {
    /* ---- Code region whose time should not be measured --- */
    t1.start();
    /* ---- Code region whose time is to be measured --- */
    time_taken += t1.stop();
    /* ---- Code region whose time should not be measured --- */
}
std::cout << "Time spent on required code region : " << time_taken << "\n";
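The timer class itself ships in the core/ folder of the assignment package. If you want a mental model of what start() and stop() do, a minimal stand-in using std::chrono might look like the sketch below; only the start()/stop() interface is taken from the handout, the implementation here is an assumption:

```cpp
#include <chrono>

// Minimal stand-in for the provided timer class (the real one lives in core/;
// only the start()/stop() interface is taken from the assignment handout).
class timer {
    std::chrono::steady_clock::time_point t0;
public:
    void start() { t0 = std::chrono::steady_clock::now(); }
    // Returns seconds elapsed since the matching start().
    double stop() const {
        std::chrono::duration<double> d = std::chrono::steady_clock::now() - t0;
        return d.count();
    }
};
```

Use the core/ version in your submission; this sketch is only for understanding the measurement pattern.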
5. The programs operate on graph datasets. Sample input graphs are available at /scratch/input_graphs/ on the compute nodes (note that they are present on the compute nodes only, and hence you can access them via Slurm only).

$ ls /scratch/input_graphs/*.cs*
lj.csc            lj.csr            rmat.csc          rmat.csr
roadNet-CA.csc    roadNet-CA.csr    test_25M_50M.csc  test_25M_50M.csr
web-Google.csc    web-Google.csr

Please read Assignment 3's general instructions if you want to generate more test graphs.
1. Add a "strategy" Command Line Parameter

For your parallel programs "page_rank_pull_parallel.cpp" and "page_rank_push_parallel_atomic.cpp" from Assignment 3, you need to add a command line parameter "strategy" which defines the program's task decomposition and mapping strategy. If the parameter is set to 1, i.e., --strategy 1, your parallel program should run exactly as specified in Assignment 3. "strategy" values correspond to the strategies described in the following sections.

Strategy 1: Vertex-based decomposition and static mapping, the same strategy as your Assignment 3 programs. Vertices are statically distributed among threads such that each thread performs all computations on n/T vertices.
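For reference, the static n/T split can be computed once before the threads are created. The sketch below is one common way to do it (the function name and the remainder handling are our own; your Assignment 3 split may differ slightly when T does not divide n):

```cpp
#include <algorithm>
#include <utility>

// Static vertex-based decomposition (strategy 1): thread t gets a contiguous
// block of roughly n/T vertices; the first n % T threads take one extra vertex
// so that all n vertices are covered. Returns the half-open range [start, end).
std::pair<long, long> vertexRange(long n, int T, int t) {
    long base = n / T;
    long extra = n % T;
    long start = t * base + std::min<long>(t, extra);
    long count = base + (t < extra ? 1 : 0);
    return {start, start + count};
}
```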
2. Edge-based Decomposition for PageRank

Strategy 2: To achieve better load balance among threads, vertices are distributed among threads such that each thread performs computations on approximately m/T edges. Pull-based PageRank distributes vertices based on their in-degrees, while push-based PageRank distributes vertices based on their out-degrees. Both of your parallel programs need to support this strategy by setting the strategy parameter to 2 in the command line (--strategy 2). For example, the pull-based algorithm's pseudo-code is:
Create T threads
Distribute work between threads so that each thread gets to compute on approximately m/T edges
for(i=0; i<max_iterations; i++) {
    for each thread in parallel {
        for each vertex 'v' allocated to the thread {
            for vertex 'u' in inNeighbor(v)
                next_page_rank[v] += (current_page_rank[u]/outdegree[u])
        }
    }
    for each thread in parallel {
        for each vertex 'v' allocated to the thread {
            compute the new_pagerank using the accumulated values in next_page_rank[v].
            current_page_rank[v] = new_pagerank
            Reset next_page_rank[v] to 0
        }
    }
}
As an example of how edge-based task decomposition works, suppose a graph has the following vertex distribution <vertex, in-degree, out-degree>:

v1, 8, 1
v2, 5, 6
v3, 8, 3
v4, 2, 8
v5, 3, 5
v6, 1, 4
v7, 5, 6
v8, 4, 3

The graph has 36 edges. To distribute the work evenly among 4 threads, each thread needs to compute approximately 36/4 = 9 edges. Each thread gets assigned vertices until the total assigned edge count reaches (thread_id+1) * m/T. Examining the in-degrees, the pull-based algorithm has the following vertex distribution: Thread 0: v1, v2 (13 edges >= 1x9); Thread 1: v3 (8 edges, so 13+8 >= 2x9); Thread 2: v4, v5, v6 (6 edges, so 13+8+6 >= 3x9); Thread 3: v7, v8 (9 edges).

For the push-based algorithm, we can use the following vertex distribution based on out-degrees: Thread 0: v1, v2, v3 (10 edges >= 1x9); Thread 1: v4 (8 edges, so 10+8 >= 2x9); Thread 2: v5, v6 (9 edges, so 10+8+9 >= 3x9); Thread 3: v7, v8 (9 edges).

Note that the pull-based algorithm in the above example should have a similar runtime to the vertex-based decomposition, since the longest-running thread computes 13 edges in both cases. However, the push-based algorithm has a shorter runtime, since its longest-running thread computes 10 edges, while the vertex-based decomposition requires 11 edge computations for the longest thread.
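The assignment's rule ("assign vertices until the running edge count reaches (thread_id+1) * m/T") can be prototyped in a few lines and checked against the example above. The function name and the starts-array representation below are our own; note that integer division makes the thresholds approximate when T does not divide m:

```cpp
#include <vector>

// Edge-based decomposition (strategy 2): walk the vertices once, cutting a new
// thread's range whenever the running edge count reaches (t+1) * m / T.
// degrees[v] is the in-degree for the pull version, the out-degree for push.
// Returns the start vertex of each thread's range, plus a final sentinel = n,
// so thread t owns vertices [starts[t], starts[t+1]).
std::vector<int> edgeBasedStarts(const std::vector<int>& degrees, int T) {
    long m = 0;
    for (int d : degrees) m += d;
    std::vector<int> starts(T + 1, (int)degrees.size());
    starts[0] = 0;
    long acc = 0;
    int t = 0;
    for (int v = 0; v < (int)degrees.size() && t < T; ++v) {
        acc += degrees[v];
        if (acc >= (long)(t + 1) * m / T) {  // this thread has its ~m/T edges
            starts[++t] = v + 1;             // next thread starts after v
        }
    }
    return starts;
}
```

Running this on the example's in-degrees {8,5,8,2,3,1,5,4} with T = 4 reproduces the pull distribution above (thread ranges cut after v2, v3, and v6), and the out-degrees {1,6,3,8,5,4,6,3} reproduce the push distribution.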
3. Vertex-based Decomposition for PageRank with Dynamic Mapping

Strategy 3: To achieve even more load balance among threads, vertices are mapped to threads dynamically, based on the time at which different threads finish their tasks. Instead of allocating approximately equal numbers of vertices (or edges) to threads, we dynamically allocate work to each thread whenever it is free. In this strategy, each thread dynamically gets the next vertex to be computed until all the vertices are processed. Both of your parallel programs need to support this strategy by setting the strategy parameter to 3 in the command line (--strategy 3). Below is the pseudo-code showing dynamic task mapping for the push-based PageRank algorithm with vertex-based decomposition:
Create T threads
for each thread in parallel {
    for(i=0; i<max_iterations; i++) {
        while(true) {
            u = getNextVertexToBeProcessed();
            if(u == -1) break;
            edges_processed += outDegree(u) // used in output validation
            for vertex v in outNeighbor(u)
                next_page_rank[v] += (current_page_rank[u]/outdegree[u])
        }
        barrier1
        while(true) {
            v = getNextVertexToBeProcessed();
            if(v == -1) break;
            vertices_processed += 1 // used in output validation
            compute the new_pagerank using the accumulated values in next_page_rank[v].
            current_page_rank[v] = new_pagerank
            Reset next_page_rank[v] to 0
        }
        barrier2
    }
}
You need to implement the above dynamic mapping strategy in your solutions for both "page_rank_pull_parallel.cpp" and "page_rank_push_parallel_atomic.cpp".

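One common way to realize getNextVertexToBeProcessed() is a shared atomic counter that every thread increments, returning -1 once all n vertices have been handed out. This is a sketch under our own naming, not the required implementation; note that the counter must be reset between the two phases and between iterations, which the barriers in the pseudo-code make safe to do:

```cpp
#include <atomic>

// Dynamic vertex mapping (strategy 3): a shared atomic counter hands out one
// vertex per call; -1 signals that all n vertices have been claimed.
// reset() must be called between phases/iterations, protected by a barrier.
struct DynamicDispenser {
    std::atomic<long> next{0};
    long n;
    explicit DynamicDispenser(long n_) : n(n_) {}
    long getNextVertexToBeProcessed() {
        long v = next.fetch_add(1, std::memory_order_relaxed);
        return (v < n) ? v : -1;
    }
    void reset() { next.store(0); }
};
```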
4. Vertex-based Decomposition for PageRank with Coarse-Grained Dynamic Mapping

Strategy 4: To reduce the time each thread spends in getNextVertexToBeProcessed(), we will vary the task granularity so that each thread receives multiple vertices to be processed each time it calls getNextVertexToBeProcessed(). Both of your parallel programs need to support this strategy by setting the strategy parameter to 4 in the command line (--strategy 4). You need to update the dynamic load distribution logic as follows:

Each thread processes k vertices and then calls getNextVertexToBeProcessed(). Here, k determines the granularity of the work done by each thread before requesting new work.
 
For example:
If k = 1, the thread calls getNextVertexToBeProcessed() after processing each vertex (exactly the same way as strategy 3).
If k = 1000, the thread calls getNextVertexToBeProcessed() after processing 1000 vertices.

The getNextVertexToBeProcessed() function should return 0, k, 2k, ... depending on the granularity. k should be provided at run time using a command-line parameter, e.g.: --granularity 100
Below is the pseudo-code showing the logic of our parallel solution: 
k = 1000 // granularity
Create T threads
for each thread in parallel {
    for(i=0; i<max_iterations; i++) {
        while(true) {
            u = getNextVertexToBeProcessed()
            if(u == -1) break;
            for (j = 0; j < k; j++) {
                edges_processed += outDegree(u) // used in output validation
                for vertex v in outNeighbor(u)
                    next_page_rank[v] += (current_page_rank[u]/outdegree[u])
                u++
                if(u >= n) break; // n is the total number of vertices in the graph
            }
        }
        barrier1
        while(true) {
            v = getNextVertexToBeProcessed()
            if(v == -1) break;
            for (j = 0; j < k; j++) {
                vertices_processed += 1 // used in output validation
                compute the new_pagerank using the accumulated values in next_page_rank[v].
                current_page_rank[v] = new_pagerank
                Reset next_page_rank[v] to 0
                v++
                if(v >= n) break; // n is the total number of vertices in the graph
            }
        }
        barrier2
    }
}
This strategy should be used with the command-line parameter --granularity to specify the granularity. You need to add support for --granularity to both of your parallel programs "page_rank_pull_parallel.cpp" and "page_rank_push_parallel_atomic.cpp".

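The only change from the strategy 3 counter idea is the step size: advancing the shared counter by k makes successive calls return 0, k, 2k, ... as required, and each caller then processes vertices [v, min(v + k, n)). Again a sketch under assumed names, not the required implementation:

```cpp
#include <atomic>

// Coarse-grained dynamic mapping (strategy 4): each call claims a chunk of k
// vertices, so the shared counter advances in steps of k and successive calls
// return 0, k, 2k, ...; -1 signals that all n vertices have been claimed.
struct ChunkedDispenser {
    std::atomic<long> next{0};
    long n, k;
    ChunkedDispenser(long n_, long k_) : n(n_), k(k_) {}
    long getNextVertexToBeProcessed() {
        long v = next.fetch_add(k, std::memory_order_relaxed);
        return (v < n) ? v : -1;
    }
    void reset() { next.store(0); }
};
```

With k = 1 this degenerates to the strategy 3 behavior, which is consistent with the handout's note that k = 1 matches strategy 3 exactly.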
Input and Output Formats:

1. Your program files should be named page_rank_pull_parallel.cpp and page_rank_push_parallel_atomic.cpp and should support the following command-line parameters:
--nThreads : The number of threads.
--nIterations : The number of iterations (similar to the serial code). Default is 10.
--inputFile : The absolute path to the input graph file (similar to the serial code).
--strategy : A value of either 1, 2, 3, or 4, corresponding to the strategies defined above. Any other value should lead to program termination. The default strategy is 1.
--granularity : A positive integer to be used only in strategy 4. This parameter is ignored for other strategies. You need to check that granularity is always a positive integer. Zero, negative, and non-integer values lead to program termination. The default granularity is 1.
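Since zero, negative, and non-integer granularity values must terminate the program, the --granularity argument needs real validation rather than a bare atoi(). A hypothetical helper (the name and the -1 error convention are ours; assumes the raw string has already been pulled off argv) using strtol:

```cpp
#include <cstdlib>

// Hypothetical helper: parse --granularity's argument string, returning -1 for
// anything that is not a strictly positive integer (the caller should then
// terminate the program, per the assignment's parameter rules).
long parseGranularity(const char* s) {
    char* end = nullptr;
    long k = std::strtol(s, &end, 10);
    if (end == s || *end != '\0' || k <= 0) return -1;
    return k;
}
```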
2. Your parallel solution must output the following information:
Total number of threads used.
Number of iterations.
Strategy used.
Granularity used.
For each thread:
    Thread id (your threads should be numbered between [0, T)).
    Number of vertices processed (across all iterations) - This is the number of vertices processed only in the second for loop, across all iterations. Refer to the pseudo-code of PageRank above.
    Number of edges processed (across all iterations) - This is the number of edges processed only in the first for loop, across all iterations. Refer to the pseudo-code of PageRank above.
    Cumulative time spent waiting at barrier1 (in seconds).
    Cumulative time spent waiting at barrier2 (in seconds).
    Cumulative time spent waiting in getNextVertexToBeProcessed() (in seconds).
    Time taken by the thread (in seconds).
The sum of the pageranks of all vertices.
The total time taken for the entire execution, including the time used in allocating vertices or edges to threads.
3. The sample console output is below. Please note that you should use this exact format for all 4 parts. If an output is not generated (e.g., getNextVertex time in strategies 1 and 2), print 0.0 instead.
Using DOUBLE 
Number of Threads : 4 
Strategy : 4 
Granularity : 1 
Iterations : 20 
Reading graph 
Created graph 
thread_id, num_vertices, num_edges, barrier1_time, barrier2_time, getNextVertex_time, total_time 
0, 4576646, 26069763, 0.001790, 0.001063, 4.067461, 10.366047 
1, 4625648, 23284082, 0.001947, 0.001063, 4.107634, 10.366017 
2, 4617670, 26393229, 0.001654, 0.001049, 4.029203, 10.365975 
3, 4508596, 26353706, 0.001668, 0.001059, 4.101155, 10.365950 
Sum of page ranks : 618961.562500 
Time taken (in seconds) : 10.366616 
Testing

We are not providing a testing script for this assignment. However, you should strictly adhere to the format above, including all the spaces and commas. You can test your program using command lines similar to the test cases of Assignment 3, while adding the strategy parameter. We will not disclose the test cases used for grading before the assignment due date.

Assignment Report

In addition to your parallel code, you need to submit a report (in pdf format) that answers the following two questions:

Q1. Run each of your two parallel programs with strategy 4 above using 8 threads on the test_25M_50M data set with granularities of 10, 100, 1000, 2000. Each of your parallel programs should run 3 times, each run including 20 iterations, on the slow cluster using the default floating point data type. [Total number of runs is 2 (parallel versions) x 4 (different granularities) x 3 (number of runs for each version/granularity combination) = 24 runs]

Plot a graph with average execution time on the y-axis and granularity on the x-axis, where each granularity has 2 bars, one for the average runtime of each of your parallel versions. Your graph should look something like this (obviously with different values):

Q2. Based on the data from the graph in Q1, which of the four granularities is better for the pull algorithm? And which is better for the push_atomic algorithm?
Submission Guidelines

Make sure that your solutions folder has the following files and sub-folders. Let's say your solutions folder is called my_assignment4_solutions. It should contain:
core/ -- The folder containing all core files. It is already available in the assignment package. Do not modify it or remove any files.
Makefile -- Makefile for the project. This file should not be changed.
page_rank_pull_parallel.cpp
page_rank_push_parallel_atomic.cpp
report.pdf -- A pdf file that includes answers to the questions in the previous section.
To create the submission file, follow the steps below:
1. Enter your solutions folder and remove all the object/temporary files.
$ cd my_assignment4_solutions/
$ make clean
$ rm -rf input_graphs (or mv input_graphs ../ to avoid large tarballs)
2. Create the tar.gz file.
$ tar cvzf assignment4.tar.gz *
This creates a compressed tarball that contains the contents of the folder. Submit it via Canvas by the deadline.