Discussed the results from attempting to parallelize the inner loop of the dijkstra application. Using cuda streams has resulted in a slow down of the application so further tests to determine why this is happening need to be carried out. Possible reason for this is that to simply create the streams on the GPU takes much longer than it does to simply evaluate an individual neighbour.
The paper 'A New GPU-based Approach to the shortest path problem' uses a similar method and reported a speedup of 220x, it may be worth while emailing and asking to see the source code to perform a comparison between their implementation and my own.
Tuesday, 25 February 2014
Monday, 24 February 2014
Streams vs Device Functions
Map
|
Sequential
Time
|
Device
functions
|
Streams
|
32x32
|
1
|
8.341714
|
21.75491
|
64x64
|
3.8
|
16.37582
|
47.9231
|
128x128
|
16.2
|
32.52248
|
133.1749
|
256x256
|
68.01
|
65.31222
|
409.7781
|
512x512
|
277.07
|
140.5209
|
19658.81
|
1024x1024
|
1146.38
|
399.7442
|
144157.2
|
2048x2048
|
4954.72
|
1745.794
|
462554.9
|
Cuda streams are a sequence of operations that execute in issue-order on the GPU. CUDA operations in different streams may run concurrently.
Device functions are functions which can be called from within kernel code and are executed on the GPU, they cannot be called from the host.
Sunday, 23 February 2014
780 vs Titan vs K40
Thanks to the help of some friends i was able to run the parallel dijkstra implementation on some different hardware all based on the kepler architecture.
Map
|
K40
|
780
|
Titan
|
32x32
|
8.341714
|
8.057811
|
8.645273
|
64x64
|
16.37582
|
15.68679
|
15.75337
|
128x128
|
32.52248
|
28.07575
|
28.05111
|
256x256
|
65.31222
|
54.97772
|
55.49513
|
512x512
|
140.5209
|
119.8173
|
119.7446
|
1024x1024
|
399.7442
|
355.7623
|
338.7315
|
2048x2048
|
1745.794
|
1589.061
|
1449.108
|
Tuesday, 18 February 2014
Meetings: Tuesday 18th February 2014
Went over the initial results from the parallel Dijkstra where the outer loop of the algorithm has been parallelized. The initial results are promising having seen a speed up of about 60% for the larger graph sizes.
Further work can be done to parallelize dijsktra through parallelization of the inner loop which should allow for a greater speed up.
Work can also be done to improve the frontier set selection process which could result in better performance.
The dates for the honours project have now changed as well. The poster presentation is now the Wednesday of week 13 and the hand in of the report is the 28th of April.
Further work can be done to parallelize dijsktra through parallelization of the inner loop which should allow for a greater speed up.
Work can also be done to improve the frontier set selection process which could result in better performance.
The dates for the honours project have now changed as well. The poster presentation is now the Wednesday of week 13 and the hand in of the report is the 28th of April.
Monday, 17 February 2014
Parallel Dijkstra
The initial version has only parallelized the outer loop of the algorithm and has not been done in the most efficient way however has provided a significant speed up for the larger graph sizes.
For each graph the source node was the top left and the goal node was the bottom right, both the parallel and sequential implementations found the same path
For each graph the source node was the top left and the goal node was the bottom right, both the parallel and sequential implementations found the same path
map
|
Sequential
Dijkstra (ms)
|
Parallel
Dijkstra (ms)
|
32x32
|
1
|
8.49
|
64x64
|
3.8
|
16.73
|
128x128
|
16.2
|
33.20
|
256x256
|
68.01
|
66.38
|
512x512
|
277.07
|
142.77
|
1024x1024
|
1146.38
|
405.50
|
2048x2048
|
4954.72
|
1763.95
|
Monday, 10 February 2014
Meetings: Monday 10th February 2014
The tesla K40 donated by nvidia arrived last friday and was fitted into the testing hardware, we discussed the results from the initial tests which were carried which involved running the parallel diffusion for a range of graph sizes and discussed the results of the survey that was handed out to several students asking them which path they felt was the most 'realistic' for an agent to take through a map.
The aim for the remaining 7 weeks of the project is to implement and test a parallel dijkstra and finish writing up the report.
Friday, 7 February 2014
Tesla K40
To help with the project Nvidia have kindly agreed to donate a Tesla K40 GPU accelerator to help with the project. Below are some pictures of the card.
before |
compared to the 560ti |
after |
Map
|
560ti
|
K40
|
32x32
|
0.10
|
0.30
|
64x64
|
0.22
|
0.58
|
128x128
|
0.85
|
1.58
|
256x256
|
8.45
|
8.3
|
512x512
|
121.94
|
89.6549
|
1024x1024
|
2158.84
|
1200.38
|
2048x2048
|
30841.5
|
18088
|
Centre as goal
Map
|
560ti
|
K40
|
32x32
|
0.19
|
0.57
|
64x64
|
0.51
|
1.37
|
128x128
|
2.88
|
5.52
|
256x256
|
32.91
|
33.61
|
512x512
|
498.59
|
378.37
|
1024x1024
|
9024.56
|
5062.17
|
2048x2048
|
133185.1
|
77539.1
|
Bottom right as goal
Thursday, 6 February 2014
Sunday, 2 February 2014
Diffusion vs Dijkstra
Size
|
Parallel Diffusion
(ms)
|
Sequential Diffusion (ms)
|
Sequential Dijkstra (ms)
|
32x32
|
0.19
|
0.7
|
1
|
64x64
|
0.51
|
3.7
|
3.8
|
128x128
|
2.88
|
62.01
|
16.2
|
256x256
|
32.91
|
1050.66
|
68.01
|
512x512
|
498.59
|
16949.22
|
277.07
|
1024x1024
|
9024.56
|
280763.5
|
1146.38
|
2048x2048
|
133185.1
|
4598425
|
4954.72
|
Time to fill varying grid sizes.
Map
|
Parallel
|
Sequential
|
32x32
|
31
|
31
|
64x64
|
75
|
68
|
128x128
|
261
|
224
|
256x256
|
1026
|
866
|
512x512
|
4175
|
3507
|
1024x1024
|
17176
|
14382
|
2048x2048
|
65091
|
59256
|
Iterations to fill all tiles with zero value
Map
|
Centre
as goal
|
Bottom
right as goal
|
32x32
|
16
|
31
|
64x64
|
32
|
75
|
128x128
|
76
|
261
|
256x256
|
262
|
1026
|
512x512
|
1018
|
4175
|
1024x1024
|
4089
|
17176
|
2048x2048
|
16589
|
65091
|
Iterations for parallel diffusion for different goals
Map
|
Centre
as goal (ms)
|
Bottom
right as goal (ms)
|
32x32
|
0.10
|
0.19
|
64x64
|
0.22
|
0.51
|
128x128
|
0.85
|
2.88
|
256x256
|
8.45
|
32.91
|
512x512
|
121.94
|
498.59
|
1024x1024
|
2158.84
|
9024.56
|
2048x2048
|
30841.5
|
133185.1
|
Map
|
Centre
as goal (ms)
|
Bottom
right as goal (ms)
|
32x32
|
0.90
|
1
|
64x64
|
3.8
|
3.8
|
128x128
|
16.1
|
16.2
|
256x256
|
66.31
|
68.01
|
512x512
|
277.27
|
277.07
|
1024x1024
|
1109.87
|
1146.38
|
2048x2048
|
4594.73
|
4954.72
|
Map
|
Centre as goal
|
Bottom right as goal
|
32x32
|
0.17
|
0.25
|
64x64
|
0.66
|
0.77
|
128x128
|
2.78
|
3.38
|
256x256
|
11.92
|
13.20
|
512x512
|
47.43
|
51.05
|
1024x1024
|
194.66
|
210.84
|
2048x2048
|
1097.17
|
1188.43
|
Map
|
2^10
|
2^20
|
2^50
|
2^100
|
32x32
|
16
|
16
|
16
|
16
|
64x64
|
32
|
32
|
32
|
32
|
128x128
|
76
|
73
|
66
|
64
|
256x256
|
262
|
245
|
207
|
170
|
512x512
|
1018
|
945
|
779
|
608
|
1024x1024
|
4089
|
3783
|
3091
|
2376
|
2048x2048
|
16589
|
15317
|
12459
|
9516
|
Map
|
2^10 (ms)
|
2^20 (ms)
|
2^50 (ms)
|
2^100 (ms)
|
32x32
|
0.1
|
0.1
|
0.1
|
0.1
|
64x64
|
0.22
|
0.22
|
0.22
|
0.22
|
128x128
|
0.85
|
0.82
|
0.74
|
0.71
|
256x256
|
8.45
|
7.9
|
6.68
|
5.49
|
512x512
|
121.94
|
112.967
|
93.21
|
72.75
|
1024x1024
|
2158.84
|
1990.4
|
1623.91
|
1248.21
|
2048x2048
|
30841.5
|
28490.8
|
23215.2
|
17686.3
|
Subscribe to:
Posts (Atom)