Tuesday 25 February 2014

Meetings: Tuesday 25th February 2014

Discussed the results from attempting to parallelize the inner loop of the dijkstra application. Using cuda streams has resulted in a slow down of the application so further tests to determine why this is happening need to be carried out. Possible reason for this is that to simply create the streams on the GPU takes much longer than it does to simply evaluate an individual neighbour.

The paper 'A New GPU-based Approach to the shortest path problem' uses a similar method and reported a speedup of 220x, it may be worth while emailing and asking to see the source code to perform a comparison between their implementation and my own.

Monday 24 February 2014

Streams vs Device Functions

Map
Sequential Time
Device functions
Streams
32x32
1
8.341714
21.75491
64x64
3.8
16.37582
47.9231
128x128
16.2
32.52248
133.1749
256x256
68.01
65.31222
409.7781
512x512
277.07
140.5209
19658.81
1024x1024
1146.38
399.7442
144157.2
2048x2048
4954.72
1745.794
462554.9


Cuda streams are a sequence of operations that execute in issue-order on the GPU. CUDA operations in different streams may run concurrently. 

Device functions are functions which can be called from within kernel code and are executed on the GPU, they cannot be called from the host. 


Sunday 23 February 2014

780 vs Titan vs K40

Thanks to the help of some friends i was able to run the parallel dijkstra implementation on some different hardware all based on the kepler architecture.

Map
K40
780
Titan
32x32
8.341714
8.057811
8.645273
64x64
16.37582
15.68679
15.75337
128x128
32.52248
28.07575
28.05111
256x256
65.31222
54.97772
55.49513
512x512
140.5209
119.8173
119.7446
1024x1024
399.7442
355.7623
338.7315
2048x2048
1745.794
1589.061
1449.108


Tuesday 18 February 2014

Meetings: Tuesday 18th February 2014

Went over the initial results from the parallel Dijkstra where the outer loop of the algorithm has been parallelized. The initial results are promising having seen a speed up of about 60% for the larger graph sizes.

Further work can be done to parallelize dijsktra through parallelization of the inner loop which should allow for a greater speed up.

Work can also be done to improve the frontier set selection process which could result in better performance.

The dates for the honours project have now changed as well. The poster presentation is now the Wednesday of week 13 and the hand in of the report is the 28th of April.

Monday 17 February 2014

Parallel Dijkstra

The initial version has only parallelized the outer loop of the algorithm and has not been done in the most efficient way however has provided a significant speed up for the larger graph sizes.

For each graph the source node was the top left and the goal node was the bottom right, both the parallel and sequential implementations found the same path

map
Sequential Dijkstra (ms)
Parallel Dijkstra (ms)
32x32
1
8.49
64x64
3.8
16.73
128x128
16.2
33.20
256x256
68.01
66.38
512x512
277.07
142.77
1024x1024
1146.38
405.50
2048x2048
4954.72
1763.95


Monday 10 February 2014

Meetings: Monday 10th February 2014

The tesla K40 donated by nvidia arrived last friday and was fitted into the testing hardware, we discussed the results from the initial tests which were carried which involved running the parallel diffusion for a range of graph sizes and discussed the results of the survey that was handed out to several students asking them which path they felt was the most 'realistic' for an agent to take through a map.

The aim for the remaining 7 weeks of the project is to implement and test a parallel dijkstra and finish writing up the report.

Friday 7 February 2014

Tesla K40

To help with the project Nvidia have kindly agreed to donate a Tesla K40 GPU accelerator to help with the project. Below are some pictures of the card.

before




compared to the 560ti

after


Map
560ti
K40
32x32
0.10
0.30
64x64
0.22
0.58
128x128
0.85
1.58
256x256
8.45
8.3
512x512
121.94
89.6549
1024x1024
2158.84
1200.38
2048x2048
30841.5
18088
Centre as goal

Map
560ti
K40
32x32
0.19
0.57
64x64
0.51
1.37
128x128
2.88
5.52
256x256
32.91
33.61
512x512
498.59
378.37
1024x1024
9024.56
5062.17
2048x2048
133185.1
77539.1
Bottom right as goal




Sunday 2 February 2014

Diffusion vs Dijkstra

Size
Parallel Diffusion
(ms)
Sequential Diffusion (ms)
Sequential Dijkstra (ms)
32x32
0.19
0.7
1
64x64
0.51
3.7
3.8
128x128
2.88
62.01
16.2
256x256
32.91
1050.66
68.01
512x512
498.59
16949.22
277.07
1024x1024
9024.56
280763.5
1146.38
2048x2048
133185.1
4598425
4954.72
Time to fill varying grid sizes.




Map
Parallel
Sequential
32x32
31
31
64x64
75
68
128x128
261
224
256x256
1026
866
512x512
4175
3507
1024x1024
17176
14382
2048x2048
65091
59256
Iterations to fill all tiles with zero value

Map
Centre as goal
Bottom right as goal
32x32
16
31
64x64
32
75
128x128
76
261
256x256
262
1026
512x512
1018
4175
1024x1024
4089
17176
2048x2048
16589
65091
Iterations for parallel diffusion for different goals

Map
Centre as goal (ms)
Bottom right as goal (ms)
32x32
0.10
0.19
64x64
0.22
0.51
128x128
0.85
2.88
256x256
8.45
32.91
512x512
121.94
498.59
1024x1024
2158.84
9024.56
2048x2048
30841.5
133185.1
Time to find paths for different goal tiles parallel diffusion

Map
Centre as goal (ms)
Bottom right as goal (ms)
32x32
0.90
1
64x64
3.8
3.8
128x128
16.1
16.2
256x256
66.31
68.01
512x512
277.27
277.07
1024x1024
1109.87
1146.38
2048x2048
4594.73
4954.72
Time for dijkstra to find paths for different goals

Map
Centre as goal
Bottom right as goal
32x32
0.17
0.25
64x64
0.66
0.77
128x128
2.78
3.38
256x256
11.92
13.20
512x512
47.43
51.05
1024x1024
194.66
210.84
2048x2048
1097.17
1188.43
Time for A* to find paths for different goals

Map
2^10
2^20
2^50
2^100
32x32
16
16
16
16
64x64
32
32
32
32
128x128
76
73
66
64
256x256
262
245
207
170
512x512
1018
945
779
608
1024x1024
4089
3783
3091
2376
2048x2048
16589
15317
12459
9516
Number of iterations for parallel diffusion to find the goal at the centre when the goal value is different

Map
2^10 (ms)
2^20 (ms)
2^50 (ms)
2^100 (ms)
32x32
0.1
0.1
0.1
0.1
64x64
0.22
0.22
0.22
0.22
128x128
0.85
0.82
0.74
0.71
256x256
8.45
7.9
6.68
5.49
512x512
121.94
112.967
93.21
72.75
1024x1024
2158.84
1990.4
1623.91
1248.21
2048x2048
30841.5
28490.8
23215.2
17686.3
Time for parallel diffusion to find goal at the centre when the goal value is different