Pierre Lebel
Guest
I am testing how OpenMP (in Visual Studio) behaves when working with double arrays similar to those I use in an application.
The basic operation is very simple:
for (unsigned j = 0; j < numel; j++) {
    *c_addr++ += *a_addr++ * *b_addr++;
}
The problem is that when the array size is under ~4000 elements, adding threads scales well, with a speedup of about 5 using 6 threads.
When the array size is 20000 or more, only about a 20% gain in speed (a factor of 1.2) is achieved.
The test uses 48 "a", "b" and "c" arrays, and the OpenMP loop operates on each of them:
#pragma omp parallel for num_threads(numpr) shared(numel) private(i_pass, j, a_addr, b_addr, c_addr)
for (i_pass = 0; i_pass < 48; i_pass++) {
    a_addr = p_A[i_pass];
    b_addr = p_B[i_pass];
    c_addr = p_C[i_pass];
    for (unsigned j = 0; j < numel; j++) {
        *c_addr++ += *a_addr++ * *b_addr++;
    }
}
Any thoughts??