CUDA: two streams created by an NPP function


I'm working on an image processing project with CUDA 7.5 and a GeForce GTX 650 Ti. I decided to use two streams: one where I apply the algorithms responsible for enhancing the image, and another for an algorithm that is independent of the rest of the processing.

I wrote an example to show my problem. In this example I created one stream and then passed it to nppSetStream.

I then invoked the function nppiThreshold_LTValGTVal_32f_C1R, but two streams are used when the function is executed.

Here is the code example:

```cpp
#include <npp.h>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

int main(void)
{
    int srcwidth = 1344;
    int srcheight = 1344;
    int paddstride = 0;
    float* srcarraydevice;
    float* srcarraydevice2;
    unsigned char* dstarraydevice;

    // Two 32-bit float images and one 8-bit destination image.
    int status = cudaMalloc((void**)&srcarraydevice, srcwidth * srcheight * 4);
    status = cudaMalloc((void**)&srcarraydevice2, srcwidth * srcheight * 4);
    status = cudaMalloc((void**)&dstarraydevice, srcwidth * srcheight);

    // Create a single non-blocking stream and hand it to NPP.
    cudaStream_t teststream;
    cudaStreamCreateWithFlags(&teststream, cudaStreamNonBlocking);
    nppSetStream(teststream);

    NppiSize roisize = { srcwidth, srcheight };
    //status = cudaMemcpyAsync(srcarraydevice, &srcarrayhost, srcwidth*srcheight*4, cudaMemcpyHostToDevice, teststream);

    int yrect = 100;
    int xrect = 60;
    float thrl = 50;
    float thrh = 1500;
    NppiSize sz = { 200, 400 };

    // Threshold a 200x400 region whose top-left corner is (xrect, yrect).
    for (int i = 0; i < 10; i++) {
        int status3 = nppiThreshold_LTValGTVal_32f_C1R(srcarraydevice + (srcwidth*yrect + xrect)
            , srcwidth * 4
            , srcarraydevice2 + (srcwidth*yrect + xrect)
            , srcwidth * 4
            , sz
            , thrl
            , thrl
            , thrh
            , thrh);
    }

    int length = (srcwidth + paddstride)*srcheight;
    int status6 = nppiScale_32f8u_C1R(srcarraydevice, srcwidth * 4, dstarraydevice + paddstride, srcwidth + paddstride, roisize, 0, 65535);

    //int status7 = cudaMemcpyAsync(dstpinptr, dsttest, length, cudaMemcpyDeviceToHost, teststream);
    cudaFree(srcarraydevice);
    cudaFree(srcarraydevice2);
    cudaFree(dstarraydevice);
    cudaStreamDestroy(teststream);
    cudaProfilerStop();
    return 0;
}
```

This is what I got from the NVIDIA Visual Profiler: [image: image_width1344]

Why are there two streams if I set only one stream? This causes errors in my original project, so I'm thinking of switching to a single stream.

I noticed that this behaviour depends on the size of the image. If srcwidth and srcheight are set to 1500, the result is this: [image: image_width1500]

Why does changing the size of the image produce another stream?

Why are there two streams if I setted [sic] just one stream?

It appears that nppiThreshold_LTValGTVal_32f_C1R creates its own internal stream for executing one of the kernels it uses. The other kernel is launched either into the default stream or into the stream you specified with nppSetStream.

I think this is really a documentation oversight or a user expectation problem. nppSetStream is doing exactly what it says, but nowhere is it stated that the library is limited to using one stream. The documentation should probably be more explicit about how many streams the library uses internally and how nppSetStream interacts with them. If this is a problem for your application, I suggest you raise a bug report with NVIDIA.

Why does changing the size of the image produce another stream?

My guess is that there are some performance heuristics at work, and whether a second stream is used depends on the image size. The library is closed source, however, so I can't say for sure.

