CUDA: two streams created by an NPP function
I'm working on an image processing project with CUDA 7.5 on a GeForce GTX 650 Ti. I decided to use 2 streams: one in which I apply the algorithms responsible for enhancing the image, and another in which I apply an algorithm that is independent of the rest of the processing.
I wrote an example to show my problem. In this example I created a stream and then used nppSetStream.
I invoked the function nppiThreshold_LTValGTVal_32f_C1R, but 2 streams are used when the function is executed.
Here is the code example:
#include <npp.h>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

int main(void)
{
    int srcwidth = 1344;
    int srcheight = 1344;
    int paddstride = 0;
    float* srcarraydevice;
    float* srcarraydevice2;
    unsigned char* dstarraydevice;

    // Two float source buffers (4 bytes per pixel) and one 8-bit destination buffer.
    int status = cudaMalloc((void**)&srcarraydevice, srcwidth * srcheight * 4);
    status = cudaMalloc((void**)&srcarraydevice2, srcwidth * srcheight * 4);
    status = cudaMalloc((void**)&dstarraydevice, srcwidth * srcheight);

    // Create one non-blocking stream and tell NPP to use it.
    cudaStream_t teststream;
    cudaStreamCreateWithFlags(&teststream, cudaStreamNonBlocking);
    nppSetStream(teststream);

    NppiSize roisize = { srcwidth, srcheight };
    //status = cudaMemcpyAsync(srcarraydevice, &srcarrayhost, srcwidth*srcheight*4, cudaMemcpyHostToDevice, teststream);

    int yrect = 100;
    int xrect = 60;
    float thrl = 50;
    float thrh = 1500;
    NppiSize sz = { 200, 400 };

    // Threshold a 200x400 sub-rectangle ten times; steps are row pitches in bytes.
    for (int i = 0; i < 10; i++)
    {
        int status3 = nppiThreshold_LTValGTVal_32f_C1R(srcarraydevice + (srcwidth*yrect + xrect),
                                                       srcwidth * 4,
                                                       srcarraydevice2 + (srcwidth*yrect + xrect),
                                                       srcwidth * 4,
                                                       sz,
                                                       thrl, thrl,
                                                       thrh, thrh);
    }

    int length = (srcwidth + paddstride)*srcheight;
    int status6 = nppiScale_32f8u_C1R(srcarraydevice, srcwidth * 4, dstarraydevice + paddstride, srcwidth + paddstride, roisize, 0, 65535);
    //int status7 = cudaMemcpyAsync(dstpinptr, dsttest, length, cudaMemcpyDeviceToHost, teststream);

    cudaFree(srcarraydevice);
    cudaFree(srcarraydevice2);
    cudaFree(dstarraydevice);
    cudaStreamDestroy(teststream);
    cudaProfilerStop();
    return 0;
}
This is what I got from the NVIDIA Visual Profiler: image_width1344
Why are there 2 streams if I set only 1 stream? This causes errors in my original project, so I'm thinking of switching to a single stream.
I noticed that this behaviour depends on the size of the image: if srcwidth and srcheight are set to 1500, the result is this: image_width1500.
Why does changing the size of the image produce another stream?
Why are there 2 streams if I setted [sic] only 1 stream?
It appears that nppiThreshold_LTValGTVal_32f_C1R creates its own internal stream for executing one of the kernels it uses. The other kernel is launched either into the default stream or into the stream you specified with nppSetStream.
I think this is really a documentation oversight/user expectation problem. nppSetStream is doing what it says, but nowhere is it stated that the library is limited to using one stream. The documentation should probably be more explicit about how many streams the library uses internally, and about how nppSetStream interacts with the library. If this is a problem for your application, I suggest you raise a bug report with NVIDIA.
Why does changing the size of the image produce another stream?
My guess is that there are some performance heuristics at work, and whether a second stream is used depends on the image size. The library is closed source, however, so I can't say for sure.