uncomplicate.clojurecuda.core

Core ClojureCUDA functions for CUDA host programming. The kernels should be provided as strings (that may be stored in files) or binaries, written in CUDA C/C++.

Where applicable, methods throw ExceptionInfo in case of errors thrown by the CUDA driver.

add-callback!

(add-callback! hstream callback data)
(add-callback! hstream callback)

Adds a callback to a compute stream, with optional data related to the call. If data is not provided, hstream itself is passed as data.

See cuStreamAddCallback

blocks-count

(blocks-count block-size global-size)
(blocks-count global-size)

Computes the number of blocks needed to execute a kernel over global-size threads, given block-size threads per block.
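
The arithmetic behind such a computation is a ceiling division. The following is a minimal sketch in plain Clojure; blocks-needed is a hypothetical helper written for illustration, not the library's implementation, and it assumes that at least one block is always required.

```clojure
;; Hypothetical helper (not part of ClojureCUDA): ceiling division of
;; global-size by block-size, never returning fewer than one block.
(defn blocks-needed
  [block-size global-size]
  (max 1 (quot (+ global-size (dec block-size)) block-size)))

(blocks-needed 256 1000) ; => 4
(blocks-needed 256 1024) ; => 4
(blocks-needed 256 1025) ; => 5
(blocks-needed 1024 10)  ; => 1
```

The last block may be only partially filled, so kernels usually guard against indices beyond the global size.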

callback

(callback ch)

Creates a stream callback that writes stream callback info into async channel ch. Available keys in callback info are :status and :data.

can-access-peer

(can-access-peer dev peer)

Queries if a device may directly access a peer device’s memory.

See cuDeviceCanAccessPeer

compile!

(compile! prog options)
(compile! prog)

Compiles the given prog using a list of string options.

context

(context dev flag)
(context dev)

Creates a CUDA context on the device using a keyword flag.

Valid flags are: :sched-auto, :sched-spin, :sched-yield, :sched-blocking-sync, :map-host, :lmem-resize-to-max. The default is none. Must be released after use.

Also see cuCtxCreate.

current-context

(current-context)

Returns the CUDA context bound to the calling CPU thread.

See cuCtxGetCurrent

current-context!

(current-context! ctx)

Binds the specified CUDA context ctx to the calling CPU thread.

See cuCtxSetCurrent

default-stream

device

(device id)
(device)

Returns a device specified by its ordinal number or string id.

device-count

(device-count)

Returns the number of CUDA devices on the system.

disable-peer-access!

(disable-peer-access! ctx)
(disable-peer-access!)

Disables direct access to memory allocations in a peer context and unregisters any registered allocations.

See cuCtxDisablePeerAccess

elapsed-time

(elapsed-time start-event end-event)

Computes the elapsed time in milliseconds between start-event and end-event.

See cuEventElapsedTime
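
A common use of events is timing work on the GPU. The sketch below uses only the arities listed on this page; it assumes a CUDA-capable device is present, that with-release from uncomplicate.commons.core is available for cleanup, and that the work to be measured is enqueued where the placeholder comment sits.

```clojure
(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.clojurecuda.core :refer :all])

(with-default
  (with-release [start (event)
                 end (event)]
    (record! start)
    ;; ... enqueue kernel launches or memory copies here ...
    (record! end)
    ;; Wait for all work in the current context, including both events.
    (synchronize!)
    ;; Elapsed time between the two events, in milliseconds.
    (elapsed-time start end)))
```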

enable-peer-access!

(enable-peer-access! ctx)
(enable-peer-access!)

Enables direct access to memory allocations in a peer context.

See cuCtxEnablePeerAccess

event

(event)
(event flag & flags)

Creates an event specified by keyword flags.

Available flags are :default, :blocking-sync, :disable-timing, and :interprocess.

See cuEventCreate

function

(function m name)

Returns CUDA kernel function named name located in module m.

See cuModuleGetFunction

global

(global m name)

Returns CUDA global linear memory named name from module m.

See cuModuleGetGlobal

grid-1d

(grid-1d dim-x)
(grid-1d dim-x block-x)

Creates a 1-dimensional GridDim record with grid and block dimensions x. Note: dim-x is the total number of threads globally, not the number of blocks.

grid-2d

(grid-2d dim-x dim-y)
(grid-2d dim-x dim-y block-x block-y)

Creates a 2-dimensional GridDim record with grid and block dimensions x and y. Note: dim-x and dim-y are the total numbers of threads globally in each dimension, not the numbers of blocks.

grid-3d

(grid-3d dim-x dim-y dim-z)
(grid-3d dim-x dim-y dim-z block-x block-y block-z)

Creates a 3-dimensional GridDim record with grid and block dimensions x, y, and z. Note: dim-x, dim-y, and dim-z are the total numbers of threads globally in each dimension, not the numbers of blocks.

in-context

macro

(in-context ctx & body)

Pushes the context ctx to the top of the context stack, evaluates the body with ctx as the current context, and pops the context from the stack. Does NOT release the context.

init

(init)

Initializes the CUDA driver.

launch!

(launch! fun grid-dim shared-mem-bytes hstream params)
(launch! fun grid-dim hstream params)
(launch! fun grid-dim params)

Invokes the kernel fun on a grid-dim grid of blocks, using parameters params.

Optionally, you can specify the amount of shared memory that will be available to each thread block, and hstream to use for execution.

See cuLaunchKernel
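
To show how launch! fits together with the other functions on this page, here is a minimal end-to-end sketch. It assumes a CUDA-capable device, that with-release from uncomplicate.commons.core is available, and that memcpy-host! returns its destination. The increment kernel and all local names are illustrative, not part of the library.

```clojure
(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.clojurecuda.core :refer :all])

;; Illustrative CUDA C kernel: adds 1.0f to each of the first n floats.
(def kernel-source
  "extern \"C\" __global__ void increment(int n, float *a) {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n) { a[i] = a[i] + 1.0f; }
   }")

(with-default
  (with-release [prog (compile! (program kernel-source))
                 m (module prog)
                 gpu-array (mem-alloc (* 256 Float/BYTES))]
    (let [increment (function m "increment")]
      ;; Copy the input from the host to the device.
      (memcpy-host! (float-array (range 256)) gpu-array)
      ;; 256 is the total number of threads, not the number of blocks.
      (launch! increment (grid-1d 256) (parameters (int 256) gpu-array))
      (synchronize!)
      ;; Copy the result back into a fresh host array
      ;; (assumes memcpy-host! returns the destination).
      (vec (memcpy-host! gpu-array (float-array 256))))))
```

Passing the element count as (int 256) keeps the host value the same width as the kernel's int parameter.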

load!

(load! m data)
(load! m data options)

Loads data into an already existing module m from a PTX string, nvrtcProgram, Java path, or binary data.

See cuModuleLoadData

make-parameters

(make-parameters len)

Creates an array of JCuda Pointers.

mem-alloc

(mem-alloc size)

Allocates size bytes of memory on the device.

The old memory content is not cleared. size must be greater than 0.

See cuMemAlloc.

mem-alloc-host

(mem-alloc-host size)

Allocates size bytes of page-locked, ‘pinned’ memory on the host.

The memory is not cleared. size must be greater than 0.

See cuMemAllocHost.

mem-alloc-managed

(mem-alloc-managed size flag)
(mem-alloc-managed size)

Allocates size bytes of memory that will be automatically managed by the Unified Memory system, specified by a keyword flag.

Returns a CULinearMemory object. Valid flags are: :global, :host and :single (the default). The memory is not cleared. size must be greater than 0.

See cuMemAllocManaged.

mem-host-alloc

(mem-host-alloc size flags)
(mem-host-alloc size)

Allocates size bytes of page-locked, ‘pinned’ memory on the host, using keyword flags. For available flags, see [constants/mem-host-alloc-flags].

Valid flags are: :portable, :devicemap and :writecombined. The default is none. The memory is not cleared. size must be greater than 0.

See cuMemHostAlloc.

mem-host-register

(mem-host-register memory flags)
(mem-host-register memory)

Registers a previously allocated Java memory structure and pins it, using keyword flags.

Valid flags are: :portable and :devicemap. The default is none. The memory is not cleared.

See cuMemHostRegister.

mem-sub-region

(mem-sub-region mem origin byte-count)

Creates a CULinearMemory that references a sub-region of mem, starting at origin and byte-count bytes long.

memcpy!

(memcpy! src dst)
(memcpy! src dst count-or-stream)
(memcpy! src dst src-offset dst-offset count-or-stream)
(memcpy! src dst byte-count hstream)
(memcpy! src dst src-offset dst-offset byte-count hstream)

Copies byte-count or all possible device memory from src to dst. If hstream is supplied, executes asynchronously.

See cuMemcpy

memcpy-host!

(memcpy-host! src dst byte-count hstream)
(memcpy-host! src dst arg)
(memcpy-host! src dst)

Copies byte-count or all possible memory from src to dst, one of which has to be accessible from the host. If hstream is provided, the copy is asynchronous. A polymorphic function that figures out what needs to be done.

See cuMemcpyXtoY

memset!

(memset! cu-mem value)
(memset! cu-mem value arg)
(memset! cu-mem value len hstream)

Sets len or all 32-bit segments of cu-mem to 32-bit integer value. If hstream is provided, does this asynchronously.

See cuMemsetD32

module

(module)
(module data)
(module data options)

Creates a new CUDA module and loads a string, nvrtc program, or binary data.

p2p-attribute

(p2p-attribute dev peer attribute)

Queries attributes of the link between two devices.

See cuDeviceGetP2PAttribute

parameters

(parameters parameter & parameters)

Creates an array of Pointers to CUDA params. params can be any object on device (CULinearMemory for example), or host (arrays, numbers) that makes sense as a kernel parameter per CUDA specification. Use the result as an argument in launch!.

pop-context!

(pop-context!)

Pops the current CUDA context from the current CPU thread.

See cuCtxPopCurrent

program

(program name source-code headers)
(program source-code headers)
(program source-code)

Creates a CUDA program with an optional name from the source-code, and an optional hash map of headers (as strings) and their names.

program-log

(program-log prog)

Returns the log string generated by the previous compilation of prog.

ptx

(ptx prog)

Returns the PTX generated by the previous compilation of prog.

push-context!

(push-context! ctx)

Pushes a context ctx on the current CPU thread.

See cuCtxPushCurrent

ready?

(ready? obj)

Determines status (ready or not) of a compute stream or event obj.

See cuStreamQuery and cuEventQuery

record!

(record! stream event)
(record! event)

Records event on an optional stream.

See cuEventRecord

set-parameter!

(set-parameter! arr i parameter)

Sets the ith parameter in a parameter array arr.

set-parameters!

(set-parameters! arr i parameter & parameters)

Sets the ith parameter in a parameter array arr and the rest of parameters in places after i.

stream

(stream)
(stream flag)
(stream priority flag)

Creates a stream using an optional integer priority and a keyword flag.

Valid flags are :default and :non-blocking.

See cuStreamCreate

synchronize!

(synchronize!)
(synchronize! hstream)

Blocks until the tasks of the current context, or of hstream if provided, complete.

wait-event!

(wait-event! hstream ev)

Makes a compute stream hstream wait on an event ev.

See cuStreamWaitEvent

with-context

macro

(with-context ctx & body)

Pushes the context ctx to the top of the context stack, evaluates the body, and pops the context from the stack. Releases the context.

with-default

macro

(with-default & body)

Initializes CUDA, creates the default context and executes the body in it.