uncomplicate.clojurecuda.core

Core ClojureCUDA functions for CUDA host programming. The kernels should be provided as strings (that may be stored in files) or binaries, written in CUDA C/C++.

Where applicable, functions throw ExceptionInfo when the CUDA driver reports an error.

*context*

dynamic

Dynamic var for binding the default context.

add-callback!

(add-callback! hstream callback data)
(add-callback! hstream callback)

Adds a StreamCallback to a compute stream, with optional data related to the call.

See cuStreamAddCallback

callback

(callback ch)

Creates a StreamCallback that writes StreamCallbackInfo into async channel ch.
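A minimal sketch of how callback and add-callback! might be combined. It assumes that clojure.core.async is on the classpath, that the driver has been initialized, and that a context is current on the calling thread. The names ch and hstream, and the blocking read with <!!, are illustrative choices rather than something this page prescribes.

```clojure
(require '[clojure.core.async :refer [chan <!!]]
         '[uncomplicate.clojurecuda.core :refer [stream callback add-callback!]])

;; Assumes (init) was called and a context is current on this thread.
(let [ch (chan)          ; channel that should receive the StreamCallbackInfo
      hstream (stream)]  ; a new compute stream (release it when done; omitted here)
  ;; Enqueue the callback after any work already queued on hstream.
  (add-callback! hstream (callback ch))
  ;; Block the calling thread until the callback has written into the channel.
  (<!! ch))
```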

can-access-peer

(can-access-peer dev peer)

Queries if a device may directly access a peer device’s memory.

See cuDeviceCanAccessPeer

context

(context dev flag)
(context dev)

Creates a CUDA context on the device using a keyword flag.

Valid flags are: :sched-auto, :sched-spin, :sched-yield, :sched-blocking-sync, :map-host, :lmem-resize-to-max. The default is none. The context must be released after use.

Also see cuCtxCreate.
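The following sketch shows one way to create, bind, and release a context. It assumes a machine with at least one CUDA device, and it assumes that release comes from the companion uncomplicate.commons.core namespace, which this page does not document.

```clojure
(require '[uncomplicate.clojurecuda.core
           :refer [init device-count device context current-context!]]
         '[uncomplicate.commons.core :refer [release]]) ; assumed companion library

(init)                                   ; initialize the CUDA driver first
(when (pos? (device-count))
  (let [dev (device 0)                   ; first device, by ordinal number
        ctx (context dev :sched-auto)]   ; create the context with a keyword flag
    (try
      (current-context! ctx)             ; bind ctx to the calling CPU thread
      ;; ... allocate memory, load modules, launch kernels ...
      (finally
        (release ctx)))))                ; the context must be released after use
```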

context*

(context* dev flags)

Creates a CUDA context on the device using a raw integer flag. For available flags, see constants/ctx-flags.

current-context

(current-context)

Returns the CUDA context bound to the calling CPU thread.

See cuCtxGetCurrent

current-context!

(current-context! ctx)

Binds the specified CUDA context to the calling CPU thread.

See cuCtxSetCurrent

default-stream

The default per-thread stream

device

(device id)
(device)

Returns a device specified by its ordinal number or string id.

device-count

(device-count)

Returns the number of CUDA devices on the system.

disable-peer-access!

(disable-peer-access! ctx)
(disable-peer-access!)

Disables direct access to memory allocations in a peer context and unregisters any registered allocations.

See cuCtxDisablePeerAccess

elapsed-time

(elapsed-time start-event end-event)

Computes the elapsed time in milliseconds between start-event and end-event.

See cuEventElapsedTime
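A small sketch of timing GPU work with two events, using only functions listed on this page. It assumes that a context is current on the calling thread and that the work to be timed is enqueued on the same stream the events are recorded on.

```clojure
(require '[uncomplicate.clojurecuda.core
           :refer [event record! synchronize! elapsed-time]])

;; Assumes (init) was called and a context is current on this thread.
(let [start (event)
      end (event)]   ; events should be released when no longer needed (omitted)
  (record! start)    ; mark the beginning
  ;; ... enqueue the work to be timed here ...
  (record! end)      ; mark the end
  (synchronize!)     ; wait for the current context's tasks to complete
  (elapsed-time start end)) ; elapsed milliseconds between the two events
```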

enable-peer-access!

(enable-peer-access! ctx)
(enable-peer-access!)

Enables direct access to memory allocations in a peer context.

See cuCtxEnablePeerAccess

event

(event)
(event flag & flags)

Creates an event specified by keyword flags.

Available flags are :default, :blocking-sync, :disable-timing, and :interprocess.

See cuEventCreate

event*

(event* flags)

Creates an event specified by integer flags.

See cuEventCreate

function

(function m name)

Returns CUDA kernel function named name from module m.

See cuModuleGetFunction

global

(global m name)

Returns the CUDA global CULinearMemory named name from module m.

See cuModuleGetGlobal

grid-1d

(grid-1d dim-x)
(grid-1d dim-x block-x)

Creates a 1-dimensional GridDim record with grid and block dimensions x. Note: dim-x is the total number of threads globally, not the number of blocks.

grid-2d

(grid-2d dim-x dim-y)
(grid-2d dim-x dim-y block-x)
(grid-2d dim-x dim-y block-x block-y)

Creates a 2-dimensional GridDim record with grid and block dimensions x and y. Note: dim-x is the total number of threads globally, not the number of blocks.

grid-3d

(grid-3d dim-x dim-y dim-z)
(grid-3d dim-x dim-y dim-z block-x)
(grid-3d dim-x dim-y dim-z block-x block-y block-z)

Creates a 3-dimensional GridDim record with grid and block dimensions x, y, and z. Note: dim-x is the total number of threads globally, not the number of blocks.
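Because dim-x counts threads rather than blocks, the number of blocks has to be derived from it. The sketch below shows the usual ceiling-division arithmetic for one dimension. The helper blocks-needed is hypothetical and only illustrates the idea; this page does not show how the library computes the block count internally.

```clojure
;; Hypothetical helper, not part of ClojureCUDA: ceiling division giving
;; the number of blocks of block-x threads needed to cover dim-x threads.
(defn blocks-needed [dim-x block-x]
  (quot (+ dim-x (dec block-x)) block-x))

(blocks-needed 1024 256) ; => 4, an exact fit
(blocks-needed 1000 256) ; => 4, the last block is only partly used
(blocks-needed 1025 256) ; => 5, one extra block for the remainder
```

When the last block is only partly used, the kernel typically guards against out-of-range indices with a bounds check such as `if (i < n)`.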

init

(init)

Initializes the CUDA driver.

launch!

(launch! fun grid-dim shared-mem-bytes hstream params)
(launch! fun grid-dim hstream params)
(launch! fun grid-dim params)

Invokes the kernel fun on a grid-dim grid of blocks, using parameters params.

Optionally, you can specify the amount of shared memory that will be available to each thread block, and hstream to use for execution.

See cuLaunchKernel

load!

(load! m data)

Loads data into an already existing module m from an nvrtc/PTX string, an nvrtcProgram, or binary data.

See cuModuleLoadData

mem-alloc

(mem-alloc size)

Allocates size bytes of memory on the device. Returns a CULinearMemory object.

The memory is not cleared. size must be greater than 0.

See cuMemAlloc.

mem-alloc-host

(mem-alloc-host size)

Allocates size bytes of page-locked (‘pinned’) memory on the host.

The memory is not cleared. size must be greater than 0.

See cuMemAllocHost.

mem-alloc-managed

(mem-alloc-managed size flag)
(mem-alloc-managed size)

Allocates size bytes of memory that will be automatically managed by the Unified Memory system, specified by a keyword flag.

Returns a CULinearMemory object. Valid flags are: :global, :host and :single (the default). The memory is not cleared. size must be greater than 0.

See cuMemAllocManaged.

mem-alloc-managed*

(mem-alloc-managed* size flag)

Allocates size bytes of memory that will be automatically managed by the Unified Memory system, specified by an integer flag.

Returns a CULinearMemory object. The memory is not cleared. size must be greater than 0.

See cuMemAllocManaged.

mem-host-alloc

(mem-host-alloc size flags)
(mem-host-alloc size)

Allocates size bytes of page-locked (‘pinned’) memory on the host, using keyword flags. For available flags, see constants/mem-host-alloc-flags.

Valid flags are: :portable, :devicemap and :writecombined. The default is none. The memory is not cleared. size must be greater than 0.

See cuMemHostAlloc.

mem-host-alloc*

(mem-host-alloc* size flags)

Allocates size bytes of page-locked (‘pinned’) memory on the host, using raw integer flags. For available flags, see constants/mem-host-alloc-flags.

The memory is not cleared. size must be greater than 0.

See cuMemHostAlloc.

mem-host-register

(mem-host-register memory flags)
(mem-host-register memory)

Registers a previously allocated Java memory structure and pins it, using keyword flags.

Valid flags are: :portable and :devicemap. The default is none. The memory is not cleared.

See cuMemHostRegister.

mem-host-register*

(mem-host-register* memory flags)

Registers a previously allocated Java memory structure and pins it, using raw integer flags.

See cuMemHostRegister.

memcpy!

(memcpy! src dst byte-count)
(memcpy! src dst)

Copies byte-count or all possible device memory from src to dst.

See cuMemcpy

memcpy-host!

(memcpy-host! src dst byte-count hstream)
(memcpy-host! src dst arg)
(memcpy-host! src dst)

Copies byte-count or all possible memory from src to dst, one of which has to be accessible from the host. If hstream is provided, the copy is asynchronous. Polymorphic function that figures out what needs to be done.

See cuMemcpyXtoY

memset!

(memset! cu-mem value)
(memset! cu-mem value arg)
(memset! cu-mem value len hstream)

Sets len or all 32-bit segments of cu-mem to 32-bit integer value. If hstream is provided, does this asynchronously.

See cuMemsetD32

module

(module)
(module data)

Creates a new CUDA module and, if data is given, loads it from a string, an nvrtcProgram, or binary data.

p2p-attribute

(p2p-attribute dev peer attribute)

Queries attributes of the link between two devices.

See cuDeviceGetP2PAttribute

p2p-attribute*

(p2p-attribute* dev peer attribute)

Queries attributes of the link between two devices.

See cuDeviceGetP2PAttribute

parameters

(parameters & params)

Creates a Pointer to an array of Pointers to CUDA params. params can be any object on device (CULinearMemory for example), or host (arrays, numbers) that makes sense as a kernel parameter per CUDA specification. Use the result as an argument in launch!.
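The sketch below strings together module, function, parameters, and launch! for a simple element-wise kernel. It makes several assumptions: the driver is initialized and a context is current on the calling thread, the file increment.ptx is a hypothetical PTX file compiled beforehand from the kernel shown in the comment, and `(int n)` is passed so that the host number matches the kernel's `int` parameter. Releasing the module and the device memory is omitted for brevity.

```clojure
(require '[uncomplicate.clojurecuda.core
           :refer [mem-alloc module function memcpy-host!
                   launch! grid-1d parameters synchronize!]])

;; Assumes "increment.ptx" (hypothetical) was compiled from a kernel like:
;;   extern "C" __global__ void increment (int n, float *a) {
;;     int i = blockIdx.x * blockDim.x + threadIdx.x;
;;     if (i < n) { a[i] += 1.0f; }
;;   }
(let [n 256
      host-in (float-array (range n))      ; input data on the host
      host-out (float-array n)             ; destination for the result
      gpu-a (mem-alloc (* n Float/BYTES))  ; room for n floats on the device
      m (module (slurp "increment.ptx"))   ; load the PTX string into a module
      increment (function m "increment")]  ; look the kernel up by name
  (memcpy-host! host-in gpu-a)             ; host -> device
  ;; n threads in total; the kernel arguments are packed with parameters
  (launch! increment (grid-1d n) (parameters (int n) gpu-a))
  (synchronize!)                           ; wait for the kernel to finish
  (memcpy-host! gpu-a host-out)            ; device -> host
  (vec host-out))
```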

pop-context!

(pop-context!)

Pops the current CUDA context from the current CPU thread.

See cuCtxPopCurrent

push-context!

(push-context! ctx)

Pushes a context on the current CPU thread.

See cuCtxPushCurrent

ready?

(ready? obj)

Determines the status (ready or not) of a compute stream or event.

See cuStreamQuery and cuEventQuery

record!

(record! stream event)
(record! event)

Records event on an optional stream.

See cuEventRecord

stream

(stream)
(stream flag)
(stream priority flag)

Creates a stream using an optional integer priority and a keyword flag.

Valid flags are :default and :non-blocking.

See cuStreamCreate

stream*

(stream* flag)
(stream* priority flag)

Creates a stream using an optional priority and an integer flag.

See cuStreamCreate

synchronize!

(synchronize!)
(synchronize! hstream)

Blocks until the tasks of the current context, or of the stream hstream if provided, complete.

wait-event!

(wait-event! hstream ev)

Makes a compute stream hstream wait on an event ev.

See cuStreamWaitEvent

with-context

macro

(with-context context & body)

Dynamically binds context to the default context *context*, and evaluates the body with the binding. Releases the context in the finally block.

Take care not to release that context again in some other place; the JVM might crash.
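A brief sketch of with-context in use, assuming a machine with at least one CUDA device. The buffer size and the name gpu-buf are illustrative only.

```clojure
(require '[uncomplicate.clojurecuda.core
           :refer [init device context with-context mem-alloc]])

(init) ; initialize the CUDA driver first

;; with-context releases the context itself when the body exits,
;; so it must not be released again anywhere else.
(with-context (context (device 0))
  (let [gpu-buf (mem-alloc 1024)] ; 1024 bytes of device memory
    ;; ... use gpu-buf here ...
    :done))
```

Creating the context directly inside the with-context form, rather than binding it to a name first, makes it harder to release it a second time by accident.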