uncomplicate.clojurecuda.core
Core ClojureCUDA functions for CUDA host programming. The kernels should be provided as strings (that may be stored and read from files) or binaries, written in CUDA C/C++.
Many examples are available in the ClojureCUDA core test suite, which shows both how to write CUDA kernels and how to load and run them from Clojure.
For more advanced examples, please read the source code of the CUDA engine of Neanderthal linear algebra library (mainly general CUDA and cuBLAS are used there), and the Deep Diamond tensor and linear algebra library (for extensive use of cuDNN).
Here’s a categorized map of core functions. Most functions throw ExceptionInfo in case of errors reported by the CUDA driver.
- Device management: init, device-count, device.
- Context management: context, current-context, current-context!, pop-context!, push-context!, in-context, with-context, with-default.
- Memory management: memcpy!, memcpy-host!, memcpy-to-host!, memcpy-to-device!, memset!, mem-sub-region, mem-alloc-driver, mem-alloc-runtime, cuda-malloc, cuda-free!, mem-alloc-pinned, mem-register-pinned!, mem-alloc-mapped.
- Module management: link, link-complete!, load!, module.
- Execution control: grid-1d, grid-2d, grid-3d, global, set-parameter!, parameters, function, launch!.
- Stream management: stream, default-stream, ready?, synchronize!, add-host-fn!, listen!, wait-event!, attach-mem!.
- Event management: event, elapsed-time!, record!.
- Peer access management: can-access-peer, p2p-attribute, disable-peer-access!, enable-peer-access!.
- NVRTC program JIT: program, program-log, compile!, ptx.
Please see CUDA Driver API for details not discussed in ClojureCUDA documentation.
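Before the individual entries, here is a minimal end-to-end sketch of the typical workflow (initialize, compile, load, launch, copy back). The kernel, names, and sizes are illustrative, and with-release comes from uncomplicate.commons.core:

```clojure
(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.clojurecuda.core :refer :all])

;; A trivial kernel, provided as a CUDA C string.
(def kernel-source
  "extern \"C\" __global__ void increment(int n, float *a) {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n) a[i] = a[i] + 1.0f;
   }")

(init)         ;; initialize the CUDA driver first
(with-default  ;; default device and context for the body
  (with-release [prog  (compile! (program kernel-source))
                 m     (module prog)
                 incr  (function m "increment")
                 gpu-a (mem-alloc-runtime (* 256 Float/BYTES))]
    (memcpy-host! (float-array (range 256)) gpu-a)   ;; host -> device
    (launch! incr (grid-1d 256) (parameters 256 gpu-a))
    (memcpy-host! gpu-a (float-array 256))))         ;; device -> host
```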
add-host-fn!
(add-host-fn! hstream f data)
(add-host-fn! hstream f)
Adds host function f to a compute stream hstream, with optional data related to the call. If data is not provided, places hstream under data.
attach-mem!
(attach-mem! hstream mem byte-size flag)
(attach-mem! mem byte-size flag)
Attaches memory mem of byte-size bytes to hstream asynchronously, as specified by flag. For available flags, see internal.constants/mem-attach-flags. The default is :single. If the :global flag is specified, the memory can be accessed by any stream on any device. If the :host flag is specified, the program guarantees that it won’t access the memory on the device from any stream on a device that has no concurrent-managed-access capability. If the :single flag is specified and hstream is associated with a device that has no concurrent-managed-access capability, the program guarantees that it will only access the memory on the device from hstream. It is illegal to attach singly to the nil stream, because the nil stream is a virtual global stream and not a specific stream; an error is returned in this case.
When memory is associated with a single stream, the Unified Memory system will allow CPU access to this memory region so long as all operations in hstream have completed, regardless of whether other streams are active. In effect, this constrains exclusive ownership of the managed memory region by an active GPU to per-stream activity instead of whole-GPU activity.
can-access-peer
(can-access-peer dev peer)
Queries if a device may directly access a peer device’s memory. See CUDA Peer Access Management
compile!
(compile! prog options)
(compile! prog)
Compiles the given prog using a list of string options.
context
(context dev flag)
(context dev)
Creates a CUDA context on the device dev using a keyword flag. For available flags, see internal.constants/ctx-flags. The default is none. The context must be released after use.
cuda-free!
(cuda-free! dptr)
Frees the runtime device memory that has been created by cuda-malloc. See CUDA Runtime API Memory Management
cuda-malloc
(cuda-malloc byte-size)
(cuda-malloc byte-size type)
Returns a Pointer to byte-size bytes of uninitialized memory that will be automatically managed by the Unified Memory system. The pointer is managed by the CUDA runtime API. Optionally, accepts a type of the pointer as a keyword (:float or Float/TYPE for FloatPointer, etc.). This pointer has to be manually released by cuda-free!. For a more seamless experience, use the wrapper provided by the mem-alloc-runtime function. See CUDA Runtime API Memory Management.
current-context
(current-context)
Returns the CUDA context bound to the calling CPU thread. See CUDA Context Management.
current-context!
(current-context! ctx)
Binds the specified CUDA context ctx to the calling CPU thread. See CUDA Context Management.
device
(device id)
(device)
Returns a device specified with its ordinal number id or string PCI Bus id. See CUDA Device Management.
device-count
(device-count)
Returns the number of CUDA devices on the system. See CUDA Device Management.
disable-peer-access!
(disable-peer-access! ctx)
(disable-peer-access!)
Disables direct access to memory allocations in a peer context and unregisters any registered allocations. See CUDA Peer Access Management
elapsed-time!
(elapsed-time! start-event end-event)
Computes the elapsed time in milliseconds between start-event and end-event. See CUDA Event Management.
enable-peer-access!
(enable-peer-access! ctx)
(enable-peer-access!)
Enables direct access to memory allocations in a peer context. See CUDA Peer Access Management.
event
(event)
(event flag & flags)
Creates an event specified by keyword flags. For available flags, see internal.constants/event-flags. See CUDA Event Management.
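As an illustrative sketch, events can bracket a kernel launch to time it (fun, n, and gpu-x are assumed to exist already; with-release comes from uncomplicate.commons.core):

```clojure
;; Time a kernel launch with two events; elapsed-time! returns milliseconds.
(with-release [start (event)
               stop  (event)]
  (record! start)
  (launch! fun (grid-1d n) (parameters n gpu-x))
  (record! stop)
  (synchronize!)              ;; make sure stop has actually been reached
  (elapsed-time! start stop))
```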
function
(function m name)
Returns the CUDA kernel function named name located in module m. See CUDA Module Management.
global
(global m name)
Returns the CUDA global device memory object named name from module m. Global memory is typically defined in C++ source files of CUDA kernels. See CUDA Module Management.
grid-1d
(grid-1d dim-x)
(grid-1d dim-x block-x)
Creates a 1-dimensional GridDim record with grid and block dimensions x. Note: dim-x is the total number of threads globally, not the number of blocks.
grid-2d
(grid-2d dim-x dim-y)
(grid-2d dim-x dim-y block-x block-y)
Creates a 2-dimensional GridDim record with grid and block dimensions x and y. Note: dim-x is the total number of threads globally, not the number of blocks.
grid-3d
(grid-3d dim-x dim-y dim-z)
(grid-3d dim-x dim-y dim-z block-x block-y block-z)
Creates a 3-dimensional GridDim record with grid and block dimensions x, y, and z. Note: dim-x is the total number of threads globally, not the number of blocks.
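For example, to cover n global threads with an explicit block size (a sketch; without the second argument, grid-1d falls back to a library default block size):

```clojure
;; 10000 global threads in blocks of 256 => ceil(10000/256) = 40 blocks.
(def dims (grid-1d 10000 256))
;; Pass dims as the grid-dim argument of launch!, e.g.:
;; (launch! my-kernel dims (parameters 10000 gpu-a))
```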
in-context
macro
(in-context ctx & body)
Pushes the context ctx to the top of the context stack, evaluates the body with ctx as the current context, and pops the context from the stack. Does NOT release the context, unlike with-context. See CUDA Context Management.
init
(init)
Initializes the CUDA driver. This function must be called before any other function from ClojureCUDA in the current process. See CUDA Initialization
launch!
(launch! fun grid-dim shared-mem-bytes hstream params)
(launch! fun grid-dim hstream params)
(launch! fun grid-dim params)
Invokes the kernel fun on a grid-dim grid of blocks, using the params PointerPointer (see parameters). Optionally, you can specify the amount of shared memory that will be available to each thread block, and an hstream to use for execution. See CUDA Module Management.
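A sketch of the fullest arity, assuming fun was obtained with function, gpu-x holds n floats, and hstream is an existing stream:

```clojure
;; n threads in blocks of 256, 1 KiB of dynamic shared memory per block,
;; launched asynchronously on hstream.
(launch! fun (grid-1d n 256) 1024 hstream (parameters n gpu-x))
(synchronize! hstream)  ;; block until the kernel has finished
```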
link
(link data options)
(link data)
(link)
Invokes the CUDA linker on data provided as a vector [[type source <options> <name>], ...]
. Produces a cubin compiled for a particular Nvidia architecture. Please see relevant examples from the test folder. See CUDA Module Management
link-complete!
(link-complete! link-state)
Completes the linking of link-state and returns the resulting device code (cubin). See CUDA Module Management.
listen!
(listen! hstream ch data)
(listen! hstream ch)
Adds a host function listener to a compute stream hstream, with optional data related to the call, and connects it to a Clojure channel ch. If data is not provided, places hstream under data.
load!
(load! m data)
Loads module m’s data from a PTX string, nvrtc program, java path, or binary data. Please see relevant examples from the test folder. See CUDA Module Management.
mem-alloc-driver
(mem-alloc-driver byte-size flag)
(mem-alloc-driver byte-size)
Allocates byte-size bytes of uninitialized memory that will be automatically managed by the Unified Memory system, as specified by a keyword flag. For available flags, see internal.constants/mem-attach-flags. Returns a CUDA device memory object, which can NOT be extracted as a Pointer, but can be accessed directly through its address in the device memory. See CUDA Driver API Memory Management.
mem-alloc-mapped
(mem-alloc-mapped byte-size)
(mem-alloc-mapped byte-size type)
Allocates byte-size bytes of uninitialized host memory, ‘mapped’ to the device. Optionally, accepts a type of the pointer as a keyword (:float or Float/TYPE for FloatPointer, etc.). Mapped memory is optimized for the memcpy! operation, while ‘pinned’ memory is optimized for memcpy-host!. See CUDA Driver API Memory Management.
mem-alloc-pinned
(mem-alloc-pinned byte-size)
(mem-alloc-pinned byte-size type-or-flags)
(mem-alloc-pinned byte-size type flags)
Allocates byte-size bytes of uninitialized page-locked memory, ‘pinned’ on the host, using keyword flags. For available flags, see internal.constants/mem-host-alloc-flags; the default is :none. Optionally, accepts a type of the pointer as a keyword (:float or Float/TYPE for FloatPointer, etc.). Pinned memory is optimized for the memcpy-host! function, while ‘mapped’ memory is optimized for memcpy!. See CUDA Driver API Memory Management.
mem-alloc-runtime
(mem-alloc-runtime byte-size type)
(mem-alloc-runtime byte-size)
Allocates byte-size bytes of uninitialized memory that will be automatically managed by the Unified Memory system. Returns a CUDA device memory object managed by the CUDA runtime API, which can be extracted as a Pointer. An equivalent unwrapped Pointer can be created by cuda-malloc. See CUDA Runtime API Memory Management.
mem-register-pinned!
(mem-register-pinned! memory flags)
(mem-register-pinned! memory)
Registers a previously instantiated host pointer, ‘pinned’ from the device, using keyword flags. For available flags, see internal.constants/mem-host-register-flags; the default is :none. Returns a pinned object equivalent to the one created by mem-alloc-pinned. Pinned memory is optimized for the memcpy-host! function, while ‘mapped’ memory is optimized for memcpy!. See CUDA Driver API Memory Management.
mem-sub-region
(mem-sub-region mem origin byte-count)
(mem-sub-region mem origin)
Creates a CUDA device memory object that references a sub-region of mem, starting at origin and spanning byte-count bytes, or the maximum available byte size.
memcpy!
(memcpy! src dst)
(memcpy! src dst byte-count-or-stream)
(memcpy! src dst byte-count hstream)
Copies byte-count bytes, or the maximum available device memory, from src to dst. If hstream is provided, executes asynchronously. See CUDA Memory Management.
memcpy-host!
(memcpy-host! src dst byte-count hstream)
(memcpy-host! src dst count-or-stream)
(memcpy-host! src dst)
Copies byte-count bytes, or all possible memory, from src to dst, one of which has to be accessible from the host. If hstream is provided, executes asynchronously. A polymorphic function that figures out what needs to be done. Supports everything except pointers created by cuda-malloc. See CUDA Memory Management.
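For instance, a round trip through device memory might look like this (a sketch; the sizes are illustrative, and with-release comes from uncomplicate.commons.core):

```clojure
(with-release [gpu-a (mem-alloc-driver (* 4 Float/BYTES))]
  ;; host float array -> device memory
  (memcpy-host! (float-array [1 2 3 4]) gpu-a)
  ;; device memory -> fresh host float array
  (memcpy-host! gpu-a (float-array 4)))
```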
memcpy-to-device!
(memcpy-to-device! src dst byte-count hstream)
(memcpy-to-device! src dst count-or-stream)
(memcpy-to-device! src dst)
Copies byte-count bytes, or all possible memory, from host src to device dst. Useful when src or dst is a generic pointer for which it cannot be determined whether it manages memory on the host or on the device (see cuda-malloc). If hstream is provided, executes asynchronously. See CUDA Memory Management.
memcpy-to-host!
(memcpy-to-host! src dst byte-count hstream)
(memcpy-to-host! src dst count-or-stream)
(memcpy-to-host! src dst)
Copies byte-count bytes, or the maximum available memory, from device src to host dst. Useful when src or dst is a generic pointer for which it cannot be determined whether it manages memory on the host or on the device (see cuda-malloc). If hstream is provided, executes asynchronously. See CUDA Memory Management.
memset!
(memset! dptr value)
(memset! dptr value n-or-hstream)
(memset! dptr value n hstream)
Sets n elements, or all segments, of dptr memory to value (supports all Java primitive number types except double, and long with value larger than Integer/MAX_VALUE). If hstream is provided, executes asynchronously. See CUDA Memory Management.
module
(module)
(module data)
Creates a new CUDA module and loads a string, nvrtc program, or binary data. See CUDA Module Management.
p2p-attribute
(p2p-attribute dev peer attribute)
Queries attributes of the link between two devices. See CUDA Peer Access Management
parameters
(parameters parameter & parameters)
Creates a PointerPointer of CUDA parameters. A parameter can be any object on the device (Driver API memory, Runtime API memory, JavaCPP pointers) or the host (arrays, numbers, JavaCPP pointers) that makes sense as a kernel parameter per the CUDA specification. Use the result as the params argument in launch!.
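As a sketch, primitive numbers and device memory objects can be mixed freely (my-kernel, n, gpu-a, and gpu-b are assumed to exist):

```clojure
;; n becomes a scalar kernel argument; gpu-a and gpu-b are device memory.
(launch! my-kernel (grid-1d n) (parameters (int n) gpu-a gpu-b))
```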
pop-context!
(pop-context!)
Pops the current CUDA context from the current CPU thread. See CUDA Context Management.
program
(program name source-code headers)
(program source-code headers)
(program source-code)
Creates a CUDA program from the source-code, with an optional name and an optional hash map of headers (as strings) and their names.
program-log
(program-log prog)
Returns the log string generated by the previous compilation of prog
.
push-context!
(push-context! ctx)
Pushes a context ctx
on the current CPU thread. See CUDA Context Management.
ready?
(ready? obj)
Determines status (ready or not) of a compute stream or event obj
. See CUDA Stream Management and CUDA Event Management
record!
(record! stream event)
(record! event)
Records the event event on an optional stream. See CUDA Event Management.
set-parameter!
(set-parameter! pp i parameter & parameters)
Sets the i-th parameter in a parameter array pp, and the rest of parameters in the places after i.
stream
(stream)
(stream flag)
(stream priority flag)
Creates a stream using an optional integer priority and a keyword flag. For available flags, see internal.constants/stream-flags. See CUDA Stream Management.
synchronize!
(synchronize!)
(synchronize! hstream)
Blocks the current thread until the context’s or hstream’s tasks complete.
wait-event!
(wait-event! hstream ev)
Makes a compute stream hstream wait on an event ev. See CUDA Event Management.
with-context
macro
(with-context ctx & body)
Pushes the context ctx to the top of the context stack, evaluates the body, and pops the context from the stack. Releases the context, unlike in-context. See CUDA Context Management.
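A sketch of typical use; the context created inline is current for the body and released when the body finishes (with-release comes from uncomplicate.commons.core):

```clojure
(init)
(with-context (context (device 0))
  ;; all CUDA calls in the body run against this context
  (with-release [gpu (mem-alloc-driver 1024)]
    (memset! gpu 0)))
```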
with-default
macro
(with-default & body)
Initializes CUDA, creates the default context and executes the body in it. See CUDA Context Management.