PhD candidate at Radboud University. thomas.koopman@ru.nl
(GPG: 964110A2)
Research Interests
Very generally speaking, I am interested in scientific computing.
More specifically, my work can be divided into three categories:
Parallel Algorithms. Specifically I have worked on parallel
algorithms for clusters, GPUs, multiprocessors, and vector processors
(SIMD).
Numerical Stability. This is a new interest of mine, about
getting a grip on the accuracy for numerical algorithms.
Productivity and Portability through
SaC. Single-Assignment
C (SaC) is a minimalistic language for scientific computing that does
not expose memory or parallelism. This makes it easy to use. The
compiler can generate parallel code for CPU, GPU, and clusters from a single
specification. If the parallelisation is not too sophisticated, the
performance is comparable to code written in C or Fortran.
Publications
Minimizing Communication in the Multidimensional FFT.
A new parallel algorithm for computing the discrete Fourier transform
on supercomputers.
bib
Rank-Polymorphism for Shape-Guided Blocking.
On showing how locality optimisations can be elegantly expressed in
a language with support for arrays of arbitrary dimension.
bib
Shray: An Owner-Compute Distributed Shared-Memory System.
A software system for easier programming of supercomputers.
bib
Modulo in high-performance code: strength reduction for
modulo-based array indexing in loops.
An optimization for programming languages that translates pretty ways
of programming a certain class of computations (stencils) to efficient
ways.
bib
Multi-GPU Code Generation for Out-Of-Core Problems.
A compiler backend for the programming language SaC that can leverage
multiple GPUs, also on data structures that do not fit on any single
GPU.
bib
Partitioning In-Place on Massively Parallel Systems.
An algorithm for splitting an array in two on a GPU, without needing
a significant amount of additional memory.
bib
Preprints
Category Theory for Supercomputing: The Tensor Product of
Linear BSP Algorithms
The theory used to derive the algorithm of
Minimizing Communication in the Multidimensional FFT.ArXiv
Comparing Functional Array Languages.
Large collaborative paper with Futhark, Accelerate, APL, DaCe, and SaC.
ArXiv
VQhull: a Fast Planar Quickhull
Parallel algorithm and implementation that avoids data movement.
ArXiv
Quickhull is Usually Forward Stable
Quickhull is an algorithm for computing the convex hull of points in a
plane that performs well in practice, but has poor complexity on
adversarial input. In this paper we show the same holds for the
numerical stability of Quickhull.
ArXiv
Software
VQhull.
Library for computing planar convex hulls.
The Fastest Fourier Transform of Utrecht.
The implementation corresponding to
Minimizing Communication in the Multidimensional FFT.
Does a high-dimensional FFT in the higher-dimensional cyclic
distribution. I wrote this when very new to programming, so can probably
be optimised and made more robust.
The Fastest Fourier Transform of Nijmegen.
A multithreaded library for computing the FFT (and inverse FFT)
for large powers of 2. A paper detailing the design and how this can
be generalized is in progress.