PhD candidate at Radboud University. thomas.koopman@ru.nl
(GPG: 964110A2)
Research Interests
I do research in scientific computing on parallel computers. I enjoy
squeezing every flop out of specific algorithms with a variety of parallel
technologies, such as multithreading, SIMD, GPUs, and distributed-memory
computing.
This optimisation work is useful for functions that are used a lot, but it
is untenable to optimise all the scientific code that is out there for
different parallel architectures. For that reason, I also work on
hardware-agnostic programming. The aim of this, is to let the programmer
implement their algorithm once, and to let a compiler generate code that can
run efficiently on a variety of architectures. This requires a language that
can generate multiple types of parallel code (I work on Single-Assignment
C or SaC), and programming
techniques that let us describe data movement and parallelism at a higher
level of abstraction (I work on rank-polymorphism).
Publications
Minimizing Communication in the Multidimensional FFT.
A new parallel algorithm for computing the discrete Fourier transform
on supercomputers.
bib
Rank-Polymorphism for Shape-Guided Blocking.
On showing how locality optimisations can be elegantly expressed in
a language with support for arrays of arbitrary dimension.
bib
Shray: An Owner-Compute Distributed Shared-Memory System.
A software system for easier programming of supercomputers.
bib
Modulo in high-performance code: strength reduction for
modulo-based array indexing in loops.
An optimization for programming languages that translates pretty ways
of programming a certain class of computations (stencils) to efficient
ways.
bib
Multi-GPU Code Generation for Out-Of-Core Problems.
A compiler backend for the programming language SaC that can leverage
multiple GPUs, also on data structures that do not fit on any single
GPU.
bib
Partitioning In-Place on Massively Parallel Systems.
An algorithm for splitting an array in two on a GPU, without needing
a significant amount of additional memory.
bib
Preprints
Category Theory for Supercomputing: The Tensor Product of
Linear BSP Algorithms
The theory used to derive the algorithm of
Minimizing Communication in the Multidimensional FFT.ArXiv
Comparing Functional Array Languages.
Large collaborative paper with Futhark, Accelerate, APL, DaCe, and SaC.
ArXiv
VQhull: a Fast Planar Quickhull
Parallel algorithm and implementation that avoids data movement.
ArXiv
Quickhull is Usually Forward Stable
Quickhull is an algorithm for computing the convex hull of points in a
plane that performs well in practice, but has poor complexity on
adversarial input. In this paper we show the same holds for the
numerical stability of Quickhull.
ArXiv
Software
VQhull.
Library for computing planar convex hulls.
The Fastest Fourier Transform of Utrecht.
The implementation corresponding to
Minimizing Communication in the Multidimensional FFT.
Does a high-dimensional FFT in the higher-dimensional cyclic
distribution. I wrote this when very new to programming, so can probably
be optimised and made more robust.
The Fastest Fourier Transform of Nijmegen.
A multithreaded library for computing the FFT (and inverse FFT)
for large powers of 2. A paper detailing the design and how this can
be generalized is in progress.