# Projects list

This page contains a list of programming oriented projects suitable for students of mathematics or computer science. If you are interested in working on a project please write to lmnd-devel@googlegroups.com as well as the contact address provided in the project description. In case the projects need more time than you have available, feel free to contact the mentors to discuss possible ways to break them into smaller pieces.

This list is intended to be newcomer friendly. New ideas are reviewed and improved on the new projects page, before being transferred here. In order to suggest a new project, please add a description with as much information as possible to the new projects page, and write to lmnd-devel@googlegroups.com

# Software infrastructure projects

## Use OpenGL for blending layers in Volumina

Volumina is a slicing viewer for large 3D/4D datasets, written in Python. Huge volumetric datasets (terrabytes of data) are becoming increasingly common in many sciences. Volumina allows to display and navigate through slicing views of this data, which resides on the hard disk but is too large for main memory. The current implementation can stream tiles from hdf5 files using h5py, processes the slices using numpy transformations, and displays them as grayscale or color-mapped data layers in a PyQt widget.

At the moment Volumina uses a background layer QGraphicsView to draw images to the screen after manually blending together data from different sources. Instead of doing some post-processing and the blending in Python using numpy, we would prefer to make use of the graphics card. The content of each layer in each tile is computed on demand asynchronously by the computational backend lazyflow. In Volumina each tile keeps track of progress and indicates whether the currently displayed content is up to date. This needs to be taken care of when reimplementing the display component.

The idea of this project is to improve the display performance by reimplementing the tiled image display component in PyOpenGL. We intend to make the faster Volumina available via PyPI.

• level of detail - zooming support
• arbitrary slicing: we usually look at slices of the volume along the x,y and z axes - but we want to be able to rotate the camera and see slices in arbitrary directions, which needs specialized data loading

skills/prerequisites: Python, OpenGL, a bit of PyQt

mentors: Thorben Kröger, Carsten Haubold

Contact: ilastik-gsoc@ilastik.org

## Multiscale views based on pyramids in volumina

Most modern image viewers use a simple trick to give the user an approximation of the image without waiting for all the data to be loaded and processed. For instance, when loading a picture in the browser, we notice that it gradually gets sharper. One way to achieve this is to use multi resolution image Pyramids.

Currently, volumina always displays images in full resolution. This is sometimes not necessary, e.g. when moving through the volume along an axis where each slice is only displayed for a short time, or when zooming out. The machine learning and prediction algorithms used by ilastik are designed to work on full resolution, but the display pipeline can be overhauled to allow for multiscale image pyramids (2D/3D). Volumina should automatically compute the image pyramids, cache them, and use them for displaying. The viewer can then improve the displayed quality progressively if needed, by choosing higher quality pyramid levels (e.g. if the user stopped moving through the volume, or is zooming in).

• For example, ImageJ2 already defines an HDF5 document layout for pyramids, and we might be able to use this design as well.

• Since VIGRA already provides 2-dimensional pyramids, it may be natural to generalize them to higher dimensions instead of starting from scratch.

skills/prerequisites: Python, optionally C++

mentors: Thorben Kröger, Stuart Berg

contact: ilastik-gsoc@ilastik.org

## Improve 2d/3d object browser in volumina

Although ilastik can segment and classify 3D objects, there is no way to display them in 3D except in the carving workflow. For example in the object classification workflow, we need a way to view selected objects, all objects or all objects with the same label, in the whole dataset or limited to a region of interest. To display the 3D carving results, our viewer volumina simply uses VTK's Python bindings. One could either extend this functionality to make it accessible in all views, or replace it with a better alternative.

An important component of this task is the decision of which data to select and how to display it. Object classification results in binary masks for large 3D volumes. One could extract single objects from this mask and generate meshes (e.g. using Marching Cubes), or use a ray casting approach to display the iso-surface. The user should be able to navigate the 3D view in sync with the slice views as seen in the screenshot above. A double click in the 3D display should center the slice views on this position.

Additional ideas: It would be nice to be able to overlay the rendered 3D object in the 2D slice views to give the user an impression of the 3D structure of the contour he sees in the slices.

skills/prerequisites: Python, some 3D rendering / ray casting experience

mentors: Thorben Kröger, Stuart Berg, Carsten Haubold

contact: ilastik-gsoc@ilastik.org

## User Experience Improvements in ilastik

ilastik aims to enable non-experts to apply machine learning algorithms to their image processing problems. While user experience is a big part of this goal, the development team mainly consists of researchers who cannot dedicate a lot of time to designing intuitive user interfaces or supporting a wide range of input / output formats. This project provides a good opportunity to become familiar with several image processing / machine learning algorithms while working on the current user interface of ilastik.

The following would make valuable improvements to the user experience of ilastik (different subsets can make a GSoC project, talk to mentors about details of each and possible timelines):

• Currently, object classification of ilastik only returns labels assigned to objects. Many users are requesting more: volumetric measurements of objects, simple statistics, export of meshes. (e.g. fiji) has a nice set of statistics)

• Navigate to frames with labels
• Import labels from other tools
• Workflow graph viewer / editor
• Export and allow batch processing for intermediate results
• Variable importance for classification features
• Overview of prediction results as a histogram of classes over time or over the stack
• Feature groupings for easier navigation and selection

skills/prerequisites: Python, familiarity with PyQt

mentors: Anna Kreshuk, Stuart Berg, Burcin Erocal

contact: ilastik-gsoc@ilastik.org

# Mathematical algorithms & data structures projects

## fplll: Add intermediate floating-point arithmetic(s), between double precision and multi-precision

fplll internally relies on floating-point arithmetic for the underlying Gram-Schmidt orthogonalisation. It tries to perform the computations in machine floating-point numbers: doubles, with 53 bits of precision, or long doubles which sometimes enjoy 64 bits of precision. If that does not prove sufficient, it switches to multi-precision arithmetic (MPFR), and increases the precision progressively, until it seems numerically stable for that precision. A drawback of this approach is that MPFR is much slower than machine floating-point numbers, creating a huge run-time gap between instances that can be run on machine precision, and instances that require larger precision.

The project is to add annother floating-point arithmetic type, with intermediate precision (such as double-double precision) and intermediate efficiency. This should then be combined with dpe and the so-called "fast" Gram-Schmidt approach, which aim at extending the range of exponents. And these arithmetics should finally be incorporated in the fplll wrapper.

mentors: Martin Albrecht, Damien Stéhle

## fplll: Implement a test-suite

Most of the code of fplll is not automatically tested. Sometimes, simple modifications result lead to subtle bugs in cases that are not tried by the main developpers, and that are found out by users much later.

The project would be to add unit-tests to fplll to give more assurance that a code change doesn't break too much of the code. The tests should be devised so that the different arithmetics, the different wrapper scenarios, and also the different main functions (LLL, BKZ, SVP) are tested. This could be extended to adding some standard-ish benchmarks which make it easier to decide if a code modification improves or degrades run-times (for “typical” inputs.)

mentors: Martin Albrecht, Damien Stéhle

## fplll: Implement transformation matrix recovery and LLL on Gram matrices

At the moment, fplll's LLL routines take as input a basis B of a lattice and return a reduced basis C of the same lattice. For some applications, it is interesting to also have access to the transformation matrix U such that C = B*U. One way is to compute U on the fly during the computations, another way is to compute U a posterio, from B and C. In theory, no method is superior to the other: in some contexts the first approach should be better, in others it should not.

In other applications, the input basis is only given implicitely via its Gram matrix G = B^T * B. The LLL algorithm can still be run on such inputs, to get the gram matrix C^T * C of the reduced basis C. Such a variant, taking as input a Gram matrix and returning another Gram matrix, is currently unavailable.

The project is to implement these extensions of the LLL algorithm.

mentors: Martin Albrecht, Damien Stéhle

## fplll: Sampling from Discrete Gaussian Distributions

Sampling from discrete Gaussian distributions over a lattice has many applications in cryptography. This project is about extending and unifying existing code for performing this operation and making it available in fplll.

An implementation for sampling from such distributions over the Integers is available here (in C99):

It doesn't implement all known algorithms, so it could be extended. For example,

are not implemented.

It could be re-implemented in C++. Alternatively, a C++ wrapper could be added so it's usable from fplll.

For sampling from such a distribution over arbitrary integer lattices an implementation is available, e.g. here:

This code would fit better into fplll and could be reimplemented there.

The same file also implements sampling over ideal lattices.

This part wouldn't fit that well into fplll because it relies on fast polynomial arithmetic. However, it it could be a stand alone library.

The idea of this project is a multi-project project:

1. enhance DGS and its Sage integration (cf. http://trac.sagemath.org/ticket/17764 )

2. Add C++ wrapper fro DGS

3. implement the GPV and Peikerts algorithm in fplll

4. Make liboz a stand alone library

mentors: Martin Albrecht, Damien Stéhle

## Implement BLAS wrappers in FLINT to improve linear algebra performance

At the moment, FLINT implements matrix multiplication using integer arithmetic. The goal of this project is to allow FLINT to use an external floating-point BLAS (Basic Linear Algebra Subprograms) library to multiply matrices over small finite fields, and by extension, integers, rationals, or large finite fields.

This should result in a significant speedup (likely a factor 2-4) for many linear algebra operations in FLINT, since modern BLAS implementations such as OpenBLAS are highly optimised and take advantage of features such as vector instructions for floating-point arithmetic on modern CPUs.

For best performance, it is likely that BLAS multiplication has to be combined with Strassen's algorithm and other tricks. Part of the project would be to write tuning code to determine cutoffs between different multiplication algorithms. After wrapping BLAS for matrix multiplication over finite fields, the student could improve higher-level linear algebra operations (such as matrix multiplication over the integers, or system solving) so that the BLAS is used as efficiently as possible.

More background can be found in the list of papers on this page

skills/prerequisites: C, basic linear algebra

mentors: Frederik Johansson, Mike Hansen

## Primality testing in Flint: the APR-CL algorithm

The Flint library currently has very efficient primality testing for integers up to 64 bits. It also has efficient algorithms for probable prime testing for large integers, e.g. the Miller-Rabin algorithm and the Baillie-PSW algorithm.

What is missing from flint is primality *proving* for large integers. We already have the Pocklington-Lehmer $$p-1$$ algorithm and the Morrison $$p+1$$ algorithm with various improvements due to Brillhart, Lehmer and Selfridge. But all of these assume that either $$p-1$$ or $$p+1$$ are "smooth" (products of lots of small primes).

To prove primality of more general large integers, there are two algorithms that are fast in practice (note AKS is not fast in practice), namely the ECPP and APR-CL algorithms. The first requires some advanced mathematics, including elliptic curves and complex multiplication. The APR-CL algorithm on the other hand, requires much less complicated mathematics, though it is a fairly involved algorithm.

APR-CL relies on something called Jacobi sums, which involve nothing more complex than roots of unity and modular arithmetic, though to understand how the APR-CL algorithm works requires some basic knowledge of finite fields and what a group is.

For this project, we'd like to implement APR-CL with the specific goal of proving primality of 100 digit primes in under 0.1 seconds. This will involve considerable work to first implement the algorithm, then optimise it.

A very nice elementary introduction to some of the ideas that go into modern primality testing can be found in the following article:

There are already a number of open source implementations of the APR-CL algorithm, for example:

We'd like to implement this algorithm in Flint, to take advantage of the very fast basic arithmetic implemented in the Flint library, but we want a truly independent implementation.

One significant issue with primality proving is that the code must be written exceptionally carefully. It is very easy to write a function which simply returns 1 whenever it is passed a number which is highly probably prime. It is much harder to implement the algorithm such that one can have a very high level of confidence in the output. This means lots of very carefully thought out testing and large amounts of code comments, checking for corner cases and careful code review. Understanding the algorithm itself is important for this.

skills/prerequisites: C, complex numbers, modular arithmetic

mentors: William Hart, Claus Fieker

## Integer factorisation in flint: the Lenstra elliptic curve method (ECM)

There are two main types of algorithms for factoring large integers (over 30 decimal digits, say). The first of these types of algorithm works for general large integers. Such methods generally use sieving to find "relations", e.g. the quadratic sieve and the general number field sieve. The amount of time these algorithms take depends on the size of the number n being factored.

The second type of algorithm looks for certain mathematical structures (known as groups) containing r elements where r is "smooth" (a product of lots of small primes, e.g. $$2\times 3^3 \times 7 \times 11\times 13$$, etc.), usually with arithmetic done modulo n, where n = pq is the number to be factored. The time these types of algorithms take depends on the size of the smallest nontrivial factor p of n, not on the size of n itself.

Flint already has a decent implementation of the $$p-1$$ and $$p+1$$ factoring algorithms, which work with finite fields whose "multiplicative group" has $$p-1$$ or $$p2 -1$$ elements The $$p+1$$ comes from the fact that $$p+1 = (p^2-1)/(p-1)$$. These algorithms only work when $$p-1$$ or $$p+1$$ happen to be smooth, where $$p$$ is the prime factor being sought. So, these algorithms can only factor certain integers fast.

A more general "smooth" factoring method is Lenstra's elliptic curve method. Although this relies on elliptic curves, it only uses the group law for an elliptic curve. This is easy to understand, since it can be written down explicitly in terms of polynomials. The basic idea of the Elliptic Curve Method is that instead of looking for $$p-1$$ or $$p+1$$ that is smooth, you look for an elliptic curve that has a smooth number of elements (a smooth group order). For each elliptic curve you try, you have an independent chance of finding one with a smooth order, and thus an independent chance of factoring the integer n (the elliptic curve arithmetic is done in $$Z/nZ$$, and if the algorithm is successful, a factor of $$n$$ results).

The ECM method has two stages, called stage 1 and stage 2, and there are various strategies for implementing both. The nice thing about stage 2 of ECM is that it can use most of the same stage 2 code that already exists in Flint for the p+1 method. What remains to be done is an efficient implementation of ECM stage 1 and a fairly straightforward port of the stage 2 code to the ECM method.

What we'd like to do is implement stage 1 ECM with the operations expressed in terms of Edwards coordinates (a way of describing the group law and points on the elliptic curve that has become popular in recent years due to the efficiency of such operations). One can already factor integers with a fast stage 1, but much larger factors can be found when it is combined with an efficient stage 2.

As a stretch goal, there are various other ways of implementing stage 2 that could be implemented in Flint. It would be interesting to implement some of these, though the main focus of the project should be on getting a highly optimised stage 1.

This project also has the opportunity for parallelising the implementation, since one can try multiple elliptic curves at once when looking for one with a smooth group order.

The ECM method is simple enough that it can be implemented easily in the Pari/GP interpreter, to gain understanding, then written up in C for an efficient implementation. We'd encourage such a GP implementation up front before embarking on a C implementation.

There already exists a very fancy open source implementation of the ECM method, in the GMP-ECM project:

But this is way more complicated than what we want in Flint. We think we can get to within about a factor of 2 of the GMP-ECM project, in performance, without writing too much code. Even this will be a massive benefit to flint. ECM factorisation is incredibly important for various other projects we currently have underway in the Flint project, e.g. for computing with number fields.

skills/prerequisites: C, basic definitions of finite fields and groups

mentors: William Hart, Claus Fieker

## Integer factorisation in Flint: the Self Initialising Quadratic Sieve

Some very basic algorithms for factoring integers begin by trying to express the number n to be factored as a difference of two squares. For example, Fermat's algorithm simply does a search for an $$x$$ such that $$x2 - n$$ is a square. If this is the case, then $$n = x2 - y2 = (x - y)(x + y)$$, which factors $$n$$.

Dixon's algorithm improves on this by working modulo n and trying to construct squares that are the same modulo n. For example, if $$x2 = y2$$ modulo $$n$$, then $$(x - y)(x + y)$$ is divisible by $$n$$. The chances are pretty high that either gcd$$(x - y, n)$$ or gcd$$(x + y, n)$$ will be a nontrivial factor of $$n$$.

To construct such values, one takes random values of x, squares them and reduces modulo n. If the resulting value is "smooth" (a product of small primes), we call the resulting expression a "relation". For example for $$n = 77$$ we have $$372 = 22 *3*5$$ modulo $$77$$.

A relation doesn't give a factorisation of the modulus n unless the right hand side is a square. But if we multiply enough relations together we can sometimes construct a square on the right hand side. (Working out how to do this given a bunch of relations requires some linear algebra.)

The quadratic sieve improves on this algorithm by "sieving" for relations, which is more efficient than picking random values of $$x$$.

The quadratic sieve itself has multiple improvements such as the single and double large prime variants (and a triple large prime variant for the die hard), a small prime variant, a multiple polynomial version, including a self initialising version of this (which speeds up the setting up step that you have to do for sieving).

Flint once contained a self initialising quadratic sieve with most of these optimisations, but we've since learned a lot about how this should be implemented. We have already implemented a version of the algorithm which works for small integers (up to 64 bits). But we'd like to port our old quadratic sieve to use all the new improvements. There are also some technical issues that need work, such as restarting efficiently if the algorithm fails to find a factor, and how to deal with duplicate relations and tuning for various architectures.

A specific aim will be to implement a really robust version of the quadratic sieve which can factor a 60 decimal digit number in about 8 seconds and can factor a 100 digit number in under a day, just on a single core. This is a highly nontrivial target, but one that can be achieved with the techniques we want to use. A simple quadratic sieve without all the improvements has no hope of reaching that goal.

Note that the linear algebra that is required for this algorithm is already implemented and highly optimised, so we don't need to do any work on that.

A stretch goal for the project would be to reimplement the quadratic sieve for small integers to *not* use self initialisation, but just multiple polynomials. The reason is that for very small integers, the self initialising version turns out to be very fragile and hard to tune.

The existing implementation of the quadratic sieve can be found in Flint-1.6. But the project is not a simple port to the Flint-2.x infrastructure. We want to completely reimplement it with a bunch of new tricks.

skills/prerequisites: C, modular arithmetic

mentors: William Hart, Claus Fieker

## Generic polynomial and matrix rings in Flint

The Flint library is a library for fast polynomial arithmetic and linear algebra, for doing computations in number theory. As such it has implementations of many specific rings, such as the integers, polynomials and matrices over the integers, the rationals, $$Z/nZ$$, etc.

However, in computer algebra, we often want to construct polynomials or matrices over other rings. In order to do this completely generally, we need a completely recursive and generic implementation of polynomials and matrix algorithms.

Typically this is the domain of higher level languages. But often users of flint are themselves writing a C library and want to do such "generic programming" in C.

Over the years, we've experimented with numerous approaches to generics in C, including C templating, typed unions, context objects and even a cached type system. Recently however, we've settled on a new approach which we'd like to develop a prototype for.

Each ring in the hierarchy would be represented by an object which contains various pieces of information about the ring, e.g. the PolyRing object would contain information about the base ring over which the polynomials are built, information about the variable names and maybe a textual string giving human readable information. It would also contain a long list of function pointers, corresponding to operations that can be performed with objects which belong to this ring.

Polynomials themselves would be a separate kind of object, a Poly, each of which would contain a pointer to the corresponding PolyRing. In this way, individual polynomials contain only information needed to represent them, such as a list of coefficients and a length, and one extra pointer, to the associated PolyRing.

Of course there'd be a setup for matrices, and the whole thing would be recursive, for example, allowing construction of matrices over polynomial rings.

We like this solution for C because it is easy to use, and at the same time very performant.

Much of the work in this project would be in implementing generic versions of algorithms for performing computations with polynomials and matrices. The main goal would be to cover arithmetic, but a stretch goal would be to implement more complex algorithms, such as determinants of matrices and GCD's of polynomials. Of course, not every algorithm is applicable in every ring, so some runtime selection of algorithms is necessary, based on properties of the ring.

skills/prerequisites: C, basic definitions of groups and rings and basic linear algebra

mentors: William Hart, Fredrik Johansson

## Matrices over Z/nZ in Flint

Flint currently provides implementations of many specific polynomial and matrix types for use in Number Theory. But there is one very significant and glaring omission: it doesn't provide a module for matrices over $$Z/nZ$$ for multiprecision n.

Whilst there is a module called nmod_mat for matrices over $$Z/nZ$$ for n that fits in a single machine word (32 or 64 bits), many of the optimal algorithms in the multiprecision case are just completely different! And flint always aims to implement optimal algorithms.

Many of the algorithms are described in the literature as algorithms for finite fields, since $$Z/nZ$$ is a finite field when n is prime (this is a special case of finite fields).

Part of the project would be to look for algorithms described for finite fields $$Z/nZ$$ where $$n$$ is large. We already have such algorithms implemented for polynomials over finite fields, so the ones we are interested in are for matrices over finite fields.

As a guide, we'd eventually like to implement all of the functionality that we already have for matrices over $$Z/nZ$$ with word sized $$n$$ (in Flint's nmod_mat module) but with algorithms and implementations optimised for multiprecision $$n$$.

In many case, e.g. addition, it is obvious what to do after reading the code for the nmod_mat module. In other cases, e.g. matrix multiplication, there are many different tricks that could be employed. The project would involve discussing the best algorithms on the flint list, then implementing the best suggestions.

As an example of the sorts of tricks that can be used for matrices with large coefficients, the IML library contains lots of great open source ideas.

As a stretch goal, it might be interesting to implement some parallelisation of the new fmpz_mod_mat module.

skills/prerequisites: C, modular arithmetic, linear algebra, multiprecision numbers

mentors: William Hart, Fredrik Johansson

## Fast Linear Algebra over Polynomial Rings in Flint

Linear Algebra, through the well known Gaussian elimination procedure is the foundation for all linear algebra based problems, thus for at least half of computer algebra (the other half is based on polynomial operations). While Gaussian elimination is easy to understand, it has a couple of problems: it is based on field operations (i.e. requires arbitrary division) and it suffers badly from intermittent coefficient swell - when applied to exact domains.

The goal of this project is to implement a fast, efficient, practical replacement for the Gaussian elimination operating only on polynomial matrices over finite fields (and possibly the rational numbers).

Such an implementation would have applications in number theory, applied curves used in cryptography and in coding theory. One often encounters such matrix problems in these fields.

While a range of division free elimination methods exists, the Popov based ones have the best complexity for elimination based problems with polynomials. This includes the determination of image, kernel, pre-image and determinant of suitable matrices but also applications usually based on the celebrated LLL algorithm.

The first step would be to implement the basic algorithm, which requires nothing but polynomial arithmetic. Then one could implement one or more of the advanced versions which incorporate fast matrix techniques to obtain asymptotically fast behaviour.

References:

Mulders/ Storjohann: T. Mulders and A. Storjohann, On lattice reduction for polynomial matrices. Journal of Symbolic Computation, 2003, v. 35(4), pp. 377-401 https://cs.uwaterloo.ca/~astorjoh/jscpopov.pdf

Storjohann/ Gupta: S. Gupta and A. Storjohann, Computing Hermite forms of polynomial matrices. Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC'11), ACM Press, 2011 https://cs.uwaterloo.ca/~astorjoh/issac11hermite.pdf

skills/prerequisites: C, polynomials, basic definitions of rings and modules

mentors: Claus Fieker and Tommy Hofmann

## Singular: autoporting tool for Singular to MSYS2 on Windows 64

Whilst some mathematical software builds on Windows 64 under MSYS2 (e.g. MPIR, MPFR, Flint, ...), many projects do not build there.

In some cases this is due to essential use of posix features, which are not available natively on Windows. But in other cases it is merely because an enormous number of largely routine, but messy patches need to be applied and maintained to support Windows 64.

These include use of a 64 bit type instead of unsigned and signed longs, format specifiers for 64 bit types, 64 bit integer constants, versions of printf, scanf, etc. that support the 64 bit types, etc.

Many upstream projects do not want to include such "hacks" to support Windows, especially as it would touch nearly every file in their repository and place extra burdens on their limited developer resources to maintain.

What we'd like to do is develop an autoporting tool, which given a checkout of the source tree of a project, would apply all the above routine changes to the source tree automatically before building. This would leave a relatively small number of other issues to address, which upstream projects are much more likely to accept as diffs to their source repository.

The Singular project would like to support Windows 64 in this way, and therefore we'd like to write a prototype of such a tool which works on the current Singular source tree.

skills/prerequisites: C, Windows development

mentors: Oleksandr Motsak, Hans Schoenemann

## FriCAS - Factorization of linear ordinary differential operators

Differential equations frequently appear in applications. Linear ordinary differential equations (LODE) with rational coefficients form an important subclass. Currently factorization plays a crucial role in search for solutions of LODE. It may reduce an equation to simpler equations. In many cases factorization finds Liouvillian solutions. More precisely, if a homogeneous LODE $$Lf = 0$$ where $$L$$ is a linear ordinary differential operator (LODO) has a Liouvillian solution, then it has a solution $$y$$ such that $$y'/y$$ is an algebraic function. If the solution can be written without using algebraic extensions, then $$y'/y$$ is equal to a rational function $$r$$. In such a case $$(\partial_x - r)$$ is a right factor of $$L$$.

FriCAS contains a procedure for factoring linear ordinary differential operators due to Manuel Bronstein. This procedure gets very slow when the order of an operator grows. Mark van Hoeij in his thesis gave an algorithm which should be more efficient. Key part of van Hoeij's algorithm is computing generalized exponents. Besides factorization, generalized exponents play an important role in the study of LODO, in particular they help to find invariants and algebraic solutions. They are also useful for searching non-Liouvillian solutions.

expected outcome: an implementation of Mark van Hoeij's algorithm in FriCAS (using Spad language), offering at least computation of generalized exponents given factoring routine for polynomials and "easy" cases of factorization given in chapter 3.5 of van Hoeij thesis.

skills/prerequistes: ability to quickly get productive in Spad language, strong math skills

mentors: Waldek Hebisch, Ralf Hemmecke

## FriCAS - Zippel style modular algorithms

The fastest way to solve several computer algebra problems is via modular algorithms. The core idea of modular algorithms is to compute homomorphic images of input data in some convenient small ring (finite fields $$F_p$$ being a popular example), solve it in small rings and then reconstruct the answer of the original problem from homomorphic images. An early example of a modular algorithm is Brown's polynomial greatest common divisor (GCD) algorithm. At the core of Brown's approach is the reconstruction of a polynomial $$P(x_1, x_2, ..., x_n)$$ from its values modulo $$p_1, p_2, ..., p_m$$ at points of a set $$S$$, where $$S$$ is the cartesian product of sets $$S_1, S_2, ..., S_k$$ each choosing values for a single variable. Consequently, $$|S| = |S_1||S_2|...|S_k|$$, i.e., the cardinality of $$S$$ is the product of the cardinalitues of the $$S_i$$'s. In the multivariate case this leads to a large $$S$$ and increases the cost of the algorithm. Zippel observed that in an important case when $$P$$ has a relatively small number of terms, a much smaller $$S$$ is sufficient for the reconstruction of $$P$$. In general, to reconstuct $$P$$ from values at points of $$S$$ requires solving a linear system of size $$|S|$$, which may be expensive. Also evaluating the input at $$S$$ may be expensive. However, if $$S = {x_0, x_1,...,x_t}$$ is chosen in special way, one can make the evaluation of the input and the reconstruction much cheaper. In particular, one can put $$x_i = ab^i$$ where product and power is coordinatewise. If the coordinates of $$b$$ are roots of unity in $$F_p$$ then one can use discrete Fourier transform over $$F_p$$ for evaluation and reconstruction.

Currently FriCAS contains a p-adic polynomial GCD routine which does not work in the case of algebraic coefficients and a Brown style GCD routine for polynomials with algebraic coefficients. Parts of the Brown style GCD routine like GCD over finite rings should be reusable for Zippel style GCD. We need evaluation/reconstruction routines and a driver routine for overall control.

expected outcome: Zippel style modular implemetation of GCD for multivariate polynomials with algebraic coefficients in FriCAS. The code should be in the Spad language. Ideally Zippel part should be reusable for other problems (for example solving linear systems with polynomial coefficients).

skills/prerequisites: ability to quickly get productive in the Spad language, strong math skills

skills/nice-to-have: modular algorithms

mentors: Waldek Hebisch, Ralf Hemmecke

References:

• Javadi, Monagan, A Sparse Modular GCD Algorithm for Polynomials over Algebraic Function Fields
• Kleine, Monagan, Wittkopf, Algorithms for the Non-monic Case of the Sparse Modular GCD Algorithm
• van Hoeij, Monagan, Algorithms for Polynomial GCD Computation over Algebraic Function Fields
• Zippel, Probabilistic algorithms for sparse polynomials

## Fast Linear Algebra over Extension Fields

Many algorithms in computational algebra use linear algebra routines at the core. Implementations of asymptotically fast matrix-matrix multiplication or linear system solving (nullspace computation) over various domains are an essential part in the performance of other algorithms.

Open source libraries like FFLAS-FFPACK provide asymptotically fast exact dense linear algebra routines over finite fields $$F_p$$ for word sized primes $$p$$. However, for extensions of finite fields $$F_{p^k}$$, fast implementations are relatively absent from the open-source world.

A generic data structure which treats matrices with polynomial entries as polynomials with matrix entries can provide the basis for fast implementations of linear algebra over non-prime finite fields $$F_{p_k}$$ or univariate polynommials over finite fields $$F_p[x]$$. This approach reduces matrix-matrix multiplication over these domains to several matrix-matrix multiplications over the prime field $$F_p$$, for which fast linear algebra implementations already exist.

A proof of concept Python implementation can be found here. There are already experimental implementations of this approach in Sage (see Sage trac ticket #12177) and LinBox (see this commit). For large prime fields we can use an evalutation/pointwise-multiplication/interpolation approach and for smaller fields Karatsuba like formulas can be used.

expected outcome: an implementation of this strategy in LinBox offering at least martrix-vector multiplication, matrix-matrix multiplication, addition/subtraction, submatrices, "matrix windows", columns/row swapping, comparison cubic Gaussian elimination. If there is time, asymptotically fast Gaussian elimination would be the next step.

References:

skills/prerequisites: C/C++, basic linear algebra and polynomial algebra algorithms (cubic matrix-matrix multiplication, cubic Gaussian elimination).

skills/nice-to-have: Karatsuba's algorithm, Toom-Cook algorithm, Strassen's algorithm, PLE decomposition. The focus, however, is on decent C++ skills and a basic command of linear algebra. The mentors certainly can help to break down the mathematics involved.

mentors: Martin Albrecht, Pascal Giorgi, Jean-Guillaume Dumas

## LinBox: implementation of High Order Lifting

High order lifting technique is one of the key ingredient of fast linear algebra over integers or polynomials. Indeed, it allows for instance to solve a linear system of equations with univariate polynomials (or integers) within the same cost as matrix multiplication. It is also one on the main ingredient for fast row reduction of polynomial matrices.

Exact linear algebra libraries, such as IML and LinBox, do not yet provide an implementation of high order lifting and therefore cannot benefit from these most efficient algorithms. Recent work within LinBox and FFLAS-FFPACK made available efficient matrix multiplication codes for

• both polynomial matrices and multiprecision-integers matrices. Thanks to this effort, it becomes interesting to build high order lifting code on top

of such matrix multiplication codes.

The goal of this project is to provide an efficient implementations of high order lifting techniques in LinBox. As a first task, standard lifting algorithms will be reviewed and their implementation will be done (or updated if already available). Then, most of the time of this project will be to design code for high order lifting that is:

• efficient
• generic (according to the underlying ring, i.e. integers or univariate polynomials)
• re-usable within other high level algorithms

A real effort on testing will be done to ensure the robustness of the implementation.

Depending on the ability to achieve this project, one could also extend this work to incorporate either parallelism (i.e. OpenMP/ Intel TBB) or provide a variant of the code compliant with the online/relaxed computation model.

References:

skills/prerequisites: C/C++, linear algebra, polynomial/integer arithmetic.

mentors: Pascal Giorgi, Romain Lebreton

## GPU acceleration for dense/sparse matrix multiplication on finite fields

Fast exact dense linear algebra on finite fields is the core of the C++ library FFLAS-FFPACK [1]. More importantly algorithms therein rely on the efficiency of matrix/matrix multiplication and matrix/vector multiplication. The numerical (sparse) BLAS are the building blocks underlying these algorithms.

Recently, a lot of effort has been put into re-factoring the dense code and introducing sparse matrix formats and operations. On the one hand, there is now a clean and efficient implementation for both sequential and shared memory matmul routines. On the other hand, using GPU acceleration (OpenCL) for computations over $$F_p$$ was introduced in LinBox [3]. The goal here is to make use of fast numerical GPU BLAS libraries (cuBLAS, cuSPARSE). Also, it would be nice to import/implement openCL fall back routines in FFLAS-FFPACK.

A first project would consist in using the library for the dense/sparse matrix multipliation operations and write a $$fmod$$ operation for the GPU in cuda/opencl. A little more challenging first project would consist in moving the matrix multiplication OpenCL code from LinBox to FFLAS-FFPACK.

Depending on how this project goes and the goals of the student, it would also be interesting to add an offloading to the GPU mechanism to the existing multi-threaded code.

skills/prerequisites: C/C++, basic linear algebra routines, Cuda or OpenCL

mentors: Brice Boyer, B. David Saunders

References:

• [1] "Dense Linear Algebra over Finite Fields: the FFLAS and FFPACK packages." J.-G. Dumas, P. Giorgi and C. Pernet. pdf

• [2] "Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures" Brice Boyer, Jean-Guillaume Dumas and Pascal Giorgi pdf

• [3] "Dealing with performance/portability and performance/accuracy trade-offs in heterogeneous computing systems: a case study with matrix multiplication modulo primes" Matthew Wezowicz, B. David Saunder and Michela Taufer pdf