Example tasks

Category Theory

Determine whether FinStoch has all colimits.

Find a suitable category of stochastic maps in which to model our system.
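FinStoch is a natural first candidate to test against the system. Its objects are finite sets. A morphism X -> Y is a column-stochastic |Y| x |X| matrix. Composition is matrix multiplication, and the monoidal product is the Kronecker product. Copy and discard maps make it a Markov category. The sketch below encodes this in plain Python with exact `Fraction` arithmetic. The function names are hypothetical, and the code is a starting point for experiments rather than a library design.

```python
from fractions import Fraction
from typing import List

# A stochastic map X -> Y as a |Y| x |X| matrix; column x is the distribution f(- | x).
Matrix = List[List[Fraction]]

def is_stochastic(m: Matrix) -> bool:
    """Entries are non-negative and every column sums to 1."""
    return all(v >= 0 for row in m for v in row) and all(
        sum(row[j] for row in m) == 1 for j in range(len(m[0]))
    )

def compose(g: Matrix, f: Matrix) -> Matrix:
    """g after f: (g . f)[z][x] = sum_y g[z][y] * f[y][x] (Chapman-Kolmogorov)."""
    return [
        [sum(g[z][y] * f[y][x] for y in range(len(f))) for x in range(len(f[0]))]
        for z in range(len(g))
    ]

def identity(n: int) -> Matrix:
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def tensor(f: Matrix, g: Matrix) -> Matrix:
    """Kronecker product; the pair (a, b) is indexed row-major as a * size_of_second + b."""
    return [
        [f[i][j] * g[k][l] for j in range(len(f[0])) for l in range(len(g[0]))]
        for i in range(len(f)) for k in range(len(g))
    ]

def copy(n: int) -> Matrix:
    """X -> X x X, sending x to the point mass at (x, x)."""
    return [[Fraction(int(a == b == x)) for x in range(n)] for a in range(n) for b in range(n)]

def discard(n: int) -> Matrix:
    """X -> I (one-point set): the unique stochastic map, a single row of ones."""
    return [[Fraction(1)] * n]
```

Checking the Markov category laws on concrete matrices, for example that discarding after any map equals discarding, is then a one-line comparison.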

Implement the data structures for a free Markov category.

Verify that our system satisfies the axioms of a monoidal functor.

Teach the Yoneda lemma to developers and statisticians.

Dynamical Systems

Given a discrete-time dynamical system on a probability simplex whose update function is a rational function, numerically identify all, or at least some, of its steady states. Write a report emphasising which approaches are feasible for determining the steady states numerically and analytically, depending on the dimensionality of the system and the degree of the polynomials involved.
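Multi-start fixed-point iteration is one simple numerical baseline. The procedure samples many starting points on the simplex, iterates the update map from each, and keeps the distinct limits. The sketch below assumes the update map `T` is available as a Python function. The example map is a hypothetical discrete replicator-type rational map, x_i -> x_i (Ax)_i / (x . Ax). Plain iteration only finds attracting steady states. Unstable ones need root-finding on T(x) - x or, for low degrees, algebraic methods such as Gröbner bases or homotopy continuation.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]

def random_simplex_point(dim: int, rng: random.Random) -> Vector:
    """Uniform sample from the probability simplex (normalised i.i.d. exponentials)."""
    e = [rng.expovariate(1.0) for _ in range(dim)]
    total = sum(e)
    return [v / total for v in e]

def distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Max-norm distance between two points."""
    return max(abs(a - b) for a, b in zip(x, y))

def find_steady_states(T: Callable[[Vector], Vector], dim: int, n_starts: int = 50,
                       tol: float = 1e-12, max_iter: int = 10_000, seed: int = 0) -> List[Vector]:
    """Multi-start fixed-point iteration; returns the distinct (attracting) steady states found."""
    rng = random.Random(seed)
    found: List[Vector] = []
    for _ in range(n_starts):
        x = random_simplex_point(dim, rng)
        for _ in range(max_iter):
            y = T(x)
            if distance(x, y) < tol:
                break
            x = y
        else:
            continue  # no convergence from this start (e.g. a periodic orbit); skip it
        if all(distance(y, s) > 1e-6 for s in found):
            found.append(y)
    return found

def replicator(A: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Hypothetical example map x_i -> x_i (Ax)_i / (x . Ax); assumes A >= 0 and x . Ax > 0."""
    def T(x: Vector) -> Vector:
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        mean = sum(xi * v for xi, v in zip(x, Ax))
        return [xi * v / mean for xi, v in zip(x, Ax)]
    return T
```

For A = [[2, 0], [0, 1]] the iteration recovers the two vertices (1, 0) and (0, 1). The interior fixed point (1/3, 2/3) is repelling and is therefore missed, which illustrates the limitation above.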

Given a certain class of high-dimensional non-linear dynamical systems, write a report summarising how dimension-reduction methods such as centre manifold theory, dynamic mode decomposition, and Koopman linear embeddings can help us describe the dynamics of the system tractably. Include a section on how machine learning techniques can be leveraged to apply these methods efficiently.

Find a (complete) Lyapunov function for a simple discrete-time dynamical system.
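The classical quadratic construction for a stable linear map makes a useful reference point. A complete Lyapunov function in Conley's sense is a stronger object, and the derivation below covers only this classical case. If P solves the discrete Lyapunov equation for some positive definite Q, V decreases strictly along every non-zero orbit. The scalar map x_{k+1} = x_k / 2 with Q = 1 gives P = 4/3, so V(x) = (4/3) x^2.

```latex
x_{k+1} = A x_k, \qquad V(x) = x^\top P x, \qquad A^\top P A - P = -Q, \quad Q \succ 0,

V(x_{k+1}) - V(x_k) = x_k^\top \left( A^\top P A - P \right) x_k = -\, x_k^\top Q x_k < 0 \quad (x_k \neq 0),

% a solution P \succ 0 exists if and only if the spectral radius \rho(A) < 1, and then
P = \sum_{j=0}^{\infty} (A^\top)^j \, Q \, A^j .
```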

Given two or more discrete-time dynamical systems, analyse the behaviour of the system obtained by applying their update functions in sequence. For instance, identify its steady states and explore which properties of the individual systems carry over to the sequential system.
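Small numerical experiments help with the second question. The sketch below uses hypothetical helper names. It composes update maps in sequence and estimates derivatives at a fixed point by central differences. The maps f(x) = x^2 and g(x) = sqrt(x) on [0, 1] are chosen only for illustration. Each has exactly the fixed points 0 and 1, yet applying f and then g gives the identity, so the sequential system has a continuum of steady states. By the chain rule, at a common fixed point x* the composite has derivative g'(x*) f'(x*). Stability therefore carries over when both factors are contracting. However, an unstable factor and a stable one can combine into a neutral one, as happens at x* = 1 here.

```python
import math
from typing import Callable, Sequence

Map = Callable[[float], float]

def in_sequence(maps: Sequence[Map]) -> Map:
    """Apply maps[0], then maps[1], and so on (the composite ... g . f)."""
    def step(x: float) -> float:
        for m in maps:
            x = m(x)
        return x
    return step

def is_fixed(T: Map, x: float, tol: float = 1e-12) -> bool:
    return abs(T(x) - x) < tol

def derivative(T: Map, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of T'(x); |T'(x*)| < 1 means locally stable, > 1 unstable."""
    return (T(x + h) - T(x - h)) / (2 * h)

def f(x: float) -> float:
    """Example system 1: x -> x^2 on [0, 1] (fixed points 0 and 1)."""
    return x * x

g: Map = math.sqrt  # Example system 2: x -> sqrt(x) on [0, 1] (fixed points 0 and 1)

fg = in_sequence([f, g])  # first f, then g
```

At x* = 1 this gives f'(1) = 2 (unstable), g'(1) = 1/2 (stable) and a composite derivative of 1, for which linearisation is inconclusive.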

Fullstack

Design and implement a solution that allows users to upload large file attachments for inclusion in queries, including changes to the front-end, persistent storage of uploads, changes to the data flow to the back-end, back-end retrieval of uploaded files, and management of user data life cycles.
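Multipart upload is one common pattern, assuming uploads are stored in Amazon S3. The front-end splits the file into parts that are uploaded independently and can be retried individually, and the back-end then completes the upload. S3 requires every part except the last to be at least 5 MiB and allows at most 10,000 parts per upload. The hypothetical helper below only plans the byte ranges. The S3 calls themselves are out of scope, and the 5 GiB per-part maximum is not checked.

```python
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART = 5 * MIB    # S3 minimum size for every part except the last
MAX_PARTS = 10_000    # S3 maximum number of parts per upload

def plan_parts(size: int, part_size: int = 8 * MIB) -> List[Tuple[int, int, int]]:
    """Return (part_number, start, end_exclusive) byte ranges; part numbers start at 1."""
    if size <= 0:
        raise ValueError("size must be positive")
    part_size = max(part_size, MIN_PART)
    # Grow the part size if the file would otherwise need more than MAX_PARTS parts.
    min_needed = -(-size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, min_needed)
    return [
        (i + 1, start, min(start + part_size, size))
        for i, start in enumerate(range(0, size, part_size))
    ]
```

The front-end can upload the planned ranges in parallel, since each part carries its own number.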

Design and implement a solution to support collecting user feedback. This includes building a UI for users to submit feedback and an admin interface to view it, designing tables and indexes in Postgres that efficiently support reading and writing user feedback, and implementing endpoints with strong access control for reading and writing feedback.

Migrate our request-response architecture (front-end, back-end and infrastructure) to one that can stream intermediate results to the front-end while full responses are still being constructed.
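Server-Sent Events is one possible streaming transport, alongside WebSockets or chunked HTTP responses. With it, the back-end writes each intermediate result as a text event. An event consists of an optional `event:` line, one `data:` line per line of payload, and a terminating blank line. The sketch below shows that framing. The generator is a hypothetical stand-in for the real response pipeline.

```python
import json
from typing import Iterator, Optional

def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Event; multi-line payloads become several data: lines."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"

def stream_response(chunks: Iterator[str]) -> Iterator[str]:
    """Hypothetical pipeline: emit each partial result, then the assembled full response."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        yield format_sse(json.dumps({"partial": chunk}), event="partial")
    yield format_sse(json.dumps({"full": "".join(parts)}), event="done")
```

A web framework would typically return such a generator as a response with media type `text/event-stream`. In the browser, the `EventSource` API can consume it and register listeners for the `partial` and `done` event names.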

Design and implement infrastructure to run an API service, including creating a Docker image, setting up ECS and related infrastructure resources for the service, and setting up a CI/CD pipeline to build and deploy the service.

Infrastructure

Design a solution in AWS that registers users' IP addresses via a REST API endpoint and stores them in a DynamoDB table. Code your solution in any modern scripting language, and create a Terraform module encapsulating it.
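The following is a minimal sketch of the Lambda side. It assumes an API Gateway REST API with Lambda proxy integration, which exposes the caller's address as `requestContext.identity.sourceIp` in the event. It also assumes a DynamoDB table keyed on a hypothetical `ip` attribute. The table object is injected so that the handler can be exercised locally with a fake. In AWS it would be a boto3 `Table` resource, whose `put_item(Item=...)` method is the only call used.

```python
import json
import time
from typing import Any, Callable, Dict

def make_handler(table: Any) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda handler that records the caller's IP address in `table`.

    In AWS, `table` could be boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"]),
    where TABLE_NAME is a hypothetical environment variable set by the Terraform module.
    """
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        ip = event.get("requestContext", {}).get("identity", {}).get("sourceIp")
        if not ip:
            return {"statusCode": 400, "body": json.dumps({"error": "source IP not found"})}
        table.put_item(Item={"ip": ip, "registered_at": int(time.time())})
        return {"statusCode": 201, "body": json.dumps({"ip": ip})}
    return handler
```

With an HTTP API using payload format 2.0 instead, the address is found under `requestContext.http.sourceIp`.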

Create a custom Prometheus metric that monitors the execution time of part of an AWS Lambda function, and an alert that triggers if this time exceeds given thresholds.

Design a GitLab CI/CD pipeline that automatically builds, tests, and deploys a containerized REST API to a Kubernetes cluster using Helm. The pipeline should include stages for building the Docker image, running tests, pushing the image to a container registry, and deploying to different environments (dev, staging, production) with environment-specific configuration. Provide the necessary Kubernetes manifests, a Helm chart, and a sample .gitlab-ci.yml configuration.

Natural Language Processing

Given an initial dataset of (natural language, domain-specific language) pairs:

Create a procedure for training a model to translate from one language to the other
Provide metrics to assess the quality of the trained model
Propose ways to enrich the dataset

Log parameters and experiment metrics to Weights & Biases, and use this data to perform an error analysis.
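For the metrics item above, exact match after whitespace normalisation and token-level F1 between predicted and reference DSL strings are two cheap baselines. These choices are assumptions about what is useful here. If the DSL can be executed, comparing execution results is usually more informative, since syntactically different programs can be equivalent. A sketch:

```python
from collections import Counter
from typing import List, Sequence

def normalise(s: str) -> List[str]:
    """Whitespace tokenisation; a real DSL would likely need its own tokenizer."""
    return s.split()

def exact_match(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of predictions whose token sequence equals the reference exactly."""
    return sum(normalise(p) == normalise(r) for p, r in zip(preds, refs)) / len(refs)

def token_f1(pred: str, ref: str) -> float:
    """Harmonic mean of token precision and recall (multiset overlap)."""
    p, r = Counter(normalise(pred)), Counter(normalise(ref))
    overlap = sum((p & r).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(p.values()), overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

Per-example scores like these are also convenient for the error analysis, since the lowest-scoring pairs are natural candidates for manual inspection.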

Fine-tune open-source LLMs for code generation:

Prepare a machine in AWS for fine-tuning
Fine-tune the model using llama.cpp
Quantize the model
Publish the model to Hugging Face

Develop a procedure for handling typos in natural language queries that takes into account the ambiguity of the text provided by users.
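One way to respect ambiguity is to return every vocabulary word tied at the minimum edit distance instead of a single best guess. Downstream logic, or the user, then chooses among them. The sketch below uses Levenshtein distance against a hypothetical domain vocabulary.

```python
from typing import Iterable, List

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def suggest(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> List[str]:
    """All vocabulary words tied at the smallest distance (<= max_distance); [] if none."""
    scored = [(levenshtein(word, v), v) for v in vocabulary]
    best = min((d for d, _ in scored), default=None)
    if best is None or best > max_distance:
        return []
    return sorted(v for d, v in scored if d == best)
```

For example, "helo" is equally close to "hello" and "help", so both are returned and the ambiguity is left explicit.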

Collect data from public sources, process it, and store it on Hugging Face.

Develop a procedure to classify user queries into a given set of labels.
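A multinomial naive Bayes classifier over bag-of-words features, with add-one (Laplace) smoothing, is a simple baseline against which more elaborate models can be compared. The sketch is illustrative only, and the labels and example queries are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

class NaiveBayes:
    """Multinomial naive Bayes over lower-cased whitespace tokens, with add-one smoothing."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayes":
        self.label_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in examples:
            self.label_counts[label] += 1
            self.token_counts[label].update(text.lower().split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def log_scores(self, text: str) -> Dict[str, float]:
        """log p(label) + sum of log p(token | label) for every label."""
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in text.lower().split():
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

The log scores can also be turned into a confidence, so that low-confidence queries are routed to a fallback instead of being forced into a label.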

Given the name, input and output types of a function, train a model to produce a sketch of an implementation of the function in some programming language.

Probabilistics

Given a dynamical system in the form of a black-box program whose parameters live on the unit simplex, how would you analyse the convergence, steady states, and stability of the system?

Given program code that implements a random sampler, draw a Bayesian network and write down density equations that characterise the distribution the sampler draws from. The aim of this formalisation is to improve understanding of the code and to uncover possibilities for generalisation and optimisation.
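Consider, as a toy illustration of the formalisation, a hypothetical sampler that first draws whether it rains, then whether a sprinkler is on given the rain, and then whether the grass is wet given both. Its Bayesian network has edges rain -> sprinkler, rain -> wet and sprinkler -> wet. The joint density therefore factorises as p(r, s, w) = p(r) p(s | r) p(w | r, s). The probability tables are made up. Writing the density next to the sampler makes the correspondence checkable line by line.

```python
import itertools
import random

# Hypothetical conditional probability tables.
P_RAIN = 0.2                                      # p(r = 1)
P_SPRINKLER = {True: 0.01, False: 0.4}            # p(s = 1 | r)
P_WET = {(True, True): 0.99, (True, False): 0.8,  # p(w = 1 | r, s)
         (False, True): 0.9, (False, False): 0.0}

def sample(rng: random.Random):
    """The 'program' under analysis: each line corresponds to one node of the network."""
    r = rng.random() < P_RAIN
    s = rng.random() < P_SPRINKLER[r]
    w = rng.random() < P_WET[(r, s)]
    return r, s, w

def bernoulli(p: float, x: bool) -> float:
    return p if x else 1 - p

def density(r: bool, s: bool, w: bool) -> float:
    """p(r, s, w) = p(r) p(s | r) p(w | r, s), read off the sampler line by line."""
    return bernoulli(P_RAIN, r) * bernoulli(P_SPRINKLER[r], s) * bernoulli(P_WET[(r, s)], w)

def total_mass() -> float:
    """Sum of the density over all 8 outcomes; equals 1 for a valid factorisation."""
    return sum(density(*x) for x in itertools.product([False, True], repeat=3))
```

Comparing empirical frequencies from `sample` with `density` is a quick sanity check that the formalisation matches the code.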

Given a recursive data structure D that represents joint distributions and a function that generates probabilistic programs from D, identify the cases in which such distributions can be enumerated analytically.

Under what conditions can we tractably find the MAP assignment in non-selective Sum-Product Networks (SPNs)?

What bounds can we place on the expressivity of SPNs for various leaf distributions?

How can we decompose inference problems in probabilistic programs?

How can we define probabilistic computations over heterogeneous samples (samples of different types)?

Symbolics

Design an efficient search algorithm on a graph whose nodes are the different bracketings of the same expression abcdef… and whose edges correspond to applying the associative property (xy)z=x(yz) to exactly 3 terms at a time. No polynomial-time algorithm is known for the shortest path problem on this graph, so how do we find an epsilon-approximate algorithm?
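An exact baseline on small instances is useful for validating any approximation scheme. Bracketings of n symbols correspond to binary trees with n leaves, and applying (xy)z=x(yz) at one subterm is a tree rotation. The graph is therefore the 1-skeleton of the associahedron. The sketch below, with hypothetical function names, enumerates bracketings, generates rotation neighbours and computes exact distances by breadth-first search. Its cost is exponential in n, since the number of bracketings grows like the Catalan numbers.

```python
from collections import deque
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

def bracketings(symbols: str) -> List[Tree]:
    """All full bracketings of the word `symbols`, as nested pairs (Catalan-many)."""
    if len(symbols) == 1:
        return [symbols]
    return [(left, right)
            for i in range(1, len(symbols))
            for left in bracketings(symbols[:i])
            for right in bracketings(symbols[i:])]

def neighbours(t: Tree) -> List[Tree]:
    """Trees reachable by one application of (xy)z = x(yz), in either direction, at any subterm."""
    if isinstance(t, str):
        return []
    left, right = t
    out: List[Tree] = []
    if not isinstance(left, str):   # (xy)z -> x(yz) at the root
        out.append((left[0], (left[1], right)))
    if not isinstance(right, str):  # x(yz) -> (xy)z at the root
        out.append(((left, right[0]), right[1]))
    out += [(l2, right) for l2 in neighbours(left)]
    out += [(left, r2) for r2 in neighbours(right)]
    return out

def show(t: Tree) -> str:
    return t if isinstance(t, str) else f"({show(t[0])}{show(t[1])})"

def rotation_distance(s: Tree, t: Tree) -> int:
    """Exact shortest-path length by breadth-first search (exponential in the word length)."""
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        node, d = queue.popleft()
        if node == t:
            return d
        for nb in neighbours(node):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, d + 1))
    raise ValueError("trees are bracketings of different words")
```

For abcde there are 14 bracketings, each with 3 neighbours, and the left comb ((((ab)c)d)e) is 3 rotations away from the right comb (a(b(c(de)))).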

Implement a testing utility for specifying integration tests conveniently and tersely. It should be possible to aggregate different test sets under a parent test set, possibly using the filesystem to split test sets across different files. The tests should then be runnable in such a way that errors thrown during a test do not halt the test suite but are reported to the user. A final score based on the successes and failures of the tests should indicate the performance of the system under test.
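The sketch below shows one way to get nesting, error isolation and scoring, assuming tests are plain functions registered on a suite object. The `Suite` class and `score` function are hypothetical names. Splitting test sets across files could, for instance, map one module to one child suite, but that part is omitted.

```python
import traceback
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Suite:
    """Hypothetical nested test set: named tests plus child suites."""
    name: str
    tests: List[Tuple[str, Callable[[], None]]] = field(default_factory=list)
    children: List["Suite"] = field(default_factory=list)

    def test(self, fn: Callable[[], None]) -> Callable[[], None]:
        """Decorator registering fn as a test in this suite."""
        self.tests.append((fn.__name__, fn))
        return fn

    def run(self, prefix: str = "") -> Tuple[int, int, List[str]]:
        """Run all tests recursively; return (passed, failed, reports). Errors never halt the run."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        passed, failed, reports = 0, 0, []
        for name, fn in self.tests:
            try:
                fn()
                passed += 1
            except Exception:  # assertion failures and unexpected errors alike
                failed += 1
                reports.append(f"{path}/{name}:\n{traceback.format_exc()}")
        for child in self.children:
            p, f, r = child.run(path)
            passed, failed, reports = passed + p, failed + f, reports + r
        return passed, failed, reports

def score(passed: int, failed: int) -> float:
    """Fraction of passing tests (1.0 for an empty suite)."""
    total = passed + failed
    return passed / total if total else 1.0
```

Catching `Exception` rather than `BaseException` keeps keyboard interrupts working while still isolating test failures.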

Extract a data-flow representation of a program from its (unstructured) control-flow graph, optionally using the data-centric RVSDG (Regionalized Value State Dependence Graph) intermediate representation. The program can be assumed to be pure (free of side effects), so that it only contributes to the production of final values. The result will, however, be used for further introspection of programs, not for running optimising compiler passes.

Tooling

Propose an automated process for updating the Julia version used on CI across multiple repositories that ensures sufficient testing and requires minimal manual effort when no errors arise.

Automatically close tasks linked to a merge request once the merge request has been merged.
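GitLab already closes issues referenced with closing patterns such as `Closes #123` in a merge request description when the merge request is merged into the default branch. A custom solution is therefore mainly needed for tasks tracked elsewhere or referenced differently. The sketch below covers only the parsing step, which a merge-event webhook handler could call. It uses a simplified, hypothetical keyword pattern rather than GitLab's exact one.

```python
import re
from typing import List

# Simplified closing keywords; GitLab's default pattern also covers other forms.
CLOSING = re.compile(
    r"\b(?:close[sd]?|closing|fix(?:e[sd])?|fixing|resolve[sd]?|resolving)\b:?\s+#(\d+)",
    re.IGNORECASE,
)

def closed_task_ids(description: str) -> List[int]:
    """Task numbers referenced with a closing keyword, in order of appearance, without duplicates."""
    seen: List[int] = []
    for match in CLOSING.finditer(description):
        n = int(match.group(1))
        if n not in seen:
            seen.append(n)
    return seen
```

Plain mentions such as "see #99" are deliberately ignored, so only explicitly closed tasks are affected.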

UI

Develop new visualisation features for custom layout algorithms, using d3 to render interactive parts of the diagram. Implement a mechanism that allows the user to replace a simplified part of the diagram with a more detailed version.

Prepare and validate data from the API, which returns highly complex JSON; this involves multiple pre-processing steps and inference of missing values.

Develop custom displays for the various data types returned by the API, for example images, histograms or circuit diagrams, building a generic React component for each type.