High-throughput DNA sequencing of immune repertoires generates large amounts of data. A life scientist interested in extracting information from this data typically needs to apply several processing and manipulation steps, often involving different software tools chained together in a workflow (sometimes also called a pipeline). Because immunogenomics is a relatively new field, there is no established best practice for its workflows. In this internship, you will investigate and build an efficient and scalable workflow for immunogenomics.
The internship will include the following activities:
o State-of-the-art survey
o Hands-on comparison of various workflow management systems on real immunogenomic data
o Depending on the length of the internship, prototype implementation of an analysis pipeline for high-throughput DNA sequencing data from immune repertoires
Requirements:
o Familiarity with Docker containers
o Proficiency with an object-oriented programming language, preferably Python