Computer Science Research
Jamie is currently a Post-doctoral Research Associate in the SCALE Lab at Imperial College London.
SCALE Lab focuses on the management and processing of data in general, including HPC data analytics, data visualisation, spatial data, indexing, new hardware for data processing and novel storage technology.
At SCALE Lab, Jamie works on game-changing DNA storage technology (storing digital data within DNA), investigates viral insertions in the 100,000 Genomes Project, and develops highly parallel bioinformatics pipelines to run on HPC (High-Performance Computing) platforms.
Prior to that, he worked as a Scientific Programmer (Research Software Engineer) at the Institute of Cancer Research, London.
Jamie completed his doctorate at Royal Holloway, University of London, where he conducted research in the Centre for Systems and Synthetic Biology (CSSB) and the Department of Computer Science.
A list of Jamie's research publications can be found under the papers section.
After completing his Ph.D., Jamie worked at the ICR (Institute of Cancer Research), where he provided consulting support to researchers in software engineering, high-performance computing (HPC) and workflow languages.
Design, Chemical, Pharmacological research
Jamie's doctoral thesis (supervised by Dr. Hugh Shanahan) explored the analysis of high-throughput biological datasets using distributed computing, particularly sequencing data, which high-throughput technologies are producing at an unprecedented scale. As a result of these technological advancements, large, complex datasets are routinely deposited in public archives such as the SRA (Sequence Read Archive); as of January 2017, the SRA alone contained over a petabyte of data. Jamie conducted a detailed literature review of the biochemical protocol steps applied when preparing nucleic acid samples for sequencing, and his thesis describes in detail the bias that these workflow steps can introduce at the molecular level. This work was published in a GigaScience paper: Investigation into the annotation of protocol sequencing steps in the sequence read archive
In this work, Jamie also explored sequencing metadata using data-mining techniques and SQL (Structured Query Language). He quantified the level of annotation of key protocol steps in 29,958 experiments deposited in the SRA by searching their metadata for relevant keywords. Only 7.10%, 5.84% and 7.57% of all records had at least one keyword corresponding to fragmentation, ligation and enrichment, respectively, and only 5.58% of all top-level SRA experiment records were annotated for all three steps.
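The keyword search described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the `experiments` table, its `metadata` column and the keyword stems in `STEP_KEYWORDS` are all hypothetical, not the real SRA schema or the keyword sets used in the thesis. Each step's rate is the percentage of records matching at least one of its keywords, and the final figure is the percentage matching all three steps.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

# Hypothetical keyword stems per protocol step (illustrative only).
STEP_KEYWORDS: Dict[str, List[str]] = {
    "fragmentation": ["fragment", "sonicat", "shear"],
    "ligation": ["ligat", "adapter"],
    "enrichment": ["enrich", "pcr", "amplif"],
}


def load_experiments(rows: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory table of (accession, free-text metadata) records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE experiments (accession TEXT PRIMARY KEY, metadata TEXT)")
    conn.executemany("INSERT INTO experiments VALUES (?, ?)", rows)
    return conn


def step_clause(keywords: List[str]) -> Tuple[str, List[str]]:
    """Build '(lower(metadata) LIKE ? OR ...)' with one substring pattern per keyword."""
    clause = " OR ".join("lower(metadata) LIKE ?" for _ in keywords)
    return f"({clause})", [f"%{kw}%" for kw in keywords]


def annotation_rates(conn: sqlite3.Connection) -> Dict[str, float]:
    """Percentage of records mentioning each step, plus all three steps together."""
    total = conn.execute("SELECT COUNT(*) FROM experiments").fetchone()[0]
    if total == 0:
        raise ValueError("no experiments loaded")
    rates: Dict[str, float] = {}
    clauses: List[str] = []
    params: List[str] = []
    for step, keywords in STEP_KEYWORDS.items():
        clause, step_params = step_clause(keywords)
        hits = conn.execute(
            f"SELECT COUNT(*) FROM experiments WHERE {clause}", step_params
        ).fetchone()[0]
        rates[step] = 100.0 * hits / total
        clauses.append(clause)
        params.extend(step_params)
    # A record counts as fully annotated only if it matches every step's clause.
    hits_all = conn.execute(
        "SELECT COUNT(*) FROM experiments WHERE " + " AND ".join(clauses), params
    ).fetchone()[0]
    rates["all_three"] = 100.0 * hits_all / total
    return rates


# Toy records with hypothetical accessions.
conn = load_experiments([
    ("SRX1", "DNA sheared by sonication, adapters ligated, PCR enriched"),
    ("SRX2", "Illumina HiSeq 2000 paired-end"),
    ("SRX3", "Libraries were PCR amplified"),
    ("SRX4", "Fragmented RNA"),
])
print(annotation_rates(conn))
# {'fragmentation': 50.0, 'ligation': 25.0, 'enrichment': 50.0, 'all_three': 25.0}
```

Substring matching on word stems is a deliberately crude choice that keeps the query simple; it can over-count (for example, a stem appearing in an unrelated word), which is one reason real keyword sets need careful curation.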
Jamie's thesis also focused on applying distributed computing to the processing of such large datasets. It reviewed several types of distributed and high-performance computing, namely batch-scheduled computing, Hadoop MapReduce, Spark and MPI (Message Passing Interface). This work was published in an Oxford University Press Briefings in Bioinformatics paper: The application of Hadoop in Structural Bioinformatics. PDB-Hadoop was developed with MapReduce, allowing user-defined operations (the thesis explored structural analysis and molecular docking jobs) to be carried out in a high-throughput, parallel fashion on the entire PDB (Protein Data Bank).
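The MapReduce pattern behind PDB-Hadoop can be illustrated with a small, single-process Python sketch. It is not the PDB-Hadoop code: `map_reduce`, `count_record_types` and `sum_counts` are hypothetical names, the entries are toy text rather than real PDB files, and the user-defined operation (counting PDB record types such as `ATOM` and `HETATM`) merely stands in for structural-analysis or docking jobs. On Hadoop, the framework would distribute the map calls across the cluster and perform the shuffle itself.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

KeyValue = Tuple[str, int]
Mapper = Callable[[str, str], List[KeyValue]]
Reducer = Callable[[str, List[int]], int]


def count_record_types(pdb_id: str, pdb_text: str) -> List[KeyValue]:
    """Hypothetical mapper: emit (record name, 1) for each line of one entry."""
    pairs: List[KeyValue] = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # PDB record names occupy columns 1-6
        if record:
            pairs.append((record, 1))
    return pairs


def sum_counts(key: str, values: List[int]) -> int:
    """Hypothetical reducer: total the counts emitted for one key."""
    return sum(values)


def map_reduce(entries: Dict[str, str], mapper: Mapper, reducer: Reducer) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over {pdb_id: file text}."""
    # Map phase: on Hadoop these calls would run in parallel across nodes.
    mapped = [pair for pdb_id, text in entries.items() for pair in mapper(pdb_id, text)]
    # Shuffle phase: group every emitted value under its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


# Toy entries standing in for PDB files (hypothetical IDs, truncated lines).
entries = {
    "1ABC": "ATOM      1  N   MET A   1\nATOM      2  CA  MET A   1\n"
            "HETATM    3  O   HOH A 101\nEND",
    "2XYZ": "ATOM      1  N   GLY B   1\nEND",
}
print(map_reduce(entries, count_record_types, sum_counts))
# {'ATOM': 3, 'HETATM': 1, 'END': 2}
```

Swapping in a different mapper (for example, one that runs a docking or structural calculation per entry) leaves `map_reduce` unchanged, which is the property that makes the pattern attractive for user-defined operations.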
PDB-Hadoop was presented at the ISCB 3DSig Structural Bioinformatics and Computational Biophysics conference 2015 in Dublin, Ireland.
Benchmarking demonstrated that Hadoop MapReduce was competitive with batch-scheduled computing and, at the time, the development of Spark enabled throughput gains of two to three orders of magnitude through optimisations such as in-memory caching and lazy execution. Jamie therefore decided to apply MapReduce on Spark to the processing of short-read RNA-Seq datasets, which are typically large. Given the poor annotation observed in the SRA, Jamie developed an analysis system named Hercules to quantify sequence-specific deviations in the distribution of mapped RNA-Seq reads. The distributed method uses intra-exon motif correlations and is explained in a Journal of Integrative Bioinformatics paper: A novel method to detect bias in Short Read NGS RNA-seq data, and, with more depth on the computer science employed, in an International Journal of Foundations of Computer Science paper: Transcriptomics: Quantifying non-uniform read distribution using MapReduce.
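One simple way to quantify a sequence-specific deviation in where reads start is to compare the k-mers found at read-start positions with the k-mer composition of the exon itself. The sketch below is a swapped-in illustration of that observed/expected idea, not the intra-exon motif-correlation method used by Hercules; the function name `kmer_start_bias`, the default `k=2` and the toy exon are assumptions. In a Spark or MapReduce setting, each exon's calculation would be a natural map task, with a reduce step combining ratios across exons.

```python
from collections import Counter
from typing import Dict, List


def kmer_start_bias(exon_seq: str, read_starts: List[int], k: int = 2) -> Dict[str, float]:
    """Observed/expected ratio of each k-mer at read-start sites within one exon.

    Expected counts assume reads start uniformly at every position where a
    full k-mer fits, so a ratio far from 1.0 flags sequence-specific bias.
    """
    n_positions = len(exon_seq) - k + 1
    # Background: how often each k-mer occurs along the exon.
    background = Counter(exon_seq[i:i + k] for i in range(n_positions))
    # Keep only read starts (0-based, exon-relative) where a full k-mer fits.
    valid = [s for s in read_starts if 0 <= s < n_positions]
    observed = Counter(exon_seq[s:s + k] for s in valid)
    ratios: Dict[str, float] = {}
    for kmer, bg_count in background.items():
        expected = len(valid) * bg_count / n_positions
        ratios[kmer] = observed[kmer] / expected if expected else 0.0
    return ratios


# Toy exon and hypothetical read-start positions.
print(kmer_start_bias("AACCGGTT", [0, 0, 0, 2, 2, 6, 6]))
# {'AA': 3.0, 'AC': 0.0, 'CC': 2.0, 'CG': 0.0, 'GG': 0.0, 'GT': 0.0, 'TT': 2.0}
```

Here reads pile up on `AA`, `CC` and `TT` starts while the other dinucleotides are never sampled, which is the kind of non-uniform read distribution the text describes.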
After completing his master's degree, Jamie assisted research efforts in the Biomedical Sciences department of St George's Hospital Medical School, working with Prof. Brian Austen (chair of the Alzheimer's association) on developing neuroprotective drug molecules for neurodegenerative disorders. The drugs currently under development are the peptide-based PDZ-binding ligands Jamie-3-Asn and Jamie-3-Glu, based on the work of Pizerchio and Spaller et al. and Austen et al.
Part of this work involves using immunohistochemistry to label and visualise the peptide drugs within cells (the gallery hosts some images from this process). Various cellular components and a site on the Jamie-3 drug molecule are labelled with antibodies and fluorescent dyes, which fluoresce (glow) when excited by laser light of a particular wavelength. Because each fluorophore responds to a different wavelength of light and so produces a different colour, this makes it possible to visualise the drug's penetration into cells and its co-localisation with the other stained cellular components.
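Co-localisation of this kind is often quantified with Pearson's correlation coefficient between the pixel intensities of two fluorescence channels. The source does not state which measure, if any, was used, so the sketch below is an assumption for illustration: the function name `pearson_colocalisation` and the toy intensity values are hypothetical, and real images would first be flattened into matched pixel lists (usually after background subtraction). Values near +1 indicate that the two labels rise and fall together.

```python
import math
from typing import Sequence


def pearson_colocalisation(red: Sequence[float], green: Sequence[float]) -> float:
    """Pearson's correlation between matched pixel intensities of two channels."""
    if len(red) != len(green) or not red:
        raise ValueError("channels must be non-empty and the same size")
    mean_r = sum(red) / len(red)
    mean_g = sum(green) / len(green)
    # Covariance and variances about each channel's mean intensity.
    cov = sum((r - mean_r) * (g - mean_g) for r, g in zip(red, green))
    var_r = sum((r - mean_r) ** 2 for r in red)
    var_g = sum((g - mean_g) ** 2 for g in green)
    if var_r == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant-intensity channel")
    return cov / math.sqrt(var_r * var_g)


# Toy flattened intensities: the "drug" channel tracks the "organelle" channel.
print(pearson_colocalisation([0, 10, 20, 30], [0, 20, 40, 60]))
# 1.0
```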
This work has been published in an American Journal of Chemistry paper: Cyclisation of Cell-Penetrating PDZ-Binding Peptides Directed to PSD95. It was also presented as a poster at the RSC (Royal Society of Chemistry) … and Peptide Science Group Early Stage Researcher Meeting in November 2011, and subsequently at St George's Research Day in November 2011.