Linsey Raaijmakers

Universiteit Utrecht

Publication date: 20 December 2017

University: Universiteit Utrecht

From data to biological knowledge

Summary

The fourth chapter describes the role of four PTMs (phosphorylation, ubiquitination, acetylation and methylation) in melanoma treatment resistance. PTMs are frequently described for their role in cell signaling and disease. In this work, we combine data on these PTMs with mutational analysis obtained from RNA sequencing. The chapter highlights the complexity of melanoma resistance at the level of protein signaling and gives insight into how PTMs and specific protein mutations collaborate in creating a more invasive tumor state by signaling via RHO GTPases and actin dynamics.

Chapter five is dedicated to the work on protein complexes that I performed together with Renske Penning. Most cellular proteins are involved in several protein complexes, where they can be part of the stable core subunits or act as interaction partners of other complexes. An individual protein can therefore have a different biological task depending on the interaction it is involved in at a specific time and/or cellular location. Using size exclusion chromatography (SEC) we separated these protein complexes by size, and with extensive data analysis steps for normalization and correlation profiling we identified protein complex dynamics between different cell cycle stages.
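The idea behind correlation profiling can be illustrated with a minimal sketch. It assumes each protein is represented by its intensity across the SEC fractions and that a high Pearson correlation between two profiles suggests co-elution. The protein names, intensities, and the 0.9 threshold are hypothetical and are not taken from the thesis.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long intensity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-elution information
    return cov / sqrt(var_x * var_y)


def co_eluting_pairs(profiles, threshold=0.9):
    """Return protein pairs whose elution profiles correlate above threshold."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical intensities over six SEC fractions (made-up numbers).
profiles = {
    "PROT_A": [0, 10, 80, 100, 20, 0],
    "PROT_B": [0, 5, 42, 50, 9, 0],
    "PROT_C": [90, 40, 5, 0, 0, 0],
}
pairs = co_eluting_pairs(profiles)
for a, b, r in pairs:
    print(f"{a} and {b} co-elute (r = {r})")
```

Comparing the pairs found in each cell cycle stage would then hint at complexes that assemble or dissociate between stages. A real analysis would also need replicate handling and a statistically justified threshold.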

This book describes my work over the past four years in the lab of Albert Heck under the supervision of Maarten Altelaar. During my research I worked with different types of data, mainly in the study of the resistance that arises during the treatment of melanoma. The chapters in this book broadly describe the different projects in which I played a role.

In the first chapter I describe the background of these projects and the techniques used to obtain the results. The basics of mass spectrometry are described, along with the subsequent steps of the data analysis. Because most projects focus on melanoma research and the role of post-translational modifications, these topics are described in more detail.

The second chapter describes the first project I worked on. As experimental techniques keep improving, more and more data are generated. To extract the biologically relevant processes from these data in an organized way, it is important that such complex data can be visualized clearly. The methods used for this were mainly aimed at genomics data, whereas the enormous growth in phosphoproteomics experiments calls for a different approach to visualization and data analysis. The tool I developed for this purpose is PhosphoPath. PhosphoPath is a plugin for Cytoscape, a software package that is widely used for network visualization. It calculates the relevance of biological pathways and visualizes interactions between proteins across different time points and conditions. To demonstrate the functionality of PhosphoPath, we used a quantitative dataset of melanoma cells from cell culture compared with xenografts.

In the third chapter I describe the challenges in studying the resistance that arises during the treatment of melanoma. Over the years, the treatment of this cancer and of other cancer types has improved considerably. In addition to general treatment, patients now more often receive a targeted treatment directed against a specific protein or deregulated process in the tumor cells. Although this targeted treatment initially gives good results in melanoma as well, resistance arises after a number of months and the tumors return. This resistance is often caused by deregulation of the processes that control cell division. It is known that this deregulation occurs, but exactly how it happens, which proteins are involved, and how we can protect patients against it is still unknown. In this chapter we study this resistance and the role of different protein modifications in it. We used cells from a patient before treatment, during treatment, and after resistance had developed. The proteins from these cells, and the regulation of their modifications, were compared between the three conditions. In addition to quantitative proteomics data, we also used transcriptomics (RNA) and metabolomics. The results demonstrated the complexity of this resistance and the role of different biological processes in it.

In the fourth chapter I look more closely at the role of the different post-translational modifications on the proteins across the three conditions from chapter 3. The modifications we studied are ubiquitination, acetylation, methylation and phosphorylation. In addition to these modifications, I used mutation data at the RNA level and searched for the translated products of these mutations in the protein data from the mass spectrometer. This chapter further illustrates the complexity of the tumor cells that was also shown at the protein level in chapter 3. Here we mainly found deregulation of various RHO GTPase signaling pathways.

In chapter 5 I describe the work on protein complexes that I carried out together with Renske Penning. The vast majority of proteins are part of several protein complexes, in which they can fulfil different functions. Because an individual protein can thus take on a different function when the protein complex changes, it is important to study proteins not only individually but also through their interactions with other proteins within a complex. Conventional methods study protein interactions using, for example, antibodies, but these methods give no information about the dynamics of the complexes. In this chapter we studied protein complexes using size-exclusion chromatography across different stages of the cell cycle. By correlating individual protein profiles, co-eluting proteins can be identified that may play a role in the same protein complex.

Future outlook

In this thesis I described the work I performed in MS-based proteomics data analysis in order to better understand therapy resistance in melanoma and the complex nature of signaling pathways and protein complexes. Over the years, efforts have been made to reveal the mutational landscape of melanoma, which led to the identification of mutations in several signaling pathways. Combining mutational analysis with RNA expression levels from exome sequencing or RNA sequencing further revealed differences in RNA expression in melanoma cell lines upon treatment resistance. Although these results highlighted important deregulated signaling pathways, studies have also shown that RNA expression levels are a poor estimate of protein abundance. Alternative splicing, post-translational modifications and protein half-lives cause discrepancies between RNA expression levels and protein abundances. To get a full picture of protein signaling, transcriptome analysis has to be combined with proteomics. Although analytical developments in MS-based proteomics have progressed tremendously in recent years, several issues still have to be addressed.

Handling of bias

Mass spectrometers have become more sensitive and run-to-run variability has decreased, but appropriate handling of bias in proteomics datasets remains an issue. In cases where labeling techniques are not feasible, distinguishing changes in the levels of proteins of interest from changes in the amount of sample loaded for analysis is an important step in data analysis. The process of removing this bias is called normalization. Most normalization methods in current use are taken from the microarray field. Which technique is most appropriate depends on the source of the bias, but often that source is unknown. While studies have compared these normalization methods for microarray data analysis, there is no real standard for proteomics. Besides these techniques from microarray analysis, methods are now also borrowed from western blot analysis, where ubiquitously expressed proteins such as β-actin, β-tubulin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) often serve as loading controls because their levels are assumed to remain unchanged. This idea is now also employed in MS-based proteomics, where analysis of several datasets led to the identification of ubiquitously expressed proteins that may serve as more accurate normalization controls [1].
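As a rough illustration of loading-bias removal, the sketch below shifts the log2 intensities of each run so that the median of a set of reference proteins is the same in every run. It is one simple normalization scheme among many and is not claimed to be the method used in this thesis. The function name, the run layout, and the intensities are hypothetical.

```python
from math import log2
from statistics import median


def median_normalize(runs, reference_proteins=None):
    """Shift log2 intensities per run so that all runs share a common centre.

    runs: {run_name: {protein: raw_intensity}}
    reference_proteins: optional set of loading-control proteins; when
    given, only these proteins define the centre of each run.
    """
    log_runs = {
        run: {p: log2(v) for p, v in proteins.items() if v > 0}
        for run, proteins in runs.items()
    }
    centres = {}
    for run, proteins in log_runs.items():
        used = [v for p, v in proteins.items()
                if reference_proteins is None or p in reference_proteins]
        if not used:
            raise ValueError(f"no usable proteins in run {run!r}")
        centres[run] = median(used)
    target = median(centres.values())
    return {
        run: {p: v - centres[run] + target for p, v in proteins.items()}
        for run, proteins in log_runs.items()
    }


# Made-up example: run2 was loaded with twice as much material as run1,
# and the hypothetical protein P1 is additionally up-regulated fourfold.
runs = {
    "run1": {"ACTB": 1000.0, "GAPDH": 800.0, "P1": 200.0},
    "run2": {"ACTB": 2000.0, "GAPDH": 1600.0, "P1": 1600.0},
}
normalized = median_normalize(runs, reference_proteins={"ACTB", "GAPDH"})
p1_change = normalized["run2"]["P1"] - normalized["run1"]["P1"]
print(f"log2 fold change of P1 after normalization: {p1_change:.2f}")
```

After the shift the reference proteins line up between runs, and the remaining difference for P1 reflects regulation rather than loading. The approach stands or falls with the assumption that the reference proteins really are unchanged.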

Missing values

Another issue in data analysis is the handling of missing values. In LC-MS/MS experiments, the fraction of missing values frequently ranges between 10 and 50%, and the proportion of features with at least one missing value lies between 70 and 90% [2]. The MaxQuant software addresses this with the 'match between runs' algorithm, in which peaks are matched between the different runs. A different experimental design, in which samples are also pooled in addition to being measured in separate runs so that more abundant signals are available to match against, can help lower the number of missing values. Unfortunately this still leaves missing identifications and ultimately a lower number of quantified proteins on which statistical analysis can be performed. To deal with such data, missing values can be imputed. Imputation can be based on the measured proteomics data, but when proteomics is combined with transcriptomics, RNA expression levels can be used to predict protein abundances. Because the correlation between these two levels of biological regulation is relatively poor, a gene-specific RNA-to-protein (RTP) conversion factor can be introduced, which enhances the predictability of protein abundance from RNA levels [3].
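The RNA-based imputation can be sketched as follows. This is a simplified illustration, not the cited procedure: it assumes the conversion factor for a gene is the median protein-to-RNA ratio over the samples in which both were measured. The function names, gene name, and values are hypothetical.

```python
from statistics import median


def rtp_factors(rna, protein):
    """Estimate one RNA-to-protein ratio per gene from samples where both exist."""
    factors = {}
    for gene, rna_by_sample in rna.items():
        ratios = [
            protein[gene][sample] / level
            for sample, level in rna_by_sample.items()
            if level > 0 and protein.get(gene, {}).get(sample) is not None
        ]
        if ratios:
            factors[gene] = median(ratios)
    return factors


def impute_from_rna(rna, protein):
    """Fill missing protein abundances (None) with factor * RNA level."""
    factors = rtp_factors(rna, protein)
    filled = {gene: dict(values) for gene, values in protein.items()}
    for gene, factor in factors.items():
        for sample, level in rna[gene].items():
            if filled[gene].get(sample) is None:
                filled[gene][sample] = factor * level
    return filled


# Made-up values: GENE1 was not quantified at the protein level in sample s3.
rna = {"GENE1": {"s1": 10.0, "s2": 20.0, "s3": 15.0}}
protein = {"GENE1": {"s1": 50.0, "s2": 100.0, "s3": None}}
filled = impute_from_rna(rna, protein)
print(filled["GENE1"]["s3"])
```

Genes without any sample in which both RNA and protein were measured are left untouched, since no factor can be estimated for them.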

Post-translational modifications

An extra level of regulation that can be studied by proteomics is that of post-translational modifications (PTMs). Enrichment methods have improved over the years, leading to a large increase in the number of identified PTMs. But while the number of identified modifications is increasing rapidly, no functional relevance has been reported for more than 95% of them. Bioinformatics approaches are used to predict functionality and dependencies between PTMs. Predictive models based on sequence motifs and protein-protein interactions give a probability for kinase-substrate relationships [4], but this still gives no further information on functional importance. Structural features of proteins have been demonstrated to be predictive of PTM functionality, but the limited availability of 3D structures in the Protein Data Bank (PDB) restricts the scope of these methods. To handle the large amount of data and use it to better understand biological processes, more advanced bioinformatics tools should be developed that predict functionality and store the information in publicly available databases. These tools should preferably integrate the available data on structural information, sequence conservation, PTM crosstalk and occurrence in specific diseases or tissues.

Integration of different -omics datasets

Combining data from different -omics fields can aid in a better understanding of biological processes. Novel splice variants or mutations in the protein sequence can be predicted from RNA-sequencing data. Including these aberrations in the database used by the MS search engine can potentially lead to the identification and confirmation of those predictions. Unfortunately, some challenges still have to be tackled. Including predicted sequences and splice variants in the sequence database increases its size, and with that the search time and the probability of false identifications. To reduce the search space again, acquired MS spectra can first be searched against the 'known' protein sequence database, after which only the unmatched spectra are used for a second search against the predicted sequences [5]. Another way is to use the transcriptome data to identify which genes are not expressed and remove their corresponding protein sequences from the database. A further issue is the dynamic range of the peptides. Splice variants or peptides harboring mutations may be low in abundance relative to their wild-type peptide, which makes them hard to detect. One way to deal with this is a more targeted approach, for example selected reaction monitoring (SRM) [6]. To analyze and integrate these datasets, bioinformatics tools should be developed that are easy to use and efficient. In the genomics field, tools for data analysis are often implemented in the Galaxy bioinformatics framework [7]. In Galaxy, sophisticated workflows can be created for standardized data analysis in which multiple analysis tools are combined. Recently, tools for the analysis of proteomics datasets have also started to be implemented, and these can be combined in an integrated fashion with the transcriptome and genome analyses that can be performed in Galaxy at the same time [8]. Although the proteomics part of Galaxy, called Galaxy-P, is still in its infancy, the control and reproducibility of multi-omics data analysis can be improved by the further development and use of the Galaxy framework.
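The two strategies for limiting the search space described above can be sketched in a few lines. The sketch is deliberately abstract: `search` stands for any search engine call and is replaced here by a toy substring lookup, spectra are represented by the peptide they would yield, and all names, sequences, and the expression threshold are made up for illustration.

```python
def restrict_database(sequences, expression, min_expression=1.0):
    """Keep only protein sequences whose gene is expressed at the RNA level."""
    return {
        gene: seq for gene, seq in sequences.items()
        if expression.get(gene, 0.0) >= min_expression
    }


def two_step_search(spectra, known_db, predicted_db, search):
    """Search the known database first; only unmatched spectra are then
    searched against the predicted (variant) sequences."""
    matched = {}
    unmatched = []
    for spectrum in spectra:
        hit = search(spectrum, known_db)
        if hit is not None:
            matched[spectrum] = hit
        else:
            unmatched.append(spectrum)
    for spectrum in unmatched:
        hit = search(spectrum, predicted_db)
        if hit is not None:
            matched[spectrum] = hit
    return matched


def toy_search(peptide, database):
    """Stand-in for a real search engine: exact substring lookup."""
    for name, sequence in database.items():
        if peptide in sequence:
            return name
    return None


# Hypothetical databases and "spectra" (invented sequences).
known_db = {"PROT1": "MKTAYIAKQR", "PROT2": "GSHMLEDPVR"}
predicted_db = {"PROT1_variant": "MKTAYIEKQR"}
spectra = ["TAYIAK", "TAYIEK", "LEDPVR", "WWWWWW"]

hits = two_step_search(spectra, known_db, predicted_db, toy_search)
expressed = restrict_database(known_db, {"PROT1": 12.0, "PROT2": 0.0})
print(hits)
print(sorted(expressed))
```

In this toy run only the variant peptide reaches the second search, which is the point of the two-step design: the larger, more error-prone database is consulted for as few spectra as possible.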