Title: PREDICTING DRUG SENSITIVITY OF CANCER CELLS BASED ON GENOMIC DATA
Author: SOFIA PONTES DE MIRANDA
Contributor(s): JULIA LIMA FLECK - Advisor; STEPHEN R. PICCOLO - Co-advisor
Cataloged: 22/APR/2021 | Language(s): ENGLISH - UNITED STATES
Type: TEXT | Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of their authors. The data used in the document descriptions conform to the administrative systems of PUC-Rio.
Reference(s): Portuguese: https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=52348&idi=1 | English: https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=52348&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.52348
Abstract:
Accurately predicting drug responses for a given sample based on molecular features may help optimize drug-development pipelines and explain the mechanisms behind treatment responses. In this dissertation, we conducted two case studies, each applying a different type of genomic data to predict drug response.

Case study 1 evaluated DNA methylation profiles, a type of molecular feature known to drive tumorigenesis and modulate treatment responses. Using genome-wide DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer (GDSC) database, we applied machine-learning algorithms to assess how well cytotoxic responses to eight anti-cancer drugs could be predicted. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. By artificially subsampling the training data to varying degrees, we aimed to determine whether training on relatively extreme outcomes would improve performance. When we used classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when both the training and test sets consisted of cell-line data. Classification algorithms performed best when trained on cell lines with relatively extreme drug-response values, attaining area under the receiver operating characteristic curve (AUROC) values as high as 0.97. Regression algorithms performed best when trained on the full range of drug-response values, although this depended on the performance metric used.

Case study 2 evaluated RNA-seq data, one of the most widely used types of molecular data for studying drug efficacy. Using a semi-supervised learning approach, we aimed to determine whether combining labeled and unlabeled data improves model predictions. Using genome-wide RNA-seq data from labeled acute myeloid leukemia (AML) tumor samples in the Beat AML database (125 samples on average, varying by drug) and 151 unlabeled AML tumor samples from The Cancer Genome Atlas (TCGA), we built semi-supervised models to predict cytotoxic responses to four anti-cancer drugs. We generated these models under several parameter combinations and compared them against supervised classification algorithms.
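The extreme-outcome subsampling and the AUROC metric mentioned for case study 1 can be sketched as follows. This is a minimal, hypothetical illustration using only the Python standard library, not the dissertation's actual pipeline. The function names `extreme_subsample` and `auroc`, the `fraction` parameter (the share of samples kept from each tail), and the assumption that lower response values (an IC50-like measure) mean greater sensitivity are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def extreme_subsample(
    sample_ids: Sequence[str],
    responses: Sequence[float],
    fraction: float,
) -> List[Tuple[str, int]]:
    """Keep only the most extreme drug responses for training (hypothetical helper).

    ASSUMPTION: lower response values (e.g., IC50) mean MORE sensitive.
    The lowest `fraction` of samples are labeled 1 (sensitive), the highest
    `fraction` are labeled 0 (resistant), and the middle is discarded.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if len(sample_ids) != len(responses):
        raise ValueError("sample_ids and responses must have equal length")
    # Rank samples from most sensitive (lowest value) to most resistant.
    ranked = sorted(zip(sample_ids, responses), key=lambda pair: pair[1])
    # Number of samples kept from each tail; floor keeps the tails disjoint.
    k = int(len(ranked) * fraction)
    sensitive = [(sid, 1) for sid, _ in ranked[:k]]
    resistant = [(sid, 0) for sid, _ in ranked[len(ranked) - k:]]
    return sensitive + resistant


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via its rank-sum (Mann-Whitney U) form: the probability that a
    randomly chosen positive scores higher than a randomly chosen negative,
    with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up cell-line IDs and response values, for illustration only.
    ids = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10"]
    ic50 = [0.1, 5.0, 0.3, 9.0, 2.0, 0.2, 7.5, 4.0, 8.0, 3.0]
    print(extreme_subsample(ids, ic50, fraction=0.2))
    # Made-up classifier scores for the four retained samples.
    print(auroc([0.9, 0.8, 0.3, 0.85], [1, 1, 0, 0]))
```

With ten cell lines and `fraction=0.2`, the two lowest-response lines (`c1`, `c6`) are labeled sensitive, the two highest (`c9`, `c4`) are labeled resistant, and the middle six are dropped before training. Smaller `fraction` values keep more extreme but fewer training samples.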