Logo PUC-Rio Logo Maxwell
ETDs @PUC-Rio
Estatística
Título: IMPROVING TEXT-TO-IMAGE SYNTHESIS WITH U2C - TRANSFER LEARNING
Autor: VINICIUS GOMES PEREIRA
Colaborador(es): EDUARDO SANY LABER - Orientador
JONATAS WEHRMANN - Coorientador
Catalogação: 06/FEV/2024 Língua(s): ENGLISH - UNITED STATES
Tipo: TEXT Subtipo: THESIS
Notas: [pt] Todos os dados constantes dos documentos são de inteira responsabilidade de seus autores. Os dados utilizados nas descrições dos documentos estão em conformidade com os sistemas da administração da PUC-Rio.
[en] All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Referência(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=65990&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=65990&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.65990
Resumo:
Generative Adversarial Networks (GANs) are unsupervised models that can learn from an indefinitely large amount of images. On the other hand, models that generate images from language queries depend on high-quality labeled data that is scarce. Transfer learning is a known technique that alleviates the need for labeled data, though it is not trivial to turn an unconditional generative model into a text-conditioned one. This work proposes a simple, yet effective fine-tuning approach, called Unconditional-to-Conditional Transfer Learning (U2C transfer). It can leverage well-established pre-trained models while learning to respect the given textual condition conditions. We evaluate U2C transfer efficiency by fine-tuning StyleGAN2 in two of the most widely used text-to-image data sources, generating the Text-Conditioned StyleGAN2 (TC-StyleGAN2). Our models quickly achieved state-of-the-art results in the CUB-200 and Oxford-102 datasets, with FID values of 7.49 and 9.47, respectively. These values represent relative gains of 7 percent and 68 percent compared to prior work. We show that our method is capable of learning fine-grained details from text queries while producing photorealistic and detailed images. Our findings highlight that the images created using our proposed technique are credible and display a robust alignment with their corresponding textual descriptions.
Descrição: Arquivo:   
COMPLETE PDF