Title: IMPROVING TEXT-TO-IMAGE SYNTHESIS WITH U2C TRANSFER LEARNING
Author: VINICIUS GOMES PEREIRA
Collaborator(s): EDUARDO SANY LABER (Advisor); JONATAS WEHRMANN (Co-advisor)
Cataloged: 06/FEB/2024
Language(s): ENGLISH - UNITED STATES
Type: TEXT
Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Reference(s):
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=65990&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=65990&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.65990
Abstract:
Generative Adversarial Networks (GANs) are unsupervised models that can learn from an indefinitely large number of images. Models that generate images from language queries, on the other hand, depend on high-quality labeled data, which is scarce. Transfer learning is a well-known technique that alleviates the need for labeled data, though it is not trivial to turn an unconditional generative model into a text-conditioned one. This work proposes a simple yet effective fine-tuning approach called Unconditional-to-Conditional Transfer Learning (U2C transfer), which leverages well-established pre-trained models while learning to respect the given textual conditions. We evaluate the efficiency of U2C transfer by fine-tuning StyleGAN2 on two of the most widely used text-to-image datasets, producing the Text-Conditioned StyleGAN2 (TC-StyleGAN2). Our models quickly achieved state-of-the-art results on the CUB-200 and Oxford-102 datasets, with FID values of 7.49 and 9.47, respectively, representing relative gains of 7 percent and 68 percent over prior work. We show that our method is capable of learning fine-grained details from text queries while producing photorealistic and detailed images. Our findings highlight that images created with the proposed technique are credible and display a strong alignment with their corresponding textual descriptions.
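For context on the reported metric: FID (Fréchet Inception Distance), the figure behind the 7.49 and 9.47 values above, measures the distance between the statistics of real and generated images in an Inception-v3 feature space, with lower being better. Its standard definition is

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \right)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the mean and covariance of the Inception features of real and generated images, respectively.

The abstract does not spell out how U2C transfer wires the text condition into StyleGAN2, so the PyTorch sketch below only illustrates the general idea of converting a pretrained unconditional generator into a text-conditioned one: reuse the pretrained mapping network and inject a text embedding through a small fusion layer. Every name here (TextConditionedMapping, the fusion layers, the dimensions) is a hypothetical placeholder, not the thesis's actual TC-StyleGAN2 implementation.

# Minimal conceptual sketch of unconditional-to-conditional transfer for a
# StyleGAN2-like generator. Illustrative only; not the thesis's code.
import torch
import torch.nn as nn

class TextConditionedMapping(nn.Module):
    """Maps a latent z and a text embedding to a conditioned style vector w.

    The pretrained unconditional mapping network is reused as-is; a small
    fusion head injects the text condition as a residual, so the pretrained
    prior is preserved while the model learns to respect the condition.
    """
    def __init__(self, pretrained_mapping: nn.Module,
                 text_dim: int = 256, w_dim: int = 512):
        super().__init__()
        self.pretrained_mapping = pretrained_mapping
        self.fuse = nn.Sequential(
            nn.Linear(w_dim + text_dim, w_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(w_dim, w_dim),
        )

    def forward(self, z: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        w = self.pretrained_mapping(z)                       # unconditional style vector
        w_cond = self.fuse(torch.cat([w, text_emb], dim=-1)) # text-aware correction
        return w + w_cond                                    # residual injection

# Toy usage with stand-in modules; a real run would load pretrained
# StyleGAN2 weights and a pretrained sentence encoder for text_emb.
pretrained_mapping = nn.Sequential(nn.Linear(512, 512), nn.LeakyReLU(0.2))
mapper = TextConditionedMapping(pretrained_mapping)
z = torch.randn(4, 512)
text_emb = torch.randn(4, 256)
w = mapper(z, text_emb)
print(w.shape)  # torch.Size([4, 512])

In a real fine-tuning run, one common choice would be to initialize the fusion head's last layer near zero, so the generator starts out behaving exactly like the pretrained unconditional model and gradually learns the conditioning; this is one plausible design, not a detail confirmed by the abstract.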