Title: FCGAN: SPECTRAL CONVOLUTIONS VIA FFT FOR CHANNEL-WIDE RECEPTIVE FIELD IN GENERATIVE ADVERSARIAL NETWORKS
Author: PEDRO HENRIQUE BARROSO GOMES
Contributor(s): MARCELO GATTASS - Advisor
Cataloged: 23/MAY/2024
Language(s): PORTUGUESE - BRAZIL
Type: TEXT
Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Reference(s):
[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=66801&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=66801&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.66801
Abstract:
This thesis proposes the Fast Fourier Convolution Generative Adversarial
Network (FCGAN). This novel approach employs convolutions in the frequency
domain to enable the network to operate with a channel-wide receptive field.
Due to small receptive fields, traditional convolution-based GANs struggle
to capture structural and geometric patterns. Our method uses Fast Fourier
Convolutions (FFCs), which apply Fourier Transforms to operate in the spectral
domain, so that each operation affects the input features globally. Thus, FCGAN
can generate images using information from all feature locations. This global
receptive field, however, can lead to erratic and unstable performance. We show that
employing spectral normalization and noise injection stabilizes adversarial
training. The use of spectral convolutions in convolutional networks has been
explored for tasks such as image inpainting and super-resolution. This work
focuses on their potential for image generation. Our experiments further support
the claim that Fourier features are lightweight replacements for self-attention,
allowing the network to learn global information from early layers. We present
qualitative and quantitative results to demonstrate that the proposed FCGAN
achieves results comparable to state-of-the-art approaches of similar depth
and parameter count, reaching an FID of 18.98 on CIFAR-10 and 38.71 on
STL-10, reductions of 4.98 and 1.40, respectively. Moreover, at larger image
dimensions, using FFCs instead of self-attention allows batch sizes up to
twice as large and iterations up to 26 percent faster.
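
To illustrate why an operation in the spectral domain has a channel-wide receptive field, the following is a minimal NumPy sketch of a Fourier-unit-style layer. It is not the thesis implementation: the function name `fourier_unit`, the weight layout, the ReLU placement, and the omission of normalization layers are all assumptions made for illustration. The layer applies a real 2-D FFT, mixes the stacked real and imaginary parts with a pointwise (1x1) linear map, applies a nonlinearity, and transforms back. Perturbing a single input pixel then changes outputs across the whole feature map.

```python
import numpy as np


def fourier_unit(x, weight):
    """Hypothetical Fourier-unit sketch (not the thesis code).

    x:      real array of shape (C_in, H, W)
    weight: real array of shape (2 * C_out, 2 * C_in), a pointwise map
            acting on stacked real/imaginary frequency channels
    """
    _, h, w = x.shape
    # Real 2-D FFT over the spatial axes -> complex, shape (C_in, H, W // 2 + 1).
    freq = np.fft.rfft2(x, axes=(-2, -1), norm="ortho")
    # Stack real and imaginary parts along the channel axis.
    stacked = np.concatenate([freq.real, freq.imag], axis=0)
    # Pointwise (1x1) channel mixing at every frequency, followed by ReLU.
    mixed = np.maximum(np.einsum("oc,chw->ohw", weight, stacked), 0.0)
    # Split back into real and imaginary halves and invert the transform.
    real, imag = np.split(mixed, 2, axis=0)
    return np.fft.irfft2(real + 1j * imag, s=(h, w), axes=(-2, -1), norm="ortho")


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 8))            # C_in = 2, 8x8 feature map
weight = rng.standard_normal((2 * 3, 2 * 2))  # C_out = 3 (assumed sizes)
y = fourier_unit(x, weight)

# Perturb one input pixel and measure how many output positions change.
x_perturbed = x.copy()
x_perturbed[0, 0, 0] += 1.0
y_perturbed = fourier_unit(x_perturbed, weight)
changed_fraction = np.mean(np.abs(y_perturbed - y) > 1e-9)
print(y.shape, changed_fraction)
```

A 3x3 spatial convolution would change at most nine output positions per channel under the same perturbation, whereas here every frequency coefficient depends on every input location, so the change spreads over the whole map.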