ETDs

Title:

FCGAN: SPECTRAL CONVOLUTIONS VIA FFT FOR CHANNEL-WIDE RECEPTIVE FIELD IN GENERATIVE ADVERSARIAL NETWORKS

Author:

PEDRO HENRIQUE BARROSO GOMES

Contributor(s):

MARCELO GATTASS - Advisor

Cataloged:

23/MAY/2024

Language(s):

PORTUGUESE - BRAZIL

Type:

TEXT

Subtype:

THESIS

Notes:

All data contained in the documents are the sole responsibility of their authors. The data used in the document descriptions conform to the systems of the PUC-Rio administration.

Reference(s):

[pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=66801&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=66801&idi=2

DOI:

https://doi.org/10.17771/PUCRio.acad.66801

Abstract:

This thesis proposes the Fast Fourier Convolution Generative Adversarial Network (FCGAN), a novel approach that applies convolutions in the frequency domain so that the network operates with a channel-wide receptive field. Because their receptive fields are small, traditional convolution-based GANs struggle to capture structural and geometric patterns. Our method relies on Fast Fourier Convolutions (FFCs), which apply Fourier transforms to operate in the spectral domain and therefore affect the input features globally. As a result, FCGAN can generate images while considering information from all feature locations. This global reach, however, can make performance erratic and unstable. We show that spectral normalization and noise injection stabilize adversarial training. Spectral convolutions in convolutional networks have previously been explored for tasks such as image inpainting and super-resolution; this work focuses on their potential for image generation. Our experiments further support the claim that Fourier features are lightweight replacements for self-attention, allowing the network to learn global information from its early layers. We present qualitative and quantitative results demonstrating that the proposed FCGAN achieves results comparable to state-of-the-art approaches of similar depth and parameter count, reaching an FID of 18.98 on CIFAR-10 and 38.71 on STL-10, reductions of 4.98 and 1.40, respectively. Moreover, at larger image dimensions, using FFCs instead of self-attention allows for batch sizes up to twice as large and iterations up to 26 percent faster.
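
To illustrate why a frequency-domain convolution has a channel-wide receptive field, the following is a minimal NumPy sketch of a spectral transform in the general style of an FFC: a real 2D FFT, a pointwise (1x1) channel mix applied to the stacked real and imaginary parts, a ReLU, and an inverse FFT. It is not the thesis's implementation. The function name `spectral_transform`, the weight layout, and the use of orthonormal FFT scaling are assumptions made only for this example.

```python
import numpy as np


def spectral_transform(x, weight):
    """Hypothetical sketch of a frequency-domain pointwise convolution.

    x:      real feature map of shape (C, H, W)
    weight: mixing matrix of shape (2 * C_out, 2 * C), playing the role
            of a 1x1 convolution over stacked real/imaginary channels
    Returns a real feature map of shape (C_out, H, W).
    """
    c, h, w = x.shape
    # Real 2D FFT over the spatial axes: (C, H, W // 2 + 1), complex.
    freq = np.fft.rfft2(x, norm="ortho")
    # Stack real and imaginary parts along the channel axis: (2C, H, W // 2 + 1).
    stacked = np.concatenate([freq.real, freq.imag], axis=0)
    # Pointwise channel mixing at every frequency bin (a 1x1 convolution).
    mixed = np.einsum("oc,chw->ohw", weight, stacked)
    # ReLU applied in the frequency domain.
    mixed = np.maximum(mixed, 0.0)
    # Split back into real and imaginary halves and rebuild complex values.
    c_out = weight.shape[0] // 2
    out_freq = mixed[:c_out] + 1j * mixed[c_out:]
    # Inverse real FFT back to the original spatial size.
    return np.fft.irfft2(out_freq, s=(h, w), norm="ortho")


def changed_fraction(x, weight, eps=1e-9):
    """Fraction of output positions that change when one input pixel changes."""
    perturbed = x.copy()
    perturbed[0, 0, 0] += 1.0  # perturb a single spatial location
    diff = spectral_transform(perturbed, weight) - spectral_transform(x, weight)
    return float(np.mean(np.abs(diff) > eps))


rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 8))   # C=3, H=W=8
mixing = rng.standard_normal((4, 6))        # C_out=2, so shape (2*2, 2*3)
output = spectral_transform(features, mixing)
fraction = changed_fraction(features, mixing)
```

Every frequency bin depends on every spatial position of its channel, so a single-pixel change in the input propagates to essentially all output positions after one layer. A 3x3 spatial convolution, by contrast, would change at most nine positions per output channel.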

Description:			File:
COMPLETE			PDF