Title: ASSESSMENT OF FINE-TUNING ON END-TO-END SPEECH RECOGNITION MODELS
Author: JONATAS DOS SANTOS GROSMAN
Contributor(s): HELIO CORTES VIEIRA LOPES - Advisor
Cataloged: 04/NOV/2022 | Language(s): ENGLISH - UNITED STATES
Type: TEXT | Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Reference(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=61086&idi=1 [en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=61086&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.61086
Abstract:
Using representations given by a large pre-trained model has become
the primary strategy to reach the state of the art in a wide variety of tasks. A
recently proposed large pre-trained model, wav2vec 2.0, was seminal for several
other works on pre-training large models on speech data. Many models are
being pre-trained using the same transformer-based architecture as wav2vec
2.0 and are achieving state-of-the-art results in various speech-related tasks. However,
few works have analyzed these models further in different fine-tuning
scenarios. Our work investigates these models with respect to two different
aspects. The first is the cross-lingual transferability of these models. Our
experiments showed that the amount of data used during the pre-training of
these models is not as crucial to transferability as its diversity. We noticed
that the evaluated models perform better on Indo-European languages than on
non-Indo-European languages. We observed a positive cross-lingual
transfer of knowledge using monolingual models in all the languages we used,
and it was more evident when the language used
during pre-training was more similar to the downstream task language. The
second aspect we investigated in our work is how well these models perform
in data imbalance scenarios, where there is a more representative subset in
the fine-tuning dataset. Our results showed that data imbalance in fine-tuning
generally affects the final result of the models, with better performance in
the most representative subsets. However, greater variability in the training
set favors model performance for a more representative subset. Nevertheless,
this greater variability in the data did not favor languages not seen during
training. We also observed that the models seem more robust in dealing with
gender imbalance than age or accent. With these findings, we hope to help the
scientific community in the use of existing pre-trained models, as well as assist
in the pre-training of new models.