ETDs @PUC-Rio
Title: CONTINUOUS SPEECH RECOGNITION FOR THE PORTUGUESE USING HIDDEN MARKOV MODELS
Author: SIDNEY CERQUEIRA BISPO DOS SANTOS
Contributor(s): ABRAHAM ALCAIM - Advisor
Cataloged: 24/MAY/2006 Language(s): PORTUGUESE - BRAZIL
Type: TEXT Subtype: THESIS
Notes: All data contained in the documents are the sole responsibility of the authors. The data used in the descriptions of the documents are in conformity with the systems of the administration of PUC-Rio.
Reference(s): [pt] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=8372&idi=1
[en] https://www.maxwell.vrac.puc-rio.br/projetosEspeciais/ETDs/consultas/conteudo.php?strSecao=resultado&nrSeq=8372&idi=2
DOI: https://doi.org/10.17771/PUCRio.acad.8372
Abstract:
This work presents several contributions to the improvement of CDHMM-based Continuous Speech Recognition (CSR) systems. Most of these contributions are specific to the Portuguese language. Two reduced sets of phonetic units, based on the characteristics of the Portuguese language, are proposed. Several initialization procedures are analyzed, and an efficient and fast method of model initialization is proposed. Methods are described for segmenting sentences and for concatenating units to form word and sentence models. An efficient training algorithm for the reduced sets of units is then proposed. Simulation results show that the performance of the two sets is comparable when bigrams are used. The number of units in these sets is significantly smaller than the number of diphones or triphones, which are widely used sets of context-dependent units. The performance of continuous speech recognizers depends strongly on the speech features. For this reason, a comparative evaluation of several feature sets for the Portuguese language is carried out. The PLP coefficients with their first and second derivatives yielded the best results. A Continuous Speech Recognition system that uses syntactic knowledge of the Portuguese language is proposed. This system makes use of task-dependent knowledge for automatic dial-up telephone calls. The recognition system allows parsing of digits as well as natural numbers. This is a user-friendly feature that gives the caller a large degree of freedom in placing a call. Based on the finite state machine proposed for the implementation of the speech recognizer described in this thesis, two parsing algorithms are analyzed: Level Building and One Pass. A new algorithm is then proposed, which is more efficient than the other two techniques. The proposed scheme is better suited to the use of syntactic and task-dependent knowledge sources.
A further contribution of this thesis concerns the use of syllables as phonetic units in Portuguese-based CSR systems. Speaker-dependent and speaker-independent tasks are examined. It is shown that syllables provide good results when used as phonetic units in Portuguese-based CSR systems, in contrast with their poor performance in English-based recognition schemes. Finally, the influence of function words on Portuguese-based speech recognition systems is analyzed. Although function words play a critical role in English-based CSR, this was found not to be true for the Portuguese language.
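The abstract reports that PLP coefficients augmented with their first and second derivatives gave the best results, but it does not say how the derivatives were computed. As an illustration only, the sketch below uses the linear-regression formula that is common in speech front ends, d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2). The function names `delta` and `add_derivatives`, the window of 2 frames, and the edge handling (repeating the first and last frame) are assumptions made for this example, not details taken from the thesis.

```python
def delta(frames, window=2):
    """Regression-based time derivative of a sequence of feature vectors.

    frames: list of equal-length lists of floats, one list per frame.
    window: frames used on each side of the regression (assumed value).
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                # At the edges, reuse the first or last frame (assumption).
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_derivatives(frames, window=2):
    """Append first and second derivatives to each static feature vector."""
    d1 = delta(frames, window)   # first derivative of the static features
    d2 = delta(d1, window)       # second derivative: derivative of d1
    return [s + a + b for s, a, b in zip(frames, d1, d2)]
```

With static vectors of dimension D per frame, `add_derivatives` returns vectors of dimension 3D, which is the usual way such augmented features are passed to an HMM-based recognizer.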
Description: File:
COVER, ACKNOWLEDGEMENTS, RESUMO, ABSTRACT, SUMMARY AND LISTS PDF    
CHAPTER 1 AND REFERENCES PDF    
CHAPTER 2 AND REFERENCES PDF    
CHAPTER 3 AND REFERENCES PDF    
CHAPTER 4 AND REFERENCES PDF    
CHAPTER 5 AND REFERENCES PDF    
CHAPTER 6 AND REFERENCES PDF    
CHAPTER 7 AND REFERENCES PDF    
CHAPTER 8 AND REFERENCES PDF    
APPENDIX PDF