Maxwell Para Simples Indexação

Título

[pt] CLASSES DE PALAVRAS - DA GRÉCIA ANTIGA AO GOOGLE: UM ESTUDO MOTIVADO PELA CONVERSÃO DE TAGSETS

Título

[en] PART OF SPEECH - FROM ANCIENT GREECE TO GOOGLE: A STUDY MOTIVATED BY TAGSET CONVERSION

Autor

[pt] LUIZA FRIZZO TRUGO

Vocabulário

[pt] LINGUISTICA COMPUTACIONAL

Vocabulário

[pt] PARTICIPIO

Vocabulário

[pt] ANOTACAO

Vocabulário

[pt] CORPUS

Vocabulário

[pt] CLASSE DE PALAVRAS

Vocabulário

[en] COMPUTATIONAL LINGUISTICS

Vocabulário

[en] PARTICIPLE

Vocabulário

[en] ANNOTATION

Vocabulário

[en] CORPORA

Vocabulário

[en] PART OF SPEECH

Resumo

[pt] A dissertação Classes de palavras — da Grécia Antiga ao Google: um estudo motivado pela conversão de tagsets consiste em um estudo linguístico sobre classes gramaticais. A pesquisa tem como motivação uma tarefa específica da Linguística Computacional: a anotação de classes gramaticais (POS, do inglês part of speech ). Especificamente, a dissertação relata desafios e opções linguísticas decorrentes da tarefa de alinhamento entre dois tagsets: o tagset utilizado na anotação do corpus Mac-Morpho, um corpus brasileiro de 1.1 milhão de palavras, e o tagset proposto por uma equipe dos laboratórios Google e que vem sendo utilizado no âmbito do projeto Universal Dependencies (UD). A dissertação tem como metodologia a investigação por meio da anotação de grandes corpora e tematiza sobretudo o alinhamento entre as formas participiais. Como resultado, além do estudo e da documentação das opções linguísticas, a presente pesquisa também propiciou um cenário que viabiliza o estudo do impacto de diferentes tagsets em sistemas de Processamento de Linguagem Natural (PLN) e possibilitou a criação e a disponibilização de mais um recurso para a área de processamento de linguagem natural do português: o corpus Mac-Morpho anotado com o tagset e a filosofia de anotação do projeto UD, viabilizando assim estudos futuros sobre o impacto de diferentes tagsets no processamento automático de uma língua.

Resumo

[en] The present dissertation, Part of speech — from Ancient Greece to Google: a study motivated by tagset conversion, is a linguistic study regarding gramatical word classes. This research is motivated by a specific task from Computational Linguistics: the annotation of part of speech (POS). Specifically, this dissertation reports the challenges and linguistic options arising from the task of aligning two tagsets: the first used in the annotation of the Mac-Morpho corpus — a Brazilian corpus with 1.1 million words — and the second proposed by Google research lab, which has been used in the context of the Universal Dependencies (UD) project. The present work adopts the annotation of large corpora as methodology and focuses mainly on the alignment of the past participle forms. As a result, in addition to the study and the documentation of the linguistic choices, this research provides a scenario which enables the study of the impact different tagsets have on Natural Language Processing (NLP) systems and presents another Portuguese NLP resource: the Mac-Morpho corpus annotated with project UD s tagset and consistent with its annotation philosophy, thus enabling future studies regarding the impact of different tagsets in the automatic processing of a language.

Orientador(es)

MARIA CLAUDIA DE FREITAS

Banca

HELENA FRANCO MARTINS

Banca

MARIA CLAUDIA DE FREITAS

Banca

SANDRA MARIA ALUÍSIO

Catalogação

2016-11-10

Apresentação

2016-08-25

Tipo

[pt] TEXTO

Formato

application/pdf

Idioma(s)

PORTUGUÊS

Referência [pt]

https://www.maxwell.vrac.puc-rio.br/colecao.php?strSecao=resultado&nrSeq=27933@1

Referência [en]

https://www.maxwell.vrac.puc-rio.br/colecao.php?strSecao=resultado&nrSeq=27933@2

Referência DOI

https://doi.org/10.17771/PUCRio.acad.27933

Arquivos do conteúdo

NA ÍNTEGRA PDF