Exploring Open Information Extraction for Portuguese Using Large Language Models

Student name

 

Bruno Souza Cabral

 

Thesis title

 

Exploring Open Information Extraction for Portuguese Using Large Language Models

 

Abstract

 

Open Information Extraction (OpenIE) is an area of computer science concerned with extracting structured information from unstructured text in an unsupervised fashion, without requiring predefined relations. The extracted information supports language-understanding tasks such as knowledge base population, link prediction, and text comprehension. Extracting OpenIE relations from Portuguese text poses substantial challenges, primarily because the language is highly inflected, has a rich grammar, and exhibits numerous linguistic peculiarities. Although many OpenIE studies target English, few have addressed Portuguese with Deep Learning methods. Recently, a new branch of Deep Learning research, generative information extraction, has emerged as a fruitful approach to various sequence labeling problems. In contrast to sequence labeling methods, generative techniques take a sentence as input and autoregressively generate multi-structured semantic representations of the information it conveys. Portuguese appears in only a limited number of previous OpenIE works, and most Deep Learning approaches primarily target multilingual tasks, treating Portuguese as just another dataset during training. Furthermore, most training datasets for Portuguese are automatically translated from English sources. This thesis investigates generative and sequence labeling methods for the automated extraction of OpenIE relations from Portuguese text. The study proposes building models of both kinds, training them on Portuguese data, and comparing their performance in extracting OpenIE relations from Portuguese text. This analysis contributes to the growing body of literature on Deep Learning techniques for OpenIE in Portuguese and lays the foundation for further advances in this research field.

 

Advisor

 

Daniela Barreiro Claro

 

Co-advisor

 

Marlo Souza

 

External member 1

 

Gabriel Stanovsky

 

Link to Lattes CV

 

https://scholar.google.co.il/citations?user=AtkvBFYAAAAJ&hl=en

 

Internal member 1

 

Tatiane Rios

 

Link to Lattes CV

 

http://lattes.cnpq.br/0851148137941240

 

Alternate for the external member

 

Aline Paes

 

Link to Lattes CV

 

http://lattes.cnpq.br/0506389215528790

 

Alternate for the internal member

 

Ricardo Rios

 

Link to Lattes CV

 

http://lattes.cnpq.br/0427387583450747

 

Exam date

 

13 Sep, 2023

 

Exam time

 

8:00 AM

 

 

Defense date:
13/09/2023 - 08:00
Defense type:
Doctoral Qualifying Exam