Exploring Open Information Extraction for Portuguese Using Large Language Models

Student name


Bruno Souza Cabral


Thesis title


Exploring Open Information Extraction for Portuguese Using Large Language Models


Abstract


Open Information Extraction (OpenIE) is an area of computer science concerned with extracting structured information from unstructured text in an unsupervised fashion, without requiring predefined relations. The extracted information supports language-understanding tasks such as knowledge base population, link prediction, and text comprehension. Extracting OpenIE relations from Portuguese text is challenging, primarily because of the language's highly inflected nature, rich grammar, and numerous linguistic peculiarities. Although many OpenIE studies target English, few have applied Deep Learning methods to Portuguese. Recently, a new branch of Deep Learning research, generative information extraction, has emerged as a fruitful approach to various sequence labeling problems. In contrast to sequence labeling methods, generative techniques take a sentence as input and autoregressively generate multi-structured semantic representations of the information it conveys. Portuguese appears in only a limited number of previous OpenIE works, and most Deep Learning approaches that include it target multilingual tasks, treating Portuguese as just another dataset during training. Furthermore, most training datasets for Portuguese are automatically translated from English sources. This thesis investigates generative and sequence labeling methods for the automated extraction of OpenIE relations from Portuguese texts. The study proposes building both kinds of models, training them on Portuguese data, and comparing their performance in extracting OpenIE relations from Portuguese text. This analysis contributes to the growing body of literature on Deep Learning techniques for OpenIE in Portuguese and lays a foundation for further advances in this research field.




Daniela Barreiro Claro




Marlo Souza


External member 1


Gabriel Stanovsky


Link to Lattes CV




Internal member 1


Tatiane Rios


Link to Lattes CV




Alternate for the external member


Aline Paes


Link to Lattes CV




Alternate for the internal member


Ricardo Rios


Link to Lattes CV




Exam date


13 Sep, 2023


Exam time


8:00 AM



Defense date:
13/09/2023 - 08:00
Defense type:
Doctoral Qualifying Exam