Brazil's New Method Automatically Clusters Civic Proposals
Researchers have developed a method to automatically cluster civic proposals submitted through Brazil's National Participation Platform. The approach combines a language model with seed words and an automated validation process.
The team experimented with several pre-trained language models for text representation and found that domain-specific models matter for tasks involving specialized terminology. GovBERT-BR, a model trained on Brazilian governmental data, showed promising results in topic modeling.
The pipeline uses topic modeling, specifically BERTopic, to classify and understand public proposals. Seed words drawn from a governmental vocabulary align the generated topics with existing structures, turning unstructured citizen input into actionable data for governments. Large Language Models (LLMs) help validate and label the generated topics, significantly reducing manual effort. Because the pipeline is automated, it can handle large volumes of text, which makes it suitable for large-scale participatory platforms.
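To make the role of seed words concrete, here is a toy sketch of how a small vocabulary can steer proposals toward predefined themes. It is not the researchers' pipeline: it does not use BERTopic or GovBERT-BR, plain word counts stand in for language-model embeddings, and the seed lists, the threshold, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical seed words per governmental theme (illustrative only,
# not the vocabulary used in the research).
SEED_TOPICS = {
    "health": ["hospital", "clinic", "vaccine", "doctor"],
    "education": ["school", "teacher", "university", "student"],
    "transport": ["bus", "road", "metro", "traffic"],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_topic(proposal, seed_topics=SEED_TOPICS, threshold=0.1):
    """Return the seed topic most similar to the proposal.

    Proposals whose best score falls below the (assumed) threshold
    are left "unassigned" instead of being forced into a theme.
    """
    vector = Counter(tokenize(proposal))
    scores = {
        name: cosine(vector, Counter(words))
        for name, words in seed_topics.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unassigned"


def cluster(proposals):
    """Group proposals by their assigned seed topic."""
    groups = {}
    for proposal in proposals:
        groups.setdefault(assign_topic(proposal), []).append(proposal)
    return groups


proposals = [
    "Open a new clinic and hire one more doctor",
    "Every school needs a trained teacher",
    "Extend the metro and repair the road",
    "Plant more trees in public squares",
]
clusters = cluster(proposals)
```

In this toy version a proposal with no seed-word overlap stays unassigned. A real topic model would instead group such proposals into new topics of their own. BERTopic offers guided topic modeling through its `seed_topic_list` argument; whether the researchers relied on that exact mechanism is not stated here.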
The research shows that automated topic modeling can process civic proposals at scale. Combining language models with seed words and automated validation turns unstructured citizen input into data that governments can use in their decision-making.