Document Type: Original Article
Authors
Department of Computer Engineering, Ferdowsi University of Mashhad
Abstract
Extracting organizational duties from legal documents is a critical yet challenging task, particularly in low-resource languages such as Persian. This paper presents an approach that integrates Named Entity Recognition (NER), document segmentation, and Large Language Models (LLMs) to identify and extract the duties assigned to organizations in Persian legal texts. A BERT-based NER model recognizes the relevant entities and links them to the target organizations. Documents are segmented into sentences with an enhanced POS-based tokenizer, and the segments that are contextually relevant to the detected entities are then retrieved. We then evaluate several LLM configurations, including a hierarchical approach that combines small and large models. Our experiments show that the hierarchical approach, combining 'Llama-3.1-8B' and 'gpt-4o', achieves an F1-score of 0.7901, outperforming the single-model approaches. This research underscores the potential of LLMs in legal text analysis and paves the way for more advanced Natural Language Processing tools. Future work will include testing on a broader range of organizations, refining prompt engineering techniques, and improving model interpretability.
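The pipeline summarized above (entity detection, sentence segmentation, entity-based retrieval, then a small-plus-large model combination) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: NER output is assumed to be precomputed `Entity` spans, a plain regex sentence splitter stands in for the enhanced POS-based tokenizer, an entity is assumed to link to the target organization when its surface text matches exactly, and the two models are hypothetical callables (`small_filter`, `large_extract`). The abstract does not say how the hierarchy divides the work; here we assume the small model screens segments and the large model extracts duties from the survivors. The F1 helper uses exact set matching of duty strings, which may differ from the paper's evaluation protocol.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Sentence boundary: whitespace after ".", "!", "?" or the Persian "؟", or a run of newlines.
# Assumption: a plain regex stands in for the paper's enhanced POS-based tokenizer.
_SENT_END = re.compile(r"(?<=[.!?؟])\s+|\n+")


@dataclass(frozen=True)
class Entity:
    """Character span tagged by an NER model (assumed precomputed, e.g. by a BERT tagger)."""
    start: int
    end: int
    label: str


def segment_sentences(text: str) -> List[Tuple[int, int, str]]:
    """Split text into (start, end, sentence) triples with raw character offsets."""
    segments, pos = [], 0
    for match in _SENT_END.finditer(text):
        if text[pos:match.start()].strip():
            segments.append((pos, match.start(), text[pos:match.start()].strip()))
        pos = match.end()
    if text[pos:].strip():
        segments.append((pos, len(text), text[pos:].strip()))
    return segments


def retrieve_segments(text: str, entities: Sequence[Entity], target: str,
                      window: int = 0) -> List[str]:
    """Return sentences mentioning `target` as an ORG entity, plus `window` neighbours.

    Hypothetical linking rule: an entity links to the target on an exact surface-text match.
    """
    sentences = segment_sentences(text)
    hits = set()
    for ent in entities:
        if ent.label != "ORG" or text[ent.start:ent.end] != target:
            continue
        for i, (s, e, _) in enumerate(sentences):
            if ent.start < e and ent.end > s:  # entity span overlaps sentence i
                lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
                hits.update(range(lo, hi))
    return [sentences[i][2] for i in sorted(hits)]


def hierarchical_extract(segments: Sequence[str],
                         small_filter: Callable[[str], bool],
                         large_extract: Callable[[str], List[str]]) -> List[str]:
    """Assumed cascade: a small model screens segments, a large model extracts duties."""
    duties: List[str] = []
    for seg in segments:
        if small_filter(seg):                  # cheap call on every retrieved segment
            duties.extend(large_extract(seg))  # expensive call only on survivors
    return duties


def f1_score(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Exact-match F1 over sets of duty strings."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = ("The Ministry of Health shall publish annual reports. Weather was mild. "
           "The Ministry of Health must license clinics.")
    first = doc.index("Ministry of Health")
    second = doc.index("Ministry of Health", first + 1)
    ents = [Entity(first, first + 18, "ORG"), Entity(second, second + 18, "ORG")]
    segs = retrieve_segments(doc, ents, "Ministry of Health")
    # Toy stand-ins for the two LLMs (hypothetical; a real system would prompt the models).
    duties = hierarchical_extract(
        segs,
        small_filter=lambda s: "shall" in s or "must" in s,
        large_extract=lambda s: [s.rstrip(".").split("Health ", 1)[1]],
    )
    print(duties)
    print(f1_score(duties, duties + ["must audit hospitals"]))
```

Running the script prints the two toy duties and an F1 of about 0.8 (two of three gold duties recovered, no false positives). Screening with the small model first means the large model is called only on segments likely to contain duties, which keeps the expensive calls to a minimum; whether this matches the paper's exact hierarchy is not stated in the abstract.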
Keywords
- NLP
- Large Language Models
- Few-shot Learning
- Duty Extraction
- Document Segmentation
- Legal Informatics