
Urdu-NERD: Urdu named entity recognition with BiGRU-based deep learning architecture

  • University of Management and Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) that identifies and extracts entities such as names, locations, organizations, and other specific labels from unstructured text. It plays a crucial role in NLP applications including information retrieval, question answering, and sentiment analysis. While NER systems have been developed extensively for English, adapting them to languages such as Urdu poses unique challenges because of linguistic differences and the scarcity of annotated data. In this research, we enhance data diversity and accessibility for Urdu NER by introducing the ZUNERA corpus, the most extensive Urdu NER dataset to date, comprising 1,189,614 tokens and 89,804 named entities classified into twenty-three named entity types. We annotate the corpus meticulously, provide clear guidelines, and use the Kappa coefficient to ensure high-quality annotations. Furthermore, we propose the Urdu-Named Entity Recognition with BiGRU-based Deep Learning Architecture (NERD) framework, which facilitates efficient entity recognition in Urdu text. The proposed framework achieves an F1-score of 94.6%. A comparison of ZUNERA with the MK-PUCIT dataset underscores its robustness for accurate entity recognition. Although this study centers on Urdu, the proposed NER framework and annotation pipeline are designed to be language-agnostic and can be extended to other morphologically rich or low-resource languages, providing a replicable foundation for future cross-lingual research. Overall, our contributions advance Urdu NER research by providing a comprehensive dataset, evaluating state-of-the-art techniques, and introducing a novel framework for efficient Urdu entity recognition.
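The abstract mentions the Kappa coefficient as the annotation quality check but does not say which variant was used or how it was applied. As an illustration only, the following minimal sketch computes Cohen's kappa for two annotators over token-level labels. The function name `cohen_kappa`, the tag names, and the sample labels are hypothetical and are not taken from the ZUNERA corpus or the authors' pipeline.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same tokens.

    Assumption: both sequences are aligned token by token. This is a
    generic agreement measure, not the authors' exact procedure.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)
    # Observed agreement: fraction of tokens given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six tokens (illustrative tag set).
annotator_a = ["PER", "O", "LOC", "O", "ORG", "O"]
annotator_b = ["PER", "O", "LOC", "O", "O", "O"]
print(round(cohen_kappa(annotator_a, annotator_b), 3))  # 0.727
```

In this toy case the annotators agree on five of six tokens (observed agreement 5/6), chance agreement is 14/36, and kappa works out to 8/11. With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual alternative.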

Original language: English
Article number: e3678
Journal: PeerJ Computer Science
Volume: 12
State: Published - 2026

Keywords

  • Asian languages
  • Low-resource languages
  • Named entity recognition
  • Urdu
  • Word embedding
