
Extended overview of the CLEF 2024 LongEval Lab on Longitudinal Evaluation of Model Performance

Rabab Alkhalifa*, Hsuvas Borkakoty, Romain Deveaud, Alaa El-Ebshihy, Luis Espinosa-Anke, Tobias Fink, Petra Galuščáková, Gabriela Gonzalez-Saez, Lorraine Goeuriot, David Iommi, Maria Liakata, Harish Tayyar Madabushi, Pablo Medina-Alias, Philippe Mulhem, Florina Piroi, Martin Popel, Arkaitz Zubiaga

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

We describe the second edition of the LongEval CLEF 2024 shared task. This lab evaluates the temporal persistence of Information Retrieval (IR) systems and text classifiers. Task 1 requires IR systems to run on corpora acquired at several timestamps and evaluates the drop in system quality (nDCG) across these timestamps. Task 2 tackles binary sentiment classification at different points in time and evaluates the performance drop for different temporal gaps. Overall, 37 teams registered for Task 1 and 25 for Task 2; ultimately, 14 and 4 teams participated in Task 1 and Task 2, respectively.
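
To make the Task 1 evaluation concrete, below is a minimal sketch of measuring an nDCG drop across corpus snapshots. The relevance lists, timestamp labels, and cutoff k=10 are illustrative assumptions, not the lab's actual data or official evaluation script; the nDCG formula itself is the standard one.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    """nDCG@k: DCG of the system ranking normalised by the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical relevance judgments for one system at three corpus
# snapshots (placeholder values, not LongEval data).
runs = {
    "t0 (train)": [3, 2, 3, 0, 1, 2],
    "t1 (short gap)": [2, 3, 0, 1, 2, 0],
    "t2 (long gap)": [1, 0, 2, 0, 1, 3],
}

base = ndcg(runs["t0 (train)"])
for timestamp, rels in runs.items():
    score = ndcg(rels)
    drop = (base - score) / base  # relative drop w.r.t. the earliest snapshot
    print(f"{timestamp}: nDCG@10 = {score:.3f}, relative drop = {drop:+.1%}")
```

The same pattern applies to Task 2 by swapping nDCG for a classification metric and comparing scores across temporal gaps.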

Original language: English
Pages (from-to): 2267-2289
Number of pages: 23
Journal: CEUR Workshop Proceedings
Volume: 3740
State: Published - 2024
Event: 25th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2024 - Grenoble, France
Duration: 9 Sep 2024 – 12 Sep 2024

Keywords

  • Evaluation
  • Information Retrieval
  • Temporal Generalisability
  • Temporal Persistence
  • Text Classification
