
Extended Overview of the CLEF-2023 LongEval Lab on Longitudinal Evaluation of Model Performance

Rabab Alkhalifa*, Iman Bilal, Hsuvas Borkakoty, Jose Camacho-Collados, Romain Deveaud*, Alaa El-Ebshihy, Luis Espinosa-Anke, Gabriela Gonzalez-Saez, Petra Galuščáková, Lorraine Goeuriot, Elena Kochkina, Maria Liakata, Daniel Loureiro, Philippe Mulhem, Florina Piroi, Martin Popel, Christophe Servan, Harish Tayyar Madabushi, Arkaitz Zubiaga

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

We describe the first edition of the LongEval CLEF 2023 shared task. This lab evaluates the temporal persistence of Information Retrieval (IR) systems and Text Classifiers. Task 1 requires IR systems to run on corpora acquired at several timestamps, and evaluates the drop in system quality (NDCG) along these timestamps. Task 2 tackles binary sentiment classification at different points in time, and evaluates the performance drop for different temporal gaps. Overall, 37 teams registered for Task 1 and 25 for Task 2. Ultimately, 14 and 4 teams participated in Task 1 and Task 2, respectively.
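To make the Task 1 metric concrete: the quality drop across timestamps is a difference of NDCG scores between a run on the training-period corpus and a run on a later snapshot. The following is a minimal sketch of that computation, not the lab's official scorer; the relevance lists and the cut-off `k=10` are illustrative assumptions.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain over a ranked list of graded relevances.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    # NDCG@k: DCG of the system ranking, normalised by the ideal ranking.
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_relevances[:k]) / ideal_dcg

# Hypothetical graded judgements for one query at two collection snapshots.
within_time = [3, 2, 3, 0, 1]  # system run on the training-period corpus
long_term = [1, 0, 3, 2, 0]    # same system on a later snapshot
drop = ndcg(within_time) - ndcg(long_term)
```

A positive `drop` indicates the system degraded on the later snapshot, which is the temporal-persistence signal Task 1 measures.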

Original language: English
Pages (from-to): 2181-2203
Number of pages: 23
Journal: CEUR Workshop Proceedings
Volume: 3497
State: Published - 2023
Event: 24th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF-WN 2023 - Thessaloniki, Greece
Duration: 18 Sep 2023 - 21 Sep 2023

Keywords

  • Evaluation
  • Temporal Generalisability
  • Temporal Persistence
