ECIR ’16 paper: “On the Reproducibility of the TAGME Entity Linking System”

Recently I presented our joint work with Krisztian Balog, entitled “On the Reproducibility of the TAGME Entity Linking System”, at the reproducibility track of ECIR 2016. The paper, slides, and resources are publicly available.
I particularly encourage the interested reader to check the authors’ comments file on our online repository, where we include the comments taken from our personal communication with the TAGME authors and additional notes on our experiments. These details may inform future efforts related to the re-implementation of TAGME.

Below are the paper abstract and the summary of lessons to be learned from this work.

Paper abstract:

“Reproducibility is a fundamental requirement of scientific research. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME, one of the most popular entity linking systems. By comparing results obtained from its public API with (re)implementations from scratch, we obtain the following findings. The results reported in the TAGME paper cannot be repeated due to the unavailability of data sources. Parts of the results are reproducible through the provided API, while the rest are not reproducible. We further show that the TAGME approach is generalizable to the task of entity linking in queries. Finally, we provide insights gained during this process and formulate lessons learned to inform future reproducibility efforts.”

Lessons to be learned:

– All technical details that affect effectiveness or efficiency should be explained (or at least mentioned) in the paper; sharing the source code helps, but finding answers in a large codebase can be highly non-trivial.

– Differences between the published approach and the public API/code should be made explicit.

– Evaluation metrics should be explained in detail.

– Keep all data sources used in a published paper, so that these can be shared upon request from other researchers.
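
To illustrate why the lesson on evaluation metrics matters, here is a minimal sketch showing that the same system output can give quite different precision values depending on how the score is averaged. The toy documents, entity labels, and function names are all hypothetical and are not taken from the paper or from TAGME; the handling of documents with no predictions is likewise an assumption.

```python
from typing import List, Set, Tuple

# One item per document: (predicted entities, gold entities).
# Hypothetical toy data, for illustration only.
Doc = Tuple[Set[str], Set[str]]


def micro_precision(docs: List[Doc]) -> float:
    """Pool all predictions across documents, then compute precision once."""
    correct = sum(len(pred & gold) for pred, gold in docs)
    predicted = sum(len(pred) for pred, _ in docs)
    return correct / predicted if predicted else 0.0


def macro_precision(docs: List[Doc]) -> float:
    """Compute precision per document, then average over documents."""
    # Assumption: a document with no predictions scores 0.0.
    # Other conventions exist, which is one more detail worth reporting.
    per_doc = [len(pred & gold) / len(pred) if pred else 0.0
               for pred, gold in docs]
    return sum(per_doc) / len(per_doc) if per_doc else 0.0


docs: List[Doc] = [
    ({"A", "B", "C", "D"}, {"A"}),  # 1 of 4 predictions correct
    ({"E"}, {"E"}),                 # 1 of 1 predictions correct
]

print(f"micro-averaged precision: {micro_precision(docs):.3f}")
print(f"macro-averaged precision: {macro_precision(docs):.3f}")
```

On this toy input the micro-averaged value is 0.400 and the macro-averaged value is 0.625, so a paper that reports only "precision" leaves a reader unable to tell which number to compare against.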