Submission Guidelines
Participants are required to submit a paper on their method and experimental results.
Paper Submissions Format
Participants are required to submit a paper describing their methods, experimental results, and discussions to the COLIEE 2025 submission page. Following a peer review process, accepted papers must be presented (in person or online) at the COLIEE workshop of ICAIL 2025.
Papers should not exceed 10 pages (inclusive of references) and should conform to the standards set out on the ICAIL 2025 CFP page, except for the copyright description (you may simply delete the copyright description, or claim copyright of your paper yourself). As post-proceedings, we plan to publish selected papers, after extension and another round of reviews, in a special issue of The Review of Socionetwork Strategies, published by Springer Nature.
Results Submission Format
All the run results should be submitted following the instructions specified for each task in the sections below.
In addition, each run result should be accompanied by a text file briefly explaining the system used in the run. The explanation should comply with the following format; in the description, please specify the task ID and run tag. The "run tag" is defined in the sections below (Tasks 1, 2, 3, 4, Pilot).
Task: [1,2,3,4,pilot]
# Choose the one task you are submitting to
Run tag: [Your run tag here]
- Machine learning models:
# If you use any machine learning models such as pretrained LLM,
# please provide name(s) and URL(s) for the model. For the Task 3
# and Task 4 participants, please provide the model update date that
# can be identified from the URL(s). It should be before July 9, 2024 (JST).
- External resources:
# If you use any external resources such as Wikipedia, legal
# documents, please provide name(s) and URL(s) for the resources.
- Explanation:
# Please provide a brief explanation of the system (1-3 lines); it will be
# used when describing your submission in the overview paper.
# If you have more than one run, repeat the above as many times as necessary.
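For reference, a filled-in description for a single run might look like the following. All model names, resource names, and URLs below are placeholders, not recommendations:

```text
Task: 2
Run tag: univABC01
- Machine learning models:
# example-legal-bert (https://example.com/models/legal-bert)
- External resources:
# Civil Code translation (https://example.com/civil-code)
- Explanation:
# A BERT-based cross-encoder reranks paragraph candidates
# retrieved by BM25. (illustrative description)
```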
Submit runs (tasks 1-4) and system descriptions (tasks 1-4, and pilot) to: coliee_participation@nii.ac.jp
Upon submission, the subject of the mail should be of the form "[submission] YOUR_GROUP_ID".
Task 1
For Task 1, a submission should consist of a single ASCII text file. Use a single space to separate columns, with three columns per line as follows:
000001 000018 univABC
000001 000045 univABC
000001 000130 univABC
000002 000433 univABC
...
where:
- The first column is the query file name.
- The second column is the official case number of the retrieved case.
- The third column is called the "run tag" and should be a unique identifier for the submitting group, i.e., each run should have a different tag that identifies the group. Please restrict run tags to 12 or fewer letters and numbers, with no punctuation.
At most three runs per group will be assessed.
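As a sanity check before mailing a run file, the three-column layout above (which Task 2 also uses) can be validated with a short script. This is an unofficial sketch; the function name is our own:

```python
import re

# Run tags: 12 or fewer letters and numbers, no punctuation
RUN_TAG_RE = re.compile(r"^[A-Za-z0-9]{1,12}$")

def validate_run_lines(lines):
    """Return error messages for lines violating the three-column,
    single-space-separated submission format."""
    errors = []
    for i, line in enumerate(lines, 1):
        cols = line.rstrip("\n").split(" ")
        if len(cols) != 3:
            errors.append(f"line {i}: expected 3 columns, got {len(cols)}")
        elif not RUN_TAG_RE.fullmatch(cols[2]):
            errors.append(f"line {i}: invalid run tag {cols[2]!r}")
    return errors
```

For example, `validate_run_lines(["000001 000018 univABC"])` returns an empty list, while a tag containing punctuation is reported as an error.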
Task 2
For Task 2, a submission should consist of a single ASCII text file. Use a single space to separate columns, with three columns per line as follows:
001 013 univABC
002 037 univABC
002 002 univABC
003 008 univABC
...
where:
- The first column is the query id.
- The second column is the number of the paragraph that entails the decision.
- The third column is called the "run tag" and should be a unique identifier for the submitting group, i.e., each run should have a different tag that identifies the group. Please restrict run tags to 12 or fewer letters and numbers, with no punctuation.
At most three runs per group will be assessed.
Task 3
The submission format for Task 3 is the TREC format used by the trec_eval program. Use a single space to separate columns, with six columns per line as follows:
H21-5-3 Q0 213 1 0.8 univABC
where:
- The first column is the query id.
- The second column is the "iter" field for trec_eval; it is not used in the evaluation and its content will be ignored, but please write Q0 in this column.
- The third column is the official article number of the retrieved article.
- The fourth column is the rank of the retrieved article.
- The fifth column is the similarity value (a float) of the retrieved article.
- The sixth column is called the "run tag" and should be a unique identifier for the submitting group, i.e., each run should have a different tag that identifies the group. Please restrict run tags to 12 or fewer letters and numbers, with no punctuation.
We also ask that participants provide ranked lists for all submissions. The submission format is the same as for the official submission, and the maximum number of documents per query is limited to 100. This information will be used to calculate rank-based information retrieval measures, such as Mean Average Precision and R-precision, to discuss the characteristics of the submitted results. To distinguish these two types of submissions, please add the suffix "-L" to the ranked-list file (e.g., if univABC is the file for the submission with a limited number of candidates, please use univABC-L for the submission with the larger number of candidates).
At most three runs per group will be assessed.
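A run file in this six-column TREC format can be generated from per-query similarity scores as sketched below. The function name is ours, and the query ID, article numbers, and scores are illustrative:

```python
def format_trec_run(query_id, scored_articles, run_tag, max_docs=100):
    """Return TREC-format lines ("qid Q0 article rank score run_tag"),
    ranked by descending similarity score, capped at max_docs."""
    ranked = sorted(scored_articles.items(), key=lambda kv: kv[1], reverse=True)
    return [
        f"{query_id} Q0 {article} {rank} {score} {run_tag}"
        for rank, (article, score) in enumerate(ranked[:max_docs], start=1)
    ]
```

For instance, `format_trec_run("H21-5-3", {"213": 0.8, "5": 0.3}, "univABC")` yields "H21-5-3 Q0 213 1 0.8 univABC" as its first line.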
Task 4
For Task 4, a submission should again consist of a single ASCII text file. Use a single space to separate columns, with three columns per line as follows:
H18-1-2 Y univABC
H18-5-A N univABC
H19-19-I Y univABC
H21-5-3 N univABC
...
where:
- The first column is the query id.
- The second column is "Y" or "N", indicating whether the Y/N question was confirmed to be true ("Y") or false ("N") by the relevant articles.
- The third column is called the "run tag" and should be a unique identifier for the submitting group, i.e., each run should have a different tag that identifies the group. Please restrict run tags to 12 or fewer letters and numbers, with no punctuation.
At most three runs per group will be assessed.
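Given per-query yes/no predictions, the three-column Task 4 lines can be produced as sketched below; the function name is our own:

```python
def format_task4_lines(predictions, run_tag):
    """predictions: mapping of query id -> bool (True means answer "Y").
    Returns one three-column submission line per query, sorted by query id."""
    return [
        f"{query_id} {'Y' if answer else 'N'} {run_tag}"
        for query_id, answer in sorted(predictions.items())
    ]
```

For example, `format_task4_lines({"H18-1-2": True, "H18-5-A": False}, "univABC")` returns the lines "H18-1-2 Y univABC" and "H18-5-A N univABC".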
Participants are also required to submit answers and evaluation results for the formal run settings of the past three years, i.e., using each of the past three years' datasets (H30, R01, R02) as a test dataset, with the older years' datasets (-H29, -H30, -R01, respectively) as training datasets.
If this is not realistic, e.g., due to the training time, please consult the task organizers.
In your submission, please add the dataset name as a prefix to the original file name:
R06.task4.YOURID for TestData_{jp,en}.xml (final submission)
R02.task4.YOURID for riteval_R02_{jp,en}.xml
R01.task4.YOURID for riteval_R01_{jp,en}.xml
H30.task4.YOURID for riteval_H30_{jp,en}.xml
We request results on these specific three datasets (H30, R01, and R02) rather than using the most recent three years' data to ensure continuity in evaluation each year. Since COLIEE 2022, all submitted models have been evaluated using H30, R01, and R02 as test datasets, allowing for a consistent benchmark. This enables objective comparisons between AI models across different years, ensuring that improvements over time can be fairly assessed. Using a fixed benchmark dataset instead of rolling the test years also enhances reliability in performance tracking and mitigates variations caused by dataset shifts in newer data.
Pilot Task (LJPJT25)
To submit your results, you have to implement a solver, which will upload your results to the leaderboard and return metrics. Please read the readme file (EN/JA) of the template repository for the detailed steps.
If you register to join the pilot task and submit your pilot dataset memorandum, we will send you an API key required to use the leaderboard on 15 Feb 2025 via email.
Usage of the API key and leaderboard
- Confirm that you have received the API key by the end of 16 Feb 2025; if not, please contact us.
- Please set up the config file following the instructions.
- TEAM_NAME, AFFILIATION, and API-KEY should be consistent with the ones that appear in the API notification mail. SYSTEM_NAME is equivalent to the "run tag" in Tasks 1-4; you can define it yourself for each run.
- If you submit your results to the leaderboard multiple times with the same SYSTEM_NAME, the last submission will overwrite the registered score.
- You can submit up to 20 runs. Note that submissions with MODE = development are also counted toward this limit.
- By 15 Mar 2025, you should choose at most three runs from your visible runs on the leaderboard and submit system descriptions for them to coliee_participation@nii.ac.jp
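A config file might look roughly like the following; the exact keys and syntax are defined in the template repository, and all values here are placeholders:

```text
TEAM_NAME: univABC            # as in the API notification mail
AFFILIATION: ABC University   # as in the API notification mail
API-KEY: <issued by the organizers on 15 Feb 2025>
SYSTEM_NAME: univABC-run1     # your "run tag"; choose one per run
MODE: development             # development runs also count toward the 20-run limit
```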
- We regard scores with system descriptions as official submissions.