
Task 2 focuses on cross-domain settings: the test dataset will be our newly acquired activity recognition dataset (to be provided on 22/May/2025). The domain in which this dataset was collected differs slightly from that of the SHL dataset, the Kyutech IMU dataset, and other datasets.
Unlike Task 1, participants in Task 2 may use multiple open datasets of their choice in addition to the SHL datasets, as long as those datasets are available to external persons. Participants may also use LLMs such as GPT4, Gemini, and Claude. Additionally, Task 2 permits the use of both labeled and unlabeled data of the participants' choice.
In Task 2 in particular, we assume that foundation models or LLMs are built not from the user's own dataset but from other data. To achieve decent performance in this setting, foundation models or LLMs require large-scale datasets, which can be obtained by concatenating many medium-sized, heterogeneous training datasets. Participants are therefore asked to use not only the SHL 2025 datasets but also other open datasets; ideally, multiple datasets should be used.
You may also use LLMs such as GPT4, Gemini, and Claude to assist your pipeline.
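The concatenation of heterogeneous datasets mentioned above could be sketched roughly as follows. This is only an illustration, not a prescribed pipeline: the dataset names, the sampling rates, the common target rate of 100 Hz, and the helper names `resample_linear` and `pool_datasets` are all assumptions made for the example.

```python
# Sketch: pool recordings from several datasets at one common sampling rate.
# Dataset names, rates, and signal values below are hypothetical placeholders.

TARGET_HZ = 100  # assumed common rate, chosen to match the Kyutech IMU data


def resample_linear(signal, source_hz, target_hz):
    """Resample a 1-D signal by linear interpolation (integer rates in Hz).

    No anti-aliasing filter is applied, so this is only adequate for
    upsampling or as a rough first pass.
    """
    if len(signal) < 2 or source_hz == target_hz:
        return list(signal)
    n_out = ((len(signal) - 1) * target_hz) // source_hz + 1
    out = []
    for i in range(n_out):
        pos = i * source_hz / target_hz  # position in source-sample units
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def pool_datasets(datasets, target_hz=TARGET_HZ):
    """Concatenate all recordings into one list, remembering their source."""
    pooled = []
    for name, (rate_hz, recordings) in datasets.items():
        for recording in recordings:
            pooled.append({
                "source": name,
                "signal": resample_linear(recording, rate_hz, target_hz),
            })
    return pooled


toy_datasets = {
    "dataset_a": (50, [[0.0, 2.0, 4.0]]),        # hypothetical 50 Hz source
    "dataset_b": (100, [[1.0, 1.0, 1.0, 1.0]]),  # already at the target rate
}
pooled = pool_datasets(toy_datasets)
print(len(pooled), pooled[0]["signal"])
```

Keeping the source name with each recording makes it possible to hold out one dataset later when checking how well a model transfers across domains.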
As one unlabeled dataset, we provide the Kyutech IMU data (link to zip file).
The Kyutech IMU dataset contains IMU sensor data recorded during daily-life movements, both outdoors and indoors. The sensor data include linear acceleration (ax, ay, az), the ratio of normal force to gravitational force (gFx, gFy, gFz), gyroscope readings, which measure rotational velocity around three axes (wx, wy, wz), and magnetometer readings (Bx, By, Bz). Typical activities are walking, still, riding a bicycle, riding a car, running, and so on. This dataset has no labels, only sensor data. Each file contains 94997 lines x 500 columns, corresponding to 94997 frames of 500 samples each (5 seconds at a sampling rate of 100 Hz).
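As a rough illustration of the file layout described above, the following sketch reads frames line by line and checks the stated frame length. The comma delimiter and the function name `read_frames` are assumptions; the actual files may use a different separator, so adjust accordingly.

```python
import csv
import io

SAMPLING_RATE_HZ = 100    # stated in the dataset description
SAMPLES_PER_FRAME = 500   # stated: 500 columns per line
FRAMES_PER_FILE = 94997   # stated: 94997 lines per file


def read_frames(stream, delimiter=","):
    """Yield one frame (a list of floats) per non-empty line.

    The delimiter is an assumption, not taken from the dataset documentation.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue
        if len(row) != SAMPLES_PER_FRAME:
            raise ValueError(
                f"line {line_no}: expected {SAMPLES_PER_FRAME} values, got {len(row)}"
            )
        yield [float(value) for value in row]


# Duration implied by the stated numbers.
frame_seconds = SAMPLES_PER_FRAME / SAMPLING_RATE_HZ
file_seconds = FRAMES_PER_FILE * frame_seconds

# Tiny synthetic stand-in for a real file: two frames of constant values.
demo_text = "\n".join(",".join(["0.5"] * SAMPLES_PER_FRAME) for _ in range(2))
frames = list(read_frames(io.StringIO(demo_text)))
print(len(frames), frame_seconds, file_seconds)
```

For a real file, replace the `io.StringIO` object with an opened file handle, for example `open(path, newline="")`.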
As one example of a self-supervised learner, SENvT (link to github) is provided. You can use this self-supervised learner as well as Chronos, TimesNet, and so on.
The test dataset will be provided here (nothing at this moment) on 22/May/2025.
Registration procedure changed! (Tasks 1 and 2 now have different email addresses! Send your registration to the Task 2 address ASAP (by 17/May).) Questions and other issues concerning Task 2 will be answered from the Task 2 address.
The registration deadline is 17/May/2025. To register for Task 2, send the following four items to shltask2@googlegroups.com: (1) the name of the team, (2) the names of the participants in the team, (3) the organization/company (individuals are also encouraged), and (4) the contact person with email address. If everything is the same as in your Task 1 registration, please send only (1) the name of your team to the Task 2 address.
1. What is the task? An activity recognition task (or locomotion mode recognition task).
2. What is the training data? First, you can use the SHL2025 training data. If you can make use of unlabelled data, you can use the Kyutech IMU data. Beyond that, you can use any data as long as it is open data. Note that you must list all the data you used for training in your workshop paper.
3. What is the validation data? Open data, the same as for the training data.
4. What is the testing data? We will release it on 22/May/2025. Please prepare your pretrained models before then.
5. When can participants start working on the data: right now, or after the release of the testing dataset? You can start right now by preparing your pretrained models.
6. What do you mean by "cross-domain setting"? If your training data and your test data come from different datasets, this is called a "cross-domain setting". For example, if your training data is PAMAP2 and your test data is Opportunity, this is cross-domain. Similarly, you should prepare a pretrained model trained on some datasets without knowing what the test dataset is. Therefore, we call this a cross-domain setting.
7. Can we use the model built in Task 1? Yes, that is no problem. Using only the prebuilt Chronos is also no problem, and neither is using ChatGPT.
8. Can we use only one training dataset? Yes, you can. Using multiple datasets is recommended, but using only one dataset is no problem.
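The cross-domain setting from question 6 can be rehearsed before the test data is released by holding out one open dataset entirely and training on the rest. The sketch below shows only the split itself; the record layout, the toy values, and the function name `cross_domain_split` are assumptions for illustration, and the dataset names simply reuse the example from question 6.

```python
def cross_domain_split(records, held_out_dataset):
    """Split records so that train and test come from different datasets."""
    train = [r for r in records if r["dataset"] != held_out_dataset]
    test = [r for r in records if r["dataset"] == held_out_dataset]
    if not train or not test:
        raise ValueError("both the training and the held-out side must be non-empty")
    return train, test


# Toy records; a real pipeline would carry sensor windows instead of None.
records = [
    {"dataset": "PAMAP2", "window": None, "label": "walking"},
    {"dataset": "PAMAP2", "window": None, "label": "running"},
    {"dataset": "Opportunity", "window": None, "label": "walking"},
]

train_records, test_records = cross_domain_split(records, held_out_dataset="Opportunity")
print(len(train_records), len(test_records))
```

Rotating the held-out dataset over all available datasets gives a rough estimate of how a pretrained model might behave on an unseen domain.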
Most of the rules are identical to those of Task 1, except for the following: (1) even if your system performs best in Task 2, we unfortunately do not award prize money; instead, we provide a certificate of achievement, and (2) we evaluate based on performance in terms of the F1 measure on the test data.
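For local validation, an F1 measure can be computed as in the sketch below. The averaging scheme is not stated above, so the macro average used here (the unweighted mean of per-class F1 scores) is an assumption, and the labels are toy values.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = (2 * tp / denom) if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


y_true = ["walking", "walking", "still", "running"]
y_pred = ["walking", "still", "still", "running"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

If the organizers specify a different averaging scheme, only `macro_f1` needs to change; the per-class scores stay the same.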