Sussex-Huawei Locomotion Dataset

SHL Challenge 2025

SHL Challenge 2025 (link to Sussex site)

Task 2

Task 2 focuses on a cross-domain setting: the test set is our newly acquired activity recognition dataset (this dataset will be provided on 22/May/2025). The domain in which this dataset was collected differs slightly from that of the SHL dataset, the Kyutech IMU dataset, and other datasets.

Unlike Task 1, participants in Task 2 may use multiple open datasets of their choice in addition to the SHL datasets, as long as those datasets are publicly available. Participants may also use LLMs such as GPT4, Gemini, and Claude. Additionally, Task 2 permits participants to use both labeled and unlabeled data of their choice.

In challenge 2 in particular, we assume that foundation models or LLMs are built not from the user's own dataset but from other data. In that case, achieving decent performance requires large-scale training data, which in practice means concatenating many medium-sized training datasets of a heterogeneous nature. Participants are therefore asked to use not only the SHL 2025 datasets but also other open datasets. Ideally, multiple datasets should be used.
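The concatenation of heterogeneous datasets mentioned above could be sketched as follows. This is only an illustration, not the organizers' pipeline: it assumes each recording is a single-channel 1-D signal with a known sampling rate, and it brings everything to 500-sample frames at 100 Hz (the frame format used by the challenge files) by linear interpolation. The function names and the choice of interpolation are assumptions.

```python
import numpy as np

TARGET_RATE_HZ = 100   # sampling rate of the challenge frames
FRAME_LEN = 500        # 5 seconds at 100 Hz


def resample_to_target(signal, source_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Linearly interpolate a 1-D signal onto the target sampling rate."""
    signal = np.asarray(signal, dtype=float)
    duration_s = len(signal) / source_rate_hz
    n_target = int(round(duration_s * target_rate_hz))
    t_source = np.arange(len(signal)) / source_rate_hz
    t_target = np.arange(n_target) / target_rate_hz
    return np.interp(t_target, t_source, signal)


def to_frames(signal, frame_len=FRAME_LEN):
    """Cut a 1-D signal into non-overlapping frames, dropping the remainder."""
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def build_pretraining_pool(recordings):
    """Stack frames from several recordings into one (n_frames, 500) array.

    recordings: iterable of (1-D signal, sampling rate in Hz) pairs,
    possibly coming from different datasets (hypothetical input format).
    """
    frames = [to_frames(resample_to_target(sig, rate)) for sig, rate in recordings]
    return np.concatenate(frames, axis=0)
```

For example, a 60-second recording at 50 Hz and a 12-second recording at 100 Hz would contribute 12 and 2 frames respectively. Real datasets also differ in units, sensor placement, and axis conventions, which this sketch does not address.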

At the same time, you can use LLMs such as GPT4, Gemini, and Claude to assist your pipelines.

As one unlabeled dataset, we provide the Kyutech IMU data (link to zip file). The Kyutech IMU dataset contains human activity data from daily-life movements, both outdoors and indoors, recorded with IMU sensors. The IMU sensor data includes linear acceleration (ax, ay, az), the ratio of normal force to gravitational force (gFx, gFy, gFz), gyroscope readings, i.e. rotational velocity around three axes (wx, wy, wz), and magnetometer readings (Bx, By, Bz). Typical collected activities are walking, still, riding a bicycle, riding a car, running, and so on. This dataset has no labels, only sensor data. Each file contains 94997 lines x 500 columns, corresponding to 94997 frames of 500 samples each (5 seconds at a sampling rate of 100 Hz).
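A minimal sketch of reading one such file into a frame matrix is shown below. The on-disk format is not specified on this page, so the comma delimiter is an assumption, and `load_frames` and `frame_duration_s` are hypothetical helper names. The only facts taken from the description are the 500 samples per frame and the 100 Hz sampling rate.

```python
import numpy as np

SAMPLES_PER_FRAME = 500   # columns per line, as stated in the description
SAMPLING_RATE_HZ = 100    # sampling rate, as stated in the description


def load_frames(source, delimiter=","):
    """Load a text file (or list of lines) with one 500-sample frame per line.

    The delimiter is an assumption; adjust it to the actual file format.
    """
    frames = np.loadtxt(source, delimiter=delimiter, ndmin=2)
    if frames.shape[1] != SAMPLES_PER_FRAME:
        raise ValueError(
            f"expected {SAMPLES_PER_FRAME} columns, got {frames.shape[1]}"
        )
    return frames


def frame_duration_s(frames):
    """Duration of one frame in seconds (500 samples / 100 Hz = 5 s)."""
    return frames.shape[1] / SAMPLING_RATE_HZ
```

For a Kyutech IMU file, the returned array would be expected to have shape (94997, 500).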

As one example of a self-supervised learner, SENvT (link to github) is provided. You can use this self-supervised learner as well as others such as Chronos and TimesNet.

Test Dataset

Test data is here: test data (released on 28/May/2025). Each line of the test data corresponds to one of 23 activities (refer to the corresponding training data). The names of the activities are confidential in this challenge: they are unknown activities. The test data file contains 2860 lines x 500 columns, corresponding to 2860 frames of 500 samples each (5 seconds at a sampling rate of 100 Hz).

You can use the training data that corresponds to this test data. The training data file contains 3618 lines x 500 columns, corresponding to 3618 frames of 500 samples each (5 seconds at a sampling rate of 100 Hz). Each line of the training data corresponds to one of the 23 activities (from 0 to 23).

(Added on 1/Jun/2025) Training data 2: 2958 lines x 500 columns (28 activities, labeled in the same way), non-overlapping with training.zip. 5 of these activities do not occur in test.zip.
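One possible way to combine training.zip with training data 2 is sketched below. It assumes the frames and integer labels have already been loaded into arrays (the label file format is not described here), and it assumes that the activities present in training.zip are exactly those that occur in the test data, so that the extra activities in training data 2 can be dropped by comparing label sets. Both assumptions, and the function name, are illustrative only.

```python
import numpy as np


def merge_training_sets(x_a, y_a, x_b, y_b, keep_only_labels_in_a=True):
    """Concatenate two labeled frame sets.

    x_a, y_a: frames and labels of the first set (e.g. training.zip).
    x_b, y_b: frames and labels of the second set (e.g. training data 2).
    If keep_only_labels_in_a is True, frames of the second set whose label
    never appears in the first set are discarded before merging.
    """
    x_a, x_b = np.asarray(x_a), np.asarray(x_b)
    y_a, y_b = np.asarray(y_a), np.asarray(y_b)
    if keep_only_labels_in_a:
        mask = np.isin(y_b, np.unique(y_a))
        x_b, y_b = x_b[mask], y_b[mask]
    return np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b])
```

Keeping the extra activities (`keep_only_labels_in_a=False`) is equally valid, for instance when they are used as additional pre-training signal rather than as target classes.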

The provided IMU sensor data contains linear acceleration (ax, ay, az), the ratio of normal force to gravitational force (gFx, gFy, gFz), gyroscope readings, i.e. rotational velocity around three axes (wx, wy, wz), and magnetometer readings (Bx, By, Bz). Note that these data (test.zip and training.zip) are unrelated to both the SHL dataset and the Kyutech IMU data; they come from a slightly different domain. This is what makes the task a cross-domain setting. (In the terminology of foundation models and self-supervised learning, test.zip and training.zip are intended for the downstream task, while you can use the SHL dataset and other datasets to build the pre-trained models.)
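The split between pre-training and the downstream task could look roughly like the sketch below. Here `encode` is only a stand-in for a frozen pre-trained encoder (it computes three simple per-frame statistics instead of learned features), and the nearest-centroid classifier is one arbitrary choice of downstream head. Neither is prescribed by the challenge, and all names are hypothetical.

```python
import numpy as np


def encode(frames):
    """Stand-in for a frozen pre-trained encoder: three statistics per frame."""
    frames = np.asarray(frames, dtype=float)
    return np.stack(
        [
            frames.mean(axis=1),                            # average level
            frames.std(axis=1),                             # spread
            np.abs(np.diff(frames, axis=1)).mean(axis=1),   # mean absolute change
        ],
        axis=1,
    )


def fit_centroids(features, labels):
    """Downstream training: one mean feature vector per activity label."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each frame to the label of the nearest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]
```

In this picture, the encoder would be built from the SHL dataset and other open or unlabeled data, `fit_centroids` would see only the frames from training.zip, and `predict` would be applied to the frames from test.zip.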

Submission of Results

Send your submission to shltask2@googlegroups.com. You will need to attach the team summary (docx document) (pdf). The format of the result submission is the same as for Task 1, except for the number of lines. The deadlines are the same as for Task 1. (However, we are not strict about them: a delay of several days is allowed. Please contact us if you need one.)

Contact (New address: shltask2@googlegroups.com)

Tasks 1 and 2 now have separate email addresses. Questions and other issues concerning Task 2 will be answered by the Task 2 organizers. The address for Task 2 is shltask2@googlegroups.com.

Questions

1. What is the task? An activity recognition task (or locomotion mode recognition task).
2. What is the training data? First, you can use the SHL2025 training data. If you can make use of unlabeled data, you can use the Kyutech IMU data. Beyond that, you can use any data as long as it is open data. Note that you need to list all the data you used for training in your workshop paper.
3. What is the validation data? Open data, the same as for the training data.
4. What is the testing data? We will release it on 22/May/2025. Before then, you can prepare the pretrained models.
5. When should participants start working on the data: right now, or after the release of the testing dataset? You can start right now by preparing the pretrained models.
6. What do you mean by "cross-domain setting"? If your training data and your test data come from different datasets, this is called a "cross-domain setting". For example, if your training data is PAMAP2 and your test data is Opportunity, this is cross-domain. Similarly, you should prepare a pretrained model trained on some datasets without knowing what the test dataset is. That is why we call this a cross-domain setting.
7. Can we use the learning machine built in Task 1? Yes, that is no problem. Using only the prebuilt Chronos is also no problem, and so is using ChatGPT.
8. Can we use only one training dataset? Yes, you can. Using multiple datasets is recommended, but using only one dataset is no problem.
9. Can we obtain the usual accelerometer sensor reading? Yes, you can simply add (ax, ay, az) and (gFx, gFy, gFz). That is, (ax+gFx, ay+gFy, az+gFz) is the usual accelerometer reading.
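The answer to question 9 can be written as a short helper. This sketch simply performs the element-wise sum described above; it assumes that the linear-acceleration and gF arrays have the same shape and are already expressed in the same units, which this page does not state explicitly. The function name is hypothetical.

```python
import numpy as np


def usual_accelerometer(linear, g_force):
    """Return (ax+gFx, ay+gFy, az+gFz) for arrays of matching shape.

    linear:  linear acceleration, e.g. shape (..., 3) for (ax, ay, az).
    g_force: gF components, same shape, for (gFx, gFy, gFz).
    Assumes both inputs use the same units.
    """
    linear = np.asarray(linear, dtype=float)
    g_force = np.asarray(g_force, dtype=float)
    if linear.shape != g_force.shape:
        raise ValueError("linear and g_force must have the same shape")
    return linear + g_force
```

The same function works per axis on whole frame matrices, for example on the (n_frames, 500) arrays for ax and gFx.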

Rules

Most of the rules are identical to those of challenge 1, except for the following: (1) even if your system performs best in challenge 2, we unfortunately do not award prize money; instead we provide a certificate of achievement, and (2) we evaluate based on performance (in terms of the F1 measure on the test data).
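For local validation, an F1 score can be computed as in the sketch below. The rules above do not say which averaging is used, so the macro average over the classes present in the ground truth is an assumption here, not the official scoring procedure.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is non-zero because
        # class c occurs at least once in y_true.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))
```

With true labels [0, 0, 1, 1] and predictions [0, 1, 1, 1], the per-class F1 values are 2/3 and 4/5, so the macro average is 11/15.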

Challenge 2 Persons

Dr. Tsuyoshi Okita, Kyushu Institute of Technology (JP), Dr. Lin Wang, Queen Mary University of London (UK), Prof. Daniel Roggen, University of Sussex (UK), Dr. Mathias Ciliberto, University of Cambridge (UK), Prof. Hristijan Gjoreski, Ss. Cyril and Methodius University (MK), Dr. Kazuya Murao, Ritsumeikan University (JP), Dr. Paula Lago, Concordia University in Montreal (CA).

[References]
SENvT
Tsuyoshi Okita, Kosuke Ukita, Koki Matsuishi, Masaharu Kagiyama, Kodai Hirata, Asahi Miyazaki, Towards LLMs for Sensor Data: Multi-Task Self-Supervised Learning, In Adjunct Proceedings of the 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing, 2023.

Kyutech IMU dataset
Asahi Miyazaki, Tengjiu Huang, Tsuyoshi Okita, Asahi Nishikawa, Acquisition of Unlabeled Dataset for Human Activity Recognition, IPSJ (UBI), 8 pages, 2025.
pdf