University of Geneva · CUI BSc · Algorithmics & Data Structures

Introduction to the Project

Seminar 4

What Is the Project?

You will work with real physiological data from 12 UNIGE study participants
Data source: Withings ScanWatch smartwatches — step activity, daily totals
Your task: apply course algorithms (sorting, searching, DP, graphs, …)
to answer concrete questions derived from the dataset
Goal: algorithmic thinking — not just calling pandas functions
You must implement the algorithm yourself and justify your choices
Optional: use your own smartphone data (Withings App) for extra analyses

Your Grade — 30% of the Final Mark

Component	Weight	Deadline
Code repository + README	15%	19 May 2026, 23:59
Peer code review	15%	31 May 2026, 23:59
Total for the project	30%	—

How the Code Part (15%) Is Graded

Criterion	Weight (out of 15%)	What earns full points
Correctness	5%	Outputs are correct on normal + edge cases
Algorithmic reasoning	4%	Data structure choice + clear Big-O argument
Tests	2%	Meaningful tests, including edge cases
README quality	3%	Clear explanation per analysis
Code quality	1%	Readable, structured, PEP-8-friendly

How the Peer Review Part (15%) Is Graded

Criterion	Weight (out of 15%)	What earns full points
Coverage	5%	Reviews all required analyses and README
Technical accuracy	5%	Correct comments on correctness/complexity/tests
Actionable feedback	3%	Specific suggestions with concrete fixes
Professionalism	2%	Respectful tone and clear structure

Definition of Done (Minimum Requirements)

A GitLab repository with clean structure and run instructions
At least 3 analyses: 1 Beginner + 1 Intermediate + 1 Advanced
Each analysis implemented as your own Python function(s)
README section for each analysis: problem · algorithm · why · complexity
Tests for core functions (include edge cases)
No private data committed — `researchdata/` must be in .gitignore

Good Submission Example (What We Expect)

Analysis question: Which date has the highest total steps?
Implementation: linear scan with dict aggregation by date
README example: 'O(n) time, O(d) space (d = distinct dates)'
Test evidence: normal case + empty input + single-row input
Comparison note (intermediate/advanced): naive vs optimised runtime
Result: clear code, reproducible run instructions, justified choices
Example submission: https://gitlab.unige.ch/alg-26/spring-2026/alexander-horst
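A minimal sketch of the linear scan with dict aggregation described in this example, assuming the rows have already been parsed into (date, steps) pairs. The function name `most_active_date` and that input shape are illustrative assumptions, not part of the dataset or the course material.

```python
from typing import Iterable, Optional


def most_active_date(rows: Iterable[tuple[str, int]]) -> Optional[str]:
    """Return the date with the highest total steps, or None for empty input.

    Hypothetical input: an iterable of (date, steps) pairs, several rows
    per date allowed (e.g. hourly counts). O(n) time, O(d) space.
    """
    totals: dict[str, int] = {}
    # Aggregate: one dict update per row, O(1) on average
    for date, steps in rows:
        totals[date] = totals.get(date, 0) + steps

    # Linear scan over the d distinct dates to find the maximum
    best_date: Optional[str] = None
    best_total = 0
    for date, total in totals.items():
        if best_date is None or total > best_total:
            best_date, best_total = date, total
    return best_date


rows = [("2025-03-01", 4200), ("2025-03-02", 9100), ("2025-03-01", 6000)]
print(most_active_date(rows))  # → 2025-03-01 (4200 + 6000 = 10200 steps)
```

Ties go to the date seen first, because dicts preserve insertion order; that is worth stating in the README.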

The Dataset

Withings ScanWatch — step counts recorded per hour and per day
12 anonymised participants, several months of continuous data
Download the archive from Yareta:
https://yareta.unige.ch/archives/19bf7f3c-7d07-48be-8c03-9f754f766906
CSV format — one file per participant (e.g. 1487.csv)
Read the provided documentation to understand:
Who collected the data? Sample size? How many days?
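Since each participant has a separate CSV file, one way to combine them into a single merged.csv (as Ex 2 asks) is sketched here with the standard `csv` module. It assumes, hypothetically, that every file shares the same header row and that the file name stem (e.g. 1487) is the participant ID; the `participant_id` column and the `merge_participant_files` name are illustrative, so check them against the Yareta documentation.

```python
import csv
from pathlib import Path
from typing import Optional


def merge_participant_files(data_dir: Path, out_path: Path) -> int:
    """Concatenate every <participant>.csv in data_dir into one CSV.

    Assumptions (hypothetical): all files share the same header row, and the
    file name stem (e.g. "1487") is the participant ID. A new first column,
    "participant_id", records which file each row came from.
    Returns the number of data rows written.
    """
    header: Optional[list[str]] = None
    n_rows = 0
    with out_path.open("w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for csv_path in sorted(data_dir.glob("*.csv")):
            if csv_path.resolve() == out_path.resolve():
                continue  # never re-read the output file itself
            with csv_path.open(newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(["participant_id", *header])
                elif file_header != header:
                    raise ValueError(f"{csv_path.name}: unexpected header {file_header}")
                for row in reader:
                    writer.writerow([csv_path.stem, *row])
                    n_rows += 1
    return n_rows
```

Calling `merge_participant_files(Path("researchdata"), Path("researchdata/merged.csv"))` keeps the merged file inside the gitignored folder, so it never ends up in the repository.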

Three Levels of Analyses (Wearable Data)

Level	What Is Expected	Example Questions
Beginner (start now)	Explicit loops, clear logic, simple complexity claims	Most active day (linear scan) · weekly average (dict) · rank users with bubble sort (learning baseline)
Intermediate (after Weeks 6–7)	Better structures + baseline comparison + timing evidence	Moving-average trends (sliding window) · unusual days (rule-based) · rank users by multiple metrics
Advanced (after Weeks 8–11)	Algorithmic depth + stronger evidence + careful scope	Longest streak (DP) · day-similarity graph with BFS/DFS · optional Dijkstra on activity-state transitions
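For the Advanced day-similarity example, a minimal sketch: days are nodes, and two days are joined by an edge when their step totals differ by at most a threshold. That similarity rule, the threshold value and the function names are hypothetical choices for illustration; BFS then collects every day reachable from a starting day.

```python
from collections import deque


def build_similarity_graph(daily_steps: dict[str, int], threshold: int) -> dict[str, list[str]]:
    """Adjacency list joining two days whose step totals differ by <= threshold.

    Hypothetical similarity rule. Compares every pair of days: O(d^2) for d days.
    """
    days = list(daily_steps)
    graph: dict[str, list[str]] = {day: [] for day in days}
    for i in range(len(days)):
        for j in range(i + 1, len(days)):
            a, b = days[i], days[j]
            if abs(daily_steps[a] - daily_steps[b]) <= threshold:
                graph[a].append(b)
                graph[b].append(a)
    return graph


def bfs_component(graph: dict[str, list[str]], start: str) -> list[str]:
    """Days reachable from start, in breadth-first order. O(V + E) time."""
    visited = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        day = queue.popleft()
        order.append(day)
        for neighbour in graph[day]:
            if neighbour not in visited:
                visited.add(neighbour)  # mark on enqueue so no day is queued twice
                queue.append(neighbour)
    return order


daily = {"Mon": 8000, "Tue": 8400, "Wed": 3000, "Thu": 8700}
graph = build_similarity_graph(daily, threshold=500)
print(bfs_component(graph, "Mon"))  # → ['Mon', 'Tue', 'Thu']
```

Mon and Thu differ by 700 steps, more than the threshold, yet they land in the same component through Tue; whether that transitivity is desirable is exactly the kind of design choice the README should discuss.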

You Need One Analysis From Each Level

Minimum: 3 analyses total — at least one Beginner, one Intermediate, one Advanced
You are free to design your own questions beyond these examples
Each analysis must be implemented as a Python function (not just pandas)
For Intermediate and Advanced: compare to a simpler baseline approach
e.g. 'My O(n log n) sort is X× faster than an O(n²) brute-force sort on 10,000 rows'
State how you measured timing (input size and number of runs)
Tip: start with a Beginner analysis now; after Week 6, move up a level over the coming weeks
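For the baseline comparison, a sketch using a moving average: the naive version re-sums every window (O(n·k)), the sliding-window version updates a running sum (O(n)). The function names, the window k = 7 and the random test data are assumptions for illustration; `timeit.repeat` reports one total per run, and the input size and run count are printed so they can be copied into the README.

```python
import random
import timeit


def moving_average_naive(values: list[int], k: int) -> list[float]:
    """Baseline: re-sum every window from scratch. O(n * k) time."""
    if k <= 0 or k > len(values):
        return []
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def moving_average_sliding(values: list[int], k: int) -> list[float]:
    """Sliding window: add the value entering, subtract the one leaving. O(n) time."""
    if k <= 0 or k > len(values):
        return []
    window_sum = sum(values[:k])
    averages = [window_sum / k]
    for i in range(k, len(values)):
        window_sum += values[i] - values[i - k]
        averages.append(window_sum / k)
    return averages


if __name__ == "__main__":
    random.seed(0)
    n, k, runs = 10_000, 7, 5  # report these in the README
    steps = [random.randint(0, 20_000) for _ in range(n)]
    # Both versions must agree before their speed is compared
    assert moving_average_naive(steps, k) == moving_average_sliding(steps, k)
    for fn in (moving_average_naive, moving_average_sliding):
        # timeit.repeat returns one total per run; each run calls fn 10 times
        best = min(timeit.repeat(lambda: fn(steps, k), number=10, repeat=runs))
        print(f"{fn.__name__}: {best / 10 * 1000:.2f} ms per call "
              f"(n={n}, k={k}, best of {runs} runs)")
```

In CPython the built-in `sum` runs in C, so for a small window the measured gap may be modest or even reversed; timing several values of k gives a fairer picture than a single measurement.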

What Goes in the README

For each analysis, answer these four questions:
1. Problem — what real question are you answering?
2. Algorithm — which algorithm/data structure did you implement?
3. Why — why is this algorithm appropriate for this problem?
4. Complexity — what is the time and space complexity (Big-O)?
The README is graded as seriously as the code itself
Think of it as a technical report, not a template dump
Include one short run example and one test example per analysis
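A sketch of what one run example and one test example could look like for the Beginner weekly-average analysis, assuming (hypothetically) one (YYYY-MM-DD, steps) pair per day. Weeks are keyed by ISO year and week number from `date.isocalendar()`; `weekly_average` and `test_weekly_average` are illustrative names.

```python
from datetime import date


def weekly_average(daily: list[tuple[str, int]]) -> dict[tuple[int, int], float]:
    """Average daily steps per ISO week, keyed by (ISO year, ISO week).

    Hypothetical input: one (YYYY-MM-DD, steps) pair per day.
    O(n) time, O(w) space for w distinct weeks.
    """
    sums: dict[tuple[int, int], int] = {}
    counts: dict[tuple[int, int], int] = {}
    for day_str, steps in daily:
        iso_year, iso_week, _ = date.fromisoformat(day_str).isocalendar()
        key = (iso_year, iso_week)
        sums[key] = sums.get(key, 0) + steps
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def test_weekly_average() -> None:
    # Edge case: empty input gives an empty result
    assert weekly_average([]) == {}
    # Edge case: a single day is its own weekly average
    assert weekly_average([("2025-03-03", 8000)]) == {(2025, 10): 8000.0}
    # Normal case: Sunday 2025-03-09 closes ISO week 10, Monday 2025-03-10 opens week 11
    data = [("2025-03-03", 6000), ("2025-03-09", 10000), ("2025-03-10", 4000)]
    assert weekly_average(data) == {(2025, 10): 8000.0, (2025, 11): 4000.0}


if __name__ == "__main__":
    test_weekly_average()  # test example
    print(weekly_average([("2025-03-03", 6000), ("2025-03-09", 10000)]))  # run example
    # → {(2025, 10): 8000.0}
```

If the test function lives in a file named test_*.py, pytest collects it automatically; the `__main__` block doubles as the run example.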

Project Folder Structure

├── researchdata/ ← gitignored! never commit private data
├── analyses/ ← your implementations here
│   ├── analysis_1.py
│   ├── analysis_2.py
│   └── analysis_3.py
├── requirements.txt
└── README.md ← the most important file
Submissions are made via the UNIGE GitLab. Create your account (https://doc.eresearch.unige.ch/gitlab/start) and retrieve your access tokens.
Then request access to https://gitlab.unige.ch/alg-26/spring-2026, where you will create your own repository under your name.

Common Mistakes to Avoid

Committing data files to git — data is private, add to .gitignore
Calling df.sort_values() and presenting it as your sorting algorithm: implement the sort yourself
No README explanation — graders need to understand your algorithmic choices
No unit tests — at least test edge cases (empty input, single element)
No complexity analysis — always state time and space complexity
No reproducibility — missing run steps, dataset path, or timing protocol
Copying code without understanding every line — you will be asked about it
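Rather than `df.sort_values()`, the ranking can be written by hand. A minimal sketch, assuming each participant's average daily steps have already been computed into a dict; the participant IDs other than 1487 are made up, and bubble sort is used only because the course lists it as the Beginner learning baseline.

```python
def rank_participants(avg_steps: dict[str, float]) -> list[tuple[str, float]]:
    """Rank participants by average daily steps, highest first.

    Hand-written bubble sort: O(n^2) time in the worst case, O(n) for an
    already-sorted input thanks to the early exit, O(n) extra space for
    the copied list.
    """
    items = list(avg_steps.items())
    n = len(items)
    for end in range(n - 1, 0, -1):
        swapped = False
        # After each pass, the smallest remaining value has bubbled to index `end`
        for i in range(end):
            if items[i][1] < items[i + 1][1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # no swaps means the list is already in order
    return items


print(rank_participants({"1487": 6500.0, "2001": 9100.0, "3310": 7200.0}))
# → [('2001', 9100.0), ('3310', 7200.0), ('1487', 6500.0)]
```

The strict `<` comparison keeps tied participants in their original order, so the sort is stable.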

Today's Exercises

Ex 1 — Download the dataset; load one CSV and print df.describe()
Ex 2 — Flatten all participant files into merged.csv
Ex 3 — Compute basic stats manually (min/max/mean steps per participant)
Ex 4 — find_most_active_day(): implement without df.idxmax()
Ex 5 — Longest streak of days above 8 000 steps (loop-based DP)
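Ex 5 can be solved with a one-variable DP. A minimal sketch, assuming the input is a chronological list with one total per calendar day (a gap in the data would otherwise merge two separate streaks) and reading "above 8 000" as strictly greater than 8 000; the function name is illustrative.

```python
def longest_streak(daily_steps: list[int], threshold: int = 8000) -> int:
    """Length of the longest run of consecutive days with steps > threshold.

    DP recurrence: streak[i] = streak[i-1] + 1 if day i is above the
    threshold, else 0; the answer is max(streak). Only the previous value
    is needed, so space is O(1); time is O(n).
    """
    best = current = 0
    for steps in daily_steps:
        current = current + 1 if steps > threshold else 0
        best = max(best, current)
    return best


print(longest_streak([9000, 8500, 7000, 8100, 8200, 12000]))  # → 3
```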

Key Takeaways

The project counts for 30% of your final grade (15% code + 15% peer review)
Definition of done is explicit: 3 analyses + tests + README + clean repo
You must implement algorithms — not just call pandas functions
Justify every choice in your README: problem · algorithm · why · complexity
Keep data local — never commit `researchdata/` files to git
Start today: get the data loaded and one Beginner analysis running
Peer review in week 12 — write code that others can read and understand

Thank You & Have Fun!
