University of Geneva · CUI BSc · Algorithmics & Data Structures

Introduction to the Project

Seminar 4

What Is the Project?

  • You will work with real physiological data from 12 UNIGE study participants
  • Data source: Withings ScanWatch smartwatches — step activity, daily totals
  • Your task: apply course algorithms (sorting, searching, DP, graphs, …) to answer concrete questions derived from the dataset
  • Goal: algorithmic thinking — not just calling pandas functions
  • You must implement the algorithm yourself and justify your choices
  • Optional: use your own smartphone data (Withings App) for extra analyses

Your Grade — 30% of the Final Mark

Component                 | Weight | Deadline
Code repository + README  | 15%    | 19 May 2026, 23:59
Peer code review          | 15%    | 31 May 2026, 23:59
Total for the project     | 30%    |

How the Code Part (15%) Is Graded

Criterion              | Weight inside 15% | What earns full points
Correctness            | 5%                | Outputs are correct on normal + edge cases
Algorithmic reasoning  | 4%                | Data structure choice + clear Big-O argument
Tests                  | 2%                | Meaningful tests, including edge cases
README quality         | 3%                | Clear explanation per analysis
Code quality           | 1%                | Readable, structured, PEP-8-friendly

How the Peer Review Part (15%) Is Graded

Criterion            | Weight inside 15% | What earns full points
Coverage             | 5%                | Reviews all required analyses and README
Technical accuracy   | 5%                | Correct comments on correctness/complexity/tests
Actionable feedback  | 3%                | Specific suggestions with concrete fixes
Professionalism      | 2%                | Respectful tone and clear structure

Definition of Done (Minimum Requirements)

  • A GitLab repository with clean structure and run instructions
  • At least 3 analyses: 1 Beginner + 1 Intermediate + 1 Advanced
  • Each analysis implemented as your own Python function(s)
  • README section for each analysis: problem · algorithm · why · complexity
  • Tests for core functions (include edge cases)
  • No private data committed — `researchdata/` must be in .gitignore

Good Submission Example (What We Expect)

  • Analysis question: Which date has the highest total steps?
  • Implementation: linear scan with dict aggregation by date
  • README example: 'O(n) time, O(d) space (d = distinct dates)'
  • Test evidence: normal case + empty input + single-row input
  • Comparison note (intermediate/advanced): naive vs optimised runtime
  • Result: clear code, reproducible run instructions, justified choices
  • Example submission: https://gitlab.unige.ch/alg-26/spring-2026/alexander-horst
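
The "linear scan with dict aggregation" described above could look roughly like the sketch below. It is only an illustration: the function name, the (date, steps) row shape, and the tie-breaking rule are assumptions, not the layout of the real CSV files or the reference solution.

```python
from typing import Iterable, Optional, Tuple


def find_most_active_date(
    rows: Iterable[Tuple[str, int]],
) -> Optional[Tuple[str, int]]:
    """Return (date, total_steps) for the date with the highest total.

    `rows` is assumed to be (date_string, step_count) pairs, possibly
    several per date (e.g. hourly records). Returns None for empty input.
    """
    # Pass 1: aggregate steps per date. O(n) time, O(d) space.
    totals = {}
    for date, steps in rows:
        totals[date] = totals.get(date, 0) + steps

    # Pass 2: linear scan over the d distinct dates, no max() or idxmax().
    best_date = None
    best_total = 0
    for date, total in totals.items():
        # Strict '>' means the first date seen wins a tie (an assumption).
        if best_date is None or total > best_total:
            best_date, best_total = date, total

    if best_date is None:
        return None
    return best_date, best_total
```

The README claim then follows directly from the two loops: n rows are read once, and the dictionary holds one entry per distinct date.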

The Dataset

  • Withings ScanWatch — step counts recorded per hour and per day
  • 12 anonymised participants, several months of continuous data
  • Download the archive from Yareta: https://yareta.unige.ch/archives/19bf7f3c-7d07-48be-8c03-9f754f766906
  • CSV format — one file per participant (e.g. 1487.csv)
  • Read the provided documentation to understand: who collected the data, the sample size, and how many days are covered

Three Levels of Analyses (Wearable Data)

Level | What Is Expected | Example Questions
Beginner (start now) | Explicit loops, clear logic, simple complexity claims | Most active day (linear scan) · weekly average (dict) · rank users with bubble sort (learning baseline)
Intermediate (after Week 6-7) | Better structures + baseline comparison + timing evidence | Moving-average trends (sliding window) · unusual days (rule-based) · rank users by multiple metrics
Advanced (after Week 8-11) | Algorithmic depth + stronger evidence + careful scope | Longest streak (DP) · day-similarity graph with BFS/DFS · optional Dijkstra on activity-state transitions
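
As an illustration of the Intermediate level, here is a minimal sketch of a moving average computed two ways: a naive baseline that re-sums every window, and a sliding window that updates a running sum. The function names and the plain list-of-daily-totals input are assumptions made for this example.

```python
def moving_average_naive(values, window):
    """Baseline: re-sum each window from scratch. O(n * window) time."""
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    for start in range(len(values) - window + 1):
        total = 0
        for offset in range(window):
            total += values[start + offset]
        result.append(total / window)
    return result


def moving_average_sliding(values, window):
    """Sliding window: add the new value, drop the oldest. O(n) time."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []

    # Sum of the first full window.
    window_sum = 0
    for i in range(window):
        window_sum += values[i]
    result = [window_sum / window]

    # Each step reuses the previous sum instead of recomputing it.
    for i in range(window, len(values)):
        window_sum += values[i] - values[i - window]
        result.append(window_sum / window)
    return result
```

Both functions return one average per full window, so they can be checked against each other before any timing comparison is made.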

You Need One Analysis From Each Level

  • Minimum: 3 analyses total — at least one Beginner, one Intermediate, one Advanced
  • You are free to design your own questions beyond these examples
  • Each analysis must be implemented as a Python function (not just pandas)
  • For Intermediate and Advanced: compare to a simpler baseline approach
  • Example claim: 'My O(n log n) sort is X× faster than brute force on 10,000 rows'
  • State how you measured timing (input size and number of runs)
  • Tip: start with Beginner now, then move up over the coming weeks
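
One possible way to produce the baseline comparison and timing evidence is sketched below. The sort implementations, the helper names (`best_runtime`, `compare_sorts`), and the "best of several runs" protocol are assumptions for illustration; any clearly stated protocol would do.

```python
import random
import time


def bubble_sort(values):
    """Baseline: O(n^2) comparisons. Returns a new sorted list."""
    items = list(values)
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return items


def merge_sort(values):
    """O(n log n) comparisons. Returns a new sorted list."""
    items = list(values)
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def best_runtime(func, data, runs=3):
    """Return the fastest of `runs` wall-clock timings, in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_sorts(n=2000, runs=3, seed=0):
    """Time both sorts on the same random input of size n."""
    rng = random.Random(seed)  # fixed seed keeps the input reproducible
    data = [rng.randint(0, 30000) for _ in range(n)]
    return best_runtime(bubble_sort, data, runs), best_runtime(merge_sort, data, runs)
```

When reporting the result, state the values of `n` and `runs` next to the measured times, since the ratio depends on both.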

What Goes in the README

  • For each analysis, answer these four questions:
      1. Problem: what real question are you answering?
      2. Algorithm: which algorithm/data structure did you implement?
      3. Why: why is this algorithm appropriate for this problem?
      4. Complexity: what is the time and space complexity (Big-O)?
  • The README is graded as seriously as the code itself
  • Think of it as a technical report, not a template dump
  • Include one short run example and one test example per analysis

Project Folder Structure

  ├── researchdata/        ← gitignored! never commit private data
  ├── analyses/            ← your implementations here
  │   ├── analysis_1.py
  │   ├── analysis_2.py
  │   └── analysis_3.py
  ├── requirements.txt
  └── README.md            ← the most important file
  • Submissions are done via GitLab (UNIGE). Create your account: https://doc.eresearch.unige.ch/gitlab/start and retrieve your access tokens.
  • Then request access to: https://gitlab.unige.ch/alg-26/spring-2026 where you will create your own repository under your name.

Common Mistakes to Avoid

  • Committing data files to git: the data is private, so add `researchdata/` to .gitignore
  • Calling df.sort_values() and presenting it as your sorting algorithm: implement the sort yourself
  • No README explanation — graders need to understand your algorithmic choices
  • No unit tests — at least test edge cases (empty input, single element)
  • No complexity analysis — always state time and space complexity
  • No reproducibility — missing run steps, dataset path, or timing protocol
  • Copying code without understanding every line — you will be asked about it
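
To make the "test edge cases" point concrete, here is a minimal sketch using Python's standard `unittest` module. The function `mean_steps` and its choice to return `None` for empty input are hypothetical; your own functions may handle the empty case differently, as long as the behaviour is documented and tested.

```python
import unittest


def mean_steps(values):
    """Return the arithmetic mean of step counts, or None for empty input."""
    total = 0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        return None  # assumed convention for "no data"
    return total / count


class MeanStepsTest(unittest.TestCase):
    def test_normal_case(self):
        self.assertEqual(mean_steps([4000, 6000, 8000]), 6000)

    def test_empty_input(self):
        self.assertIsNone(mean_steps([]))

    def test_single_element(self):
        self.assertEqual(mean_steps([7500]), 7500)
```

Saved in a file inside the repository, tests of this shape can be run with `python -m unittest`.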

Today's Exercises

  • Ex 1 — Download the dataset; load one CSV and print df.describe()
  • Ex 2 — Flatten all participant files into merged.csv
  • Ex 3 — Compute basic stats manually (min/max/mean steps per participant)
  • Ex 4 — find_most_active_day(): implement without df.idxmax()
  • Ex 5 — Longest streak of days above 8 000 steps (loop-based DP)
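
For Ex 5, a loop-based DP could be sketched as follows. It assumes the input is a list of daily totals in chronological order with no missing days, and reads "above 8 000" as strictly greater; the function name and default threshold parameter are illustrative only.

```python
def longest_streak(daily_steps, threshold=8000):
    """Length of the longest run of consecutive days with steps > threshold.

    DP idea: streak ending at day i is streak ending at day i-1 plus one
    if day i qualifies, otherwise 0. Only the previous value is needed,
    so the table collapses to a single counter: O(n) time, O(1) space.
    """
    best = 0
    current = 0
    for steps in daily_steps:
        if steps > threshold:
            current += 1
            if current > best:
                best = current
        else:
            current = 0  # streak broken
    return best
```

If the real data has gaps between dates, the dates would need to be checked as well before counting two rows as consecutive days.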

Key Takeaways

  • The project counts for 30% of your final grade (15% code + 15% peer review)
  • Definition of done is explicit: 3 analyses + tests + README + clean repo
  • You must implement algorithms — not just call pandas functions
  • Justify every choice in your README: problem · algorithm · why · complexity
  • Keep data local — never commit `researchdata/` files to git
  • Start today: get the data loaded and one Beginner analysis running
  • Peer review in week 12 — write code that others can read and understand

Thank You & Have Fun!