Introduction

This is a series of texts in which I consider some aspects of public transport data modelling. They are based on my personal experience of writing my Master’s thesis since 2019 and of working at HSL (Helsinki Region Transport) as a transit planner and data engineer. In the following sections, I discuss how time and place are represented in transit data, and which software tools I have used and would recommend for working with transit data.

These texts represent my own thoughts and experiences, not exhaustive, scientifically proven facts or common best practices. There is no single answer to how transit data is best modelled and used, or which tools to use for it: it depends! And I have only just started my own path in the fascinating world of transit data modelling.

Estimated time use

For the 5 ECTS credits that I wish to earn from Aalto University, I have made a rough estimate of the time I have spent on topics related to these texts. In principle, these hours come on top of my paid working time and the time spent on my Master’s thesis, although all of them deal with the same topics.

| Topic | Description | Hours |
|---|---|---|
| Time and place in transit data | Identification of temporal and spatial attributes, their units, and their relationships in transit planning and data models. Explored, tested and learned largely during spring 2020, when I developed a schedule data model as part of my thesis; the schedule model part was eventually dropped from the final thesis. In 2021, further learning in the Jore4 project at HSL. | 20 |
| GTFS schedule & network data | Exploring the GTFS data model and open HSL GTFS datasets since 2019, when preparing for the thesis. Later on, many kinds of GTFS applications in analytics at HSL, due to limited access to the original planning data in Jore3. | 30 |
| HFP operations data | The core of my thesis: a lot of trial-and-error work with large amounts of HFP data since 2019, such as finding the best ways to format, store and query the data, and identifying possible types of errors. Most of this hands-on work is not visible in the thesis. | 40 |
| Data validation | Conceptualization and proof-of-concept testing of data “content” quality monitoring at HSL since the end of 2020. It was recognized that monitoring only the health of technical systems is not enough, and that a feedback loop of monitoring and actions is needed to improve the reliability of both transit planning and operations data. | 20 |
| Data tools | Learning about data- and coding-related tools as part of everyday work and studies, especially through software library documentation, vignettes, and blog posts about technologies, techniques, workflows, and visualization of relational, spatiotemporal and movement data. | 30 |