Helsinki’s Datahub makes data more usable
4 April 2022
The number of systems and data sources in an organisation can be overwhelming and nearly impossible to keep track of. Helsinki is no exception to the rule. Data is dispersed across countless solutions, and bringing it all into one location is not feasible. How can anyone find the data they are seeking if they don’t even know where to start looking? How can we make the findings and wisdom of experts part of the data for the next users? Helsinki has started a project to explore whether a data catalog could solve these issues, and Forum Virium Helsinki aims to support the project by piloting its own data catalog, Datahub.
In a city environment, even one question may require multiple points of view and diverse sets of data. For example, researchers investigating the impact of weather on traffic have to look for data in multiple systems. E-scooter data is in one place, public transportation data in another, snow plow data in a third, and so on. In addition, each data source has its own specialist, whose help is often needed to understand the data correctly. When there is a great amount of data and many specialists, building the big picture can prove challenging.
The traditional solution for sharing this kind of knowledge is documentation. With data, however, documentation rarely keeps up with a frequently changing environment, and data tools usually offer a way to share “just” the data. What is more, these tools and documents are often not accessible to all data users. A data catalog, in principle, answers this challenge by sharing information about the data.
The data catalog automatically collects schemas and basic information from different systems and brings them conveniently to a single place. This leaves the systems specialised in storing and moving data to do what they do best, while the catalog focuses on data discovery and understanding. Through an easy user interface, everyone can find out what kind of data is available and what it represents.
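As a rough illustration of this idea, the sketch below harvests only the schemas (table and column names with their types) from two separate systems into one list, without copying any rows. It is not the Datahub implementation: the two in-memory SQLite databases stand in for real source systems, and the names `DatasetEntry`, `harvest_schemas` and `build_catalog` are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record: where a table lives and what its columns are."""
    source: str
    table: str
    columns: dict = field(default_factory=dict)  # column name -> declared type


def harvest_schemas(source_name, conn):
    """Read table and column metadata from one source; no rows are copied."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        entries.append(
            DatasetEntry(source_name, table, {c[1]: c[2] for c in cols})
        )
    return entries


def build_catalog(sources):
    """Collect the schemas of every registered source into a single list."""
    catalog = []
    for name, conn in sources.items():
        catalog.extend(harvest_schemas(name, conn))
    return catalog


# Two hypothetical source systems, each with its own database.
scooters = sqlite3.connect(":memory:")
scooters.execute(
    "CREATE TABLE trips (trip_id INTEGER, started_at TEXT, distance_m REAL)"
)
plows = sqlite3.connect(":memory:")
plows.execute("CREATE TABLE routes (route_id INTEGER, plowed_at TEXT)")

catalog = build_catalog({"e-scooters": scooters, "snow-plows": plows})
for entry in catalog:
    print(entry.source, entry.table, entry.columns)
```

The source systems keep storing and serving the data themselves. The catalog holds only the descriptions, so re-running the harvest on a schedule would keep those descriptions current.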
Open source as a solution
Forum Virium Helsinki has already been involved in establishing Helsinki Region Infoshare (HRI), which has evolved into an acknowledged open data catalog. HRI is a platform that collects public data from the Helsinki Region in one place. Before the data ends up in HRI, it is created in different systems of the city, partnering organizations, projects and processes. Now we have piloted a next generation data catalog for internal use, which allows even better use of information sources. It helps gather data and the knowledge about it at an early stage. And like HRI, it is based on open source solutions.
Automation and usability are essential in next generation data catalogs. Whereas public data sources are updated manually, the internal catalog publishes all data sources automatically and continuously. We started a pilot in the winter of 2022, and within the first weeks it was connected to an important data warehouse and an IoT stream processor.
For users, the data catalog offers many improvements:
- Data discovery across different systems
- A centralized place for information and documentation about data, with up-to-date metadata
- New possibilities to group and enrich datasets, for example by #tagging
- Automatic information on data quality
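To make the first three points more concrete, here is a minimal sketch of searching across systems and grouping datasets by tags. It assumes a simple in-memory catalog. The `Dataset` class, the `add_tag` and `search` helpers and the example entries are hypothetical and do not reflect the actual Datahub interface.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A catalog entry with documentation and user-added tags."""
    source: str
    name: str
    description: str = ""
    tags: set = field(default_factory=set)


def add_tag(dataset, tag):
    """Attach a tag; a leading '#' and letter case are ignored."""
    dataset.tags.add(tag.lstrip("#").lower())


def search(catalog, query):
    """Return datasets whose name, description or tags match the query."""
    q = query.lstrip("#").lower()
    return [
        d for d in catalog
        if q in d.name.lower() or q in d.description.lower() or q in d.tags
    ]


# Hypothetical entries describing datasets that live in different systems.
catalog = [
    Dataset("e-scooters", "trips", "Rental trips with start time and distance"),
    Dataset("public-transport", "departures", "Realised departure times per stop"),
    Dataset("snow-plows", "routes", "Streets plowed and when"),
]
add_tag(catalog[0], "#traffic")
add_tag(catalog[1], "#traffic")
add_tag(catalog[2], "#winter")

# One query finds related datasets regardless of where they are stored.
for d in search(catalog, "#traffic"):
    print(d.source, d.name)
```

A tag added by one specialist becomes a search handle for everyone else, which is how the catalog can pass expert knowledge on to the next users.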
Technically, the data catalog is a simple tool, but its great benefits can only be achieved through continuous development and collaboration. It serves different kinds of specialists working with data and supports them in sharing their knowledge. The aim is to make using diverse data sources easier and more trustworthy. It is our way of making wiser decisions on developing our urban environment.
This article was written by Mikko Hemmi, IT consultant and CEO of Oikoa Oy, who is consulting Forum Virium Helsinki on building the Datahub data catalog.
Olli-Eemeli Lappi, Forum Virium Helsinki Oy