How to process data

Goal: Convert the raw data so that it uses the same weight unit and reference year, in a machine-readable format that can be integrated into the Data Hub.

Approach

Partly in parallel with the data collection and partly afterwards, you will process the data and model the flows and stocks. There are two types of data processing. The first brings the data into the right format and is independent of the Data Hub. The second uses the Data Hub and the data in its library to integrate the data into the database and enable visualisation and modelling.

The first type of processing is required in any case for the Sector-wide Circularity Assessment (SCA) and/or the Urban Circularity Assessment (UCA). Using the Data Hub, by contrast, is optional when carrying out the assessments and can therefore be left out. If you leave it out, create visualisations by other means so that you can still represent and understand the data (and possibly include them in a report).

Processing data: bringing the data in the right format

At this point, you will already have collected some raw data in various files and formats. Your task is to convert this raw (spreadsheet) data into numbers that share the same weight unit (e.g. tonnes) and the same reference year(s), and that cover a full year (rather than, for example, a single month).
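As a minimal sketch of the conversion described above, the Python below assumes raw records of the form (year, month, value, unit), converts each value to tonnes and sums the values per year. The record layout and the function names `to_tonnes` and `annual_totals` are hypothetical; only the unit factors (1 kg = 0.001 t, 1 g = 0.000001 t) are standard.

```python
from collections import defaultdict

# Factors for converting mass units to metric tonnes.
# Add further units found in your own raw data as needed.
TO_TONNES = {"t": 1.0, "kg": 0.001, "g": 0.000001}


def to_tonnes(value, unit):
    """Convert a mass value in the given unit to tonnes."""
    if unit not in TO_TONNES:
        raise ValueError(f"Unknown unit: {unit!r}")
    return value * TO_TONNES[unit]


def annual_totals(records):
    """Sum monthly records into yearly totals in tonnes.

    Each record is a tuple (year, month, value, unit). Years with fewer
    than 12 distinct months are returned separately as incomplete, so
    they are not mistaken for full-year figures.
    """
    totals = defaultdict(float)
    months = defaultdict(set)
    for year, month, value, unit in records:
        totals[year] += to_tonnes(value, unit)
        months[year].add(month)
    incomplete = sorted(year for year, seen in months.items() if len(seen) < 12)
    return dict(totals), incomplete


# Illustrative data: twelve monthly values of 1,500 kg for 2019,
# and a single month of data for 2020.
records = [(2019, m, 1500, "kg") for m in range(1, 13)] + [(2020, 1, 2, "t")]
totals, incomplete = annual_totals(records)
```

Flagging incomplete years, rather than scaling them up, is a deliberate choice in this sketch: how to extrapolate a partial year is a separate methodological decision.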

Three main methods for processing

There are three main methods for processing the data, each explained further in its own part of the Handbook. You will probably need to combine them in a way that suits your situation:

What is a reference year?

Reference years have been mentioned several times already, and their importance becomes most evident during processing, so here we explain what a reference year is:

A reference year is a single year for which an analysis is made. If there is data for several years, it is best to choose the year that most datasets have in common. If some data only exists for a different year, there are ways to adjust it to the reference year.
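Picking the year that most datasets have in common can be sketched as a simple count. In this hypothetical helper, each dataset is represented only by the years it covers, and ties are broken in favour of the most recent year, which is an assumption rather than a rule from this Handbook.

```python
from collections import Counter


def choose_reference_year(dataset_years):
    """Return the year covered by the most datasets.

    dataset_years maps a dataset name to the years it covers.
    Ties are broken in favour of the latest year (an assumption;
    you may prefer another rule).
    """
    counts = Counter()
    for years in dataset_years.values():
        counts.update(set(years))  # count each year once per dataset
    if not counts:
        raise ValueError("No years found in any dataset")
    # Highest count first; among equal counts, the latest year wins.
    return max(counts, key=lambda year: (counts[year], year))


# Illustrative datasets and the years they cover.
available = {
    "waste": {2018, 2019},
    "imports": {2019, 2020},
    "population": {2017, 2018, 2019, 2020},
}
reference_year = choose_reference_year(available)
```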

When you start organising and processing your data, you will notice which year each dataset was generated for, and whether your different datasets and sources have years or year ranges in common. Based on this data availability, choose one common year as your reference year (or years). For example, if several years are available across your datasets and most of your data is for 2019, use 2019 as your reference year and work with the 2019 data. If you have data that does not match the reference year, you can adjust it through conversion.
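This section does not prescribe a specific conversion for adjusting data to the reference year. One common approach, shown here purely as an assumption, is to scale the value by a proxy indicator such as population, on the premise that the quantity changes roughly in proportion to it. The function name and the illustrative numbers are hypothetical.

```python
def adjust_to_reference_year(value, proxy_by_year, data_year, reference_year):
    """Scale a value from data_year to reference_year using a proxy indicator.

    Assumption: the quantity changes in proportion to the proxy (for
    example, household waste roughly following population). Check whether
    this holds for your material before using it.
    """
    base = proxy_by_year[data_year]
    if base == 0:
        raise ValueError("Proxy value for the data year must be non-zero")
    return value * proxy_by_year[reference_year] / base


# Illustrative only: 10,000 t recorded for 2017, population known for 2017 and 2019.
population = {2017: 200_000, 2019: 210_000}
adjusted = adjust_to_reference_year(10_000, population, 2017, 2019)
```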

Filling in the data

For the Sector-wide Circularity Assessment (SCA), fill your processed data into Tables 1 and 2. Fill in Tables 1 and 2a by entering the values for each material per lifecycle stage. Be careful to use the same year (your reference year) and the same unit (e.g. tonnes). If a material has no economic activity or lifecycle stage in your city (for example, there is no fish harvesting or aluminium manufacturing), simply enter 0 (zero). If you do not have the value per material (for example, you only have the sum of all organic waste), you can enter it directly in the total column instead.
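The layout of Tables 1 and 2a is not reproduced in this section, so the sketch below assumes a simple table with lifecycle stages as rows, materials as columns and a total column. It enters 0 where a material has no activity at a stage, and writes an aggregate straight into the total column when the per-material split is unknown. The function name, stage names and values are illustrative.

```python
def build_table(values, materials, stages, totals_only=None):
    """Build a lifecycle-stage-by-material table with a total column.

    values: dict mapping (stage, material) -> tonnes in the reference year.
    totals_only: dict mapping stage -> tonnes for stages where only an
    aggregate is known (e.g. all organic waste), not the per-material split.
    """
    totals_only = totals_only or {}
    table = {}
    for stage in stages:
        if stage in totals_only:
            # Per-material split unknown: leave material cells empty (None)
            # and write the known aggregate directly into the total column.
            row = {m: None for m in materials}
            row["total"] = totals_only[stage]
        else:
            # No activity for a material at this stage is entered as 0.
            row = {m: values.get((stage, m), 0.0) for m in materials}
            row["total"] = sum(row[m] for m in materials)
        table[stage] = row
    return table


values = {
    ("manufacturing", "aluminium"): 120.0,
    ("consumption", "fish"): 35.0,
    ("consumption", "aluminium"): 80.0,
}
table = build_table(
    values,
    materials=["fish", "aluminium"],
    stages=["harvesting", "manufacturing", "consumption", "waste"],
    totals_only={"waste": 400.0},
)
```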

For the Urban Circularity Assessment (UCA), fill your processed data into the data collection sheet.

Processing on the Data Hub: technical process on the platform

The second type of processing described here uses the technical processes on the Data Hub and results in the data being integrated into it. Documents uploaded in various formats, such as Excel, PDF, shapefiles and CSV, are converted into a standardised format for which templates are provided (see Community Portal image below). This allows the data to be entered into the Metabolism of Cities database in a machine-readable format that our website can understand and use.
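The actual Data Hub templates are not reproduced here, so this sketch assumes a hypothetical long-format template with the columns `year`, `material`, `quantity` and `unit`. It illustrates a typical reshaping step: a "wide" spreadsheet with one column per year, exported to CSV, becomes one row per material and year. Only Python's standard `csv` and `io` modules are used, and the function name is hypothetical.

```python
import csv
import io

# Hypothetical template columns; use the ones from the actual Data Hub template.
TEMPLATE_FIELDS = ["year", "material", "quantity", "unit"]


def wide_to_template(wide_csv_text, unit="t"):
    """Reshape a wide table (one column per year) into long template rows.

    Expects a header such as: material,2018,2019,2020
    Returns CSV text with one row per material and year; empty cells are skipped.
    """
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=TEMPLATE_FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        material = row.pop("material")
        for year, cell in row.items():
            if cell is None or cell.strip() == "":
                continue  # no value reported for this year
            writer.writerow({"year": int(year), "material": material,
                             "quantity": float(cell), "unit": unit})
    return out.getvalue()


wide = "material,2018,2019\nglass,1200,1250\npaper,3400,\n"
long_csv = wide_to_template(wide)
```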

The data points can then be presented as interactive charts or maps. These are created automatically and, instead of being just a static image, can be customised in a number of ways, manipulated, and used for modelling and analysis.

Where does the technical processing take place?

The technical processing takes place on the CityLoops Data Hub. There, under the “Processing” tab, a number of processing queues are provided. They show what type of data needs to be processed and how many tasks there are, both assigned and unassigned.

The first four queues cover very different data types and relate to the urban boundaries and to the material flows and stocks. The bottom four queues contain broadly the same type of information, which is not directly related to stocks and flows. For example, the demographic data queue also contains data by age group. With the total population of a city, per capita values can also be calculated.
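A per capita value is simply a total divided by the population. This minimal sketch assumes totals in tonnes and reports kilograms per person; the function name and the numbers are illustrative.

```python
def per_capita(total_tonnes, population, unit_factor=1000.0):
    """Return a per capita value; by default converts tonnes to kg per person."""
    if population <= 0:
        raise ValueError("Population must be positive")
    return total_tonnes * unit_factor / population


# Illustrative only: 45,000 t of collected waste in a city of 100,000 people.
kg_per_person = per_capita(45_000, 100_000)
```

The same function works for any sub-population, such as an age group from the demographic data, as long as the total and the population refer to the same reference year.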

What kind of technical processing can be done?

Depending on the Layer, two general types of data and their respective processing steps can be distinguished:

  • (1) GIS data or shapefiles: This type of data produces maps that represent areas (e.g. a park or a suburb) or points on a map (e.g. the location of a supermarket or a recycling centre).
      ◦ Processing geospatial data
      ◦ Processing shapefiles
      ◦ Processing geospatial spreadsheets

  • (2) Spreadsheet data, such as numerical data: This type of data allows for the creation of single-dataset visualisations (e.g. the development of waste amounts collected over the years).
      ◦ Processing spreadsheets
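For geospatial spreadsheets, one common way to turn rows with coordinates into map points is GeoJSON. This is a sketch, assuming hypothetical column names `name`, `lat` and `lon`; it is not the Data Hub's own import routine. Note that GeoJSON (RFC 7946) lists coordinates as [longitude, latitude]. Shapefiles, by contrast, are a binary format and are usually handled with GIS software or GIS libraries.

```python
import json


def rows_to_geojson(rows):
    """Convert spreadsheet rows with latitude/longitude into a GeoJSON FeatureCollection.

    Each row is a dict with 'name', 'lat' and 'lon' keys (hypothetical column
    names). GeoJSON orders coordinates as [longitude, latitude].
    """
    features = []
    for row in rows:
        lat, lon = float(row["lat"]), float(row["lon"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"Coordinates out of range for {row['name']!r}")
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative row, as it might come from a CSV export of a spreadsheet.
collection = rows_to_geojson([{"name": "Recycling centre A", "lat": "55.68", "lon": "12.57"}])
geojson_text = json.dumps(collection)
```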

Once these two types of data are processed, the data can be synthesised, which is also part of the data processing step. For the synthesis, data points from various single datasets are taken and entered into a data template. The synthesis is required to generate the Sankey diagram, which can be seen as a summary of the sector or city and of the data processing step as a whole.
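The synthesis can be sketched as merging flow records from several single datasets into one list of source, target and value entries, a structure commonly used by Sankey charting tools. The record format and function names are hypothetical and do not represent the Data Hub's data template. A node balance check is included because, for intermediate nodes, a non-zero balance points to a stock change or a data gap.

```python
from collections import defaultdict


def synthesise_flows(datasets):
    """Merge flow records from several single datasets into one Sankey-ready list.

    Each dataset is a list of (source, target, tonnes) tuples for the same
    reference year and unit. Flows between the same pair of nodes are summed.
    """
    combined = defaultdict(float)
    for records in datasets:
        for source, target, tonnes in records:
            if tonnes < 0:
                raise ValueError(f"Negative flow {source} -> {target}")
            combined[(source, target)] += tonnes
    return [{"source": s, "target": t, "value": v}
            for (s, t), v in sorted(combined.items())]


def node_balance(flows):
    """Inflow minus outflow per node."""
    balance = defaultdict(float)
    for flow in flows:
        balance[flow["target"]] += flow["value"]
        balance[flow["source"]] -= flow["value"]
    return dict(balance)


# Illustrative datasets for one reference year, in tonnes.
flows = synthesise_flows([
    [("imports", "consumption", 100.0)],
    [("consumption", "waste", 60.0), ("imports", "consumption", 20.0)],
])
balance = node_balance(flows)
```

Here the intermediate node "consumption" receives 120 t and passes on 60 t, so the remaining 60 t would need to be explained as a stock addition or traced to missing data.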