Architecture diagrams#

System architecture#

Diagram

Front-end architecture#

Diagram

Notes#

We try to adhere to the Redux principle of having a single source of truth.
We create a plain MapLibreGL component and use its API directly, rather than using a binding such as react-map-gl. Although a binding would integrate more neatly with React and Redux hooks, it adds overhead, and we can't guarantee that the binding library will always be maintained. Instead, we simply pass marker data and MapLibre click events through a MapWrapper React component. It is possible to avoid the wrapper and subscribe to the Redux store without React, but this would be more complicated.

Back-end architecture#

Diagram

Dataset files#

All persistent data is stored on the back-end server as JSON files, in the following folder structure as seen from the SERVER_DATA_ROOT location:

├── datasets
│   ├── some-dataset
│   │   ├── config.json (itemProps, vocabs, UI config, languages, etc.)
│   │   ├── about.md (markdown file containing info to be displayed in AboutPanel)
│   │   ├── locations.json (array of lng-lat coordinates for each item)
│   │   ├── searchable.json (array of the property values and searchable strings for each item)
│   │   ├── items
│   │   │   ├── 0.json (full info of first item in the above aggregate JSONs)
│   │   │   ├── 1.json
│   │   │   ├── ...
│   ├── other-dataset
│   │   ├── ...
│   ├── ...

Note that the config.json for each dataset is kept in source control in the @mykomap/config library (to be implemented).

See the back-end test data for example file contents.
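
As a rough illustration of the folder structure above, the paths for one dataset could be derived from the data root as sketched below. The `datasetPaths` helper and the `DatasetPaths` type are hypothetical names used only for this example; they are not part of the existing code.

```typescript
// Hypothetical helper: builds the file paths for one dataset under
// SERVER_DATA_ROOT. Names here are illustrative, not the project's API.
interface DatasetPaths {
  config: string;
  about: string;
  locations: string;
  searchable: string;
  item: (itemIndex: number) => string;
}

function datasetPaths(dataRoot: string, datasetId: string): DatasetPaths {
  // Strip trailing slashes so the joined paths never contain "//".
  const root = dataRoot.replace(/\/+$/, "");
  const dir = `${root}/datasets/${datasetId}`;
  return {
    config: `${dir}/config.json`,
    about: `${dir}/about.md`,
    locations: `${dir}/locations.json`,
    searchable: `${dir}/searchable.json`,
    // Items are numbered from 0, in the same order as the aggregate JSONs.
    item: (itemIndex: number) => `${dir}/items/${itemIndex}.json`,
  };
}
```

For example, `datasetPaths("/srv/data", "some-dataset").item(0)` gives the path of the first item's JSON file, where `/srv/data` is a made-up value for SERVER_DATA_ROOT.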

Potential optimisation:#

The searchable.json will be loaded into the back-end server's memory. Since there is one row per item, with 100k items every 10 characters per row adds roughly a megabyte. The bulkiest part is the searchString text, so it could perhaps be kept in its own plain text file, with one line per item. Searching could then be done by streaming the file from disk, which avoids keeping the entire file permanently in memory for each dataset.

For instance, this SO thread has some sample stream-searching code, and a reference to a module which performs the streaming by what appears to be a fast non-buffering algorithm.
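
A minimal sketch of the line-by-line idea follows. It is not the code from the thread mentioned above. It assumes one search string per line and plain substring matching, and the names `searchLines` and `exampleLines` are hypothetical. The search takes any asynchronous source of lines, so only one line needs to be held in memory at a time.

```typescript
// Hypothetical sketch: scan a source of lines (one search string per item)
// and collect the indexes of the lines that contain the query text.
async function searchLines(
  lines: AsyncIterable<string>,
  query: string,
): Promise<number[]> {
  const needle = query.toLowerCase();
  const matches: number[] = [];
  let itemIndex = 0;
  for await (const line of lines) {
    // Only the current line is inspected; earlier lines are not retained.
    if (line.toLowerCase().includes(needle)) matches.push(itemIndex);
    itemIndex++;
  }
  return matches;
}

// Stand-in for a text file on disk, so the sketch runs without file access.
async function* exampleLines(): AsyncIterable<string> {
  yield "corner bakery leeds";
  yield "bike workshop";
  yield "village bakery co-op";
}
```

In Node, the interface returned by `readline.createInterface({ input: fs.createReadStream(file) })` can be iterated with `for await` to yield lines, so it could be passed as the `lines` argument. Whether this is fast enough for 100k items would need measuring.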

Data generation#

These directories of JSONs, including the searchable strings in the searchable.json files, need to be pre-generated by a script. This script will be written in JS/TS and live in the monorepo, to be run on the back-end server.

The script will take the full data CSV for a map (generated by the data factory) as input, and write the full data into the required JSON files in the directory structure specified above.
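
The core of such a script might look like the sketch below. It assumes the CSV has already been parsed into one record per row, and that the coordinates live in columns called `lng` and `lat`. Those column names, the `buildDatasetFiles` function and the simplified searchable.json entries (a `searchString` only, without the property values) are assumptions made for illustration. Writing the returned contents to disk is left out.

```typescript
// Hypothetical row shape: the column names are assumptions, not the actual
// columns of the data factory's CSV.
type CsvRow = Record<string, string>;
type LngLat = [number, number];

// Returns a map of file path (relative to the dataset folder) to contents.
function buildDatasetFiles(
  rows: CsvRow[],
  searchedFields: string[],
): Record<string, string> {
  const files: Record<string, string> = {};
  const locations: Array<LngLat | null> = [];
  const searchable: Array<{ searchString: string }> = [];

  rows.forEach((row, itemIndex) => {
    // Items without usable coordinates get a null location.
    const lng = parseFloat(row["lng"] ?? "");
    const lat = parseFloat(row["lat"] ?? "");
    locations.push(isFinite(lng) && isFinite(lat) ? [lng, lat] : null);

    // Concatenate the searched fields into one lower-case search string.
    const searchString = searchedFields
      .map((field) => row[field] ?? "")
      .join(" ")
      .toLowerCase();
    searchable.push({ searchString });

    // One file per item, numbered in the same order as the aggregates.
    files[`items/${itemIndex}.json`] = JSON.stringify(row);
  });

  files["locations.json"] = JSON.stringify(locations);
  files["searchable.json"] = JSON.stringify(searchable);
  return files;
}
```

Keeping the transformation separate from file writing makes it straightforward to test against small fixture rows.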

Note:#

We will need to manually copy the standard.csv from the data factory server to the back-end. Maybe in the future, the data factory pipeline can be enhanced to write the JSON files to the back-end server so that no manual duplication is necessary (and maybe we can eventually get rid of the separate data server altogether). Alternatively, the back-end server could be given a URL to the appropriate standard.csv file(s) as published by the data factory, and download them from there as part of a build-data script (possibly when notified by a webhook, or possibly by polling and checking the file modification date).

Dataset instances#

For each dataset available in the datasets directory on server start, a dataset instance is created by the Dataset service. Each Dataset instance has a:
getItem method
getConfig method, which includes the vocabs
getAbout method
getLocations method, which returns a stream of the data
search method, which iterates through the data loaded from searchable.json to find matching items
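
The method list above could be sketched as a class along the following lines. This is an in-memory stand-in so that the example runs on its own. The real Dataset service reads from the files described earlier, and the class name, constructor and data shapes here are hypothetical.

```typescript
// Hypothetical shapes: the real config, item and searchable formats are
// defined by the project, not by this sketch.
type LngLat = [number, number];
interface SearchableRow {
  searchString: string;
}

class InMemoryDataset {
  private config: object;
  private about: string;
  private locations: Array<LngLat | null>;
  private searchable: SearchableRow[];
  private items: object[];

  constructor(
    config: object,
    about: string,
    locations: Array<LngLat | null>,
    searchable: SearchableRow[],
    items: object[],
  ) {
    this.config = config;
    this.about = about;
    this.locations = locations;
    this.searchable = searchable;
    this.items = items;
  }

  getItem(itemIndex: number): object | undefined {
    return this.items[itemIndex];
  }

  // The config includes the vocabs.
  getConfig(): object {
    return this.config;
  }

  getAbout(): string {
    return this.about;
  }

  // Stand-in for the stream described above: here, an in-memory iterable.
  getLocations(): Iterable<LngLat | null> {
    return this.locations;
  }

  // Iterates through the searchable rows and returns matching item indexes.
  // Assumes the stored search strings are already lower-case.
  search(text: string): number[] {
    const needle = text.toLowerCase();
    const matches: number[] = [];
    this.searchable.forEach((row, itemIndex) => {
      if (row.searchString.indexOf(needle) !== -1) matches.push(itemIndex);
    });
    return matches;
  }
}
```

The indexes returned by `search` can be passed to `getItem`, since the rows of the aggregate files and the item files share the same numbering.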