# Mykomap Data

This document describes the Mykomap 4.x back-end data format, and the process for generating it.
## Where is the data stored?
Mykomap's data is stored in JSON files below a designated directory on the server.
The back-end application needs to be configured with the path to this directory with an entry in `.env`:

```
SERVER_DATA_ROOT=/some/path
```
If this is not set, the default is to use the data in the back-end application's `test/data` directory - equivalent to setting:

```
SERVER_DATA_ROOT=test/data
```
There should be a dataset in there which is used for unit testing. Using this as the default means the back-end has a dataset available out of the box.
> [!NOTE]
> The `SERVER_DATA_ROOT` path can be relative, in which case it is relative to the current working directory of the application. Typically, when run from an `npm` run-script, this is the root back-end application directory.
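For illustration, the path resolution described above behaves roughly like this (a sketch in TypeScript, not the actual back-end code):

```typescript
import * as path from "node:path";

// Resolve the data root: use SERVER_DATA_ROOT if set, fall back to
// test/data, and resolve relative paths against the current working
// directory (typically the back-end project root).
const dataRoot = path.resolve(
  process.cwd(),
  process.env.SERVER_DATA_ROOT ?? "test/data",
);
```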
## What format is the Mykomap data stored as?
The directory structure looks like this, in schematic form:
```
<dataroot>/
  <dataset X>/
    about.md
    config.json
    locations.json
    searchable.json
    items/
      0.json
      ... <more item files> ...
      <N>.json
  <dataset Y>/
    ... <same files as above> ...
    items/
      ... <item files> ...
  ... <more datasets> ...
```
> [!INFO]
> File names in the schematic above should be interpreted literally - except when enclosed in angle brackets, which means the name is symbolic.
A brief synopsis of these files' structure and purpose:
`about.md`
: A human readable description of the dataset, in markdown format. It gets shown in the "about" side-bar panel of the map.

`config.json`
: Meta-data for the dataset, such as property schema, vocabulary definitions, and map configuration parameters goes here.

`locations.json`
: An ordered list of longitude/latitude locations, one per data item.

`searchable.json`
: An ordered index of the searchable and filterable item properties.

`items/*.json`
: These files store the full information for each data item.
More details of these files can be found in the [More details](#more-details) section below.
## Why is the format like this?
The aims of this format's design are to:
- Support very large datasets, approaching 500 thousand items.
- Allow maximally fast responses to browsers loading and querying these datasets via the API, by storing data in a form identical to that being served to API clients.
- Leverage JSON where possible. This format has highly optimised native support on browser platforms, whereas other formats such as CSV usually cannot compete because they need to be parsed in JavaScript - this usually results in more memory use and slower parsing speeds.
- Avoid the need for a relational or triple-store database, which comes with some overhead, both in resources and cognition.
- Minimise the verbosity of data so far as possible, such that file sizes are not too large.
- "Keep It Simple" so far as possible, we want a format which is easy to understand and work with - when it does not affect the above goals.
- Build on the concepts and precedents of earlier Mykomap data. For instance, we utilise SKOS vocabularies and [QNames][qnames], and we use more or less the same property definition and vocabulary schema.
In the first instance, when a map is loaded, we want to minimise the delay between opening a map link and the user seeing pins appear on the map. With a large number of pins, some sort of clustering of item locations is needed, and the clustering algorithm contributes the largest amount to that delay.
We use [MapLibre][maplibre] to do this clustering, which uses a well-optimised algorithm. This library is based on the open source version of the commercial [MapBox][mapbox] library, and is much faster than [Leaflet][leaflet] with the Leaflet.markercluster plug-in that was used for earlier versions of Mykomap. Nevertheless, it can take a small number of seconds to load a file with half a million locations and then cluster them, let alone when extra data is added.
As the main information needed up-front when loading a map is the pin locations, these are stored by themselves in a file `locations.json`, as a minimised JSON array of coordinate pairs, without white-space and with a restricted number of decimal places. This can be sent verbatim as soon as the dataset ID is known.
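As a sketch of what "sent verbatim" could look like - assuming an Express-style handler and a hypothetical route; the real back-end's API may differ:

```typescript
import express from "express";
import * as path from "node:path";

const app = express();
const dataRoot = path.resolve(process.env.SERVER_DATA_ROOT ?? "test/data");

// Hypothetical route: stream the pre-minimised locations file straight
// from disk, with no parsing or re-serialisation on the server side.
app.get("/dataset/:id/locations", (req, res) => {
  const dir = encodeURIComponent(req.params.id); // dataset ID -> directory name
  res.sendFile(path.join(dataRoot, dir, "locations.json"));
});
```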
Item IDs are omitted from this `locations.json` file, which further minimises the size of the data. Instead of using IDs, dataset items are loaded in a predefined order and then identified within the context of the dataset by their index number. This has some consequences, however, described later.
As most of the other information about the items will not be needed until a user starts to browse the pins on the map, loading this information is deferred. The full information for each dataset item is stored in its own separate file, with a name constructed from the item index number (not the item ID - a consequence of the locations not having IDs included). JSON is also used here, but since these files are relatively small, they are indented for readability. A dataset item's ID can be found by looking in these files.
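To make the index-based identification concrete, here is a client-side sketch (the endpoint paths are hypothetical, not necessarily the real API):

```typescript
// The Nth entry in locations.json corresponds to items/N.json, so no
// item IDs are needed until a full record is actually wanted.
async function loadLocations(baseUrl: string): Promise<([number, number] | null)[]> {
  const response = await fetch(`${baseUrl}/locations.json`);
  return response.json();
}

// Load the full record for one item, e.g. when its pin is clicked;
// the item's real "id" is found inside this file.
async function loadItem(baseUrl: string, index: number): Promise<unknown> {
  const response = await fetch(`${baseUrl}/items/${index}.json`);
  return response.json();
}
```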
Although typically much smaller and less urgently required than the locations data, the next things needed when loading a map are:
- Vocabulary definitions, which are shown in the side-bar filters;
- The item property schema, which dictates how items should be interpreted, searched, and displayed; and
- The map configuration settings, which dictate how the map is rendered.
These are all stored, as JSON indented for readability, in `config.json`.
Finally, there needs to be an index for filtering by category and for text searches. This is stored as JSON in `searchable.json`.
As this file can contain a lot of items, it can become quite large, so we need to be parsimonious with formatting characters. It is therefore not indented, and uses arrays rather than objects, to avoid repeating the keys for every item. However, for the sake of human readability and inspection, it is broken up so that the headers are defined on the first line and each subsequent line holds the data for one item, in the defined order. The result reads somewhat like a CSV file.
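A sketch of how a file in this layout might be written - a hypothetical helper, not the actual exporter code:

```typescript
import * as fs from "node:fs";

// Write searchable.json with the headers on the first line and one
// line per item, in dataset order, so the minimised JSON can still be
// inspected (and diffed) line by line.
function writeSearchable(file: string, itemProps: string[], values: unknown[][]): void {
  const lines = [
    `{ "itemProps":`,
    JSON.stringify(itemProps) + ",",
    `"values":[`,
    ...values.map((row, i) => JSON.stringify(row) + (i < values.length - 1 ? "," : "")),
    "]}",
  ];
  fs.writeFileSync(file, lines.join("\n"));
}
```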
## How to create a dataset
Typically, the data for datasets is supplied in some format defined by the supplier. Common examples are spreadsheets; CSV or TSV files; and JSON, possibly retrieved via some web API; in some cases the data is simply sent by email.
In addition to the data itself, we need definitions of the semantics and syntax within the data - such as the taxonomies, any mark-up or encoding schemes, and identifiers used. These are typically sent by email... or, in the worst case, we must try to infer them from the data.
It is a requirement that the data is logically tabular - a sequence of items sharing the same set of properties - and that each dataset item representing an entry in the map or directory has a unique identifier within the dataset.
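It can be worth verifying that requirement before importing. A minimal sketch using the csv-parse library, assuming the identifier column is called "Identifier" as in the standard CSV schema described later, and a made-up file name:

```typescript
import * as fs from "node:fs";
import { parse } from "csv-parse/sync";

// Parse the supplier CSV into records keyed by header, then check
// that every row has a distinct, non-blank Identifier.
const rows: Record<string, string>[] = parse(fs.readFileSync("supplier-data.csv"), {
  columns: true,
});
const seen = new Set<string>();
for (const row of rows) {
  const id = row["Identifier"];
  if (!id) throw new Error("blank identifier");
  if (seen.has(id)) throw new Error(`duplicate identifier: ${id}`);
  seen.add(id);
}
```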
We then need to transform this into the prerequisites for the `dataset import` tool, which generates Mykomap datasets.
### Prerequisites and process
There are two specific files needed as prerequisites for `dataset import`:

- A CSV file with a suitable schema, including a row for each dataset item.
- A `config.json` file containing all the required metadata.
More about these below, but the latter is typically the same one used in the dataset - in fact the `dataset` tool copies it into place in the output folder.
The output is a new folder, containing the dataset files. The name of this folder is interpreted as the (URI-component encoded) dataset ID when it is written or copied into the root data directory (set by `SERVER_DATA_ROOT` in the back-end's `.env` file).
> [!INFO]
> The `dataset` command will not write the data if the output directory already exists. It is up to you to delete the directory first if you want to overwrite it - this is to make it harder to accidentally overwrite something else with the data.
The `dataset` tool is part of the `back-end` application, and can be run from the `apps/back-end` project directory using `npm`. The command supports various subcommands, and the relevant one for importing data is `import`.
It can be used like this:
```
npm run dataset import $CONFIG $CSV $OUTPUT_DIR
```
Where:
- `$CONFIG` should be the path to the `config.json` file,
- `$CSV` should be the path to the CSV file,
- `$OUTPUT_DIR` should be the path to write the output dataset files to.
> [!INFO]
> More details can be obtained by running the command `npm run dataset -- import --help`
### `config.json`
The full schema of this file is outside the scope of this document, for the sake of keeping it fairly short. It's also likely to evolve from the exact schema at the time of writing.
For now I will just note that some of the information needed for `config.json` must be composed manually; and some - the parts describing the vocabularies - can be obtained from the open-data pipeline's output file, `vocabs.json`.
> [!NOTE]
> Beware that the name of this file is configurable and may vary from case to case, although the default is `vocabs.json`.

> [!INFO]
> The format of `vocabs.json` files is very similar (if historically not quite identical) to `config.json`'s `vocabs` attribute. The information is composed from one or more existing SKOS vocabularies.
Another important function of the `config.json` file is to describe the properties of the dataset items, which is done in the `itemProps` attribute.
And finally, these descriptions include a `from` attribute which identifies which CSV header to source the property value from. This is for the benefit of the `dataset` command.
### The CSV file
The format of this file is essentially dictated by `config.json`'s `itemProps.<item>.from` attribute, as described above. The CSV may have more headers than this - these will be ignored.
CSV fields are interpreted depending on the other attributes of the `itemProps` property descriptions. See [More details](#more-details) below.
#### `standard.csv`
Each of the datasets in the open-data project outputs a CSV file containing the data in a normalised form, with added or refined information (typically geocoded locations). This is typically what we use as input to the `dataset import` command.
This CSV file has a set of standard headers. It can also have any number of arbitrary optional extras. For historical reasons this schema is called the "Standard CSV format", and the file is called `standard.csv`.
Optionally, RDF data can also be generated by open-data datasets, but this is not useful for `dataset import`.
> [!INFO]
> This step is performed by code in the open-data and se-open-data projects, and typically published on http://data.digitalcommons.coop/
In the current incarnation of the schema, the standard headers are:
- Identifier
- Name
- Description
- Organisational Structure
- Primary Activity
- Activities
- Street Address
- Locality
- Region
- Postcode
- Country ID
- Territory ID
- Website
- Phone
- Companies House Number
- Qualifiers
- Membership Type
- Latitude
- Longitude
- Geo Container
- Geo Container Confidence
- Geo Container Latitude
- Geo Container Longitude
- Geocoded Address
However, any given dataset can have arbitrary additional headers added when convenient.
> [!INFO]
> More of the gory details of the semantics are embedded as comments in the code and data here.
Many of these headers are in fact somewhat specific to the ICA and Co-ops UK datasets, and often left blank. The key ones for most purposes are:
- Identifier: an arbitrary but unique (to the dataset) string.
- Name: an arbitrary short string; displayed as pin names.
- Street Address: an arbitrary string which can be combined into an address for geocoding, if necessary, and/or displayed on pin pop-ups.
- Locality: ditto; combined into the address.
- Region: ditto; combined into the address.
- Postcode: ditto; combined into the address, and/or used for geocoding.
- Country ID: ditto; combined into the address, and/or used as a filter category.
- Latitude: a decimal number; indicates a manually supplied location.
- Longitude: a decimal number; indicates a manually supplied location.
- Geo Container: a unique URI representing the location; typically an open-street-map location URL.
- Geo Container Confidence: a decimal; interpreted as a percentage that indicates the geocoding confidence.
- Geo Container Latitude: a decimal number; indicates a geocoded location.
- Geo Container Longitude: a decimal number; indicates a geocoded location.
- Geocoded Address: an arbitrary string which records the (typically cleaned up) address which was actually passed to the geocoder; used for diagnosing problems with geocoded locations.
The only strictly mandatory one is Identifier. Location coordinates of some sort are obviously needed for a marker to appear on the map, and a Name is needed for it to be identifiable.
### Example
Here's a specific example of the process just described. It's based on the dataset currently in the back-end test data.
First, the CSV data:
```
Identifier,Name,Desc,Address,Websites,Activity,Other Activities,Latitude,Longitude,Geocoded Latitude,Geocoded Longitude,Validated
aaa,"Apple Co-op","We grow fruit.","1 Apple Way, Appleton",http://apple.coop,AM130,,0,0,51.6084367,-3.6547778,true
bbb,"Banana Co","We straighten bananas.","1 Banana Boulevard, Skinningdale",http://banana.com,AM60,AM70;AM120,0,0,55.9646979,-3.1733052,false
ccc,"The Cabbage Collective","We are artists.","2 Cabbage Close, Caulfield",http://cabbage.coop;http://cabbage.com,AM60,AM130,,,54.9744687,-1.6108945,true
ddd,"The Chateau","The Chateau is not a place, it is a state of mind.",,,,,,,,,
```
Or in a more readable table format:
| Identifier | Name | Desc | Address | Websites | Activity | Other Activities | Latitude | Longitude | Geocoded Latitude | Geocoded Longitude | Validated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| aaa | Apple Co-op | We grow fruit. | 1 Apple Way, Appleton | http://apple.coop | AM130 | | 0 | 0 | 51.6084367 | -3.6547778 | true |
| bbb | Banana Co | We straighten bananas. | 1 Banana Boulevard, Skinningdale | http://banana.com | AM60 | AM70;AM120 | 0 | 0 | 55.9646979 | -3.1733052 | false |
| ccc | The Cabbage Collective | We are artists. | 2 Cabbage Close, Caulfield | http://cabbage.coop;http://cabbage.com | AM60 | AM130 | | | 54.9744687 | -1.6108945 | true |
| ddd | The Chateau | The Chateau is not a place, it is a state of mind. | | | | | | | | | |
Now, the `config.json`. We bend the strict JSON format here to use comments to annotate it, but in the real file comments would upset the JSON parser, so omit them.
```
{
"prefixes": {
// There's just one prefix defined here, because just one vocabulary
"https://dev.lod.coop/essglobal/2.1/standard/activities-modified/": "am"
},
"languages": ["en"], // Only one language supported: English
"ui": { "directory_panel_field": "activity" }, // Sets the default panel to show
"vocabs": {
"am": { // This is the Activities vocabulary
"en": { // The English title and term definitions follow
"title": "Activities (Modified)",
"terms": {
"AM10": "Arts, Media, Culture & Leisure",
"AM20": "Campaigning, Activism & Advocacy",
"AM30": "Community & Collective Spaces",
"AM40": "Education",
"AM50": "Energy",
"AM60": "Food",
"AM70": "Goods & Services",
"AM80": "Health, Social Care & Wellbeing",
"AM90": "Housing",
"AM100": "Money & Finance",
"AM110": "Nature, Conservation & Environment",
"AM120": "Reduce, Reuse, Repair & Recycle",
"AM130": "Agriculture",
"AM140": "Industry",
"AM150": "Utilities",
"AM160": "Transport"
}
}
// ... similar definitions for other languages would go here,
// using the same structure.
}
},
"itemProps": { // This defines the properties dataset items should have
"id": { // A unique identifier. Mandatory for the dataset to be usable.
"type": "value", // Single valued
"from": "Identifier", // Which CSV field to read it from
"strict": true // Don't tolerate nulls or blanks.
// Format is "string" by default.
},
"name": { // The name of the item
"type": "value",
"search": true, // we want this to be text-searchable
"from": "Name"
},
"manlat": {
"type": "value",
"as": "number", // Convert these values into numbers when loading CSV
"nullable": true, // Allow nulls
"from": "Latitude"
},
"manlng": {
"type": "value",
"as": "number",
"nullable": true,
"from": "Longitude"
},
"lat": {
"type": "value",
"as": "number",
"nullable": true,
"from": "Geocoded Latitude"
},
"lng": {
"type": "value",
"as": "number",
"nullable": true,
"from": "Geocoded Longitude"
},
"address": {
"type": "value",
"filter": true,
"from": "Address"
},
"activity": {
"type": "vocab", // A single-valued identifier from a vocabulary
"uri": "am:", // Specifically, this vocabulary
"from": "Activity"
},
"otherActivities": {
"type": "multi", // Multi-valued
"of": {
"type": "vocab", // Values are identifiers from a vocabulary
"uri": "am:" // Specifically, this one
},
"from": "Other Activities" // When interpreting CSV, by default a
// semi-colon is used as a delimiter.
// Literal semi-colons need to be escaped
// with a backslash.
},
"websites": {
"type": "multi", // Multi-valued
"of": {
"type": "value" // Values are strings
},
"from": "Websites"
},
"validated": {
"type": "value", // Single valued
"as": "boolean", // Convert values to booleans, or null if they're too "weird".
// Non-weird means: "true", "yes", "y", "t" or "1" count as true,
// and "false", "no", "n", "f" or "0" as false. (Ignoring case
// in both cases.)
"from": "Validated"
}
}
}
```
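If you do want to keep an annotated copy of a config like the one above, one option is to store it with comments as JSON5 and generate the strict-JSON file from it. A sketch, assuming the `json5` npm package and hypothetical file names:

```typescript
import * as fs from "node:fs";
import JSON5 from "json5";

// Parse the commented copy, then emit plain JSON that the strict
// parsers used by the back-end and the dataset tool will accept.
const annotated = JSON5.parse(fs.readFileSync("config.json5", "utf8"));
fs.writeFileSync("config.json", JSON.stringify(annotated, null, 2));
```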
Given these files as inputs, `data.csv` and `config.json`, and assuming:
- these files are in `/tmp/`, which is writable,
- there is no directory `/tmp/out` existing,
- the current directory is `apps/back-end/`,
...we can then run the following to generate a dataset in `/tmp/out/`:
```
npm run dataset import /tmp/config.json /tmp/data.csv /tmp/out
```
We should then get a dataset written to `/tmp/out/`. Listing that would give:
```
$ find /tmp/out -type f
/tmp/out/config.json
/tmp/out/items/0.json
/tmp/out/items/1.json
/tmp/out/items/2.json
/tmp/out/items/3.json
/tmp/out/locations.json
/tmp/out/searchable.json
```
...where `config.json` should be identical to the above. The other files' contents would be as below.
#### `locations.json`

```json
[[-3.65478,51.60844],[-3.17331,55.9647],[-1.61089,54.97447],null]
```
#### `searchable.json`

```json
{ "itemProps":
["name","address","searchString"],
"values":[
["Apple Co-op","1 Apple Way, Appleton","apple co op"],
["Banana Co","1 Banana Boulevard, Skinningdale","banana co"],
["The Cabbage Collective","2 Cabbage Close, Caulfield","the cabbage collective"],
["The Chateau","","the chateau"]
]}
```
#### `items/0.json`

```json
{
"id": "aaa",
"name": "Apple Co-op",
"manlat": 0,
"manlng": 0,
"lat": 51.60844,
"lng": -3.65478,
"address": "1 Apple Way, Appleton",
"activity": "AM130",
"otherActivities": [],
"websites": [
"http://apple.coop"
],
"validated": true
}
```
#### `items/1.json`

```json
{
"id": "bbb",
"name": "Banana Co",
"manlat": 0,
"manlng": 0,
"lat": 55.9647,
"lng": -3.17331,
"address": "1 Banana Boulevard, Skinningdale",
"activity": "AM60",
"otherActivities": [
"AM70",
"AM120"
],
"websites": [
"http://banana.com"
],
"validated": false
}
```
#### `items/2.json`

```json
{
"id": "ccc",
"name": "The Cabbage Collective",
"manlat": null,
"manlng": null,
"lat": 54.97447,
"lng": -1.61089,
"address": "2 Cabbage Close, Caulfield",
"activity": "AM60",
"otherActivities": [
"AM130"
],
"websites": [
"http://cabbage.coop",
"http://cabbage.com"
],
"validated": true
}
```
#### `items/3.json`

```json
{
"id": "ddd",
"name": "The Chateau",
"manlat": null,
"manlng": null,
"lat": null,
"lng": null,
"address": "",
"activity": null,
"otherActivities": [],
"websites": [],
"validated": false
}
```
## More details
Here we go into more detail about the dataset files and their structure.
### Datasets
At the top level below `<dataroot>`, which is whatever `SERVER_DATA_ROOT` has been configured with, there are zero or more directories: one for each dataset. All the data for a dataset is contained within its dataset directory.
The names of the directories are interpreted as the dataset IDs, URI-component encoded to ensure that they are well-formed directory names: characters like "/" are encoded as "%2F", for instance.
> [!NOTE]
> The data inside a dataset directory does not refer to its own ID, so the directory can be renamed freely without breaking the data structure within.
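The encoding is the standard URI-component one, as in JavaScript's `encodeURIComponent`; for example (the dataset ID here is made up):

```typescript
// A dataset ID containing a "/" becomes a well-formed directory name:
const dir = encodeURIComponent("coops-uk/2024"); // "coops-uk%2F2024"
const id = decodeURIComponent(dir); // back to "coops-uk/2024"
```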
Inside these top-level directories are a number of mandatory data files, and a sub-directory named `items` which contains a data file for each of the dataset's items.
### `about.md` (mandatory)
This file contains a free-form human-readable description of the dataset in Markdown format.
It will be shown in the "about" panel of the map when this dataset is selected. You can put anything you like in here, but you should keep this viewing context in mind.
> [!WARNING]
> Currently there is no localisation support for this file; this should be added in future.
### `config.json` (mandatory)
This is a JSON data file that describes:
- the properties shared by dataset items, and the mapping to these from CSV columns,
- vocabularies and localisations thereof used in the data (AKA taxonomies),
- and map configuration details.
The schema and semantics of this file are more complex than those of the other files. The formal specification is defined by the `ConfigData` type contract in `contract.ts` - refer to that for the full details of nested properties.
Here we list the top-level elements and their purpose.
> [!WARNING]
> Although correct at the time of writing, this is subject to change.
`prefixes`
: An object mapping SKOS vocabulary URIs to the abbreviations used in [QNames][qnames] ("qualified names") in the rest of the file. QNames are a mechanism used in both RDF/XML and Turtle file formats for encoding RDF linked data.

`languages`
: An array of two-character ISO-639-1 codes. It enumerates the languages that are supported by the vocabularies defined in the `vocabs` property. There should be at least one element. The first element is assumed to be the default language to use.

`ui`
: A collection of user-interface settings for the map when showing this dataset. For example: the side panel to show by default. These are likely to change, and not strictly necessary for generating a dataset.

`itemProps`
: An object defining the properties required for each data item. The keys define the property IDs, and the values their definitions, which are sub-objects. A non-exhaustive list of the most important properties of dataset-item property definitions follows (a rough sketch of this shape in code comes after the list):
  - `type`: Required, indicates the basic property type, and can be one of the following:
    - `"value"`, this property represents a single literal string;
    - `"vocab"`, this property represents a single identifier from a specific vocabulary defined in this file;
    - `"multi"`, this property represents a multi-valued property, containing instances of either `vocab` or `value`.
  - `of`: Required for `"multi"` properties - defines the type of the multiple values. Can be `"value"` or `"vocab"`.
  - `filter`: Optional, indicates if the property should be filterable; boolean, defaults to false.
  - `search`: Optional, indicates if the property should be text-searchable; boolean, defaults to false.
  - `from`: Optional, used when importing datasets from CSV. It indicates the name of a CSV column header to use as the source for this field. If absent, this field will not get populated by the importer.
  - `nullable`: Optional, indicates whether null values are permitted in the data without triggering an error. (Note that for CSV data, an empty field counts as null.) Boolean, defaults to true.
  - `strict`: Optional, indicates whether to be sloppy or strict when interpreting CSV data.
  - `as`: Optional, hints on the interpretation of data when converting from CSV. Can be one of:
    - `"string"`: interpret as a string.
    - `"boolean"`: interpret as a boolean.
    - `"number"`: interpret as a number.

`vocabs`
: An object defining the vocabularies used in the vocab properties of dataset items. Keys are vocabulary abbreviations as defined in `prefixes`. They map to a vocabulary definition for each language - keyed by a language code defined in `languages`. Each vocabulary definition is in the context of that language, and has a title for the vocabulary and a mapping from term IDs to term labels for that language. All this is to allow identifiers representing vocabulary terms (typically IDs, [QNames][qnames], or full URIs) to be mapped to localised labels. This is useful both for interpreting dataset item properties, and for localisation of the user interface.
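As promised above, a rough sketch of the shape of an `itemProps` entry, in TypeScript. The authoritative definition is the `ConfigData` contract in `contract.ts`; treat this as an approximation:

```typescript
// Approximate shape of one itemProps definition, per the list above.
type PropDef = {
  type: "value" | "vocab" | "multi";
  of?: PropDef; // required when type is "multi"
  uri?: string; // the vocabulary prefix, for "vocab" properties
  as?: "string" | "boolean" | "number";
  filter?: boolean; // default false
  search?: boolean; // default false
  nullable?: boolean; // default true
  strict?: boolean;
  from?: string; // CSV column header to import from
};
```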
### `locations.json` (mandatory)
This is a JSON data file that contains an array of locations, one for each item in the dataset.
The order of this array is significant. The Nth item in the locations array represents the location of the Nth item in the dataset. There must be one item in the array for each item in the dataset.
Each location is represented by an array of two floating point coordinates:
```
[<longitude>, <latitude>],
```
Or, if there is no location for that item, just a null as a placeholder:
```
null,
```
> [!NOTE]
> Typically, the coordinates stored in the array use just enough decimal places to identify a street address on the map uniquely throughout the world, and no more: five. This is to avoid the file becoming unnecessarily big; JSON floating point representations can be quite long.
>
> For similar reasons, all white-space and newlines are omitted. This makes human inspection of very large files somewhat awkward, but it is considered worth the inconvenience. If necessary the file can be piped through a tool like `jq` or `sed` to unpack it for inspection.
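A sketch of writing such a file - a hypothetical helper, not the actual exporter code:

```typescript
import * as fs from "node:fs";

// Emit one [lng, lat] pair (rounded to 5 decimal places) or null per
// item, in dataset order, with no white-space or item IDs.
function writeLocations(
  file: string,
  items: { lng: number | null; lat: number | null }[],
): void {
  const round = (n: number) => Math.round(n * 1e5) / 1e5;
  const locations = items.map((item) =>
    item.lng == null || item.lat == null ? null : [round(item.lng), round(item.lat)],
  );
  fs.writeFileSync(file, JSON.stringify(locations));
}
```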
### `searchable.json`
This is a JSON data file that contains an index of the searchable properties of the dataset items.
The formal specification of this file, such as it is, is currently defined in the source code.
In brief: it is a JSON object with two properties:

- `itemProps`, a list of property IDs, and
- `values`, a list of property value arrays.
Like this:
```
{
  "itemProps": [
    ...
  ],
  "values": [
    ...
  ]
}
```
> [!INFO]
> Note, this example is indented for readability, but the actual file is somewhat minimised and omits superfluous white-space.
`itemProps` is inserted on the first line, and subsequent lines define items in the `values` array, one per line, in the predefined order.
#### `itemProps`

`itemProps` defines how to interpret the value arrays. For example:
"itemProps": [
"country_id",
"primary_activity",
"typology",
"data_sources",
"searchString"
],
...
The elements refer to the property IDs defined in `config.json`.
In this case, this indicates that there are four properties which can be searched by drop-down filter.
The fifth element, `searchString`, is special: it's not a real property of the dataset item, but synthesised from them; and it is always present (if only as an empty string). It represents the text-searchable content of the dataset item, normalised to punctuation-free lower-case words.
#### `values`
The `values` array must contain one element for each of the dataset items.
The order of the `values` array is significant: the Nth element of the `values` array corresponds to the Nth dataset item.
Each element of `values` is an array of property values, the same size as the `itemProps` array. For instance:
```
...
"values": [
  [
    "GB",
    "ICA210",
    "BMT20",
    ["DC", "CUK"],
    "housing 1 west street sheffield s1 2ab uk apples coop"
  ],
  ...
]
```
In this example, the first dataset item has a `country_id` of `"GB"`, a `primary_activity` of `"ICA210"`, and so on. The `searchString` text is the last element.
#### `searchString` construction
To illustrate where `searchString` comes from, consider an example case where text-searchable properties have been configured in `config.json` to be:

- `activities`,
- `address`, and
- `name`.
These must be valid dataset item properties of course, but don't necessarily need to be properties listed in `itemProps`.
Let's say the values of those properties are:
"ICA210"
- an activity ID which would be expanded to "Housing" in English;"1 West Street, Sheffield S1 2AB, UK"
(an address), and"Apples Co-op"
(the organisation name).
The `searchString` value will be those values concatenated, with punctuation removed and text down-cased, like this:

```
housing 1 west street sheffield s1 2ab uk apples coop
```
This means that a search for "housing" or "west street" will match.
> [!WARNING]
> This illustration assumes vocab term IDs are expanded into English, which a) may not be implemented yet, and b) obviously won't support multi-lingual text searches. This may be addressed in future. Also, the type of matching for multiple words may be adjusted.
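For illustration, the normalisation described above might look like this in TypeScript (the real implementation lives in the importer and may differ in detail):

```typescript
// Concatenate the configured searchable values, replace anything that
// is not a letter or digit with a space, squeeze the result, and
// down-case it.
function makeSearchString(values: string[]): string {
  return values
    .join(" ")
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, " ")
    .trim();
}

// makeSearchString(["Housing", "1 West Street, Sheffield S1 2AB, UK"])
// -> "housing 1 west street sheffield s1 2ab uk"
```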
### `items/`
This directory contains one file for each dataset item. The files are named according to the item index, with a `.json` suffix. Their base names are decimal integers, starting with zero. Therefore `0.json` is the file for the first dataset item, `1.json` the second, and so on.
Each file contains a JSON object mapping all of the defined dataset item property IDs to values.
There must be an `id` property, which is the ID of the item, whose format is determined by the dataset - it must be unique to the dataset.
There should be a `name` property, which is a short human-readable name for the item. This is not mandated, but is advisable.
And there should be `latitude` and `longitude` properties, which are the decimal latitude/longitude coordinates of the item, if it has them. If absent, then the item will be included in search results, but will not have a pin on the map.
These and all other fields must be as defined by the `itemProps` field in `config.json`.
Absent values should be `null`, or `[]` if the property is a multi-valued property.
Vocabulary properties should be a bare valid vocab term identifier, without any prefix or URI.
For example:
```json
{
"id": "test/cuk/R000001",
"name": "Apples Co-op",
"description": "We sell apples",
"website": ["https://apples.coop", "https://orchards.coop"],
"dc_domains": ["apples.coop", "orchards.coop"],
"country_id": "GB",
"primary_activity": "ICA210",
"organisational_structure": "OS60",
"typology": "BMT20",
"latitude": 51.507476,
"longitude": -0.127825,
"geocontainer_lat": 51.50747643,
"geocontainer_lon": -0.12782464,
"address": "1 West Street, Sheffield, S1 2AB, UK",
"data_sources": ["DC", "CUK"]
}
```
## Glossary
- category: when used here, loosely means the same as vocabulary.
- CSV: "comma-separated values" - a common format for storing tabular data in text files. See Wikipedia. Or CSV files begin with a line which contain the headers which define names for the fields found in the following rows. Note that a CSV row may span more than one line, as it can contain embedded newlines.
- data root: the root directory in which to find Mykomap dataset files.
- dataset: a collection of information for use in a Mykomap; typically consists of an ordered collection of items with some metadata describing the properties and types of those properties.
- dataset ID: a unique identifier for a dataset.
- dataset item: represents a single entry in a dataset, typically representing an organisation, usually with an associated location and various other properties.
- dataset item ID: a unique identifier for a dataset item.
- dataset item index: a unique zero-based index number for a dataset item.
- dataset item property: a named attribute of a dataset item, which can have one of a number of pre-defined values and types.
- field: one of the components of a data record, such as in a CSV file or a database table. Often used to refer to a JSON or JavaScript property. We try to consistently use it to mean elements of a CSV row in the context of Mykomap.
- property: one of the components of a JSON or JavaScript object; sometimes also referred to as a field or an attribute. We try and refer consistently to properties in the context of Mykomap.
- QName: a scheme for representing URIs in a more succinct form within contexts where brevity and readability are useful. See Wikipedia for more information.
- RDF: "Resource Description Format": a formal scheme for representing semantic descriptions of concepts and their relationships as subject-predicate-object triplets in a machine-readable way. See Wikipedia.
- RDF/XML: a file encoding for RDF based on XML. See Wikipedia.
- SKOS: a linked-data standard for describing a collection of terms, representing concepts, identified by a URI. Terms are also identified by URIs sharing the same stem as the vocabulary's URI, followed by a unique identifier. The vocabulary has a localised title, and each term can have one or more localised labels. More properties can be ascribed to terms and the vocabulary, but see the definition on Wikipedia for more details.
- taxonomy: when used here, loosely means the same as vocabulary.
- Turtle: a more succinct and readable file encoding for RDF than RDF/XML. See Wikipedia.
- vocabulary: a set of pre-defined labels for some set of concepts, in this context usually a SKOS vocabulary.
- vocabulary term: a concept from a vocabulary.
- vocabulary URI: a unique identifier and namespace for a vocabulary.
- vocabulary prefix: a short locally-defined identifier representing a full URI for the sake of brevity - the first half of a QName.