In case of server problems#

Hetzner Cloud#

The first port of call is probably Hetzner. The login should be in our password store service. Once logged in, navigate to Hetzner's Cloud console (their Robot site is for bare-metal servers).

Check the graphs to see if the server is under heavy CPU, I/O or network load, has stopped, or is otherwise having problems. There is a dropdown which can switch the view from "Live" to an overview of the last hour, day, month, etc.

  • List of DCC servers: https://console.hetzner.cloud/projects/1450952/servers
  • Prod-2 graphs: https://console.hetzner.cloud/projects/1450952/servers/39640291/graphs
  • Dev-2 graphs: https://console.hetzner.cloud/projects/1450952/servers/36089475/graphs

If it's a load issue, there's a "Rescale" tab with which we can upgrade the server to more CPU, Memory and disk space. This should be quick, but it will require a reboot.

If something has run amok on the server, a reboot may resolve it: use the reset button on the "Power" tab. A reboot will not help if the disks are full, however, and that can't be determined from the Hetzner console directly.

Some clues might be available from the server console, which you can open using the link in the "Actions" dropdown for the server on the graph page above. This is like taking a peep at the physical monitor of the server, if it had one. If things are looking normal you'll just see a log-in prompt. But you might find some system log messages there which give you a clue.

Root console things#

At the time of writing, there is no password access to the root (or in fact any) user account on the server, so by default you can't log in via the server console. You can, however, reset the root password on the "Rescue" tab of the Hetzner console, although this will probably trigger a reboot. You can then use the new password to log in via the server console.

Alternatively, someone with access can add your SSH public key, if you have one, to /root/.ssh/authorized_keys, and then you should be able to ssh in as root@prod-2.digitalcommons.coop (substitute the correct hostname as appropriate).
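For whoever is adding the key, here is a minimal sketch of doing it without creating duplicate entries. The function name add_ssh_key and the file names in the usage line are made up for illustration. It assumes the public key has already been copied onto the server as a one-line file.

```shell
# Append a public key to an authorized_keys file unless it is already there.
# Usage: add_ssh_key PUBKEY_FILE [AUTHORIZED_KEYS_FILE]
# (hypothetical helper, not part of any existing tooling)
add_ssh_key() {
    pubkey_file=$1
    auth_file=${2:-/root/.ssh/authorized_keys}
    auth_dir=$(dirname "$auth_file")

    # Create the directory and file with the restrictive permissions sshd expects.
    [ -d "$auth_dir" ] || mkdir -p -m 700 "$auth_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -F: fixed strings, -x: whole-line match, -f: take the patterns from the key file.
    if grep -qxFf "$pubkey_file" "$auth_file"; then
        echo "key already present"
    else
        cat "$pubkey_file" >> "$auth_file"
        echo "key added"
    fi
}
```

Typical use as root would be something like add_ssh_key /tmp/alice.pub, where /tmp/alice.pub is a placeholder for wherever the key file was uploaded.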

However, when the server is in real trouble, perhaps because it can't cope with the load, logging in to a console on the server is typically fraught with problems. In that case, using the Hetzner console is better.

What's the CPU / Memory load?#

On the console this can be done by running the command top, which will show an updating table of processes like this:

Tasks: 145 total,   1 running, 144 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.9 us,  5.9 sy,  0.0 ni, 91.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7747.9 total,   5298.0 free,    954.0 used,   1496.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   6476.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                             
   4431 root      20   0   16916  10220   8372 S  11.8   0.1   0:00.04 sshd                                                                                                                                                                
    395 root      19  -1  340468 115548 114352 S   5.9   1.5   0:11.21 systemd-journal                                                                                                                                                     
      1 root      20   0  100908  11816   8416 S   0.0   0.1   0:03.54 systemd                                                                                                                                                             
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd                                                                                                                                                            
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                                                                              
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                                                                          
      5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 slub_flushwq                                                                                                                                                        
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns                                                                                                                                                               
      8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri                                                                                                                                         
     10 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                                                                                                        
     11 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_rude_                                                                                                                                                     
     12 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_trace                                                                                                                                                     
     13 root      20   0       0      0      0 S   0.0   0.0   0:00.11 ksoftirqd/0                                                                                                                                                         
     14 root      20   0       0      0      0 I   0.0   0.0   0:00.37 rcu_sched                                                                                                                                                           
     15 root      rt   0       0      0      0 S   0.0   0.0   0:00.02 migration/0       

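The processes are sorted by CPU usage by default. You can switch to sorting by memory use by pressing the M key (that is, Shift+m; lower-case m only changes how the memory summary at the top is displayed). The ? key will show a brief overview of the keys you can press.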

To exit, use q.
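If the server is too sluggish for an interactive display, a one-shot snapshot can be easier to read. The following is a rough sketch which reads the kernel's own figures from /proc. The function names are invented for this example, and the optional file argument exists only so the functions can be tried out on sample data.

```shell
# Print the percentage of memory still available, from a meminfo-format file.
# Usage: mem_available_pct [FILE]   (defaults to /proc/meminfo)
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    ' "${1:-/proc/meminfo}"
}

# Print the 1-minute load average (the first field of a loadavg-format file).
# Usage: load_1min [FILE]   (defaults to /proc/loadavg)
load_1min() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}
```

As a rule of thumb, a load average well above the number of CPU cores, or an available-memory percentage in single figures, suggests the server is struggling. Running top -b -n 1 | head -n 20 gives a similar non-interactive snapshot in top's own format.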

Disk full?#

You can check the disk's capacity and free space with df -h. At the time of writing, the output looks like this:

root@dev-2:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           775M  936K  774M   1% /run
/dev/sda1        75G   53G   20G  74% /
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      253M  6.1M  246M   3% /boot/efi
tmpfs           775M     0  775M   0% /run/user/0
tmpfs           775M     0  775M   0% /run/user/7001
tmpfs           775M     0  775M   0% /run/user/7003
tmpfs           775M     0  775M   0% /run/user/7002

This tells us that the main disk / is 74% full, with 20G available. The other partitions can mostly be ignored.
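To pick out just that figure, for example to paste into a chat message or to use in a quick check, the df output can be filtered. This is a sketch only: the function names and the threshold of 90 in the usage line are arbitrary choices for illustration.

```shell
# Read df output on stdin and print the Use% figure (without the % sign)
# for the given mount point.
# Usage: df -P | mount_use_pct /
mount_use_pct() {
    awk -v mount="$1" 'NR > 1 && $6 == mount { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) if the mount point is at or above the threshold.
# Usage: df -P | disk_is_nearly_full / 90 && echo "disk nearly full"
disk_is_nearly_full() {
    pct=$(mount_use_pct "$1")
    [ -n "$pct" ] && [ "$pct" -ge "$2" ]
}
```

The -P flag asks df for the portable one-line-per-filesystem layout, which keeps the column positions predictable. The column layout of df -h, as shown above, is the same.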

If it's near 100% then the server is in trouble. Fixing that requires either increasing the disk size or finding garbage or otherwise deletable files to remove. If you are not confident on the command line, using the "Rescale" option mentioned in the Hetzner Cloud section above is probably the safest option. Otherwise, typical places to look are in /var/: things that are safe to delete will usually be in directories named tmp, temp, cache or similar. Logs are another thing which can fill up the disk, but avoid deleting them unless you really have to. E.g.

root@dev-2:~# du -sh /var/*
2.6M    /var/backups
121M    /var/cache
4.0K    /var/crash
40G     /var/lib
4.0K    /var/local
0       /var/lock
4.5G    /var/log
4.0K    /var/mail
4.0K    /var/opt
0       /var/run
28K     /var/spool
14M     /var/tmp
112M    /var/www
root@dev-2:~# du -sh /var/tmp.*
du: cannot access '/var/tmp.*': No such file or directory
root@dev-2:~# du -sh /var/tmp/*
4.0K    /var/tmp/cloud-init
8.0K    /var/tmp/systemd-private-5d313365530846358c648b6ed26b16fd-apache2.service-k3rsZh
8.0K    /var/tmp/systemd-private-5d313365530846358c648b6ed26b16fd-systemd-logind.service-nMa2no
8.0K    /var/tmp/systemd-private-5d313365530846358c648b6ed26b16fd-systemd-resolved.service-bGGS1b
8.0K    /var/tmp/systemd-private-5d313365530846358c648b6ed26b16fd-systemd-timesyncd.service-jAqmK6
14M     /var/tmp/virtuoso
root@dev-2:~# du -sh /var/log/*
4.0K    /var/log/alternatives.log
12K     /var/log/alternatives.log.1
4.0K    /var/log/alternatives.log.10.gz
4.0K    /var/log/alternatives.log.11.gz
4.0K    /var/log/alternatives.log.12.gz
4.0K    /var/log/alternatives.log.2.gz
4.0K    /var/log/alternatives.log.3.gz
[...elided]

If you're really hunting for the culprit, you can get a sorted list of directories and files by size using:

root@dev-2:~# du -cax / | sort -rn | tee filesizes.txt | head
55008892        total
55008892        /
46799932        /var
41895664        /var/lib
33852844        /var/lib/mysql
32256420        /var/lib/mysql/property_boundaries
18362384        /var/lib/mysql/property_boundaries/land_ownership_polygons.ibd
11341836        /var/lib/mysql/property_boundaries/pending_inspire_polygons.ibd
7584452         /var/lib/backups
7584448         /var/lib/backups/property_boundaries.borg

The sizes are in kilobytes (du's default 1 KiB blocks). This also writes the list to filesizes.txt, which you can inspect with less filesizes.txt (press the h key after running it for help, or run man less for the manual page).
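Raw kilobyte counts are hard to read at a glance, so here is a small sketch of a filter that converts them to gigabytes. The name kib_to_gib is made up for this example. It assumes each input line is a size in 1 KiB blocks followed by a path, as in the listing above.

```shell
# Convert "SIZE_IN_KIB <whitespace> PATH" lines, as written by du, into
# gigabytes (GiB) with one decimal place, keeping the path unchanged.
# Usage: head -n 20 filesizes.txt | kib_to_gib
kib_to_gib() {
    awk '{
        size = $1
        sub(/^[0-9]+[ \t]+/, "")          # strip the size column, leaving the path
        printf "%.1fG\t%s\n", size / 1048576, $0
    }'
}
```

With GNU coreutils, du -ahx / | sort -rh | head should give a comparable human-readable listing directly, at the cost of less precise figures.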

Rebooting from the root console#

You can do that like this:

root@dev-2:~# reboot

There's also a halt command, but the server will stay off if you use that.

Monitoring and restarting the mykomap server#

As the application user...#

Usually devs do this when logged in as the user running the application - in this case, on dev-2 and prod-2, this will be broccoli.

This shows (a simple case of) how the server is rebuilt. For fuller and more accurate details, see the mykomap-monolith deployment documentation.

root@dev-2:~# su - broccoli               # switch from root to the broccoli user

broccoli@dev-2:~$ cd ~/deploy/data/       # switch to the data directory

broccoli@dev-2:~/deploy/data$ git pull    # pull the latest data (simple case, assumes no branch switching needed)
Already up to date.

broccoli@dev-2:~/deploy/data$ cd ~/gitworking/mykomap-monolith/            # switch to the app directory

broccoli@dev-2:~/gitworking/mykomap-monolith$ git pull
Already up to date.

broccoli@dev-2:~/gitworking/mykomap-monolith$ . ~/gitworking/deploy.env   # load the environment variables needed to rebuild the app and restart the service

broccoli@dev-2:~/gitworking/mykomap-monolith$ systemctl status --user mykomap-backend.service  # Check the app service status
● mykomap-backend.service - Mykomap back-end process manager for broccoli
     Loaded: loaded (/home/broccoli/.config/systemd/user/mykomap-backend.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2024-11-24 17:48:10 UTC; 6min ago
       Docs: https://github.com/DigitalCommons/mykomap-monolith/
   Main PID: 4056 (npm run start:a)
      Tasks: 35 (limit: 9242)
     Memory: 151.5M
        CPU: 12.869s
     CGroup: /user.slice/user-7002.slice/user@7002.service/app.slice/mykomap-backend.service
             ├─4056 "npm run start:attached" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─4203 sh -c "npm run start"
             ├─4204 "npm run start" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─4242 sh -c "node ./start.js"
             └─4243 node ./start.js

Nov 24 17:54:34 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470874892,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1d","req":{"method":"GET","url":"/dataset/delhi/search?filter%5B0%5D=data_sources%3A>
Nov 24 17:54:34 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470874910,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1d","res":{"statusCode":200},"responseTime":17.900402000173926,"msg":"request comple>
Nov 24 17:54:40 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470880871,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1e","req":{"method":"GET","url":"/dataset/delhi/search?filter%5B0%5D=data_sources%3A>
Nov 24 17:54:40 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470880902,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1e","res":{"statusCode":200},"responseTime":30.778594000265002,"msg":"request comple>
Nov 24 17:54:41 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470881048,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1f","req":{"method":"GET","url":"/dataset/delhi/search?filter%5B0%5D=data_sources%3A>
Nov 24 17:54:41 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470881067,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1f","res":{"statusCode":200},"responseTime":19.006068999879062,"msg":"request comple>
Nov 24 17:54:41 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470881068,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1g","req":{"method":"GET","url":"/dataset/delhi/search?filter%5B0%5D=data_sources%3A>
Nov 24 17:54:41 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470881107,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1g","res":{"statusCode":200},"responseTime":38.5415049996227,"msg":"request complete>
Nov 24 17:54:41 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470881209,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1h","req":{"method":"GET","url":"/dataset/delhi/search?filter%5B0%5D=data_sources%3A>
Nov 24 17:54:41 dev-2.digitalcommons.coop bash[4243]: {"level":30,"time":1732470881238,"pid":4243,"hostname":"dev-2.digitalcommons.coop","reqId":"req-1h","res":{"statusCode":200},"responseTime":28.585051000118256,"msg":"request comple>

broccoli@dev-2:~/gitworking/mykomap-monolith$ ./deploy.sh # start the rebuild; this should restart the server automatically

[... detailed output elided]
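The log lines in the status output above are JSON, one request or response per line, so they can be filtered for failed requests. The sketch below assumes the "res":{"statusCode":...} layout visible in those lines. The function name error_responses is invented for the example, and whether the journalctl command in the usage comment shows anything for this account depends on how the journal is configured on the server.

```shell
# Read log lines on stdin and print only those whose response status code
# is 4xx or 5xx.
# Usage (assumed invocation, adjust to taste):
#   journalctl --user-unit mykomap-backend.service -n 500 -o cat --no-pager | error_responses
error_responses() {
    grep -E '"statusCode":[45][0-9]{2}([^0-9]|$)'
}
```

No output means no error responses were found in the lines examined, which is not the same as the service being healthy: a service that isn't answering requests at all won't log any responses either.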

As the root user...#

This can also be done as the root user, which is perhaps easier as it saves a step. In this example, the service status is checked, then the service is restarted, then started (which should do nothing if it is already running), then the status is checked again. You can see the main process ID and memory usage change.

root@dev-2:~# systemctl --machine broccoli@.host --user  status mykomap-backend
● mykomap-backend.service - Mykomap back-end process manager for broccoli
     Loaded: loaded (/home/broccoli/.config/systemd/user/mykomap-backend.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2024-11-24 17:14:53 UTC; 33min ago
       Docs: https://github.com/DigitalCommons/mykomap-monolith/
   Main PID: 3498
      Tasks: 35 (limit: 9242)
     Memory: 168.9M
        CPU: 8.240s
     CGroup: /user.slice/user-7002.slice/user@7002.service/app.slice/mykomap-backend.service
             ├─3498 "npm run start:attached" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─3643 sh -c "npm run start"
             ├─3644 "npm run start" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─3682 sh -c "node ./start.js"
             └─3683 node ./start.js

root@dev-2:~# systemctl --machine broccoli@.host --user  restart mykomap-backend

root@dev-2:~# systemctl --machine broccoli@.host --user  start mykomap-backend

root@dev-2:~# systemctl --machine broccoli@.host --user  status mykomap-backend
● mykomap-backend.service - Mykomap back-end process manager for broccoli
     Loaded: loaded (/home/broccoli/.config/systemd/user/mykomap-backend.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2024-11-24 17:48:10 UTC; 8s ago
       Docs: https://github.com/DigitalCommons/mykomap-monolith/
   Main PID: 4056
      Tasks: 35 (limit: 9242)
     Memory: 196.5M
        CPU: 2.977s
     CGroup: /user.slice/user-7002.slice/user@7002.service/app.slice/mykomap-backend.service
             ├─4056 "npm run start:attached" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─4203 sh -c "npm run start"
             ├─4204 "npm run start" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─4242 sh -c "node ./start.js"
             └─4243 node ./start.js
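Rather than re-running the status command by hand after a restart, the check can be repeated until it succeeds. This is a sketch of that idea: wait_until_active is a made-up helper which retries any command once a second, and the 30-attempt limit in the usage line is an arbitrary choice.

```shell
# Run a command repeatedly, up to TRIES times with a one-second pause between
# attempts, until it succeeds. Returns 0 on success and 1 if it never succeeded.
# Usage: wait_until_active TRIES COMMAND [ARGS...]
wait_until_active() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # Only pause if another attempt is coming.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}
```

As root, this could be used along the lines of wait_until_active 30 systemctl --machine broccoli@.host --user is-active --quiet mykomap-backend && echo "service is up". Note that is-active only reports that the process is running, not that the application is answering requests correctly.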