After restarting the VM, reading virus databases is a fast operation, it takes about ~35 seconds and is pretty constant if repeated. After some days of runtime "something" happens which makes loading those signatures a very slow operation, up to the point were it takes 15 or even 20 minutes(!) if the VM additionally handles what it normally needs to do as well during daytime. In the night it's a bit faster, maybe half the time, but it's still a lot of minutes vs. without that "something" happened it's always far less than a minute.
My problem is that I don't find what that "something" is that causes those problems. But after that strange event happened, it doesn't only influence signature loading of ClamD, one can only see the problem very good with this scenario, but seems to influence everything that is CPU-bound. I have the feeling like there's some handbrake in effect on the CPUs: Whenever something CPU-bound is in progress, all other processes seem to accumulate as well, putting a very high load on the system, making it slow, up to the point at which one is not able to use simple cursor key navigation in e.g. Midnight Commander (mc) anymore. Restarting Apache Tomcat serving multiple different web applications triggers that effect as well after that "something" happened, restarting takes way more time than before.
Those effects can easily be seen in htop:
That high load is only because of the ClamD process, normally it's not that high, especially as the requests to Tomcat are served pretty fast normally. Once ClamD finishes, the overall load is much lower again. Additionally recognize that ClamD takes >100% CPU, which is normally not the case, because reading signatures is only done by one CPU. The next picture is interesting as well:
After the former requests have been processed by Tomcat, the load on all CPUs drops, ClamD gets back to what looks like normal with ~100%. But it isn't, ClamD takes too long, it was already working for minutes, and the other top processes like htop itself shouldn't create such a high load as well. Without ClamD running it's ~2-3%.
So it seems like things that are only short to process are getting slower, but stay "fast enough", while everything that consumes a lot of CPU, like ClamD or Tomcat, gets very slow and makes other processes slower as well. This can even be seen in the logs of ClamD, it starts reloading fast and becomes slower:
Code: Select all
Tue May 1 11:56:26 2018 -> Reading databases from /var/lib/clamav
Tue May 1 11:57:01 2018 -> Database correctly reloaded (10566159 signatures)
Tue May 1 19:11:07 2018 -> Reading databases from /var/lib/clamav
Tue May 1 19:11:47 2018 -> Database correctly reloaded (10566159 signatures)
Wed May 2 00:51:15 2018 -> Reading databases from /var/lib/clamav
Wed May 2 00:51:53 2018 -> Database correctly reloaded (10578504 signatures)
Wed May 2 03:41:56 2018 -> Reading databases from /var/lib/clamav
Wed May 2 03:42:31 2018 -> Database correctly reloaded (10579770 signatures)
Wed May 2 20:45:32 2018 -> Reading databases from /var/lib/clamav
Wed May 2 20:46:07 2018 -> Database correctly reloaded (10579770 signatures)
Thu May 3 00:52:29 2018 -> Reading databases from /var/lib/clamav
Thu May 3 00:53:08 2018 -> Database correctly reloaded (10584928 signatures)
Thu May 3 03:42:07 2018 -> Reading databases from /var/lib/clamav
Thu May 3 03:42:46 2018 -> Database correctly reloaded (10586235 signatures)
Thu May 3 08:52:18 2018 -> Reading databases from /var/lib/clamav
Thu May 3 08:53:06 2018 -> Database correctly reloaded (10586235 signatures)
Fri May 4 01:00:30 2018 -> Reading databases from /var/lib/clamav
Fri May 4 01:01:53 2018 -> Database correctly reloaded (10586721 signatures)
Fri May 4 03:42:43 2018 -> Reading databases from /var/lib/clamav
Fri May 4 03:44:01 2018 -> Database correctly reloaded (10588026 signatures)
[...]
Sat May 5 00:56:17 2018 -> Reading databases from /var/lib/clamav
Sat May 5 00:59:48 2018 -> Database correctly reloaded (10589668 signatures)
Sat May 5 03:47:01 2018 -> Reading databases from /var/lib/clamav
Sat May 5 03:53:47 2018 -> Database correctly reloaded (10590874 signatures)
Sat May 5 13:40:49 2018 -> Reading databases from /var/lib/clamav
Sat May 5 13:56:33 2018 -> Database correctly reloaded (10590874 signatures)
Sun May 6 01:00:20 2018 -> Reading databases from /var/lib/clamav
Sun May 6 01:09:27 2018 -> Database correctly reloaded (10597394 signatures)
Sun May 6 03:51:45 2018 -> Reading databases from /var/lib/clamav
Sun May 6 03:59:11 2018 -> Database correctly reloaded (10598555 signatures)
The VM itself is Ubuntu 16.04, 8 vCPUs, 48 GBs of RAM. The VM-host is Ubuntu 16.04 with 2 Intel(R) Xeon(R) CPU X5675 @ 3.07 GHz with Hyperthreading enabled, so a total of 24 logical CPUs, and 148 GBs of RAM. Normally those are enough resources to serve my apps fast. The hypervisor used is VirtualBox 5.2.10.
Any more ideas how to debug this, what could be the "something" creating the trouble? Thanks!