5 Commits

Author SHA1 Message Date
Anthony F. McInerney
a625faaa1b
service watchdog - add systemd watchdog for resiliency (#12188)
* Add systemd watchdog service

* Add systemd watchdog service

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - update docs for python3-systemd

* systemd-watchdog - move to 10 second alert frequency

* systemd-watchdog - move to 10 second alert frequency

* systemd-watchdog - move to 30 second restart, 10 second delay between restarts

* systemd-watchdog - safely integrate changes

* systemd-watchdog - safely integrate changes

* systemd-watchdog - revert old doc changes

* systemd-watchdog - doc typo fix
2021-03-22 10:34:45 -05:00
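
The watchdog commit above can be illustrated with a minimal sketch of a service reporting liveness to systemd. The commit mentions python3-systemd; this sketch instead writes directly to the datagram socket named in `NOTIFY_SOCKET`, so it needs only the standard library. The function names `sd_notify` and `watchdog_interval` and the 10-second default are assumptions for illustration, not the code from the commit.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification to systemd; return False if nothing was sent."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd, or notifications are disabled
    if path.startswith("@"):
        path = "\0" + path[1:]  # "@" prefix marks an abstract-namespace socket
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
            sock.connect(path)
            sock.sendall(message.encode())
    except OSError:
        return False
    return True


def watchdog_interval(default: float = 10.0) -> float:
    """Seconds between keep-alives: half of WATCHDOG_USEC, else `default`."""
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return default  # hypothetical fallback when no watchdog is configured
    return int(usec) / 1_000_000 / 2
```

A service loop would call `sd_notify("READY=1")` once after start-up and then `sd_notify("WATCHDOG=1")` every `watchdog_interval()` seconds. If the keep-alives stop, systemd treats the service as hung and applies the unit's restart policy.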
Adam Bishop
41ed0537b4
Fix midnight poller data loss (#11582)
* Handle more signals

* Flush buffers before exiting process
This ensures log messages aren't lost

* Restart process before jobs have finished
If there is a very long-running job, it can cause the service restart to
take over 5 minutes.

We tweak the order of things to make sure that running processes
continue, but nothing more is scheduled.

The worst-case impact is that a polling/discovery job gets
scheduled twice, but this should not be a big issue - this should
only occur at most once per day.

* Remove python 3.8 feature

* Ensure that processes from the previous invocation are reaped

* Correct typos

* Attach subprocess descriptors to /dev/null

Occasionally, PHP would throw a fit and crash when its stdout went
away. To avoid this, we attach stdout to devnull.

This means we lose the output of daily.sh - but this is already recorded
in $LOGDIR/daily.log

* Don't immediately schedule long running jobs

To avoid a second long-running job starting immediately after a
maintenance reload or a SIGHUP, we wait
(`last_[poll/discovery]_timetaken` * 1.25) seconds before scheduling
any jobs.

* Add `psutil` to requirements

* Add support for "systemctl reload" to the unit files

* Add a fallback for systems that don't have psutil

* Reduce CPU load when psutil is not installed

* Don't avoid double polling by extending the timeout

This shouldn't happen due to locks

* Remove fallback option

* Remove extra variable

* Fix issue introduced during rebase

* Fix issue introduced when fixing issue introduced during rebase

* Make psutil optional
2020-09-29 23:50:40 -05:00
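
Two of the changes listed in this commit lend themselves to a short sketch: attaching subprocess descriptors to /dev/null, and treating `psutil` as an optional dependency. The helper names `run_quietly` and `child_pids` are hypothetical, and the empty-list fallback is an assumption about what "optional" could look like, not the service's actual behaviour.

```python
import subprocess
import sys
from typing import List, Sequence

try:
    import psutil  # optional third-party dependency
except ImportError:
    psutil = None  # callers must cope with the reduced functionality


def run_quietly(argv: Sequence[str]) -> int:
    """Run a child with stdin, stdout and stderr attached to /dev/null."""
    completed = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # child never sees its stdout "go away"
        stderr=subprocess.DEVNULL,
        check=False,  # report the exit status instead of raising
    )
    return completed.returncode


def child_pids() -> List[int]:
    """PIDs of this process's descendants, or [] when psutil is missing."""
    if psutil is None:
        return []
    return [proc.pid for proc in psutil.Process().children(recursive=True)]


if __name__ == "__main__":
    print(run_quietly([sys.executable, "-c", "print('discarded')"]))
    print(child_pids())
```

The child's output is discarded rather than inherited, so a parent that closes its own stdout cannot make the child fail on a write. Anything that needs to be kept should go to a log file, as the commit notes for daily.sh.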
SourceDoctor
f66b16932a
Update requirements.txt (#11600)
2020-05-12 23:11:59 +02:00
Tony Murray
604a200891
Python dispatcher service v2 (#10050)
* Refactor LibreNMS service
add ping

* services ported
remove legacy stats collection

* alerting

* implement unique queues

* update discovery queue manager

* remove message

* more cleanup

* Don't shuffle queue

* clean up imports

* don't try to discover ping only devices

* Fix for discovery not running timer

* Update docs a bit and add some additional config options.
Intentionally undocumented.

* Wait until the device is marked up by the poller before discovering

* Handle losing connection to db gracefully

* Attempt to release master after 5 db failures

* Sleep to give other nodes a chance to acquire

* Update docs and rename the doc to Dispatcher Service to more accurately reflect its function.

* add local notification
2019-05-20 11:35:47 -05:00
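
The database-failure handling listed in this commit (release master after 5 db failures, then sleep so other nodes can acquire) might look roughly like the sketch below. The class name `MasterLease`, counting only consecutive failures, and the 5-second pause are all assumptions made for illustration; the commit message does not specify them.

```python
import time
from typing import Callable


class MasterLease:
    """Track DB failures and give up the master role after too many."""

    def __init__(
        self,
        max_failures: int = 5,
        release_pause: float = 5.0,  # hypothetical pause length in seconds
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_failures = max_failures
        self.release_pause = release_pause
        self.sleep = sleep
        self.failures = 0
        self.is_master = True

    def record_db_result(self, ok: bool) -> bool:
        """Record one DB attempt; return True while this node stays master."""
        if ok:
            self.failures = 0  # assumption: only consecutive failures count
            return self.is_master
        self.failures += 1
        if self.is_master and self.failures >= self.max_failures:
            self.is_master = False
            # Pause so that another node has a chance to acquire the role.
            self.sleep(self.release_pause)
        return self.is_master
```

Passing `sleep` in as a parameter keeps the pause out of unit tests. A real service would also need to release the underlying lock and try to re-acquire it later, which this sketch leaves out.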
Tony Murray
0ba76e6d62
New python service for poller, discovery + more (#8455)
Currently has a file handle leak (and will eventually run out of handles) related to the self update process.

Either need to fix that or rip out self-update and leave that up to cron or something.
2018-06-30 12:19:49 +01:00