CTDB Not Starting Up on CentOS 7

While preparing for my Red Hat Gluster Storage Administration exam (EX236), I got stuck on the section about configuring IP failover with CTDB (Clustered Trivial Database) for Samba. The problem was that I couldn’t get the ctdb service to start in my home lab, which runs CentOS 7 and a newer version of ctdb.

This problem occurs on the following platform and package version:

CentOS Linux release 7.7.1908 (Core)
ctdb 4.9.1-6.el7

# systemctl status ctdb
● ctdb.service - CTDB
   Loaded: loaded (/usr/lib/systemd/system/ctdb.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sun 2019-10-13 23:17:46 AEDT; 3min 7s ago
     Docs: man:ctdbd(1)
           man:ctdb(7)
  Process: 1335 ExecStart=/usr/sbin/ctdbd_wrapper start (code=exited, status=1/FAILURE)

Oct 13 23:17:35 serverb.rh236.local systemd[1]: Starting CTDB...
Oct 13 23:17:35 serverb.rh236.local ctdbd[1348]: CTDB logging to location file:/var/log/log.ctdb
Oct 13 23:17:46 serverb.rh236.local ctdbd_wrapper[1335]: Timed out waiting for initialisation - check logs
Oct 13 23:17:46 serverb.rh236.local systemd[1]: ctdb.service: control process exited, code=exited status=1
Oct 13 23:17:46 serverb.rh236.local systemd[1]: Failed to start CTDB.
Oct 13 23:17:46 serverb.rh236.local systemd[1]: Unit ctdb.service entered failed state.
Oct 13 23:17:46 serverb.rh236.local systemd[1]: ctdb.service failed.

Here is the content of the /var/log/log.ctdb log:

# tail -f /var/log/log.ctdb
2019/10/13 23:31:36.401325 ctdbd[2049]: CTDB starting on node
2019/10/13 23:31:36.401345 ctdbd[2049]: Recovery lock not set
2019/10/13 23:31:36.416041 ctdbd[2050]: Starting CTDBD (Version 4.9.1) as PID: 2050
2019/10/13 23:31:36.416185 ctdbd[2050]: Created PID file /var/run/ctdb/ctdbd.pid
2019/10/13 23:31:36.416211 ctdbd[2050]: Removed stale socket /var/run/ctdb/ctdbd.socket
2019/10/13 23:31:36.416235 ctdbd[2050]: Listening to ctdb socket /var/run/ctdb/ctdbd.socket
2019/10/13 23:31:36.416247 ctdbd[2050]: Set real-time scheduler priority
2019/10/13 23:31:36.416414 ctdbd[2050]: Starting event daemon /usr/libexec/ctdb/ctdb-eventd -P 2050 -S 14
2019/10/13 23:31:36.419611 ctdbd[2050]: Set runstate to INIT (1)
2019/10/13 23:31:36.420102 ctdbd[2050]: ctdb exiting with error: Failed to run init event
2019/10/13 23:31:36.420119 ctdbd[2050]:
2019/10/13 23:31:36.420126 ctdbd[2050]: CTDB daemon shutting down
2019/10/13 23:31:42.427208 ctdb-eventd[2053]: PID 2050 gone away, exiting
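When the log is busier than this one, it can help to filter for just the fatal line. The following is a minimal sketch: the helper name ctdb_fatal_lines is my own invention, and the default path is simply the log location that ctdbd reported in the status output above.

```shell
#!/bin/sh
# Print only the lines where ctdbd reports a fatal exit.
# ctdb_fatal_lines is a hypothetical helper name, not part of ctdb.
# The default path is the log location ctdbd printed at startup.
ctdb_fatal_lines() {
    log="${1:-/var/log/log.ctdb}"
    # -F treats the pattern as a fixed string rather than a regex.
    grep -F 'exiting with error' "$log"
}
```

Calling ctdb_fatal_lines with no argument reads /var/log/log.ctdb; passing a path reads that file instead, which is handy for a copied or rotated log.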

I spent half a day trying to figure out why the ctdb service wouldn’t start. It was a bit strange, because I had been able to get an older version of ctdb working on RHEL 7.2 in the ROL lab environment.

Just as I was about to give up and move on with the rest of my revision, I decided to google “ctdb exiting with error: Failed to run init event”. I hit the jackpot!

Someone by the name of Torbjorn Jansson had filed a bug for exactly the same problem I was facing (Bug 1656777).

Have a read of the bug report. It has a detailed explanation of what the issues were and how the fixes were worked out.

Anyway, here is the solution provided in that bug report:

First create the 2 missing directories:

# mkdir -p /etc/ctdb/events/legacy
# mkdir -p /var/lib/ctdb/state

Then just enable and start the service:

# systemctl enable ctdb --now

Though the bug was reported for ctdb 4.9.3 on Fedora 29 on the armv7hl architecture, the suggested solution also fixes ctdb 4.9.1 on CentOS 7 on the amd64 architecture.

Update 1 (2019-10-14):

Two other directories also need to be created for this version of ctdb:

# mkdir /var/lib/ctdb/{persistent,volatile}
# systemctl restart ctdb
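For anyone rebuilding a lab node, the four directories from this post can be created in one pass. This is only a sketch under my own assumptions: the function name ensure_ctdb_dirs and its optional prefix argument are hypothetical, added so the function can be tried against a scratch directory first. It uses plain mkdir -p with default permissions, the same as the commands above.

```shell
#!/bin/sh
# Create every directory this post found missing for ctdb 4.9.1 on CentOS 7.
# ensure_ctdb_dirs and its prefix argument are hypothetical, for illustration:
# with no argument it works on the real filesystem (run as root), with a
# prefix it creates the same tree under that directory instead.
ensure_ctdb_dirs() {
    prefix="${1:-}"
    for dir in \
        /etc/ctdb/events/legacy \
        /var/lib/ctdb/state \
        /var/lib/ctdb/persistent \
        /var/lib/ctdb/volatile
    do
        if [ -d "${prefix}${dir}" ]; then
            echo "exists:  ${prefix}${dir}"
        else
            # mkdir -p also creates any missing parent directories.
            mkdir -p "${prefix}${dir}" && echo "created: ${prefix}${dir}"
        fi
    done
}
```

Running ensure_ctdb_dirs /tmp/ctdb-test first shows what would be created without touching the system. Running it with no argument as root, followed by systemctl restart ctdb, is equivalent to the manual steps above. It is safe to run more than once, since existing directories are only reported.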

Reference and credit to:

Bug 1656777 - packaging issues with ctdb (missing folders at least