Package org.jgroups.protocols
Class FD
java.lang.Object
  org.jgroups.stack.Protocol
    org.jgroups.protocols.FD
public class FD extends Protocol
Failure detection based on a simple heartbeat protocol. Regularly polls members for liveness. Multicasts SUSPECT messages when a member is not reachable. The algorithm works as follows: the membership is known and ordered. Each HB protocol periodically sends an 'are-you-alive' message to its *neighbor*. A neighbor is the next in rank in the membership list, which is recomputed upon a view change. When a response hasn't been received for n milliseconds and m tries, the corresponding member is suspected (and eventually excluded if faulty). FD starts when it detects (in a view change notification) that there are at least 2 members in the group. It stops running when the membership drops below 2.
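The neighbor rule described above can be sketched in a few lines. This is a minimal illustration, not the FD source: the class name NeighborSketch, the use of String in place of Address, the wrap-around from the last member to the first, and the ">=" comparison against the maximum number of tries are all assumptions made for the example.

```java
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of JGroups. */
final class NeighborSketch {

    /**
     * Returns the member that follows {@code self} in the ordered membership list.
     * Assumption: the last member wraps around to the first.
     * Returns null if {@code self} is not a member or there is nobody else to monitor.
     */
    static String neighborOf(List<String> members, String self) {
        int idx = members.indexOf(self);
        if (idx < 0 || members.size() < 2) {
            return null;
        }
        return members.get((idx + 1) % members.size());
    }

    /** Assumption: a member is suspected once maxTries heartbeats in a row went unanswered. */
    static boolean shouldSuspect(int missedTries, int maxTries) {
        return missedTries >= maxTries;
    }

    public static void main(String[] args) {
        List<String> view = List.of("A", "B", "C");
        System.out.println(neighborOf(view, "A"));         // prints B
        System.out.println(neighborOf(view, "C"));         // prints A (wrap-around)
        System.out.println(neighborOf(List.of("A"), "A")); // prints null
    }
}
```

With a single-member view there is no neighbor, which matches the statement that FD only runs while the group has at least 2 members.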
When a message is received from the monitored neighbor member, it causes the pinger thread to 'skip' sending the next are-you-alive message. Thus, traffic is reduced.
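One way to picture the 'skip' behavior is a flag that is set whenever traffic from the neighbor arrives and is cleared by the next heartbeat tick. The sketch below is a simplified model under that assumption; the class HeartbeatSkipSketch and its method names are hypothetical and do not mirror the actual pinger thread.

```java
/** Illustrative only: a hypothetical model of skipping a heartbeat after recent traffic. */
final class HeartbeatSkipSketch {
    private boolean trafficSeen; // true if the neighbor sent us something since the last tick
    private int heartbeatsSent;  // number of are-you-alive messages actually sent

    /** Called when any message from the monitored neighbor is received. */
    synchronized void onMessageFromNeighbor() {
        trafficSeen = true;
    }

    /**
     * Called once per heartbeat interval.
     * Returns true if an are-you-alive message would be sent on this tick.
     */
    synchronized boolean onHeartbeatTick() {
        if (trafficSeen) {
            trafficSeen = false; // skip exactly one heartbeat, then resume
            return false;
        }
        heartbeatsSent++;
        return true;
    }

    synchronized int heartbeatsSent() {
        return heartbeatsSent;
    }
}
```

In this model a steady stream of application messages from the neighbor suppresses every heartbeat, so probing traffic only appears when the neighbor has been silent for a full interval.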
Author: Bela Ban
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected class | FD.Broadcaster | Task that periodically broadcasts a list of suspected members to the group.
protected class | FD.BroadcastTask |
static class | FD.FdHeader |
protected class | FD.HeartbeatSender |
class | FD.TimeoutChecker | Task which periodically checks if the last_ack from ping_dest exceeded timeout and, if yes, broadcasts a SUSPECT message.
-
Field Summary
Fields
Modifier and Type | Field | Description
protected FD.Broadcaster | bcast_task | Transmits SUSPECT message until view change or UNSUSPECT is received.
protected java.util.concurrent.Future<?> | heartbeat_sender_future |
protected long | last_ack |
protected Address | local_addr |
protected java.util.concurrent.locks.Lock | lock |
protected int | max_tries |
protected java.util.List<Address> | members |
protected int | num_heartbeats |
protected int | num_suspect_events |
protected java.util.concurrent.atomic.AtomicInteger | num_tries |
protected Address | ping_dest |
protected java.util.List<Address> | pingable_mbrs | Members from which we select ping_dest.
protected BoundedList<java.lang.String> | suspect_history |
protected long | timeout |
protected java.util.concurrent.Future<?> | timeout_checker_future |
protected TimeScheduler | timer |
-
Fields inherited from class org.jgroups.stack.Protocol
after_creation_hook, down_prot, ergonomics, id, log, stack, stats, up_prot
-
-
Constructor Summary
Constructors
Constructor | Description
FD() |
-
Method Summary
Methods
Modifier and Type | Method | Description
protected void | computePingDest(Address remove) | Computes pingable_mbrs (based on the current membership and the suspected members) and ping_dest.
java.lang.Object | down(Event evt) | An event is to be sent down the stack.
int | getCurrentNumTries() |
java.lang.String | getLocalAddress() |
int | getMaxTries() |
java.lang.String | getMembers() |
int | getNumberOfHeartbeatsSent() |
int | getNumSuspectEventsGenerated() |
java.lang.String | getPingableMembers() |
java.lang.String | getPingDest() |
protected Address | getPingDest(java.util.List<Address> mbrs) |
long | getTimeout() |
boolean | isMonitorRunning() |
java.lang.String | printSuspectHistory() |
void | resetStats() |
protected void | sendHeartbeatResponse(Address dest) |
void | setMaxTries(int max_tries) |
void | setTimeout(long timeout) |
void | start() | This method is called on a JChannel.connect(String).
void | startFailureDetection() |
protected void | startMonitor() | Requires lock to be held by caller.
void | stop() | This method is called on a JChannel.disconnect().
void | stopFailureDetection() |
protected void | stopMonitor() | Requires lock to be held by caller.
protected void | unsuspect(Address mbr) |
java.lang.Object | up(Message msg) | A single message was received.
void | up(MessageBatch batch) | Sends up multiple messages in a MessageBatch.
protected void | updateTimestamp(Address sender) |
-
Methods inherited from class org.jgroups.stack.Protocol
accept, afterCreationHook, destroy, down, enableStats, getConfigurableObjects, getDownProtocol, getDownServices, getId, getIdsAbove, getLevel, getLog, getName, getProtocolStack, getSocketFactory, getThreadFactory, getTransport, getUpProtocol, getUpServices, getValue, init, isErgonomics, level, parse, providedDownServices, providedUpServices, requiredDownServices, requiredUpServices, resetStatistics, setDownProtocol, setErgonomics, setId, setLevel, setProtocolStack, setSocketFactory, setUpProtocol, setValue, statsEnabled, up
-
-
-
-
Field Detail
-
timeout
protected long timeout
-
max_tries
protected int max_tries
-
num_heartbeats
protected int num_heartbeats
-
num_suspect_events
protected int num_suspect_events
-
suspect_history
protected final BoundedList<java.lang.String> suspect_history
-
local_addr
protected Address local_addr
-
last_ack
protected volatile long last_ack
-
num_tries
protected final java.util.concurrent.atomic.AtomicInteger num_tries
-
lock
protected final java.util.concurrent.locks.Lock lock
-
ping_dest
protected volatile Address ping_dest
-
members
protected final java.util.List<Address> members
-
pingable_mbrs
protected final java.util.List<Address> pingable_mbrs
Members from which we select ping_dest. Copy of members minus the suspected members.
-
timer
protected TimeScheduler timer
-
timeout_checker_future
protected java.util.concurrent.Future<?> timeout_checker_future
-
heartbeat_sender_future
protected java.util.concurrent.Future<?> heartbeat_sender_future
-
bcast_task
protected final FD.Broadcaster bcast_task
Transmits SUSPECT message until view change or UNSUSPECT is received
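The "until view change or UNSUSPECT" behavior can be modeled as a set of suspected members that a periodic task re-broadcasts until the set is empty. The sketch below is a rough model, not FD.Broadcaster itself: the class SuspectBroadcastSketch, its method names, the use of String for addresses, and the choice to keep only suspects that are still in the new view are assumptions.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a hypothetical model of repeated SUSPECT broadcasting. */
final class SuspectBroadcastSketch {
    // Insertion-ordered so that broadcasts list suspects in the order they were added.
    private final Set<String> suspected = new LinkedHashSet<>();

    /** A member stopped responding: start announcing it. */
    synchronized void suspect(String member) {
        suspected.add(member);
    }

    /** An UNSUSPECT was received for this member: stop announcing it. */
    synchronized void unsuspect(String member) {
        suspected.remove(member);
    }

    /** Assumption: after a view change, suspects that left the view need no further announcements. */
    synchronized void viewChanged(Collection<String> newView) {
        suspected.retainAll(newView);
    }

    /** What the periodic task would send on its next run; an empty list means it can stop. */
    synchronized List<String> nextBroadcast() {
        return List.copyOf(suspected);
    }
}
```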
-
-
Method Detail
-
getLocalAddress
public java.lang.String getLocalAddress()
-
getMembers
public java.lang.String getMembers()
-
getPingableMembers
public java.lang.String getPingableMembers()
-
getPingDest
public java.lang.String getPingDest()
-
getNumberOfHeartbeatsSent
public int getNumberOfHeartbeatsSent()
-
getNumSuspectEventsGenerated
public int getNumSuspectEventsGenerated()
-
getTimeout
public long getTimeout()
-
setTimeout
public void setTimeout(long timeout)
-
getMaxTries
public int getMaxTries()
-
setMaxTries
public void setMaxTries(int max_tries)
-
getCurrentNumTries
public int getCurrentNumTries()
-
printSuspectHistory
public java.lang.String printSuspectHistory()
-
resetStats
public void resetStats()
- Overrides: resetStats in class Protocol
-
start
public void start() throws java.lang.Exception
Description copied from class: Protocol
This method is called on a JChannel.connect(String). Starts work. Protocols are connected and queues are ready to receive events. Will be called from bottom to top. This call will replace the START and START_OK events.
- Overrides: start in class Protocol
- Throws: java.lang.Exception - Thrown if protocol cannot be started successfully. This will cause the ProtocolStack to fail, so JChannel.connect(String) will throw an exception
-
stop
public void stop()
Description copied from class: Protocol
This method is called on a JChannel.disconnect(). Stops work (e.g. by closing multicast socket). Will be called from top to bottom. This means that at the time of the method invocation the neighbor protocol below is still working. This method will replace the STOP, STOP_OK, CLEANUP and CLEANUP_OK events. The ProtocolStack guarantees that when this method is called all messages in the down queue will have been flushed.
-
stopFailureDetection
public void stopFailureDetection()
-
startFailureDetection
public void startFailureDetection()
-
startMonitor
protected void startMonitor()
Requires lock to be held by caller
-
stopMonitor
protected void stopMonitor()
Requires lock to be held by caller
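The "lock held by caller" contract of startMonitor() and stopMonitor() can be illustrated with a small wrapper pattern. This is a sketch under assumptions, not the FD source: the class MonitorLockSketch is hypothetical, it is assumed that the public start/stop entry points are the ones that take the lock, and the explicit isHeldByCurrentThread() guard is added here only to make the contract visible.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: a hypothetical class showing a "caller must hold the lock" contract. */
final class MonitorLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private boolean running; // guarded by lock

    /** Public entry point: acquires the lock, then delegates (an assumption about the real class). */
    void startFailureDetection() {
        lock.lock();
        try {
            startMonitor();
        } finally {
            lock.unlock();
        }
    }

    /** Public entry point: acquires the lock, then delegates. */
    void stopFailureDetection() {
        lock.lock();
        try {
            stopMonitor();
        } finally {
            lock.unlock();
        }
    }

    /** Requires the lock to be held by the caller. */
    void startMonitor() {
        requireLockHeld();
        running = true;
    }

    /** Requires the lock to be held by the caller. */
    void stopMonitor() {
        requireLockHeld();
        running = false;
    }

    boolean isMonitorRunning() {
        lock.lock();
        try {
            return running;
        } finally {
            lock.unlock();
        }
    }

    // Guard added for illustration; fails fast when the contract is violated.
    private void requireLockHeld() {
        if (!lock.isHeldByCurrentThread()) {
            throw new IllegalStateException("lock must be held by caller");
        }
    }
}
```

Keeping the locking in the public wrappers lets internal code that already holds the lock call startMonitor() or stopMonitor() directly without re-acquiring it.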
-
isMonitorRunning
public boolean isMonitorRunning()
-
up
public java.lang.Object up(Message msg)
Description copied from class: Protocol
A single message was received. Protocols may examine the message and do something (e.g. add a header) with it before passing it up.
-
up
public void up(MessageBatch batch)
Description copied from class: Protocol
Sends up multiple messages in a MessageBatch. The sender of the batch is always the same, and so is the destination (null == multicast messages). Messages in a batch can be OOB messages, regular messages, or mixed messages, although the transport itself will create initial MessageBatches that contain only either OOB or regular messages. The default processing below sends messages up the stack individually, based on a matching criterion (calling Protocol.accept(org.jgroups.Message)), and - if true - calls Protocol.up(org.jgroups.Event) for that message and removes the message. If the batch is not empty, it is passed up, or else it is dropped. Subclasses should check if there are any messages destined for them (e.g. using MessageBatch.getMatchingMessages(short,boolean)), then possibly remove and process them and finally pass the batch up to the next protocol. Protocols can also modify messages in place, e.g. ENCRYPT could decrypt all encrypted messages in the batch, not remove them, and pass the batch up when done.
-
down
public java.lang.Object down(Event evt)
Description copied from class: Protocol
An event is to be sent down the stack. A protocol may want to examine its type and perform some action on it, depending on the event's type. If the event is a message MSG, then the protocol may need to add a header to it (or do nothing at all) before sending it down the stack using down_prot.down().
-
sendHeartbeatResponse
protected void sendHeartbeatResponse(Address dest)
-
unsuspect
protected void unsuspect(Address mbr)
-
updateTimestamp
protected void updateTimestamp(Address sender)
-
computePingDest
protected void computePingDest(Address remove)
Computes pingable_mbrs (based on the current membership and the suspected members) and ping_dest
- Parameters: remove - The member to be removed from pingable_mbrs
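As a rough sketch of the computation described for computePingDest, the pingable list can be derived as the members minus the suspected members minus the removed member, and the ping destination picked as the next entry after the local address. The class PingDestSketch, its method names, the use of String for addresses, and the wrap-around at the end of the list are assumptions for illustration; the real method works on the protocol's own fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: a hypothetical, side-effect-free version of the computation. */
final class PingDestSketch {

    /** Copy of members minus the suspected members, minus {@code remove} if it is non-null. */
    static List<String> pingable(List<String> members, Set<String> suspected, String remove) {
        List<String> result = new ArrayList<>(members);
        result.removeAll(suspected);
        if (remove != null) {
            result.remove(remove);
        }
        return result;
    }

    /**
     * Next member after {@code local} in the pingable list.
     * Assumption: wraps around at the end; returns null if there is nobody to ping.
     */
    static String pingDest(List<String> pingable, String local) {
        int idx = pingable.indexOf(local);
        if (idx < 0 || pingable.size() < 2) {
            return null;
        }
        return pingable.get((idx + 1) % pingable.size());
    }
}
```

For example, with members A, B, C, D, local address A and B suspected, the pingable list is A, C, D and the ping destination is C; removing C as well leaves A, D and moves the destination to D.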
-
-