Class FD_SOCK
- java.lang.Object
-
- org.jgroups.stack.Protocol
-
- org.jgroups.protocols.FD_SOCK
-
- All Implemented Interfaces:
java.lang.Runnable
public class FD_SOCK extends Protocol implements java.lang.Runnable
Failure detection protocol based on sockets. Failure detection is ring-based. Each member creates a server socket and announces its address together with the server socket's address in a multicast.A pinger thread will be started when the membership goes above 1 and will be stopped when it drops below 2. The pinger thread connects to its neighbor on the right and waits until the socket is closed. When the socket is closed by the monitored peer in an abnormal fashion (IOException), the neighbor will be suspected.
The main feature of this protocol is that no ping messages need to be exchanged between any 2 peers, as failure detection relies entirely on TCP sockets. The advantage is that no activity will take place between 2 peers as long as they are alive (i.e. have their server sockets open). The disadvantage is that hung servers or crashed routers will not cause sockets to be closed, therefore they won't be detected.
The costs involved are 2 additional threads: one that monitors the client side of the socket connection (to monitor a peer) and another one that manages the server socket. However, those threads will be idle as long as both peers are running.
- Author:
- Bela Ban May 29 2001
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description protected class
FD_SOCK.BroadcastTask
Task that periodically broadcasts a list of suspected members to the group.protected static class
FD_SOCK.ClientConnectionHandler
Handles a client connection; multiple client can connect at the same timestatic class
FD_SOCK.FdHeader
protected class
FD_SOCK.ServerSocketHandler
Handles the server-side of a client-server socket connection.
-
Field Summary
Fields Modifier and Type Field Description protected static int
ABNORMAL_TERMINATION
protected FD_SOCK.BroadcastTask
bcast_task
protected java.net.InetAddress
bind_addr
protected LazyRemovalCache<Address,IpAddress>
cache
Cache of member addresses and their ServerSocket addressesprotected long
cache_max_age
protected int
cache_max_elements
protected int
client_bind_port
protected java.net.InetAddress
external_addr
protected int
external_port
protected Promise<java.util.Map<Address,IpAddress>>
get_cache_promise
Used to rendezvous on GET_CACHE and GET_CACHE_RSPprotected long
get_cache_timeout
protected boolean
got_cache_from_coord
protected boolean
keep_alive
protected Address
local_addr
protected java.util.concurrent.locks.Lock
lock
protected boolean
log_suspected_msgs
protected java.util.List<Address>
members
protected static int
NORMAL_TERMINATION
protected int
num_suspect_events
protected int
num_tries
protected Promise<IpAddress>
ping_addr_promise
protected Address
ping_dest
protected java.io.InputStream
ping_input
protected java.net.Socket
ping_sock
protected java.util.List<Address>
pingable_mbrs
protected java.lang.Thread
pinger_thread
protected int
port_range
protected boolean
regular_sock_close
protected boolean
shuttin_down
protected int
sock_conn_timeout
protected java.net.ServerSocket
srv_sock
protected IpAddress
srv_sock_addr
protected FD_SOCK.ServerSocketHandler
srv_sock_handler
protected boolean
srv_sock_sent
protected int
start_port
protected BoundedList<java.lang.String>
suspect_history
protected long
suspect_msg_interval
protected java.util.Set<Address>
suspected_mbrs
protected TimeScheduler
timer
-
Fields inherited from class org.jgroups.stack.Protocol
after_creation_hook, down_prot, ergonomics, id, log, stack, stats, up_prot
-
-
Constructor Summary
Constructors Constructor Description FD_SOCK()
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected void
broadcastSuspectMessage(Address suspected_mbr)
Sends a SUSPECT message to all group members.protected void
broadcastUnuspectMessage(Address mbr)
protected Address
determineCoordinator()
protected Address
determinePingDest()
java.lang.Object
down(Event evt)
An event is to be sent down the stack.protected IpAddress
fetchPingAddress(Address mbr)
Attempts to obtain the ping_addr first from the cache, then by unicasting q request tombr
, then by multicasting a request to all members.protected void
getCacheFromCoordinator()
Determines coordinator C.int
getClientBindPortActual()
java.lang.String
getLocalAddress()
java.lang.String
getMembers()
int
getNumSuspectedMembers()
int
getNumSuspectEventsGenerated()
java.lang.String
getPingableMembers()
java.lang.String
getPingDest()
java.lang.String
getSuspectedMembers()
protected void
handleSocketClose(java.lang.Exception ex)
protected boolean
hasPingableMembers()
void
init()
Called after instance has been created (null constructor) and before protocol is started.protected void
interruptPingerThread(boolean sendTerminationSignal)
Interrupts the pinger thread.boolean
isLogSuspectedMessages()
boolean
isNodeCrashMonitorRunning()
protected boolean
isPingerThreadRunning()
static Buffer
marshal(LazyRemovalCache<Address,IpAddress> addrs)
java.lang.String
printCache()
protected java.lang.String
printPingableMembers()
java.lang.String
printSuspectHistory()
protected boolean
removeFromPingableMembers(Address mbr)
protected void
resetPingableMembers(java.util.Collection<Address> new_mbrs)
void
resetStats()
void
run()
Runs as long as there are 2 members and more.protected void
sendIHaveSockMessage(Address dst, Address mbr, IpAddress addr)
Sends or broadcasts a I_HAVE_SOCK response.protected void
sendPingSignal(int signal)
protected void
sendPingTermination()
void
setLogSuspectedMessages(boolean log_suspected_msgs)
protected boolean
setupPingSocket(IpAddress dest)
Creates a socket todest
, and assigns it to ping_sock.protected static java.lang.String
signalToString(int signal)
void
start()
This method is called on aJChannel.connect(String)
.boolean
startNodeCrashMonitor()
protected boolean
startPingerThread()
Does *not* need to be synchronized on pinger_mutex because the caller (down()) already has the mutex acquiredprotected void
startServerSocket()
void
stop()
This method is called on aJChannel.disconnect()
.protected void
stopPingerThread()
void
stopServerSocket(boolean graceful)
protected void
suspect(java.util.Set<Address> suspects)
protected void
teardownPingSocket()
protected java.util.Map<Address,IpAddress>
unmarshal(byte[] buffer, int offset, int length)
protected void
unsuspect(Address mbr)
java.lang.Object
up(Event evt)
An event was received from the protocol below.java.lang.Object
up(Message msg)
A single message was received.-
Methods inherited from class org.jgroups.stack.Protocol
accept, afterCreationHook, destroy, down, enableStats, getConfigurableObjects, getDownProtocol, getDownServices, getId, getIdsAbove, getLevel, getLog, getName, getProtocolStack, getSocketFactory, getThreadFactory, getTransport, getUpProtocol, getUpServices, getValue, isErgonomics, level, parse, providedDownServices, providedUpServices, requiredDownServices, requiredUpServices, resetStatistics, setDownProtocol, setErgonomics, setId, setLevel, setProtocolStack, setSocketFactory, setUpProtocol, setValue, statsEnabled, up
-
-
-
-
Field Detail
-
NORMAL_TERMINATION
protected static final int NORMAL_TERMINATION
- See Also:
- Constant Field Values
-
ABNORMAL_TERMINATION
protected static final int ABNORMAL_TERMINATION
- See Also:
- Constant Field Values
-
bind_addr
protected java.net.InetAddress bind_addr
-
external_addr
protected java.net.InetAddress external_addr
-
external_port
protected int external_port
-
get_cache_timeout
protected long get_cache_timeout
-
cache_max_elements
protected int cache_max_elements
-
cache_max_age
protected long cache_max_age
-
suspect_msg_interval
protected long suspect_msg_interval
-
num_tries
protected int num_tries
-
start_port
protected int start_port
-
client_bind_port
protected int client_bind_port
-
port_range
protected int port_range
-
keep_alive
protected boolean keep_alive
-
sock_conn_timeout
protected int sock_conn_timeout
-
num_suspect_events
protected int num_suspect_events
-
suspect_history
protected final BoundedList<java.lang.String> suspect_history
-
members
protected volatile java.util.List<Address> members
-
suspected_mbrs
protected final java.util.Set<Address> suspected_mbrs
-
pingable_mbrs
protected final java.util.List<Address> pingable_mbrs
-
srv_sock_sent
protected volatile boolean srv_sock_sent
-
get_cache_promise
protected final Promise<java.util.Map<Address,IpAddress>> get_cache_promise
Used to rendezvous on GET_CACHE and GET_CACHE_RSP
-
got_cache_from_coord
protected volatile boolean got_cache_from_coord
-
local_addr
protected Address local_addr
-
srv_sock
protected java.net.ServerSocket srv_sock
-
srv_sock_handler
protected FD_SOCK.ServerSocketHandler srv_sock_handler
-
srv_sock_addr
protected IpAddress srv_sock_addr
-
ping_dest
protected Address ping_dest
-
ping_sock
protected java.net.Socket ping_sock
-
ping_input
protected java.io.InputStream ping_input
-
pinger_thread
protected volatile java.lang.Thread pinger_thread
-
cache
protected LazyRemovalCache<Address,IpAddress> cache
Cache of member addresses and their ServerSocket addresses
-
lock
protected final java.util.concurrent.locks.Lock lock
-
timer
protected TimeScheduler timer
-
bcast_task
protected final FD_SOCK.BroadcastTask bcast_task
-
regular_sock_close
protected volatile boolean regular_sock_close
-
shuttin_down
protected volatile boolean shuttin_down
-
log_suspected_msgs
protected boolean log_suspected_msgs
-
-
Method Detail
-
getLocalAddress
public java.lang.String getLocalAddress()
-
getMembers
public java.lang.String getMembers()
-
getPingableMembers
public java.lang.String getPingableMembers()
-
getSuspectedMembers
public java.lang.String getSuspectedMembers()
-
getNumSuspectedMembers
public int getNumSuspectedMembers()
-
getPingDest
public java.lang.String getPingDest()
-
getNumSuspectEventsGenerated
public int getNumSuspectEventsGenerated()
-
isNodeCrashMonitorRunning
public boolean isNodeCrashMonitorRunning()
-
isLogSuspectedMessages
public boolean isLogSuspectedMessages()
-
setLogSuspectedMessages
public void setLogSuspectedMessages(boolean log_suspected_msgs)
-
getClientBindPortActual
public int getClientBindPortActual()
-
printSuspectHistory
public java.lang.String printSuspectHistory()
-
printCache
public java.lang.String printCache()
-
startNodeCrashMonitor
public boolean startNodeCrashMonitor()
-
init
public void init() throws java.lang.Exception
Description copied from class:Protocol
Called after instance has been created (null constructor) and before protocol is started. Properties are already set. Other protocols are not yet connected and events cannot yet be sent.
-
start
public void start() throws java.lang.Exception
Description copied from class:Protocol
This method is called on aJChannel.connect(String)
. Starts work. Protocols are connected and queues are ready to receive events. Will be called from bottom to top. This call will replace the START and START_OK events.- Overrides:
start
in classProtocol
- Throws:
java.lang.Exception
- Thrown if protocol cannot be started successfully. This will cause the ProtocolStack to fail, soJChannel.connect(String)
will throw an exception
-
stop
public void stop()
Description copied from class:Protocol
This method is called on aJChannel.disconnect()
. Stops work (e.g. by closing multicast socket). Will be called from top to bottom. This means that at the time of the method invocation the neighbor protocol below is still working. This method will replace the STOP, STOP_OK, CLEANUP and CLEANUP_OK events. The ProtocolStack guarantees that when this method is called all messages in the down queue will have been flushed
-
resetStats
public void resetStats()
- Overrides:
resetStats
in classProtocol
-
up
public java.lang.Object up(Event evt)
Description copied from class:Protocol
An event was received from the protocol below. Usually the current protocol will want to examine the event type and - depending on its type - perform some computation (e.g. removing headers from a MSG event type, or updating the internal membership list when receiving a VIEW_CHANGE event). Finally the event is either a) discarded, or b) an event is sent down the stack usingdown_prot.down()
or c) the event (or another event) is sent up the stack usingup_prot.up()
.
-
up
public java.lang.Object up(Message msg)
Description copied from class:Protocol
A single message was received. Protocols may examine the message and do something (e.g. add a header) with it before passing it up.
-
down
public java.lang.Object down(Event evt)
Description copied from class:Protocol
An event is to be sent down the stack. A protocol may want to examine its type and perform some action on it, depending on the event's type. If the event is a message MSG, then the protocol may need to add a header to it (or do nothing at all) before sending it down the stack usingdown_prot.down()
.
-
run
public void run()
Runs as long as there are 2 members and more. Determines the member to be monitored and fetches its server socket address (if n/a, sends a message to obtain it). The creates a client socket and listens on it until the connection breaks. If it breaks, emits a SUSPECT message. It the connection is closed regularly, nothing happens. In both cases, a new member to be monitored will be chosen and monitoring continues (unless there are fewer than 2 members).- Specified by:
run
in interfacejava.lang.Runnable
-
isPingerThreadRunning
protected boolean isPingerThreadRunning()
-
resetPingableMembers
protected void resetPingableMembers(java.util.Collection<Address> new_mbrs)
-
hasPingableMembers
protected boolean hasPingableMembers()
-
removeFromPingableMembers
protected boolean removeFromPingableMembers(Address mbr)
-
printPingableMembers
protected java.lang.String printPingableMembers()
-
suspect
protected void suspect(java.util.Set<Address> suspects)
-
unsuspect
protected void unsuspect(Address mbr)
-
handleSocketClose
protected void handleSocketClose(java.lang.Exception ex)
-
startPingerThread
protected boolean startPingerThread()
Does *not* need to be synchronized on pinger_mutex because the caller (down()) already has the mutex acquired
-
interruptPingerThread
protected void interruptPingerThread(boolean sendTerminationSignal)
Interrupts the pinger thread. The Thread.interrupt() method doesn't seem to work under Linux with JDK 1.3.1 (JDK 1.2.2 had no problems here), therefore we close the socket (setSoLinger has to be set !) if we are running under Linux. This should be tested under Windows. (Solaris 8 and JDK 1.3.1 definitely works).Oct 29 2001 (bela): completely removed Thread.interrupt(), but used socket close on all OSs. This makes this code portable and we don't have to check for OSs.
-
stopPingerThread
protected void stopPingerThread()
-
sendPingTermination
protected void sendPingTermination()
-
sendPingSignal
protected void sendPingSignal(int signal)
-
startServerSocket
protected void startServerSocket() throws java.lang.Exception
- Throws:
java.lang.Exception
-
stopServerSocket
public void stopServerSocket(boolean graceful)
-
setupPingSocket
protected boolean setupPingSocket(IpAddress dest)
Creates a socket todest
, and assigns it to ping_sock. Also assigns ping_input
-
teardownPingSocket
protected void teardownPingSocket()
-
getCacheFromCoordinator
protected void getCacheFromCoordinator()
Determines coordinator C. If C is null and we are the first member, return. Else loop: send GET_CACHE message to coordinator and wait for GET_CACHE_RSP response. Loop until valid response has been received.
-
broadcastSuspectMessage
protected void broadcastSuspectMessage(Address suspected_mbr)
Sends a SUSPECT message to all group members. Only the coordinator (or the next member in line if the coord itself is suspected) will react to this message by installing a new view. To overcome the unreliability of the SUSPECT message (it may be lost because we are not above any retransmission layer), the following scheme is used: after sending the SUSPECT message, it is also added to the broadcast task, which will periodically re-send the SUSPECT until a view is received in which the suspected process is not a member anymore. The reason is that - at one point - either the coordinator or another participant taking over for a crashed coordinator, will react to the SUSPECT message and issue a new view, at which point the broadcast task stops.
-
broadcastUnuspectMessage
protected void broadcastUnuspectMessage(Address mbr)
-
sendIHaveSockMessage
protected void sendIHaveSockMessage(Address dst, Address mbr, IpAddress addr)
Sends or broadcasts a I_HAVE_SOCK response. If 'dst' is null, the reponse will be broadcast, otherwise it will be unicast back to the requester
-
fetchPingAddress
protected IpAddress fetchPingAddress(Address mbr)
Attempts to obtain the ping_addr first from the cache, then by unicasting q request tombr
, then by multicasting a request to all members.
-
determinePingDest
protected Address determinePingDest()
-
marshal
public static Buffer marshal(LazyRemovalCache<Address,IpAddress> addrs)
-
unmarshal
protected java.util.Map<Address,IpAddress> unmarshal(byte[] buffer, int offset, int length)
-
determineCoordinator
protected Address determineCoordinator()
-
signalToString
protected static java.lang.String signalToString(int signal)
-
-