Technical Support / can I disable retry? "Lost connection to node during wsconnect():"
« on: March 14, 2018, 02:17:01 pm »
When I get this error and it attempts a retry, it never actually does connect because the node itself is down or having issues.
Code: [Select]
Retrying in 2 seconds
Retrying in 4 seconds
Retrying in 6 seconds
etc.
How can I "catch" this retry attempt like an error?
How can I provide an argument to prevent any retry at all?
Currently, the only way I have found is a `multiprocessing.Process` approach: start a side process to contain the node connect attempt, then kill that process if it takes too long, so the timeout overrides the retry attempt.
I would rather it just break as soon as it says
Code: [Select]
Retrying in 2 seconds
as I've yet to see a retry actually accomplish anything.
Using a timeout approach has limitations; the timeout must be long enough to account for latency fluctuation, while being short enough to not eat up too much script time.
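For reference, the side-process timeout described above can be sketched roughly like this. It is only a minimal illustration, not pybitshares code: `run_with_timeout` and `fake_connect` are hypothetical names, and it assumes the Unix-only "fork" start method.

```python
import multiprocessing as mp
import time


def _worker(target, args, conn):
    """Run target in the child process and send its result back to the parent."""
    try:
        conn.send(target(*args))
    finally:
        conn.close()


def run_with_timeout(target, args=(), timeout=5.0):
    """Return (True, result) if target finishes in time, else (False, None)."""
    # Assumption: "fork" start method (Unix only), so target need not be picklable.
    ctx = mp.get_context("fork")
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(target, args, send_end))
    proc.start()
    send_end.close()  # the parent only reads
    try:
        if recv_end.poll(timeout):
            return True, recv_end.recv()
        return False, None
    except EOFError:
        # The child exited (for example, target raised) without sending a result.
        return False, None
    finally:
        if proc.is_alive():
            proc.terminate()  # kill the stuck attempt, retry loop included
        proc.join()
        recv_end.close()


def fake_connect(node, delay):
    """Hypothetical stand-in for a node connection: sleeps, then returns the node."""
    time.sleep(delay)
    return node
```

Calling `run_with_timeout(fake_connect, ("wss://down.example", 30.0), timeout=0.3)` gives up after 0.3 seconds, which is exactly the trade-off mentioned: the timeout has to be picked by hand.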
From my perspective, it would be advantageous if the pybitshares module, rather than attempting a retry on a down node, throws an exception that I can handle with try/except. Upon exception, I can then specify a new node to attempt... rather than waiting for a timeout to expire.
related pybitshares code:
https://github.com/xeroc/python-bitshares/blob/9250544ca8eadf66de31c7f38fc37294c11f9548/bitsharesapi/websocket.py
beginning line 291
Code: [Select]
def run_forever(self):
    """ This method is used to run the websocket app continuously.
        It will execute callbacks as defined and try to stay
        connected with the provided APIs
    """
    cnt = 0
    while not self.run_event.is_set():
        cnt += 1
        self.url = next(self.urls)
        log.debug("Trying to connect to node %s" % self.url)
        try:
            # websocket.enableTrace(True)
            self.ws = websocket.WebSocketApp(
                self.url,
                on_message=self.on_message,
                on_error=self.on_error,
                on_close=self.on_close,
                on_open=self.on_open
            )
            self.ws.run_forever()
        except websocket.WebSocketException as exc:
            if (self.num_retries >= 0 and cnt > self.num_retries):
                raise NumRetriesReached()
            sleeptime = (cnt - 1) * 2 if cnt < 10 else 10
            if sleeptime:
                log.warning(
                    "Lost connection to node during wsconnect(): %s (%d/%d) "
                    % (self.url, cnt, self.num_retries) +
                    "Retrying in %d seconds" % sleeptime
                )
                time.sleep(sleeptime)
        except KeyboardInterrupt:
            self.ws.keep_running = False
            raise
        except Exception as e:
            log.critical("{}\n\n{}".format(str(e), traceback.format_exc()))
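Reading only the quoted loop, the retry delays and the give-up condition can be worked out by hand. The sketch below copies the two expressions from the code above into standalone helpers. The function names and `max_attempts` are hypothetical, and it assumes every attempt fails.

```python
def sleep_before_retry(cnt):
    """Seconds slept after the cnt-th failed attempt (same formula as the quoted code)."""
    return (cnt - 1) * 2 if cnt < 10 else 10


def sleeps_until_giving_up(num_retries, max_attempts=12):
    """List the sleeps performed before NumRetriesReached would be raised.

    max_attempts is a hypothetical cap so that num_retries < 0 (retry forever)
    still terminates in this sketch.
    """
    sleeps = []
    for cnt in range(1, max_attempts + 1):
        if num_retries >= 0 and cnt > num_retries:
            break  # the quoted code raises NumRetriesReached at this point
        sleeps.append(sleep_before_retry(cnt))
    return sleeps
```

If that reading is right, `sleeps_until_giving_up(0)` is empty, meaning a `num_retries` of 0 would raise `NumRetriesReached` on the first failure. How `num_retries` gets set is not shown in the quoted snippet, so that part would need checking against the constructor in the linked file.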
this snippet here is a major source of brittleness:
Code: [Select]
"Lost connection to node during wsconnect(): %s (%d/%d) "
% (self.url, cnt, self.num_retries) +
"Retrying in %d seconds" % sleeptime
A better behavior than just waiting and retrying on the same failed node is to switch nodes, THEN retry.
or simply raise the exception and let the user handle the failure: switching nodes; perhaps also blacklisting the old, or ringing a bell:
Code: [Select]
except websocket.WebSocketException:
    self.ws.keep_running = False
    raise
where the user could command:
Code: [Select]
attempt = 1
while attempt:
    try:
        # make api call
        attempt = 0
    except websocket.WebSocketException:
        # blacklist this node
        # run routine to find another white-listed node with low latency and non-stale blocktime
        attempt += 1
        if attempt > n:
            # run failsafe routine to generate new node list
            pass
        else:
            # switch node to known good node and loop
            pass
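A runnable version of that user-side loop might look like the following. This is a sketch under assumptions, not anything pybitshares provides: `NodeDown` stands in for `websocket.WebSocketException`, and `call_with_failover`, `fake_api` and the example URLs are hypothetical.

```python
class NodeDown(Exception):
    """Hypothetical stand-in for websocket.WebSocketException."""


def call_with_failover(nodes, api_call, max_attempts=3):
    """Try api_call(node) on each node in turn, blacklisting nodes that fail.

    Returns (result, blacklist) on success and raises RuntimeError once
    max_attempts nodes have failed or the node list is exhausted.
    """
    blacklist = set()
    attempts = 0
    for node in nodes:
        if attempts >= max_attempts:
            break
        attempts += 1
        try:
            return api_call(node), blacklist
        except NodeDown:
            blacklist.add(node)  # do not come back to a node that just failed
    # Failsafe point: the caller would regenerate the node list here.
    raise RuntimeError("no reachable node; regenerate the node list")


def fake_api(node):
    """Hypothetical API call: only one example node is reachable."""
    if node != "wss://c.example":
        raise NodeDown(node)
    return "ok from " + node
```

With `nodes = ["wss://a.example", "wss://b.example", "wss://c.example"]`, the call fails over twice and returns the result from the third node together with the two blacklisted URLs. This only works if the library raises instead of retrying internally, which is the whole point of the request.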
Would it be possible to import `websocket.WebSocketException` from the bitshares module?
something like
Code: [Select]
from bitshares import websocket.WebsocketException
but that doesn't work
I've also tried
Code: [Select]
from bitshares.websocket import WebsocketException
from bitshares import websocket
from bitshares import WebsocketException
and attempted to import the class:
from bitshares import BitSharesWebsocket
but none of them work either. Don't mind my ignorance here, I'm just throwing things at the wall to see what sticks.
I've also tried importing the websocket module and preempting the exception
Code: [Select]
import websocket
try:
    # connect to api
    pass
except websocket.WebsocketException:
    # print ('hello world')
    pass
but this does not catch the exception.
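A likely explanation, judging only from the quoted `run_forever`: the `WebSocketException` is caught and logged inside the library's own loop, so it never reaches an outer try/except. Note also that the quoted code spells the class `WebSocketException`, with a capital S, while the attempts above use `WebsocketException`. The sketch below reproduces the swallowing effect with hypothetical stand-ins and does not import pybitshares or websocket.

```python
import logging

log = logging.getLogger("sketch")


class FakeWebSocketException(Exception):
    """Hypothetical stand-in for websocket.WebSocketException."""


def connect_like_run_forever(max_tries=3):
    """Mimic the quoted loop: the exception is caught and logged inside the function."""
    for cnt in range(1, max_tries + 1):
        try:
            raise FakeWebSocketException("node is down")
        except FakeWebSocketException:
            # Handled here, so nothing propagates to the caller.
            log.warning("Lost connection (%d/%d), retrying", cnt, max_tries)
    return "gave up"


def caller_sees_exception():
    """Return True only if the exception propagates out to the caller."""
    try:
        connect_like_run_forever()
    except FakeWebSocketException:
        return True
    return False
```

Here `caller_sees_exception()` returns False: the outer handler never runs, because the inner one already consumed the exception. That matches the "does not catch" behaviour, assuming the real connect path works the way the quoted snippet suggests.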