Author Topic: can I disable retry? "Lost connection to node during wsconnect():"  (Read 187 times)

0 Members and 1 Guest are viewing this topic.

Offline litepresence

  • Full Member
  • ***
  • Posts: 71
    • View Profile
When I get this error and it attempts a retry, it never actually does connect because the node itself is down or having issues.

Code: [Select]
Retrying in 2 seconds
Retrying in 4 seconds
Retrying in 6 seconds
etc.

How can I "catch" this retry-attempt like an error? 
How can I provide an argument to prevent any retry at all?

currently the only way I have found, is to use a `multiprocess.Process` approach to enforce a timeout that overrides the retry attempt by starting a side process to contain the node connect attempt, then breaking that process if it takes too long.   

I would rather it just break as soon as it says
Code: [Select]
Retrying in 2 seconds
as I've yet to see a retry actually accomplish anything.

Using a timeout approach has limitations; the timeout must be long enough to account for latency fluctuation, while being short enough to not eat up too much script time.

From my perspective, it would be advantageous if the pybitshares module, rather than attempting a retry on a down node, throws an exception that I can handle with try/except.  Upon exception, I can then specify a new node to attempt... rather than waiting for a timeout to expire.



related pybitshares code:

https://github.com/xeroc/python-bitshares/blob/9250544ca8eadf66de31c7f38fc37294c11f9548/bitsharesapi/websocket.py

beginning line 291

Code: [Select]
    def run_forever(self):
        """ This method is used to run the websocket app continuously.
            It will execute callbacks as defined and try to stay
            connected with the provided APIs
        """
        cnt = 0
        while not self.run_event.is_set():
            cnt += 1
            self.url = next(self.urls)
            log.debug("Trying to connect to node %s" % self.url)
            try:
                # websocket.enableTrace(True)
                self.ws = websocket.WebSocketApp(
                    self.url,
                    on_message=self.on_message,
                    on_error=self.on_error,
                    on_close=self.on_close,
                    on_open=self.on_open
                )
                self.ws.run_forever()
            except websocket.WebSocketException as exc:
                if (self.num_retries >= 0 and cnt > self.num_retries):
                    raise NumRetriesReached()

                sleeptime = (cnt - 1) * 2 if cnt < 10 else 10
                if sleeptime:
                    log.warning(
                        "Lost connection to node during wsconnect(): %s (%d/%d) "
                        % (self.url, cnt, self.num_retries) +
                        "Retrying in %d seconds" % sleeptime
                    )
                    time.sleep(sleeptime)

            except KeyboardInterrupt:
                self.ws.keep_running = False
                raise

            except Exception as e:
                log.critical("{}\n\n{}".format(str(e), traceback.format_exc()))


this snippet here is a major source of brittleness:

Code: [Select]
"Lost connection to node during wsconnect(): %s (%d/%d) "
                        % (self.url, cnt, self.num_retries) +
                        "Retrying in %d seconds" % sleeptime

A better behavior than just waiting and retrying on the same failed node is to switch nodes, THEN retry.


or simply raise the exception and let the user handle the failure: switching nodes; perhaps also blacklisting the old, or ringing a bell:

Code: [Select]
except websocket.WebSocketException:
    self.ws.keep_running = False
    raise

where the user could command:

Code: [Select]

attempt = 1
while attempt:

     try:

      # make api call
     attempt = 0

    except websocket.WebSocketException:

      # blacklist this node
      # run routine to find another white-listed node with low latency and non-stale blocktime
      attempt +=1
      if attempt > n:
          # run failsafe routine to generate new node list
      else:
          # switch node to known good node and loop
 


would it be possible to import the websocket.WebsocketException from the bitshares module?

something like

Code: [Select]
from bitshares import websocket.WebsocketException
but that doesn't work

I've also tried

Code: [Select]
from bitshares.websocket import WebsocketException
from bitshares import websocket
from bitshares import WebsocketException
and attempted to import the class function:

Quote
from bitshares import BitSharesWebsocket

but none of them work either, don't mind my ignorance here... just throwing things at the wall to see what sticks

I've also tried importing the websocket module and preempting the exception

Code: [Select]
import websocket

try:
    # connect to api
except websocket.WebsocketException:
    # print ('hello world')

does not catch
« Last Edit: March 14, 2018, 06:04:12 pm by litepresence »

Offline litepresence

  • Full Member
  • ***
  • Posts: 71
    • View Profile
Re: can I disable retry? "Lost connection to node during wsconnect():"
« Reply #1 on: March 15, 2018, 03:29:23 pm »
h/t @ Alex M

resolved was able to pass:

Code: [Select]
num_retries=0

Code: [Select]
chain = Blockchain(bitshares_instance=BitShares(n, num_retries=0), mode='head')
I had given up on this early on because I tried this without success:

Code: [Select]
chain = Blockchain(bitshares_instance=BitShares(n), num_retries=0, mode='head')
« Last Edit: March 15, 2018, 04:03:17 pm by litepresence »