Author Topic: witness_node completely froze overnight  (Read 7582 times)

0 Members and 1 Guest are viewing this topic.

Offline alt

  • Hero Member
  • *****
  • Posts: 2821
    • View Profile
  • BitShares: baozi
I believe we have identified the cause of this and will roll out a fix next week.
thanks!

Offline bytemaster

I believe we have identified the cause of this and will roll out a fix next week.
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline abit

  • Committee member
  • Hero Member
  • *
  • Posts: 4664
    • View Profile
    • Abit's Hive Blog
  • BitShares: abit
  • GitHub: abitmore
BitShares committee member: abit
BitShares witness: in.abit

Offline abit

  • Committee member
  • Hero Member
  • *
  • Posts: 4664
    • View Profile
    • Abit's Hive Blog
  • BitShares: abit
  • GitHub: abitmore
Code: [Select]
#0  0x00000000021c352c in graphene::chain::price_feed::price_feed (this=0x23d86de8)
    at /app/bts/bts2-2.0.160103b/libraries/chain/include/graphene/chain/protocol/asset.hpp:150
#1  0x00000000024e61c9 in std::pair<fc::time_point_sec, graphene::chain::price_feed>::pair (this=0x23d86de0)
    at /usr/include/c++/4.8/bits/stl_pair.h:127
#2  0x00000000024e6203 in boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >::pair (this=0x23d86dd8, x=...)
    at /app/boost_1_57_0.bin/include/boost/container/detail/pair.hpp:115
#3  0x00000000024e3dc1 in boost::container::allocator_traits<std::allocator<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > > >::priv_construct_dispatch2<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> ><boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >&> > (p=0x23d86dd8) at /app/boost_1_57_0.bin/include/boost/container/allocator_traits.hpp:343
#4  0x00000000024e0984 in boost::container::allocator_traits<std::allocator<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > > >::priv_construct<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> ><boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >&> > (a=..., p=0x23d86dd8) at /app/boost_1_57_0.bin/include/boost/container/allocator_traits.hpp:334
#5  0x00000000024dbec7 in boost::container::allocator_traits<std::allocator<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > > >::construct<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> ><boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >&> > (a=..., p=0x23d86dd8) at /app/boost_1_57_0.bin/include/boost/container/allocator_traits.hpp:288
#6  0x00000000024d4f18 in boost::container::uninitialized_copy_alloc_n<std::allocator<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > >, boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >*, boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >*> (a=..., f=0x23ccfe28, n=1, r=0x23d86dd8) at /app/boost_1_57_0.bin/include/boost/container/detail/utilities.hpp:630
#7  0x00000000024ccf4d in boost::container::vector<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >, std::allocator<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > > >::vector (this=0x7ffff5ce6b70, x=...)
    at /app/boost_1_57_0.bin/include/boost/container/vector.hpp:726
#8  0x00000000024c4129 in boost::container::container_detail::flat_tree<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >, boost::container::container_detail::select1st<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > >, std::less<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object> >, std::allocator<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > > >::Data::Data (this=0x7ffff5ce6b70, d=...) at /app/boost_1_57_0.bin/include/boost/container/detail/flat_tree.hpp:122
#9  0x00000000024bad2f in boost::container::container_detail::flat_tree<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> >, boost::container::container_detail::select1st<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > >, std::less<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object> >, std::allocator<boost::container::container_detail::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > > >::flat_tree (this=0x7ffff5ce6b70, x=...) at /app/boost_1_57_0.bin/include/boost/container/detail/flat_tree.hpp:217
#10 0x00000000024afb43 in boost::container::flat_map<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed>, std::less<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object> >, std::allocator<std::pair<graphene::db::object_id<(unsigned char)1, (unsigned char)2, graphene::chain::account_object>, std::pair<fc::time_point_sec, graphene::chain::price_feed> > > >::flat_map (this=0x7ffff5ce6b70, x=...)
    at /app/boost_1_57_0.bin/include/boost/container/flat_map.hpp:268
#11 0x00000000024a27a5 in graphene::chain::asset_bitasset_data_object::asset_bitasset_data_object (this=0x7ffff5ce6b30)
    at /app/bts/bts2-2.0.160103b/libraries/chain/include/graphene/chain/asset_object.hpp:164
#12 0x000000000248439b in graphene::chain::database::clear_expired_orders (this=0x596e278)
    at /app/bts/bts2-2.0.160103b/libraries/chain/db_update.cpp:294
#13 0x000000000246045f in graphene::chain::database::_apply_block (this=0x596e278, next_block=...)
    at /app/bts/bts2-2.0.160103b/libraries/chain/db_block.cpp:519
#14 0x000000000245f90e in graphene::chain::database::__lambda19::operator() (__closure=0x7ffff5ce6ef0)
    at /app/bts/bts2-2.0.160103b/libraries/chain/db_block.cpp:475
#15 0x0000000002486757 in graphene::chain::detail::with_skip_flags<graphene::chain::database::apply_block(const graphene::chain::signed_block&, uint32_t)::__lambda19>(graphene::chain::database &, uint32_t, graphene::chain::database::__lambda19) (db=...,
    skip_flags=0, callback=...) at /app/bts/bts2-2.0.160103b/libraries/chain/include/graphene/chain/db_with.hpp:126
#16 0x000000000245fd25 in graphene::chain::database::apply_block (this=0x596e278, next_block=..., skip=0)
    at /app/bts/bts2-2.0.160103b/libraries/chain/db_block.cpp:476
#17 0x00000000024583d8 in graphene::chain::database::_push_block (this=0x596e278, new_block=...)
    at /app/bts/bts2-2.0.160103b/libraries/chain/db_block.cpp:191
#18 0x0000000002457383 in graphene::chain::database::__lambda15::__lambda16::operator() (__closure=0x7ffff5ce75f0)
    at /app/bts/bts2-2.0.160103b/libraries/chain/db_block.cpp:116
#19 0x0000000002486560 in graphene::chain::detail::without_pending_transactions<graphene::chain::database::push_block(const graphene::chain::signed_block&, uint32_t)::__lambda15::__lambda16>(graphene::chain::database &, <unknown type in /app/bts/bts2-2.0.160103b/programs/witness_node/witness_node, CU 0x1063cb1, DIE 0x12afeb4>, graphene::chain::database::__lambda15::__lambda16) (db=...,
    pending_transactions=<unknown type in /app/bts/bts2-2.0.160103b/programs/witness_node/witness_node, CU 0x1063cb1, DIE 0x12afeb4>, callback=...) at /app/bts/bts2-2.0.160103b/libraries/chain/include/graphene/chain/db_with.hpp:143
#20 0x00000000024573ff in graphene::chain::database::__lambda15::operator() (__closure=0x7ffff5ce7690)
    at /app/bts/bts2-2.0.160103b/libraries/chain/db_block.cpp:117
#21 0x00000000024865db in graphene::chain::detail::with_skip_flags<graphene::chain::database::push_block(const graphene::chain::signed_block&, uint32_t)::__lambda15>(graphene::chain::database &, uint32_t, graphene::chain::database::__lambda15) (db=..., skip_flags=0,
    callback=...) at /app/bts/bts2-2.0.160103b/libraries/chain/include/graphene/chain/db_with.hpp:126
#22 0x0000000002457458 in graphene::chain::database::push_block (this=0x596e278, new_block=..., skip=0)
    at /app/bts/bts2-2.0.160103b/libraries/chain/db_block.cpp:118
#23 0x00000000020c01b4 in graphene::app::detail::application_impl::handle_block (this=0x596e140, blk_msg=..., sync_mode=false,
    contained_transaction_message_ids=...) at /app/bts/bts2-2.0.160103b/libraries/app/application.cpp:505
#24 0x00000000029a4a63 in graphene::net::detail::statistics_gathering_node_delegate_wrapper::__lambda56::operator() (
    __closure=0x7fffe80b9ee8) at /app/bts/bts2-2.0.160103b/libraries/net/node.cpp:5388
#25 0x00000000029b610c in fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(const graphene::net::block_message&, bool, std::vector<fc::ripemd160>&)::__lambda56>::run(void *, void *) (functor=0x7fffe80b9ee8,
    prom=0x7fffe80b9fd8) at /app/bts/bts2-2.0.160103b/libraries/fc/include/fc/thread/task.hpp:77
#26 0x00000000027760f9 in fc::task_base::run_impl (this=0x7fffe80b9f10)
    at /app/bts/bts2-2.0.160103b/libraries/fc/src/thread/task.cpp:43
#27 0x000000000277608a in fc::task_base::run (this=0x7fffe80b9f10) at /app/bts/bts2-2.0.160103b/libraries/fc/src/thread/task.cpp:32
#28 0x000000000276a76c in fc::thread_d::run_next_task (this=0x59a4410)
    at /app/bts/bts2-2.0.160103b/libraries/fc/src/thread/thread_d.hpp:495
#29 0x000000000276ac10 in fc::thread_d::process_tasks (this=0x59a4410)
    at /app/bts/bts2-2.0.160103b/libraries/fc/src/thread/thread_d.hpp:544
#30 0x000000000276a283 in fc::thread_d::start_process_tasks (my=93996048)
    at /app/bts/bts2-2.0.160103b/libraries/fc/src/thread/thread_d.hpp:475
#31 0x0000000002aceb41 in make_fcontext () at libs/context/src/asm/make_x86_64_sysv_elf_gas.S:64
#32 0x0000000000000000 in ?? ()

Sure it's related to clear_expired_orders
BitShares committee member: abit
BitShares witness: in.abit

Offline abit

  • Committee member
  • Hero Member
  • *
  • Posts: 4664
    • View Profile
    • Abit's Hive Blog
  • BitShares: abit
  • GitHub: abitmore
@iHashFury some questions:
* did you start it with --replay-blockchain?
* was your witness_node running in gdb? if yes, a backtrace would be helpful.

--replay-blockchain run automatically after the update.

I believe it's different.

I just found that my backup node has been in stuck status for quite some time, and CPU is 100%. It's a debug build of version 2.0.160103b, running in gdb, started without --replay-blockchain, however no automatically replay when it was started (database was considered clean).

Code: [Select]
1701380ms th_a       application.cpp:497           handle_block         ] Got block: #2367989 time: 2016-01-04T12:28:21 latency: 380 ms from: verbaltech2  irreversible: 2367972 (-17)
1702000ms th_a       witness.cpp:194               block_production_loo ] Not producing block because slot has not yet arrived
1703000ms th_a       witness.cpp:194               block_production_loo ] Not producing block because slot has not yet arrived
1704000ms th_a       witness.cpp:191               block_production_loo ] Not producing block because it isn't my turn
1704347ms th_a       application.cpp:497           handle_block         ] Got block: #2367990 time: 2016-01-04T12:28:24 latency: 347 ms from: spectral  irreversible: 2367972 (-18)
1152420ms ntp        ntp.cpp:177                   read_loop            ] ntp_delta_time updated to 88 us
258788ms ntp        ntp.cpp:177                   read_loop            ] ntp_delta_time updated to 36647 us
1164747ms ntp        ntp.cpp:177                   read_loop            ] ntp_delta_time updated to -109792 us
2070280ms ntp        ntp.cpp:177                   read_loop            ] ntp_delta_time updated to -216 us
3570604ms ntp        ntp.cpp:177                   read_loop            ] ntp_delta_time updated to -1283 us
Instructions needed. Wish I can help.

//Edit:
Both my primary witness node and seed node work fine, both were started with --replay-blockchain.
« Last Edit: January 06, 2016, 12:22:59 am by abit »
BitShares committee member: abit
BitShares witness: in.abit

iHashFury

  • Guest
@iHashFury some questions:
* did you start it with --replay-blockchain?
* was your witness_node running in gdb? if yes, a backtrace would be helpful.

--replay-blockchain run automatically after the update.
I was not running in gdb.

Offline abit

  • Committee member
  • Hero Member
  • *
  • Posts: 4664
    • View Profile
    • Abit's Hive Blog
  • BitShares: abit
  • GitHub: abitmore
@iHashFury some questions:
* did you start it with --replay-blockchain?
* was your witness_node running in gdb? if yes, a backtrace would be helpful.
BitShares committee member: abit
BitShares witness: in.abit

iHashFury

  • Guest
My witness node froze and maxed out a CPU - Not sure why but there is alot of network_connect_loop action after 2016-01-05T18:20:35 ?

Code: [Select]
2016-01-05T18:48:33 p2p:dump_node_status_task     dump_node_status ] --------- END MEMORY USAGE ------------ node.cpp:4684
2016-01-05T18:48:35 p2p:p2p_network_connect_loop p2p_network_connect_ ] Starting an iteration of p2p_network_connect_loop(). node.cpp:892
2016-01-05T18:48:35 p2p:p2p_network_connect_loop display_current_conn ] Currently have 19 of [20/200] connections

lost a connection

Code: [Select]
2016-01-05T18:48:35 p2p:p2p_network_connect_loop display_current_conn ]    handshaking: [ip-addres-removed]:1776 with 000000000000000000000000000000000000000000000000000000000000000000  [unknown] node.cpp:1743
2016-01-05T18:48:35   p2p:connect_to_task           connect_to ] fatal: error connecting to peer [ip-addres-removed]:1776: 0 exception: unspecified
Connection refused
    {"message":"Connection refused"}
    asio  asio.cpp:60 error_handler peer_connection.cpp:257

Connection refused

Code: [Select]
2016-01-05T18:48:35   p2p:connect_to_task trigger_p2p_network_ ] Triggering connect loop now node.cpp:985
2016-01-05T18:48:35   p2p:connect_to_task schedule_peer_for_de ] scheduling peer for deletion: [ip-addres-removed]:1776 (this will not block) node.cpp:1637
2016-01-05T18:48:35   p2p:connect_to_task schedule_peer_for_de ] asyncing delayed_peer_deletion_task to delete 1 peers node.cpp:1642
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task delayed_peer_deletio ] beginning an iteration of delayed_peer_deletion_task with 1 in queue node.cpp:1601
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task              destroy ] calling close_connection() peer_connection.cpp:124
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task              destroy ] close_connection completed normally peer_connection.cpp:126
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task              destroy ] canceling _send_queued_messages task peer_connection.cpp:139
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task              destroy ] cancel_and_wait completed normally peer_connection.cpp:141
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task              destroy ] canceling accept_or_connect_task peer_connection.cpp:154
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task              destroy ] Unexpected exception from peer_connection's accept_or_connect_task : {"code":0,"name":"exception","message":"unspecified","stack":[{"context":{"level":"error","file":"asio.cpp","line":60,"method":"error_handler","hostname":"","thread_name":"asio","timestamp":"2016-01-05T18:48:35"},"format":"${message} ","data":{"message":"Connection refused"}}]} peer_connection.cpp:160
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task   destroy_connection ] in destroy_connection() for message_oriented_connection.cpp:283
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task   destroy_connection ] in destroy_connection() for message_oriented_connection.cpp:283
2016-01-05T18:48:35 p2p:delayed_peer_deletion_task delayed_peer_deletio ] leaving delayed_peer_deletion_task node.cpp:1604

Not sure what I'm looking for but have the p2p.log if required.

Offline bytemaster

This is very helpful.  We are looking into it closer.
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline alt

  • Hero Member
  • *****
  • Posts: 2821
    • View Profile
  • BitShares: baozi
I have repeat this bug with a debug version, it run from db_update.cpp:286 in a endless loop,
and the order's id is always 0,
Code: [Select]
(gdb) c
Continuing.

Breakpoint 1, graphene::chain::database::clear_expired_orders (this=0x603fe60) at /home/alt/workspace/dac/bitshares-2/libraries/chain/db_update.cpp:290
290              const force_settlement_object& order = *itr;
(gdb) n
291              auto order_id = order.id;
(gdb) print order
$11 = (const graphene::chain::force_settlement_object &) @0xcc9ea00: {<graphene::db::abstract_object<graphene::chain::force_settlement_object>> = {<graphene::db::object> = {_vptr.object = 0x2cce790 <vtable for graphene::chain::force_settlement_object+16>, static space_id = 0 '\000', static type_id = 0 '\000', id = {
        number = 0}}, <No data fields>}, static space_id = 1 '\001', static type_id = 4 '\004', owner = {static space_id = <optimized out>,
    static type_id = <optimized out>, instance = {value = 32183}}, balance = {amount = {value = 14500000}, asset_id = {static space_id = <optimized out>,
      static type_id = <optimized out>, instance = {value = 113}}}, settlement_date = {utc_seconds = 1451910504}}
(gdb) c
Continuing.

Breakpoint 1, graphene::chain::database::clear_expired_orders (this=0x603fe60) at /home/alt/workspace/dac/bitshares-2/libraries/chain/db_update.cpp:290
290              const force_settlement_object& order = *itr;
(gdb) n
291              auto order_id = order.id;
(gdb) print order
$12 = (const graphene::chain::force_settlement_object &) @0xcc9ea00: {<graphene::db::abstract_object<graphene::chain::force_settlement_object>> = {<graphene::db::object> = {_vptr.object = 0x2cce790 <vtable for graphene::chain::force_settlement_object+16>, static space_id = 0 '\000', static type_id = 0 '\000', id = {
        number = 0}}, <No data fields>}, static space_id = 1 '\001', static type_id = 4 '\004', owner = {static space_id = <optimized out>,
    static type_id = <optimized out>, instance = {value = 32183}}, balance = {amount = {value = 14500000}, asset_id = {static space_id = <optimized out>,
      static type_id = <optimized out>, instance = {value = 113}}}, settlement_date = {utc_seconds = 1451910504}}
(gdb)
I think the order's id shouldn't be 0, or it will never execute
Code: [Select]
342          // Match against the least collateralized short until the settlement is finished or we reach max settlements
343          while( settled < max_settlement_volume && find_object(order_id) )                                                                                         
344          {
345             auto itr = call_index.lower_bound(boost::make_tuple(price::min(mia_object.bitasset_data(*this).options.short_backing_asset,
346                                                                            mia_object.get_id())));
347             // There should always be a call order, since asset exists!
348             assert(itr != call_index.end() && itr->debt_type() == mia_object.get_id());
349             asset max_settlement = max_settlement_volume - settled;

@Bytemаsteг what else infomation do you need?I am keep the gdb env
« Last Edit: January 05, 2016, 02:23:00 pm by alt »

Offline abit

  • Committee member
  • Hero Member
  • *
  • Posts: 4664
    • View Profile
    • Abit's Hive Blog
  • BitShares: abit
  • GitHub: abitmore
I've spent a lot of time investigating this and going over every line of code in the clear_expired_orders().  I can't find the problem from inspecting the code, and I do not have any way to reproduce it reliably.  If someone can provide specific steps or a specific datadir to reproduce it, I might be able to make progress in the investigation.
Today Alt and I did some tests with latest Release build and most recent chain data.
* When witness node is properly shut down, and then start without --replay-blockchain, it will get stuck.
* when start with --replay-blockchain, it won't get stuck.
Maybe something wrong with the 'object_database'.

//Edit: tried some more times but the issue doesn't appear again.
« Last Edit: January 04, 2016, 08:19:40 am by abit »
BitShares committee member: abit
BitShares witness: in.abit

Xeldal

  • Guest
Whats interesting is once the hardfork took place.  It started running smoothly again, no hangs at all since then.

I'm guessing, if they're related, we won't see the witness hanging issue again.

Well, looks like I was wrong.  My windows-cli-tools is back to hanging every 20 seconds.

Code: [Select]
300939ms th_a       fork_database.cpp:60          push_block           ] Pushing
 block to fork database that failed to link: 001d519567d4329c3252cb11f240597c931
23de3, 1921429
300939ms th_a       fork_database.cpp:61          push_block           ] Head: 1
921427, 001d5193c4139878e5ab7863158a9aa7042e44ae
300940ms th_a       application.cpp:524           handle_block         ] Error w
hen pushing block:
3080000 unlinkable_block_exception: unlinkable block
block does not link to known chain
    {}
    th_a  fork_database.cpp:81 graphene::chain::fork_database::_push_block

    {"new_block":{"previous":"001d51948178e5a9af007a5dd9727d2d3c50383b","timesta
mp":"2015-12-19T18:04:48","witness":"1.6.13","transaction_merkle_root":"00000000
00000000000000000000000000000000","extensions":[],"witness_signature":"20310dfe1
8f3fc85f6c8419a43c016a843dacd20a6c0ebbc438a8fc1c7bb76ebb666fba65dedeb732478b1238
9cc5353501f8808be92d28beb08badedb67ecff8c","transactions":[]}}
    th_a  db_block.cpp:201 graphene::chain::database::_push_block
« Last Edit: December 19, 2015, 06:06:46 pm by Xeldal »

Offline pc

  • Hero Member
  • *****
  • Posts: 1530
    • View Profile
    • Bitcoin - Perspektive oder Risiko?
  • BitShares: cyrano
I've spent a lot of time investigating this and going over every line of code in the clear_expired_orders().  I can't find the problem from inspecting the code, and I do not have any way to reproduce it reliably.  If someone can provide specific steps or a specific datadir to reproduce it, I might be able to make progress in the investigation.

I'll pm you a download link for a datadir that reliably causes my node to hang while syncing. It gets stuck around this point in the output:

Code: [Select]
3298620ms th_a       db_market.cpp:468             check_call_orders    ] Feed protected margin call executing (HARDFORK_436_TIME not here yet)
3298620ms th_a       db_market.cpp:469             check_call_orders    ] *call_itr: {"id":"1.8.1069","borrower":"1.2.23568","collateral":"12757500000","debt":180102,"call_price":{"base":{"amount":1215000000,"asset_id":"1.3.0"},"quote":{"amount":30017,"asset_id":"1.3.105"}}}
3298621ms th_a       db_market.cpp:470             check_call_orders    ] *limit_itr: {"id":"1.7.12966","expiration":"2020-12-16T11:08:58","seller":"1.2.89808","for_sale":180100,"sell_price":{"base":{"amount":180100,"asset_id":"1.3.105"},"quote":{"amount":"7292609200","asset_id":"1.3.0"}},"deferred_fee":0}
« Last Edit: December 18, 2015, 03:42:04 pm by pc »
Bitcoin - Perspektive oder Risiko? ISBN 978-3-8442-6568-2 http://bitcoin.quisquis.de

Xeldal

  • Guest
I've spent a lot of time investigating this and going over every line of code in the clear_expired_orders().  I can't find the problem from inspecting the code, and I do not have any way to reproduce it reliably.  If someone can provide specific steps or a specific datadir to reproduce it, I might be able to make progress in the investigation.

I have no idea if this is related. Just guessing.

When the 1209 release was first put out. I noticed the windows-cli-toolsx64.exe I was using(from Nov 11th) started acting strange. It would hang for about 20 seconds and then catch-up, over and over again.  Eventually it would just permanently hang and not respond.   -resync and -replay had no effect.  It would immediately start hanging again.  This behavior was the same with every subsequent release upto and including 1216b.   

Whats interesting is once the hardfork took place.  It started running smoothly again, no hangs at all since then.

I'm guessing, if they're related, we won't see the witness hanging issue again. 

Offline theoretical

I've spent a lot of time investigating this and going over every line of code in the clear_expired_orders().  I can't find the problem from inspecting the code, and I do not have any way to reproduce it reliably.  If someone can provide specific steps or a specific datadir to reproduce it, I might be able to make progress in the investigation.
BTS- theoretical / PTS- PZxpdC8RqWsdU3pVJeobZY7JFKVPfNpy5z / BTC- 1NfGejohzoVGffAD1CnCRgo9vApjCU2viY / the delegate formerly known as drltc / Nothing said on these forums is intended to be legally binding / All opinions are my own unless otherwise noted / Take action due to my posts at your own risk

Offline pc

  • Hero Member
  • *****
  • Posts: 1530
    • View Profile
    • Bitcoin - Perspektive oder Risiko?
  • BitShares: cyrano
I think the hangs only happen when you restart the node without rescanning. At least my nodes never had problems after rescanning.
Bitcoin - Perspektive oder Risiko? ISBN 978-3-8442-6568-2 http://bitcoin.quisquis.de

Offline abit

  • Committee member
  • Hero Member
  • *
  • Posts: 4664
    • View Profile
    • Abit's Hive Blog
  • BitShares: abit
  • GitHub: abitmore
Why only some witnesses have this issue
BitShares committee member: abit
BitShares witness: in.abit

Offline alt

  • Hero Member
  • *****
  • Posts: 2821
    • View Profile
  • BitShares: baozi
I'm sure this frozen happend because of settlement. my 3 node all frozen at block 1849845
Code: [Select]
2478528ms th_a       application.cpp:496           handle_block         ] Got block: #1849845 time: 2015-12-17T05:41:18 latency: 577 ms from: delegate.taolje  irreversi
ble: 1849827 (-18)

and  the settlement happend at 05:41:18
Code: [Select]
PRICE CNY/BTS      AMOUNT(BTS)      VALUE(CNY)      DATE
0.02404            0.0042           0.000           13:42:42
0.02400            207,186.7375     4,972.482       13:41:18

Offline alt

  • Hero Member
  • *****
  • Posts: 2821
    • View Profile
  • BitShares: baozi
this happend very often,
everytime this happened, it's run in the infinite loop: clear_expired_orders()
maybe it's caused by the expired propose?
Code: [Select]
285       // At each iteration, we either consume the current order and remove it, or we move to the next asset
286       for( auto itr = settlement_index.lower_bound(current_asset);
287            itr != settlement_index.end();
288            itr = settlement_index.lower_bound(current_asset) )                                                                                                     
289       {
290          const force_settlement_object& order = *itr;
291          auto order_id = order.id;
292          current_asset = order.settlement_asset_id();
293          const asset_object& mia_object = get(current_asset);
294          const asset_bitasset_data_object mia = mia_object.bitasset_data(*this);
295
.......................
......................
« Last Edit: December 17, 2015, 12:13:33 am by alt »

Offline merivercap

  • Hero Member
  • *****
  • Posts: 661
    • View Profile
    • BitCash
I was just trying to start up a full node with some help, but one time it froze at block 587,634.  The rest of the time it wouldn't get past the first block so I'm stuck right now.   Does my problem have anything to do with this issue or is it probably something else?

Sounds like a separate issue... mine froze well after resync, and yours froze during.

Which block are you stuck on now?

What command are you using to launch the witness?

How much free space (disk/mem) do you have?

Ahhh... You know I think I might not have that much disk space on this laptop so that might be the problem.  I'll have to try another machine.    Thanks! 
BitCash - http://www.bitcash.org 
Beta: bitCash Wallet / p2p Gateway: (https://m.bitcash.org)
Beta: bitCash Trade (https://trade.bitcash.org)

Offline roadscape

I was just trying to start up a full node with some help, but one time it froze at block 587,634.  The rest of the time it wouldn't get past the first block so I'm stuck right now.   Does my problem have anything to do with this issue or is it probably something else?

Sounds like a separate issue... mine froze well after resync, and yours froze during.

Which block are you stuck on now?

What command are you using to launch the witness?

How much free space (disk/mem) do you have?
http://cryptofresh.com  |  witness: roadscape

Offline alt

  • Hero Member
  • *****
  • Posts: 2821
    • View Profile
  • BitShares: baozi
I have met this froze several times, cpu usage is  100%, here is some info from gdb
Code: [Select]
(gdb) bt
#0  0x0000000000dcecf5 in void fc::divide<fc::uint128>(fc::uint128 const&, fc::uint128 const&, fc::uint128&, fc::uint128&) ()
#1  0x0000000000dcedf4 in fc::uint128::operator/=(fc::uint128 const&) ()
#2  0x0000000000be6d9a in graphene::chain::database::clear_expired_orders() ()
#3  0x0000000000beda0f in graphene::chain::database::_apply_block(graphene::chain::signed_block const&) ()
#4  0x0000000000bee835 in graphene::chain::database::apply_block(graphene::chain::signed_block const&, unsigned int) ()
#5  0x0000000000beed34 in graphene::chain::database::_push_block(graphene::chain::signed_block const&) ()
#6  0x0000000000bf0af3 in graphene::chain::database::push_block(graphene::chain::signed_block const&, unsigned int) ()
#7  0x0000000000a3783d in graphene::app::detail::application_impl::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&) ()
#8  0x0000000000f3f9d7 in fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}>::run(void*, fc::detail::functor_run<graphene::net::detail::statistics_gathering_node_delegate_wrapper::handle_block(graphene::net::block_message const&, bool, std::vector<fc::ripemd160, std::allocator<fc::ripemd160> >&)::{lambda()#1}>) ()
#9  0x0000000000dedfa4 in fc::task_base::run_impl() ()
#10 0x0000000000debbaf in fc::thread_d::process_tasks() ()
#11 0x0000000000debdf1 in fc::thread_d::start_process_tasks(long) ()
#12 0x0000000001061621 in make_fcontext ()
#13 0x0000000000000000 in ?? ()
seems froze at here
Code: [Select]
285       // At each iteration, we either consume the current order and remove it, or we move to the next asset
286       for( auto itr = settlement_index.lower_bound(current_asset);
287            itr != settlement_index.end();
288            itr = settlement_index.lower_bound(current_asset) )                                                                                                     
289       {
290          const force_settlement_object& order = *itr;
291          auto order_id = order.id;
292          current_asset = order.settlement_asset_id();
293          const asset_object& mia_object = get(current_asset);
294          const asset_bitasset_data_object mia = mia_object.bitasset_data(*this);
295
.......................
......................
« Last Edit: December 15, 2015, 12:34:35 am by alt »

Offline merivercap

  • Hero Member
  • *****
  • Posts: 661
    • View Profile
    • BitCash
I was just trying to start up a full node with some help, but one time it froze at block 587,634.  The rest of the time it wouldn't get past the first block so I'm stuck right now.   Does my problem have anything to do with this issue or is it probably something else?
BitCash - http://www.bitcash.org 
Beta: bitCash Wallet / p2p Gateway: (https://m.bitcash.org)
Beta: bitCash Trade (https://trade.bitcash.org)

Offline roadscape

This has happened to two other witnesses and a bug report was submitted.

It is unclear what is causing it so far. The same thing has caused cryptofresh to stop loading twice in the last few days.

Correction: only once due to this mysterious hang

The other time was a HD full of log files..
http://cryptofresh.com  |  witness: roadscape

Offline monsterer

It's something in the latest version which is causing it. Unclear what, though.
My opinions do not represent those of metaexchange unless explicitly stated.
https://metaexchange.info | Bitcoin<->Altcoin exchange | Instant | Safe | Low spreads

Offline BunkerChainLabs-DataSecurityNode

This has happened to two other witnesses and a bug report was submitted.

It is unclear what is causing it so far. The same thing has caused cryptofresh to stop loading twice in the last few days.
+-+-+-+-+-+-+-+-+-+-+
www.Peerplays.com | Decentralized Gaming Built with Graphene - Now with BookiePro and Sweeps!
+-+-+-+-+-+-+-+-+-+-+

Offline monsterer

Unfortunately, no really log information available, but the latest witness_node just froze overnight; this is the first time this has happened.
My opinions do not represent those of metaexchange unless explicitly stated.
https://metaexchange.info | Bitcoin<->Altcoin exchange | Instant | Safe | Low spreads