BitShares Forum
Main => General Discussion => Topic started by: puppies on October 02, 2014, 05:33:34 am
-
My delegate node just ran into a seg fault and I dropped a block. Unfortunately I was not running in gdb, and the executable was stripped, so I can't get any debug info. I am currently rebuilding to see if I can reproduce. Has anyone else run into any problems?
* Edit by Bytemaster
- this issue has been fixed in the latest toolkit and DAC Sun Limited has been notified that they should merge the fix and provide an update.
-
I have the same issue, and some other delegates have the same problem. Looks like a serious bug; only ~4600 blocks left now.
-
% participation dropped to 88% (red alert),
so I suggest waiting a bit before upgrading until they give further directions (?)
-
my node just crashed on 0.4.19
>>>
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff87fff700 (LWP 4915)]
std::_Hashtable<bts::net::item_id, bts::net::item_id, std::allocator<bts::net::item_id>, std::__detail::_Identity, std::equal_to<bts::net::item_id>, std::hash<bts::net::item_id>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::find (
this=this@entry=0x45944, __k=...) at /usr/include/c++/4.8/bits/hashtable.h:1024
1024 std::size_t __n = _M_bucket_index(__k, __code);
(gdb) bt
#0 std::_Hashtable<bts::net::item_id, bts::net::item_id, std::allocator<bts::net::item_id>, std::__detail::_Identity, std::equal_to<bts::net::item_id>, std::hash<bts::net::item_id>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::find (
this=this@entry=0x45944, __k=...) at /usr/include/c++/4.8/bits/hashtable.h:1024
#1 0x0000000000a4ede5 in find (__x=..., this=0x45944) at /usr/include/c++/4.8/bits/unordered_set.h:517
#2 bts::net::detail::node_impl::process_block_during_normal_operation (this=this@entry=0x5a7c2920, originating_peer=originating_peer@entry=0x7fff659dd210,
block_message_to_process=..., message_hash=...) at /root/bitsharesx/libraries/net/node.cpp:2828
#3 0x0000000000a5068b in bts::net::detail::node_impl::process_block_message (this=this@entry=0x5a7c2920, originating_peer=originating_peer@entry=0x7fff659dd210,
message_to_process=..., message_hash=...) at /root/bitsharesx/libraries/net/node.cpp:2880
#4 0x0000000000a51d03 in bts::net::detail::node_impl::on_message (this=0x5a7c2920, originating_peer=0x7fff659dd210, received_message=...)
at /root/bitsharesx/libraries/net/node.cpp:1598
#5 0x0000000000ac034a in bts::net::detail::message_oriented_connection_impl::read_loop (this=0x7fff65a19490) at /root/bitsharesx/libraries/net/message_oriented_connection.cpp:157
#6 0x0000000000ac271c in operator() (__closure=<optimized out>) at /root/bitsharesx/libraries/net/message_oriented_connection.cpp:100
#7 fc::detail::void_functor_run<bts::net::detail::message_oriented_connection_impl::accept()::__lambda0>::run(void *, void *) (functor=<optimized out>, prom=0x7fff65ab2120)
at /root/bitsharesx/libraries/fc/include/fc/thread/task.hpp:83
#8 0x00000000006bf323 in fc::task_base::run_impl (this=this@entry=0x7fff65ab2130) at /root/bitsharesx/libraries/fc/src/thread/task.cpp:43
#9 0x00000000006bf9d5 in fc::task_base::run (this=this@entry=0x7fff65ab2130) at /root/bitsharesx/libraries/fc/src/thread/task.cpp:32
#10 0x00000000006bd95b in run_next_task (this=0x7fff7c0008c0) at /root/bitsharesx/libraries/fc/src/thread/thread_d.hpp:415
#11 fc::thread_d::process_tasks (this=this@entry=0x7fff7c0008c0) at /root/bitsharesx/libraries/fc/src/thread/thread_d.hpp:439
#12 0x00000000006bdbe6 in fc::thread_d::start_process_tasks (my=140735273765056) at /root/bitsharesx/libraries/fc/src/thread/thread_d.hpp:395
#13 0x0000000000f4628e in make_fcontext ()
#14 0x00007fff7c0008c0 in ?? ()
#15 0x00007fff7c068be0 in ?? ()
#16 0x0000000000000000 in ?? ()
-
I wanted to update a price feed and it segfaulted :(
luckily it's just a 'control' node and not the delegate ... but the hardfork will happen today at 19:00 UTC .. :-\
- crashing on debian
- crashing on archlinux
- independent of price feed publishing ..
-
Mine crashed on 0.4.19 too.
-
Crash Confirmed!
-
Should we panic?
-
Should we panic?
na ..
delegates can still run version 0.4.18 ....
Actually, if the delegates were to choose not to update to 0.4.19 (which they shouldn't do currently), there would be no hardfork at block 640000 :)
-
Same here, three segfaults so far, one on the delegate machine, two on the bitsharesblocks machine..
-
0.4.19 is not safe to run. It keeps crashing. I am switching back to 0.4.18
-
Should we panic?
(http://media.fakeposters.com/results/2013/03/15/10jhhke9gf.jpg)
-
mine is crashing out all the time also
-
Btw .. it seems init delegates also crashed ..
-
I've reduced my number of max connections to 10, running OK so far but monitoring it..
-
I've reduced my number of max connections to 10, running OK so far but monitoring it..
How do you do that?
-
I'm running with --accept-incoming-connections 0 flag and
> network_set_advanced_node_parameters {"desired_number_of_connections":20, "maximum_number_of_connections":20}
No crash on 0.4.19 so far, after 1 hour... monitoring.
-
I've reduced my number of max connections to 10, running OK so far but monitoring it..
How do you do that?
network_set_advanced_node_parameters {"maximum_number_of_connections":10}
-
Thanks very much +5%
-
I suggest monitoring memory frequently. I have a delegate that got low on memory just running overnight, so I rebooted it. It hasn't crashed or missed blocks yet, but I would for sure keep a close eye on it.
Edit: Linux command to check memory: free
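For anyone who wants to automate that check, here is a minimal sketch (assumes a Linux /proc/meminfo; the 80% threshold is only an example, not anything the toolkit defines):

```python
def mem_used_percent(meminfo_text):
    """Percent of RAM actually in use, counting buffers/cache as free,
    the same way the '-/+ buffers/cache' line of `free` does."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key] = int(rest.split()[0])  # values are in kB
    free = fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)
    return 100.0 * (fields["MemTotal"] - free) / fields["MemTotal"]

# usage on a Linux box:
#   with open("/proc/meminfo") as f:
#       print(f"RAM used: {mem_used_percent(f.read()):.1f}%")
```

Run it from cron (or any watchdog) and alert whenever the value passes whatever threshold you're comfortable with.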
-
Just crashed on my "seed" node that accepts incoming connections:
(wallet closed) >>>
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb3fff700 (LWP 41682)]
0x0000000000a55777 in std::_Hashtable<bts::net::item_id, bts::net::item_id, std::allocator<bts::net::item_id>, std::__detail::_Identity, std::equal_to<bts::net::item_id>, std::hash<bts::net::item_id>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_find_before_node(unsigned long, bts::net::item_id const&, unsigned long) const () at /usr/include/c++/4.8/bits/hashtable.h:1159
1159 __node_base* __prev_p = _M_buckets[__n];
(gdb) bt
#0 0x0000000000a55777 in std::_Hashtable<bts::net::item_id, bts::net::item_id, std::allocator<bts::net::item_id>, std::__detail::_Identity, std::equal_to<bts::net::item_id>, std::hash<bts::net::item_id>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_find_before_node(unsigned long, bts::net::item_id const&, unsigned long) const () at /usr/include/c++/4.8/bits/hashtable.h:1159
#1 0x0000000000a55840 in std::_Hashtable<bts::net::item_id, bts::net::item_id, std::allocator<bts::net::item_id>, std::__detail::_Identity, std::equal_to<bts::net::item_id>, std::hash<bts::net::item_id>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::find(bts::net::item_id const&) ()
at /usr/include/c++/4.8/bits/hashtable.h:604
#2 0x0000000000a3dbc3 in bts::net::detail::node_impl::process_block_during_normal_operation(bts::net::peer_connection*, bts::client::block_message const&, fc::ripemd160 const&) ()
at /usr/include/c++/4.8/bits/unordered_set.h:517
#3 0x0000000000a3f48b in bts::net::detail::node_impl::process_block_message(bts::net::peer_connection*, bts::net::message const&, fc::ripemd160 const&) ()
at /home/fabiux/ssd/bitsharesx/libraries/net/node.cpp:2880
#4 0x0000000000a40bd3 in bts::net::detail::node_impl::on_message(bts::net::peer_connection*, bts::net::message const&) () at /home/fabiux/ssd/bitsharesx/libraries/net/node.cpp:1598
#5 0x0000000000aaef3a in bts::net::detail::message_oriented_connection_impl::read_loop() () at /home/fabiux/ssd/bitsharesx/libraries/net/message_oriented_connection.cpp:157
#6 0x0000000000ab130c in fc::detail::void_functor_run<bts::net::detail::message_oriented_connection_impl::accept()::{lambda()#1}>::run(void*, fc::detail::void_functor_run<bts::net::detail::message_oriented_connection_impl::accept()::{lambda()#1}>) () at /home/fabiux/ssd/bitsharesx/libraries/net/message_oriented_connection.cpp:100
#7 0x00000000006adc53 in fc::task_base::run_impl() () at /home/fabiux/ssd/bitsharesx/libraries/fc/src/thread/task.cpp:43
#8 0x00000000006ac47b in fc::thread_d::process_tasks() () at /home/fabiux/ssd/bitsharesx/libraries/fc/src/thread/thread_d.hpp:415
#9 0x00000000006ac716 in fc::thread_d::start_process_tasks(long) () at /home/fabiux/ssd/bitsharesx/libraries/fc/src/thread/thread_d.hpp:395
#10 0x0000000000f1f9ce in make_fcontext ()
#11 0x00007fffa80008c0 in ?? ()
#12 0x00007fff8d02a530 in ?? ()
#13 0x0000000000000000 in ?? ()
(gdb)
My delegate node (no incoming connections and limited to 20 connections) is OK so far.
-
If I run the client with
--server --accept-incoming-connections 0 --max-connections 10
however .. I eventually lose ALL connections .. :(
-
I'm running under this setting:
network_set_advanced_node_parameters { "peer_connection_retry_timeout": 10, "desired_number_of_connections": 10, "maximum_number_of_connections": 10 }
if that helps anyone.
-
I have been running 0.4.19 for a few hours now. So far being stable. No crash.
-
Don't upgrade to .19. Clearly it has issues.
-
Don't upgrade to .19. Clearly it has issues.
What if you already did and it's working?
-
Don't upgrade to .19. Clearly it has issues.
Should we use RC1? It seems stable.
-
Don't upgrade to .19. Clearly it has issues.
What if you already did and it's working?
or doesnt? :)
-
Don't upgrade to .19. Clearly it has issues.
any advice on how to downgrade smoothly? Should we just relaunch 0.4.18 on an empty DB and reimport keys? I imagine that as everything is already re-indexed for 0.4.19 there is no way to downgrade this to 0.4.18 directly without wiping some indices first, right?
-
Don't upgrade to .19. Clearly it has issues.
any advice on how to downgrade smoothly? Should we just relaunch 0.4.18 on an empty DB and reimport keys? I imagine that as everything is already re-indexed for 0.4.19 there is no way to downgrade this to 0.4.18 directly without wiping some indices first, right?
I would suggest that if it is working for you, you should not have to downgrade but hopefully BM will weigh in.
-
Don't upgrade to .19. Clearly it has issues.
any advice on how to downgrade smoothly? Should we just relaunch 0.4.18 on an empty DB and reimport keys? I imagine that as everything is already re-indexed for 0.4.19 there is no way to downgrade this to 0.4.18 directly without wiping some indices first, right?
I would suggest that if it is working for you, you should not have to downgrade but hopefully BM will weigh in.
All delegates should be on the same version. Unless we want forks.
-
Don't upgrade to .19. Clearly it has issues.
any advice on how to downgrade smoothly? Should we just relaunch 0.4.18 on an empty DB and reimport keys? I imagine that as everything is already re-indexed for 0.4.19 there is no way to downgrade this to 0.4.18 directly without wiping some indices first, right?
I would suggest that if it is working for you, you should not have to downgrade but hopefully BM will weigh in.
All delegates should be on the same version. Unless we want forks.
Can't they delay planned forks? Not sure if that is what you are referring to or not. I only notice that when we do upgrades there are days when people are on different versions.
-
I would suggest that if it is working for you, you should not have to downgrade but hopefully BM will weigh in.
well, 3 crashes in less than 1 hour, reported here: https://github.com/BitShares/bitshares_toolkit/issues/834
good thing is, it helps me debug my monitoring script ;)
-
Don't upgrade to .19. Clearly it has issues.
any advice on how to downgrade smoothly? Should we just relaunch 0.4.18 on an empty DB and reimport keys? I imagine that as everything is already re-indexed for 0.4.19 there is no way to downgrade this to 0.4.18 directly without wiping some indices first, right?
I would suggest that if it is working for you, you should not have to downgrade but hopefully BM will weigh in.
All delegates should be on the same version. Unless we want forks.
Can't they delay planned forks? Not sure if that is what you are referring to or not. I only notice that when we do upgrades there are days when people are on different versions.
Forks are hardcoded in the code at a block number. The current v0.4.19 fork is scheduled for block 640000. All v0.4.19 clients will fork at that block. Previous versions might not accept blocks produced by v0.4.19. That is why either all (or most) of the delegates should upgrade by block 640000, or none (or very few) of them should. Note that v0.4.19-RC1 is a different version; it may not accept blocks from either v0.4.19 or v0.4.18.
-
Don't upgrade to .19. Clearly it has issues.
any advice on how to downgrade smoothly? Should we just relaunch 0.4.18 on an empty DB and reimport keys? I imagine that as everything is already re-indexed for 0.4.19 there is no way to downgrade this to 0.4.18 directly without wiping some indices first, right?
I would suggest that if it is working for you, you should not have to downgrade but hopefully BM will weigh in.
All delegates should be on the same version. Unless we want forks.
Can't they delay planned forks? Not sure if that is what you are referring to or not. I only notice that when we do upgrades there are days when people are on different versions.
Forks are hardcoded in the code at a block number. The current v0.4.19 fork is scheduled for block 640000. All v0.4.19 clients will fork at that block. Previous versions might not accept blocks produced by v0.4.19. That is why either all (or most) of the delegates should upgrade by block 640000, or none (or very few) of them should. Note that v0.4.19-RC1 is a different version; it may not accept blocks from either v0.4.19 or v0.4.18.
ugh, looks like we're "f--ked"...:( Seems like approx. 68 active delegates are on 0.4.19.
-
640000 hasn't come yet (the current block is ~637725). Currently all delegates are on the same fork and there are no issues (except the segfault).
By block 640000 all (or most) delegates should be on the same version.
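The divergence being described can be sketched in a few lines (the block number comes from the thread; the rule names are illustrative, not toolkit code):

```python
import math

HARDFORK_BLOCK = 640_000  # the fork height discussed above

def rules_for(block_num, client_forks_at=HARDFORK_BLOCK):
    """Which rule set a client applies to a given block. A client with no
    scheduled fork (client_forks_at=math.inf) keeps the old rules forever."""
    return "new" if block_num >= client_forks_at else "old"

def same_chain(block_num, fork_a, fork_b):
    """Two clients stay on the same chain only while they apply the same
    rules to every block they see."""
    return rules_for(block_num, fork_a) == rules_for(block_num, fork_b)

# v0.4.19 (forks at 640000) vs v0.4.18 (no fork scheduled):
# they agree up to block 639999 and diverge from block 640000 on.
```

This is why the thread keeps stressing that most delegates must be on the same side by block 640000: a v0.4.18 node keeps applying the old rules the moment a v0.4.19 majority starts applying the new ones.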
-
Don't upgrade to .19. Clearly it has issues.
any advice on how to downgrade smoothly? Should we just relaunch 0.4.18 on an empty DB and reimport keys? I imagine that as everything is already re-indexed for 0.4.19 there is no way to downgrade this to 0.4.18 directly without wiping some indices first, right?
I downgraded to 0.4.18, but had to delete ~/.BitSharesX/chain to get it working again.
-
I've had a single segfault but 0.4.19 has been stable otherwise.
Since the majority has already upgraded to 0.4.19 it's probably easier to commit to it. The segfault is a hassle but it seems to be harmless otherwise.
The only problem is restarting your delegate after a segfault, but that can be done automatically with HackFisher's expect script (or a slightly modified version of it).
This way we can just wait for patch releases of 0.4.19 as they are made available.
Thoughts?
https://github.com/Bitsuperlab/operation_tools.git
restart/run_wallet.exp
#!/usr/bin/expect -f
set timeout -1
set default_port 1776
set port $default_port
### change wallet_name here
set wallet_name "delegate"
send_user "wallet name is: $wallet_name\n"
send_user "wallet passphrase: "
stty -echo
expect_user -re "(.*)\n"
stty echo
set wallet_pass $expect_out(1,string)
proc run_wallet {} {
    global wallet_name wallet_pass default_port port
    ### change command line here
    spawn ./bitshares_client --data-dir=delegate --p2p-port $port --server --httpport 9989 --rpcuser user --rpcpassword pass
    expect -exact "(wallet closed) >>> "
    send -- "about\r"
    expect -exact "(wallet closed) >>> "
    send -- "wallet_open $wallet_name\r"
    expect -exact "$wallet_name (locked) >>> "
    send -- "wallet_unlock 99999999\r"
    expect -exact "passphrase: "
    send -- "$wallet_pass\r"
    expect -exact "$wallet_name (unlocked) >>> "
    send -- "wallet_delegate_set_block_production ALL true\r"
    expect -exact "$wallet_name (unlocked) >>> "
    send -- "info\r"
    expect -exact "$wallet_name (unlocked) >>> "
    send -- "wallet_list_my_accounts\r"
    interact
    wait
    if { $port == $default_port } {
        set port [expr $port+1]
    } else {
        set port [expr $port-1]
    }
}

while true {
    run_wallet
}
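The restart-on-crash idea by itself, minus the interactive wallet unlock the expect script handles, can be sketched in Python (the command line below is a placeholder):

```python
import subprocess
import time

def keep_alive(cmd, max_restarts=None, delay=1.0):
    """Relaunch cmd each time it exits; stop after max_restarts if given.
    Returns how many times the command was run."""
    runs = 0
    while max_restarts is None or runs < max_restarts:
        subprocess.run(cmd)    # blocks until the client exits or crashes
        runs += 1
        time.sleep(delay)      # brief pause before relaunching
    return runs

# placeholder invocation; a real delegate also needs to reopen and unlock
# the wallet after each restart, which is exactly what the expect script does:
#   keep_alive(["./bitshares_client", "--data-dir=delegate"])
```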
-
Updating to v0.4.19 seems reasonable to me. Segfault by itself isn't that much of a deal.
However bytemaster recommended not to update. Maybe he has other reasons?
There is still plenty of time for an informed decision. I suggest waiting for bytemaster's input and updating to v0.4.19 as the default option (if he is silent about this).
-
I think we should all run:
network_set_advanced_node_parameters {"maximum_number_of_connections":10}
and wait for v0.4.20.
-
BTW:
$ python3 timeofblock.py 640000
block 640000 to appear in <= 6:02:20
UTC time: 2014-10-02 19:36:10
6 hours left
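The timeofblock.py above is the poster's own tool; a sketch of the same estimate, assuming the 10-second block interval BitShares X targets, looks like this:

```python
from datetime import datetime, timedelta, timezone

BLOCK_INTERVAL = 10  # seconds per block (BitShares X target interval)

def eta_for_block(target, current, now=None):
    """Upper-bound estimate of when block `target` appears, given the
    current height and assuming every slot is filled on schedule."""
    now = now or datetime.now(timezone.utc)
    remaining = timedelta(seconds=max(0, target - current) * BLOCK_INTERVAL)
    return remaining, now + remaining

# e.g. eta_for_block(640_000, 637_725)[0] -> 6:19:10 until the fork block
```

Missed slots only push the real time later, which is why the output above is an upper bound ("<=").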
-
It was clear enough... everybody back to v0.4.18. Anybody not doing so will hurt their reliability statistics... there's enough time for downgrading... delete your hidden directory ".BitSharesX" as well. After all, v0.4.18 has proven stable, at least since we've all been running it for a few days...
Sent from my ALCATEL ONE TOUCH 997D
-
Mine's been stable since the initial segfault, running v0.4.19 with 8 connections max.
I think it'll be hard to get everyone over on .18 in time, but I suggest those who want to do so re-publish their version so we can monitor the number of delegates on each side. Currently .19 is in the majority, but I'm ready to switch over if necessary.
-
Updating to v0.4.19 seems reasonable to me. Segfault by itself isn't that much of a deal.
However bytemaster recommended not to update. Maybe he has other reasons?
There is still plenty of time for an informed decision. I suggest waiting for bytemaster's input and updating to v0.4.19 as the default option (if he is silent about this).
I agree. Everyone downgrading would be very unpleasant.
-
Average confirmation time is 5 secs. That's good.
-
Updating to v0.4.19 seems reasonable to me. Segfault by itself isn't that much of a deal.
However bytemaster recommended not to update. Maybe he has other reasons?
There is still plenty of time for an informed decision. I suggest waiting for bytemaster's input and updating to v0.4.19 as the default option (if he is silent about this).
I agree. Everyone downgrading would be very unpleasant.
There are no blockchain-level bugs... we suspect that it may be low-memory machines that are crashing, which is why we did not experience crashes in our own tests.
-
Because downgrading is so unpleasant and we want the blockchain updates ASAP:
Dan & Eric are looking into the crash, but could those who are experiencing it please report the specs of the machine you are running on?
Let's update to 0.4.19...
-
that's a good example of why a hard fork must be scheduled a bigger block distance away from its initial announcement time ...
-
hmmm .. 2GB is already low memory .. oha
three machines
XEN DomU - 2GB RAM - debian/archlinux - 64 bit
-
0.4.19 ok
-
hmmm .. 2GB is already low memory .. oha
Yes, it is, because we haven't taken time to optimize memory usage.
We are keeping large portions of the blockchain database *IN RAM* for performance reasons and will probably continue to keep it in RAM long term. Long-term, lightweight clients will not have the full chain and will therefore have a much smaller memory footprint, and delegates can run machines with 128 GB of memory. This will allow us to grow transaction volume and keep block latencies low.
-
Lets update to 0.4.19...
it will be an exciting night in Greece again! Another pizza session will not hurt me, I guess :)
-
experienced segfault
4GB RAM
-
experienced segfault
4GB RAM
So much for that theory... 4 GB should be more than enough. What OS?
-
Experienced segfaults on two different VPSes: 1 GB and 2 GB of memory, with 4 GB of swap on both. Ubuntu 14.04.
-
The segfaults I experienced were on the testing machine.
What I observe is about 2 GB of memory consumption on v0.4.19 (RC1 used about 800 MB).
My primary configurations haven't had crashes so far.
UPDATE: Increased memory consumption might be related to DB reindexing. If you restart the client after reindexing is completed, memory consumption goes back to normal.
UPDATE2: The main system crashed as well.
-
Ubuntu linux 14.04 x64
-
" ...and delegates can run machines with 128 GB of memory."
128 GB ?
-
# free
             total       used       free     shared    buffers     cached
Mem:       4048312    3068232     980080        332     152304    2587972
-/+ buffers/cache:     327956    3720356
-
" ...and delegates can run machines with 128 GB of memory."
128 GB ?
This is years from now... when we are processing 1000 transactions per second.
-
It is not a memory issue, we have fixed the crash and notified DAC Sun.
-
Because downgrading is so unpleasant and we want the blockchain updates ASAP:
Dan & Eric are looking into the crash, but could those who are experiencing it please report the specs of the machine you are running on?
Let's update to 0.4.19...
My delegate server is under real-time monitoring; the RAM had NOT reached 80% before the client crashed, so it may not be due to the RAM size.
-
My delegate server is under real-time monitoring; the RAM had NOT reached 80% before the client crashed, so it may not be due to the RAM size.
What software do you use for this? I couldn't get started with monitoring my delegate machine(s) in more detail.
-
It is not a memory issue, we have fixed the crash and notified DAC Sun.
That is good news .. I almost feared for my weekend hiking tour because of the segfaults :)
-
It is not a memory issue, we have fixed the crash and notified DAC Sun.
read again folks :)
-
UPDATE: Increased memory consumption might be related to DB reindexing. If you restart the client after reindexing is completed, memory consumption goes back to normal.
-
UPDATE: Increased memory consumption might be related to DB reindexing. If you restart the client after reindexing is completed, memory consumption goes back to normal.
Yes... this is true. Reindexing fills all of the database caches.
-
UPDATE: Increased memory consumption might be related to DB reindexing. If you restart the client after reindexing is completed, memory consumption goes back to normal.
+5% - matches my experience. After the upgrade (I kept it running), memory was low the next morning. I rebooted today and memory usage has not been high.
-
My delegate server is under real-time monitoring; the RAM had NOT reached 80% before the client crashed, so it may not be due to the RAM size.
What software do you use for this? I couldn't get started with monitoring my delegate machine(s) in more detail.
The Alibaba VPS provides a monitoring service; I can monitor CPU, RAM, disk, P2P connections, etc. in real time, and I will be notified if any of the statistics is abnormal.
-
my cloud is fine
-
Do we expect increased CPU time consumption?
I see some higher values.
-
Do we expect increased CPU time consumption?
I see some higher values.
We are auditing all resource usage right now.
-
It is not a memory issue, we have fixed the crash and notified DAC Sun.
Is it wise to upgrade to master already, or should we wait for an announcement from DAC Sun?
-
Looks like mine is fine on the cloud with 4 GB RAM.
Do we expect increased CPU time consumption?
I see some higher values.
We are auditing all resource usage right now.
-
It is not a memory issue, we have fixed the crash and notified DAC Sun.
Is it wise to upgrade to master already, or should we wait for an announcement from DAC Sun?
It hasn't been pushed to master yet.
-
seems to work now....
-
0.4.20 should hopefully fix the problem: https://bitsharestalk.org/index.php?topic=7067.msg124598#msg124598
-
I have a job interview in a bit, and I just saw we need to upgrade. I will do so ASAP when I get home.
Just FYI in case I can't make it.
-
no issues here yet ...
-
0.4.20 seems to hold up fine. Great job on the quick fix, thanks! I will be able to sleep tight tonight :P
-
0.4.20 seems to hold up fine. Great job on the quick fix, thanks! I will be able to sleep tight tonight :P
Just updated to 0.4.20. It has been stable. Thanks for the fast response.
-
I have a job interview in a bit, and I just saw we need to upgrade. I will do so ASAP when I get home.
Just FYI in case I can't make it.
OK, updated to 0.4.20.