Arachni RPC running with many bugs.


Kevin

19 Jun, 2017 10:29 AM

Hi,

Currently using arachni-1.5-0.5.11.
We have been using grid mode with the balanced option, but we can never get it running even somewhat stably. A lot of the early issues were RAM-related, but those have all been fixed.
So our current setup is: 5 servers with 16 GB RAM each, and a master that also has 16 GB RAM.

We are using arachni_rpc to initiate the scans.

From arachni-master:
ruby arachni_rpcd --address 10.20.50.8 --external-address 10.20.50.8 --port 7331 --port-range 17331-27331 --nickname arachni-master --pool-size 1 --pipe-id "Pipe 8" --weight 1000 --reroute-to-logfile

From dispatcher1:
ruby arachni_rpcd --address 10.20.50.10 --external-address 10.20.50.10 --port 7331 --port-range 17331-27331 --nickname arachni-dispatcher-xx1 --pool-size 10 --pipe-id="Pipe 10" --reroute-to-logfile --neighbour arachni-master:7331

From dispatcher2:
ruby arachni_rpcd --address 10.20.50.11 --external-address 10.20.50.11 --port 7331 --port-range 17331-27331 --nickname arachni-dispatcher-xx2 --pool-size 10 --pipe-id="Pipe 11" --reroute-to-logfile --neighbour arachni-master:7331

One thing I noticed is that all our dispatchers have the same neighbour, arachni-master. Would this create a problem?

The command we run from arachni-master:
sudo /opt/arachni/current/bin/arachni_rpc --dispatcher-url=10.20.50.10:7331 --grid --spawns=1 --browser-cluster-ignore-images --scope-auto-redundant=4 --report-save-path=/opt/arachni/reports/testsite.afr --timeout 48:00:00

Is there anything wrong with our setup? All boxes are fully updated Ubuntu.
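
For completeness, this is roughly how we check what the Dispatchers are doing (a sketch from memory; the monitor simply takes a Dispatcher URL as far as I know):

# point the monitor at any Dispatcher to inspect its running Instances
arachni_rpcd_monitor 10.20.50.8:7331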

  1. Posted by Kevin on 19 Jun, 2017 10:31 AM

    One thing I should note that wasn't clear:
    One of the problems is that arachni_rpc stops working, and every time you try to start it, it just hangs without doing any actual work.

    Other times it seems like the connection is lost and the arachni_rpc client is unable to retrieve any reports. I can see that the scans are still running via arachni_rpcd_monitor.

  2. Support Staff Posted by Tasos Laskos on 19 Jun, 2017 03:39 PM

    The way you've set up the Grid, no Dispatcher other than arachni-master will be used, due to the high weight you've assigned to it.
    Also, try removing the --spawns option; it's unstable and will be removed.

  3. Support Staff Posted by Tasos Laskos on 19 Jun, 2017 04:38 PM

    My bad, I got it backwards: arachni-master will never be used.

  4. Posted by Kevin on 19 Jun, 2017 09:30 PM

    Thanks, will try without --spawns. It just stated that I had to specify it.

    Regarding the arachni-master weight: we did that specifically to avoid putting a heavy load on arachni-master, as it is managing the scans.

  5. Support Staff Posted by Tasos Laskos on 19 Jun, 2017 10:07 PM

    Don't specify --grid either, so that you won't have to specify --spawns; the Dispatchers will still load-balance the scans amongst themselves.

    Also, arachni-master isn't managing anything; no one node is more important than the others. Whichever one you ask will search the Grid for the node with the lowest workload score, ask it for an Instance and then pass that information back to you.
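
    As a sketch, the scan command from above would then become (all other flags unchanged):

    sudo /opt/arachni/current/bin/arachni_rpc --dispatcher-url=10.20.50.10:7331 --browser-cluster-ignore-images --scope-auto-redundant=4 --report-save-path=/opt/arachni/reports/testsite.afr --timeout 48:00:00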

  6. Posted by Kevin on 22 Jun, 2017 02:28 PM

    Still very unstable. It is as if arachni_rpc is losing its connection to the dispatcher (after pressing Ctrl+C it will hang forever).
    And then some of the new scans started with arachni_rpc will not start at all.

    I don't know if it is caused by RAM consumption, but some of our servers will also just become completely unresponsive in some cases. I just think it's hard to control RAM usage when it is balancing a lot of scans by itself.

    Do you have any ideas, or any debug info I could provide that would be helpful?

  7. Support Staff Posted by Tasos Laskos on 22 Jun, 2017 02:31 PM

    I think you should run fewer scans; it sounds like the servers are having a pretty hard time.
    Out of curiosity, how many scans are you running on these machines?

  8. Posted by Kevin on 23 Jun, 2017 08:01 AM

    Maybe one or two scans each, so around 5-10 scans across 5 servers with good specs.

  9. Posted by Kevin on 23 Jun, 2017 08:24 AM

    Also, when using the rpcd monitor I can see that a scan is running, but arachni_rpc is dead.
    Can I stop a scan on a Dispatcher without restarting the whole thing?

  10. Posted by Kevin on 23 Jun, 2017 08:28 AM

    Also noticed that the timeout feature is not working. If I set 48 hours it does not help and the scan just continues.

  11. Support Staff Posted by Tasos Laskos on 23 Jun, 2017 08:35 AM
    1. 2 scans per machine is really low; I can easily run 12, one for each CPU core. Are you sure it's not a network issue? Also, how much CPU % are the scans using when things start to lag?
    2. The scan can be killed just like any other process; the monitor should give you the PID (see the example after this list).
    3. If connectivity is lost like you mentioned, then the timeout won't work, as it's controlled by the client.
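
    For example (a sketch; the PID is whatever the monitor reports for the stuck Instance):

    # inspect the Dispatcher and note the PID of the Instance
    arachni_rpcd_monitor 10.20.50.10:7331
    # then kill that Instance like any other process
    kill <PID>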
  12. Posted by Kevin on 23 Jun, 2017 12:09 PM
    1. That is good to hear. CPU is around 60% to 80%. I would not consider it lagging so much as the client being unable to contact the dispatcher. Also, some of the boxes are completely dead, meaning I can't even SSH into them. In the Google Cloud console they show ~0% CPU at that time, though.

    2. Good to know, thanks.

    3. I understand. It is hard for me to believe that it is a network issue though, given that it is built in Google Cloud and the servers are right next to each other.

  13. Support Staff Posted by Tasos Laskos on 23 Jun, 2017 12:17 PM

    The boxes being completely dead is worrisome. Can you perform an identical scan and periodically check the number of running processes and disk usage, in addition to RAM and CPU? A sketch of such checks follows below.

    Theoretically there could be a bug in the way browsers are spawned, leading to basically a fork bomb, or the tmp files Arachni creates to offload its workload to disk could be taking up all the space.
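
    Something along these lines would do (a sketch, assuming standard Linux tooling; Arachni's browser cluster spawns PhantomJS processes):

    # number of spawned browser processes
    watch -n1 'ps aux | grep -c [p]hantomjs'
    # disk usage, to see whether the tmp files are filling the disk
    watch -n1 df -h
    # RAM usage
    watch -n1 free -m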

  14. Posted by Kevin on 23 Jun, 2017 12:23 PM

    Just to make a quick note: by completely dead I mean a hard reset is the only way back.
    I have just started ~20 scans and will monitor RAM and CPU consumption plus the number of running processes.

    The tmp files taking up all the space is definitely a worthy shot. I have 4 scans that have been running for 10-20 minutes on one of the machines, with 1.7 GB of disk space left. Will it exceed that?

    Also, thanks for the quick replies; they are greatly appreciated.

  15. Support Staff Posted by Tasos Laskos on 23 Jun, 2017 12:29 PM

    Yeah, tmp files can easily exceed 1.7 GB.

    The recommended system requirements state 10 GB of available disk space, and that's per scan. That's on the very generous side, I'll grant you, but still.

    There are cases where disk usage can grow even past that, and that's a sign of trouble, but it can be mitigated via configuration. We'll cross that bridge when we come to it, though.

  16. Posted by Kevin on 23 Jun, 2017 12:34 PM

    So for now, would it be okay to up the servers to 40 GB of disk space each in order to run 4 scans per server?

  17. Support Staff Posted by Tasos Laskos on 23 Jun, 2017 12:35 PM

    Yep, give that a shot and see if it makes a difference.

  18. Posted by Kevin on 26 Jun, 2017 09:08 AM

    Tried increasing all the disks to 40 GB. Ran 5 scans per server, with 4 cores each. I am now unable to contact any of the 5 servers this Monday morning.

    As they are in Google Cloud I cannot currently see their disk or RAM usage, but I'm assuming that disk errors are the problem.

    Do you think I pushed them too hard with a total of 30 scans?

  19. Support Staff Posted by Tasos Laskos on 26 Jun, 2017 09:15 AM

    Yeah, better to stick with one scan per core.
    Also, while the scans are running, can you try watch -n1 df and watch -n1 free over SSH? At the point where it gets stuck, we'll know how things look resource-wise.

  20. Posted by Kevin on 26 Jun, 2017 10:56 AM

                  total        used        free      shared  buff/cache   available
    Mem:       15400392    14751468      518416        8768      130508      398496
    Swap:             0           0           0

    Managed to SSH into one of them again, so they are not completely dead. It looks like free memory is close to zero, though.
    Will try to monitor disk and memory with only 1 scan per core.

    I can see that arachni_rpc produces no output anymore, but the arachni-dispatcher is still scanning. Can I get the reports somewhere when it finishes, or are they lost?

  21. Posted by Kevin on 27 Jun, 2017 08:34 AM

    Hi again,

    Currently scanning only 3 applications per dispatcher.
    We have set up a 5 GB swapfile (see the sketch at the end of this post), so we are currently at 21 GB of memory.
    I then got some error logs back and grepped them for memory:
    10.20.50.10_20992.error.log:[2017-06-27 03:51:39 +0000] [Errno::ENOMEM] Cannot allocate memory - /opt/arachni/arachni-1.5-0.5.11/system/usr/bin/ruby
    10.20.50.15_18939.error.log:[2017-06-27 08:16:21 +0000] [Errno::ENOMEM] Cannot allocate memory - /opt/arachni/arachni-1.5-0.5.11/system/usr/bin/ruby
    10.20.50.15_23423.error.log:[2017-06-27 06:27:49 +0000] [Errno::ENOMEM] Cannot allocate memory - /opt/arachni/arachni-1.5-0.5.11/system/usr/bin/ruby

    Could memory leaks be the problem?
    This is our current scan:
    arachni_rpc --dispatcher-url=10.20.50.8:7331 --browser-cluster-ignore-images --scope-auto-redundant=4 --timeout=48:00:00 --report-save-path=/opt/arachni/reports/uuid.afr --http-request-queue-size=50 --browser-cluster-pool-size=4 --checks=*,-common_*,-backup_*,-backdoors
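
    (For reference, the swapfile was created roughly like this; a sketch assuming a standard Linux setup, run as root:)

    fallocate -l 5G /swapfile    # reserve 5 GB of disk for swap
    chmod 600 /swapfile          # permissions required by swapon
    mkswap /swapfile             # format it as swap
    swapon /swapfile             # enable it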

  22. Support Staff Posted by Tasos Laskos on 27 Jun, 2017 02:05 PM

    That's a lot of RAM. There could be a leak in the scanner, or it could be that one of the scans simply needs a lot of memory; it depends on the web application.

  23. Posted by Kevin on 28 Jun, 2017 11:39 AM

    So if there is a leak in the scanner, how do I fix it?

  24. Support Staff Posted by Tasos Laskos on 28 Jun, 2017 02:41 PM

    You can try playing with the --scope options, especially the --scope-dom ones, for example:
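
    A sketch of your scan command with DOM scope limits added (the specific values here are placeholders to tune, not recommendations):

    arachni_rpc --dispatcher-url=10.20.50.8:7331 --scope-auto-redundant=4 --scope-dom-depth-limit=4 --scope-dom-event-limit=100 --timeout=48:00:00 --report-save-path=/opt/arachni/reports/uuid.afr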

    Unfortunately, as far as the scanner is concerned, I've gotten Arachni as far as I can take it, which is why I've been working on a new engine that will solve these kinds of issues.

    And since you're using the Grid: one feature of the new engine, added after I wrote the blog post, is a much smarter Grid that is aware of available system resources and automatically calculates the number of scans that can safely be performed in parallel, so it won't let you shoot yourself in the foot.

    Also, a new queue system has been implemented, to which you can post as many scan jobs as you wish and it will safely distribute and manage them for you.

    Unfortunately, I don't have an ETA for the new engine; it will probably be a while before a beta is available.

    Until then, try experimenting with the available options and have a look at this article: http://support.arachni-scanner.com/kb/general-use/optimizing-for-fa...

  25. Posted by Kevin on 29 Jun, 2017 08:33 AM

    Okay, fair enough.
    I will try the --scope-dom ones. I'm a bit clueless though, as I have no idea how many JS events are needed for --scope-dom-event-limit.

  26. Posted by Kevin on 29 Jun, 2017 08:58 AM

    Also, one last thing:
    I asked before but got no answer: I can see that arachni_rpc produces no output anymore, but the arachni-dispatcher is still scanning; I can see that through arachni_rpcd_monitor. Can I get the reports somewhere when it finishes, or are they lost if arachni_rpc cannot collect the results?

    Best regards,
    Kevin

  27. Posted by Kevin on 29 Jun, 2017 09:31 AM

    And is there any point in using the 2.0 development builds or nightlies, or is it far-fetched to expect them to fix the problem?

  28. Posted by Kevin on 29 Jun, 2017 02:24 PM

    Memory output (from top):
      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
     8698 root      20   0 6741740 2.655g  26148 S  0.3 18.1  21:12.44 phantomjs
     5767 root      20   0 4594416 1.360g   9504 S 19.3  9.3   8:21.87 phantomjs
     6202 root      20   0 4621088 1.253g  10932 S 19.6  8.5   8:02.36 phantomjs
    10137 root      20   0 3673920 777200   9428 S 20.3  5.0   4:53.24 phantomjs
     9885 root      20   0 3578212 728336  10660 R 18.9  4.7   5:08.96 phantomjs

  29. Posted by Kevin on 29 Jun, 2017 10:08 PM

    FIXED
    For anyone else wondering, the issue was PhantomJS: the "ignore images" setting should not be used.

    The bug is years old and described in https://github.com/ariya/phantomjs/issues/12903

    It should be noted in the docs that "ignore images" triggers this bug. Anyway, thanks for all the other suggestions, Tasos.
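
    So, for anyone copying our setup, the working scan command is simply the previous one with --browser-cluster-ignore-images removed (a sketch):

    arachni_rpc --dispatcher-url=10.20.50.8:7331 --scope-auto-redundant=4 --timeout=48:00:00 --report-save-path=/opt/arachni/reports/uuid.afr --http-request-queue-size=50 --browser-cluster-pool-size=4 --checks=*,-common_*,-backup_*,-backdoors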

  30. Support Staff Posted by Tasos Laskos on 01 Jul, 2017 11:19 AM

    Glad you identified the issue; I hadn't heard of this one before. I may need to disable this option in Arachni.

  31. Tasos Laskos closed this discussion on 01 Jul, 2017 11:19 AM.
