Arachni RPC running with many bugs.

Posted by Kevin on 19 Jun, 2017 10:29 AM

Hi,

Currently using arachni-1.5-0.5.11.
We have been using the grid mode with the balanced option, but we can never get it running even somewhat stably. Most of the early issues were RAM-related, but those are all fixed now.
Our current setup is 5 servers with 16 GB RAM each, plus a master that also has 16 GB RAM.

We are using arachni_rpc to initiate the scans.

From arachni master:
ruby arachni_rpcd --address 10.20.50.8 --external-address 10.20.50.8 --port 7331 --port-range 17331-27331 --nickname arachni-master --pool-size 1 --pipe-id "Pipe 8" --weight 1000 --reroute-to-logfile

From dispatcher1:
ruby arachni_rpcd --address 10.20.50.10 --external-address 10.20.50.10 --port 7331 --port-range 17331-27331 --nickname arachni-dispatcher-xx1 --pool-size 10 --pipe-id="Pipe 10" --reroute-to-logfile --neighbour arachni-master:7331

From dispatcher2:
ruby arachni_rpcd --address 10.20.50.11 --external-address 10.20.50.11 --port 7331 --port-range 17331-27331 --nickname arachni-dispatcher-xx2 --pool-size 10 --pipe-id="Pipe 11" --reroute-to-logfile --neighbour arachni-master:7331

One thing I noticed is that all our dispatchers have the same neighbour, arachni-master. Would this create a problem?

The command we run from arachni-master:
sudo /opt/arachni/current/bin/arachni_rpc --dispatcher-url=10.20.50.10:7331 --grid --spawns=1 --browser-cluster-ignore-images --scope-auto-redundant=4 --report-save-path=/opt/arachni/reports/testsite.afr --timeout 48:00:00

Is there anything wrong with our setup? All boxes are fully updated Ubuntu.

  1. Posted by Kevin on 19 Jun, 2017 10:31 AM

    One thing I should note that may not be clear:
    One of the problems is that arachni_rpc stops working, and every time you try to start it, it just hangs without doing any actual work.

    Other times it seems as if the connection is lost and the arachni_rpc client is unable to get any reports. Through arachni_rpcd_monitor I can see that the scans are still running.

  2. Posted by Tasos Laskos (Support Staff) on 19 Jun, 2017 03:39 PM

    The way you've set up the Grid, no Dispatcher other than arachni-master will be used, due to the high weight you've assigned to it.
    Also, try removing the --spawns option; it's unstable and will be removed.

  3. Posted by Tasos Laskos (Support Staff) on 19 Jun, 2017 04:38 PM

    My bad, I got it backwards: arachni-master will never be used.

  4. Posted by Kevin on 19 Jun, 2017 09:30 PM

    Thanks. I will try without --spawns. It just stated that I had to specify it.

    Regarding the arachni-master weight: we did that specifically to avoid a heavy load on arachni-master, as it is managing the scans.

  5. Posted by Tasos Laskos (Support Staff) on 19 Jun, 2017 10:07 PM

    Don't specify --grid either, so that you won't have to specify --spawns; the Dispatchers will still load balance the scans amongst themselves.
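
    For example, the client command above would then look roughly like this (same options as before, just without --grid and --spawns):
    sudo /opt/arachni/current/bin/arachni_rpc --dispatcher-url=10.20.50.10:7331 --browser-cluster-ignore-images --scope-auto-redundant=4 --report-save-path=/opt/arachni/reports/testsite.afr --timeout 48:00:00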

    Also, arachni-master isn't managing anything; no one node is more important than the others. Whichever node you ask will search the Grid for the one with the lowest workload score, ask it for an Instance and then pass that information back to you.

  6. Posted by Kevin on 22 Jun, 2017 02:28 PM

    Still very unstable. It is as if arachni_rpc is losing its connection to the dispatcher (after pressing Ctrl+C it will hang forever).
    And then some of the new scans started over RPC will not start.

    I don't know if it is caused by RAM consumption, but in some cases some of our servers also just become completely unresponsive. I just think it's hard to control RAM when it is balancing a lot of scans by itself.

    Do you have any ideas, or any debug info I could provide that would be helpful?

  7. Posted by Tasos Laskos (Support Staff) on 22 Jun, 2017 02:31 PM

    I think you should run fewer scans; it sounds like the servers are having a pretty hard time.
    Out of curiosity, how many scans are you running on these machines?

  8. Posted by Kevin on 23 Jun, 2017 08:01 AM

    Maybe one or two scans each, so around 5-10 scans across 5 servers with good specs.

  9. Posted by Kevin on 23 Jun, 2017 08:24 AM

    Also, when using the rpcd monitor I can see a scan is running, but the arachni_rpc client is dead.
    Can I stop the scan on a dispatcher's rpcd without restarting the whole thing?

  10. Posted by Kevin on 23 Jun, 2017 08:28 AM

    I also noticed that the timeout feature is not working. If I specify 48 hours it does not help and the scan just continues.

  11. Posted by Tasos Laskos (Support Staff) on 23 Jun, 2017 08:35 AM

    1. 2 scans per machine is really low; I can easily run 12 scans, one for each CPU core. Are you sure it's not a network issue? Also, what CPU % are the scans using when things start to lag?
    2. The scan can be killed just like any other process; the monitor should give you the PID (see the example below this list).
    3. If connectivity is lost, like you mentioned, then the timeout won't work, as it's controlled by the client.
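
    For example, a rough sketch (the exact monitor invocation and output may differ on your setup, and <PID> is just a placeholder):
    ruby arachni_rpcd_monitor 10.20.50.10:7331   # should list the running Instances along with their PIDs
    kill <PID>                                   # kill the Instance; kill -9 <PID> if it refuses to die
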
  12. Posted by Kevin on 23 Jun, 2017 12:09 PM

    1. That is good to hear. Around 60% to 80%. I would not describe it as lagging, but rather as the client being unable to contact the dispatcher. Also, some of the boxes are completely dead, meaning I can't even SSH into them. The Google Cloud console shows ~0% CPU at that time though.

    2. Good to know, thanks.

    3. I understand. It is hard for me to believe that it is a network issue though, given that it is built in Google Cloud and the servers are right next to each other.

  13. Posted by Tasos Laskos (Support Staff) on 23 Jun, 2017 12:17 PM

    The boxes being completely dead is worrisome. Can you perform an identical scan and periodically check the number of running processes and disk usage, in addition to RAM and CPU?

    Theoretically, there could be a bug in the way browsers are spawned, leading to basically a fork bomb, or the tmp files Arachni creates to offload work to disk could be taking up all the space.
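
    Something along these lines, run on each dispatcher while a scan is going, would capture those numbers over time (the log path and interval are just suggestions):
    # Append process count, memory and disk usage to a log every 30 seconds:
    while true; do
        { date; ps ax | wc -l; free -m; df -h /; echo; } >> /tmp/arachni-resources.log
        sleep 30
    done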

  14. Posted by Kevin on 23 Jun, 2017 12:23 PM

    Just to make a quick note: by completely dead I mean a hard reset is the only way to recover them.
    I have just started ~20 scans and will monitor RAM and CPU consumption plus the number of running processes.

    The tmp files taking up all the space is definitely a worthy shot. I have had 4 scans running for 10-20 minutes on one of the machines and there is 1.7 GB of disk space left. Will it exceed that?

    Also, thanks for the quick replies. They are greatly appreciated.

  15. Posted by Tasos Laskos (Support Staff) on 23 Jun, 2017 12:29 PM

    Yeah, tmp files can easily exceed 1.7 GB.

    The recommended system requirements state 10 GB of available disk space, and that's per scan -- that's on the very generous side, I'll grant you, but still.

    There are cases where disk usage can grow even past that, and that's a sign of trouble, but it can be mitigated via configuration. We'll cross that bridge when we come to it though.
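
    If you want to see how much space the scan scratch data is actually taking while a scan runs, something like this should show it (assuming the default system temp location; the actual directory names will vary):
    du -sh /tmp/* 2>/dev/null | sort -h | tail -n 10   # largest entries under /tmp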

  16. Posted by Kevin on 23 Jun, 2017 12:34 PM

    So for now, would it be okay to up the servers to 40 GB of disk space each in order to run 4 scans per server?

  17. Posted by Tasos Laskos (Support Staff) on 23 Jun, 2017 12:35 PM

    Yep, give that a shot and see if it makes a difference.

  18. Posted by Kevin on 26 Jun, 2017 09:08 AM

    I tried increasing all the disks to 40 GB and ran 5 scans per server, with 4 cores each. Now, Monday morning, I am unable to contact any of the 5 servers.

    As they are in Google Cloud I cannot currently see their disk or RAM usage, but I'm assuming that disk errors are the problem.

    Do you think I pushed them too hard with a total of 30 scans?

  19. Posted by Tasos Laskos (Support Staff) on 26 Jun, 2017 09:15 AM

    Yeah, better to stick with one scan per core.
    Also, while the scans are running, can you try watch -n1 df and watch -n1 free over SSH? At the point where it gets stuck we'll know how things look resource-wise.
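
    If it's easier, both can go in a single terminal, e.g.:
    watch -n1 'df -h; free -m'   # refresh disk and memory usage every second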

  20. Posted by Kevin on 26 Jun, 2017 10:56 AM

                  total        used        free      shared  buff/cache   available
    Mem:       15400392    14751468      518416        8768      130508      398496
    Swap:             0           0           0

    I managed to SSH into one of them again, so they are not completely dead. Free memory is close to zero though.
    I will try to monitor disk and memory with only 1 scan per core.

    I can see that arachni_rpc no longer produces any output, but the dispatcher is still scanning. Can I get the reports somewhere when it finishes, or are they lost?
