Distributed crawl

Beunwa

28 Nov, 2012 09:49 AM

Hi,

I'm a bit lost with the new distributed crawl feature. Does the is_distributable plugin feature work with it?
Is it available by default over the HPC grid?

  1. Posted by Tasos Laskos (Support Staff) on 28 Nov, 2012 12:36 PM

    When a plugin says that it is_distributable it means that it should be run in all instances, not just the master one.
    And if that plugin stores results it should also provide a self.merge method to merge the results it has logged across all instances.

    Does that make sense?
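
To illustrate the merge contract described above, here is a minimal sketch that runs without Arachni. The class name, the `distributable?` method name and the result format (one Hash of URL => count per instance) are assumptions made for illustration; only the idea of a class-level `merge` that combines per-instance results comes from the reply.

```ruby
# Hypothetical stand-in for a distributable plugin; NOT Arachni's real base class.
class HitCounterPlugin
  # Signals that the plugin should run on every instance (assumed method name).
  def self.distributable?
    true
  end

  # Combines the results logged by each instance into one Hash.
  # `results` is assumed to be an Array of Hashes mapping URL => hit count.
  def self.merge(results)
    results.each_with_object(Hash.new(0)) do |instance_result, merged|
      instance_result.each { |url, count| merged[url] += count }
    end
  end
end

# Made-up results, as if collected from three instances.
per_instance = [
  { '/login' => 2, '/search' => 1 },
  { '/search' => 3 },
  { '/admin' => 1 }
]

puts HitCounterPlugin.merge(per_instance).inspect
```

Summing counts is only one possible merge strategy; a plugin that logs lists of findings would more likely concatenate and de-duplicate them.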

  2. Posted by Beunwa on 28 Nov, 2012 12:50 PM

    Yes, I understand that, and that's how I do it.

    But I have trouble understanding how to manage RPC instances and RPC dispatchers. I would like to use the HPC for distributed crawling, but unfortunately I'm lost!

    Could you write something about that in the knowledge base?

  3. Posted by Tasos Laskos (Support Staff) on 28 Nov, 2012 12:57 PM

    I can't do that yet, since it's not part of the stable branch and it could lead to confusion, but I'd be glad to explain it to you one-on-one.

    First of all, do you want to use instances on multiple machines to perform the scans or multiple instances on one machine?

  4. Posted by beunwa on 28 Nov, 2012 01:12 PM

    I want to run multiple instances across different machines.

  5. Posted by Tasos Laskos (Support Staff) on 28 Nov, 2012 01:30 PM

    Ok, have a look at:
    1. https://github.com/Arachni/arachni/wiki/RPC-server#wiki-grid
    2. https://github.com/Arachni/arachni/wiki/RPC-API

    The only extra thing you need to do is call framework.set_as_master before calling framework.run or service.scan, whichever you prefer.

    The master Instance will then use one Instance from each Grid Dispatcher as a slave.
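
The call order from the reply above can be sketched as follows. This is only an illustration that runs without Arachni: `start_distributed_scan` and `RecordingFramework` are hypothetical names, and the stand-in object merely records which methods were called. In a real script the argument would be the framework handler of a connected RPC Instance, assuming it responds to `set_as_master` and `run` as described.

```ruby
# Hypothetical helper: promotes the Instance to master, then starts the scan.
# `framework` is assumed to respond to set_as_master and run.
def start_distributed_scan(framework)
  framework.set_as_master # per the reply above, this has to happen before run
  framework.run
end

# Stand-in that only records call order, so the sketch runs without Arachni.
class RecordingFramework
  attr_reader :calls

  def initialize
    @calls = []
  end

  def set_as_master
    @calls << :set_as_master
    true
  end

  def run
    @calls << :run
    true
  end
end

recorder = RecordingFramework.new
start_distributed_scan(recorder)
puts recorder.calls.inspect
```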

  6. Posted by Beunwa on 29 Nov, 2012 12:47 PM

    Thank you for the clarification.

    When I try to launch a script that uses Arachni::RPC::Pure::Client with bundle exec, I always get "cannot load such file -- arachni/rpc/pure" (of course, I have put require 'arachni/rpc/pure' at the top of my script).

    It seems like arachni-rpc-pure cannot satisfy the dependencies of arachni-rpc-em 0.1.3dev.

    If I'm right, I need to use bundle exec to use the distributed feature located in the experimental branch.

  7. Posted by Tasos Laskos (Support Staff) on 29 Nov, 2012 12:54 PM

    The Pure client was written for lightweight cases when Arachni isn't installed on the client side.

    You can go ahead and require 'arachni/rpc/client', like here: https://github.com/Arachni/arachni/issues/207#issuecomment-10066220

    And actually, like you said, the pure gem is not compatible with the dev branches anyway -- I had to make some protocol optimizations to accommodate the distributed crawler.
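
When a "cannot load such file" error like the one above shows up, a quick check of whether a library is visible to the current Ruby process (for example under bundle exec) can narrow things down. The helper below is a hypothetical sketch, not part of Arachni; it relies only on Ruby's standard `require` and `LoadError`.

```ruby
# Hypothetical helper: returns true if `path` can be required, false otherwise.
def loadable?(path)
  require path
  true
rescue LoadError
  false
end

# 'arachni/rpc/client' is the path suggested in the reply above.
if loadable?('arachni/rpc/client')
  puts 'Full Arachni RPC client is available'
else
  puts 'arachni/rpc/client is not on the load path of this environment'
end
```

Running the same check with and without bundle exec shows whether the Gemfile, rather than the script, is what hides the library.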

  8. Tasos Laskos closed this discussion on 29 Nov, 2012 09:41 PM.
