Arachni exits silently
Hi
First of all: Arachni is a very cool product, by far the most comprehensive application scanner I've worked with.
I'm trying out Arachni for scanning my webshop, but I'm running
into a few problems that I can't find answers for here.
- Both the web UI and the scanner worker exit silently quite often. The logs have no info about it, but when the web UI loses its connection to the scanner I get this:
Connection closed [Connection refused - connect(2) for 127.0.0.1:27758]
/usr/local/arachni-1.3.2-0.5.9/system/arachni-ui-web/app/models/scan.rb:528:in `block in refresh'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-rpc-0.2.1.2/lib/arachni/rpc/proxy.rb:58:in `call'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-rpc-0.2.1.2/lib/arachni/rpc/proxy.rb:58:in `block (2 levels) in translate'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-rpc-0.2.1.2/lib/arachni/rpc/client/handler.rb:100:in `call'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-rpc-0.2.1.2/lib/arachni/rpc/client/handler.rb:100:in `on_close'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection.rb:173:in `close'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection.rb:240:in `rescue in _connect'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection.rb:224:in `_connect'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection/tls.rb:82:in `block in _connect'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection/error.rb:26:in `call'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection/error.rb:26:in `translate'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection/tls.rb:81:in `_connect'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection.rb:278:in `_write'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor/connection/tls.rb:104:in `_write'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor.rb:574:in `each'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor.rb:574:in `block in process_connections'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor.rb:574:in `each'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor.rb:574:in `process_connections'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor.rb:316:in `block in run'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor.rb:307:in `loop'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor.rb:307:in `run'
/usr/local/arachni-1.3.2-0.5.9/system/gems/gems/arachni-reactor-0.1.0/lib/arachni/reactor.rb:349:in `block in run_in_thread'
I'm running it on CentOS 7 on a 2-core, 2 GB RAM virtual machine.
Can you recommend the best way to run Arachni (OS, resources, how to start it)?
- My webshop has a bit more than 700 pages, but when I scan with Arachni it only finds 120 (with neither auto-redundant on nor any redundancy patterns set up). How can that be?
- The scans get a lot of work done for the first 8 hours, but then the scan workers slow down, using only a fraction of the available CPU for the next 18 hours before the scan is marked as finished.
Can I set it up to use the available system resources all the time, and thus finish the scan faster?
Kind regards
Support Staff 1 Posted by Tasos Laskos on 18 Jan, 2016 11:41 AM
I'm pretty sure you're running out of RAM and that causes the kernel to kill the scanner.
If that's the case then the output of dmesg will include a log of it; increasing the amount of RAM should solve the exit problem.
About the missing pages, I'm afraid I'm going to need more information: what are these 700 pages, are they of a specific type, or is there a specific route to them?
What pages does Arachni find? Is there a feature that the missing pages require?
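For reference, a quick way to check for that is something along these lines (just a sketch; the exact kernel message wording varies between kernel versions):
# Look for OOM-killer entries in the kernel log.
dmesg | grep -i -E 'out of memory|killed process'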
About the workload, that's how it usually goes: lots of new work at first and then very little until the end of the scan.
Of course, I've no idea how your site is setup so I can only speculate.
Cheers
2 Posted by Dave on 18 Jan, 2016 12:25 PM
Thanks for a quick reply :)
It was indeed a memory problem that caused the process to stop, I'll add more.
About the missing pages:
The webshop has a few basic pages (front page, about us, terms, etc.) that are all found.
Dynamic pages like cart and payment are found too.
The rest of the 700 pages are product pages; these are technically similar, all linked from the same index page, with only the content differing.
Of these pages only 39 are found, and this is without any exclude or redundancy pattern in the scan profile.
How can I get all pages in the scan?
Thanks again for your help.
Support Staff 3 Posted by Tasos Laskos on 18 Jan, 2016 12:30 PM
Are there any clear paths to them without requiring JS?
Because if so then they should have been found and included in the sitemap.
If browser interaction is necessary and the pages are pretty much the same then they won't be forwarded to the Framework to be audited.
An example would help; you can send the info privately to tasos[dot]laskos[at]arachni-scanner[dot]com.
Support Staff 4 Posted by Tasos Laskos on 18 Jan, 2016 01:08 PM
You mentioned that it takes 26 hours to scan 120 pages, that's an incredibly large amount of time for that type of workload.
Do you have any runtime statistics for the scans?
Request/second, response times, max concurrency etc. are visible in the scan progress page.
I'm guessing that the server takes a long time to respond, resulting in time-outs that lead to decreased coverage, or maybe outright server failure, or some intermediate security appliance cutting you off.
Have you spotted any indication of the above?
5 Posted by Dave on 18 Jan, 2016 01:26 PM
I think so too.
Stats from Arachni:
Requests per sec: ≈ 60
Concurrency: 4 (also tried 6 and 2)
Total number of requests: > 1,000,000
I can't remember the response time reported by Arachni, but the average server response time is <100 ms according to newrelic.com during the scans.
There were a few timeouts, but fewer than 100 out of >1,000,000 requests.
The metrics above were reached within the first 8 hours of scanning; only minor changes occurred during the last 18 hours.
Support Staff 6 Posted by Tasos Laskos on 19 Jan, 2016 04:44 PM
The server seems pretty quick so I guess there's just a lot of workload.
About the coverage issue, it turns out that the response size of the page with the large listing exceeds the default maximum of 500,000 bytes, so its body is discarded.
You can increase the maximum size with this option:
--http-response-max-size=1000000
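For illustration, a full CLI invocation could look something like this (just a sketch; the target URL is a placeholder and the value is in bytes):
# Illustrative only; replace the URL with your webshop's address.
arachni https://your-webshop.example.com --http-response-max-size=1000000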
In the meantime, I'll add some helpful messages for when this happens.
Let me know if this fixes your issue.
7 Posted by Dave on 20 Jan, 2016 08:34 AM
I've set http-response-max-size=1000000 and started a new scan with all checks and audits disabled; it has found 737 pages so far, which is much better than before. Thanks for pointing that out.
However, it has been running for more than 12 hours now, most of that time using only 5% of the available CPU with a load average of 0.1.
How can I make the scanner use more of the available resources and finish faster?
Thanks a lot for your help.
Support Staff 8 Posted by Tasos Laskos on 20 Jan, 2016 11:31 AM
Usually you can't, but just to be safe, can you show me the runtime stats and the progress output for when Arachni is using low resources?
9 Posted by Dave on 20 Jan, 2016 12:21 PM
Sure, I've attached a screenshot of the runtime stats and one of the system stats.
The scan was started at approx. 21:10 on the timeline; it is still running and hasn't found any new pages for the last 15 hours or so.
All checks and audits were disabled.
Let me know if you need anything else.
Support Staff 10 Posted by Tasos Laskos on 20 Jan, 2016 12:37 PM
I was hoping that you were using the CLI; it's pretty hard to tell what's going on from the WebUI.
I do have a few observations though: the max request concurrency of 5 is pretty low (the default is 20), so that will severely slow down the scan.
However, in this case the server seems quite responsive and the resulting throughput of 70 req/s isn't that bad.
If I had to guess based on these stats, the outstanding workload is browser jobs: triggering events on elements that don't generate any new workload (i.e. pages with new elements), but which the system still needs to process to provide adequate coverage.
May I suggest using the nightlies to run a new scan from the CLI?
There have been massive performance improvements when it comes to event triggering and the CLI's output will help pinpoint the issue.
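For example, something along these lines (just a sketch; the URL is a placeholder and the flag names are the ones discussed above, so please double-check them against the nightly's --help output):
# Illustrative only; URL is a placeholder.
arachni https://your-webshop.example.com \
    --http-response-max-size=1000000 \
    --http-request-concurrency=20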
It's unlikely but there could also be a loop of some kind which may require further scope configuration.
I'm not 100% on the nightlies' stability so if you see any errors please let me know.
Thanks a lot for all the feedback, I really appreciate it.
Cheers
Support Staff 11 Posted by Tasos Laskos on 20 Jan, 2016 12:40 PM
Also, I like that system monitor, what is it?
12 Posted by Dave on 20 Jan, 2016 01:05 PM
Yes, I will run the scan with the latest version from the CLI and get back to you.
The screenshot is from http://newrelic.com, a truly amazing SaaS monitoring product; even the free version is very useful.
13 Posted by Dave on 20 Jan, 2016 02:32 PM
I've downloaded the 2.0 nightly and it completed the crawl in a few minutes! Much, much faster than 1.3.
I'm currently running a full scan with default settings; it uses much less RAM than the old version, so that's great. Looking forward to the report.
I will keep you posted about the results.
Best regards
14 Posted by Dave on 21 Jan, 2016 09:43 AM
The nightly build runs a lot better.
I've only tried the CLI version, but it works like a charm; the scans finish much faster and use less memory.
A full scan of my site now runs for ≈ 4 hours (it was 26 hours with 1.3).
This is surely the way to go.
Do you have any idea when the current nightly will make it into the stable branch?
Support Staff 15 Posted by Tasos Laskos on 21 Jan, 2016 01:26 PM
I was expecting them to make a big difference, but not that much; that's great news.
I'm waiting for a couple more issues to be resolved before releasing; it shouldn't be too long.
16 Posted by Dave on 21 Jan, 2016 01:39 PM
Cool, I'm looking forward to it.
Thank you very much for your assistance.
Support Staff 17 Posted by Tasos Laskos on 21 Jan, 2016 01:41 PM
No problem at all. By the way, do you happen to have kept the stable vs. nightly RAM usage figures?
18 Posted by Dave on 21 Jan, 2016 01:51 PM
I don't have screenshots, but it started at 60% and grew to 90% of the 2 GB of memory when running Arachni 1.3.
The Arachni nightly (2.0dev) was using just below 20% (of the same 2 GB) throughout the scan, which also completed much faster.
I'm running Debian 8.2.
One difference is that 1.3 was triggered from the web UI with the default profile, whereas 2.0dev was triggered from the CLI with just http-response-max-size set to 1000000.
I don't know if there is a difference between the web UI and CLI defaults.
Support Staff 19 Posted by Tasos Laskos on 21 Jan, 2016 01:58 PM
No, there's no difference.
That's great news. I didn't have a complex target to benchmark against, and it seems my changes had a much bigger effect than I was hoping for.
Thanks a lot man. :)
Tasos Laskos closed this discussion on 21 Jan, 2016 01:58 PM.