OK, what I noticed is that if the provided endpoint returns a 404, Arachni doesn't spider it.
http://test-endpoint:8081/ has nothing behind it, so we return a 404. It makes sense to start the scan from this point because the crawler can then spider into all areas, whereas starting from /test1/ means it may never reach /test2/, /test3/, and so on. How can I still get Arachni to run even when the response code is a 404?
Yeah, I'm sure. I changed the microservice to return a 200 instead of a 404 on the "/" resource, and it spiders, but this is hacky; I don't want to change the application just to make it scannable. Is it Arachni's default behaviour not to run on a 404?
I thought it would have been able to spider into other parts of the URL. I'm not too sure how the underlying implementation works, but I thought it would just try different URL endings (/AB, /AC, etc.) even if a 404 was present. I think you're right: if the "/" response is a 404, it doesn't expose any usable paths, so Arachni only scans one page. I wish this were more explicitly stated; maybe it's just my understanding.
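To spell out what I had imagined the crawler doing, here's a quick sketch (plain Python, nothing to do with Arachni's internals) of brute-forcing two-letter endings against the base URL:

```python
from itertools import product
from string import ascii_uppercase

# What I had pictured: generating every two-letter ending (/AA, /AB,
# ..., /ZZ) and probing each one against the base URL. As discussed
# below, this is NOT how the crawl works.
base = "http://test-endpoint:8081/"
candidates = [base + a + b for a, b in product(ascii_uppercase, repeat=2)]

print(len(candidates))  # 26 * 26 = 676 candidate paths
```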
How should I do this then? Is there a way of automatically spidering into URLs from a 404 "/"?
The response code is irrelevant; if the response contains paths that can be followed, they will be followed.
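To make that concrete, here's a minimal sketch (plain Python stdlib, not Arachni's actual code) showing why the status code never enters into link extraction:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href attributes from anchor tags, the way a crawler
    harvests follow-up paths from a response body."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A 404 response whose body still links to real resources. The link
# extraction only ever looks at the body; the status code plays no part.
status = 404
body = '<html><body><a href="/test1/">t1</a> <a href="/test2/">t2</a></body></html>'

parser = LinkExtractor()
parser.feed(body)
print(parser.links)  # both paths are harvested despite the 404
```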
I think what confused you are the checks that discover directories; those aren't part of the crawl and will not run on non-200 codes, so you will see a difference in behaviour there.
The important part is that the scan will not proceed if the seed URL hasn't got any usable paths; you either need to provide a different target URL or extend the paths manually.
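For the second option, something like this should work; the /test1..3 paths are guesses based on your setup and the exact flag spelling is from memory, so double-check it against `arachni --help`:

```shell
# Hypothetical seed-path file listing the areas the crawler can't
# discover on its own from the 404 "/".
cat > extra-paths.txt <<'EOF'
http://test-endpoint:8081/test1/
http://test-endpoint:8081/test2/
http://test-endpoint:8081/test3/
EOF

# Feed the extra paths to the scan; --scope-extend-paths adds them
# on top of whatever the crawl finds by itself.
command -v arachni >/dev/null && \
  arachni http://test-endpoint:8081/ --scope-extend-paths=extra-paths.txt || \
  echo "arachni not on PATH; command shown for reference"
```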