
Its ability to crawl the complete web by following inlinks and outlinks, which lets it keep crawling indefinitely. Review collected by and hosted on G2.com.
We need very strong knowledge of Apache Hadoop, HBase, ZooKeeper, and the complete environment setup, and we have to be very proficient with them to use it. Moreover, we cannot easily view the HBase data, which also makes things very difficult.
19 out of 20 Total Reviews for Apache Nutch

When I used Apache Nutch, I was amazed by the speed at which it crawls data and by the libraries and data structures it provides for customising the crawl and reading the data in the desired format. I was crawling the whole of IBM's data to gain insights and run text analytics on it. The support I got from the forums was also great. Overall, it was a nice experience using the Apache Nutch crawler.
What I disliked was the video support available for it on the Internet.

Open Source
Scalable
Parsing and indexing techniques.
Easy integration with Elasticsearch and Solr.
Different plugins to parse various content types.
There is not much on my list of dislikes, because we really enjoyed it and it fulfilled our organization's needs. Based on experience, though, I can name some cons: it requires good infrastructure to be in place and consumes a good amount of memory and CPU. We also feel that if Nutch provided a good dashboard and some kind of admin panel, it would have been very helpful to us.
It is an open-source tool to which you can add your own plugins. You can change its code as you wish. It was very easy to use. It can also be run with different tools.
You should know which version of Nutch is compatible with the other tools you work with.

I have been using Apache Nutch for 3 or 4 years. I like it as an open-source tool that can run on a system with normal specs and crawl millions and millions of pages.
* I don't like its seed creation algorithm: it builds a cluster and then goes into a loop, crawling the same websites once it has crawled millions of pages.
* Its configuration is not easy.
* Job automation is not provided.
* Documentation is not good.
* Support is not good.

-Easy to configure
-Stable backend store
Use of Java makes it a little bulky
One has to be careful with the heap size; otherwise, OOM errors are inevitable.

Crawling a URL is an excellent way to read its content. Nutch is a very useful tool for reading document content at various depths.
It is a bit hard to customize the crawl function.

Plugins for indexing and searching.
Integration with Solr and other tools.
It works fine in Hadoop clusters as well.
Lack of a community in which to discuss any issue or concern.
Lack of documentation for the implementation and integration of Nutch.

I have deployed Nutch several times when I needed to stand up a crawler quickly. It is free, straightforward, reliable, well documented, and comes with an OTS integration with Apache Solr for search.
The directory and file partitioning scheme for the crawler can be a bit confusing.

Multidepth crawling capabilities are really good. Data extraction from web pages is remarkable.
Based on MapReduce, hence slower. Adding customisations involves writing plugins and building them; there is no feature for dependency injection.