Apache Nutch Reviews & Product Details

Verified User in Internet
Mid-Market (51-1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
Business partner of the seller or seller's competitor, not included in G2 scores.
What do you like best about Apache Nutch?

Provides an in-depth list of features, HTML tags, and site maps. Review collected by and hosted on G2.com.

What do you dislike about Apache Nutch?

It didn't have a lot of documentation at the time I was using it, which made it hard to use.

What problems is Apache Nutch solving and how is that benefiting you?

Crawled our domain URLs and got useful, relevant information.

Apache Nutch Overview

What is Apache Nutch?

Apache Nutch is an extensible and scalable open source web crawler software project. Nutch provides extensible interfaces such as Parse, Index and ScoringFilter for custom implementations, e.g. Apache Tika for parsing.

Seller Details
Year Founded: 1999
HQ Location: Wakefield, MA
Twitter: @TheASF (66,228 Twitter followers)
LinkedIn® Page: www.linkedin.com (2,291 employees on LinkedIn®)
Description

Community-led development since 1999. We consider ourselves not simply a group of projects sharing a server, but rather a community of developers and users.

Recent Apache Nutch Reviews

Narendra A., Enterprise (> 1000 emp.)
5.0 out of 5
"Apache Nutch is Rockstar in terms of huge data crawling."
When I used apache Nutch I was amazed with the speed it crawls data and the libraries and data structures provided to customise your crawling and r...

Verified User, Small-Business (50 or fewer emp.)
5.0 out of 5
"Best for web crawling"
I like the default index generation for crawler

Sinem A., Mid-Market (51-1000 emp.)
5.0 out of 5
"Web Crawling Tool"
It was an open source tool that you can add your own plugins. You can change it own code as you wish. It was very easy to use. It can be run with d...

19 out of 20 Total Reviews for Apache Nutch

4.0 out of 5

Narendra A.
Senior Software Engineer
Enterprise (> 1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

When I used Apache Nutch, I was amazed by the speed at which it crawls data and by the libraries and data structures it provides for customising the crawl and reading the data in the desired format. I was crawling the whole IBM data to get insights and do text analytics on it. The support I got from the forums was also great. So overall it was a nice experience using the Apache Nutch crawler.

What do you dislike about Apache Nutch?

What I disliked was the video support available for it on the Internet.

Recommendations to others considering Apache Nutch:

It's nice to use and provides lots of flexibility.

What problems is Apache Nutch solving and how is that benefiting you?

I was solving a data analytics problem in my organisation, where we automate the whole bidding process with text analytics.

Jaydip L.
Senior Software Engineer
Small-Business (50 or fewer emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

Open Source

Scalable

Parsing and indexing techniques.

Easy integration with Elasticsearch and Solr.

Different plugins to parse various content types.

What do you dislike about Apache Nutch?

Not much on my list of dislikes, because we really enjoyed it and it fulfilled our organization's needs. But based on experience I can name some cons: it requires good infrastructure to be in place and consumes a good amount of memory and CPU. We also feel that if Nutch provided a good dashboard and some kind of admin panel, it would have been very helpful to us.

Recommendations to others considering Apache Nutch:

When we had a requirement for crawling, we tried different tools such as StormCrawler, Scrapy, etc. But we found this tool to be very reliable and, most importantly, open source. Its various features, such as automatic crawling, finding inner links to crawl, parsing different kinds of content, and various integrations, made us go for this tool, and believe me, we never regretted using it. Best crawling tool.

What problems is Apache Nutch solving and how is that benefiting you?

Our business need is to develop a search engine where we provide a list of URLs to Nutch and it crawls all those URLs, finds their inner URLs, and crawls them as well. We were storing the crawled data in a Cassandra database, and Elasticsearch was in place to serve our search queries. This was working perfectly, and Nutch really helped us with crawling through its ability to parse different content types and store them.
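The workflow described in this answer (seed URLs go in, inner links are discovered and crawled in turn) can be sketched roughly as follows. This is a minimal, illustrative breadth-first crawl in Python, not Nutch's actual implementation: `FAKE_WEB`, `LinkExtractor`, `extract_links`, and `crawl` are hypothetical names, and an in-memory page map stands in for real HTTP fetching and storage.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (stands in for a real parser plugin)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the list of link targets found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical in-memory "web": URL -> HTML body. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "http://example.com/": '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>',
    "http://example.com/a": '<a href="http://example.com/c">C</a>',
    "http://example.com/b": '<a href="http://example.com/">home</a>',
    "http://example.com/c": "no links here",
}


def crawl(seeds, max_depth):
    """Breadth-first crawl from the seed list, following inner links up to max_depth."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    fetched = []
    while queue:
        url, depth = queue.popleft()
        html = FAKE_WEB.get(url)
        if html is None:
            continue  # page not available: skip it
        fetched.append(url)
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for link in extract_links(html):
            if link not in seen:  # avoid queueing the same URL twice
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched


print(crawl(["http://example.com/"], max_depth=1))
```

The `seen` set is what keeps the sketch from looping when pages link back to each other, and `max_depth` bounds how far from the seeds the crawl goes.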

Sinem A.
Quality Assurance Test Engineer
Mid-Market (51-1000 emp.)
Validated Reviewer
Review source: Seller invite
What do you like best about Apache Nutch?

It was an open source tool to which you can add your own plugins. You can change its code as you wish. It was very easy to use. It can also be run with different tools.

What do you dislike about Apache Nutch?

You need to know which version of Nutch is compatible with the other tools you work with.

What problems is Apache Nutch solving and how is that benefiting you?

I used it while I was doing my thesis, to crawl Turkish web pages for my improved search engine algorithm. I also used it at work in a Turkish search engine project.

Naser A.
Research Officer
Mid-Market (51-1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
Business partner of the seller or seller's competitor, not included in G2 scores.
What do you like best about Apache Nutch?

I have been using Apache Nutch for 3 or 4 years. I like it as an open source tool that can run on a system with normal specs and crawl millions and millions of pages.

What do you dislike about Apache Nutch?

* I don't like its seed creation algorithm: it makes a cluster and then goes into a loop, crawling the same websites after it has crawled millions of pages.

* Its configuration is not easy.

* Job automation is not provided.

* Documentation is not good.

* Support is not good.

Recommendations to others considering Apache Nutch:

Not easy in the early days, but once you set it up it goes beyond your expectations.

What problems is Apache Nutch solving and how is that benefiting you?

I have fetched a large number of websites that contain a specific language to build a local search engine.

Prafulla R.
Technical Architect
Small-Business (50 or fewer emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

-Easy to configure

-Stable backend store

What do you dislike about Apache Nutch?

Use of Java makes it a little bulky.

One has to be careful with the heap size, otherwise OOM errors are inevitable.

Recommendations to others considering Apache Nutch:

Be careful about the heap size setting in the configuration file. Also, use a NoSQL data store such as HBase to store crawled data.

What problems is Apache Nutch solving and how is that benefiting you?

Implementation of an eCommerce product comparison engine.

Nutch enables data crawling in ethical ways.

Krishnan S.
Software Engineer
Mid-Market (51-1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

Crawling URLs is an excellent way to read content. Nutch is a very useful tool for reading document content at various depths.

What do you dislike about Apache Nutch?

A bit hard to customize the crawl function.

Recommendations to others considering Apache Nutch:

Very nice tool to use.

What problems is Apache Nutch solving and how is that benefiting you?

Prepared the content for a search engine for a static web page.

Ruchika J.
Hadoop Developer
Small-Business (50 or fewer emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
Business partner of the seller or seller's competitor, not included in G2 scores.
What do you like best about Apache Nutch?

Plugins for indexing and searching.

Integration with Solr and other tools.

It also works well in Hadoop clusters.

What do you dislike about Apache Nutch?

Lack of community to discuss any issue or concern.

Lack of documentation for the implementation and integration of Nutch.

Recommendations to others considering Apache Nutch:

For web crawling and data mining, you can easily integrate Nutch with other big data technologies.

What problems is Apache Nutch solving and how is that benefiting you?

Crawled and parsed XML data from URLs. Apache Tika was used for parsing; the data was indexed and filtered in Solr to create an SEO tool and a PPC tool.

I got domain-specific materials, but it doesn't have a batch mode.

It works fine on clusters.

Usama T.
Python Developer
Mid-Market (51-1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

Its ability to crawl the complete web through in-links and out-links, which lets it keep crawling indefinitely.

What do you dislike about Apache Nutch?

We need to have very strong knowledge of Apache Hadoop, HBase, ZooKeeper, and the complete environment setup, and we have to be very proficient in them to use it. Moreover, we cannot view HBase data easily, which also makes things difficult.

What problems is Apache Nutch solving and how is that benefiting you?

I am working on a search engine, and crawling is its basic need, which I am meeting with Apache Nutch. I can crawl complete web data by providing a few links and letting it crawl through in-links and out-links.
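The relationship between in-links and out-links mentioned in this review can be illustrated with a small sketch. Out-links are what a parser finds on a fetched page; in-links are obtained by inverting that map (Nutch exposes a comparable step as its `invertlinks` job). The `invert_links` function and the sample data below are hypothetical and only illustrate the idea, not how Nutch implements it.

```python
from collections import defaultdict


def invert_links(outlinks):
    """Turn a page -> out-links map into a page -> in-links map."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # this sketch chooses to ignore self-links
                inlinks[target].add(source)
    # Sort the sources so the result is deterministic.
    return {page: sorted(sources) for page, sources in inlinks.items()}


# Hypothetical out-link data, as a parser might produce after fetching three pages.
outlinks = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": ["http://example.com/"],
}

inlinks = invert_links(outlinks)
print(inlinks["http://example.com/b"])
```

A search engine can use such an in-link map for link-based scoring or for indexing anchor text alongside the target page.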

Fred Z.
Founder
Enterprise (> 1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

I have deployed Nutch several times when I needed to stand up a crawler quickly. It is free, straightforward, reliable, well documented, and comes with an OTS integration with Apache Solr for search.

What do you dislike about Apache Nutch?

The directory and file partitioning scheme for the crawler can be a bit confusing.

Recommendations to others considering Apache Nutch:

Consider Google Programmable Search Engine.

What problems is Apache Nutch solving and how is that benefiting you?

It is an excellent solution if you need a quick, simple, free crawler.

Verified User in Pharmaceuticals
Small-Business (50 or fewer emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
Business partner of the seller or seller's competitor, not included in G2 scores.
What do you like best about Apache Nutch?

I like the default index generation for the crawler.

What do you dislike about Apache Nutch?

When working with Ubuntu, I find it hard to set the directory paths.

What problems is Apache Nutch solving and how is that benefiting you?

I have successfully integrated Apache Nutch with the Hadoop and Hive ecosystems and set the rule-based content in the web pages.