
Apache Nutch Reviews & Product Details - Page 2

Apache Nutch Overview

What is Apache Nutch?

Apache Nutch is an extensible and scalable open source web crawler software project. Nutch provides extensible interfaces such as Parse, Index, and ScoringFilter for custom implementations, e.g. Apache Tika for parsing.

Apache Nutch Details

Seller Details
Year Founded: 1999
HQ Location: Wakefield, MA
Twitter: @TheASF (66,228 Twitter followers)
LinkedIn® Page: www.linkedin.com (2,291 employees on LinkedIn®)
Description

Community-led development since 1999. We consider ourselves not simply a group of projects sharing a server, but rather a community of developers and users.

Recent Apache Nutch Reviews

Narendra A., Enterprise (> 1000 emp.)
5.0 out of 5
"Apache Nutch is Rockstar in terms of huge data crawling."
When I used Apache Nutch I was amazed with the speed it crawls data and the libraries and data structures provided to customise your crawling and r...

Verified User, Small-Business (50 or fewer emp.)
5.0 out of 5
"Best for web crawling"
I like the default index generation for the crawler.

Sinem A., Mid-Market (51-1000 emp.)
5.0 out of 5
"Web Crawling Tool"
It was an open source tool to which you can add your own plugins. You can change its own code as you wish. It was very easy to use. It can be run with d...


20 Apache Nutch Reviews

4.0 out of 5

Navom S.
Software Developer
Enterprise (> 1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

Multidepth crawling capabilities are really good. Data extraction from web pages is remarkable. Review collected by and hosted on G2.com.

What do you dislike about Apache Nutch?

Based on MapReduce, hence slower. Adding customisations involves writing plugins and rebuilding, and there is no support for dependency injection.

Recommendations to others considering Apache Nutch:

The MapReduce-based implementation in the previous version is slower.

What problems is Apache Nutch solving and how is that benefiting you?

Crawling web pages and government websites to gain insight into data related to geographical change.

Verified User in Internet
Mid-Market (51-1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
Business partner of the seller or seller's competitor, not included in G2 scores.
What do you like best about Apache Nutch?

Provides an in-depth list of features, HTML tags, and sitemaps.

What do you dislike about Apache Nutch?

Didn't have a lot of documentation at the time I was using it, which made it hard to use.

What problems is Apache Nutch solving and how is that benefiting you?

Crawled our domain URLs and got useful, relevant information.

Imtiaz S.
Senior Software Engineer
Small-Business (50 or fewer emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

Easy to use.

Can crawl almost all kinds of content.

Excellent plugin system.

Supports different storage backends.

What do you dislike about Apache Nutch?

Hard to master. Steep learning curve.

Poor documentation. Much of it is outdated or broken.

Difficult to set up for a production system.

Recommendations to others considering Apache Nutch:

Use Apache StormCrawler instead.

What problems is Apache Nutch solving and how is that benefiting you?

We used Apache Nutch to crawl websites and index them with Solr.

Verified User in Computer Software
Enterprise (> 1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

I used Apache Nutch for crawling under Cygwin. It could be configured in a few easy steps and helped in collecting the desired data.

What do you dislike about Apache Nutch?

To be honest, I didn't see any disadvantages.

What problems is Apache Nutch solving and how is that benefiting you?

It helped to configure the database in easy steps.

Verified User in Computer & Network Security
Small-Business (50 or fewer emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

Apache Nutch is an easy-to-configure application that we can use for research.

What do you dislike about Apache Nutch?

It's very difficult to find articles about Apache Nutch.

What problems is Apache Nutch solving and how is that benefiting you?

Resources are very difficult to find, mostly those about configuration.

Verified User in Higher Education
Enterprise (> 1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

Easy to use, with support from a big community of devs.

What do you dislike about Apache Nutch?

The default interface of the search engine is very outdated.

What problems is Apache Nutch solving and how is that benefiting you?

Building an Arabic search engine.

Verified User in Newspapers
Mid-Market (51-1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
Business partner of the seller or seller's competitor, not included in G2 scores.
What do you like best about Apache Nutch?

Nutch supports distributed fetching and Hadoop, so fetching, storage, and indexing can be distributed across multiple machines.

Another attractive point is its plug-in framework, which makes it convenient to extend web content parsing, data collection, querying, clustering, filtering, and other functions. Because of this framework, Nutch plug-in development is very easy, and third-party plug-ins keep emerging, greatly enhancing Nutch's functionality and reputation.
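As a rough illustration of the plug-in idea this reviewer describes, the sketch below keeps a registry of parser functions keyed by content type, so new formats can be supported without touching the core dispatch code. It is a minimal Python sketch, not Nutch's Java plug-in API: `PARSERS`, `register_parser`, and `parse` are hypothetical names invented for this example.

```python
import json

# Hypothetical registry: content type -> parser function.
PARSERS = {}


def register_parser(content_type):
    """Decorator that registers a parser plug-in for one content type."""
    def decorator(func):
        PARSERS[content_type] = func
        return func
    return decorator


@register_parser("text/plain")
def parse_text(raw):
    # Plain text: decode and trim surrounding whitespace.
    return {"text": raw.decode("utf-8").strip()}


@register_parser("application/json")
def parse_json(raw):
    # JSON: re-serialize with sorted keys to get a stable text form.
    return {"text": json.dumps(json.loads(raw), sort_keys=True)}


def parse(content_type, raw):
    """Core dispatch: look up the plug-in and delegate to it."""
    try:
        parser = PARSERS[content_type]
    except KeyError:
        raise ValueError(f"no parser plug-in for {content_type!r}") from None
    return parser(raw)
```

Adding support for another format only requires defining one more decorated function; `parse` itself stays unchanged.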

What do you dislike about Apache Nutch?

Nutch's crawler customization ability is relatively weak.

If you do custom development on the Nutch crawler, compiling and debugging it takes a lot of time.

What problems is Apache Nutch solving and how is that benefiting you?

Massive amounts of data can be obtained from specific websites, then screened and analyzed purposefully, and the results can be displayed clearly through a service.

Justin C.
CTO
Small-Business (50 or fewer emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

I love how easy it is to configure and run, and how it performs at scale. Storing in Hadoop is a breeze.

What do you dislike about Apache Nutch?

Not quite as easy to use as tools like Scrapy.

What problems is Apache Nutch solving and how is that benefiting you?

Distributed batch web crawling.

Verified User in Computer & Network Security
Small-Business (50 or fewer emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
Business partner of the seller or seller's competitor, not included in G2 scores.
What do you like best about Apache Nutch?

HTTP proxy support so my IP does not get blocked

Nice file size filter with advanced control of network bandwidth

I heard that many big companies and government agencies are using Nutch in production

Nutch has a parallel reducer to make use of multiple network connections and multi-core CPUs

What do you dislike about Apache Nutch?

I wish Nutch had built-in rate limiting support

Implemented in Java, which is a bit memory-hungry

Recommendations to others considering Apache Nutch:

Use the parallel reducer to decrease crawling time.

What problems is Apache Nutch solving and how is that benefiting you?

Crawling for leaked credentials on GitHub.

Verified User in Information Technology and Services
Mid-Market (51-1000 emp.)
Validated Reviewer
Review source: G2 invite
Incentivized Review
What do you like best about Apache Nutch?

Fetching and parsing are done separately by default, which reduces the risk of an error corrupting the fetch/parse stage of a crawl with Nutch.
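The benefit of keeping the two stages apart can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Nutch's code: `fetch_all`, `parse_all`, `TitleParser`, and the in-memory `raw_store` are hypothetical names, and the fetcher is a plain lookup so the example runs without network access. Raw content is stored before any parsing happens, so a parser failure leaves the fetched data intact and the parse stage can be rerun.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_all(urls, fetcher):
    """Stage 1: download every URL and store the raw bytes untouched."""
    return {url: fetcher(url) for url in urls}


def parse_all(raw_store):
    """Stage 2: parse stored content; failures are recorded, not fatal."""
    parsed, failed = {}, {}
    for url, raw in raw_store.items():
        try:
            parser = TitleParser()
            parser.feed(raw.decode("utf-8"))
            parsed[url] = parser.title
        except Exception as exc:  # one bad page must not abort the stage
            failed[url] = repr(exc)
    return parsed, failed


# Hypothetical stand-in for real HTTP responses.
PAGES = {
    "http://example.org/a": b"<html><title>Page A</title></html>",
    "http://example.org/b": b"\xff\xfe not valid utf-8",
}

raw_store = fetch_all(PAGES, PAGES.__getitem__)
parsed, failed = parse_all(raw_store)
```

The second page fails to decode, but its raw bytes remain in `raw_store`, so only the parse step needs to be repeated once the parser is fixed.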

* Plugins have been overhauled as a direct result of removal of legacy Lucene dependency for indexing and search.

* The number of plugins for processing various document types being shipped with Nutch has been refined.

The only parser plugins shipped with Nutch now are Feed (RSS/Atom), HTML, Ext, JavaScript, SWF, Tika & ZIP.

Nutch has had scoring plugins for quite a while and has supported things like adaptive fetch schedules. All of the Nutch data is in databases and so forth that are interrogated through the command-line tools and Java, and there is now an emerging REST interface, along with work to create a Python client for it.
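The idea behind an adaptive fetch schedule can be shown with a short sketch: shorten the refetch interval when a page has changed since the last visit, and lengthen it when it has not. The Python below illustrates that general idea only and is not Nutch's implementation; the function name `next_interval`, the rates, and the bounds are hypothetical values chosen for this example.

```python
DAY = 24 * 60 * 60  # seconds in one day


def next_interval(interval, changed, inc_rate=0.5, dec_rate=0.25,
                  min_interval=DAY, max_interval=90 * DAY):
    """Return the next refetch interval in seconds.

    interval: current interval in seconds
    changed:  True if the page content differed from the previous fetch
    The rates and bounds are illustrative assumptions, not Nutch defaults.
    """
    if changed:
        # Page is changing: come back sooner.
        interval = interval * (1.0 - dec_rate)
    else:
        # Page is stable: back off.
        interval = interval * (1.0 + inc_rate)
    # Clamp so the interval never collapses to zero or grows unbounded.
    return max(min_interval, min(max_interval, interval))
```

For example, `next_interval(30 * DAY, changed=True)` moves a 30-day interval down to 22.5 days, while an unchanged page at 80 days is capped at the 90-day maximum.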

What do you dislike about Apache Nutch?

Nutch doesn't have to be batch mode.

So let's say that as a Nutch crawl administrator, your client has tasked you with the following: "Get me domain specific material from a database such as NTIS" (NTIS, the National Technical Information Service, serves as the largest central resource for government-funded scientific, technical, engineering, and business related information available today). What this really translates to is the following:


What problems is Apache Nutch solving and how is that benefiting you?

This page provides commentary and thoughts on adapting Nutch not only to fetch AJAX/JavaScript driven dynamic HTML content, but also for interacting with that content (potentially a number of times) within a fetching scenario.
