I have used Apache Beam to design my pipelines on Google Cloud Platform, and my experience was seamless. I have designed pipelines for both batch and streaming data.
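For context, a minimal sketch of the kind of Beam pipeline (Java SDK) this refers to is below; the bucket paths and the line-count logic are hypothetical illustrations, not the respondent's actual pipeline.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BatchPipelineSketch {
  public static void main(String[] args) {
    // Passing --runner=DataflowRunner (plus project/region flags) would
    // execute this on Google Cloud Dataflow instead of the local runner.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadLines", TextIO.read().from("gs://my-bucket/input/*.txt"))
     .apply("CountLines", Count.globally())
     .apply("Format", MapElements.into(TypeDescriptors.strings())
                                 .via(n -> "total lines: " + n))
     .apply("WriteResult", TextIO.write().to("gs://my-bucket/output/count"));

    // For streaming, the TextIO source would typically be swapped for
    // PubsubIO.readStrings() plus a windowing transform; the overall
    // pipeline shape stays the same, which is Beam's main selling point.
    p.run().waitUntilFinish();
  }
}
```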
Deployment and production instructions, as well as case studies, should be improved.
1. Storing files in a sequential format using key-value pairs: the file name is stored as the key and its content as the value, and the pairs are encrypted (see the sketch after this list).
2. 128 MB block size: the default used to be 64 MB, which was smaller, but some people still prefer the old block size.
3. Storing multiple copies...
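A minimal sketch of the key/value storage idea using Hadoop's SequenceFile API follows; the path and the sample record are hypothetical, and the encryption step mentioned above is omitted. The block-size setting shows where the 128 MB default (dfs.blocksize) can be overridden.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The HDFS block size is configurable; 128 MB is the modern default,
    // up from the older 64 MB default. Set explicitly here for illustration.
    conf.set("dfs.blocksize", "134217728"); // 128 MB in bytes

    Path path = new Path("hdfs:///tmp/example.seq");
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(conf,
          SequenceFile.Writer.file(path),
          SequenceFile.Writer.keyClass(Text.class),
          SequenceFile.Writer.valueClass(Text.class));
      // Key = file name, value = file content, as described above.
      writer.append(new Text("report.txt"), new Text("...file contents..."));
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
```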
Inflexible: data needs to be copied into HDFS from other systems, and one cannot do real-time access from HDFS. This survey is not well written if it's primarily HDFS that you need feedback on.