Questions tagged [hadoop]

Hadoop is a framework for the distributed processing of large data sets across clusters of machines.

Apache Hadoop includes the following modules:

  • Hadoop Common: common utilities that support the other modules
  • Hadoop Distributed File System (HDFS): distributed storage with high-throughput access
  • Hadoop YARN: job scheduling and resource management
  • Hadoop MapReduce: parallel processing of large data sets

It is used by such heavyweights as Facebook and Twitter.

Apache projects based on Hadoop should use their own tags instead: Ambari, Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, Spark, Tez, and ZooKeeper.

68 questions
20
votes
4 answers

SSH into VirtualBox on Mac

I just installed VirtualBox on my Mac and created a new Ubuntu virtual machine using "Use an existing virtual hard disk file" with the Cloudera Hadoop disk image. I'm able to start and run the virtual machine; however, I'd prefer to ssh into it from my…
jKraut
  • 301
  • 1
  • 2
  • 3
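With VirtualBox's default NAT networking the guest has no directly reachable address, so the usual approach is a port-forwarding rule from the host to the guest's port 22. A minimal sketch; the VM name "cloudera-vm" and the user "cloudera" are assumptions, so substitute your own:

```shell
# Add a NAT port-forwarding rule: host port 2222 -> guest port 22.
# (Use "VBoxManage controlvm ... natpf1 ..." instead if the VM is running.)
VBoxManage modifyvm "cloudera-vm" --natpf1 "guestssh,tcp,,2222,,22"

# From the Mac host, connect through the forwarded port:
ssh -p 2222 cloudera@127.0.0.1
```

Bridged networking is the alternative: the guest then gets its own LAN address and plain `ssh user@<guest-ip>` works without forwarding rules.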
9
votes
4 answers

RPC: Port mapper failure - Unable to receive: errno 113 (No route to host)

I am trying to mount HDFS on my local machine (Ubuntu) using NFS by following this link: https://www.cloudera.com/documentation/enterprise/5-2-x/topics/cdh_ig_nfsv3_gateway_configure.html#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7ef4 So, at…
Bhavya Jain
  • 333
  • 1
  • 5
  • 11
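errno 113 ("No route to host") at the RPC portmapper usually points at a firewall, or at rpcbind not running on the gateway host, rather than at the mount command itself. A hedged checklist, assuming the gateway from the linked Cloudera guide and reusing the question's style of addressing:

```shell
# On the gateway: rpcbind (the portmapper) must be running before
# the HDFS NFS gateway can register its services with it.
sudo systemctl status rpcbind

# From the client: can we reach the gateway's portmapper at all?
# (Replace the address with your gateway's.)
rpcinfo -p 192.168.170.52

# "No route to host" is very often a firewall: the gateway needs at
# least 111 (portmapper), 2049 (nfs) and, typically, 4242 (the HDFS
# gateway's mountd port) reachable from the client.
```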
5
votes
0 answers

How do you get Hadoop commands to work when you get the error "Invalid HADOOP_COMMON_HOME"?

I had a Hadoop version 1.x installed on Linux SUSE 12.3. I moved the directory somewhere else to back it up. I tried to install Hadoop 3.0. I expect Hadoop commands to work based on what I did. I used open source Hadoop 3.0 files. Hadoop…
Jermoe
  • 111
  • 1
  • 6
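"Invalid HADOOP_COMMON_HOME" typically means the environment still points at the old, moved 1.x install, or at nothing at all. A minimal sketch, assuming the new install was unpacked to /opt/hadoop-3.0.0 (a placeholder path):

```shell
# Placeholder path -- use wherever Hadoop 3.x was actually unpacked.
export HADOOP_HOME=/opt/hadoop-3.0.0
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"

# Check that the shell now resolves the new binaries:
which hadoop
hadoop version
```

Putting the exports in `~/.bashrc` (or in `etc/hadoop/hadoop-env.sh` inside the install) makes them survive new shells.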
4
votes
2 answers

LD_LIBRARY_PATH lost when using mount command

TL;DR When a fuse filesystem is mounted via the mount command, the environment variables are not passed to the fuse script. Why? Context I am trying to mount hdfs (hadoop file system) via fuse. This is easy on the command line: # Short example…
Guillaume
  • 201
  • 1
  • 8
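mount(8) is normally setuid root and deliberately sanitizes the environment before invoking the filesystem-specific helper, which is why LD_LIBRARY_PATH never reaches the FUSE script. One workaround is a small wrapper that re-exports the variables itself; a sketch with placeholder paths and a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. /usr/local/sbin/fuse_dfs_wrapper.sh:
# mount(8) strips LD_LIBRARY_PATH for security, so rebuild the
# environment here and exec the real FUSE binary.
export HADOOP_HOME=/opt/hadoop                      # placeholder path
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH"
export CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath --glob)
exec "$HADOOP_HOME/bin/fuse_dfs" "$@"
```

Pointing the fstab entry (or the `-t fuse` helper name) at the wrapper instead of the bare binary then gives the mount a complete environment.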
3
votes
2 answers

mount.nfs: mount system call failed

I am trying to mount HDFS on my local machine running Ubuntu using the following command: sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/ But I am getting this error: mount.nfs: mount system call failed Output…
Bhavya Jain
  • 333
  • 1
  • 5
  • 11
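Before retrying the mount it is worth confirming that the gateway actually exports the path and that the NFSv3 services are registered. A hedged checklist, reusing the address and mount point from the question:

```shell
# Does the gateway export "/" at all? (showmount queries mountd.)
showmount -e 192.168.170.52

# Are the NFSv3 services registered with the portmapper?
rpcinfo -p 192.168.170.52

# The HDFS NFS gateway speaks NFSv3 over TCP only, so vers=3,
# proto=tcp and nolock are correct as written; after a failure,
# dmesg usually shows the kernel's reason for the failed mount(2).
sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/
dmesg | tail
```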
3
votes
1 answer

Change column datatypes in Hive database

Can I change a column's datatype in a Hive database? Below is the complete information. I have a database named "test" with a table "name". Below is the query I used while creating a column in the name table: create table name(custID…
Nitesh B.
  • 563
  • 2
  • 7
  • 20
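Hive can change a column's type in place with `ALTER TABLE ... CHANGE`. This is a metadata-only change: the existing data files are not rewritten, so the new type must be able to read the stored values. A sketch, taking the database/table/column names from the question and using BIGINT purely as an example target type:

```shell
# Rename-in-place form of CHANGE: old name, new name, new type.
hive -e "USE test; ALTER TABLE name CHANGE custID custID BIGINT;"
```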
3
votes
3 answers

bind failure, address in use: Unable to use a TCP port for both source and destination?

I'm debugging Hadoop DataNodes that won't start. We are using saltstack and also elasticsearch on the machines. The Hadoop DataNode error is pretty clear: java.net.BindException: Problem binding to [0.0.0.0:50020] java.net.BindException:…
kei1aeh5quahQu4U
  • 452
  • 5
  • 13
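A DataNode bind failure on 50020 does not always mean another daemon is listening there: an outgoing connection from any process can grab 50020 as an ephemeral *source* port if it falls inside the kernel's ephemeral range, which matches the "source and destination" wording in the title. A Linux-specific sketch, with the port number taken from the question:

```shell
# Is anything listening on, or connected from, 50020 right now?
sudo ss -tnp | grep 50020

# If 50020 falls inside this range, an outgoing connection (e.g.
# from elasticsearch or salt) can claim it as a source port:
cat /proc/sys/net/ipv4/ip_local_port_range

# Reserving the port keeps the kernel from handing it out:
echo 50020 | sudo tee /proc/sys/net/ipv4/ip_local_reserved_ports
```

The sysctl `net.ipv4.ip_local_reserved_ports` can be made permanent in /etc/sysctl.conf so the DataNode wins the race after every reboot.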
2
votes
0 answers

ssh: connect to host localhost port 22: Connection refused

I have installed Hadoop and SSH. Hadoop was working fine, but today I am getting the error below when I run the command sbin/start-dfs.sh: Starting namenodes on [localhost] localhost: ssh: connect to host localhost port 22: Connection…
Sanaya
  • 31
  • 2
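start-dfs.sh launches the daemons over ssh even on localhost, so "connection refused" on port 22 almost always means sshd stopped or was removed. A sketch for Ubuntu-style systems (package and service names are assumptions; RHEL-family systems use "sshd"):

```shell
# Is sshd installed and running?
sudo systemctl status ssh
sudo apt-get install -y openssh-server   # only if it is missing
sudo systemctl start ssh

# Confirm something listens on 22 before rerunning start-dfs.sh:
ss -tln | grep ':22 '
```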
2
votes
1 answer

Copy files from a hdfs folder to another hdfs location by filtering with modified date using shell script

I have one year of data in my HDFS location and I want to copy the last 6 months of it to another HDFS location. Is it possible to copy the data directly with an hdfs command, or do we need to write a shell script for copying data for the last 6…
Antony
  • 131
  • 1
  • 5
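There is no built-in `hdfs dfs` flag that filters by modification date, so a small script around `hdfs dfs -ls` is the usual approach. A sketch, assuming GNU date and placeholder paths /data/src and /data/dst; `hdfs dfs -ls` prints the modification date in field 6 (YYYY-MM-DD) and the full path in field 8:

```shell
#!/bin/bash
SRC=/data/src                                 # placeholder paths
DST=/data/dst
cutoff=$(date -d "6 months ago" +%Y-%m-%d)    # GNU date syntax

# ISO dates compare correctly as strings, so a plain >= works in awk.
hdfs dfs -ls "$SRC" | awk -v cutoff="$cutoff" '$6 >= cutoff {print $8}' |
while read -r path; do
    hdfs dfs -cp "$path" "$DST/"
done
```

For large trees, feeding the filtered list to a single `hdfs dfs -cp` invocation (or to distcp) is considerably faster than one copy per file.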
2
votes
0 answers

Hadoop cluster not listening on port that I configured. What is wrong?

I set up a Hadoop cluster with RHEL 7.4 servers. There is no firewall between them. I am running Hadoop 3.0. On the namenode the core-site.xml file is configured to use port 54310. I run this command: hdfs dfsadmin -report But I get this error…
Jermoe
  • 111
  • 1
  • 6
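Two quick checks narrow this down: what NameNode address the client configuration actually resolves to, and whether the NameNode process is up and bound to the configured port. A sketch to run on the namenode, using the port from the question:

```shell
# What does the client configuration think the NameNode address is?
hdfs getconf -confKey fs.defaultFS

# Is the NameNode JVM running, and is 54310 actually bound?
jps | grep -i namenode
sudo ss -tlnp | grep 54310
```

If `fs.defaultFS` resolves to a hostname, also confirm that the hostname maps to a routable address, not to 127.0.0.1, on every node.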
2
votes
0 answers

The command "hdfs dfsadmin -report" fails because "failed to connect to server"

I am trying to configure a multi-node cluster of open source Hadoop. I have Hadoop 3.0 installed on the namenode and the data node. Both are running Linux (SUSE and Ubuntu). None are CentOS, RedHat or Fedora. I have tried different settings with…
Jermoe
  • 111
  • 1
  • 6
2
votes
1 answer

False error starting Hue

I have installed Hue in CentOS 7 from Cloudera CDH5 repository. Upon starting it reports an error: # systemctl status hue hue.service - SYSV: Hue web server Loaded: loaded (/etc/rc.d/init.d/hue) Active: failed (Result: resources) since sob…
Kombajn zbożowy
  • 215
  • 1
  • 11
2
votes
1 answer

Which OS should I use for Hadoop cluster?

I have a client setting up a Hadoop cluster. We have all used and are very familiar with CentOS 7. I was told Scientific Linux may be better optimized for Hadoop. Is there any truth to that?
Dovid Bender
  • 439
  • 7
  • 17
2
votes
1 answer

Validate start-dfs.sh

I am trying to set up a Hadoop cluster, where the master is my laptop and the slave is a VirtualBox VM, following this guide. So I did, from the master: gsamaras@gsamaras:/home/hadoopuser/hadoop/sbin$ sudo ./start-dfs.sh Starting namenodes on…
gsamaras
  • 191
  • 1
  • 4
  • 12
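Beyond reading the startup messages, the usual way to validate start-dfs.sh is to check that the expected JVMs are running and that the filesystem answers. A sketch (the web UI listens on port 50070 on Hadoop 2.x, 9870 on 3.x):

```shell
# Each daemon runs in its own JVM and should appear in jps output:
jps    # master: NameNode, SecondaryNameNode; slave: DataNode

# A cluster report and a trivial filesystem call confirm HDFS works:
hdfs dfsadmin -report
hdfs dfs -ls /
```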
2
votes
1 answer

How much space for /home on a hadoop cluster?

What is a reasonable size to provide for a /home partition for 100 users on a hadoop cluster? Assume that a landing zone has been provided to store files/data for ingestion into the cluster, so the /home partition would be non-project type storage.
ChuckCottrill
  • 1,027
  • 1
  • 11
  • 15
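One way to reason about it is per-user quota times head count plus headroom. The numbers below are illustrative assumptions, not a recommendation:

```shell
# Back-of-the-envelope sizing: 100 users at a 5 GB soft quota each,
# plus 20% headroom for growth (all figures are assumptions).
users=100
quota_gb=5
total_gb=$(( users * quota_gb * 120 / 100 ))
echo "${total_gb} GB"   # -> 600 GB
```

Since users should land data in the designated ingestion zone rather than /home, a small quota enforced per user usually matters more than the raw partition size.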