What is Hadoop?

  • Hadoop is an open-source framework for storing and processing big data across a distributed cluster of machines.
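  • It has two core components:
    • MapReduce - the programming model used to process the data in parallel
    • HDFS (Hadoop Distributed File System) - the storage layer that holds the data across the cluster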
  • Several tools in the Hadoop ecosystem perform specific tasks:
    • Sqoop - imports and exports data between HDFS and relational databases (RDBMS)
    • Pig - a scripting language (Pig Latin) for writing MapReduce operations
    • Hive - a platform for writing SQL-like scripts that run as MapReduce operations
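
To make the MapReduce programming model mentioned above more concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop itself; it only imitates the map, shuffle, and reduce steps in a single process. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for illustration, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file stored in HDFS.
    sample = ["big data big cluster", "data in a cluster"]
    print(word_count(sample))
```

In a real Hadoop job, the map and reduce steps would run on many machines at once and the input would be read from HDFS; the framework would take care of the shuffle between them.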