pinboard October 21, 2013

  • ubuntu – Redis – Connect to Remote Server – Stack Overflow
    check your /etc/redis/redis.conf, and make sure to change the default


    Then restart your service (service redis-server restart)

    You can then check that Redis is listening on a non-local interface with:

    redis-cli -h 192.168.x.x ping
    (replace 192.168.x.x with your IP address)
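
    The same reachability check can be scripted. The following is a minimal sketch, not part of the original answer: the function name redis_ping is hypothetical, and it assumes the server listens on the default Redis port 6379 and requires no password (a server that demands authentication answers with an error, which this sketch reports as a failure).

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a server at host:port answers PING with +PONG.

    Hypothetical helper for illustration; assumes no authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts "inline" commands terminated by CRLF,
            # and replies to PING with the simple string "+PONG\r\n".
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    return reply.startswith(b"+PONG")
```

    Calling redis_ping with the server's real address in place of 192.168.x.x should return True once Redis accepts remote connections, and False if it is still reachable only from localhost.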

  • Twitter / History_Pics: Google employees, 1999 …
    RT @History_Pics: Google employees, 1999
  • Databricks raises $14M from Andreessen Horowitz, wants to take on MapReduce with Spark — Tech News and Analysis
    A team of professors behind the open source Spark and Shark in-memory big data projects has raised $13.9 million to commercialize the products via a company called Databricks. Spark and Shark are designed to be much faster and more flexible than Hadoop MapReduce and Hive.

    For those not familiar with Spark, it is a big data platform written in Scala and designed to run very fast. Stoica wasn’t much more forthcoming on details during a recent phone call, but he did explain the promise of Spark as compared with Hadoop MapReduce. Essentially, he said, it’s up to 100 times faster if your dataset can fit in memory, but it’s built to be significantly faster even on disk. It’s also architected differently than MapReduce in ways that make it ideal for machine learning algorithms and data mining workloads, where users might want to iterate on existing results or repeatedly query a dataset with low latency.

    Spark is also quite popular among web companies. It’s used by Yahoo, Airbnb, ClearStory Data and others, and more than 20 companies have contributed code to the project, according to its Apache page.

    Shark is shorthand for “Hive on Spark,” which really means it’s a data warehousing framework compatible with Apache Hive but designed to run atop Spark rather than Hadoop MapReduce. Hive has become very popular as the de facto method of running SQL-like queries over data stored in Hadoop, but recently Hadoop vendors Cloudera and Hortonworks have undertaken their own efforts to either speed up Hive (which is slow because it relies on MapReduce) or eliminate it altogether for interactive queries. The Shark team claims it’s up to 100 times faster than Hive when running in memory.

  • Hadoop YARN
    Apache™ Hadoop® YARN is a sub-project of Hadoop at the Apache Software Foundation introduced in Hadoop 2.0 that separates the resource management and processing components. YARN was born of a need to enable a broader array of interaction patterns for data stored in HDFS beyond MapReduce. The YARN-based architecture of Hadoop 2.0 provides a more general processing platform that is not constrained to MapReduce.
  • Go Hadoop! Err, Hadoop and Go. | Hortonworks
