Spark

Apache Spark: Its fast cluster computing system. Ready to use & scalable cluster framework. It is available in different sets as per requirement – processing, streaming, SQL, graph etc. How to install: (Useful Links) http://spark.apache.org/docs/latest/ http://nishutayaltech.blogspot.in/2015/04/how-to-run-apache-spark-on-windows7-in.html OR 1. You can run it without installing as well - just needed jars for spark 2. You can add maven dependency <dependency> <groupId>spark</groupId> <artifactId>spark</artifactId> <version>....</version> </dependency> Example in Java: Aim - Process file and get longest line length Code: Dependencies : import org.apache.spark.api.java.*; import org.apache.spark.SparkConf; import org.apache.spark.api.java.function.Function; import org.apache.spark.api.java.function.Function2; Creating RDD: String testFile = "C://disk//filename"; SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]"); JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD<String> testData = sc.textFile(testFile).cache(); //Process data and get maximum line length in variable num System.out.println(num); With inline functions:   int num = testData.map(new Function<String, Integer>() {public Integer call(String s)…