Spark: analyzing a log file with (key, value) pairs
2016-04-12 17:00
Reading a log with Spark and computing the average time spent by each service
Published: 2015-12-10 9:54:15 Source: 分享查询网
Read the log file, in which each service entry begins with "#*#", and compute the average time each service takes.
    import java.io.{File, PrintWriter}
    import java.util.regex.Pattern

    import org.apache.spark.{SparkConf, SparkContext}

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        // On Windows, Hadoop looks for bin\winutils.exe under this directory
        System.setProperty("hadoop.home.dir", "D://spark-1.3.1-bin-hadoop-2.3.0-cdh5.0.2")
        val logFile = "d://Debug.2015-06-12_1556.log" // Should be some file on your system
        val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()

        // Keep only the lines that carry a service timing
        val result = logData.filter(line => line.contains("#*#"))
        println("********Statistics started**********")

        // Convert to a key-value RDD: (serviceName, elapsedMillis).
        // split() takes a regex, so the marker is quoted to match "#*#" literally.
        val marker = Pattern.quote("#*#")
        val jobNameAndTime = result.map { line =>
          val fields = line.split(marker).last.trim.split("\\s+")
          (fields.head, fields.last.toLong)
        }

        // Number of calls per service
        val jobNameTimes = jobNameAndTime.map(x => (x._1, 1)).reduceByKey(_ + _)
        // Total time per service (a pairwise (x + y) / 2 is not a mean for 3+ values)
        val jobTotalTime = jobNameAndTime.reduceByKey(_ + _)

        // join: (name, (times, totalMillis)) -> (name, (times, avgSeconds)), sorted by average
        val jobTimesAndAvgTime = jobNameTimes.join(jobTotalTime)
          .mapValues { case (times, total) => (times, total.toDouble / times / 1000.0) }
          .sortBy(x => x._2._2)

        println("********************************************************************")
        // collect() first so the lines are printed on the driver, not on the executors
        val report = jobTimesAndAvgTime.collect().map { case (name, (times, avg)) =>
          s"jobName: $name | times: $times | avgTime: ${avg}s"
        }
        report.foreach(println)

        val writer = new PrintWriter(new File("d://test.txt"))
        writer.write(report.map(_ + "\n").mkString)
        writer.close()

        println(s"Counted ${result.count} records in total")
        println("********************************************************************")
        println("********Statistics finished**********")
        sc.stop()
      }
    }
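Why the average is built from a count and a total rather than a pairwise reducer: Spark's `reduceByKey` expects an associative and commutative function, because it may combine a key's values in any grouping. `(x, y) => (x + y) / 2` is commutative but not associative, so it only yields the mean when a key has exactly two values. The sketch below uses plain Scala `List.reduce` (no Spark) as a stand-in for the per-key reduction; `AvgReducerDemo` and its methods are hypothetical names for illustration.

```scala
// Plain-Scala illustration (no Spark needed): List.reduce folds the
// elements pairwise, similar to how reduceByKey folds one key's values.
object AvgReducerDemo {
  // Pairwise "average" reducer: result depends on order and grouping.
  def pairwiseAvg(xs: List[Int]): Int =
    xs.reduce((x, y) => (x + y) / 2)

  // Correct approach: carry (sum, count) through the reduction and
  // divide once at the end. (sum, count) addition is associative.
  def trueAvg(xs: List[Int]): Double = {
    val (sum, count) = xs
      .map(x => (x.toLong, 1L))
      .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
    sum.toDouble / count
  }
}
```

On `List(10, 20, 60)` the pairwise reducer gives 37, and on the same values in reverse order it gives 25, while the true mean is 30.0. In Spark the same (sum, count) idea can also be written as `mapValues(t => (t, 1))` followed by `reduceByKey`, or with `aggregateByKey`.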
------------------------------
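Each service entry starts with "#*#", followed by the service name and the time it took. The code divides this value by 1000 and reports seconds, so it appears to be in milliseconds.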
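A minimal sketch of parsing one such line, assuming the entry is exactly `#*#<serviceName> <elapsed>` with no spaces in the service name. Note that `String.split` takes a regular expression, so `split("#*#")` actually splits on runs of `#` characters rather than on the literal marker; it only works on these lines because `.last` picks up the text after the final `#`. Escaping the `*` (or using `java.util.regex.Pattern.quote("#*#")`) matches the marker literally. `ServiceLineParser` and `Marker` are hypothetical names.

```scala
import scala.util.matching.Regex

object ServiceLineParser {
  // Assumed format: the literal marker "#*#" (the '*' is escaped because
  // it is a regex metacharacter), a service name without spaces,
  // whitespace, then the elapsed time as digits.
  val Marker: Regex = """#\*#(\S+)\s+(\d+)""".r

  // Some((serviceName, elapsedMillis)) for timing lines, None otherwise.
  def parse(line: String): Option[(String, Long)] =
    Marker.findFirstMatchIn(line).map(m => (m.group(1), m.group(2).toLong))
}
```

Lines without the marker, such as the Thrift transport messages, simply return `None`, so the same function can serve as both filter and parser.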
Log excerpt:
    2015-06-11 00:05:32.23423742063 [Worker-88] DEBUG c.z.b.v.a.u.c.d.ConnectionFactoryPrefs$$anon$1 - Spark useDatabase =use ran
    2015-06-11 00:05:32.82023742649 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 109
    2015-06-11 00:05:35.18423745013 [Worker-88] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 110
    2015-06-11 00:05:35.18423745013 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 102
    2015-06-11 00:05:35.18523745014 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 778
    2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 96
    2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 42
    2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 83
    2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 40
    2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloWorldService 26993
    2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.d.ConnectionFactoryPrefs$$anon$1 - database config: DatabaseInfo(jdbc:hive2://192.168.2.110:11000,mr,mr,org.apache.hive.jdbc.HiveDriver,ran)
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport@c0770c
    2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloWorldService 36993
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.t.t.TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 6
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Writing message with status START and payload length 5
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 6
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Start message handled
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Main negotiation loop complete
    2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloSUMService 336993
    2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloSUMService 236993
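To check the expected output by hand, the whole job can be sketched on a local collection without Spark: keep the marker lines, parse them, accumulate a count and a total per service, divide once at the end, and sort by average as the Spark code does. This assumes the elapsed values are milliseconds (implied by the division by 1000 in the code) and the same `#*#<service> <elapsed>` layout; `ExcerptAverages.summarize` is a hypothetical helper, not part of any library.

```scala
object ExcerptAverages {
  // Assumed entry layout: "#*#<serviceName> <elapsedMillis>"
  private val Marker = """#\*#(\S+)\s+(\d+)""".r

  // Local stand-in for the Spark job. Returns
  // (serviceName, numberOfCalls, averageSeconds), sorted by average.
  def summarize(lines: Seq[String]): Seq[(String, Int, Double)] = {
    // Filter and parse in one step: non-matching lines produce nothing
    val pairs = lines.flatMap { l =>
      Marker.findFirstMatchIn(l).map(m => (m.group(1), m.group(2).toLong))
    }
    pairs.groupBy(_._1).toSeq.map { case (service, ps) =>
      val totalMillis = ps.map(_._2).sum
      // Divide the total once by the count, then convert ms to s
      (service, ps.size, totalMillis.toDouble / ps.size / 1000.0)
    }.sortBy(_._3)
  }
}
```

Fed the four `#*#` lines of the excerpt, this returns HelloWorldService with 2 calls and a 31.993 s average, followed by HelloSUMService with 2 calls and a 286.993 s average.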
Reference link: http://m.fx114.net/qa-177-352127.aspx