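For a recurrence of the form T(n) = a*T(n/b) + c*n^k with T(1) = c, the following result holds (log_b(a) denotes the logarithm of a to base b):
if a > b^k, then T(n) = O(n^(log_b(a)));
if a = b^k, then T(n) = O(n^k * log n);
if a < b^k, then T(n) = O(n^k).
Here a = 25, b = 5, k = 2, so a = b^k and therefore T(n) = O(n^k * log n) = O(n^2 * log n).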
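The case analysis above can be checked mechanically. The following is a minimal sketch, not a general solver: the class name MasterTheorem and the helper names pow and solve are made up for illustration, and it only covers recurrences of exactly the form T(n) = a*T(n/b) + c*n^k with integer a, b and k.

```java
class MasterTheorem {
    // Integer power b^k, so that comparing a with b^k involves no floating-point rounding.
    static long pow(long b, int k) {
        long result = 1;
        for (int i = 0; i < k; i++) {
            result *= b;
        }
        return result;
    }

    // Returns the bound for T(n) = a*T(n/b) + c*n^k as a human-readable string.
    static String solve(long a, long b, int k) {
        long bk = pow(b, k);
        if (a > bk) {
            return "O(n^(log_" + b + " " + a + "))";
        }
        if (a == bk) {
            return "O(n^" + k + " * log n)";
        }
        return "O(n^" + k + ")";
    }

    public static void main(String[] args) {
        // The example from the text: a = 25, b = 5, k = 2.
        System.out.println(solve(25, 5, 2));
        // Two further cases: a > b^k and a < b^k.
        System.out.println(solve(8, 2, 2));
        System.out.println(solve(2, 2, 2));
    }
}
```

For a = 25, b = 5, k = 2 the second branch is taken, which matches the result derived by hand.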
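Regression analysis is a statistical method for determining the quantitative relationship of dependence between two or more variables, and it is very widely used. By the number of independent variables involved, it is divided into simple (one-variable) regression and multiple regression; by the type of relationship between the independent and dependent variables, it is divided into linear regression and nonlinear regression.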
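As an illustration of the simplest case (one independent variable, linear relationship), here is a minimal sketch of an ordinary least-squares fit. The class name SimpleLinearRegression, the method fit and the sample data are all made up for illustration.

```java
class SimpleLinearRegression {
    // Fits y = intercept + slope * x by ordinary least squares.
    // Returns a two-element array: {intercept, slope}.
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    public static void main(String[] args) {
        // Points lying exactly on y = 1 + 2x.
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        double[] line = fit(x, y);
        System.out.println("intercept=" + line[0] + ", slope=" + line[1]);
    }
}
```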
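AR model
The AR (AutoRegressive) model is a commonly used time series model. Its mathematical expression is
AR: y(t) = a1*y(t-1) + ... + an*y(t-n) + e(t), where e(t) is a white noise signal with zero mean and some fixed variance.
The AR model is a form of linear prediction: given N data points, the model can be used to infer data before or after the N-th point (say P points are inferred). In essence it resembles interpolation, since both aim to increase the amount of usable data. The difference is that the AR model recurses from N points, whereas interpolation derives many points from two (or a few) points, so the AR model generally gives better results than interpolation.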
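The recursion in the formula can be sketched as follows. This is only an illustration under stated assumptions: the coefficients a1..an are taken as already known (estimating them is not shown), the noise term e(t) is replaced by its mean 0, and the names ArPredictor and forecast are hypothetical.

```java
import java.util.Arrays;

class ArPredictor {
    // Extends a series by p points using y(t) = a1*y(t-1) + ... + an*y(t-n).
    // coeffs[i-1] holds a_i; history holds the known values, oldest first.
    // Each predicted value is fed back in to predict the next one.
    static double[] forecast(double[] coeffs, double[] history, int p) {
        int n = coeffs.length;
        if (history.length < n) {
            throw new IllegalArgumentException("need at least n known values");
        }
        double[] series = Arrays.copyOf(history, history.length + p);
        for (int t = history.length; t < series.length; t++) {
            double sum = 0;
            for (int i = 1; i <= n; i++) {
                sum += coeffs[i - 1] * series[t - i];
            }
            series[t] = sum;
        }
        // Return only the newly predicted points.
        return Arrays.copyOfRange(series, history.length, series.length);
    }

    public static void main(String[] args) {
        double[] coeffs = {0.5, 0.25};   // a1, a2 (made-up values)
        double[] history = {4.0, 8.0};   // y(t-2), y(t-1)
        System.out.println(Arrays.toString(forecast(coeffs, history, 2)));
    }
}
```

With these made-up values the first predicted point is 0.5*8 + 0.25*4 = 5, and the second is 0.5*5 + 0.25*8 = 4.5.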
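Suppose there is a C++ program boss.exe that is invoked as follows (the first argument is the input file, the second is the output file):
./boss.exe ADDRESS_BOOK_FILE NEW_ADDRESS_BOOK_FILE
We now need to launch boss.exe from a Hadoop Map function. Its input and output files both live in HDFS, with paths of the form:
hdfs://127.0.0.1:8020/user/donal/address1.txt or hdfs:///user/donal/address1.txt
Approach for the Map function:
1. Copy the input file from HDFS to the local file system
hadoop fs -copyToLocal /user/donal/address1.txt /tmp/address1.txt
2. Run the C++ program
./boss.exe /tmp/address1.txt /tmp/address2.txt
3. Copy the result file back to HDFS
hadoop fs -copyFromLocal /tmp/address2.txt /user/donal/address2.txt
The code is as follows: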
Map.java
import java.util.Arrays;

class Map {
    // Runs an external command, echoes its stdout and stderr,
    // and returns its exit code (-1 if it could not be run).
    public static int RunProcess(String[] args) {
        int exitcode = -1;
        System.out.println(Arrays.toString(args));
        try {
            Runtime runtime = Runtime.getRuntime();
            final Process process = runtime.exec(args);
            // any error message?
            new StreamGobbler(process.getErrorStream(), "ERROR").start();
            // any output?
            new StreamGobbler(process.getInputStream(), "OUTPUT").start();
            process.getOutputStream().close();
            exitcode = process.waitFor();
        } catch (Throwable t) {
            t.printStackTrace();
        }
        return exitcode;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Map ADDRESS_BOOK_FILE NEW_ADDRESS_BOOK_FILE");
            System.exit(-1);
        }
        String inFileName = args[0];
        String outFileName = args[1];
        // default to the given paths so that plain local files also work
        String localInFileName = inFileName;
        String localOutFileName = outFileName;
        try {
            if (args[0].startsWith("hdfs://")) {
                // drop "hdfs://" and the optional host:port, keep the absolute path
                inFileName = args[0].substring(args[0].indexOf('/', 7));
                localInFileName = "/tmp/" + inFileName.substring(inFileName.lastIndexOf('/') + 1);
                // copy the input file from HDFS
                RunProcess(new String[]{"/bin/sh", "-c", "/usr/lib/hadoop/bin/hadoop fs -copyToLocal " + inFileName + " " + localInFileName});
            }
            if (args[1].startsWith("hdfs://")) {
                outFileName = args[1].substring(args[1].indexOf('/', 7));
                localOutFileName = "/tmp/" + outFileName.substring(outFileName.lastIndexOf('/') + 1);
            }
            String[] commandArgs = {"./boss.exe", localInFileName, localOutFileName};
            int exitcode = RunProcess(commandArgs);
            if (args[1].startsWith("hdfs://")) {
                // copy the result file to HDFS
                RunProcess(new String[]{"/bin/sh", "-c", "/usr/lib/hadoop/bin/hadoop fs -copyFromLocal " + localOutFileName + " " + outFileName});
            }
            System.out.println("finish:" + exitcode);
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
StreamGobbler.java:
import java.io.*;

// Reads a stream line by line on its own thread and prints each line
// prefixed with the given type label.
class StreamGobbler extends Thread
{
    InputStream is;
    String type;

    StreamGobbler(InputStream is, String type)
    {
        this.is = is;
        this.type = type;
    }

    public void run()
    {
        try
        {
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader br = new BufferedReader(isr);
            String line = null;
            while ((line = br.readLine()) != null)
                System.out.println(type + ">" + line);
        } catch (IOException ioe)
        {
            ioe.printStackTrace();
        }
    }
}
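Notes:
1. The HDFS file copies could also be done with the Java API; this example calls shell commands instead.
2. StreamGobbler.java prints the child process's output, that is, its error stream and its input stream (the child's stderr and stdout).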
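Both HDFS path forms mentioned earlier (with and without host:port) can also be handled with the standard java.net.URI class rather than manual index arithmetic. The following is a minimal sketch: the class name HdfsPaths and its two helper methods are hypothetical and not part of the original program. The copy itself would still be done by the shell commands shown earlier, or by Hadoop's FileSystem API (copyToLocalFile / copyFromLocalFile), which is not shown here because it requires the Hadoop libraries.

```java
import java.net.URI;

class HdfsPaths {
    // Returns the absolute path inside HDFS for an hdfs:// URI,
    // or the argument unchanged if it is not an hdfs:// URI.
    static String toHdfsPath(String arg) {
        if (!arg.startsWith("hdfs://")) {
            return arg;
        }
        return URI.create(arg).getPath();
    }

    // Maps an HDFS path to a local file of the same name under /tmp.
    static String toLocalTempPath(String hdfsPath) {
        return "/tmp/" + hdfsPath.substring(hdfsPath.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String a = toHdfsPath("hdfs://127.0.0.1:8020/user/donal/address1.txt");
        String b = toHdfsPath("hdfs:///user/donal/address1.txt");
        System.out.println(a);
        System.out.println(b);
        System.out.println(toLocalTempPath(a));
    }
}
```

Both calls yield /user/donal/address1.txt, and the local copy is placed at /tmp/address1.txt, matching the commands in the three steps.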
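Having tried the protobuf C++ tutorial earlier, I next tried the Java tutorial and also wrote an Ant build.xml file.
Generate the .java file from addressbook.proto in the ./proto/ directory (see the tutorial for its contents) and place it under ./java/src:
$ protoc -I=./proto --java_out=./java/src ./proto/addressbook.proto
In the ./java/src directory
$ cd ./java/src
write Reader.java and Writer.java (see the tutorial for their contents).
In the ./java/ directory, write the build.xml file. For how to write build.xml, see here.
$ cd ..
$ cat build.xml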
<project name="ProtoTest" basedir="." default="jar">
    <property name="src.dir" value="src"/>
    <property name="build.dir" value="build"/>
    <property name="classes.dir" value="${build.dir}/classes"/>
    <property name="jar.dir" value="${build.dir}/jar"/>
    <property name="lib.dir" value="lib"/>
    <path id="classpath">
        <fileset dir="${lib.dir}" includes="**/*.jar"/>
    </path>
    <target name="clean">
        <delete dir="${build.dir}"/>
    </target>
    <target name="compile">
        <mkdir dir="${classes.dir}"/>
        <javac srcdir="${src.dir}" destdir="${classes.dir}" classpathref="classpath"/>
    </target>
    <target name="jar" depends="compile">
        <mkdir dir="${jar.dir}"/>
        <jar destfile="${jar.dir}/${ant.project.name}.jar" basedir="${classes.dir}">
        </jar>
    </target>
    <target name="clean-build" depends="clean,jar"/>
</project>
Build ProtoTest.jar
$ ant jar
Test the ListPeople and AddPerson classes
$ java -classpath ./build/jar/ProtoTest.jar:./lib/protobuf-java-2.4.1.jar AddPerson address.txt
Enter person ID number: 01
Enter name: donal
Enter email address (blank for none): donal0412@gmail.com
Enter a phone number (or leave blank to finish): 88236017
Is this a mobile, home, or work phone? work
Enter a phone number (or leave blank to finish):
$ java -classpath ./build/jar/ProtoTest.jar:./lib/protobuf-java-2.4.1.jar ListPeople address.txt
Person ID: 1
Name: donal
E-mail address: donal0412@gmail.com
Work phone #: 88236017
Test that the Java and C++ programs can communicate through a file
$ java -classpath ./build/jar/ProtoTest.jar:./lib/protobuf-java-2.4.1.jar ListPeople ../cpp/address.txt
Person ID: 1
Name: donal
E-mail address: donal0412@gmail.com
Work phone #: 88236017
$ ../cpp/reader address.txt
Person ID: 1
Name: donal
E-mail address: donal0412@gmail.com
Work phone #: 88236017
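It has been a long time since I wrote any C++ and I have forgotten a lot, so today I tried the protobuf tutorial and wrote a Makefile along the way:
Generate the .cc and .h files from addressbook.proto in the ./proto/ directory (see the tutorial for its contents) and place them under ./cpp/proto/:
$ protoc -I=./proto/ --cpp_out=./cpp/proto/ ./proto/addressbook.proto
In the ./cpp/ directory
$ cd ./cpp
write reader.cc and writer.cc (see the tutorial for their contents).
Write the Makefile. For the gcc compilation process and the special symbols used in makefiles, see here and here.
$ cat Makefile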
LIBS=-lprotobuf
# use g++ so that the C++ standard library is linked in
CXX=g++
ARGS=-Wall -c
.PHONY: all clean
all: reader writer
reader: reader.o addressbook.pb.o
	$(CXX) $^ $(LIBS) -o $@
writer: writer.o addressbook.pb.o
	$(CXX) $^ $(LIBS) -o $@
reader.o: reader.cc proto/addressbook.pb.h
	$(CXX) $(ARGS) $< -o $@
writer.o: writer.cc proto/addressbook.pb.h
	$(CXX) $(ARGS) $< -o $@
addressbook.pb.o: proto/addressbook.pb.cc proto/addressbook.pb.h
	$(CXX) $(ARGS) $< -o $@
clean:
	rm -f *.o
	rm -f reader writer
Build the reader and writer programs
$ make
Test writer and reader
$ ./writer address.txt
Enter person ID number: 01
Enter name: donal
Enter email address (blank for none): donal0412@gmail.com
Enter a phone number (or leave blank to finish): 88236017
Is this a mobile, home, or work phone? work
Enter a phone number (or leave blank to finish):
$ ./reader address.txt
Person ID: 1
Name: donal
E-mail address: donal0412@gmail.com
Work phone #: 88236017

