Processing Data with a Java Script

Contents

1. Workflow

2. Code


1. Workflow

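        Before extending Kettle's own classes, it is worth looking at the scripting steps in Spoon, which let you process data with Java code, JavaScript, and so on.

        Once the flow is configured, it looks as shown below.

        Double-clicking Main brings up a method; this is the method that processes each row of data.

        It contains reference samples, including one that sets a field value, as shown in the figure below.

        After running it, the result is as shown in the figure below: the data was indeed processed by the Java code.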

 

2. Code

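        In code you can define a Java step that executes the corresponding source. That source is the same processRow method shown in the GUI tool, which means row data can be processed through processRow.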

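import java.util.ArrayList;
import java.util.List;

import org.pentaho.di.core.plugins.PluginRegistry;
import org.pentaho.di.core.plugins.StepPluginType;
import org.pentaho.di.core.row.ValueMetaInterface;
import org.pentaho.di.trans.TransMeta;
import org.pentaho.di.trans.step.StepMeta;
import org.pentaho.di.trans.step.StepMetaInterface;
import org.pentaho.di.trans.steps.userdefinedjavaclass.UserDefinedJavaClassDef;
import org.pentaho.di.trans.steps.userdefinedjavaclass.UserDefinedJavaClassMeta;

/**
     * Builds the Java script step and adds it to the transformation.
     * @param transMeta the transformation the step is added to
     * @param registry the plugin registry used to look up the step's plugin id
     * @return the newly created Java script step
     */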
    private StepMeta getJavaStep(TransMeta transMeta, PluginRegistry registry){
        UserDefinedJavaClassMeta javaClassMeta = new UserDefinedJavaClassMeta();

        // Java source for the step: the processRow method shown in Spoon
        String sourceCode = "public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {\n" +
                "  if (first) {\n" +
                "    first = false;\n" +
                "\n" +
                "    /* TODO: Your code here. (Using info fields)\n" +
                "\n" +
                "    FieldHelper infoField = get(Fields.Info, \"info_field_name\");\n" +
                "\n" +
                "    RowSet infoStream = findInfoRowSet(\"info_stream_tag\");\n" +
                "\n" +
                "    Object[] infoRow = null;\n" +
                "\n" +
                "    int infoRowCount = 0;\n" +
                "\n" +
                "    // Read all rows from info step before calling getRow() method, which returns first row from any\n" +
                "    // input rowset. As rowMeta for info and input steps varies getRow() can lead to errors.\n" +
                "    while((infoRow = getRowFrom(infoStream)) != null){\n" +
                "\n" +
                "      // do something with info data\n" +
                "      infoRowCount++;\n" +
                "    }\n" +
                "    */\n" +
                "  }\n" +
                "\n" +
                "  Object[] r = getRow();\n" +
                "\n" +
                "  if (r == null) {\n" +
                "    setOutputDone();\n" +
                "    return false;\n" +
                "  }\n" +
                "\n" +
                "  // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large\n" +
                "  // enough to handle any new fields you are creating in this step.\n" +
                "  r = createOutputRow(r, data.outputRowMeta.size());\n" +
                "\n" +
                "  /* TODO: Your code here. (See Sample)\n" +
                "\n" +
                "  // Get the value from an input field\n" +
                "  String foobar = get(Fields.In, \"a_fieldname\").getString(r);\n" +
                "\n" +
                "  foobar += \"bar\";\n" +
                "    \n" +
                "  // Set a value in a new output field\n" +
                "  get(Fields.Out, \"output_fieldname\").setValue(r, foobar);\n" +
                "\n" +
                "  */\n" +
                "\tString name = get(Fields.In,\"name\").getString(r);\n" +
                "\tif(null!=name){\n" +
                "\t\tname = name+\"_new\";\n" +
                "\t}\n" +
                "\tget(Fields.Out,\"new_name\").setValue(r,name);\n" +
                "\n" +
                "  // Send the row on to the next step.\n" +
                "  putRow(data.outputRowMeta, r);\n" +
                "\n" +
                "  return true;\n" +
                "}";

        UserDefinedJavaClassDef classDef = new UserDefinedJavaClassDef(UserDefinedJavaClassDef.ClassType.TRANSFORM_CLASS,"Processor",sourceCode);

        List<UserDefinedJavaClassDef> classDefs = new ArrayList<>();
        classDefs.add(classDef);
        // Attach the Java script to the step
        javaClassMeta.replaceDefinitions(classDefs);

        List<UserDefinedJavaClassMeta.FieldInfo> fields = new ArrayList<>();

        // Define the target output field written by the script
        UserDefinedJavaClassMeta.FieldInfo fieldInfo =
                new UserDefinedJavaClassMeta.FieldInfo("new_name",ValueMetaInterface.TYPE_STRING,-1,-1);

        fields.add(fieldInfo);
        javaClassMeta.setFieldInfo(fields);


        String javaClassPluginId = registry.getPluginId(StepPluginType.class, javaClassMeta);
        StepMeta javaClassStep = new StepMeta(javaClassPluginId, "Java 代码", (StepMetaInterface) javaClassMeta);

        javaClassStep.setDraw(true);
        javaClassStep.setLocation(560,304);

        transMeta.addStep(javaClassStep);

        return javaClassStep;
    }
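
The custom part of the sourceCode string above (read name, append "_new", write new_name) can be tried out without Kettle. The following is only a minimal plain-Java sketch of that logic: RowLogicSketch, resizeRow and addNewName are hypothetical names, fields are addressed by array index rather than by field name, and resizeRow merely imitates what createOutputRow is used for in the script.

```java
import java.util.Arrays;

// Plain-Java stand-in for the row logic embedded in sourceCode.
// No Kettle classes are used; all names here are hypothetical.
class RowLogicSketch {

    // Imitates the purpose of createOutputRow(r, size): return a copy of the
    // row that is large enough to hold the newly added output field.
    static Object[] resizeRow(Object[] row, int newSize) {
        return Arrays.copyOf(row, Math.max(row.length, newSize));
    }

    // Reads the value at nameIndex, appends "_new" when it is not null,
    // and stores the result at newNameIndex in the widened row.
    static Object[] addNewName(Object[] row, int nameIndex, int newNameIndex) {
        Object[] out = resizeRow(row, newNameIndex + 1);
        String name = (String) out[nameIndex];
        if (name != null) {
            name = name + "_new";
        }
        out[newNameIndex] = name;
        return out;
    }

    public static void main(String[] args) {
        Object[] in = {1L, "alice"};
        // prints [1, alice, alice_new]
        System.out.println(Arrays.toString(addNewName(in, 1, 2)));
    }
}
```

As in the script, the original name field is left unchanged and a null name produces a null new_name.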

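        Let's start with TableInput and TableOutput, two commonly used Kettle components.

        Opening the source of both shows that each has a processRow method, which means the data handling for table input and table output happens there.

        So, can we extend TableInput and TableOutput and override processRow to define our own handling?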

        TableInput

public class TableInput extends BaseStep implements StepInterface {
	private TableInputMeta meta;
	private TableInputData data;
	
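	// Abridged excerpt; omitted parts are marked with "..."
	public boolean processRow( StepMetaInterface smi, StepDataInterface sdi ) throws KettleException {
		// ... (setup of parametersMeta and parameters omitted)

		// Run the table query
		boolean success = doQuery( parametersMeta, parameters );

		// Pass the current row on to the next step
		putRow( data.rowMeta, data.thisrow );

		// ... (remaining logic and return value omitted)
	}

	private boolean doQuery( RowMetaInterface parametersMeta, Object[] parameters ) throws KettleDatabaseException {
		// ... (body omitted)
	}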
}

        TableOutput

public class TableOutput extends BaseStep implements StepInterface {
	private TableOutputMeta meta;
	private TableOutputData data;
	
	  public boolean processRow( StepMetaInterface smi, StepDataInterface sdi ) throws KettleException {
			meta = (TableOutputMeta) smi;
			data = (TableOutputData) sdi;

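			// Read the next input row
			Object[] r = getRow();
			// ... (end-of-input check and first-row setup omitted)
			try {
				  // Write the row to the table
				  Object[] outputRowData = writeToTable( getInputRowMeta(), r );
				  if ( outputRowData != null ) {
					putRow( data.outputRowMeta, outputRowData ); // in case we want it go further...
				  }
			} catch ( KettleException e ) {
				// ... (error handling omitted)
			}
			// ... (return value omitted)
	  }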

}
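
The question raised above, whether processRow can be overridden in a subclass, can be pictured with stand-in classes before touching the real ones. This is only a sketch under stated assumptions: SketchStep is a hypothetical simplification and not Kettle's BaseStep, rows are bare Object[] arrays with the name in position 0, and the run loop just imitates the idea that a step's processRow is called repeatedly until it returns false.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a step; NOT org.pentaho.di.trans.step.BaseStep.
class SketchStep {
    private final Queue<Object[]> input = new ArrayDeque<>();
    final List<Object[]> output = new ArrayList<>();

    SketchStep(List<Object[]> rows) {
        input.addAll(rows);
    }

    // Returns the next input row, or null when the input is exhausted.
    Object[] getRow() {
        return input.poll();
    }

    // Hands a row to the "next step" (here simply a list).
    void putRow(Object[] row) {
        output.add(row);
    }

    // Default behaviour: pass each row through unchanged.
    // Returns false once there is nothing left to process.
    boolean processRow() {
        Object[] r = getRow();
        if (r == null) {
            return false;
        }
        putRow(r);
        return true;
    }
}

// Subclass that overrides processRow with its own handling.
class SuffixStep extends SketchStep {
    SuffixStep(List<Object[]> rows) {
        super(rows);
    }

    @Override
    boolean processRow() {
        Object[] r = getRow();
        if (r == null) {
            return false;
        }
        // Work on a copy so the caller's row is not modified.
        Object[] copy = Arrays.copyOf(r, r.length);
        if (copy[0] != null) {
            copy[0] = copy[0] + "_new";
        }
        putRow(copy);
        return true;
    }
}

class OverrideSketch {
    // Keeps calling processRow until it reports that no rows are left.
    static List<Object[]> run(SketchStep step) {
        while (step.processRow()) {
            // nothing to do here; the step does the work
        }
        return step.output;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[]{"alice"});
        rows.add(new Object[]{"bob"});
        for (Object[] r : run(new SuffixStep(rows))) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

The driver loop never needs to know which subclass it is running, which is the property that makes overriding processRow attractive.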

        Going forward, these two will serve as the examples for writing our own processing logic.

posted @ 2021-11-23 15:42  伟衙内