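A Pipeline is the component that serializes crawl results; it is responsible for storing what the crawler extracts. It works the same way as the Pipeline found in other crawler frameworks. As in those frameworks, VS ships with a default pipeline, ConsolePipeline, which simply prints the crawl results to the console: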

public class ConsolePipeline implements Pipeline {
    @Override
    public void saveItem(Collection<String> itemJson, Seed seed) {
        for (String str : itemJson) {
            System.out.println(str);
        }
    }
}

The other built-in pipeline is FilePipLine, which writes the results to a file:

public class FilePipLine implements Pipeline {
    private String filepath;

    private PrintWriter printWriter;

    public FilePipLine(String filepath) throws FileNotFoundException {
        this.filepath = filepath;
        printWriter = new PrintWriter(new FileOutputStream(new File(filepath)));
    }

    @Override
    public void saveItem(Collection<String> itemJson, Seed seed) {
        for (String str : itemJson) {
            printWriter.println(str);
        }
        printWriter.flush();
    }
}
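
Both built-in classes follow the same pattern, so a custom pipeline only needs to implement saveItem. The following is a minimal sketch and not part of VS: MemoryPipeline is a hypothetical name, and the Seed and Pipeline declarations are stand-ins that mirror the signature shown above so that the snippet compiles by itself. The synchronized list is a precaution based on the assumption that saveItem may be called from several crawler threads.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Stand-ins for the library types, declared only so this sketch compiles alone.
// In a real project, import Pipeline and Seed from VS instead.
class Seed {
}

interface Pipeline {
    void saveItem(Collection<String> itemJson, Seed seed);
}

// Hypothetical custom pipeline: keeps every result in memory, e.g. for tests.
class MemoryPipeline implements Pipeline {
    // Synchronized wrapper, assuming saveItem may be called concurrently.
    private final List<String> items =
            Collections.synchronizedList(new ArrayList<String>());

    @Override
    public void saveItem(Collection<String> itemJson, Seed seed) {
        items.addAll(itemJson);
    }

    // Returns a snapshot copy so callers cannot modify the internal list.
    public List<String> getItems() {
        synchronized (items) {
            return new ArrayList<String>(items);
        }
    }
}
```

A pipeline like this can be handy in unit tests, where the collected items are inspected after the crawl instead of being read back from a file.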

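Plugging a pipeline into VS is simple. The constructor argument is the path of the output file; Java does not expand a leading ~, so the home directory is resolved explicitly here, and the file name is only an example:

VSCrawlerBuilder.create().addPipeline(new FilePipLine(System.getProperty("user.home") + "/Desktop/result.txt")).build();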
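
Because FilePipLine opens the file in its constructor with FileOutputStream, the target must be a file inside an existing directory, otherwise a FileNotFoundException is thrown. One way to prepare such a path is sketched below; OutputPaths and prepare are hypothetical names used for illustration, not part of VS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of VS.
class OutputPaths {
    // Resolves fileName under baseDir, creates any missing directories,
    // and returns the absolute path of the file as a string.
    static String prepare(String baseDir, String fileName) throws IOException {
        Path file = Paths.get(baseDir, fileName).toAbsolutePath().normalize();
        Files.createDirectories(file.getParent());
        return file.toString();
    }
}

// Possible usage (the directory and file name are only examples):
//   String home = System.getProperty("user.home");
//   String out = OutputPaths.prepare(home + "/Desktop", "result.txt");
//   VSCrawlerBuilder.create().addPipeline(new FilePipLine(out)).build();
```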
