<p align="center">
<img src="https://www.spiderflow.org/images/logo.svg" width="600">
</p>
<p align="center">
<a target="_blank" href="https://www.oracle.com/technetwork/java/javase/downloads/index.html"><img src="https://img.shields.io/badge/JDK-1.8+-green.svg" /></a>
<a target="_blank" href="https://www.spiderflow.org"><img src="https://img.shields.io/badge/Docs-latest-blue.svg"/></a>
<a target="_blank" href="https://github.com/ssssssss-team/spider-flow/releases"><img src="https://img.shields.io/github/v/release/ssssssss-team/spider-flow?logo=github"></a>
<a target="_blank" href='https://gitee.com/ssssssss-team/spider-flow'><img src="https://gitee.com/ssssssss-team/spider-flow/badge/star.svg?theme=white" /></a>
<a target="_blank" href='https://github.com/ssssssss-team/spider-flow'><img src="https://img.shields.io/github/stars/ssssssss-team/spider-flow.svg?style=social"/></a>
<a target="_blank" href="LICENSE"><img src="https://img.shields.io/:license-MIT-blue.svg"></a>
<a target="_blank" href="https://shang.qq.com/wpa/qunwpa?idkey=10faa4cf9743e0aa379a72f2ad12a9e576c81462742143c8f3391b52e8c3ed8d"><img src="https://img.shields.io/badge/Join-QQGroup-blue"></a>
</p>

[Introduction](#介绍) | [Features](#特性) | [Plugins](#插件) | <a target="_blank" href="http://demo.spiderflow.org">Demo site</a> | <a target="_blank" href="https://www.spiderflow.org">Documentation</a> | <a target="_blank" href="https://www.spiderflow.org/changelog.html">Changelog</a> | [Screenshots](#项目部分截图) | [Other open-source projects](#其它开源项目) | [Disclaimer](#免责声明)

## Introduction <a name="介绍"></a>

The platform defines crawlers as flowcharts. It is a highly flexible, configurable crawler platform.

## Features <a name="特性"></a>

- [x] Supports XPath, JsonPath, CSS selectors, regular-expression extraction, and mixed extraction
- [x] Supports JSON, XML, and binary formats
- [x] Supports multiple data sources and the SQL operations select/selectInt/selectOne/insert/update/delete
- [x] Supports crawling pages rendered dynamically by JavaScript (or loaded via AJAX)
- [x] Supports proxies
- [x] Supports automatic saving to a database or file
- [x] Common functions for strings, dates, files, encryption and decryption, and more
- [x] Supports plugin extensions (custom executors, custom methods)
- [x] Task monitoring and task logs
- [x] Supports HTTP interfaces
- [x] Supports automatic cookie management
- [x] Supports custom functions
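
The extraction styles named in the first feature can be illustrated with a small sketch. This is not spider-flow's API: it uses only the Python standard library, and the sample documents and function names are hypothetical. `xml.etree.ElementTree` supports only a subset of XPath, and the JSON lookup merely mimics what a JsonPath expression such as `$.data.items[*].name` would select.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; none of this comes from spider-flow itself.
XML_DOC = """
<catalog>
  <book id="b1"><title>Flow Basics</title><price>12.50</price></book>
  <book id="b2"><title>Crawler Patterns</title><price>30.00</price></book>
</catalog>
"""
JSON_DOC = '{"data": {"items": [{"name": "alpha"}, {"name": "beta"}]}}'
TEXT_DOC = "Order #1024 shipped on 2020-03-28; order #1025 shipped on 2020-05-16."


def extract_titles(xml_text):
    """XPath-style extraction (ElementTree implements a limited XPath subset)."""
    root = ET.fromstring(xml_text.strip())
    return [node.text for node in root.findall("./book/title")]


def extract_names(json_text):
    """Path-style JSON extraction, similar in spirit to $.data.items[*].name."""
    payload = json.loads(json_text)
    return [item["name"] for item in payload["data"]["items"]]


def extract_dates(text):
    """Regular-expression extraction of ISO-style dates."""
    return re.findall(r"\d{4}-\d{2}-\d{2}", text)


def extract_expensive_ids(xml_text, threshold):
    """Mixed extraction: select nodes by path, then validate the text with a regex."""
    root = ET.fromstring(xml_text.strip())
    ids = []
    for book in root.findall("./book"):
        price_text = book.findtext("price", default="")
        # Accept only prices written as digits, a dot, and two decimals.
        if re.fullmatch(r"\d+\.\d{2}", price_text) and float(price_text) > threshold:
            ids.append(book.get("id"))
    return ids
```

Calling `extract_expensive_ids(XML_DOC, 20)` combines both steps: the path picks out each `book` element, and the regular expression guards the price text before it is compared with the threshold.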
## Plugins <a name="插件"></a>

- [x] [Selenium plugin](https://gitee.com/ssssssss-team/spider-flow-selenium)
- [x] [Redis plugin](https://gitee.com/ssssssss-team/spider-flow-redis)
- [x] [OSS plugin](https://gitee.com/ssssssss-team/spider-flow-oss)
- [x] [MongoDB plugin](https://gitee.com/ssssssss-team/spider-flow-mongodb)
- [x] [IP proxy pool plugin](https://gitee.com/ssssssss-team/spider-flow-proxypool)
- [x] [OCR plugin](https://gitee.com/ssssssss-team/spider-flow-ocr)
- [x] [Email plugin](https://gitee.com/ssssssss-team/spider-flow-mailbox)
## Screenshots <a name="项目部分截图"></a>

### Crawler list

![Crawler list](https://images.gitee.com/uploads/images/2020/0412/104521_e1eb3fbb_297689.png "list.png")

### Crawler test

![Crawler test](https://images.gitee.com/uploads/images/2020/0412/104659_b06dfbf0_297689.gif "test.gif")

### Debug

![Debug](https://images.gitee.com/uploads/images/2020/0412/104741_f9e1190e_297689.png "debug.png")

### Logs

![Logs](https://images.gitee.com/uploads/images/2020/0412/104800_a757f569_297689.png "logo.png")
## Other open-source projects <a name="其它开源项目"></a>

- [spider-flow-vue, the front end for spider-flow](https://gitee.com/ssssssss-team/spider-flow-vue)
- [magic-api, an XML-based framework that automatically maps definitions to HTTP interfaces](https://gitee.com/ssssssss-team/magic-api)
- [magic-api-spring-boot-starter](https://gitee.com/ssssssss-team/magic-api-spring-boot-starter)
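## Disclaimer <a name="免责声明"></a>

Do not use `spider-flow` for any work that may violate the law or ethical constraints. Please use `spider-flow` responsibly, comply with the robots exclusion protocol (robots.txt), and do not use `spider-flow` for any illegal purpose. By choosing to use `spider-flow`, you agree to abide by these terms. The authors accept no liability for any legal risk or loss arising from your violation of these terms; you bear all consequences.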
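
As a rough illustration of what complying with the robots protocol can look like in practice, the sketch below checks URLs against a robots.txt file with Python's standard `urllib.robotparser`. It does not show how spider-flow itself handles robots.txt. The robots.txt content, the `my-crawler` user agent, and the `example.com` URLs are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for url in ("https://example.com/public/page", "https://example.com/private/page"):
    verdict = "allowed" if is_allowed(parser, "my-crawler", url) else "blocked"
    print(url, verdict)
```

A crawler that follows this pattern would skip blocked URLs and could also wait between requests for the number of seconds returned by `parser.crawl_delay("my-crawler")`.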