Commit Graph

71 Commits

Author SHA1 Message Date
StefanYohansson
ef352b6030 facebook runner 2019-07-04 18:18:44 -03:00
StefanYohansson
5c3128cfef Adding events crawler 2019-06-29 22:01:04 -03:00
rugantio
be8a9c2f5f Adding profile crawler 2019-05-18 19:50:46 +02:00
rugantio
0cda010e64 Adding profile crawler 2019-05-18 19:50:33 +02:00
rugantio
dad1a19435 Merge branch 'master' of https://github.com/rugantio/fbcrawl 2019-05-11 15:23:54 +02:00
rugantio
d875e89c52 Adding features: crawl comments from page, crawl posts and comments from groups 2019-05-11 15:22:56 +02:00
Rugantio Costa
a0e90aa467 Merge pull request #25 from JoshuaKissoon/master: Fixed an issue where "Shar" was being returned as the number of comme… 2019-05-02 13:20:13 +02:00
Joshua Kissoon
f4900d6d29 Fixed an issue where "Shar" was being returned as the number of comments on posts without comments 2019-05-01 23:10:49 -04:00
rugantio
f6e8545236 [fb] restored reaction count "it"; cleanup old parse_date 2019-05-01 04:17:29 +02:00
rugantio
57060d7d71 [fb] restored reaction count "it"; cleanup old parse_date 2019-05-01 04:16:23 +02:00
rugantio
85d9b1af42 fix for issue #22 and #24, fix timestamp conversion, cleanup 2019-05-01 02:45:17 +02:00
rugantio
a117cae507 [fbcrawl] fixing date attribute parsing 2019-04-29 18:53:42 +02:00
rugantio
ea431c029c [fbcrawl] fixing date attribute parsing 2019-04-29 18:53:09 +02:00
rugantio
55dc799374 blocking mitigation 2019-04-25 23:41:33 +02:00
rugantio
4a379f3af4 added post_id column 2019-04-24 17:26:53 +02:00
rugantio
8baa108aab removing date pipeline 2019-04-23 08:22:48 +02:00
rugantio
c394575137 correct README for the new "date" attribute 2019-04-23 07:33:37 +02:00
rugantio
efda9a956e [fb] in items.py refactoring parse_date, introducing "date" attribute 2019-04-23 07:31:23 +02:00
rugantio
1acf5c2106 [comments.py] Added new source_url column 2019-04-23 04:18:44 +02:00
rugantio
3d32ab6054 [comments.py] Added new source_url column 2019-04-23 04:00:22 +02:00
rugantio
462cb0eff1 [comments.py] Added support for groups 2019-04-23 03:41:52 +02:00
rugantio
2d404a7667 docs for new spider 2019-02-18 18:51:52 +01:00
rugantio
dc1d0f29c0 refactoring comments spider 2019-02-18 07:18:34 +01:00
rugantio
069f64f61e refactoring comments spider 2019-02-18 05:09:21 +01:00
rugantio
f0cf9599e1 refactoring comments spider 2019-02-18 05:08:42 +01:00
rugantio
bd41255361 refactoring comments spider 2019-02-18 05:07:38 +01:00
rugantio
96d3423b8d Merge branch 'master' of https://github.com/rugantio/fbcrawl 2019-02-18 02:14:01 +01:00
rugantio
b3d12c4e6b refactoring comments spider 2019-02-18 02:12:52 +01:00
Rugantio Costa
98642cffd8 Update README.md 2019-02-05 04:49:57 +01:00
rugantio
bdeae9f4b5 parse_page refactoring complete 2019-02-05 03:48:00 +01:00
rugantio
71f80356dc fixed attribute parsing 2019-02-04 20:27:26 +01:00
rugantio
811f4e396d fixed attribute parsing 2019-02-04 20:25:54 +01:00
rugantio
d28d214993 fixed recursion on pages 2019-02-04 19:27:44 +01:00
rugantio
dafd01c8bd fixed recursion on pages 2019-02-04 19:26:00 +01:00
rugantio
918cd9ce64 added new features, simplified presentation 2019-01-31 07:28:08 +01:00
rugantio
a9982865d9 improved support for languages en, es, fr, it, pt 2019-01-31 06:54:31 +01:00
rugantio
fb32a4213e added experimental support for languages en, es, fr, it, pt 2019-01-30 20:34:25 +01:00
rugantio
9de51e0ce8 steady recursion implemented 2019-01-30 17:30:18 +01:00
rugantio
b24fc61dbb steady recursion implemented 2019-01-30 17:21:43 +01:00
rugantio
eaaa2a32e3 cleaning up 2019-01-29 22:27:55 +01:00
rugantio
17430a06a9 fixed datetime parser 2019-01-29 22:26:01 +01:00
rugantio
b06883dc3c fixed gitignore 2019-01-29 22:08:37 +01:00
rugantio
8d343b057c gitignore added 2019-01-29 21:57:48 +01:00
rugantio
b8d8444b3f fix xpath in comment crawler 2019-01-26 02:52:26 +01:00
rugantio
80a10f176f fix xpath in comment crawler 2019-01-26 02:51:18 +01:00
Rugantio Costa
0c0f3129cd Update README.md 2018-12-27 02:20:46 +01:00
Rugantio Costa
e95c70c844 Update README.md 2018-12-27 01:47:29 +01:00
rugantio
c04509499f changed user-agent and fixed date parsers in items.py 2018-12-13 05:33:09 +01:00
rugantio
30c04c2fca changed user-agent and fixed date parsers in items.py 2018-12-13 05:31:18 +01:00
rugantio
168bc2c510 disabling pipeline 2018-11-22 21:22:18 +01:00