Commit Graph

7 Commits

Author SHA1 Message Date
attardi
62bdbe6106 Upgrade to Python 3.3+. 2020-07-22 14:12:37 +02:00
attardi
93cbcdb9df Merge branch 'add_extra_fields_to_cirrus_output' of https://github.com/nathj07/wikiextractor into nathj07-add_extra_fields_to_cirrus_output 2019-04-13 12:36:05 +02:00
Nathan Davies
1e4236de42 extract language and revision from cirrus search
This simple push extracts the language and the page revision. These are then added to the XML.
2019-03-25 14:28:43 +00:00
Karl Stratos
f9d57324c2 minimized complexity 2018-03-22 16:10:12 -05:00
Nathan Davies
663a3dea73 tidying up some of the code and adding comments. 2017-01-31 16:52:59 -08:00
Nathan Davies
e835e8c004 Added new flags
--discard_elements - allowing you to customise which elements are discarded
--ignored_tags - allowing you to customise which tags are ignored
--keep_tables - allowing the contents of tables in the original article to be retained. This does not render HTML tables
2017-01-23 14:18:21 +00:00
orangain
8749df0a81 Add .gitignore for python 2016-06-18 13:44:45 +09:00