Commit Graph

  • bb68ed7360 Update README.md HjalmarrSv 2019-12-29 12:56:06 +0100
  • 49ad988149 About cleaning HjalmarrSv 2019-12-29 12:54:51 +0100
  • 67a0ce7a15 Update WikiExtractor.py HjalmarrSv 2019-12-29 10:41:32 +0100
  • 7198d0a736 temporarily change --no-title to -no-title HjalmarrSv 2019-12-29 00:57:55 +0100
  • dc56429a41 --no-templates as alias for --no_templates HjalmarrSv 2019-12-28 22:43:15 +0100
  • 8378de6bdd --no-title should work as --titlefree HjalmarrSv 2019-12-28 22:28:12 +0100
  • 22da7ea967 --no-title as alias for --titlefree HjalmarrSv 2019-12-28 22:24:03 +0100
  • 866e073d53 Update README.md HjalmarrSv 2019-12-23 10:15:53 +0100
  • de81a98a2d Update HjalmarrSv 2019-12-23 10:14:19 +0100
  • 7b2abd40ce Changed --spacefree to --squeeze-blank (same as in cat --squeeze-blank ) + magic word HjalmarrSv 2019-12-20 19:51:59 +0100
  • ef9c07a8ce -- spacefree now also removes line = " " HjalmarrSv 2019-12-14 09:07:46 +0100
  • 3b6a710768 Added 2 examples HjalmarrSv 2019-12-13 12:58:38 +0100
  • 8d10750148 Added Example HjalmarrSv 2019-12-13 12:33:55 +0100
  • eeda9548ca works but errors may exist as well as quirks HjalmarrSv 2019-12-13 12:08:11 +0100
  • d2e6330976 fixed added errors HjalmarrSv 2019-12-13 11:40:00 +0100
  • 5870040640 Update WikiExtractor.py HjalmarrSv 2019-12-13 02:47:18 +0100
  • 59e597cdb5 Update WikiExtractor.py HjalmarrSv 2019-12-13 02:12:46 +0100
  • 053fde39dd Update WikiExtractor.py HjalmarrSv 2019-12-13 01:47:19 +0100
  • e2d332eb23 Compacted Headers HjalmarrSv 2019-12-13 01:28:29 +0100
  • 124bcd6a42 Added option to include section title hierarchy Michał Bieroński 2019-10-28 15:50:50 +0100
  • 1305cf6746 add --category_surface Zae Myung Kim 2019-10-03 22:40:27 +0900
  • 511e7f13f1 move category extraction, usr argparse, exclude sortkey Zae Myung Kim 2019-10-03 16:26:49 +0900
  • 1ebc7701e3 extract page categories Zae Myung Kim 2019-10-03 15:34:29 +0900
  • fc48a575f9 Add --no-templates as an alias for --no_templates operator.name 2019-08-14 23:21:57 +0100
  • 7ab12f0606 Update syntax NICK ULVEN 2019-07-31 16:01:42 -0400
  • ff9a70cd6d Force 'utf-8' encoding without relying on platform-dependent default Albert Villanova del Moral 2019-07-13 18:21:43 +0200
  • 45662a5c91 Remove inline flags from the middle of a regex Albert Villanova del Moral 2019-07-13 12:21:03 +0200
  • f5942cdd05 removed 'import zip' from extract.py NICK ULVEN 2019-07-11 13:42:11 -0400
  • 41ae5246f1 modifying README.md mouhamadaboshokor 2019-07-09 14:26:07 +0300
  • 52f69e3263 pypi files mouhamadaboshokor 2019-07-09 14:18:14 +0300
  • daaee647eb pypi files mouhamadaboshokor 2019-07-09 13:55:18 +0300
  • fa60ec3f8e Merge branch 'master' into master Necmettin Çarkacı 2019-07-04 15:18:53 +0300
  • f664192bfa Updated README NICK ULVEN 2019-06-20 14:07:29 -0400
  • 557ef5ecf6 Updated library names NICK ULVEN 2019-06-20 12:42:14 -0400
  • 3162bb6c3c Merge pull request #137 from AriesLL/master Giuseppe Attardi 2019-04-13 12:41:15 +0200
  • 29e3a932dd Merge pull request #134 from dvzubarev/fix-crash Giuseppe Attardi 2019-04-13 12:40:09 +0200
  • f859630a20 Merge branch 'master' into fix-crash Giuseppe Attardi 2019-04-13 12:39:41 +0200
  • 57a75c5f0a git push origin masterMerge branch 'nathj07-add_extra_fields_to_cirrus_output' attardi 2019-04-13 12:37:17 +0200
  • 93cbcdb9df Merge branch 'add_extra_fields_to_cirrus_output' of https://github.com/nathj07/wikiextractor into nathj07-add_extra_fields_to_cirrus_output attardi 2019-04-13 12:36:05 +0200
  • baa4794842 Merge branch 'zwChan-master' attardi 2019-04-13 12:22:59 +0200
  • 45c2212f64 Merge branch 'master' of https://github.com/zwChan/wikiextractor into zwChan-master attardi 2019-04-13 12:19:36 +0200
  • 5bf4df62fa Merge pull request #143 from danduma/master Giuseppe Attardi 2019-04-13 11:43:09 +0200
  • 275dcc9ac5 Merge pull request #152 from karlstratos/master Giuseppe Attardi 2019-04-13 11:42:02 +0200
  • 1e4236de42 extract language and revion from cirrus search Nathan Davies 2019-03-25 14:28:43 +0000
  • f9d57324c2 minimized complexity Karl Stratos 2018-03-22 16:10:12 -0500
  • ecc7cef402 do not include title in text Karl Stratos 2018-03-22 12:51:47 -0500
  • e689ef3233 bash scripts for extraction commands Karl 2018-03-22 09:54:34 -0500
  • 4ba4e9f683 Augmented disambig regex to catch disambiguation pages marked by the switch __DISAMBIG__. Augmented key regex to catch plus/minus signs. Karl Stratos 2018-03-17 09:10:40 -0700
  • 5ab5595d68 Merge 182a696a6a into 2a5e6aebc0 Olga Gureenkova 2018-02-15 12:33:59 +0000
  • 182a696a6a Fix broken disambiguation articles filtering Olga 2018-02-15 15:31:18 +0300
  • 5f59d5081b Merge eaef05232c into 2a5e6aebc0 Olga Gureenkova 2018-02-15 12:23:03 +0000
  • eaef05232c Merge branch 'fix_disambig' of https://github.com/my-master/wikiextractor into fix_disambig Olga 2018-02-15 15:22:27 +0300
  • ea46ae6802 Update regexp for disambig articles Olga 2018-02-15 15:21:52 +0300
  • fe24c8c1aa Fix pep8 style changes in prev commit Olga 2018-02-15 15:16:07 +0300
  • 5b9e2f9517 Update regexp for disambig articles Olga 2018-02-15 15:13:46 +0300
  • 533cac9001 Fix broken disambiguation filtering Olga 2018-02-15 14:30:03 +0300
  • cc7be51052 Merge 2656a1ea41 into 2a5e6aebc0 bz-intellimind 2018-01-12 07:50:26 +0000
  • 2656a1ea41 modified wikiextractor.py bz-intellimind 2018-01-11 23:48:03 -0800
  • 45e56d4e9e Update WikiExtractor.py Daniel 2017-11-08 14:28:36 +0000
  • 97e4d66c21 Store each article in exactly one file Necmettin Çarkacı 2017-09-13 08:23:01 +0300
  • 647d65d50e Merge 537ab9b4ef into 2a5e6aebc0 Versus 2017-08-16 10:23:41 +0000
  • 537ab9b4ef Handle broken pipe and keyboard interrupts Versus Void 2017-08-15 22:06:20 +0000
  • 209e2b422f change argument parser for no_templates Peipei Zhou 2017-08-10 14:51:54 -0700
  • 24db54b2c8 Fix crash on entry without namespace attribute. denin 2017-05-23 15:24:10 +0300
  • 169eaaf208 remove noisy print Zhiwei Chen 2017-04-29 12:53:19 -0400
  • e249508255 log categories statistics info Zhiwei Chen 2017-04-29 12:50:47 -0400
  • 397a92894b filter_categories use depth 4 under Health Zhiwei Chen 2017-04-29 12:44:13 -0400
  • 5274829e16 print friendly error msg Zhiwei Chen 2017-04-28 14:57:54 -0400
  • cc04dae71c log save to file; log page statistic info; Zhiwei Chen 2017-04-28 12:36:46 -0400
  • b8323a8efc encoding fix root 2017-04-28 02:04:29 -0400
  • 1f76fd9473 encoding fix Zhiwei Chen 2017-04-28 01:53:46 -0400
  • ef0af20178 fix category not utf8 error Zhiwei Chen 2017-04-28 01:42:21 -0400
  • 52ed1ef9ae fix category not utf8 error Zhiwei Chen 2017-04-28 01:23:31 -0400
  • 7903b739f5 fix category not utf8 error Zhiwei Chen 2017-04-28 01:17:45 -0400
  • 8e92f464cf add readme Zhiwei Chen 2017-04-27 20:15:17 -0400
  • 9cf2a2a883 add feature filtering by category of wiki Zhiwei Chen 2017-04-27 19:57:41 -0400
  • da5e626d9e Merge 56c37d69c2 into 2a5e6aebc0 Elias Zervudakis 2017-03-13 10:40:19 +0000
  • 1fa9815695 Merge b78a6934ee into 2a5e6aebc0 Eropi4 2017-03-13 10:40:19 +0000
  • 2a5e6aebc0 Merge pull request #119 from BrenBarn/compact-lists Giuseppe Attardi 2017-03-08 12:10:04 +0100
  • 674e9a0264 Fix problems that occurred when a list was the first thing in a section. BrenBarn 2017-03-08 01:01:31 -0800
  • 05cbe1502d Merge pull request #113 from nkruglikov/master Giuseppe Attardi 2017-03-04 13:26:26 +0100
  • ca93c03d87 Merge 0d616bef6d into 5414b7fda8 Michal Švamberg 2017-03-04 09:26:51 +0000
  • 5414b7fda8 Completed module String attardi 2017-03-04 04:22:30 +0100
  • c9432abcd0 Define #ifexists attardi 2017-03-03 19:44:48 +0100
  • 3ea2da809b Fix for empty templates. attardi 2017-03-03 18:52:17 +0100
  • aa6f567935 Update README.md Nikolai Kruglikov 2017-03-03 18:56:20 +0300
  • 0d616bef6d Better syntax about conditions root 2017-03-02 11:14:16 +0100
  • 8fd8da77f4 Updated version number. attardi 2017-03-02 05:58:05 +0100
  • 841576ec09 Fix URI escape syntax in href attribute root 2017-03-01 20:13:18 +0100
  • bc7823191c README.md: long lines split by newline Michal Svamberg 2017-03-01 17:08:56 +0100
  • 91130565d6 README.md: better format of usage Michal Svamberg 2017-03-01 17:04:51 +0100
  • defea1ac35 Fix typo in README.md Michal Svamberg 2017-03-01 16:59:09 +0100
  • f5d4237fbc Add numbered list in example usage in README.md Michal Svamberg 2017-03-01 16:55:15 +0100
  • 5d6d25d993 Fix formatting of README.md Michal Svamberg 2017-03-01 16:53:01 +0100
  • bddbcdc04c Changes for using makehtmlfiles.pl Michal Svamberg 2017-03-01 16:49:05 +0100
  • e8fdf5ca5f Add makehtmlfiles.pl script for offline browsing Michal Svamberg 2017-03-01 16:24:33 +0100
  • e3edc0c352 Merge pull request #108 from BrenBarn/globals-cleanup Giuseppe Attardi 2017-02-27 02:08:09 +0100
  • e7bb889e0e Removed some old comments BrenBarn 2017-02-26 12:41:58 -0800
  • ff51a19a1d Change to NextFile test so it will pass on Windows (use os.path.sep instead of /) BrenBarn 2017-02-26 12:02:11 -0800
  • 19d358eee8 Factor all info that needs to be passed to subprocesses into "options" variable BrenBarn 2017-02-26 11:49:00 -0800