Commit Graph

69 Commits

Author SHA1 Message Date
Giuseppe Attardi
113dab796c Fixed. 2015-11-20 00:06:23 +01:00
Giuseppe Attardi
e5720f5c52 See ChangeLog. 2015-11-20 00:04:59 +01:00
Giuseppe Attardi
9229e50bb3 Update README.md 2015-10-25 17:03:17 +01:00
orangain
02f9561100 Dropped redundant global declarations. 2015-10-17 11:48:16 +02:00
orangain
09a968809e Compliance to PEP 8. 2015-10-17 11:34:24 +02:00
Giuseppe Attardi
d7fc4788f4 See ChangeLog. 2015-09-29 15:31:19 +02:00
Giuseppe Attardi
90d1c1ebcf See ChangeLog. 2015-09-14 20:24:10 +02:00
Giuseppe Attardi
ecd24f3fc6 See ChangeLog. 2015-09-14 18:05:36 +02:00
Giuseppe Attardi
b8cd2574e0 See ChangeLog. 2015-08-30 21:52:02 +02:00
Giuseppe Attardi
bebdc8c899 Removed extra logging.debug. 2015-08-30 21:23:15 +02:00
Giuseppe Attardi
70956025f1 Minor fix contribution. 2015-08-30 21:18:17 +02:00
Giuseppe Attardi
d5b354597f See ChangeLog 2015-08-30 21:17:26 +02:00
Giuseppe Attardi
b7e676e1f5 Merge pull request #31 from orangain/fix-progress-report
Fix progress report
2015-08-13 10:25:28 +02:00
orangain
3cfa6dcee8 Fix progress report
Reported count and rate of processing were wrong:

* Reported number of extracted articles was fewer than the true value by 1.
* Reported rate of processing was completely different from the true value.
2015-08-13 01:11:57 +09:00
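The commit message above does not say what caused the wrong count and rate, so the following is only a sketch of one way to keep both values consistent. It assumes a 1-based counter and a rate computed as total count over total elapsed time. `process_all`, `report_every`, `clock`, and `log` are hypothetical names for illustration and are not taken from the extractor.

```python
import time


def process_all(pages, report_every=2, clock=time.monotonic, log=print):
    """Process pages, logging progress; return the number of pages processed.

    Hypothetical helper: the real extractor's reporting code may differ.
    """
    start = clock()
    count = 0
    # enumerate(..., start=1) makes `count` include the page just processed,
    # which avoids reporting one article fewer than the true value.
    for count, page in enumerate(pages, start=1):
        # ... extraction work for `page` would go here ...
        if count % report_every == 0:
            elapsed = clock() - start
            # Rate over the whole run: articles processed / seconds elapsed.
            rate = count / elapsed if elapsed > 0 else float("inf")
            log("Extracted %d articles (%.1f art/s)" % (count, rate))
    return count
```

Passing `clock` and `log` as parameters is a design choice made here so the reporting can be checked with a fake clock; it is not something the commit describes.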
Giuseppe Attardi
5057c130cc Merge pull request #29 from Munzey/master
fix for #28 - discardElement tags should be case insensitive
2015-06-24 18:20:36 +02:00
tristan
7a1b552b0c fix for #28 - discardElement tags should be case insensitive 2015-06-24 17:55:20 +02:00
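As a rough illustration of the case-insensitivity fix named in this commit, the sketch below removes discarded elements regardless of how the tag name is capitalized. The element list, `discard_pattern`, and `drop_discarded` are hypothetical and only stand in for whatever the extractor actually does.

```python
import re

# Hypothetical list of element names whose content should be dropped.
discard_elements = ["gallery", "timeline"]


def discard_pattern(name):
    """Build a regex matching <name ...>...</name> in any letter case."""
    escaped = re.escape(name)
    return re.compile(
        r"<\s*%s\b[^>]*>.*?<\s*/\s*%s\s*>" % (escaped, escaped),
        re.DOTALL | re.IGNORECASE,  # IGNORECASE is the point of the fix
    )


def drop_discarded(text):
    """Remove every discarded element, whatever the case of its tags."""
    for name in discard_elements:
        text = discard_pattern(name).sub("", text)
    return text
```

For example, `drop_discarded("a<Gallery>x</GALLERY>b")` would strip the mixed-case element, which a case-sensitive pattern would leave in place.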
Giuseppe Attardi
d8a15dd0ba Merge pull request #27 from gojomo/multiprocessing
multiprocess speedup; stdin/stdout/single-file options; stable ordering; sparser progress logging
2015-06-20 08:58:10 +02:00
Gordon Mohr
55beb4a426 restore default section-handling 2015-06-19 18:06:34 -07:00
Gordon Mohr
d420d729e7 more summary/timing logging; less bulk/repeat logging 2015-06-19 17:42:21 -07:00
Gordon Mohr
5b647e2249 stable ordering; skip dups; accept compressed template-file 2015-06-19 03:15:45 -07:00
Gordon Mohr
190aae11a1 up processes default to # cores 2015-06-18 14:52:20 -07:00
Gordon Mohr
e3515e2ecf single-file; stdout; dir/multi-file 2015-06-18 14:49:03 -07:00
Gordon Mohr
5d32701400 messy 1st approach 2015-06-17 18:49:26 -07:00
Giuseppe Attardi
694cd5a7f4 Merge pull request #25 from dragoon/patch-2
multiline tag match fix
2015-06-14 09:12:30 +02:00
Roman Prokofyev
70ee947a8b multiline tag match fix
need to add re.DOTALL so that multiline tag definitions are also matched
2015-06-12 14:23:14 +02:00
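The effect of `re.DOTALL` described in this commit can be seen in a small sketch. The pattern below is a simplified stand-in, not the extractor's actual regex: without the flag, `.` does not match a newline, so a tag whose definition spans several lines is missed.

```python
import re

# A tag definition split across lines, as can occur in wiki markup.
multiline_tag = '<ref\nname="x"\n>'

# Simplified, hypothetical pattern compiled with and without re.DOTALL.
single_line = re.compile(r"<ref.*?>")
dot_all = re.compile(r"<ref.*?>", re.DOTALL)

missed = single_line.search(multiline_tag)   # None: '.' stops at the newline
found = dot_all.search(multiline_tag)        # matches the whole tag
```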
Giuseppe Attardi
625d4b69b3 Merge pull request #24 from dragoon/patch-1
Fix regex for <ref> tag when it's not self-closing
2015-06-10 18:40:48 +02:00
Roman Prokofyev
b99aaf19aa Fix regex for <ref> tag when it's not self-closing
In some articles the <ref> tag appears like this:

    <ref name="Ahmed Rashid/The Telegraph">{{cite

The previous regex breaks when it sees the forward slash ("Rashid/The"). The new regex stops at the earliest occurrence of the closing bracket, so there is no need to pre-filter characters.
2015-06-10 15:40:27 +02:00
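The difference described in this commit can be sketched with two simplified patterns. Both are assumptions made for illustration and are not the regexes from the actual change: `filtering` excludes `/` from the attribute text, while `lazy` simply stops at the first closing bracket.

```python
import re

text = '<ref name="Ahmed Rashid/The Telegraph">{{cite'

# Hypothetical old-style pattern: pre-filters '/' (to skip self-closing tags),
# so it cannot get past the slash inside the attribute value.
filtering = re.compile(r"<ref\b[^/>]*>")

# Hypothetical new-style pattern: lazy match up to the earliest '>'.
lazy = re.compile(r"<ref\b.*?>")

old_match = filtering.match(text)   # None: blocked by "Rashid/The"
new_match = lazy.match(text)        # matches the full opening tag
```

Note that the lazy pattern on its own would also match a self-closing `<ref ... />`, so that case would need to be handled separately.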
Giuseppe Attardi
c15d93c40a See ChangeLog. 2015-06-03 00:06:35 +02:00
Giuseppe Attardi
147e36df5b See ChangeLog. 2015-06-03 00:01:45 +02:00
Giuseppe Attardi
5b0d88a16c See ChangeLog. 2015-05-29 20:52:27 +02:00
Giuseppe Attardi
f041f9143f Merge branch 'master' of https://github.com/attardi/wikiextractor 2015-05-06 16:09:12 +02:00
Giuseppe Attardi
d5cca5da43 See ChangeLog. 2015-05-06 16:08:27 +02:00
Giuseppe Attardi
45b4658e72 Update README.md 2015-04-26 08:57:25 +02:00
Giuseppe Attardi
b44b750056 Fixes for Chinese. 2015-04-26 08:47:53 +02:00
Giuseppe Attardi
f56f44caee Handle Chinese characters in #expr. 2015-04-25 11:52:01 +02:00
Giuseppe Attardi
3524141cef Set UTF-8 as default. 2015-04-23 12:35:39 +02:00
Giuseppe Attardi
af68c87b3b See ChangeLog. 2015-04-22 17:07:08 +02:00
Giuseppe Attardi
55ac23ebe6 Fix to replaceInternalLinks. 2015-04-22 13:41:39 +02:00
Giuseppe Attardi
b949bdad5a Merge branch 'master' of https://github.com/attardi/wikiextractor
Edit done on site.
2015-04-22 12:43:24 +02:00
Giuseppe Attardi
363dd47666 See ChangeLog. 2015-04-22 12:42:42 +02:00
Giuseppe Attardi
4ecf09470d Update README.md 2015-04-21 17:27:34 +02:00
Giuseppe Attardi
094e601327 Update README.md 2015-04-21 17:22:41 +02:00
Giuseppe Attardi
4e07a6c149 See ChangeLog. 2015-04-20 21:19:05 +02:00
Giuseppe Attardi
2f35bcf9e0 See ChangeLog. 2015-04-20 07:14:29 +02:00
Giuseppe Attardi
cae955eb91 See ChangeLog. 2015-04-20 06:56:29 +02:00
Giuseppe Attardi
4c90d10860 See ChangeLog. 2015-04-20 06:19:32 +02:00
Giuseppe Attardi
66caade3ad See ChangeLog. 2015-04-19 13:17:48 +02:00
Giuseppe Attardi
858670beeb See ChangeLog. 2015-04-19 11:32:39 +02:00
Giuseppe Attardi
1485e2b5bc See ChangeLog. 2015-04-19 00:18:48 +02:00
Giuseppe Attardi
54814eb806 See ChangeLog. 2015-04-17 00:37:30 +02:00