attardi
d167742d16
See ChangeLog.
2016-08-29 23:34:47 +02:00
attardi
5cb7da320e
See ChangeLog.
2016-08-19 14:11:37 +02:00
attardi
2942d1e19d
See ChangeLog.
2016-08-11 10:34:21 +02:00
Giuseppe Attardi
419fe97d7a
Merge pull request #67 from sethcleveland/remove_python2_string_encoding
...
Remove python2 extract utf8 encoding and log extract exceptions
2016-06-24 09:03:11 +02:00
Seth Cleveland
eacccbc6eb
Remove python2 extract utf8 encoding and log extract exceptions
2016-06-20 10:39:01 -05:00
attardi
0f703c0aae
Merged PR from Seth Cleveland.
2016-06-19 13:10:36 +02:00
attardi
f9b8e8ac02
Added support for Python 3.
2016-06-19 12:53:31 +02:00
Giuseppe Attardi
aee387b566
Merge pull request #66 from orangain/support-python3
...
Support Python 3
2016-06-19 12:46:52 +02:00
orangain
3ccd368aa6
Update README.md about Python 2/3 support
2016-06-18 13:44:45 +09:00
orangain
cb7d42d10f
Add tox.ini
2016-06-18 13:44:45 +09:00
orangain
7b21d10ccd
Seek StringIO to position 0 on every output
...
truncate(0) does not guarantee that the position is seeked to 0.
2016-06-18 13:44:45 +09:00
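The behaviour this commit works around can be shown with a minimal sketch. It assumes an `io.StringIO` buffer reused across outputs; `reset_buffer` is a hypothetical helper name and is not taken from the repository.

```python
import io


def reset_buffer(buf):
    """Empty a StringIO so it can be reused (hypothetical helper)."""
    # truncate(0) discards the content but leaves the stream position
    # where it was, so seek back to the start explicitly.
    buf.seek(0)
    buf.truncate(0)
    return buf


buf = io.StringIO()
buf.write("first page")
reset_buffer(buf)
buf.write("second")
reused = buf.getvalue()
```

Without the `seek(0)`, the next write would start at the old position rather than at the beginning of the emptied buffer.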
orangain
b19e341ce2
Use text type for page text and encode it when writing to a file
2016-06-18 13:44:45 +09:00
orangain
6851fe4b3f
Use // instead of / for integer division
...
Add `division` to future import.
2016-06-18 13:44:45 +09:00
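A small sketch of why `//` matters here. The function name `dir_name` and the two-letter naming scheme are assumptions made for illustration, not code copied from the repository.

```python
# On Python 2, putting `from __future__ import division` at the top of the
# module makes `/` behave as on Python 3 (true division), so integer
# results must be requested explicitly with `//`.


def dir_name(dir_index):
    """Map an index to a two-letter directory name (hypothetical scheme)."""
    first = dir_index // 26 % 26   # floor division keeps this an int
    second = dir_index % 26
    return chr(ord("A") + first) + chr(ord("A") + second)


true_quotient = 7 / 2     # true division
floor_quotient = 7 // 2   # floor division
names = [dir_name(0), dir_name(26), dir_name(27)]
```

With `/` in place of `//`, `first` would be a float under true division and `chr(ord("A") + first)` would raise a `TypeError`.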
orangain
9322b7ba54
Make imports Python 2/3 compatible
...
* Use `from __future__ import unicode_literals` and replace `u''` literals
with `''`.
* Use `io.StringIO` instead of `cStringIO.StringIO` for Py2/3 compatibility.
* Define a const `PY2` which is True in Python 2 but False in Python 3.
* Import `quote` and `name2codepoint` from different modules between
Python 2 and 3.
* Use Python 3's name in Python 2 for `zip`, `zip_longest`, `range` and `chr`.
* Use `text_type` as a type for `unicode` in Python 2 and `str` in Python 3.
* Use `sorted()` to sort dict's `items()`.
* Implement `__next__` in NextFile and call next() built-in function.
2016-06-18 13:44:45 +09:00
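The import-related bullets above could look roughly like the following. This is a sketch under the assumption of a single `PY2` switch as the commit message describes; it is not the repository's actual code, and it omits the `range` and `chr` aliases.

```python
import sys

# True on Python 2, False on Python 3, as described in the commit message.
PY2 = sys.version_info[0] == 2

if PY2:
    from urllib import quote
    from htmlentitydefs import name2codepoint
    from itertools import izip_longest as zip_longest
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)
else:
    from urllib.parse import quote
    from html.entities import name2codepoint
    from itertools import zip_longest
    text_type = str

quoted = quote("a b")
amp_codepoint = name2codepoint["amp"]
pairs = list(zip_longest("ab", "c", fillvalue="-"))
```

Code after this block can then use `quote`, `name2codepoint`, `zip_longest` and `text_type` without checking the interpreter version again.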
orangain
8749df0a81
Add .gitignore for python
2016-06-18 13:44:45 +09:00
orangain
d44b8130b5
Add test cases for Python 2
2016-06-18 13:44:45 +09:00
attardi
60e4082440
See ChangeLog.
2016-03-23 15:14:13 +01:00
attardi
bcc3d124b4
See ChangeLog.
2016-03-19 11:36:24 +01:00
Giuseppe Attardi
9521b90c08
Merge pull request #56 from spyysalo/master
...
Match internal links in external links (fixes #55)
2016-03-12 12:48:50 +01:00
Sampo Pyysalo
7a5b5e5765
Match internal links in external links
...
See attardi/wikiextractor/issues/#55
2016-03-12 11:42:06 +00:00
attardi
6af9c283eb
See ChangeLog.
2016-03-12 08:15:01 +01:00
attardi
ab0d008512
See ChangeLog.
2016-03-06 17:27:39 +01:00
attardi
3bdcf6a4ad
Reduce spool queue to 10%.
2016-02-20 12:13:23 +01:00
attardi
0726948142
Typo.
2016-02-20 10:49:21 +01:00
attardi
730cfc07f9
See ChangeLog.
2016-02-20 10:45:58 +01:00
attardi
ca2a34ccce
See ChangeLog.
2016-02-15 09:04:46 +01:00
attardi
6d0577ef10
See ChangeLog.
2016-02-15 01:22:38 +01:00
attardi
834cad6a35
See ChangeLog.
2016-02-12 23:31:21 +01:00
attardi
b04760ecd8
See ChangeLog.
2016-02-12 18:16:54 +01:00
attardi
6f22be4702
Merge branch 'master' of https://github.com/attardi/wikiextractor
2016-02-11 13:21:14 +01:00
attardi
911dacda3a
Added emulation of Lua module If_empty.
2016-02-11 13:20:23 +01:00
Giuseppe Attardi
36f9467c33
Merge pull request #48 from rom1504/patch-1
...
fix typo in Wikipedia Cirrus Extractor section
2016-02-11 10:28:47 +01:00
Romain Beaumont
3eb4c4e3c3
fix typo in Wikipedia Cirrus Extractor section
2016-02-11 09:55:55 +01:00
attardi
8dcf73bd3e
Merge branch 'master' of https://github.com/attardi/wikiextractor
2016-02-11 01:04:25 +01:00
attardi
b2c371678c
See ChangeLog.
2016-02-11 01:03:31 +01:00
Giuseppe Attardi
c8afa84e95
Merge pull request #46 from mrshu/mrshu/add-setup
...
update: Add setup.py
2016-02-06 02:42:52 +01:00
mr.Shu
22103664fc
update: Add setup.py
...
* Add the first version of setup.py in order to simplify creation of a
real `wikiextractor` command.
Signed-off-by: mr.Shu <mr@shu.io>
2016-02-05 23:56:36 +01:00
attardi
fc89e2514e
See ChangeLog.
2016-02-04 11:23:40 +01:00
attardi
49464c0210
Merge branch 'master' of https://github.com/attardi/wikiextractor
2016-02-04 11:09:31 +01:00
attardi
3cebfdd4c0
Updated Copyright.
2016-02-04 11:08:37 +01:00
Giuseppe Attardi
0bb3061e79
Update README.md
2015-12-03 13:00:12 +01:00
Giuseppe Attardi
a412c7e3ab
Merge pull request #37 from nathj07/escape_extracted_text
...
added a new flag and its usage
2015-12-02 14:47:11 +01:00
Giuseppe Attardi
03e18ffbc8
See ChangeLog.
2015-11-20 00:34:23 +01:00
Giuseppe Attardi
285b119370
Remove DEBUG.
2015-11-20 00:07:50 +01:00
Giuseppe Attardi
113dab796c
Fixed.
2015-11-20 00:06:23 +01:00
Giuseppe Attardi
e5720f5c52
See ChangeLog.
2015-11-20 00:04:59 +01:00
Nathan Davies
811d32e98d
Updating the README with new help
2015-11-13 08:32:32 -08:00
Nathan Davies
d1e21c2b6a
added a new flag and its usage
...
The new flag is --escapedoc. If it is set, the clean function runs cgi.escape(text) before returning the text to be included in <doc></doc>.
This is a non-breaking change
2015-11-13 03:01:44 -08:00
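A minimal sketch of the escaping step this commit describes. `clean_for_doc` and `escape_doc` are hypothetical names, and `html.escape` is used here as a stand-in because `cgi.escape` was removed in Python 3.8; passing `quote=False` mirrors the default behaviour of `cgi.escape`, which escapes only `&`, `<` and `>`.

```python
import html


def clean_for_doc(text, escape_doc=False):
    """Optionally escape text before it is placed inside <doc></doc>.

    Hypothetical helper: `escape_doc` stands in for the --escapedoc flag.
    """
    if escape_doc:
        # Escape &, < and > so the text cannot break the surrounding markup.
        text = html.escape(text, quote=False)
    return text


raw = 'a < b & "c"'
escaped = clean_for_doc(raw, escape_doc=True)
unchanged = clean_for_doc(raw)
```

Because the flag defaults to off, existing output stays the same unless escaping is requested, which matches the commit's note that the change is non-breaking.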
Giuseppe Attardi
9229e50bb3
Update README.md
2015-10-25 17:03:17 +01:00
orangain
02f9561100
Dropped redundant global declarations.
2015-10-17 11:48:16 +02:00