Import default_template.
parent 87549a91a6
commit 3179a4c393
@ -1,5 +1,5 @@
 # WikiExtractor
-[WikiExtractor.py](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) is a Python script that extracts and cleans text from a [Wikipedia database dump](http://download.wikimedia.org/).
+[WikiExtractor.py](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) is a Python script that extracts and cleans text from a [Wikipedia database dump](https://dumps.wikimedia.org/).
 
 The tool is written in Python and requires Python 3 but no additional library.
 **Warning**: problems have been reported on Windows due to poor support for `StringIO` in the Python implementation on Windows.
@ -27,9 +27,9 @@ In order to speed up processing:
 
 The script may be invoked directly:
 
-python -m wikiextractor.WikiExtractor
+python -m wikiextractor.WikiExtractor <Wikipedia dump file>
 
-however it can also be installed from `PyPi` by doing:
+It can also be installed from `PyPi` by doing:
 
 pip install wikiextractor
 
@ -59,7 +59,7 @@ from io import StringIO
 from multiprocessing import Queue, Process, cpu_count
 from timeit import default_timer
 
-from .extract import Extractor, ignoreTag
+from .extract import Extractor, ignoreTag, define_template
 
 # ===========================================================================
 
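The usage line changed in this commit takes the dump file as a positional argument to `python -m wikiextractor.WikiExtractor`. Below is a minimal sketch of driving that same command from Python with `subprocess`, assuming `wikiextractor` has already been installed with `pip install wikiextractor`. The helper names `build_extractor_command` and `run_extractor` and the dump file name are hypothetical, and no command-line options beyond the positional dump file are assumed.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_extractor_command(dump_file: str) -> list[str]:
    """Hypothetical helper: build the argv for running WikiExtractor as a module.

    Mirrors the documented invocation:
        python -m wikiextractor.WikiExtractor <Wikipedia dump file>
    Only the positional dump file is passed; no other options are assumed.
    """
    # sys.executable reuses the current interpreter, so the subprocess sees
    # the same installed packages (e.g. after `pip install wikiextractor`).
    return [sys.executable, "-m", "wikiextractor.WikiExtractor", dump_file]


def run_extractor(dump_file: str) -> int:
    """Hypothetical helper: run the extractor and return its exit code.

    Assumes the wikiextractor package is installed for sys.executable.
    """
    if not Path(dump_file).is_file():
        # Fail early with a clear error instead of launching a doomed subprocess.
        raise FileNotFoundError(dump_file)
    completed = subprocess.run(build_extractor_command(dump_file), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Illustrative dump file name only; substitute the dump you downloaded.
    cmd = build_extractor_command("enwiki-latest-pages-articles.xml.bz2")
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the command easy to inspect or log before launching a potentially long extraction job.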