Import default_template.

attardi 2020-12-04 09:52:05 +01:00
parent 87549a91a6
commit 3179a4c393
2 changed files with 4 additions and 4 deletions

@@ -1,5 +1,5 @@
 # WikiExtractor
-[WikiExtractor.py](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) is a Python script that extracts and cleans text from a [Wikipedia database dump](http://download.wikimedia.org/).
+[WikiExtractor.py](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) is a Python script that extracts and cleans text from a [Wikipedia database dump](https://dumps.wikimedia.org/).
 The tool is written in Python and requires Python 3 but no additional library.
 **Warning**: problems have been reported on Windows due to poor support for `StringIO` in the Python implementation on Windows.
@@ -27,9 +27,9 @@ In order to speed up processing:
 The script may be invoked directly:
-python -m wikiextractor.WikiExtractor
+python -m wikiextractor.WikiExtractor <Wikipedia dump file>
-however it can also be installed from `PyPi` by doing:
+It can also be installed from `PyPi` by doing:
 pip install wikiextractor

@@ -59,7 +59,7 @@ from io import StringIO
 from multiprocessing import Queue, Process, cpu_count
 from timeit import default_timer
-from .extract import Extractor, ignoreTag
+from .extract import Extractor, ignoreTag, define_template
 # ===========================================================================