diff --git a/README.md b/README.md
index 54dda91..b4fce85 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 # WikiExtractor
-[WikiExtractor.py](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) is a Python script that extracts and cleans text from a [Wikipedia database dump](http://download.wikimedia.org/).
+[WikiExtractor.py](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) is a Python script that extracts and cleans text from a [Wikipedia database dump](https://dumps.wikimedia.org/).
 The tool is written in Python and requires Python 3 but no additional library.
 **Warning**: problems have been reported on Windows due to poor support for `StringIO` in the Python implementation on Windows.
@@ -27,9 +27,9 @@ In order to speed up processing:
 The script may be invoked directly:
- python -m wikiextractor.WikiExtractor
+ python -m wikiextractor.WikiExtractor
-however it can also be installed from `PyPi` by doing:
+It can also be installed from `PyPi` by doing:
  pip install wikiextractor
diff --git a/wikiextractor/WikiExtractor.py b/wikiextractor/WikiExtractor.py
index 87759c9..c8d1cd5 100755
--- a/wikiextractor/WikiExtractor.py
+++ b/wikiextractor/WikiExtractor.py
@@ -59,7 +59,7 @@ from io import StringIO
 from multiprocessing import Queue, Process, cpu_count
 from timeit import default_timer
-from .extract import Extractor, ignoreTag
+from .extract import Extractor, ignoreTag, define_template
 # ===========================================================================