Giuseppe Attardi 2022-03-07 10:40:46 +01:00
commit f0ca16c3e9
2 changed files with 2 additions and 2 deletions


@ -1,5 +1,5 @@
 # WikiExtractor
-[WikiExtractor.py](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) is a Python script that extracts and cleans text from a [Wikipedia database backup dump](https://dumps.wikimedia.org/).
+[WikiExtractor.py](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) is a Python script that extracts and cleans text from a [Wikipedia database backup dump](https://dumps.wikimedia.org/), e.g. https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 for English.
 The tool is written in Python and requires Python 3 but no additional library.
 **Warning**: problems have been reported on Windows due to poor support for `StringIO` in the Python implementation on Windows.


@ -68,7 +68,7 @@ from .extract import Extractor, ignoreTag, define_template, acceptedNamespaces
 # ===========================================================================
 # Program version
-__version__ = '3.0.5'
+__version__ = '3.0.6'
 ##
 # Defined in <siteinfo>