Update README.md

Giuseppe Attardi 2020-12-04 19:12:01 +01:00 committed by GitHub
parent 2ba214ab99
commit d2732b1477

@@ -4,7 +4,7 @@
The tool is written in Python and requires Python 3 but no additional library.
**Warning**: problems have been reported on Windows due to poor support for `StringIO` in the Python implementation on Windows.
-For further information, see the [project Home Page](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) or the [Wiki](https://github.com/attardi/wikiextractor/wiki).
+For further information, see the [Wiki](https://github.com/attardi/wikiextractor/wiki).
# Wikipedia Cirrus Extractor
@@ -50,7 +50,7 @@ The script is invoked with a Wikipedia dump file as an argument:
python -m wikiextractor.WikiExtractor <Wikipedia dump file>
The output is stored in several files of similar size in a given directory.
-Each file will contains several documents in this [document format](wiki/File-Format).
+Each file will contains several documents in this [document format](https://github.com/attardi/wikiextractor/wiki/File-Format).
usage: WikiExtractor.py [-h] [-o OUTPUT] [-b n[KMG]] [-c] [--json] [--html]
[-l] [-s] [--lists] [-ns ns1,ns2]