Merge branch 'master' of https://github.com/attardi/wikiextractor
commit 5b4302bca0
@@ -4,7 +4,7 @@
 The tool is written in Python and requires Python 3 but no additional library.
 **Warning**: problems have been reported on Windows due to poor support for `StringIO` in the Python implementation on Windows.
 
-For further information, see the [project Home Page](http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) or the [Wiki](https://github.com/attardi/wikiextractor/wiki).
+For further information, see the [Wiki](https://github.com/attardi/wikiextractor/wiki).
 
 # Wikipedia Cirrus Extractor
 
@@ -50,7 +50,7 @@ The script is invoked with a Wikipedia dump file as an argument:
 python -m wikiextractor.WikiExtractor <Wikipedia dump file>
 
 The output is stored in several files of similar size in a given directory.
-Each file will contains several documents in this [document format](wiki/File-Format).
+Each file will contains several documents in this [document format](https://github.com/attardi/wikiextractor/wiki/File-Format).
 
 usage: WikiExtractor.py [-h] [-o OUTPUT] [-b n[KMG]] [-c] [--json] [--html]
                         [-l] [-s] [--lists] [-ns ns1,ns2]
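The usage line above includes a `-b n[KMG]` option, which sets the maximum size of each output file; this limit is what leaves the output spread over "several files of similar size". The Python sketch below illustrates the idea under stated assumptions. It parses a size spec written in that `n[KMG]` notation and groups extracted documents so that each group stays within the limit. `parse_size` and `split_by_size` are hypothetical names, not part of wikiextractor's API. Treating K, M and G as powers of 1024 is also an assumption.

```python
import re

# Multipliers for the optional size suffix.
# Assumption: binary units (1K = 1024 bytes), not decimal.
_SUFFIXES = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(spec: str) -> int:
    """Convert a size spec such as '500K' or '1M' into a byte count.

    Hypothetical helper illustrating the n[KMG] notation.
    """
    match = re.fullmatch(r"(\d+)([KMG]?)", spec.strip().upper())
    if match is None:
        raise ValueError(f"invalid size spec: {spec!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix]


def split_by_size(docs: list[str], max_bytes: int) -> list[list[str]]:
    """Group documents into chunks whose UTF-8 size stays within max_bytes.

    Each chunk stands for one output file. A document larger than
    max_bytes on its own still gets a chunk of its own.
    Hypothetical helper, not wikiextractor's actual splitting code.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for doc in docs:
        size = len(doc.encode("utf-8"))
        # Start a new chunk ("file") if adding this document would exceed the limit.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for spec in ("100", "500K", "1M"):
        print(spec, parse_size(spec))
    print(split_by_size(["aaaa", "bbbb", "cc"], 8))
```

Closing the current chunk before adding the document that would overflow it keeps every file at or below the limit. The one exception is a single oversized document, which gets a file of its own. This explains why file sizes come out similar rather than identical.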