I used a PHP script to grab all the wiki pages from the database and save them as flat files (note: in this version I forgot to prepend the page title to each file):
<?php
$servername = "localhost";
$username   = "user";
$password   = 'pa$$';
$dbname     = "mediaWiki";

function slugify($text)
{
    // replace anything that is not a letter or digit with -
    $text = preg_replace('~[^\pL\d]+~u', '-', $text);
    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);
    // trim leading/trailing -
    $text = trim($text, '-');
    // collapse duplicate -
    $text = preg_replace('~-+~', '-', $text);
    // lowercase
    $text = strtolower($text);
    if (empty($text)) {
        return 'n-a';
    }
    return $text;
}

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

// Latest revision text of every main-namespace page, skipping #REDIRECT pages
// (substring(old_text, 2, 8) is the "REDIRECT" part of "#REDIRECT").
$sql = "SELECT page_title, page_touched, old_text
        FROM revision, page, text
        WHERE revision.rev_id = page.page_latest
          AND text.old_id = revision.rev_text_id
          AND page.page_namespace = 0
          AND substring(text.old_text, 2, 8) NOT IN ('REDIRECT')";
$result = $conn->query($sql);

if ($result->num_rows > 0) {
    // write each page out as <slugified-title>.mw
    while ($row = $result->fetch_assoc()) {
        $myfile = fopen(slugify($row["page_title"]) . ".mw", "w") or die("Unable to open file!");
        fwrite($myfile, $row["old_text"]);
        fclose($myfile);
    }
} else {
    echo "0 results";
}

$conn->close();
?>
I also found some commands on Stack Overflow to download all the images as a zip file.
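I don't have those exact commands here, but if you have filesystem access to the wiki host, a minimal Python sketch of the same idea would be something like this (the images path is an assumption; adjust it to wherever your install keeps uploads):

import shutil

# Assumed default MediaWiki upload directory; change to match your install.
# Produces wiki-images.zip containing everything under images/ (this will
# also include generated thumbnails under thumb/, which you may want to prune).
shutil.make_archive("wiki-images", "zip", "/var/www/mediawiki/images")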
I used pandoc to convert the MediaWiki syntax to RST syntax (you could do Markdown or whatever here and go in a different direction):
files=($(find . -type f -name '*.mw'))
mkdir -p ./output   # pandoc won't create the output directory for you
for item in "${files[@]}"
do
    filename=${item##*/}
    #printf " %s\n" $filename
    pandoc "$filename" -f mediawiki -t rst -o ./output/"${filename%.*}".rst || { printf " %s conversion failed\n" "$filename" ; }
done
Then I massaged all of that into the Sphinx formatting I wanted... it took several custom Python/Bash scripts to clean up the pandoc conversion (it isn't perfect).
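The exact fixes depend on what pandoc does to your particular wiki markup, but for illustration, a cleanup pass looks something like this (the specific substitutions here are just examples, not necessarily the ones you'll need):

from pathlib import Path
import re

# Illustrative cleanup over the converted RST files in ./output:
# replace non-breaking spaces, strip trailing whitespace, and collapse
# the long runs of blank lines that conversions tend to leave behind.
for rst in Path("output").glob("*.rst"):
    text = rst.read_text(encoding="utf-8")
    text = text.replace("\u00a0", " ")                  # non-breaking spaces
    text = re.sub(r"[ \t]+$", "", text, flags=re.M)     # trailing whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)              # collapse blank lines
    rst.write_text(text, encoding="utf-8")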
Then I built the Sphinx docs as HTML and ran the npm package broken-link-checker-local against the output to check for broken links (more Python scripts were involved to fix them).
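For illustration of the link-fixing side (the _build/html path is an assumption; use whatever directory sphinx-build writes to), something like this lists local links that don't resolve, which you can then feed into whatever fix-up script you need:

from pathlib import Path
import re

# Illustrative only: report local hrefs in the built HTML that don't point
# at an existing file, so a follow-up script (or a human) can fix them.
build = Path("_build/html")          # assumed sphinx-build output directory
href = re.compile(r'href="([^"#]+)"')
for page in build.rglob("*.html"):
    for target in href.findall(page.read_text(encoding="utf-8")):
        if target.startswith(("http://", "https://", "mailto:")):
            continue                  # external links are the link checker's job
        if not (page.parent / target).exists():
            print(f"{page.relative_to(build)}: broken link -> {target}")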
I also kept everything in git as a backup so I could git checkout the originals if my scripts blew anything up along the way.
That's about all the advice I can offer... It's a lot of work, but worth it in the end. Good luck!