
Reply to Moving the pfSense® Documentation to GitHub on Wed, 06 Jun 2018 17:54:26 GMT

I used a PHP script to pull every wiki page out of the database and save it as a flat file (note: in this version I forgot to add the title at the top of each page):

<?php
$servername = "localhost";
$username = "user";
$password = 'pa$$'; // single quotes so PHP never tries to interpolate the $ signs
$dbname = "mediaWiki";

function slugify($text)
{
  // replace non letter or digits by -
  $text = preg_replace('~[^\pL\d]+~u', '-', $text);

  // remove unwanted characters
  $text = preg_replace('~[^-\w]+~', '', $text);

  // trim
  $text = trim($text, '-');

  // remove duplicate -
  $text = preg_replace('~-+~', '-', $text);

  // lowercase
  $text = strtolower($text);

  if (empty($text)) {
    return 'n-a';
  }

  return $text;
}

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);

// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
} 

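// Latest revision of every main-namespace page, skipping redirects
$sql = "SELECT page_title, page_touched, old_text FROM revision,page,text WHERE revision.rev_id=page.page_latest AND text.old_id=revision.rev_text_id AND page.page_namespace=0 AND substring(text.old_text,2,8) NOT IN ('REDIRECT')";
$result = $conn->query($sql);

// query() returns false on failure, so check before touching num_rows
if ($result === false) {
    die("Query failed: " . $conn->error);
}

if ($result->num_rows > 0) {
    // write each page's wikitext to <slug>.mw in the current directory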
    while($row = $result->fetch_assoc()) {
        $myfile = fopen(slugify($row["page_title"]).".mw", "w") or die("Unable to open file!");
        fwrite($myfile, $row["old_text"]);
        fclose($myfile);
    }
} else {
    echo "0 results";
}
$conn->close();

?>

I also found some commands on Stack Overflow to download all the images as a zip file.

I used pandoc to convert the MediaWiki syntax to RST (you could target Markdown or another format here and go in a different direction):

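# make sure the output directory exists before pandoc writes into it
mkdir -p ./output

# -print0 / read -d '' keeps file names with spaces or odd characters intact
find . -type f -name '*.mw' -print0 |
while IFS= read -r -d '' item
do
  filename=${item##*/}
  #printf "   %s\n" "$filename"

  # pass the full path ($item) so files found in subdirectories convert too
  pandoc "$item" -f mediawiki -t rst -o "./output/${filename%.*}.rst" || { printf "   %s conversion failed\n" "$filename" ; }
done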
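
Because the export script left out the page titles, the converted files start without a heading. One way to add it back afterwards is sketched below in Python. This is only a sketch: the function names `rst_heading` and `add_title` are made up for illustration, and the title is recovered from the slugified file name, which is lossy (the original capitalization and punctuation are gone). Selecting `page_title` into the file at export time would be more accurate.

```python
from pathlib import Path


def rst_heading(title: str, underline: str = "=") -> str:
    """Return an RST section heading: the title plus an underline of equal length."""
    return f"{title}\n{underline * len(title)}\n\n"


def add_title(path: Path) -> None:
    """Prepend a heading derived from the file name (hypothetical helper).

    Assumption: the file name is the slug produced by the export step,
    e.g. "port-forwards.rst" becomes the heading "Port forwards".
    """
    title = path.stem.replace("-", " ").capitalize()
    body = path.read_text(encoding="utf-8")
    path.write_text(rst_heading(title) + body, encoding="utf-8")


if __name__ == "__main__":
    # Assumes the converted files live in ./output, as in the pandoc loop.
    for rst_file in sorted(Path("output").glob("*.rst")):
        add_title(rst_file)
```

Run it once only; a second run would stack another heading on top of the first.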

Then I massaged all of that into the Sphinx formatting I wanted. It took several custom Python/bash scripts to clean up the pandoc conversion (it isn't perfect).
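
A cleanup pass of that kind might look like the following minimal Python sketch. It is an illustration, not one of the actual scripts. It assumes pandoc emitted internal wiki links as anonymous RST hyperlinks of the form `Label <Target>`__ and that each target page was saved under the slug produced by the PHP `slugify()` function, so the link can be turned into a Sphinx `:doc:` reference. The Python `slugify` here is an approximate port of the PHP one, and `rewrite_internal_links` is a hypothetical name.

```python
import re


def slugify(text: str) -> str:
    """Approximate Python port of the PHP slugify() used for the file names."""
    text = re.sub(r"[\W_]+", "-", text)                  # non letters/digits -> "-"
    text = re.sub(r"[^-\w]+", "", text, flags=re.ASCII)  # drop non-ASCII leftovers
    text = text.strip("-")                               # trim leading/trailing "-"
    text = re.sub(r"-+", "-", text)                      # collapse repeated "-"
    return text.lower() or "n-a"


# Matches `Label <Target>`_ and `Label <Target>`__ (assumed pandoc output shape).
LINK = re.compile(r"`([^`<]+?)\s*<([^`>]+)>`__?")


def rewrite_internal_links(rst: str) -> str:
    """Turn links to other wiki pages into :doc: roles; leave external links alone."""
    def repl(match: re.Match) -> str:
        label, target = match.group(1), match.group(2)
        if "://" in target or target.startswith(("#", "mailto:")):
            return match.group(0)  # external URL, anchor, or mail link: keep as-is
        return f":doc:`{label} <{slugify(target)}>`"

    return LINK.sub(repl, rst)


if __name__ == "__main__":
    sample = "See `Port Forwards <Port_Forwards>`__ and `the site <https://example.com>`__."
    print(rewrite_internal_links(sample))
```

Check a few converted files by hand first; if pandoc's output for your wiki looks different, the pattern needs adjusting.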

Then I built the Sphinx docs as HTML and ran the npm package broken-link-checker-local against the output to check for broken links (more Python scripts were involved in fixing them).
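
For a quick check without the npm package, the same idea can be roughly approximated in Python. The sketch below is a stand-in technique, not broken-link-checker-local itself: it only looks at `<a href>` links that point to local files inside the build directory and ignores external URLs and same-page anchors. The function name `broken_local_links` and the `_build/html` path are assumptions.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class _HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_local_links(root) -> list[tuple[str, str]]:
    """Return (page, href) pairs whose local target file does not exist."""
    root = Path(root)
    broken = []
    for page in sorted(root.rglob("*.html")):
        collector = _HrefCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for href in collector.hrefs:
            parts = urlsplit(href)
            if parts.scheme or parts.netloc or not parts.path:
                continue  # external link, mailto:, or same-page "#anchor"
            path = unquote(parts.path)
            # Site-absolute links resolve against the build root,
            # relative links against the directory of the current page.
            target = root / path.lstrip("/") if path.startswith("/") else page.parent / path
            if not target.exists():
                broken.append((page.relative_to(root).as_posix(), href))
    return broken


if __name__ == "__main__":
    # Assumed Sphinx output directory; adjust to your build location.
    for page, href in broken_local_links("_build/html"):
        print(f"{page}: broken link {href}")
```

It does not verify that an anchor exists inside the target page, so treat a clean run as a first pass rather than proof.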

I also used git as a backup so I could git checkout if my scripts blew anything up along the way.

That's about all the advice I can offer. It's a lot of work, but worth it in the end. Good luck!

