Removing Textile From Wordpress
Posted by Scott on May 15th, 2009
Filed under: General, My Website, PHP, Python

I realized that the C code from yesterday wasn’t showing up properly because of Textile, a rapid, inline, tag-based formatting system. The plugin converted blog text from ["text":http://www.SWHarden.com/ *like* _this_] to [text like this] (with “text” turned into a link, “like” in bold, and “this” in italics). While it’s fun and convenient to use, it’s not always practical. The problem I was having was that in C code, variable names (such as _delay_) were becoming irrevocably italicized, and nothing I did could make Textile skip code while styling text. The kicker is that I couldn’t simply disable it, because I’ve been writing in this style for over four years! I decided it was time to put my mad Python skills to the test and write code to handle the conversion from Textile format to raw HTML.
I accomplished this feat in a number of steps. Yeah, I could have done hours of research to find a “faster way”, but it simply wouldn’t have been as creative. In a nutshell, I backed up the SQL database using phpMyAdmin to a single “x.sql” file. I then wrote a Python script to parse this [massive] file and output “o.sql”, the same data but with all of the Textile tags I commonly used replaced by their HTML equivalents. It’s not 100% perfect, but it’s 99.999% perfect. I’ll accept that. The output? You’re viewing it! Here’s the code I used to do it:
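To make the italics problem concrete, here is a minimal Python 3 sketch. The function `naive_italics` is a hypothetical stand-in for an underscore-to-italics rule, not Textile's actual parser; it only assumes that any `_word_` pair gets wrapped in italic tags, which is the behavior described above.

```python
import re

def naive_italics(text):
    # Hypothetical, simplified emphasis rule: _word_ becomes <i>word</i>.
    # Real Textile is more involved; this only illustrates the failure mode.
    return re.sub(r"_(\S+?)_", r"<i>\1</i>", text)

prose = "this is _important_ text"
c_code = "x = _delay_ + 1;"

print(naive_italics(prose))   # the intended use: emphasis in prose
print(naive_italics(c_code))  # the unwanted case: a C identifier loses its underscores
```

The rule has no way to tell prose from source code, so the identifier in the second string is rewritten just like the emphasized word in the first.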

## This Python (1.0) script removes *SOME* Textile formatting from Wordpress
## backups in plain text SQL format (dumped from phpMyAdmin). Specifically,
## it corrects bold and italic fonts and corrects links. It should be easy
## to expand if you need to do something else with it.
## Enjoy! --Scott Harden (www.SWHarden.com)

infile = 'x.sql' # <-- THIS IS THE INPUT FILE NAME!

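# These are the easy replacements, applied in order. The early rules insert
# spaces next to "*" and "_" so the tag rules on the last row can match them.
replacements = [
    ["\r"," "],["\n"," \n "],["*:","* :"],["_:","_ :"],
    ["\n","<br>\n"],[">*","> *"],["*< ","* <"],
    [">_","> _"],["_< ","_ <"],
    [" *"," <b>"],["* ","</b> "],[" _"," <i>"],["_ ","</i> "],
]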

def fixLinks(line):
    ## replace ["links":URL] with [<a href="URL">links</a>]. ##
    words = line.split(" ")
    for i in range(len(words)):
        word = words[i]
        if '":' in word:
            upto=1
            while (word.count('"')<2):
                word = words[i-upto]+" "+word
                upto+=1
            word_orig = word
            extra=""
            word = word.split('":')
            word[0]=word[0][1:]
            for char in ".),'":
                if word[1][-1]==char: extra=char
            if len(extra)>0: word[1]=word[1][:-1]
            word_new='<a href="%s">%s</a>'%(word[1],word[0])+extra
            line=line.replace(word_orig,word_new)
    return line

def stripTextile(orig):
    ## Handle the replacements and link fixing for each line. ##
    if not orig.count("', '") == 13: return orig #non-normal post
    line=orig
    line = line.split("', '",5)[2] #third quoted field of the row (the post body)
    if len(line)<10:return orig #non-normal post
    origline = line
    line = " "+line
    for replacement in replacements:
        line = line.replace(replacement[0],replacement[1])
    line=fixLinks(line)
    line = orig.replace(origline,line)
    return line

f=open(infile)
raw=f.readlines()
f.close()
posts=0
for raw_i in range(len(raw)):
    if raw[raw_i][:11]=="INSERT INTO":
        if "wp_posts" in raw[raw_i]: #if it's a post, handle it!
            posts+=1
            print("on post %d" % posts)
            raw[raw_i]=stripTextile(raw[raw_i])

print("WRITING...")
f=open('o.sql','w')
f.write("".join(raw)) #lines still carry their own newlines
f.close()
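
For comparison, the link conversion can also be sketched with a regular expression. This is an alternative written in Python 3, not the method the script above uses; the names `TEXTILE_LINK` and `convert_links` are made up for illustration, and the pattern assumes a link always looks like "label":URL with the URL ending at the next whitespace.

```python
import re

# "label":URL, where the URL runs up to the next whitespace character.
TEXTILE_LINK = re.compile(r'"([^"]+)":(\S+)')

def convert_links(text):
    def to_anchor(match):
        label, url = match.group(1), match.group(2)
        trailing = ""
        # Like fixLinks, keep one trailing punctuation mark outside the link.
        if url[-1] in ".),'":
            url, trailing = url[:-1], url[-1]
        return '<a href="%s">%s</a>%s' % (url, label, trailing)
    return TEXTILE_LINK.sub(to_anchor, text)

sample = 'See "my site":http://www.SWHarden.com/, then come back.'
print(convert_links(sample))
```

The regex version avoids walking backwards through the word list to find the opening quote, at the cost of being stricter about what counts as a link.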

I certainly held my breath while the thing ran. As I previously mentioned, this script modifies the SQL dump of the blog’s tables, so whenever I uploaded a “corrected” version that still had a bug in it, I broke the site. That kept happening until I got all the bugs worked out. Here’s an image from earlier today when my site was totally dead (0 blog posts):

[Image: hostingwork]
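
One way to catch this kind of breakage before uploading is to compare the converted dump against the original. The Python 3 sketch below is only an assumption about what such a check could look like; `is_post_row` and `find_suspect_rows` are hypothetical helpers. It reuses the same "', '" field separator the script counts, and flags any post row whose separator count changed during conversion.

```python
SEPARATOR = "', '"

def is_post_row(line):
    # Same test the conversion script uses to pick out blog posts.
    return line.startswith("INSERT INTO") and "wp_posts" in line

def find_suspect_rows(before_lines, after_lines):
    """Return indexes of post rows whose field-separator count changed."""
    suspects = []
    for index, (before, after) in enumerate(zip(before_lines, after_lines)):
        if is_post_row(before) and before.count(SEPARATOR) != after.count(SEPARATOR):
            suspects.append(index)
    return suspects

# Tiny made-up rows for illustration; real input would be the lines of
# x.sql and o.sql read with readlines().
before = ["INSERT INTO wp_posts VALUES ('a', 'b', 'c');",
          "INSERT INTO wp_posts VALUES ('a', 'b', 'c');"]
after = ["INSERT INTO wp_posts VALUES ('a', 'b', 'c');",
         "INSERT INTO wp_posts VALUES ('a', 'b<i>', 'c', 'd');"]
print(find_suspect_rows(before, after))
```

An empty list does not prove the dump is safe to import, but a non-empty one points straight at rows worth inspecting by hand.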





This entry was posted on Friday, May 15th, 2009 at 5:56 pm and is filed under General, My Website, PHP, Python.



One Response to “Removing Textile From Wordpress”

gevv wrote the following at 12:58:48 PM on June 2nd, 2009

Good explanation thanks.

copyright © 2006 swharden@gmail.com