Removing Textile From WordPress
660 words | Posted by Scott on May 15th, 2009
Scott was 23.64 years old when he wrote this!
Filed under: General, My Website, PHP, Python
I realized that the C code from yesterday wasn’t showing up properly because of Textile, a rapid, inline, tag-based formatting system. It converts markup such as ["text":http://www.SWHarden.com/ *like* _this_] into HTML: a link reading “text”, a bold “like”, and an italic “this”. While it’s fun and convenient to use, it’s not always practical. The problem was that in C code, variable names (such as _delay_) were becoming irrevocably italicized, and nothing I tried could make Textile leave code alone while still styling the surrounding text. The kicker is that I couldn’t easily disable it, because I’ve been writing in this style for over four years! I decided it was time to put my mad Python skills to the test and write a script to convert Textile formatting to raw HTML.
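To make the problem concrete, here is a deliberately naive inline formatter. It is not Textile’s actual rule set. The pattern and the naive_italicize name are hypothetical. It only shows what happens when a rule that turns _word_ into italics runs without knowing whether the text is prose or code.

```python
import re

# Hypothetical, deliberately naive rule (NOT Textile's real implementation):
# any run of non-space, non-underscore characters between two underscores
# is wrapped in <i> tags.
NAIVE_ITALIC = re.compile(r"_([^_\s]+)_")


def naive_italicize(text: str) -> str:
    """Apply the naive italic rule with no notion of prose versus code."""
    return NAIVE_ITALIC.sub(r"<i>\1</i>", text)


print(naive_italicize("I _really_ like this."))  # I <i>really</i> like this.
print(naive_italicize("_delay_ms(100);"))        # <i>delay</i>ms(100);
```

The first result is what a blogger wants. The second is the kind of damage that crept into the C listings.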
I accomplished this feat in a few steps. Yeah, I could have spent hours researching a “faster way”, but it simply wouldn’t have been as creative. In a nutshell, I backed up the SQL database to a single “x.sql” file using phpMyAdmin. I then wrote a Python script to parse this [massive] file and output “o.sql”: the same data, but with all of the Textile tags I commonly used replaced by their HTML equivalents. It’s not 100% perfect, but it’s 99.999% perfect. I’ll accept that. The output? You’re viewing it! Here’s the code I used to do it:
```python
## This Python script (version 1.0) removes *SOME* Textile formatting from
## WordPress backups in plain-text SQL format (dumped from phpMyAdmin).
## Specifically, it converts bold and italic text and fixes links. It should
## be easy to expand if you need to do something else with it.
## Enjoy! --Scott Harden (www.SWHarden.com)

infile = 'x.sql'  # <<< THIS IS THE INPUT FILE NAME!

# These are the easy replacements, applied in order. Inside the dump, line
# breaks in a post are stored as the escaped two-character sequences \r and
# \n, so those are matched literally. The early entries pad newlines, colons
# and HTML tags with spaces so that the later " *"/"* " and " _"/"_ "
# patterns can find where bold and italic runs start and end.
replacements = [["\\r", " "], ["\\n", " \\n "], ["*:", "* :"], ["_:", "_ :"],
                ["\\n", "<br>\\n"], [">*", "> *"], ["*<", "* <"],
                [">_", "> _"], ["_<", "_ <"],
                [" *", " <b>"], ["* ", "</b> "], [" _", " <i>"], ["_ ", "</i> "]]


def fixLinks(line):
    """Replace Textile links ["links":URL] with HTML [<a href="URL">links</a>]."""
    words = line.split(" ")
    for i in range(len(words)):
        word = words[i]
        if '":' in word:
            # Walk backwards until the opening quote of the link text is included.
            upto = 1
            while word.count('"') < 2 and upto <= i:
                word = words[i - upto] + " " + word
                upto += 1
            if word.count('"') < 2:
                continue  # no opening quote found; leave this word alone
            word_orig = word
            text, url = word.split('":')[:2]
            text = text[1:]  # drop the opening quote
            # Keep one trailing . ) , or ' outside the link.
            extra = ""
            if url and url[-1] in ".),'":
                extra = url[-1]
                url = url[:-1]
            word_new = '<a href="%s">%s</a>' % (url, text) + extra
            line = line.replace(word_orig, word_new)
    return line


def stripTextile(orig):
    """Apply the replacements and link fixes to the post body of one INSERT line."""
    if orig.count("', '") != 13:
        return orig  # non-normal post
    line = orig.split("', '", 5)[2]  # third field: the post content
    if len(line) < 10:
        return orig  # non-normal post
    origline = line
    line = " " + line
    for old, new in replacements:
        line = line.replace(old, new)
    line = fixLinks(line)
    return orig.replace(origline, line)


# Assumes the dump is UTF-8 encoded.
with open(infile, encoding="utf-8") as f:
    raw = f.readlines()

posts = 0
for raw_i in range(len(raw)):
    if raw[raw_i].startswith("INSERT INTO") and "wp_posts" in raw[raw_i]:
        posts += 1  # it's a post, handle it!
        print("on post", posts)
        raw[raw_i] = stripTextile(raw[raw_i])

print("WRITING...")
with open('o.sql', 'w', encoding="utf-8") as f:
    f.write("".join(raw))
```
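For comparison, the same three conversions (links, bold, italic) can be sketched with regular expressions instead of space padding and plain string replacement. This is an alternative technique, not the script above. The function name textile_inline_to_html and the exact patterns are assumptions, and they only cover the simple cases the script targets. Requiring that the * and _ markers not touch word characters keeps an identifier like _delay_ms from being italicized.

```python
import re

# Alternative, regex-based sketch (hypothetical; not the script above).
# "link text":URL, with one trailing . , ) or ' kept outside the link.
LINK = re.compile(r'"([^"]+)":(\S+?)([.,)\']?)(?=\s|$)')
# *bold* and _italic_ only when the markers are not glued to word characters,
# so identifiers such as _delay_ms are left alone.
BOLD = re.compile(r"(?<![\w*])\*([^*\n]+?)\*(?![\w*])")
ITALIC = re.compile(r"(?<![\w_])_([^_\n]+?)_(?![\w_])")


def textile_inline_to_html(text: str) -> str:
    """Convert simple Textile links, bold and italic markup to HTML."""
    text = LINK.sub(r'<a href="\2">\1</a>\3', text)
    text = BOLD.sub(r"<b>\1</b>", text)
    text = ITALIC.sub(r"<i>\1</i>", text)
    return text


print(textile_inline_to_html('"text":http://www.SWHarden.com/ *like* _this_.'))
```

A standalone _delay_ surrounded by spaces would still become italic, so this approach narrows the problem rather than eliminating it.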
I certainly held my breath while the thing ran. As I mentioned, this thing modified SQL tables, so when I uploaded the “corrected” versions, I kept breaking the site until I got all the bugs worked out. Here’s an image from earlier today, when my site was totally dead (0 blog posts):
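The rewritten dump goes straight back into the live database, so a cheap structural check before uploading could catch some of this breakage. This sketch is hypothetical: compare_dumps and post_lines are names made up for illustration. It only compares the wp_posts INSERT lines of two dumps. It flags a changed post count or a changed number of “', '” field separators, the same marker the script relies on. It cannot prove the SQL is valid.

```python
SEPARATOR = "', '"  # the field marker the conversion script also counts


def post_lines(lines):
    """Keep only the wp_posts INSERT statements, in file order."""
    return [line for line in lines
            if line.startswith("INSERT INTO") and "wp_posts" in line]


def compare_dumps(before_lines, after_lines):
    """Return a list of structural differences; an empty list means none found."""
    before = post_lines(before_lines)
    after = post_lines(after_lines)
    problems = []
    if len(before) != len(after):
        problems.append(f"post count changed: {len(before)} -> {len(after)}")
    for n, (old, new) in enumerate(zip(before, after), start=1):
        old_count, new_count = old.count(SEPARATOR), new.count(SEPARATOR)
        if old_count != new_count:
            problems.append(f"post {n}: field separators {old_count} -> {new_count}")
    return problems
```

For example, read x.sql and o.sql with readlines() and print whatever compare_dumps returns. An empty list only means these particular checks passed.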
This entry was posted on Friday, May 15th, 2009 at 5:56 pm and is filed under General, My Website, PHP, Python.
One Response to “Removing Textile From WordPress”
gevv wrote the following at 12:58:48 PM on June 2nd, 2009:
Good explanation thanks.
