More permanent stuff at http://www.dusbabek.org/~garyd

22 February 2009

Restoring Myth Programs Whilst Maintaining Sanity

My MythTV system has been running continually, more or less, since October 2006. Not too long, but long enough. Long enough, in fact, for cruft to creep in. I discovered this on Saturday when I went to restore data to the replacement for the myth drive, affectionately known as "xfs2", that died about a month ago.

That drive had a capacity of 250GB. My myth data is important to me, but not that important. I had it set to back up once a month to another computer in the house via rsync. I didn't pay close attention to how I invoked rsync, though: files that had been removed locally were never removed from the backup. Unneeded data just piled up there, even after I deleted shows on the myth system.
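For reference, rsync only prunes files on the receiving side when it is told to with --delete. Here is a minimal sketch of building such a mirror command from Python. The function name and the paths are made up for illustration; this is not the command my backup job actually ran.

```python
def build_mirror_command(src, dest, delete=True):
    """Build an rsync argument list that mirrors src into dest.

    src and dest are placeholders; substitute your own paths.
    """
    cmd = ["rsync", "-a"]
    if delete:
        # --delete removes files on the receiver that are gone on the sender.
        cmd.append("--delete")
    # A trailing slash on the source copies its contents, not the directory.
    cmd += [src.rstrip("/") + "/", dest]
    return cmd


# Hypothetical paths, shown only to illustrate the shape of the command.
print(" ".join(build_mirror_command("/some/video/dir", "backuphost:/some/backup/dir")))
```

Adding --dry-run to the list first is a cheap way to see what would be deleted before letting it loose on a backup.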

So when I went to copy 274GB of data to a drive that would only hold 238GB, things went haywire.

Thankfully, I am a programmer. I am equipped for these kinds of situations.

I needed to figure out which shows were missing locally but present on the backups. Those are the files that needed to be restored. After that, I figured it would be gravy if I could remove the orphans that were in the database but not on any file system, be it local or backup.
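Boiled down, the problem is set arithmetic over file names. A small sketch, assuming the names from the database, the local drives and the backup have already been collected into lists (the classify function and the sample names are hypothetical, not part of myth.py):

```python
def classify(db_names, local_names, backup_names):
    """Split the shows the database knows about into two groups.

    Returns (to_restore, orphans):
      to_restore: missing locally, but present on the backup
      orphans:    missing locally and missing on the backup
    """
    missing = set(db_names) - set(local_names)
    backup = set(backup_names)
    to_restore = sorted(missing & backup)
    orphans = sorted(missing - backup)
    return to_restore, orphans


# Made-up names in the chanid_starttime style.
to_restore, orphans = classify(
    ["1001_20090101200000", "1002_20090102200000", "1003_20090103200000"],
    ["1001_20090101200000"],
    ["1002_20090102200000", "1999_20080101000000"],
)
print(to_restore)
print(orphans)
```

Files that exist only on the backup (the cruft) fall out of both lists, which is exactly the point.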

It turns out that Python is incredibly easy to use with MySQL. I created a simple program that would restore from my backup and also give me a list of orphaned programs.

Seems easy, right? The complexity comes when both local and backup storage are strewn across different drives, directories and hosts. I made it simpler by using smbmount to mount the backup system so it appeared more or less local. After that it became a matter of letting the script run* and then cleaning up the orphans with a simple SQL statement.
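The cleanup statement itself isn't reproduced in this post, so the following is only a guess at its shape: a parameterized DELETE keyed on chanid and starttime. It runs against an in-memory sqlite3 table as a stand-in for mythconverg, and the trimmed-down schema and the delete_orphans helper are assumptions for illustration. With MySQLdb the placeholders would be %s rather than ?.

```python
import sqlite3


def delete_orphans(con, orphans):
    """Delete rows for (chanid, starttime) pairs; return how many rows went away."""
    before = con.total_changes
    con.executemany(
        "DELETE FROM recordedprogram WHERE chanid = ? AND starttime = ?",
        orphans,
    )
    con.commit()
    return con.total_changes - before


def demo():
    # Stand-in table: the real recordedprogram table has many more columns.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE recordedprogram (chanid INTEGER, starttime TEXT, title TEXT)")
    con.executemany(
        "INSERT INTO recordedprogram VALUES (?, ?, ?)",
        [
            (1001, "2009-01-01 20:00:00", "Show A"),
            (1002, "2009-01-02 20:00:00", "Show B"),
            (1003, "2009-01-03 20:00:00", "Show C"),
        ],
    )
    removed = delete_orphans(con, [(1003, "2009-01-03 20:00:00")])
    titles = [row[0] for row in con.execute("SELECT title FROM recordedprogram ORDER BY chanid")]
    return removed, titles


print(demo())
```

Back up the database before running anything like this against a live system.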

Source code for myth.py is at the bottom of this post.

* This turned out to be tricky. Something on my myth host causes it to intermittently hang when copying files to or from a remote host. It could be my router for all I know. I do know that *any* keystroke received by the myth system causes it to wake up and start accepting traffic again. Meanwhile, the system clock thinks nothing has happened and starts up again at the same tick where it fell asleep, so the clock ends up wrong. This problem started when I decided to upgrade to Ubuntu 8.10 and MythTV 0.21 on the same day. Truly disturbing, I know. I work around this by bandwidth-limiting scp to roughly 2 MB/s for the transfer (the -l16000 in the script; scp takes its limit in Kbit/s). Transfers take longer, but at least they complete. And yes, I am retiring this system in a few weeks. :)
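Since scp's -l option is specified in Kbit/s, it is easy to get the number wrong. A small sketch of the conversion; both helper names and the example paths are invented for illustration:

```python
def scp_limit_kbits(megabytes_per_second):
    """Convert a rate in MB/s to the Kbit/s value that scp -l expects."""
    return int(megabytes_per_second * 8 * 1000)


def build_scp_command(src, dest, megabytes_per_second):
    """Build an scp argument list with a bandwidth cap."""
    return ["scp", "-l", str(scp_limit_kbits(megabytes_per_second)), src, dest]


# Placeholder source and destination.
print(build_scp_command("user@host:/some/file", "/some/dir/", 2))
```

A 2 MB/s cap works out to 16000, which matches the value used in the script.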


import MySQLdb as db
import os
import sys

existing_video_dirs = ["/xfs1/myth/video", "/xfs2/myth/video"]
# If you use scp, backup_video_dirs and backup_scp_paths need to be maintained
# in parallel. For sure though, backup_video_dirs need to be locally mounted.
backup_video_dirs = ["/mnt/remote_backups/myth_backup/xfs1/myth/video", "/mnt/remote_backups/myth_backup/xfs2/myth/video"]
backup_scp_paths = ["garyd@child:/mnt/xfs3/myth_backup/xfs1/myth/video", "garyd@child:/mnt/xfs3/myth_backup/xfs2/myth/video"]
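# pairs of (locally mounted dir, scp path). list() so the pairs can be walked
# once per show, regardless of Python version.
backup_info_dict = list(zip(backup_video_dirs, backup_scp_paths))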
possible_file_extensions = ["mpg", "nuv"]

# this is the place video gets copied to.
restore_path = "/xfs2/myth/video/"

# connect to the database and get the list of shows.
con = db.connect(host="localhost", user="root", passwd="root", db="mythconverg")
cur = con.cursor()
cur.execute("select chanid, starttime, title from recordedprogram order by chanid, starttime")
rows = cur.fetchall()

# keep track of the number that are there or not.
there = 0
not_there = 0
restored = 0

for row in rows:
    # name of the file is based on row values, plus a file extension.
    fname = "%d_%s" % (row[0], row[1].strftime("%Y%m%d%H%M%S"))
    # see if the file exists in the local myth dirs. If it does, there is no
    # action.
    exists = False
    for dir in existing_video_dirs:
        for ext in possible_file_extensions:
            path = os.path.join(dir, fname + "." + ext)
            if os.path.exists(path):
                exists = True
    if exists:
        there += 1
    else:
        # if the file does not exist locally, look at the backup directories to
        # see if it is there.
        not_there += 1
        backup_exists = False
        backup_at = None # path of backup file.
        backup_scp = None # scp path of backup file.
        for dir, scp in backup_info_dict:
            for ext in possible_file_extensions:
                path = os.path.join(dir, fname + "." + ext)
                if os.path.exists(path):
                    # found a backup! use wild cards so that the video file
                    # and its preview get copied.
                    backup_at = path + "*"
                    backup_scp = scp + "/" + fname + "." + ext + "*"
                    backup_exists = True
        if not backup_exists:
            print("No backup for %s" % (fname))
        else:
            # move the backup to the live system. use cp or scp depending on
            # your preference. the scp line below overrides the cp line;
            # comment it out to use cp instead.
            cmd = "cp %s %s" % (backup_at, restore_path)
            # something on my system is jacked. I need to bandwidth limit scp
            # or else the myth server stalls. -l is in Kbit/s, so 16000 is
            # about 2 MB/s.
            cmd = "scp -l16000 %s %s" % (backup_scp, restore_path)
            print(cmd)
            os.popen(cmd)
            restored += 1

# output the stats.
print("there:%d not_there:%d restored:%d" % (there, not_there, restored))


Update: fixed tabbing in code. Should have used pastebin.
