Extracting Data from XML

Python does have tools for grepping XML files, but I’ve never been able to get them to work to my liking. I’ve generally just stripped out the data I need. And I will continue to do so, as it’s probably much faster than filtering through all of the crud I don’t need.

    import os
    import urllib

    library = os.path.expanduser('~')+'/Music/iTunes/iTunes Music Library.xml'
    data = open(library).readlines()

    tracks = {}
    this_track = 0
    for line in data:
        # remember the ID of the track we are currently inside
        if line.count('<key>Track ID'):
            this_track = line.split('integer>')[1][:-2]
        # strip 'file://localhost' (16 characters), then undo the URL and XML escaping
        elif line.count('<key>Location</key>'):
            tracks[this_track] = urllib.url2pathname(line.split('string>')[1][16:-2]).replace('&#38;','&')

The above code will search through the library file, and grab info on each track: just the database ID, and the location (which is a URI, encoded to remove spaces and dodgy characters). The info is then put into a dictionary, where the key is the database ID, and the value is the location. Note that there is a replace() at the end of the last line - Python’s urllib.url2pathname() function doesn’t turn &#38; back into &, because that is an XML character entity rather than a URL escape. Also, on my NSLU2 the extended characters are replaced by underscores, but I’m going to update to samba 3 (at the risk of mucking up the entire library…) to see if this fixes that issue. Anyway, after coding this, I had a bit of a think, and came up with the following method of doing the same (ensure it’s all on one line):

    grep Location ~/Music/iTunes/iTunes\ Music\ Library.xml |
      awk 'sub("<key>Location</key><string>file://localhost","",$1)' |
      sed 'sx</string>xx'

The Python version uses between 5 and 8 seconds of CPU time, the grep version around 1.5, but the grep version does not associate the database IDs with the locations, which I need. The Python version also looks to make it much easier to do the changing of characters (%20, for instance, into a space) that I need to do so I can check to see if files exist. Actually, using urllib.urlopen(), I can use the escaped/quoted version to see if the file exists, but it might be slow.
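The character changing mentioned above can be sketched as a small helper. This is only a sketch, written for Python 3 (the snippets in this post are Python 2), and the names location_to_path() and track_exists() are made up for illustration. It assumes every Location starts with the file://localhost prefix seen in the library file.

```python
import html
import os
from urllib.parse import unquote

# Prefix seen on the Location entries in the library file.
PREFIX = "file://localhost"


def location_to_path(location):
    """Turn an iTunes Location string into a plain filesystem path."""
    # Undo the XML escaping first (&#38; becomes &) ...
    url = html.unescape(location)
    if url.startswith(PREFIX):
        url = url[len(PREFIX):]
    # ... then the URL escaping (%20 becomes a space).
    return unquote(url)


def track_exists(location):
    """True if the file that a Location string points at is really there."""
    return os.path.isfile(location_to_path(location))
```

Decoding the location once and calling os.path.isfile() on the result avoids opening every file just to see whether it exists.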

Find Missing Tracks

Continuing on from my last programming related post, I’ve been giving some thought to the opposite problem: when tracks are in the iTunes Library, but the file they refer to is not there. iTunes marks these with an exclamation mark in front of the track, but only if it realises they are in fact missing. Often, just clicking on a track does not mean iTunes knows that track is missing. A Get Info on such a track prompts the user for a choice between removing the track from the library, or locating it. It might be possible to use the following AppleScript paradigm to get a list of the tracks that are missing:

    tell application "iTunes"
        set theTracks to every track in playlist 1
        set missing to {}
        repeat with theTrack in theTracks
            -- [unknown] stands for whatever iTunes reports for a missing file
            if location of theTrack is [unknown] then
                -- lists are joined with &, not +
                set missing to missing & {theTrack}
            end if
        end repeat
    end tell

Of course, that would be dead slow, as it relies on a couple of AppleScript events for each track. (Also, it may not work, but some sort of try - on error - end try clause might work.) A better solution might be to sort through the iTunes Music Library.xml file.

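    import os
    import urllib

    prefix = "file://localhost"
    data = open("iTunes Music Library.xml").readlines()
    missing = []

    for line in data:
        # iTunes stores each location as <key>Location</key><string>URL</string>
        if "<key>Location</key>" in line:
            #extract location, dropping the prefix and undoing the escaping
            url = line.split("<string>")[1].split("</string>")[0]
            location = urllib.url2pathname(url[len(prefix):]).replace("&#38;", "&")
            #check if location is not a file
            if not os.path.isfile(location):
                missing.append(location)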

Assuming that the mount point is the same as the last time the XML file was written (and iTunes seems to be pretty clever about working out things like /Volumes/Media vs. /Volumes/Media-1), then your list missing should contain a list of files that are in the library, but not where iTunes expects them to be. It might be possible to then match these two lists (from this script), and see which ones are likely to be ‘modified tracks’, where one user has changed the track name, artist or album. It may even be worthwhile having a list of all track locations, and seeing if there are any duplicates: that would solve the problem of duplicates in the library that are the same file.

I’d like to have Python parse the XML file, and create data structures reflecting the XML data - but skip everything but the database ID and filename, as these are the things that we really need - but I haven’t had much luck with this type of process in the past. I just had a thought on how to do it, though.
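One way to get those data structures, sketched here under a few assumptions: the library file is an XML property list, so Python 3’s plistlib module should be able to read it, and each entry under the Tracks key carries a Track ID and (usually) a Location. The function names track_locations() and duplicate_locations() are made up for this example, and I have not timed it against a real library.

```python
import plistlib
from collections import defaultdict


def track_locations(xml_bytes):
    """Map each Track ID to its Location string, skipping every other field."""
    library = plistlib.loads(xml_bytes)
    return {
        track["Track ID"]: track["Location"]
        for track in library.get("Tracks", {}).values()
        if "Location" in track
    }


def duplicate_locations(tracks):
    """Return {location: [track IDs]} for files used by more than one track."""
    by_location = defaultdict(list)
    for track_id, location in tracks.items():
        by_location[location].append(track_id)
    return {loc: sorted(ids) for loc, ids in by_location.items() if len(ids) > 1}
```

To use it, read the library file in binary mode and pass the bytes in: track_locations(open(library, "rb").read()). The parser undoes the XML escaping itself, so only the URL escaping is left in each location.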

Sing-along Music

Note to self: Classical Music is good public transport music. Why? Because you are less likely to sing along. People look strangely at you when you sing on the train, for some reason.

Find tracks not in iTunes library

I’ve written a Python script that walks through a path tree, checking to see if each file in the tree is a track in the current user’s iTunes Music Library.xml file.

    import os, re

    startpath = "/Volumes/Media/Music/"
    prefix = "file://localhost"
    library = os.path.expanduser("~")+"/Music/iTunes/iTunes Music Library.xml"

    def eachpath(arg, path, tracks):
        # called by os.path.walk() for each directory; tracks is its listing
        for track in tracks:
            trackpath = os.path.join(os.path.abspath(path),track)
            if os.path.isfile(trackpath):
                # build the URL as iTunes would store it (only spaces are escaped here)
                grepstr = prefix+trackpath.replace(" ","%20")
                if grepstr not in data:
                    arg.append(grepstr)

    # the whole library file is held in memory as one string
    data = open(library).read()
    missing = []
    os.path.walk(startpath, eachpath, missing)

    print missing

It’s not flawless - on my machine it eats up over 11 meg of memory, and takes ages to run, but as a proof of concept, it works okay. The memory it uses is mostly because it stores the whole iTunes library file in memory, so that’s 9 meg on my system already. The main loop is doing a string1 not in string2, which is probably not optimal, but it was easy to code, for now. I’m still waiting to see how long it takes to do my whole library, but I’m getting bored with waiting. Edit: to reduce the time taken, I used the following code in the final if clause in the function:

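    try:
        # treat the URL as a regular expression
        if not re.search(grepstr, data):
            arg.append(grepstr)
    except re.error:
        # the URL was not a valid pattern, so fall back to a plain search
        if grepstr not in data:
            arg.append(grepstr)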

The re one is much faster, but fails in some cases (presumably when the path contains characters that mean something special in a regular expression): the second one, while slower, is a fallback. There are also some other issues: at this stage I have not cared that much about escaped characters, which iTunes uses when storing the information. But, I came up with a quicker method than Python’s os.path.walk(). Using the find command is much quicker:

    find /Volumes/Media/Music -type f -not -name .aacgained -not -name '._*' -not -name .DS_Store

takes between 12 and 36 seconds for my library of 5700+ tracks stored on my NSLU2. If I telnet into the NSLU2 and run the equivalent command:

    find ~media/Music -type f -not -name .aacgained -not -name '._*' -not -name .DS_Store

it takes on average less than one second to complete. So, that’s more than an order of magnitude, even if the network traffic is low. Oh, and it compares very favourably with the Python version, which takes at least one minute to run.
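Some of the time in the Python version may be going on searching a 9 meg string once per file, rather than on the walk itself. A sketch of one alternative, written for Python 3 and not timed on a real library: decode every Location into a set of paths once, then test each file on disk against the set. The names library_paths() and files_not_in_library() are made up for the example, and the skipped file names are the ones the find command leaves out.

```python
import html
import os
from urllib.parse import unquote

PREFIX = "file://localhost"
# The same names that the find command above excludes.
SKIP = {".aacgained", ".DS_Store"}


def library_paths(xml_lines):
    """Collect the decoded path of every Location entry into a set."""
    paths = set()
    for line in xml_lines:
        if "<key>Location</key>" in line:
            url = line.split("<string>")[1].split("</string>")[0]
            url = html.unescape(url)  # &#38; becomes &
            if url.startswith(PREFIX):
                url = url[len(PREFIX):]
            paths.add(unquote(url))  # %20 becomes a space
    return paths


def files_not_in_library(startpath, known_paths):
    """Walk startpath and return the files whose path is not in known_paths."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(startpath):
        for name in filenames:
            if name in SKIP or name.startswith("._"):
                continue
            path = os.path.join(os.path.abspath(dirpath), name)
            if path not in known_paths:
                missing.append(path)
    return sorted(missing)
```

Comparing decoded paths rather than encoded URLs also sidesteps having to guess exactly which characters iTunes escapes.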

iTunes Shared Library

Jaq and I share two computers, an iMac G4, and a Dell PC. I also bought a Linksys NSLU2 and a large USB Hard Drive, so that all of our music and videos can be stored on a server, and accessed from either computer (or the Xbox) without having to make sure the iMac is on. (That is the main computer, and the one we fight to get onto.) Of course, the NSLU2 helped remove clutter from the iMac’s hard drive, not to mention freeing up a heap of space. Anyway, because we have a rather large music collection, it’s pointless and wasteful to have copies of music stored in two places - I set up an SMB share on the NSLU2, wrote a small AppleScript to mount this on bootup, and pointed iTunes towards this location. This has the advantage of storing one copy of all of our music, in the one location. There are some drawbacks, however:

  1. If I import music, it doesn’t appear in Jaq’s library by default. Similarly, if she imports, I don’t see it. Every now and then you need to drag the Music folder onto iTunes, and wait for it to update the library. Both of us need to do this, in case both of us have imported music.
  2. If one of us edits a track’s artist, title or album, iTunes for the other user sometimes cannot find the track. If you then re-import you wind up with two copies of the track, one of them (the one with the rating, and playcount) is a ‘dead track’.
  3. Sometimes a re-import causes a track to appear twice in the library. Sometimes it creates a second copy of the file in the directory.
  4. There is a lot of music in our library that one or other of us doesn’t really like that much. For instance, I listen to a lot of Classical music, but don’t like Red Hot Chili Peppers. Even if you remove a track from your library, it gets re-added when you do a re-import.

Of course, there are some great benefits, too:

  1. Each person gets to have their own rating and playcount for each track. Initially we had a shared iTunes library file (both of us had read/write access to it), which worked well when only one user could run iTunes at a time, but fails dismally when multi-user is taken into account.
  2. Our iTunes library only takes up half of the space.
  3. I can modify the tags belonging to a track, and it gets propagated to her library.

I’m fairly confident the benefits outweigh the costs, but I’m still keen to come up with a better solution. Here are some ideas I have had to resolve some of the issues. iTunes stores a copy of its library in an XML file - it should be a trivial task to scan this and get some information that might be useful. For instance, compare the location field of each track to the directory structure, and work out if there are tracks that need to be added, or the path they have needs to be updated.

This requires a couple of things:

  * A decent XML parser (in some cases just a simple grep will do the trick - for instance seeing if a filename exists in the XML file).
  * An interface (AppleScript) to iTunes, to tell it to add/remove/re-locate files.
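The comparison step might look something like the sketch below, written for Python 3. It assumes the library locations have already been decoded into plain paths and that a list of the files on disk is at hand; the name suggest_relocations() is made up. Matching on the file name alone is only a heuristic: it can catch a file that moved because its artist or album changed, but not one whose file name changed along with the title.

```python
import os
from collections import defaultdict


def suggest_relocations(library_paths, disk_paths):
    """Pair each library path that is gone from disk with same-named unclaimed files."""
    library_paths = set(library_paths)
    disk_paths = set(disk_paths)

    # Files on disk that no library entry points at, grouped by file name.
    orphans = defaultdict(list)
    for path in sorted(disk_paths - library_paths):
        orphans[os.path.basename(path)].append(path)

    # Library entries whose file has gone, each with its candidate new homes.
    return {
        path: orphans.get(os.path.basename(path), [])
        for path in sorted(library_paths - disk_paths)
    }
```

An entry with an empty list has no obvious new home, and any file on disk that is neither in the library nor suggested for an entry is a candidate for adding.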

iPod Rating

I have been using Jaq’s iPod the last couple of days - mainly because when I am waiting at the Adelaide Railway Station, and at Lynton Station, I cannot get good enough radio reception on my phone’s radio. So, I copied all of my Classical music onto the iPod, and I’ve been listening away. I’d like to be able to rate tracks on the iPod, but I don’t really want to, for one big reason. I would only be able to rate them at 0-5 stars, not using the 0-100% ratings I use from iTunesRater. I wonder if it’s possible to hack the iPod firmware so that rating occurs by a finer gradient?

Funny Ha Ha?

Funny or interesting things I’ve seen, thought of or heard during the last day or so.

  1. I misread the colour of something in the hardware store yesterday - I thought it said Missing Brown.
  2. I thought I was going to be waiting for ages for the train this morning, as when I got to the crossing, I could still see the back of the last one. When I got to the platform, the next one was coming. If only the trains came that often all of the time. Oh, and the one I ended up catching was virtually empty. Not surprisingly.
  3. When my train was going through Keswick station this morning, there was a conjunction of trains, where all four were at nearly the same place at the same time. Different tracks, obviously.
  4. I misread fortune cookie as fortune wookie. Okay, that one isn’t that funny either.

Word Crapness

Microsoft Word™ 2002 (10.2627.2625) has an extremely crap ‘new feature’. When you press the Delete key with a block of text selected, nothing happens. If you look down in the bottom left corner, Word asks if you want to delete the block, requiring you to press the Y key. Talk about crap interface design: having to press two keys to delete a block of text, when there is a perfectly good undo system in place if you accidentally delete the wrong text. How often do you accidentally delete a block of text, anyway? And why not be consistent, and make you press ‘y’ when you go to overtype a block of text, or press Enter when a block of text is selected? I’m now going to have to use the latter, rather than delete. Microsoft, you guys suck.

Bizarre Blogsome Behaviour

I just posted an update on what I’ve done to make Connections work on Blogsome. When I did so, it crapped out my stylesheet and layout. Just having the info in a post that is normally only in the template files caused WordPress Multi-User to wigg out. And I didn’t just have the text, it was all escaped and >‘d (I think I just created a new term, >‘d!!) I’ll have to just have the files hosted somewhere else, and linked to again, I guess. Update: It was actually the Smarty Tags that were wigging out, or more correctly, the CSS code that Smarty thought were Smarty Tags.

Competitive Living

I was thinking tonight of how we used to live - specifically, how my first share house ‘worked’. There were four of us, originally, living at 81 Alabama Avenue, Prospect. All four had been at school together (to some extent) at Naracoorte. Derek and I had been good mates, and apparently Heath and Bruce had “Gotten on okay.” The best way for us to live together in relative co-operation was to be somewhat competitive. For instance, if you wanted to do a load of washing, you had to do your own. The trick was, if there wasn’t room on the line for your clothes, too bad. That meant that we used to actually get up earlier than expected on weekends to do a load of washing. Of course, Uni was not all day every day, so you would often have time to do a wash during the week. But the rule still held. I remember a clash between Bruce and someone (it was always between Bruce and someone else…) to do with the moving of clothes on the line. Nasty stuff. The other way we used to compete was with cooking/dishes. We were in ‘teams’, Derek and myself, Heath and Bruce (poor Heath), and each team took it in turns cooking. Co-operative. Washing up was another matter. Each team took it in turns to wash up, with ideally the team who had not cooked having to wash up. But if they left it a day, then they had to wash up tomorrow, and do both days’ washing. Whilst sometimes we went a few days with a sink full of dirty dishes, the teamwork meant that your team-mate would coerce you into washing up, or you would coerce them. Everything ran pretty smoothly for the first year, but eventually we got sick of Bruce, and kicked him out. Instead we got Derek’s younger brother, Dane (or Moper, as he was called) as our new house-mate. I seem to recall the system breaking down somewhat then.