Monday, May 14, 2012

Merge two files of different lengths in Python

I have two files with the same number of columns but a different number of rows. The first file lists timestamps and words; the second lists timestamps and the sounds that make up each of those words, for example:

9640 12783 she
12783 17103 had


9640 11240 sh
11240 12783 iy
12783 14078 hv
14078 16157 ae
16157 16880 dcl
16880 17103 d

I want to merge these two files into a list of entries, each with the word as one value and its phonetic transcription as the other, like this:

[['she', 'sh iy'],
 ['had', 'hv ae dcl d']]

I'm a complete Python (and programming) noob, but my original idea was to take the end timestamp (the second field) of each line in the first file, search the second file for the sounds that end at or before it, and append them to a list. I tried doing it this way:

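word_file = open('SA1.WRD', 'r')
phone_file = open('SA1.PHN', 'r')
word_phone = []

# Collect the word (third field) from every line of the word file.
for line in word_file.readlines():
    words = line.split()
    word = words[2]
    word_phone.append(word)

# Collect the phone (third field) from every line of the phone file
# whose end time is not past the end time held in words[1].
for line in phone_file.readlines():
    phones = line.split()
    phone = phones[2]
    if int(phones[1]) <= int(words[1]):
        word_phone.append(phone)

print(word_phone)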

This is the output:

['she', 'had', 'your', 'dark', 'suit', 'in', 'greasy', 'wash', 'water', 'all', 'year', 'sh', 'iy', 'hv', 'ae', 'dcl', 'd', 'y', 'er', 'dcl', 'd', 'aa', 'r', 'kcl', 'k', 's', 'uw', 'dx', 'ih', 'ng', 'gcl', 'g', 'r', 'iy', 's', 'iy', 'w', 'aa', 'sh', 'epi', 'w', 'aa', 'dx', 'er', 'q', 'ao', 'l', 'y', 'iy', 'axr']

As I said, I'm a total noob, and some suggestions would be very helpful.
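
For reference, here is a minimal sketch of the merge described above. It assumes that every non-blank line has exactly three whitespace-separated fields (start, end, label) and that each sound lies entirely inside one word's time span, as in the sample rows. The function name `merge_words_phones` is hypothetical, and sounds that fall outside every word are simply dropped.

```python
def merge_words_phones(word_lines, phone_lines):
    """Pair each word with the sounds whose time span lies inside the word's span."""
    # Parse "start end label" rows, skipping blank lines.
    words = [line.split() for line in word_lines if line.strip()]
    phones = [line.split() for line in phone_lines if line.strip()]

    merged = []
    for w_start, w_end, word in words:
        # Keep the sounds that start and end within this word's interval.
        inside = [
            label
            for p_start, p_end, label in phones
            if int(w_start) <= int(p_start) and int(p_end) <= int(w_end)
        ]
        merged.append([word, " ".join(inside)])
    return merged


# Sample rows copied from the two files shown above.
word_lines = ["9640 12783 she", "12783 17103 had"]
phone_lines = [
    "9640 11240 sh",
    "11240 12783 iy",
    "12783 14078 hv",
    "14078 16157 ae",
    "16157 16880 dcl",
    "16880 17103 d",
]

print(merge_words_phones(word_lines, phone_lines))
# [['she', 'sh iy'], ['had', 'hv ae dcl d']]
```

Because the function only iterates over lines, it should also accept open file objects, for example the handles returned by open('SA1.WRD') and open('SA1.PHN'). The key difference from the attempt above is that the comparison against the word's timestamps happens inside the loop over words, so each word collects only its own sounds.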
