It’s been a long time since I’ve had the bandwidth to write up a code snippet here. This morning I had not quite enough time between Zoom meetings to tackle something more involved, so here goes!
In this case I needed to find ~200 sequence (fasta) files for a student in my lab. They were split across several sequencing runs, and for various logistical reasons it was getting a bit tedious to find the location of each sequence file. To solve the problem I wrote a short Python script to wrap the Linux locate command and copy all the files to a new directory where they could be exported.
First, I created a text file “files2find.txt” with text uniquely matching each file that I needed to find. One of the great things about locate is that it doesn’t need the full file name; a search string that appears anywhere in the file's path is enough.
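To make the partial matching concrete, here is a rough Python imitation of it. This is only a sketch: the function name locate_like and the paths are made up for illustration, and the real locate searches a prebuilt database and also understands glob characters, which this ignores.

```python
def locate_like(pattern, paths):
    """Return the paths that contain `pattern` anywhere, as a plain substring.

    A simplified stand-in for locate's default matching, not its real code.
    """
    return [p for p in paths if pattern in p]


# Hypothetical paths, for illustration only.
paths = [
    '/data/run_a/151117_PAL_Sterivex_1.exp.fasta',
    '/data/run_a/151117_PAL_Sterivex_1.log',
    '/data/run_b/151117_PAL_Sterivex_10.exp.fasta',
]

for hit in locate_like('151117_PAL_Sterivex_1', paths):
    print(hit)
```

All three paths match here, including the Sterivex_10 one, because the search text is a prefix of that name. That is why each line in the text file should be specific enough to identify a single sample.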
    head files2find.txt
    151117_PAL_Sterivex_1
    151126_PAL_Sterivex_2
    151202_PAL_Sterivex_3
    151213_PAL_Sterivex_4
    151225_PAL_Sterivex_5
    151230_PAL_Sterivex_6
    160106_PAL_Sterivex_7
    160118_PAL_Sterivex_9
    160120_PAL_Sterivex_10
    160128_PAL_Sterivex_11
Then the wrapper:
    import subprocess
    import shutil

    with open('files2find.txt') as file_in:
        for line in file_in:
            line = line.rstrip()

            ## Here we use the subprocess module to run the locate command, capturing
            ## standard out.

            temp = subprocess.Popen('locate ' + line,
                                    shell = True,
                                    executable = '/bin/bash',
                                    stdout = subprocess.PIPE)

            ## The communicate method for object temp returns a tuple. First object
            ## in the tuple is standard out.

            locations = temp.communicate()[0]
            locations = locations.decode().split('\n')

            ## Thank you internet for this one-liner, Python one-liners always throw
            ## me for a loop (no pun intended). Here we search all items in the locations
            ## list for a specific suffix that identifies files that we actually want.
            ## In this case our final analysis files contain "exp.fasta". Of course if
            ## you're certain of the full file name you could just use locate on that and
            ## omit this step.

            fastas = [i for i in locations if 'exp.fasta' in i]

            path = '/path/to/where/you/want/files/'
            found = set()

            ## Use the shutil library to copy found files to a new directory "path".
            ## Copied files are added to the set "found" to avoid being copied more than
            ## once, if they exist in multiple locations on your computer.

            for fasta in fastas:
                file_name = fasta.split('/')[-1]
                if file_name not in found:
                    shutil.copyfile(fasta, path + file_name)
                    found.add(file_name)

            ## In the event that no files are found report that here.

            if len(fastas) == 0:
                print(line, 'not found')
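If you'd rather not build a shell command by gluing strings together, the same idea can be sketched with subprocess.run and an argument list, so that odd characters in a search string are never interpreted by bash. This is one possible variation rather than the script I actually ran. The helper names (locate, pick_unique, collect) are invented for this example, it assumes Python 3.7 or later, and it assumes the locate command is installed.

```python
import shutil
import subprocess
from pathlib import Path


def locate(pattern):
    """Run `locate pattern` without a shell and return the matching paths."""
    # locate exits non-zero when nothing matches, so no check=True here;
    # an empty result is handled by the caller instead.
    result = subprocess.run(['locate', pattern],
                            capture_output=True, text=True)
    return result.stdout.splitlines()


def pick_unique(paths, marker='exp.fasta'):
    """Keep paths containing `marker`, one per distinct file name."""
    unique = {}
    for p in paths:
        name = Path(p).name
        if marker in p and name not in unique:
            unique[name] = p  # first location found wins
    return unique


def collect(list_file, dest, marker='exp.fasta'):
    """Copy the first located copy of each wanted file into `dest`."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with open(list_file) as file_in:
        for line in file_in:
            pattern = line.strip()
            if not pattern:
                continue  # skip blank lines in the list
            hits = pick_unique(locate(pattern), marker)
            if not hits:
                print(pattern, 'not found')
            for name, src in hits.items():
                shutil.copyfile(src, dest / name)
```

Calling collect('files2find.txt', '/path/to/where/you/want/files/') would then do roughly what the wrapper above does. Keeping the filtering in pick_unique also means that part can be checked on a plain list of paths without touching the file system.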