How Can I Print Second And Last Three Lines From Multiple Text Files, In Awk Or Python?
Solution 1:
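This has the advantage that the whole file is not held in memory. With several files, the input is treated as one stream: NR counts across files, so it prints the second line of the first file and the last three lines of the last one.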
awk 'NR == 2 {print}; {line1 = line2; line2 = line3; line3 = $0} END {print line1; print line2; print line3}' files*
Edit:
The following uses some code from the gawk manual that is portable to other versions of AWK. It provides per-file processing. Note that gawk version 4 provides BEGINFILE and ENDFILE rules.
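#!/usr/bin/awk -f

# Called when a new input file starts: reset the rolling buffer.
function beginfile (file) {
    line1 = line2 = line3 = ""
}

# Called when an input file ends: print its last three lines.
function endfile (file) {
    print line1; print line2; print line3
}

# Detect the switch to a new input file.
FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END { endfile(FILENAME) }

# Print the second line of each file.
FNR == 2 {
    print
}

# Keep the three most recent lines.
{
    line1 = line2; line2 = line3; line3 = $0
}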
Save that as a file, perhaps calling it "fileparts". Then do:
chmod u+x fileparts
Then you can do:
./fileparts file1 file2 anotherfile somemorefiles*.txt
and it will output the second line and the last three lines of each file in one set of output.
Alternatively, you can modify the script to write to separate files, or use a shell loop to do so:
for file in file1 file2 anotherfile somemorefiles*.txt
do
    ./fileparts "$file" > "$file.out"
done
You can name the output files however you like. They will be text files.
Solution 2:
To avoid reading the entire file into memory at once, use a deque with a maxlen of 3 to create a rolling buffer for capturing the last 3 lines:
from collections import deque

def get2ndAndLast3LinesFrom(filename):
    with open(filename) as infile:
        # advance past first line
        next(infile)
        # capture second line
        second = next(infile)
        # iterate over the rest of the file a line at a time, saving the final 3
        last3 = deque(maxlen=3)
        last3.extend(infile)
        return second, list(last3)
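The function above raises StopIteration when a file has fewer than two lines, and it does not count the first two lines toward the last three. The following is a minimal sketch of one way to handle short files and several files at once. The helper names `second_and_last3` and `report` are hypothetical, and returning `None` for a missing second line is an assumption, not something the original answer specifies.

```python
from collections import deque
from itertools import islice


def second_and_last3(path):
    """Return (second line or None, list of up to the last three lines)."""
    with open(path) as infile:
        # Read at most the first two lines; shorter files simply give fewer.
        head = list(islice(infile, 2))
        second = head[1] if len(head) == 2 else None
        # Seed the rolling buffer with the lines already read, so files with
        # fewer than five lines still report their true last three lines.
        last3 = deque(head, maxlen=3)
        last3.extend(infile)
    return second, list(last3)


def report(paths):
    """Map each path to its (second line, last three lines) result."""
    return {path: second_and_last3(path) for path in paths}
```

Calling `report(["file1", "file2"])` gives one entry per file, which you can then print or write out however you like.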
You could generalize this approach to a function that would take any iterable:
def lastN(n, seq):
    buf = deque(maxlen=n)
    buf.extend(seq)
    return list(buf)
Then you can create different length "last-n" functions using partial:
from functools import partial
last3 = partial(lastN, 3)
print(last3(range(100000000)))  # use xrange in Python 2
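As a small worked example of the rolling buffer, here is a sketch that repeats the `lastN` definition so the snippet runs by itself. The name `last5` and the sample inputs are only illustrations.

```python
from collections import deque
from functools import partial


def lastN(n, seq):
    # A deque with maxlen=n silently drops its oldest item on overflow,
    # so only the final n items survive a single pass over seq.
    buf = deque(maxlen=n)
    buf.extend(seq)
    return list(buf)


last3 = partial(lastN, 3)
last5 = partial(lastN, 5)

print(last3(range(10)))        # [7, 8, 9]
print(last5("abcdefg"))        # ['c', 'd', 'e', 'f', 'g']
print(last3(["only", "two"]))  # ['only', 'two'] (fewer than n items)
```

Because the buffer never holds more than n items, memory use stays constant however long the input is.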
Solution 3:
If you aren't wedded to Python or AWK for the implementation, you can do something very straightforward using your shell and the standard head/tail utilities.
for file in "$@"; do
    head -n2 "$file" | tail -n1
    tail -n3 "$file"
done
You can also wrap this in a function or place it in a script, and then call it from Python with subprocess.check_output() or from AWK with system() if you really want, but in such cases it may just be easier to use native methods rather than spawning an external process.
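For completeness, here is a minimal sketch of the Python side of that idea, assuming POSIX `head` and `tail` are available on the PATH. `head_tail` is a hypothetical helper name, and the commands are run directly rather than through a saved script.

```python
import subprocess


def head_tail(path):
    """Return (second line, last three lines) of path using head and tail."""
    # Equivalent of: head -n2 "$file" | tail -n1
    first_two = subprocess.check_output(["head", "-n", "2", path], text=True)
    second = subprocess.check_output(
        ["tail", "-n", "1"], input=first_two, text=True
    )
    # Equivalent of: tail -n3 "$file"
    last3 = subprocess.check_output(["tail", "-n", "3", path], text=True)
    return second, last3
```

This starts three processes per file, which is the overhead the paragraph above warns about.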
Solution 4:
This would work, but it does load the entire file in memory, which might not be ideal if your files are very large.
with open(filename) as infile:
    text = infile.readlines()
print(text[1], end="")  # second line (index 1, since indexing starts at 0)
for line in text[-3:]:  # last three lines, in file order
    print(line, end="")
There are also some good alternatives discussed here.
Solution 5:
I don't know about awk, but if you are using Python, I guess you will need something like this:
# The 'rU' mode was removed in Python 3.11; the default text mode
# already handles universal newlines.
with open('test1.txt') as inf:
    lines = inf.readlines()
with open('Spreadsheet.ods', 'w') as outf:
    outf.write(lines[1])          # second line
    outf.writelines(lines[-3:])   # last three lines