Efficiently extract lines between markers

sed -n '/START/,${/STOP/q;p}'
GNU sed can address a range between two regexes, but it keeps reading through to the end of the file even after the range has closed. This slight alteration makes it stop reading the input file as soon as the STOP match is made. In my example I have included an extra '/START/d' because my 'start' marker line also contains the 'stop' string (I'm extracting data between 'resets' and using the time stamp as the 'start'). My previous approach using grep is slightly faster when the match is near the end of the file, but overall (extracting all the reset cycles in turn) the new sed method is quicker and a lot neater.
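As a minimal sketch of the behaviour described above, the following runs the command against a small made-up file. The file contents and the variable names (tmp, early_quit, no_markers) are assumptions for illustration only, and a semicolon is placed before the closing brace so the script also parses on non-GNU sed.

```shell
# Build a small hypothetical input file (contents are illustrative only).
tmp=$(mktemp)
printf '%s\n' 'header' 'START' 'alpha' 'beta' 'STOP' 'trailer' 'START' 'gamma' 'STOP' > "$tmp"

# Range runs from the first START to end of file ($). Inside it, quit as soon
# as STOP is seen, otherwise print. The START line is printed, STOP is not,
# and nothing after the first STOP is ever read.
early_quit=$(sed -n '/START/,${/STOP/q;p;}' "$tmp")
printf '%s\n' "$early_quit"

# Variant that also drops the START marker line. The delete comes first, which
# matters when the start line itself contains the stop string.
no_markers=$(sed -n '/START/,${/START/d;/STOP/q;p;}' "$tmp")
printf '%s\n' "$no_markers"

rm -f "$tmp"
```

Because sed quits at the first STOP, the second START block in the sample file is never reached, which is where the time saving on large files comes from.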
Sample Output
Match near start of file
--
$ time (sed -n '/,166499.248,/,/!./{/166499/d;/RTKCOMMAND/q;/#BESTPOS/p}' ../UUT1.ASC > temp_time_file)

real    0m0.093s
user    0m0.030s
sys     0m0.030s

$  time (sed -n '/,166499.248,/,/#RTKCOMMAND/{/#BESTPOS/p}' ../UUT1.ASC > temp_time_file)

real    0m2.015s
user    0m0.484s
sys     0m0.061s

$ time (grep -A 2500 -m 1 ",166499.248," ../UUT1.ASC | grep -A 2500 -m  1 '\#RTKCOMMAND' | grep '\#BESTPOS' > temp_time_file)

real    0m0.171s
user    0m0.122s
sys     0m0.060s
--

Match near end of file
--
$ time (sed -n '/,230399.072,/,/#RTKCOMMAND/{/#BESTPOS/p}' ../UUT1.ASC > temp_time_file)

real    0m2.000s
user    0m0.390s
sys     0m0.046s

$ time (sed -n '/,230399.072,/,/!./{/,230399.072,/d;/RTKCOMMAND/q;/#BESTPOS/p}' ../UUT1.ASC > temp_time_file)

real    0m2.015s
user    0m0.468s
sys     0m0.076s

$ time (grep -A 2500 -m 1 ",230399.072," ../UUT1.ASC | grep -A 2500 -m  1 '\#RTKCOMMAND' | grep '\#BESTPOS' > temp_time_file)

real    0m1.703s
user    0m0.137s
sys     0m0.076s
--

By: mungewell
2009-06-19 15:27:36

These Might Interest You

  • Using sed to extract lines in a text file

    If you write bash scripts a lot, you are bound to run into a situation where you want to extract some lines from a file. Yesterday, I needed to extract the first line of a file, say named somefile.txt.

        cat somefile.txt
        Line 1
        Line 2
        Line 3
        Line 4

    This specific task can be easily done with this:

        head -1 somefile.txt
        Line 1

    For a more complicated task, such as extracting the second to third lines of a file, head is inadequate. So, let's try extracting lines using sed, the stream editor. My first attempt uses the p sed command (for print):

        sed 1p somefile.txt
        Line 1
        Line 1
        Line 2
        Line 3
        Line 4

    Note that it prints the whole file, with the first line printed twice. Why? The default output behavior is to print every line of the input stream. The explicit 1p command just tells it to print the first line again. To fix it, you need to suppress the default output (using -n), making explicit prints the only way to produce output.

        sed -n 1p somefile.txt
        Line 1

    Alternatively, you can tell sed to delete all but the first line.

        sed '1!d' somefile.txt
        Line 1

    '1!d' means: if a line is not (!) the first line, delete it. Note that the single quotes are necessary. Otherwise, the !d will bring back the last command you executed that starts with the letter d.

    To extract a range of lines, say lines 2 to 4, you can execute either of the following:

        sed -n 2,4p somefile.txt
        sed '2,4!d' somefile.txt

    Note that the comma specifies a range (from the line number before the comma to the line number after it).

    What if the lines you want to extract are not in sequence, say lines 1 to 2, and line 4?

        sed -n -e 1,2p -e 4p somefile.txt
        Line 1
        Line 2
        Line 4


    0
    sed -n -e 1186,1210p A-small-practice.in
    evandrix · 2011-06-04 10:53:46 0
  • It extracts X lines from file1 and dumps them to file2. Using grep with the empty string '' matches every complete line (i.e. no filtering takes place), and the -m flag sets how many lines to extract from the given file. Then, using the redirect operator >, we send the extracted lines to a new file.


    -4
    grep '' -m X file1 > file2
    sardanapalos · 2009-03-22 04:34:43 6
  • Extract every nth line with sed. The generic form is: sed -n 'STARTPOSITION,${p;n;*LINE}' foo, where n;*LINE stands for n; repeated once for each line to skip after the printed one. Thus p;n;n; prints every 3rd line and p;n;n;n;n; prints every 5th line.


    1
    sed -n '1,${p;n;n;}' foo > foo_every3_position1; sed -n '2,${p;n;n;}' foo > foo_every3_position2; sed -n '3,${p;n;n;}' foo > foo_every3_position3
    oshazard · 2010-01-08 04:19:59 0
  • You can actually do the same thing with a combination of head and tail. For example, in a file of four lines, if you just want the middle two lines:

        head -n3 sample.txt | tail -n2

        Line 1 --\
        Line 2    } These three lines are selected by head -n3,
        Line 3 --/  this feeds the following filtered list to tail:
        Line 4

        Line 1
        Line 2 \___ These two lines are filtered by tail -n2,
        Line 3 /    This results in:

        Line 2
        Line 3

    being printed to screen (or wherever you redirect it).


    0
    head -n1 sample.txt | tail -n1
    gtcom · 2011-06-14 17:45:04 0
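
The line-number approaches in the entries above can be compared side by side. This is only a sketch on a hypothetical four-line file; the variable names (sample, by_sed, by_head_tail, by_sed_quit) are made up for illustration. The last form adds an explicit quit at the end of the range so that sed stops reading there, in the same spirit as the marker-based command at the top of the page.

```shell
# Hypothetical four-line sample file, mirroring the examples above.
sample=$(mktemp)
printf '%s\n' 'Line 1' 'Line 2' 'Line 3' 'Line 4' > "$sample"

# Lines 2-3 via a sed line-number range.
by_sed=$(sed -n '2,3p' "$sample")

# Same lines via head and tail: keep the first 3, then the last 2 of those.
by_head_tail=$(head -n 3 "$sample" | tail -n 2)

# Range plus an explicit quit at line 3, so sed stops reading at that point.
by_sed_quit=$(sed -n '2,3p;3q' "$sample")

printf '%s\n' "$by_sed"
rm -f "$sample"
```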

What Others Think

if you want the line with the STOP marker output as well you can use sed -n '/START/,/!./{p;/STOP/q}'
mungewell · 469 weeks and 4 days ago
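
A small sketch of this inclusive variant, again on made-up data. The comment's /!./ end address is a regex that presumably never matches in the author's data, so the range stays open until the quit fires; the sketch below assumes `$` (last line) as the end address instead, which has the same effect here. The names tmp and with_stop are hypothetical.

```shell
# Hypothetical input, illustrative only.
tmp=$(mktemp)
printf '%s\n' 'header' 'START' 'alpha' 'beta' 'STOP' 'trailer' > "$tmp"

# Print first, then test for STOP: the STOP line is emitted before sed quits.
with_stop=$(sed -n '/START/,${p;/STOP/q;}' "$tmp")
printf '%s\n' "$with_stop"

rm -f "$tmp"
```

The only change from the main command is the order of p and the quit test: printing before quitting keeps the STOP line in the output.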
