How shall I perform multiline matching and substitution using awk?

In a text file, ignoring any trailing whitespace on each line, I assume that a line that does not end with a digit has been broken and continues on the next line. I would like to find these line breaks and join the affected lines into one. For example:

line 1
li
ne 2

There is a line break between the second and third lines, so the file should become:

line 1
line 2

To find such line breaks, I need to do multiline matching. I tried to do it by changing the record separator, but the following doesn't work:

$ awk 'BEGIN{RS="";}; { if (match($0, /[^[:digit:] ] *\n/)) print $0;} ' inputfile

I am also still unsure how to join two lines that are separated by such a line break.

Thanks.

asked Nov 13 '18 at 16:02

Tim

Setting RS to the empty string turns on paragraph mode (records are separated by runs of empty lines), not "multiline matching", which is always on in awk. It's no wonder your script doesn't work, because it will just treat the whole file as a single record and print it, terminated by an extra newline (ORS). Also, there's absolutely no point in using the match() function if you're not using its return value or the RSTART or RLENGTH variables.

– mosvy
Nov 13 '18 at 17:02
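
A quick way to see the paragraph-mode behaviour described in this comment is to count the records awk sees. This is only a minimal sketch: the helper name count_records and the sample inputs are made up for illustration.

```shell
# count_records: hypothetical helper that reports how many records awk
# sees when RS is the empty string (paragraph mode).
count_records() {
    awk 'BEGIN { RS = "" } END { print NR }'
}

# No blank line in the input: everything is one record, so this prints 1.
printf 'line 1\nli\nne 2\n' | count_records

# A blank line separates two paragraphs, so this prints 2.
printf 'line 1\n\nli\nne 2\n' | count_records
```

With the sample file from the question there are no blank lines, so the whole file is a single record, which matches what the comment says.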

text-processing awk gawk

4 Answers

You could run something along the lines of

awk 'BEGIN{RS=SUBSEP; ORS="" } {print gensub(/([^0-9])\n/,"\\1","g",$0)}' ex

RS=SUBSEP sets the record separator to a character that should never occur in a text file, so the whole input file is slurped into $0.

You can then apply your favorite multiline transformation.
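
gensub() is a gawk extension. As a hedged sketch of the same slurp-and-substitute idea in portable awk, the loop below uses match() and substr() instead, because POSIX gsub() has no backreferences. The function name join_slurp is hypothetical, and the code assumes the SUBSEP character never appears in the input.

```shell
# join_slurp: hypothetical helper; reads the whole input as one record and
# removes every newline that directly follows a non-digit character.
join_slurp() {
    awk 'BEGIN { RS = SUBSEP }   # assumption: SUBSEP never occurs in the input
         {
             s = $0
             # match() sets RSTART to the non-digit; the newline is at RSTART + 1.
             while (match(s, /[^0-9]\n/))
                 s = substr(s, 1, RSTART) substr(s, RSTART + 2)
             # The record already contains its own newlines, so use printf.
             printf "%s", s
         }'
}

printf 'line 1\nli\nne 2\n' | join_slurp
```

Like the gensub() version, this also joins a final line that does not end in a digit with whatever follows it, including the end of the file.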

edited Nov 14 '18 at 9:33

answered Nov 13 '18 at 17:24

JJoao

Thanks. Do you know how to do matching without substitution in the multiline case?

– Tim
Nov 13 '18 at 21:29
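
For matching without substituting, one possible sketch uses match() in the same slurp mode and only counts the line breaks that would be joined. The helper name count_breaks is hypothetical, and the pattern (a character that is neither a digit nor a space, then optional spaces, then a newline) is one reading of the rule in the question, not something taken from the answer.

```shell
# count_breaks: hypothetical helper; counts newlines that follow a line
# not ending in a digit (trailing spaces ignored), without changing anything.
count_breaks() {
    awk 'BEGIN { RS = SUBSEP }   # assumption: SUBSEP never occurs in the input
         {
             s = $0
             n = 0
             # RSTART and RLENGTH describe the match; skip past it and search again.
             while (match(s, /[^0-9 ] *\n/)) {
                 n++
                 s = substr(s, RSTART + RLENGTH)
             }
             print n
         }'
}

printf 'line 1\nli\nne 2\n' | count_breaks
```

Note that a last line without a trailing digit is counted too, since its newline matches the same pattern.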

Does this answer fail in some cases? Why was it downvoted?

– Tim
Nov 13 '18 at 22:30

Is RS="\f" also a working solution?

– Tim
Nov 13 '18 at 22:40

This seems to add an empty line at the end of the output. I'm not sure exactly why at the moment.

– Kusalananda
Nov 13 '18 at 22:40

@JJoao In general, print non-record data with printf and records with print. Since you're operating in "slurp mode" here (so to speak) and therefore do not really operate on records, it would be appropriate to use printf.

– Kusalananda
Nov 14 '18 at 9:36
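
The difference between print and printf in slurp mode can be seen by counting output bytes. A small sketch, with hypothetical helper names slurp_print and slurp_printf, assuming the default ORS (a newline) for print:

```shell
# Both helpers read the whole input as one record (assuming SUBSEP never
# occurs in it); they differ only in how the record is written back.
slurp_print()  { awk 'BEGIN { RS = SUBSEP } { print }'; }            # appends ORS
slurp_printf() { awk 'BEGIN { RS = SUBSEP } { printf "%s", $0 }'; }  # writes $0 as is

# The 4-byte input comes back as 5 bytes with print and 4 with printf.
printf 'a\nb\n' | slurp_print  | wc -c
printf 'a\nb\n' | slurp_printf | wc -c
```

The extra byte from print is the newline added after a record that already ends in one, which shows up as an empty line at the end of the output.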

I would address it differently: by looping over the input until you find a "line-ending condition":

awk '{
       line=$0;
       while($0 !~ /[[:digit:]] *$/ && getline > 0) {
         line=line$0;
       }
       print line
     }' < input

On an extended input file of:

line 1
li
ne 2
li
ne 
number 3
line 4

Or, more verbosely (to see the trailing space):

$ cat -e input
line 1$
li$
ne 2$
li$
ne $
number 3$
line 4$

The output is:

line 1
line 2
line number 3
line 4
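
As a hedged sketch, the same loop can take the "line-ending condition" as a parameter, so that other end-of-record patterns can be tried without editing the script. The function name join_until and the awk variable name end are hypothetical. Because the pattern is passed with -v, awk processes backslash escapes in it, so patterns containing backslashes may need doubling.

```shell
# join_until: hypothetical helper; joins input lines until the most recently
# read line matches the regular expression given as the first argument.
join_until() {
    awk -v end="$1" '{
        line = $0
        # Keep appending lines while the current one does not match "end".
        while ($0 !~ end && (getline) > 0)
            line = line $0
        print line
    }'
}

printf 'line 1\nli\nne \nnumber 3\n' | join_until '[0-9] *$'
```

With the pattern '[0-9] *$' this behaves like the script above.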

edited Nov 13 '18 at 19:19

qubert

answered Nov 13 '18 at 16:25

Jeff Schaller

Thanks. The script in your reply is very specific to the problem. I would like to see if there is a more general script that lets me specify a multiline pattern and match (and substitute) its occurrences.

– Tim
Nov 13 '18 at 16:50

What "multiline patterns" are you thinking of?

– RudiC
Nov 13 '18 at 17:26

$ cat file
line 1
li
ne 2
lo
ng li
ne 3

$ awk 'line ~ /[0-9]$/ { print line; line = "" } { line = line $0 } END { print line }' file
line 1
line 2
long line 3

This accumulates an "output line" in the variable line, and whenever this variable ends with a digit, it is printed and reset. It is also printed at the very end to output the last line (whether complete or not).

Approximate sed equivalent (but with an explicit loop):

$ sed -e ':again' -e '/[0-9]$/{ p; d; }; N; s/\n//' -e 'tagain' file
line 1
line 2
long line 3
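
The awk one-liner in this answer tests for a digit at the very end of the accumulated line, so trailing blanks after the digit would prevent a match. A hedged variant that tolerates trailing spaces and tabs, as the question asks, might look like this (the function name join_lines is hypothetical):

```shell
# join_lines: hypothetical helper; accumulates input lines and prints the
# result as soon as it ends in a digit, ignoring trailing spaces and tabs.
join_lines() {
    awk '{ line = line $0 }
         line ~ /[0-9][ \t]*$/ { print line; line = "" }
         # Print whatever is left over if the input ends mid-line.
         END { if (line != "") print line }'
}

printf 'line 1 \nli\nne 2\n' | join_lines
```

The trailing blanks themselves are kept in the output; only the test for "ends with a digit" ignores them.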

answered Nov 13 '18 at 22:49

Kusalananda

Small GNU sed?

sed ':L; /[0-9] *$/!{N; bL;}; s/\n//g' file

edited Nov 13 '18 at 22:55

Kusalananda

answered Nov 13 '18 at 17:25

RudiC

doesn't work for me?

– andrew lorien
Nov 13 '18 at 23:27
