I have some titles that start with # (as they are Markdown), and I have the following three rules:

  • Titles (#) should have exactly two blank lines above and one below.
  • Subtitles (##, ### and so forth) should have exactly one blank line above and one below.
  • Titles take precedence over subtitles. (If two rules conflict, use the title formatting and ignore the subtitle rule.)

NOTE: I'm trying to find all titles that do not conform to these three rules.

Below are some examples of good and bad titles:

some text 
# Title     | BAD 

## Subtitle | Good (Has two spaces below, is needed for next main title)


# Title     | Good

## Subtitle | Bad
text  

# Title     | Bad

text

After fiddling around with regexp I came up with these expressions:

Main titles: Regexr

((?<=\n{4})|(?<=.\n{2})|(?<=.\n))(# .*)|(# .*)(?=(\n.|\n{3}(?!# )|\n{4}))

Subtitles: Regex

'((?<=\n{3})|(?<=.\n))(##+.*)|(##+.*)(?=\n.|\n{3}(?!# )|\n{4}.)'

However, to my great confusion, they don't work with pcregrep. Here is the command I tried to run with pcregrep (just for completeness' sake):

$ pcregrep -rniM --include='.*\.md' \
     '((?<=\n{3})|(?<=.\n))(##+.*)|(##+.*)(?=\n.|\n{3}(?!# )|\n{4}.)' \
     ~/Programming/oppgaver/src/web

It doesn't work when I search just a single file either, even though I have a couple of other expressions that work just fine.

Is there something wrong with my regex, or is it a faulty implementation?
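
One way to tell the two possibilities apart is to run the same subtitle pattern over a whole file in a different regex engine. Below is a minimal sketch using Python's `re` module. It assumes the pattern only relies on constructs that `re` and PCRE treat alike (fixed-width lookbehind, lookahead, `.` not matching a newline), and the helper name `flagged_subtitles` and the two sample strings are made up for illustration. If the pattern flags the expected lines here, the regex itself is probably not the culprit.

```python
import re

# The subtitle pattern from the question, copied unchanged.
SUBTITLE_PATTERN = re.compile(
    r"((?<=\n{3})|(?<=.\n))(##+.*)|(##+.*)(?=\n.|\n{3}(?!# )|\n{4}.)"
)


def flagged_subtitles(text):
    """Return every subtitle line the pattern flags in the whole text."""
    return [m.group(0) for m in SUBTITLE_PATTERN.finditer(text)]


# Hypothetical samples: text directly under a subtitle (bad),
# and a subtitle with one blank line above and below (good).
bad_sample = "# Title\n\n## Subtitle\ntext\n"
good_sample = "# Title\n\n## Subtitle\n\ntext\n"

print(flagged_subtitles(bad_sample))
print(flagged_subtitles(good_sample))
```

The key difference from the pcregrep call is that the whole file is matched as a single string, so the lookbehinds can always see the preceding lines.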

Rui F Ribeiro
  • I kinda sorta see what you're trying to do but this approach is making my head hurt 8-). Which usually means, from my experience, that I'm doing something either wrong or the "hard way". Just sayin...Keep at it though! – slm Jul 15 '18 at 19:26
  • This also typically means we're into a XY problem, maybe restate what you're trying to accomplish without telling us how to do it? Just a thought. – slm Jul 15 '18 at 19:27
  • 3
    ... a classic case of [Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.](http://regex.info/blog/2006-09-15/247) perhaps :) – steeldriver Jul 15 '18 at 19:28
  • 3
    My problem is not the regex, it does what I want it to do. It passes all my test cases, however in the shell it simply outputs nothing... @slm Perhaps an easier solution would be to use `perl` or something to purge the file for all excess white space, then insert white space matching the conditions above? @steeldriver Hahaha. That very well might be the case. Feel free to suggest other ways than regex to solve the problem though =) – Øistein Søvik Jul 15 '18 at 19:34
  • You have a couple of lines of `text` in your examples. Are there any place where such text, with no preceding `#` or `##` would be ok? – Kusalananda Jul 15 '18 at 20:00
  • I remember from the top of my head that there are two different pcregrep implementations, *I think*. – Rui F Ribeiro Jul 15 '18 at 20:09
  • @Kusalananda Yes, the file might start with text. – Øistein Søvik Jul 15 '18 at 20:21
  • So text may only occur before the first title. What about consecutive titles or subtitles? – Kusalananda Jul 15 '18 at 20:45
  • Consecutive titles or subtitles is also fine. – Øistein Søvik Jul 15 '18 at 20:48
  • Consecutive subtitles wouldn't be able to have both one line above and two lines below. It would need to be either one line between each subtitle, or two lines. Similarly for titles. – Kusalananda Jul 15 '18 at 20:50
  • Are you in fact wanting one empty line before a subtitle, but always two lines before a title? That would be much easier to accomplish. – Kusalananda Jul 15 '18 at 20:53
  • Subtitles should have ONE line below not two. I will edit my post accordingly. If it is two titles two spacings is fine. – Øistein Søvik Jul 15 '18 at 20:55
  • 1
    I fully agree with the folks mentioning the XY problem. This could be solved with Perl in a way that is more easily understandable by those reading it. – Tim Jul 16 '18 at 03:39
  • 3
    All of the requirements should be in the question, not in the comments. – Jeff Schaller Jul 16 '18 at 11:04
  • @ØisteinSøvik - can you please incorporate the new requirements that you're defining in the comments here back into the Q? – slm Jul 16 '18 at 14:59
  • Check the edit of my answer, I found `sed` solution. – MiniMax Jul 16 '18 at 18:17

1 Answer

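This solution does not just find the badly formatted titles, it fixes all of them. It needs GNU sed (for `-r` and `\n` inside bracket expressions):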

sed -r '
    :loop; N; $!b loop

    s/\n+(#[^\n]+)/\n\n\1/g

    s/(#[^\n]+)\n+/\1\n\n/g

    s/\n+(#[^\n#]+)/\n\n\n\1/g
' input.txt

With comments:

sed -r '
    ### Read the whole file into the pattern space,
    # in other words, join all lines into one newline-separated buffer.
    :loop; N; $!b loop;

    ### First pass over the pattern space:
    # find every line starting with "#" (titles, subtitles, and so on),
    # take all the newlines above it
    # and reduce them to exactly one empty line.
    s/\n+(#[^\n]+)/\n\n\1/g;


    ### Second pass over the pattern space:
    # again, find the text from a "#" to the end of its line,
    # take all the newlines below it
    # and reduce them to exactly one empty line.
    s/(#[^\n]+)\n+/\1\n\n/g;

    ### Third pass over the pattern space:
    # find lines starting with a single "#" (titles only),
    # take the newlines above each one (exactly two at this point,
    # because of the previous substitutions)
    # and turn them into three newlines, that is, two empty lines.
    s/\n+(#[^\n#]+)/\n\n\n\1/g
' input.txt
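
For readers who find the sed syntax hard to follow, the same three passes can be sketched in Python. This is only an illustration of the substitutions above, not a replacement for them. The function name `fix_headings` is made up, and the handling of the final newline is an assumption meant to mimic sed, which keeps the last newline outside the pattern space.

```python
import re


def fix_headings(text):
    """Apply the three substitutions from the sed script to a whole file."""
    # sed does not see the final newline of the file, so set it aside.
    had_final_newline = text.endswith("\n")
    body = text[:-1] if had_final_newline else text

    # Pass 1: exactly one empty line above every "#" line.
    body = re.sub(r"\n+(#[^\n]+)", r"\n\n\1", body)
    # Pass 2: exactly one empty line below every "#" line.
    body = re.sub(r"(#[^\n]+)\n+", r"\1\n\n", body)
    # Pass 3: titles (a single "#") get two empty lines above.
    body = re.sub(r"\n+(#[^\n#]+)", r"\n\n\n\1", body)

    return body + ("\n" if had_final_newline else "")


sample = "text\n# Title\n## SubTitle\ntext\n# Title\n# Title"
print(fix_headings(sample))
```

As with the sed version, the second pass matches a `#` anywhere in a line, so ordinary text that contains a `#` would also get an empty line after it.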

Input

text
# Title
## SubTitle
### SubSubTitle
# Title
## SubTitle
text
### SubSubTitle
# Title
# Title
# Title
## SubTitle
### SubSubTitle

Output

text


# Title

## SubTitle

### SubSubTitle


# Title

## SubTitle

text

### SubSubTitle


# Title


# Title


# Title

## SubTitle

### SubSubTitle
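
If the goal is only to report the offending titles, as the question asks, counting blank lines around each heading may be easier than a single regex. Here is a minimal Python sketch under several assumptions that the question leaves open: a heading is one or more `#` followed by whitespace, a heading on the first line is not checked above, a heading on the last line is not checked below, and a heading directly followed by a title needs two blank lines below (the title rule wins). The names `heading_level` and `find_bad_headings` are made up, and fenced code blocks are not handled.

```python
import re

HEADING = re.compile(r"^(#+)\s")


def heading_level(line):
    """Return 1 for '# ', 2 for '## ', and so on; 0 if not a heading."""
    m = HEADING.match(line)
    return len(m.group(1)) if m else 0


def find_bad_headings(text):
    """Return (line_number, line) for each heading that breaks the rules."""
    lines = text.splitlines()
    bad = []
    for i, line in enumerate(lines):
        level = heading_level(line)
        if level == 0:
            continue
        # Count the blank lines directly above the heading.
        j = i - 1
        while j >= 0 and lines[j].strip() == "":
            j -= 1
        above, at_top = i - 1 - j, j < 0
        # Count the blank lines directly below the heading.
        k = i + 1
        while k < len(lines) and lines[k].strip() == "":
            k += 1
        below, at_end = k - i - 1, k >= len(lines)
        # Titles want two blank lines above; everything else wants one.
        want_above = 2 if level == 1 else 1
        # A title below takes precedence and demands two blank lines.
        next_is_title = not at_end and heading_level(lines[k]) == 1
        want_below = 2 if next_is_title else 1
        if (not at_top and above != want_above) or (
            not at_end and below != want_below
        ):
            bad.append((i + 1, line))
    return bad


sample = "some text\n# Title A\n\n## Sub B\n\n\n# Title C\n\n## Sub D\ntext\n\n# Title E\n\ntext\n"
for number, heading in find_bad_headings(sample):
    print(number, heading)
```

The sample mirrors the good and bad examples from the question, with the `| Good` and `| Bad` annotations removed.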
MiniMax