
I have many Markdown files with a YAML metadata block at the top:

---
title: title of work
author: author name
author-sort: name, author
published: N
date: XXXX-XX-XX
pub-number: XXXXX
embedded-title: false
---

**title**

a piece of indeterminate  
length that could be a few lines or many hundreds

---[author name](author link)
---found in [source](source link)

I am trying to find a way to extract the metadata block (so I can feed it into yamllint, and later into other tools as well). Awk seems like the right tool, but I don't really understand it, and the best I have hacked together is this:

awk '/^---$/ {printline = 1; print; next} /^---$/ {printline = 0} printline'

This just prints the whole file. My attempt to limit the output to the lines between the --- delimiters isn't working (and maybe nothing else here is either)!
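Why the attempt prints the whole file: both rules test the same pattern, and the first one ends in next, so the second rule never sees a --- line. Once printline is set to 1 it never goes back to 0. A minimal sketch of one fix counts the delimiters in a single rule and exits at the second one, so the rest of the file is never read. The variable name seen and the cut-down sample file are illustrative.

```shell
# Throwaway cut-down copy of the sample file.
file=$(mktemp)
cat > "$file" <<'EOF'
---
title: title of work
author: author name
---

**title**

---[author name](author link)
EOF

# One rule handles both delimiters: the first --- bumps seen to 1 and skips
# the line; the second bumps it to 2 and exits, so the body is never read.
meta=$(awk '/^---$/ { if (++seen == 2) exit; next } seen == 1' "$file")
printf '%s\n' "$meta"
rm -f "$file"
```

The ---[author name] line in the body is never reached, and would not match ^---$ anyway.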

5 Answers


sed may be a better choice of tool. Since the block is always at the top of the file, you can tell sed to print only from line 2 up to the closing --- line, and to delete that line.

e.g.:

sed -n '2,/^---$/ {/^---$/d; p;}' file

This works by:

-n: don't print by default
2,/^---$/ { ... }: apply the commands in braces only to the lines from line 2 through the next line matching ^---$ (the closing delimiter)

And then inside that block

/^---$/d: delete the closing --- line
p : print what remains.

So on your test file, the output is:

title: title of work
author: author name
author-sort: name, author
published: N
date: XXXX-XX-XX
pub-number: XXXXX
embedded-title: false
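A self-contained run of the sed command against a throwaway, cut-down copy of the sample file (created with mktemp; the contents are illustrative). The ; before } is there because some sed implementations, BSD and macOS sed among them, want a command terminator before the closing brace; GNU sed accepts it either way.

```shell
# Throwaway cut-down copy of the sample file.
file=$(mktemp)
cat > "$file" <<'EOF'
---
title: title of work
author: author name
embedded-title: false
---

**title**

---[author name](author link)
EOF

# -n: print nothing by default. 2,/^---$/ selects line 2 through the closing
# delimiter; inside that range the delimiter is deleted and every other line printed.
meta=$(sed -n '2,/^---$/ {/^---$/d; p;}' "$file")
printf '%s\n' "$meta"
rm -f "$file"
```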

Doing this in awk requires tracking state, e.g.:

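awk '
/^---$/ && printline == 0 { printline = 1; next }  # opening delimiter: start printing
/^---$/                   { printline = 2; next }  # closing or any later delimiter: stop for good
printline == 1                                      # inside the metadata block: print the line
' file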

The third state (printline = 2) is what stops a stray --- line later in the file from switching printing back on.
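A sketch of the three-state awk program run against a hypothetical file whose body contains a Markdown horizontal rule (a line that is just ---), showing that printing stays off after the closing delimiter.

```shell
# Hypothetical sample with a bare --- line in the body.
file=$(mktemp)
cat > "$file" <<'EOF'
---
title: title of work
author: author name
---

**title**

---

a line after a horizontal rule
EOF

meta=$(awk '
/^---$/ && printline == 0 { printline = 1; next }  # opening delimiter
/^---$/                   { printline = 2; next }  # closing or any later delimiter
printline == 1                                      # inside the metadata block
' "$file")
printf '%s\n' "$meta"
rm -f "$file"
```

Only the two metadata lines come out; the horizontal rule just sets printline to 2 again.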

Stephen Harris

Since the metadata block is delimited by --- lines, we can use --- as awk's record separator (RS).

Assuming the metadata block is always at the beginning of the file and delimited by ---, the first record is the empty string before the opening --- and the second record is the metadata block, so we can say:

$ awk 'BEGIN { RS = "---" } NR==2' example-yaml.txt

title: title of work
author: author name
author-sort: name, author
published: N
date: XXXX-XX-XX
pub-number: XXXXX
embedded-title: false

to extract it.

awk uses a newline as the default record separator, but GNU awk (and some other implementations, such as mawk) also accepts a multi-character string, or even a regular expression, as RS (see awk split records (The GNU Awk User’s Guide) for reference). POSIX leaves a multi-character RS unspecified, so other awk implementations may use only its first character; I have only verified this with gawk.
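A sketch of the same idea, assuming an awk that accepts a multi-character RS (gawk or mawk, for example). Record 2 keeps the newlines that followed the opening --- and preceded the closing one, which is why the output above starts with a blank line; the two sub() calls trim them, and exit stops reading once the block has been printed. Because RS = "---" splits on those three characters anywhere, the ---[author name] line in the body also starts a new record, which is harmless here since only record 2 is printed.

```shell
# Throwaway cut-down copy of the sample file.
file=$(mktemp)
cat > "$file" <<'EOF'
---
title: title of work
author: author name
---

**title**

---[author name](author link)
EOF

# Record 1 is the empty string before the opening ---; record 2 is the metadata.
# Trim its leading and trailing newline, print it, and stop reading.
meta=$(awk 'BEGIN { RS = "---" } NR == 2 { sub(/^\n/, ""); sub(/\n$/, ""); print; exit }' "$file")
printf '%s\n' "$meta"
rm -f "$file"
```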

Jeff Schaller
wmey

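With a Perl one-liner. -0 makes perl read the whole file as a single string (as long as it contains no NUL bytes), /m lets ^ and $ match at line boundaries, /s lets . match newlines, and the non-greedy .*? stops at the first closing --- rather than the last one in the file:

$ perl -0ne 'print $1 if /^---\n(.*?)^---$/ms' file.yaml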
title: title of work
author: author name
author-sort: name, author
published: N
date: XXXX-XX-XX
pub-number: XXXXX
embedded-title: false

Gilles Quénot

With awk:

$ awk 'BEGIN{printline=2} /^---$/{printline--; next} printline > 0' file.yaml
title: title of work
author: author name
author-sort: name, author
published: N
date: XXXX-XX-XX
pub-number: XXXXX
embedded-title: false

The condition at the end of the command (printline > 0) is a pattern with no action, and awk's default action when a pattern is true is to print the whole line. printline starts at 2 and is decremented each time a line matches ^---$ (next skips printing the delimiter itself), so it is 1 inside the metadata block and 0 after the closing delimiter, at which point awk stops printing. The > 0 test matters if the body contains another bare --- line: the counter would then go negative, and any non-zero number, negative included, counts as true in awk.
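To see what the guard changes, here is a sketch run against a hypothetical file whose body contains a Markdown horizontal rule (a line that is just ---). With the bare printline condition the counter drops to -1 at that line and printing resumes; with printline > 0 it does not.

```shell
# Hypothetical sample with a bare --- line in the body.
file=$(mktemp)
cat > "$file" <<'EOF'
---
title: title of work
author: author name
---

**title**

---

a line after a horizontal rule
EOF

# Bare counter as the condition: a third --- makes printline -1, which is still true.
loose=$(awk 'BEGIN{printline=2} /^---$/{printline--; next} printline' "$file")
# Guarded counter: prints only while printline is positive.
strict=$(awk 'BEGIN{printline=2} /^---$/{printline--; next} printline > 0' "$file")

printf '%s\n' "bare printline:" "$loose" "printline > 0:" "$strict"
rm -f "$file"
```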

Gilles Quénot

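Using the ed editor to delete everything from the second --- separator to the end of the buffer, and then display what remains. The address 1;/^---$/ makes the search start after line 1, so it finds the closing delimiter, and Q quits without saving, so the file itself is left untouched. Note that the opening --- line is kept: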

$ printf '%s\n' '1;/^---$/,$d' ,p Q | ed -s file
---
title: title of work
author: author name
author-sort: name, author
published: N
date: XXXX-XX-XX
pub-number: XXXXX
embedded-title: false

Reversing all the lines, deleting up to the first --- delimiter (which will be the one at the end of the YAML section, as long as the body has no line consisting of just ---), and then reversing the lines again:

$ tail -r file | sed '1,/^---$/d' | tail -r
---
title: title of work
author: author name
author-sort: name, author
published: N
date: XXXX-XX-XX
pub-number: XXXXX
embedded-title: false

If your implementation of tail does not have the -r option (available on macOS and other BSD systems), you may use GNU tac instead.
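A portable sketch of the same pipeline. The helper name reverse is hypothetical: it probes whether tail accepts -r and falls back to tac otherwise, so the pipeline runs unchanged on macOS/BSD and on GNU/Linux.

```shell
# Throwaway cut-down copy of the sample file.
file=$(mktemp)
cat > "$file" <<'EOF'
---
title: title of work
author: author name
---

**title**

---[author name](author link)
EOF

# Hypothetical helper: use tail -r where it exists (BSD/macOS), else GNU tac.
# With no file arguments, both read standard input.
reverse() {
    if tail -r /dev/null >/dev/null 2>&1; then
        tail -r "$@"
    else
        tac "$@"
    fi
}

# Reverse, drop everything up to the closing delimiter, reverse back.
meta=$(reverse "$file" | sed '1,/^---$/d' | reverse)
printf '%s\n' "$meta"
rm -f "$file"
```

As with the ed version, the opening --- line is part of the output.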

Kusalananda