I have CSV content similar to this:

Family,URL,IP,FirstSeen
Pony,http://officeman.tk/images/admin.php,207.180.230.128,01-06-2019
Pony,http://learn.cloudience.com/ojekwaeng/yugo/admin.php,192.145.234.108,01-06-2019
Pony,http://vman23.com/ba24/admin.php,95.213.204.53,01-06-2019

I'm aware that the URL column can be selected using:

mlr --mmap --csv --skip-comments --headerless-csv-output cut -f 'URL'

How can the domains be extracted using Miller alone, without piping to other commands?

Desired Output:

officeman.tk
learn.cloudience.com
vman23.com
Jeff Schaller
T145

3 Answers

The Swiss Army knife of Miller is the put verb, which gives you access to an entire domain-specific language for transforming your data. From there, there are several approaches:

  • matching and capturing the domain portion of the URL with a regular expression
  • trimming the leading and trailing components using regular expressions
  • splitting the URL as a delimited string

So, for example:

mlr --mmap --csv --skip-comments --headerless-csv-output put -S '
  $URL =~ "https?://([^/]+)"; $Domain = "\1"
' then cut -f Domain file.csv

or

mlr --mmap --csv --skip-comments --headerless-csv-output put -S '
  m = splitnvx($URL,"/"); $Domain = m[3]
' then cut -f Domain file.csv
steeldriver

If you are OK with using other commands, you can try awk. The command would be something like:

awk -F\/ 'FNR!=1 {print $3}' input_file.csv

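The idea is to use / as the field delimiter and print field 3. Splitting on / puts the domain in the third field, after the text ending in "http:" and the empty string between the two slashes. FNR!=1 skips the header line.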

Romeo Ninov

Running

mlr --c2n put '$m = splitnvx($URL,"/")[3]' then cut -f m input.csv

produces:

officeman.tk
learn.cloudience.com
vman23.com
aborruso