
I have a set of URLs and I'm really only interested in anything up to the first /.

How can I capture this info to a text file?

Input (foo.txt):

apple.com/nothing.js  
t1.msn.com/cookie=22  
happy.net/whatever

Output (redirected to file: foo_filter.txt):

apple.com/  
t1.msn.com/  
happy.net/  
αғsнιη
Scott
  • Do you have URLs such as `unix.stackexchange.com`, with no path, rather than something like `unix.stackexchange.com/questions`? If so, what output do you expect: skip those lines or print the URL? – αғsнιη Feb 10 '23 at 04:02
  • And what about URLs that include `https://`, like [`https://unix.stackexchange.com/help/someone-answers`](https://unix.stackexchange.com/help/someone-answers)? – αғsнιη Feb 10 '23 at 04:13
  • @αғsнιη I don't think this is a duplicate. Here, the obvious pattern match is to be included in the result, which can be done trivially because it's a single character. In the other answer the obvious pattern match is to be excluded from the result. Not the same at all – roaima Feb 10 '23 at 07:22
  • @roaima these are same. there and also here OPs want to cut off upto specifc patterns (here up to first `/` string, there up to `.com` string) and in both they wanted that strings to be included in the result. so no doubt from me that these are exact duplicates. – αғsнιη Feb 10 '23 at 07:31
  • @αғsнιη no, here [they want](https://unix.stackexchange.com/questions/734952/output-everything-before-the-first-slash-in-a-line?noredirect=1#comment1395232_734957) the split character, there they do not. – roaima Feb 10 '23 at 07:44
  • @roaima there also they wanted the split string too (which that is `.com`). please read the Q there slowly then you will find it. in addition you can compare your answer with answers there. both Q and A are duplicates except you have an extra cut approach which that answer even is not what OP wanted here but later approaches (printing the split character) is – αғsнιη Feb 10 '23 at 07:52

6 Answers

If you don't want the trailing slash, it's very straightforward:

cut -d/ -f1 foo.txt
awk -F/ '{print $1}' foo.txt
sed 's!/.*!!' foo.txt

If you do want that trailing slash, then:

awk -F/ '{print $1 "/"}' foo.txt
sed 's!/.*!/!' foo.txt

All of these will write to stdout (your screen) so you can see the result immediately. To redirect them to your target file, use >foo_filter.txt on the end of the command. For example,

awk -F/ '{print $1 "/"}' foo.txt >foo_filter.txt
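
As a quick check of the commands above, the sketch below recreates the sample foo.txt from the question and adds one hypothetical line without a slash (example.org, which is not part of the original input). The extra line shows where the variants differ: awk appends a slash to every line, while sed leaves a line alone when there is no slash to match. The file name foo_filter_sed.txt is made up for the comparison.

```shell
# Recreate the question's input, plus one made-up line that has no slash
printf '%s\n' 'apple.com/nothing.js' 't1.msn.com/cookie=22' 'happy.net/whatever' 'example.org' > foo.txt

# awk: prints the first field plus "/" for every line, slash or not
awk -F/ '{print $1 "/"}' foo.txt > foo_filter.txt

# sed: only rewrites lines that actually contain a slash
sed 's!/.*!/!' foo.txt > foo_filter_sed.txt

# Show both results for comparison
cat foo_filter.txt
cat foo_filter_sed.txt
```

Which behaviour is preferable depends on whether a bare host name should gain a trailing slash in the output.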
roaima

Using Miller:

mlr --nidx --ifs '/' -N cut -f 1 file

or using GNU datamash:

datamash dirname 1 <file
αғsнιη
Prabhjot Singh

With awk, using sub() as the condition so that only lines containing a / are printed:

$ awk 'sub("/.*","/")' foo.txt
apple.com/
t1.msn.com/
happy.net/
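
Because sub() returns the number of substitutions it made, using it as the pattern means a line with no slash is not printed at all. A minimal sketch of that behaviour, assuming a made-up two-line input file (sub_demo.txt is hypothetical), together with one possible variant that keeps such lines unchanged:

```shell
# Made-up input: one line with a slash, one without
printf '%s\n' 'apple.com/nothing.js' 'happy.net' > sub_demo.txt

# sub() as the condition: only lines where a substitution happened are printed
awk 'sub("/.*","/")' sub_demo.txt > sub_only_matches.txt

# Variant (an assumption, not from the answer): substitute where possible, print every line
awk '{sub("/.*","/")} 1' sub_demo.txt > sub_all_lines.txt

cat sub_only_matches.txt
cat sub_all_lines.txt
```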
Ed Morton

A good option here is sed, which edits the stream line by line.

Try the following:

sed 's/\/.*//' foo.txt > foo_filter.txt

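This tells sed to replace, on each line, the first / and everything after it with nothing (so the trailing slash is dropped as well). The > then redirects the output to the new file. You can read more from the sed manual here.

Note: sed starts matching at the leftmost possible position, so /.* always begins at the first slash even though .* is greedy. Adding a 1 at the end of the command (replace only the first match) is valid but is already the default, so it changes nothing:

sed 's/\/.*//1' foo.txt > foo_filter.txt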

You can also use awk, which handles lines with multiple slashes just as well:

awk -F"/" '{print $1"/"}' foo.txt > foo_filter.txt

The -F"/" sets the field delimiter to a forward slash, and '{print $1"/"}' prints the first field followed by a slash (the delimiter is not part of any field, so it has to be added back when printing).
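
One way to check the effect of the trailing 1 is to run both sed forms on a line with several slashes. The input a.example/b/c/d is a made-up sample; on it, both commands should print the same thing, because the substitution already starts at the first slash.

```shell
# Made-up input with several slashes
line='a.example/b/c/d'

# Without the numeric flag
without_flag=$(printf '%s\n' "$line" | sed 's/\/.*//')

# With the 1 flag (replace the first match only, which is the default)
with_flag=$(printf '%s\n' "$line" | sed 's/\/.*//1')

# Print both results, one per line
printf '%s\n%s\n' "$without_flag" "$with_flag"
```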

αғsнιη
Yehuda
  • `print $1/` in awk would be a syntax error. Don't escape the `/` in your sed command, just pick a different delimiter like `:`. – Ed Morton Feb 09 '23 at 23:28

Here's an awk and a cut solution.

$ cut -f1 -d/ foo.txt
apple.com
t1.msn.com
happy.net
$ awk -F/ '{ print $1 }' foo.txt
apple.com
t1.msn.com
happy.net
$ awk -F/ '{ print $1"/" }' foo.txt
apple.com/
t1.msn.com/
happy.net/
$
steve

With just grep:

$ grep -oE '^[^/]+/' foo.txt

Output:

apple.com/
t1.msn.com/
happy.net/

To fulfill all the requirements:

grep -oE '^[^/]+/' foo.txt | tee foo_filter.txt
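
The comments on the question ask about lines without a slash and URLs that start with a scheme such as https://. The sketch below shows how the grep pattern treats those cases, using a hypothetical edge.txt made up for illustration: grep -o prints nothing for a line with no slash, and the first slash of https:// ends the match early. The sed variant that skips an optional scheme is only one possible workaround and an assumption on my part, not something any of the answers propose.

```shell
# edge.txt is a made-up sample file, not part of the original question
printf '%s\n' 'apple.com/nothing.js' 'happy.net' 'https://unix.stackexchange.com/help' > edge.txt

# The grep pattern as given: the no-slash line is dropped,
# and for the https:// line the match stops at the scheme's first slash
grep -oE '^[^/]+/' edge.txt > edge_grep.txt

# Possible workaround (assumption): skip an optional scheme, keep the host,
# and always append a slash
sed -E 's!^([a-z]+://)?([^/]*).*!\2/!' edge.txt > edge_sed.txt

cat edge_grep.txt
cat edge_sed.txt
```

Whether the scheme should be stripped or kept in the output is not stated in the question, so the sed pattern would need adjusting to match the intended result.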
Gilles Quénot
  • Thanks, time to learn some switches, apparently! – Scott Feb 09 '23 at 20:23
  • Advice to newcomers: If an answer solves your problem, please accept it by clicking the large check mark (✓) next to it and optionally also up-vote it (up-voting requires at least 15 reputation points). If you found other answers helpful, please up-vote them. Accepting and up-voting helps future readers. – Gilles Quénot Feb 09 '23 at 20:40