How to generate a new file by extracting some parts?

Question

An extract of my log file looks like this:

b6227|—|  Thermometer:  CRC matched: computed: 36 == read: 36
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5537%
b6227|—| SocEvaluator:  Final SoC is 64.5537%
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%

From this I'd like to create a new CSV file for plotting curves in Excel. The CSV file should look like this:

64.5537

I tried this, but it didn't work:

sed -nr 's/ Final SoC is (\d\.\d%)/\1/gp' ~/extremeCold.20220926.log > final.csv

What's wrong?

[UPDATE]

I am running on macOS Monterey (v12.5)

The only number to keep is 64.5537, from the line b6227|—| SocEvaluator: Final SoC is 64.5537% (for those who read the question before the update, forget about the 64.5536; it was simply the next available SoC).

Please [edit] your question and explain, in words as well as by example, what numbers should be chosen from the file. Where does the `64.5536` come from? Should we only look at lines containing the string `Final SoC`? Do we _need_ to use `sed`? And what operating system are you using? We need to know what tools are available. — terdon, Sep 27 '22 at 09:33
I assume then, the second line of your desired output should actually be `64.5552`, not `64.5557`? — AdminBee, Sep 27 '22 at 10:35
Nope, one must only keep the `b6227|—| SocEvaluator: Final SoC is ` pattern (aka 64.5537 here) and ignore the other lines — Stéphane de Luca, Sep 27 '22 at 10:46
2 lines, each containing a number, isn't a CSV, it's just a text file. A CSV would have Comma Separated Values on each line. It's not at all obvious why your output would not include `64.5552` given the last line of input is `b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%`. — Ed Morton, Sep 27 '22 at 22:54
You seem to only have one line matching your requirement(s), yet you show two lines of predicted (filtered) output? — jubilatious1, Sep 29 '22 at 04:51

Kusalananda · Answer 1 · 2022-09-29T07:23:22.000

This is using sed to match the wanted line(s) and then chop off everything up to the last space:

sed -e '/SocEvaluator:.*Final SoC is [[:digit:]]/!d' -e 's/.* //' file

For the given data, this would output

64.5537%

To remove the % character:

sed -e '/SocEvaluator:.*Final SoC is [[:digit:]]/!d' -e 's/.* //' -e 's/%$//' file

Using awk with exactly the same detecting regular expression, and then printing the last field on each line that this expression matches:

awk '/SocEvaluator:.*Final SoC is [[:digit:]]/ { print $NF }' file

Removing the % sign before printing:

awk '/SocEvaluator:.*Final SoC is [[:digit:]]/ { sub("%$","",$NF); print $NF }' file

The regular expression

SocEvaluator:.*Final SoC is [[:digit:]]

... would match any line that contains the text SocEvaluator: followed later by the text Final SoC is and a digit.
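
To see which lines the expression selects, here is a minimal sketch that runs it through `grep -c` on the four sample lines from the question. The variable names `sample` and `matched` are only for illustration.

```shell
# The four sample lines from the question, kept in a shell variable
# so that no scratch file is needed.
sample='b6227|—|  Thermometer:  CRC matched: computed: 36 == read: 36
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5537%
b6227|—| SocEvaluator:  Final SoC is 64.5537%
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%'

# Count the matching lines. Only the third line has a digit directly
# after "Final SoC is "; the second and fourth continue with "following".
matched=$(printf '%s\n' "$sample" |
  grep -c 'SocEvaluator:.*Final SoC is [[:digit:]]')
echo "$matched"
```

This prints `1`, which is why the `sed` and `awk` commands above emit a single value for the sample data.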

Note that sed and awk on macOS do not understand Perl-compatible regular expressions like \d. Related to this point: Why does my regular expression work in X but not in Y?
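
Applied to the command from the question, a corrected version might look like the sketch below. It assumes the log format shown in the question. `-E` replaces `-r`, `[0-9]` replaces `\d`, and the pattern is widened in two ways: it allows several digits on each side of the decimal point (the original `\d\.\d` would accept only one, even where `\d` is supported), and it matches the whole line so that nothing but the captured number is left.

```shell
# Sample lines from the question (variable names are illustrative only).
sample='b6227|—|  Thermometer:  CRC matched: computed: 36 == read: 36
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5537%
b6227|—| SocEvaluator:  Final SoC is 64.5537%
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%'

# -n suppresses default output; the trailing p prints only the lines
# on which the substitution succeeded.
soc=$(printf '%s\n' "$sample" |
  sed -En 's/.*Final SoC is ([0-9]+\.[0-9]+)%.*/\1/p')
echo "$soc"
```

With a real log, the same `sed` expression would be given the file name from the question, `~/extremeCold.20220926.log`, with the output redirected to `final.csv`.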

GNU `sed` doesn't recognise `\d` either. ast-open's `sed` does (and `-r`, initially a GNUism, BSD and soon-standard equivalent being `-E`). — Stéphane Chazelas, Sep 29 '22 at 07:19
@StéphaneChazelas Thanks, I tweaked that point a bit in response to your comment. I'm a bit unfamiliar with what expressions GNU tools usually support. — Kusalananda, Sep 29 '22 at 07:24

score 1 · Accepted Answer · edited Sep 29 '22 at 10:51

Using sed on Monterey 12.6

$ sed -En '/.* SocEvaluator:  Final SoC is ([0-9.]+)%.*$/s//\1/pwfinal.csv' input_file
64.5537
$ cat final.csv
64.5537

edited Sep 29 '22 at 10:51

Stéphane de Luca

answered Sep 27 '22 at 10:45

sseLtaH

One thing to note: I imported the source file from a colleague working on Windows. I noticed the file is CRLF-terminated, which stops the sed command from working. I had to replace the line endings with LF. Is there a way to tell sed to handle CRLF as the end of line? – Stéphane de Luca Sep 29 '22 at 09:56
@StéphanedeLuca Unfortunately, I cannot answer that at this time without importing a similar file myself and testing, as I use WSL Linux for files imported from Windows and not the Mac. Have you managed to find a solution for the line endings? – sseLtaH Sep 29 '22 at 10:57
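
One possible way to handle CRLF input without converting the file first is to strip the carriage returns with `tr` before the data reaches `sed`. This is a sketch, assuming the carriage returns can simply be discarded. With CRLF endings, a `\r` sits between the `%` and the end of the line, so an anchored pattern such as `%$` no longer matches. The line below mirrors the log format from the question, with the `\r\n` ending added by `printf`.

```shell
# One log line in the format from the question (illustrative variable name).
line='b6227|—| SocEvaluator:  Final SoC is 64.5537%'

# printf appends a CRLF ending; tr deletes every carriage return, after
# which the sed commands see an ordinary LF-terminated line.
soc=$(printf '%s\r\n' "$line" |
  tr -d '\r' |
  sed -e '/SocEvaluator:.*Final SoC is [[:digit:]]/!d' -e 's/.* //' -e 's/%$//')
echo "$soc"
```

For a file, the same pipeline would start with `tr -d '\r' < input_file` instead of `printf`.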

score 1 · Answer 3 · answered Sep 29 '22 at 05:42

Using Raku (formerly known as Perl_6)

~$ raku -ne 'put $<> if m/ "Final SoC is " <(\d* \. \d*)> /;'  file

#OR
 
~$ raku -ne 'put $0 if m/ "Final SoC is " (\d* \. \d*) /;' file

Sample Input:

b6227|—|  Thermometer:  CRC matched: computed: 36 == read: 36
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5537%
b6227|—| SocEvaluator:  Final SoC is 64.5537%
b6227|—| SocEvaluator: Final SoC is following to midle range, aka: SoC 64.5552%

Sample Output:

64.5537

In the first example, Raku's capture markers <( … )> are used to drop the "Final SOC..." text from the match object, and the remaining capture is output using the $<> (or synonymous $/) match variable, subject to an if conditional.

In the second example, parentheses are used to capture a portion of the match into match-variable $<>.[0] which is the same as match-variable $/.[0] which is the same as $0. This $0 capture is output, subject to an if conditional.

https://raku.org
