
This is a homework question that I have only been able to partially solve.

I want to use grep to find words in a list that contain three 'e's separated by 't's, and the search should be accent-insensitive.

The closest I could get with regular expressions was this:

grep 'e.*t.*e.*t.*e' mylist

I have two problems with this:

  1. I don't understand how to do an accent-insensitive search with a pattern like this. I only recently heard of equivalence class operators, and I don't know how to include them in the syntax of my search.
  2. The matches I get with this search do not include words with repeated 't's.
Jeff Schaller
Mndx
  • Does "_accent insensitive_" mean `ü` matches `u`, or that it matches `ue`? – roaima Mar 10 '19 at 14:27
  • Edited. Yes, I'm looking to do a search that ignores accents. Let's say I'm looking for all `e`s, then I should be able to find `éèêë` as well. – Mndx Mar 10 '19 at 14:36
  • Looks like you were on the right lines. Would `grep '[eéêèë]'` (etc.) work for you? – roaima Mar 10 '19 at 14:39
  • It probably would, but isn't there a regular expression that does this? – Mndx Mar 10 '19 at 14:41

1 Answer


If your regex engine supports equivalence classes, you can simply replace the character `e` with its equivalence class, `[[=e=]]`.

For example:

$ grep -m 10 '[[=e=]].*t.*[[=e=]].*t.*[[=e=]]' /usr/share/dict/french
absentéiste
absentéistes
anesthésiste
anesthésistes
cafés-théâtres
café-théâtre
casse-tête
centimètre
centimètres
centripète

See Collating Sequences and Character Equivalents
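
As a quick check of the second point in the question, here is a minimal sketch that wraps the same pattern in a helper and feeds it a few ASCII-only words. The function name `three_e_sep_by_t` and the sample words are made up for illustration and are not part of the original answer. Which accented letters `[[=e=]]` covers depends on the current locale (in the C/POSIX locale it should match only a plain `e`), so the sample avoids accents to keep the result locale-independent.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): filter stdin for words
# containing three e-like letters with at least one t between each pair.
three_e_sep_by_t() {
    # [[=e=]] is the equivalence class of e; the accented letters it covers
    # depend on the current locale (LC_ALL / LC_COLLATE).
    grep '[[=e=]].*t.*[[=e=]].*t.*[[=e=]]'
}

# ASCII-only sample words, so the result does not depend on the locale.
# "etiquette" has a doubled t between its last two e's and still matches,
# because .* accepts any run of characters, including further t's.
# "better" and "tree" do not have the e, t, e, t, e sequence and are dropped.
printf '%s\n' detente etiquette better tree | three_e_sep_by_t
```

To see the accent-insensitive behaviour, the same helper can be run on a word list in a UTF-8 locale such as `fr_FR.UTF-8`, assuming that locale is installed on the system.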

steeldriver
  • OK, that works. This is what I was looking for. For some reason, I thought the results of my initial search ignored cases in which the pattern had two `t`s. I might have overlooked them. – Mndx Mar 10 '19 at 14:45