
I'm trying to verify whether a subdomain entered by a user is valid, but whatever I pass in, it's never valid. I know the regex is OK, so the problem must be my "if" logic; however, I'm new to shell/bash.

#!/bin/bash
#

echo Enter the subdomain\'s name to configure.
read SUBDOMAIN

if [[ ! $SUBDOMAIN =~ [A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])? ]]; then
    echo "$SUBDOMAIN is not a valid domain"
fi

Examples:
Would be accepted (regular subdomain names): test
Would not be accepted (invalid subdomain name): -
Would not be accepted (invalid subdomain name): (Empty)
Would not be accepted (invalid subdomain name): #$??&@#&?$##$

I would prefer using shell, but the parentheses in the regex make the script throw an error.

I'm not sure if it can be done with grep, but I never understood how to use grep and it always confused me.

NaturalBornCamper

1 Answer


If you're trying to match an alphanumeric followed by "alphanumeric or dash" characters, ensuring there's no dash at the end, for a total of 1..63 characters, this RE will work for you:

^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$

This binds to the beginning and end of the string, so the RE must match the string in its entirety.

  • Start of line ^
  • A single alphanumeric, any case [[:alnum:]]
  • An optional block (bracketed (...) and terminated with ?)
    • [[:alnum:]] or a dash -, repeated 0..61 times
    • [[:alnum:]]
  • End of line $
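
As a rough sketch of how the pieces above could fit into the script from the question, the pattern can be kept in a variable and used unquoted on the right-hand side of =~. The helper name is_valid_label and the sample inputs are assumptions made for illustration; they are not part of the original post.

```shell
#!/bin/bash

# The pattern from this answer, stored in a variable. It must be used
# UNQUOTED after =~ so that bash treats it as a regex, not a literal string.
label_re='^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$'

# is_valid_label is a hypothetical helper name used only for this sketch.
# It returns 0 (success) when its first argument matches the pattern.
is_valid_label() {
    [[ $1 =~ $label_re ]]
}

# Sample inputs, including the examples listed in the question.
for candidate in "test" "a" "-" "" "foo-" '#$??&@#&?$##$'; do
    if is_valid_label "$candidate"; then
        echo "'$candidate' is a valid subdomain label"
    else
        echo "'$candidate' is not a valid subdomain label"
    fi
done
```

Note that bash's =~ uses POSIX extended regular expressions, which have no (?:...) non-capturing group; a plain (...) group is used instead.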

As recommended in the comments under this answer, I should point out that the [[:alnum:]] class is affected by the current locale. If you want it to match only ASCII A-Z, a-z and 0-9, make sure the script runs in the C locale, for example with LC_ALL=C (LANG=C alone is enough only when LC_ALL and LC_CTYPE are unset). Otherwise you may find that additional characters are accepted, such as á é ø ß and others.
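
One possible way to apply the locale restriction only to the check itself, rather than to the whole script, is to set LC_ALL locally inside a function. This is a minimal sketch; the function name is_ascii_label is hypothetical and the sample strings are only illustrative.

```shell
#!/bin/bash

# is_ascii_label is a hypothetical helper name used only for this sketch.
is_ascii_label() {
    # Force the C locale for the duration of this function, so that
    # [[:alnum:]] covers only ASCII letters and digits.
    local LC_ALL=C
    [[ $1 =~ ^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$ ]]
}

# A plain ASCII label passes; a label with accented letters does not.
is_ascii_label "test" && echo "test: accepted"
is_ascii_label "aábé" || echo "aábé: rejected"
```

Whether non-ASCII labels should be rejected depends on the application, as the comments under this answer discuss.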

roaima
  • Thanks friend! Your regex looks much better! I just have to change the regex a bit so subdomains can't end with a dash as well and It's all good :) – NaturalBornCamper Apr 30 '18 at 16:21
  • @NaturalBornCamper that's actually a little more complicated than it sounds – roaima Apr 30 '18 at 16:23
  • Nope, what you gave me got me started, I just changed your answer a bit and it's working: if [[ ! $SUBDOMAIN =~ ^[[:alnum:]]([[:alnum:]]|-){0,61}[[:alnum:]]$ ]]; – NaturalBornCamper Apr 30 '18 at 16:26
  • @NaturalBornCamper that will fail with a single character entry. It will also accept a 63 character string. Please see the amended answer for my suggestion. – roaima Apr 30 '18 at 16:27
  • Oh wow you're right, I totally missed that, thanks heaps! – NaturalBornCamper Apr 30 '18 at 16:32
  • A subdomain like `aábé` will be accepted in a default utf8 locale. –  May 01 '18 at 04:13
  • @Isaac that's good. IDNs are permitted these days. – roaima May 01 '18 at 06:13
  • @roaima [From RFC 5890](https://tools.ietf.org/html/rfc5890) *4.6. Legacy IDN Label Strings The URI Standard [RFC3986] and a number of application specifications (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII labels in DNS names used with those protocols, i.e., only the A-label form of IDNs is permitted in those contexts.* It sounds reasonable to limit to ASCII labels (even those IDN punycode strings that expand to Unicode characters). Or, at least, for web pages (HTTP) name addresses (more than 95% of internet on present days). –  May 01 '18 at 07:17
  • @isaac it's probably right to mention that explicitly, but as we don't know the OP's application I don't believe we should assume too much about the intended use. – roaima May 01 '18 at 07:31
  • @roaima Since you are writing an answer about what you do know it follows that it is reasonable that you should make a note about the a-z ranges matching many UNICODE characters and not leave that hidden. –  May 01 '18 at 07:40