Bash subdomain regex validation never matches

Question

I'm trying to verify that a subdomain entered by a user is valid, but whatever I pass in is never accepted. I believe the regex is OK, so the problem is probably my "if" logic; I'm new to shell/bash.

#!/bin/bash
#

echo Enter the subdomain\'s name to configure.
read SUBDOMAIN

if [[ ! $SUBDOMAIN =~ [A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])? ]]; then
    echo "$SUBDOMAIN is not a valid domain"
fi

Examples:
Would be accepted (regular subdomain names): test
Would not be accepted (invalid subdomain name): -
Would not be accepted (invalid subdomain name): (Empty)
Would not be accepted (invalid subdomain name): #$??&@#&?$##$

I would prefer using shell, but the parentheses in the regex make the script throw an error.

I'm not sure if it can be done with grep, but I never understood how to use grep and it always confused me.

Likely related: [Bash =~ regex and https://regex101.com/](https://unix.stackexchange.com/questions/421460/bash-regex-and-https-regex101-com) — steeldriver, Apr 30 '18 at 15:51
@steeldriver I checked it out but "set -o rematchpcre" doesn't work — NaturalBornCamper, Apr 30 '18 at 16:06
@roaima Because subdomains can contain dashes for example, but cannot start with a dash — NaturalBornCamper, Apr 30 '18 at 16:12

roaima · Accepted Answer · 2018-05-01T12:35:15.837

If you're trying to match "alphanumeric" followed by "alphanumeric or dash", with no dash at the end and a total length of 1..63 characters, this RE will work for you:

^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$

This binds to the beginning and end of the string, so the RE must match the string in its entirety.
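To see what the anchors change, here is a small sketch that runs the same pattern with and without `^` and `$`. The helper name `matches` and the variable names are made up for illustration.

```shell
#!/bin/bash
# Sketch only: "matches", "unanchored" and "anchored" are hypothetical names.
unanchored='[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?'
anchored="^${unanchored}\$"

matches() {
    # $1 = pattern, $2 = candidate string
    # The pattern is left unquoted so bash treats it as a regex, not a literal.
    [[ $2 =~ $1 ]]
}

for candidate in 'test' '-' 'a#b'; do
    if matches "$unanchored" "$candidate"; then u='match'; else u='no match'; fi
    if matches "$anchored" "$candidate"; then a='match'; else a='no match'; fi
    printf '%-6s unanchored: %-9s anchored: %s\n" "$candidate' "$u" "$a" >/dev/null
    printf '%-6s unanchored: %-9s anchored: %s\n' "$candidate" "$u" "$a"
done
```

Without the anchors, `a#b` matches because the pattern only has to be found somewhere inside the string; with them, the whole string has to fit the pattern.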

Start of line ^
A single alphanumeric, any case [[:alnum:]]
An optional block (bracketed (...) and terminated with ?)
- [[:alnum:]] or a dash -, repeated 0..61 times
- [[:alnum:]]
End of line $
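Put together, the test could look something like the sketch below. The function name `is_valid_label` and the variable `label_re` are assumptions for illustration, not part of the original script. Note that bash's `=~` uses POSIX extended regular expressions, which have no `(?:...)` non-capturing group, so a plain `(...)` group is used instead.

```shell
#!/bin/bash
# Sketch only: is_valid_label and label_re are hypothetical names.

# Keeping the pattern in a variable avoids quoting trouble with the
# parentheses and the | inside [[ ... ]].
label_re='^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$'

is_valid_label() {
    # Succeeds (exit status 0) when $1 matches the whole pattern.
    [[ $1 =~ $label_re ]]
}

# Strings of 63 and 64 characters to probe the length limit.
long63=$(printf 'a%.0s' {1..63})
long64=$(printf 'a%.0s' {1..64})

# The examples from the question, plus a few boundary cases.
for candidate in 'test' 'a' 'my-host' '-' '' 'ends-' '#$??&@#&?$##$' "$long63" "$long64"; do
    if is_valid_label "$candidate"; then
        echo "'$candidate' is a valid label"
    else
        echo "'$candidate' is not a valid label"
    fi
done
```

In the script from the question, the check after `read SUBDOMAIN` would then become `if ! is_valid_label "$SUBDOMAIN"; then ... fi`.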

As has been recommended in the comments under this answer, I should point out that the [[:alnum:]] character class is affected by the current locale. If you want it to match only ASCII A-Z, a-z and 0-9, run the test with LC_ALL=C (LANG=C is enough only when no LC_* variable overrides it). Otherwise you may find that additional characters are accepted, such as á é ø ß and others.
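One way to pin the locale for just this check is sketched below. It assumes an ASCII-only result is wanted; the function name `is_ascii_label` is hypothetical. `LC_ALL` is used here because it takes precedence over `LANG` and the other `LC_*` variables.

```shell
#!/bin/bash
# Sketch only: is_ascii_label and label_re are hypothetical names.
label_re='^[[:alnum:]](([[:alnum:]]|-){0,61}[[:alnum:]])?$'

# The function body is a subshell ( ... ) rather than { ... }, so the
# locale change is discarded when the function returns.
is_ascii_label() (
    LC_ALL=C
    [[ $1 =~ $label_re ]]
)

for candidate in 'test' 'aábé'; do
    if is_ascii_label "$candidate"; then
        echo "'$candidate' is accepted"
    else
        echo "'$candidate' is rejected"
    fi
done
```

The subshell costs a fork per call, which is unlikely to matter for validating a single value typed by a user.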

edited May 01 '18 at 12:35

answered Apr 30 '18 at 16:16

roaima


Thanks friend! Your regex looks much better! I just have to change the regex a bit so subdomains can't end with a dash as well and it's all good :) – NaturalBornCamper Apr 30 '18 at 16:21
@NaturalBornCamper that's actually a little more complicated than it sounds – roaima Apr 30 '18 at 16:23
Nope, what you gave me got me started, I just changed your answer a bit and it's working: if [[ ! $SUBDOMAIN =~ ^[[:alnum:]]([[:alnum:]]|-){0,61}[[:alnum:]]$ ]]; – NaturalBornCamper Apr 30 '18 at 16:26
@NaturalBornCamper that will fail with a single character entry. It will also accept a 63 character string. Please see the amended answer for my suggestion. – roaima Apr 30 '18 at 16:27
Oh wow you're right, I totally missed that, thanks heaps! – NaturalBornCamper Apr 30 '18 at 16:32
A subdomain like `aábé` will be accepted in a default utf8 locale. – May 01 '18 at 04:13
@Isaac that's good. IDNs are permitted these days. – roaima May 01 '18 at 06:13
@roaima [From RFC 5890](https://tools.ietf.org/html/rfc5890) *4.6. Legacy IDN Label Strings The URI Standard [RFC3986] and a number of application specifications (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII labels in DNS names used with those protocols, i.e., only the A-label form of IDNs is permitted in those contexts.* It sounds reasonable to limit to ASCII labels (even those IDN punycode strings that expand to Unicode characters). Or, at least, for web pages (HTTP) name addresses (more than 95% of internet on present days). – May 01 '18 at 07:17
@isaac it's probably right to mention that explicitly, but as we don't know the OP's application I don't believe we should assume too much about the intended use. – roaima May 01 '18 at 07:31
@roaima Since you are writing an answer about what you do know it follows that it is reasonable that you should make a note about the a-z ranges matching many UNICODE characters and not leave that hidden. – May 01 '18 at 07:40
