How do I use PHP’s parse_url() function to separate out the base domain from the domain extension?
How do I use PHP to extract the domain from a URL?
How do I extract the domain when it is using a ccSLD, or “country code second level domain”, such as .com.au and .co.uk?
PHP’s parse_url() function is quite handy, but it is not all-powerful: it returns the host name as a single string and cannot distinguish between domains and subdomains.
Separating out the sub-domain, base domain, and domain extension using PHP can be tricky. What you are really doing is separating the Top Level Domain (TLD, or “domain extension”) from the Second Level Domain (SLD); anything to the left of those is a sub-domain. However, this rule does not work universally, because many SLDs are reserved for country-specific purposes, as in “.co.uk”. In fact, the domain extension “.co.uk” is a combination of a “ccTLD” (country code Top Level Domain, here “uk”) and a “ccSLD” (country code Second Level Domain, here “co”). Parsing out the domain from “example.com” might seem easy, and parsing out the domain from “example.co.uk” might seem easy, but how do you do this universally for any domain?
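To illustrate the limitation, here is a minimal sketch of what parse_url() gives you. The URL is an arbitrary example; the point is that the host comes back as one string, with no indication of where the sub-domain ends or the extension begins.

```php
<?php
// parse_url() splits a URL into its components, but the host is
// returned as a single opaque string.
$url = "http://www.example.co.uk/some/page?id=1";

// PHP_URL_HOST asks for just the host component.
$host = parse_url($url, PHP_URL_HOST);

echo $host, "\n"; // www.example.co.uk
```

Everything after this point is about breaking that one string apart.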
If you are not concerned with ccTLD’s and ccSLD’s, just continue down the page to the code.
The proper solution is a bit difficult, but we can accomplish it:
The first thing you need is a comprehensive list of all TLDs, including all ccSLD and ccTLD combinations. If you can get a comprehensive list, programming the rest is relatively easy.
The second thing you need is a way to check for updates, as you should fully expect new ccTLD’s and ccSLD’s to be introduced over time. This is necessary if you want to ensure your code doesn’t break in the future.
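One way to handle both requirements is to keep the list in a plain-text file that you refresh periodically and parse at runtime. The sketch below is only an illustration: loadKnownExtensions() is a hypothetical helper, and the format it assumes (one entry per line, “//” comment lines, “*.” wildcard and “!” exception rules) mirrors the Public Suffix List published at publicsuffix.org, which is one widely used list of this kind. Wildcard and exception rules are simply skipped here, so this is not a complete implementation of that list’s rules.

```php
<?php
// Hypothetical helper: turn a plain-text list of extensions (one per line)
// into an array with leading dots, ready to use as $knownExtensions.
function loadKnownExtensions(string $listText): array
{
    $extensions = array();
    foreach (preg_split('/\R/', $listText) as $line) {
        $line = strtolower(trim($line));
        // Skip blank lines and "//" comment lines.
        if ($line === "" || strpos($line, "//") === 0) {
            continue;
        }
        // Skip wildcard ("*.") and exception ("!") rules; this sketch ignores them.
        if ($line[0] === "*" || $line[0] === "!") {
            continue;
        }
        // Normalise every entry to the ".ext" form.
        $extensions[] = "." . ltrim($line, ".");
    }
    return array_values(array_unique($extensions));
}

// Sample data only. In practice you might read a local copy of the list
// with file_get_contents() and re-download it on a schedule.
$sample = "// sample entries\ncom\nnet\norg\nuk\nco.uk\n*.ck\n";
$knownExtensions = loadKnownExtensions($sample);
print_r($knownExtensions);
```

Keeping the list outside the code means an update is a matter of replacing a file rather than editing PHP.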
If you have the first piece, you can move on to writing the code. There are a lot of ways to go about it; we will just throw out one simple solution: the regular expression.
First, ensure your list of TLDs and ccSLDs is in a PHP array format:
$knownExtensions = array(".com", ".net", ".org", ".co.uk");
Obviously you’ll need a lot more extensions than that.
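Next, create a regular expression that matches any known extension at the end of the host name, and use it to remove the extension from the domain.
// Escape each extension for regex use, then anchor the alternation to the end of the string.
$regexp = "/(".join("|", array_map("preg_quote", $knownExtensions)).")$/";
// The second argument is the replacement; an empty string removes the extension.
$theDomain = preg_replace($regexp, "", $originalDomain);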
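There you have it: that’s your domain with the extension removed. If you need to know which extension it was, you can just do this:
// Escape the remaining part and anchor it to the start, so only that prefix is stripped.
$regexp = "/^".preg_quote($theDomain, "/")."/";
$extension = preg_replace($regexp, "", $originalDomain);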
Of course that might seem clumsy, but it fits in easily with the rest of our code. We’ll say it again: there are a LOT of ways to go about this.
You can further split the base domain out from any subdomains like this:
$allSubdomains = explode(".", $theDomain);
$baseDomain = array_pop($allSubdomains);
$subDomain = join(".", $allSubdomains);
And there you have it, your base domain without any subdomains or domain extensions.
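Putting the steps together, here is a minimal sketch of the whole process as a single function. splitDomain() is a hypothetical helper name, not something built into PHP, and the four-entry extension list is only sample data. It uses preg_match() to capture the extension directly rather than running two replacements, which is one of the many possible variations.

```php
<?php
// Hypothetical helper combining the steps from this article.
// Returns null when the host does not end in a known extension.
function splitDomain(string $host, array $knownExtensions): ?array
{
    $host = strtolower($host);

    // Escape each extension and anchor the alternation to the end of the host.
    $escaped = array();
    foreach ($knownExtensions as $ext) {
        $escaped[] = preg_quote($ext, "/");
    }
    $regexp = "/(" . join("|", $escaped) . ")$/";

    if (!preg_match($regexp, $host, $matches)) {
        return null; // extension is not in the list
    }
    $extension = $matches[1];

    // Everything before the extension: sub-domains plus the base domain.
    $withoutExtension = substr($host, 0, -strlen($extension));
    if ($withoutExtension === "") {
        return null; // the host was nothing but an extension
    }

    // The last label is the base domain; anything before it is sub-domain.
    $labels = explode(".", $withoutExtension);
    $baseDomain = array_pop($labels);
    $subDomain = join(".", $labels);

    return array(
        "subdomain" => $subDomain,
        "domain"    => $baseDomain,
        "extension" => $extension,
    );
}

$knownExtensions = array(".com", ".net", ".org", ".co.uk");
$host = parse_url("http://www.example.co.uk/", PHP_URL_HOST);
$parts = splitDomain($host, $knownExtensions);
print_r($parts);
```

For a host with no sub-domain, such as “example.com”, the "subdomain" entry is an empty string.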