
I am trying to write a website blocker, and the simplest approach I found is putting lines at the end of /etc/hosts. If I want to have 1 million websites blocked that redirect to localhost, would that slow anything about internet browsing or about normal OS operations?

  • So as far as I understood from the linked question, it's going to slow the loading of a site to some extent, but it depends on the speed of the hard drive and the reading algorithm that the given browser uses. – Deyan Dachev May 21 '20 at 19:24
  • Have you looked into `/etc/hosts.deny`? – DopeGhoti May 21 '20 at 20:15

1 Answer


The same as for any other flat file database.

This is actually a general thing.

The problem is inherent in the file format, and applies as much to /etc/passwd, /etc/group, /etc/shells, /etc/phones, /etc/ttys, /etc/services, /etc/fstab, and others as it does to /etc/hosts. These files are neither indexed nor sorted, so any lookup has to be a sequential scan, potentially of the whole file. A lookup of a non-existent entry must read every record of the file. Insertions and deletions must read and write every record of the file.
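To make the cost concrete, here is a sketch (in Python, not the C library's actual code) of what a flat-file lookup amounts to: every query walks the file line by line, and a miss reads every record.

```python
# Illustrative sketch of a hosts-style flat-file lookup.
# The cost is linear in the file size; a lookup of a name
# that is not present must read every line.
def lookup_host(path, name):
    """Return the address mapped to `name`, or None on a miss."""
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0]   # strip trailing comments
            fields = line.split()
            # fields[0] is the address; the rest are hostnames/aliases.
            if len(fields) >= 2 and name in fields[1:]:
                return fields[0]
    return None  # a miss has scanned the whole file
```

With a million records, every cache-missed name resolution on the system pays that scan.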

There's a reason that the BSDs switched the user account database from this flat file format to Berkeley DB (/etc/pwd.db and /etc/spwd.db), and this is it. Berkeley DB is indexed, and lookups are not sequential scans from the start of the file. This is also one of the reasons that the DNS came about (the flat file access method was one reason that HOSTS.TXT was a problem; replicating the file around the whole network was another).
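The difference an index makes can be sketched with Python's standard `dbm` module, which is similar in spirit to (though not the same on-disk format as) the BSD pwd.db files: the key hashes straight to the record, so a lookup no longer scans from the start of the file.

```python
import dbm

# Build a small keyed database once; thereafter each lookup is a
# hashed access rather than a sequential scan.
# (Illustrative only -- not the actual /etc/pwd.db format.)
def build_db(path, records):
    with dbm.open(path, "n") as db:      # "n": always create anew
        for name, addr in records.items():
            db[name] = addr

def indexed_lookup(path, name):
    with dbm.open(path, "r") as db:
        key = name.encode()
        return db[key].decode() if key in db else None
```

The build step is paid once; after that, lookup cost no longer grows with the number of records the way a flat-file scan does.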

This is the reason that e-mail systems involve databases in CDB, Berkeley DB, and other formats, for setting up things like domain lists and virtual user accounts. In contrast, your C library almost certainly does not provide for using some other database format instead of a flat file for /etc/hosts.

Furthermore, a lot of things involve performing lookups in this table, from special command argument completions in shells through networking tools and system monitoring/administration utilities to printing, WWW browsers, and e-mail systems.

A million-record flat file database is a poor idea for this.

I haven't even touched upon the fact that insertions and deletions must be done atomically, lest the system see the file in a partly rewritten state (which may have quite undesirable consequences). This will not be the case if your practice is to naïvely open the file in a text editor: most text editors by default will not do atomic file updates. Flat file databases, especially large ones, need more careful handling than people naïvely think. No, they are not "just text files".
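The usual atomic-update pattern, sketched here under the assumption of a POSIX filesystem, is to write the new contents to a temporary file in the same directory and then rename it over the original; readers then see either the old file or the new one, never a half-written state.

```python
import os
import tempfile

def atomic_rewrite(path, new_text):
    """Replace `path` with `new_text` atomically (POSIX rename semantics)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the
    # target, or the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_text)
            f.flush()
            os.fsync(f.fileno())   # ensure data reaches the disk first
        os.replace(tmp, path)      # atomic rename over the original
    except BaseException:
        os.unlink(tmp)             # clean up the temporary on failure
        raise
```

This is what careful tools (and libraries such as the BSD pwd_mkdb machinery) arrange; a plain text editor generally does not.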

Using name lookup to filter WWW server access is also a poor mechanism, of course.

Further reading

JdeBP