Part 2. Web Servers
Introduction
HTTP Protocol
HTTP (HyperText Transfer Protocol) has been the most widely used protocol on the Internet since 1990.
This protocol enables the transfer of files (mainly in HTML format, but also in CSS, JS, AVI, and so on), located by a character string called a URL, between a browser (the client) and a Web server (called httpd on UNIX machines).
HTTP is a "request-response" protocol operating on top of TCP (Transmission Control Protocol).
- The client opens a TCP connection to the server and sends a request.
- The server analyzes the request and responds according to its configuration.
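The HTTP protocol is "STATELESS": it does not retain any information about the client's state from one request to the next. To follow a client across requests (for example, the shopping cart of an e-commerce site), web applications written in dynamic languages such as PHP, Python, or Java store session information on the server side.
HTTP/1.1 is still widely used, but HTTP/2 has been a standard since 2015 and is supported by current browsers and servers (the example below uses it). HTTP/3 followed in 2022.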
An HTTP response is a set of lines sent to the browser by the server. It includes:
- A status line: this specifies the protocol version used and the processing status of the request, using a code and explanatory text. The line comprises three elements separated by a space:
  - The protocol version used
  - The status code
  - The meaning of the code
- Response header fields: these are a set of optional lines providing additional information about the response and/or the server. Each of these lines consists of a name qualifying the header type, followed by a colon (:) and the header value.
- The response body: this contains the requested document.
Here is an example of an HTTP response:
$ curl --head --location https://docs.rockylinux.org
HTTP/2 200
accept-ranges: bytes
access-control-allow-origin: *
age: 109725
cache-control: public, max-age=0, must-revalidate
content-disposition: inline
content-type: text/html; charset=utf-8
date: Fri, 21 Jun 2024 12:05:24 GMT
etag: "cba6b533f892339d3818dc59c3a5a69a"
server: Vercel
strict-transport-security: max-age=63072000
x-vercel-cache: HIT
x-vercel-id: cdg1::pdqbh-1718971524213-4892bf82d7b2
content-length: 154696
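The three elements of the status line can be separated with the shell's read builtin. This is a minimal sketch that assumes the status line is passed as a single string. split_status_line is a hypothetical helper written only for this example. HTTP/2, as used in the output above, does not send the meaning text, so the third element is empty in that case.

```bash
#!/usr/bin/env bash
# split_status_line (hypothetical helper): split an HTTP status line into
# the protocol version, the status code and the meaning of the code.
split_status_line() {
  local version code reason
  # read splits on whitespace; the last variable receives the rest of the
  # line, so a multi-word meaning such as "Not Found" stays intact.
  read -r version code reason <<< "$1"
  printf 'version=%s\ncode=%s\nreason=%s\n' "$version" "$code" "$reason"
}

split_status_line "HTTP/1.1 404 Not Found"
split_status_line "HTTP/2 200"
```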
Note
Learning how to use the curl command will be very helpful for troubleshooting your servers in the future.
The role of the web server is to translate a URL into a local resource. Consulting the https://docs.rockylinux.org/ page amounts to sending an HTTP request to the machine hosting it. The DNS service, which translates the host name into an IP address, therefore plays an essential role.
URLs
A URL (Uniform Resource Locator) is an ASCII character string used to designate resources on the Internet. It is informally referred to as a web address.
A URL is made up of several parts, some of which are optional:
<protocol>://[<login>:<password>@]<host>[:<port>]/<path>
- Protocol name: this is the language used to communicate over the network, for example HTTP, HTTPS, FTP, and so on. The most widely used protocols are HTTP (HyperText Transfer Protocol) and its secure version HTTPS, the protocols used to exchange Web pages in HTML format.
- Login and password: allows you to specify access parameters to a secure server. This option is not recommended for security reasons, because the password is visible in the URL.
- Host: this is the name of the computer hosting the requested resource. Note that it is possible to use the server's IP address, which makes the URL less readable.
- Port number: this is a number associated with a service, enabling the server to know which service the request is intended for. The default port associated with the HTTP protocol is port number 80, and 443 with HTTPS. So, when the protocol in use is HTTP or HTTPS, the port number is optional.
- Resource path: this part lets the server know the location of the resource, generally the location (directory) and name of the requested file. If the address does not specify a path, it designates the home page of the host. Otherwise, it indicates the path to the page to display.
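The structure described above can be illustrated by splitting a URL with a bash regular expression. The sketch below uses a hypothetical parse_url helper that handles only the simple forms shown here. It does not handle IPv6 literal addresses. It fills in the default port 80 or 443 when the URL omits it.

```bash
#!/usr/bin/env bash
# parse_url (hypothetical helper): split a URL of the form
# <protocol>://[<login>:<password>@]<host>[:<port>][/<path>] into its parts.
# Login and password, if present, are matched but not printed.
parse_url() {
  local url="$1"
  local re='^([a-zA-Z][a-zA-Z0-9+.-]*)://(([^:@/]+)(:([^@/]*))?@)?([^:/]+)(:([0-9]+))?(/.*)?$'
  if [[ ! $url =~ $re ]]; then
    echo "invalid URL: $url" >&2
    return 1
  fi
  local proto="${BASH_REMATCH[1]}" host="${BASH_REMATCH[6]}"
  local port="${BASH_REMATCH[8]}" path="${BASH_REMATCH[9]:-/}"
  # No explicit port: use the default port of the protocol, if known.
  if [[ -z $port ]]; then
    case "$proto" in
      http) port=80 ;;
      https) port=443 ;;
    esac
  fi
  printf 'protocol=%s host=%s port=%s path=%s\n' "$proto" "$host" "$port" "$path"
}

parse_url "https://docs.rockylinux.org/books/"
parse_url "http://192.168.1.100:8080/index.html"
```

If the path is missing, as in http://example.com, the sketch reports / (the home page of the host).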
Ports
An HTTP request will arrive on port 80 (default port for http) of the server running on the host. However, the administrator is free to choose the server's listening port.
The http protocol is available in a secure version: the https protocol (port 443). Apache implements this encrypted protocol with the mod_ssl module.
Using other ports is also possible, such as port 8080 (often used by Java EE application servers).
Apache
In this chapter, you will learn about Apache, the web server.
Objectives: In this chapter, you will learn how to:
- install and configure apache
apache, http, httpd
Knowledge: Complexity:
Reading time: 30 minutes
Generalities
The Apache HTTP server is the work of a group of volunteers: The Apache Group. This group set out to build a Web server on the same level as commercial products, but as free software (its source code is available).
Joining the original team were hundreds of users who, through their ideas, tests, and lines of code, contributed to making Apache the most widely used Web server in the world.
Apache's ancestor is the free server developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois. The evolution of this server came to a halt when the person in charge left the NCSA in 1994. Users continued to fix bugs and create extensions, which they distributed as "patches", hence the name "a patchy server".
Apache version 1.0 was released on December 1, 1995.
The development team coordinates its work by way of a mailing list, where discussions regarding proposals and changes to the software happen. Voting on changes happens before incorporation into the project. Anyone can join the development team: all you need to do to become a member of The Apache Group is make an active contribution to the project.
The Apache server has a very strong presence on the Internet and remains one of the most widely used web servers, although its market share has declined over the years.
The market share lost by Apache often goes to its biggest challenger: the nginx server. Nginx is generally faster at delivering web pages, but it is less functionally complete than the giant Apache.
Installation
Apache is cross-platform. It runs on Linux, Windows, macOS, and other systems.
The administrator will have to choose between two installation methods:
- Package installation: the distribution vendor supplies stable, supported (but sometimes older) versions.
- Installation from source: the administrator compiles the software and can specify the options that interest him or her, thus optimizing the service. Since Apache has a modular architecture, it is generally not necessary to re-compile the apache software to add or remove additional functionalities (add or remove modules).
The package-based installation method is strongly recommended. Additional repositories are available to install more recent versions of apache on older distributions, but the distribution vendor will not provide support in the event of problems.
On Enterprise Linux distributions, the httpd package provides the Apache server.
In the future, you might have to install some extra modules. Here are some examples of modules and their roles:
- mod_access: filters client access by host name, IP address or other characteristic (mod_authz_host in Apache 2.4)
- mod_alias: enables the creation of aliases or virtual directories
- mod_auth: authenticates clients (split into modules such as mod_auth_basic and mod_authn_file in Apache 2.4)
- mod_cgi: executes CGI scripts
- mod_info: provides information on the server configuration
- mod_mime: associates file types with the corresponding action
- mod_proxy: provides proxy and gateway functions
- mod_rewrite: rewrites URLs
- and many others.
Install the httpd package:
sudo dnf install httpd
The version installed on Rocky Linux 9 is 2.4.
Installing the package creates an apache system user and a corresponding apache system group.
$ grep apache /etc/passwd
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
$ grep apache /etc/group
apache:x:48:
Enable and start the service:
$ sudo systemctl enable httpd --now
Created symlink /etc/systemd/system/multi-user.target.wants/httpd.service → /usr/lib/systemd/system/httpd.service.
You can check the service's status:
$ sudo systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; preset: disabl>
Active: active (running) since Fri 2024-06-21 14:22:34 CEST; 8s ago
Docs: man:httpd.service(8)
Main PID: 4387 (httpd)
Status: "Started, listening on: port 80"
Tasks: 177 (limit: 11110)
Memory: 24.0M
CPU: 68ms
CGroup: /system.slice/httpd.service
├─4387 /usr/sbin/httpd -DFOREGROUND
├─4389 /usr/sbin/httpd -DFOREGROUND
├─4390 /usr/sbin/httpd -DFOREGROUND
├─4391 /usr/sbin/httpd -DFOREGROUND
Do not forget to open your firewall (see the Security section).
You can now check the availability of the service:
- from any web browser, by providing the IP address of your server (for example http://192.168.1.100/).
- directly from your server.
For the latter, you will have to install a text browser, for example elinks.
sudo dnf install elinks
Browse your server and check the default page:
elinks http://localhost
Installing the httpd package generates a complete tree structure that needs to be fully understood:
/etc/httpd/
├── conf
│ ├── httpd.conf
│ └── magic
├── conf.d
│ ├── README
│ ├── autoindex.conf
│ ├── userdir.conf
│ └── welcome.conf
├── conf.modules.d
│ ├── 00-base.conf
│ ├── 00-brotli.conf
│ ├── 00-dav.conf
│ ├── 00-lua.conf
│ ├── 00-mpm.conf
│ ├── 00-optional.conf
│ ├── 00-proxy.conf
│ ├── 00-systemd.conf
│ ├── 01-cgi.conf
│ ├── 10-h2.conf
│ ├── 10-proxy_h2.conf
│ └── README
├── logs -> ../../var/log/httpd
├── modules -> ../../usr/lib64/httpd/modules
├── run -> /run/httpd
└── state -> ../../var/lib/httpd
/var/log/httpd/
├── access_log
└── error_log
/var/www/
├── cgi-bin
└── html
You will notice that the /etc/httpd/logs folder is a symbolic link to the /var/log/httpd directory. Similarly, you will notice that the files making up the default site are in the /var/www/html folder.
Configuration
Initially, configuration of the Apache server was in a single /etc/httpd/conf/httpd.conf file. Over time, this file has become increasingly large and less readable.
Modern distributions therefore tend to distribute Apache configuration over a series of *.conf files in the directories /etc/httpd/conf.d and /etc/httpd/conf.modules.d. These files are attached to the main /etc/httpd/conf/httpd.conf file by the Include and IncludeOptional directives. Unlike Include, IncludeOptional does not raise an error when the wildcard matches no file.
$ sudo grep "^Include" /etc/httpd/conf/httpd.conf
Include conf.modules.d/*.conf
IncludeOptional conf.d/*.conf
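Apache reads files matched by a wildcard in alphabetical order. This is why the files in conf.modules.d carry numeric prefixes such as 00- and 10-, and why a file like README is never loaded. The sketch below shows which files a *.conf pattern selects and in what order. It uses a hypothetical list_included helper and a temporary directory that stands in for /etc/httpd/conf.modules.d. The shell also sorts glob matches, so for names like these it gives the same order.

```bash
#!/usr/bin/env bash
# list_included (hypothetical helper): print, in the order the shell expands
# them, the files that a "<dir>/*.conf" Include pattern would select.
list_included() {
  local f
  for f in "$1"/*.conf; do
    [[ -e $f ]] || continue   # the pattern matched nothing
    basename "$f"
  done
}

# Demonstration on a scratch directory instead of the real configuration.
demo="$(mktemp -d)"
touch "$demo/10-h2.conf" "$demo/00-base.conf" "$demo/01-cgi.conf" "$demo/README"
list_included "$demo"   # 00-base.conf, 01-cgi.conf, 10-h2.conf (README is skipped)
rm -rf "$demo"
```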
The /etc/httpd/conf/httpd.conf file is amply documented. In general, these comments are sufficient to clarify the administrator's options.
Global server configuration is in /etc/httpd/conf/httpd.conf.
This file has 3 sections for configuring:
- in section 1, the global environment;
- in section 2, the default site and default virtual site parameters;
- in section 3, the virtual hosts.
Virtual hosting lets you put several virtual sites online on the same server. The sites are then differentiated according to their domain names, IP addresses, and so on.
Modifying a value in section 1 or 2 affects all hosted sites.
In a shared environment, modifications are therefore made in section 3.
To facilitate future updates, it is strongly recommended that you create a section 3 configuration file for each virtual site.
Here is a minimal version of the httpd.conf file:
ServerRoot "/etc/httpd"
Listen 80
Include conf.modules.d/*.conf
User apache
Group apache
ServerAdmin root@localhost
<Directory />
AllowOverride none
Require all denied
</Directory>
DocumentRoot "/var/www/html"
<Directory "/var/www">
AllowOverride None
Require all granted
</Directory>
<Directory "/var/www/html">
Options Indexes FollowSymLinks
AllowOverride None
Require all granted
</Directory>
<IfModule dir_module>
DirectoryIndex index.html
</IfModule>
<Files ".ht*">
Require all denied
</Files>
ErrorLog "logs/error_log"
LogLevel warn
<IfModule log_config_module>
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
<IfModule logio_module>
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
</IfModule>
CustomLog "logs/access_log" combined
</IfModule>
<IfModule alias_module>
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
</IfModule>
<Directory "/var/www/cgi-bin">
AllowOverride None
Options None
Require all granted
</Directory>
<IfModule mime_module>
TypesConfig /etc/mime.types
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
</IfModule>
AddDefaultCharset UTF-8
<IfModule mime_magic_module>
MIMEMagicFile conf/magic
</IfModule>
EnableSendfile on
IncludeOptional conf.d/*.conf
Section 1
The various directives encountered in section 1 are:
Option | Information |
---|---|
ServerTokens | This directive will be covered in a future chapter. |
ServerRoot | Indicates the path to the directory containing all the files making up the Apache server. |
Timeout | The number of seconds before a request that takes too long (incoming or outgoing) expires. |
KeepAlive | Enables persistent connections (several requests per TCP connection). |
MaxKeepAliveRequests | Maximum number of requests allowed on a single persistent connection. |
KeepAliveTimeout | Number of seconds to wait for the next client request before closing the TCP connection. |
Listen | Allows apache to listen on specific addresses or ports. |
LoadModule | Loads add-on modules (fewer modules = greater security). |
Include | Includes other server configuration files. |
ExtendedStatus | Displays more information about the server on the server-status page (mod_status). |
User and Group | Define the user and group running the Apache child processes. Apache always starts as root, then switches to this user and group. |
Multi-Processing Modules (MPM)
The Apache server was designed to be a powerful and flexible server, capable of running on a wide variety of platforms.
Different platforms and environments often mean different functionality, or the use of different methods to implement the same functionality as efficiently as possible.
Apache's modular design allows the administrator to choose which features to include in the server, by selecting which modules to load, either at compile-time or at run-time.
This modularity also includes the most basic web server functions.
Certain modules, the Multi-Processing Modules (MPM), are responsible for binding to the machine's network ports, accepting requests and distributing them among the various child processes.
MPM modules are configured in the /etc/httpd/conf.modules.d/00-mpm.conf configuration file:
# Select the MPM module which should be used by uncommenting exactly
# one of the following LoadModule lines. See the httpd.conf(5) man
# page for more information on changing the MPM.
# prefork MPM: Implements a non-threaded, pre-forking web server
# See: http://httpd.apache.org/docs/2.4/mod/prefork.html
#
# NOTE: If enabling prefork, the httpd_graceful_shutdown SELinux
# boolean should be enabled, to allow graceful stop/shutdown.
#
#LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
# worker MPM: Multi-Processing Module implementing a hybrid
# multi-threaded multi-process web server
# See: http://httpd.apache.org/docs/2.4/mod/worker.html
#
#LoadModule mpm_worker_module modules/mod_mpm_worker.so
# event MPM: A variant of the worker MPM with the goal of consuming
# threads only for connections with active processing
# See: http://httpd.apache.org/docs/2.4/mod/event.html
#
LoadModule mpm_event_module modules/mod_mpm_event.so
As you can see, the default MPM is mpm_event.
The performance and capabilities of your web server depend heavily on the choice of MPM.
Choosing one module over another is therefore a complex task, as is optimizing the chosen MPM module (number of clients, requests, and so on).
By default, the Apache configuration assumes a moderately busy service. The maximum number of simultaneous clients (MaxRequestWorkers) is 256 with the prefork MPM and 400 with the event and worker MPMs.
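To find which MPM is enabled, you can look for the LoadModule line that is not commented out in 00-mpm.conf. On a running system, httpd -V also reports the MPM on its "Server MPM:" line. The sketch below uses a hypothetical active_mpm helper that reads the configuration on standard input, so you can try it on an excerpt of the file shown above.

```bash
#!/usr/bin/env bash
# active_mpm (hypothetical helper): print the MPM module loaded by an
# uncommented "LoadModule mpm_..." line read from standard input.
active_mpm() {
  # Commented lines start with "#", so their first field is never
  # exactly "LoadModule" and they are skipped.
  awk '$1 == "LoadModule" && $2 ~ /^mpm_/ { print $2 }'
}

# Excerpt of 00-mpm.conf; on a real server:
#   active_mpm < /etc/httpd/conf.modules.d/00-mpm.conf
active_mpm <<'EOF'
#LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
#LoadModule mpm_worker_module modules/mod_mpm_worker.so
LoadModule mpm_event_module modules/mod_mpm_event.so
EOF
```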
About KeepAlive directives
With the KeepAlive directive disabled, every resource request on the server requires opening a TCP connection, which is time-consuming from a network point of view and requires a lot of system resources.
With the KeepAlive directive set to On, the server keeps the connection open with the client for the duration set by KeepAliveTimeout.
Given that a web page contains several files (images, stylesheets, javascripts, etc.), this strategy quickly pays off.
However, it is important to set this value as precisely as possible:
- Too short a value penalizes the client,
- Too long a value penalizes server resources.
Setting KeepAlive values in each customer's virtual host allows more granularity per customer. In this case, the KeepAlive values are set directly in the customer's VirtualHost or at proxy level (ProxyKeepalive and ProxyKeepaliveTimeout).
Section 2
Section 2 sets the values used by the main server. The main server responds to all requests that are not handled by one of the VirtualHosts in section 3.
The values are also used as default values for virtual sites.
Option | Information |
---|---|
ServerAdmin | Specifies an e-mail address which will appear on certain auto-generated pages, such as error pages. |
ServerName | Specifies the name identifying the server. It can be determined automatically, but it is recommended to specify it explicitly (IP address or DNS name). |
DocumentRoot | Specifies the directory containing files to serve to clients. Default: /var/www/html/. |
ErrorLog | Specifies the path to the error log file. |
LogLevel | Sets the verbosity of the error log: debug, info, notice, warn, error, crit, alert, emerg. |
LogFormat | Defines a named log format, which the CustomLog directive can then use. |
CustomLog | Specifies the path to the access log file and the format to use. |
ServerSignature | Covered in the security part. |
Alias | Makes a directory outside the DocumentRoot tree accessible under a given URL path. The presence or absence of the trailing slash in the URL path is important. |
Directory | Specifies behaviors and access rights by directory. |
AddDefaultCharset | Specifies the encoding of the pages sent (without it, accented characters may be displayed as ?). |
ErrorDocument | Defines customized error pages. |
server-status | Report on server status. |
server-info | Report on server configuration. |
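The combined format declared with LogFormat in the httpd.conf listing has a fixed layout, so you can process access_log with standard tools. The sketch below uses a hypothetical count_status_codes helper. It assumes well-formed request lines, where the quoted "%r" field holds exactly three words. The two log lines are made-up samples, not real data.

```bash
#!/usr/bin/env bash
# count_status_codes (hypothetical helper): read access_log lines in the
# "combined" or "common" format on standard input and print how many
# requests ended with each HTTP status code.
count_status_codes() {
  # With a well-formed request line, "%r" is one quoted field made of three
  # words, so the final status "%>s" is the 9th whitespace-separated field.
  awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' | sort
}

# Two sample lines; on the server:
#   sudo cat /var/log/httpd/access_log | count_status_codes
count_status_codes <<'EOF'
192.168.1.10 - - [21/Jun/2024:14:30:01 +0200] "GET / HTTP/1.1" 200 4681 "-" "curl/7.76.1"
192.168.1.10 - - [21/Jun/2024:14:30:05 +0200] "GET /missing HTTP/1.1" 404 196 "-" "curl/7.76.1"
EOF
```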
The ErrorLog directive
The ErrorLog directive defines the error log to use.
This directive defines the name of the file in which the server logs all errors it encounters. If the file path is not absolute, it is interpreted as relative to ServerRoot.
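The sketch below shows how the path rule works, using a hypothetical resolve_server_path helper. It assumes ServerRoot is /etc/httpd, as in the httpd.conf shown earlier. It ignores the piped-log ("|program") and syslog forms that ErrorLog also accepts. With ErrorLog "logs/error_log", the file resolves to /etc/httpd/logs/error_log. Through the logs symbolic link, that file is /var/log/httpd/error_log.

```bash
#!/usr/bin/env bash
# resolve_server_path (hypothetical helper): keep absolute paths as they
# are and interpret relative paths as relative to ServerRoot.
resolve_server_path() {
  local server_root="$1" path="$2"
  if [[ $path == /* ]]; then
    echo "$path"
  else
    echo "${server_root%/}/$path"
  fi
}

resolve_server_path /etc/httpd logs/error_log                  # /etc/httpd/logs/error_log
resolve_server_path /etc/httpd /var/log/httpd/site1-error.log  # unchanged
```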
The DirectoryIndex directive
The DirectoryIndex directive defines the site's home page.
This directive specifies the name of the file loaded first, which will act as the site index or home page.
Syntax:
DirectoryIndex display-page
The full path is not specified. Apache searches for the file in the requested directory, which, for the site's home page, is the directory specified by DocumentRoot.
Example:
DocumentRoot /var/www/html
DirectoryIndex index.php index.htm
This directive specifies the name of the website index file. The index is the default page that opens when the client types the site URL (without having to type the index name). For the home page, this file must be in the directory specified by the DocumentRoot directive.
The DirectoryIndex directive can specify several index file names separated by spaces. Apache tries them in order and serves the first one it finds. For example, a default index page with dynamic content and, as a second choice, a static page.
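You can sketch this lookup order in shell: the candidate names are tried in order, and the first existing file wins. find_index is a hypothetical helper and a simplification of what Apache actually does. In the demonstration, a temporary directory stands in for /var/www/html.

```bash
#!/usr/bin/env bash
# find_index (hypothetical helper): return the first candidate index file
# that exists in a directory, mimicking the DirectoryIndex lookup order.
find_index() {
  local dir="$1"; shift
  local name
  for name in "$@"; do
    if [[ -f "$dir/$name" ]]; then
      echo "$dir/$name"
      return 0
    fi
  done
  # No candidate found: Apache then generates a directory listing if
  # "Options Indexes" is set, or returns a 403 error otherwise.
  return 1
}

docroot="$(mktemp -d)"                       # stands in for /var/www/html
touch "$docroot/index.htm"
find_index "$docroot" index.php index.htm    # prints the index.htm path
rm -rf "$docroot"
```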
The Directory directive
The Directory tag is used to define directory-specific directives.
This tag applies rights to one or more directories. The directory path is given as an absolute path.
Syntax:
<Directory directory-path>
Defining user rights
</Directory>
Example:
<Directory /var/www/html/public>
# we allow everyone
Require all granted
</Directory>
The Directory
section defines a block of directives applying to a part of the server's file system. The directives contained here will only apply to the specified directory (and its sub-directories).
The syntax of this block accepts wildcards, but it is preferable to use the DirectoryMatch block.
In the following example, we are going to deny access to the server's whole file system, regardless of the client. The "/" directory represents the root of the file system.
<Directory />
Require all denied
</Directory>
The following example shows authorizing access to the /var/www/html publishing directory for all clients.
<Directory /var/www/html>
Require all granted
</Directory>
When the server finds an .htaccess file, it needs to know whether directives placed in the file are authorized to modify the pre-existing configuration. The AllowOverride directive controls that authorization in Directory directives. When it is set to None, .htaccess files are completely ignored.
The mod_status module
The mod_status module displays a /server-status page summarizing the server status. The mod_info module displays a /server-info page describing the server configuration:
<Location /server-status>
SetHandler server-status
Require local
</Location>
<Location /server-info>
SetHandler server-info
Require local
</Location>
Please note that these modules provide information that should not be accessible to your users.
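From the server itself, appending ?auto to the URL, as in curl -s 'http://localhost/server-status?auto', returns a machine-readable list of "key: value" lines. This only works locally because the Location blocks above allow only local requests. The format is convenient for monitoring scripts. The sketch below extracts one field. status_field is a hypothetical helper, and the sample lines only stand in for real output.

```bash
#!/usr/bin/env bash
# status_field (hypothetical helper): print the value of one "key: value"
# line read from server-status?auto output on standard input.
status_field() {
  awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Sample lines in the ?auto format; on the server you would run:
#   curl -s 'http://localhost/server-status?auto' | status_field BusyWorkers
status_field BusyWorkers <<'EOF'
Total Accesses: 12
BusyWorkers: 1
IdleWorkers: 74
EOF
```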
Shared hosting (section 3)
With shared hosting, the customer thinks they are visiting several servers. In reality, there is just one server and several virtual sites.
To set up shared hosting, you need to set up virtual hosts:
- declaring multiple listening ports
- declaring multiple listening IP addresses (virtual hosting by IP)
- declaring multiple server names (virtual hosting by name)
Each virtual site corresponds to a different tree structure.
Section 3 of the httpd.conf file declares these virtual hosts.
Choose virtual hosting "by IP" or "by name". For production use, it is not advisable to mix the two solutions.
- Configure each virtual site in an independent configuration file.
- VirtualHosts are stored in /etc/httpd/conf.d/.
- The file extension is .conf.
The VirtualHost directive
The VirtualHost directive defines virtual hosts.
<VirtualHost IP-address[:port]>
# IP-address can be replaced by * to match
# all of the server's IP addresses.
...
</VirtualHost>
If you configure the Apache server with the basic directives seen above, you will only be able to publish one site. Indeed, with the default settings (one IP address, one TCP port, and no host name or a single host name), you cannot publish multiple sites.
The use of virtual sites will enable us to publish several websites on the same Apache server. You are going to define blocks, each of which will describe a website. In this way, each site will have its own configuration.
For ease of understanding, a website is often associated with a single machine. Virtual sites or virtual hosts are so called because they dematerialize the link between machine and website.
Example 1:
Listen 192.168.0.10:8080
<VirtualHost 192.168.0.10:8080>
DocumentRoot /var/www/site1/
ErrorLog /var/log/httpd/site1-error.log
</VirtualHost>
Listen 192.168.0.11:9090
<VirtualHost 192.168.0.11:9090>
DocumentRoot /var/www/site2/
ErrorLog /var/log/httpd/site2-error.log
</VirtualHost>
IP-based virtual hosting is a method of applying different directives depending on the IP address and port on which the request is received. In general, this means serving different web sites on different ports or interfaces.
The NameVirtualHost directive
The NameVirtualHost directive defined name-based virtual hosts in Apache versions prior to 2.3.11.
With those versions, this directive was mandatory for setting up name-based virtual hosts. It specified the IP address on which the server would receive requests for name-based virtual hosts.
Syntax:
NameVirtualHost IP-address[:port]
Example:
NameVirtualHost 160.210.169.6:80
The directive had to come before the virtual site description blocks. To listen for requests on all the server's IP addresses, use the * character.
Since Apache 2.3.11, and therefore in the 2.4 version installed on Rocky Linux 9, this directive is deprecated and has no effect. Name-based virtual hosting is enabled automatically whenever several VirtualHost blocks share the same IP address and port. The ServerName directive of each block then identifies the site.
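Following the one-file-per-site recommendation, you can generate the files in /etc/httpd/conf.d/ with a small helper. The sketch below is only an example: make_vhost, the site names, and the directories are all hypothetical. Both sites share *:80, and Apache chooses between them by comparing the Host header sent by the browser with each ServerName. With Apache 2.4, no NameVirtualHost line is needed. A request whose host name matches no ServerName is served by the first virtual host defined for that address and port.

```bash
#!/usr/bin/env bash
# make_vhost (hypothetical helper): print a name-based VirtualHost block
# for one site, with its own document root and log files.
make_vhost() {
  local name="$1" docroot="$2"
  cat <<EOF
<VirtualHost *:80>
    ServerName ${name}
    DocumentRoot ${docroot}
    ErrorLog /var/log/httpd/${name}-error.log
    CustomLog /var/log/httpd/${name}-access.log combined
</VirtualHost>
EOF
}

make_vhost site1.example.com /var/www/site1
make_vhost site2.example.com /var/www/site2
```

To install a site, you could run, for example, make_vhost site1.example.com /var/www/site1 | sudo tee /etc/httpd/conf.d/site1.example.com.conf, then check the syntax and reload the service. The document roots sit under /var/www, so the <Directory "/var/www"> block of the default configuration already grants access to them.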
Taking changes into account
After each configuration change, you must reload the configuration with the following command:
sudo systemctl reload httpd
Manual
There is a package containing a site that acts as an Apache user manual. It is called httpd-manual.
sudo dnf install httpd-manual
sudo systemctl reload httpd
When installed, you can access the manual with a web browser at http://127.0.0.1/manual.
$ elinks http://127.0.0.1/manual
The apachectl command
The apachectl command is the server control interface for the Apache httpd server.
It is particularly useful with the -t option or the configtest argument. Both run a syntax test on the configuration files and print Syntax OK when no error is found.
Note
This is very useful in Ansible handlers, to test the configuration before reloading the service.
Security
When protecting your server with a firewall (which is a good thing), you need to open it to the HTTP and HTTPS services. The --permanent option keeps these rules after the reload and after future reboots:
sudo firewall-cmd --zone=public --add-service=http --permanent
sudo firewall-cmd --zone=public --add-service=https --permanent
sudo firewall-cmd --reload
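SELinux
By default, if SELinux is active, it prevents Apache from reading a site stored in a directory other than /var/www/.
The directory containing the site must have the security context httpd_sys_content_t.
You can check the current context with the command:
ls -Z /dir
Add the context with the following command:
sudo chcon -vR --type=httpd_sys_content_t /dir
A context set with chcon is lost if the file system is relabeled. To make it persistent, declare it with sudo semanage fcontext -a -t httpd_sys_content_t "/dir(/.*)?" and apply it with sudo restorecon -Rv /dir.
SELinux also prevents Apache from opening a non-standard port. Opening the port is a manual operation that uses the semanage command. This command is provided by the policycoreutils-python-utils package, which is not installed by default.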
sudo semanage port -a -t http_port_t -p tcp 1664
User and Group directives
The User and Group directives define an Apache management account and group.
Historically, Apache ran as root, which caused security problems. Apache is still started by root, but it then changes its identity, generally to User apache and Group apache.
Never ROOT!
The Apache server (httpd process) starts with the root superuser account. Client requests are handled by "child" processes (and, with the event and worker MPMs, by threads within them). To limit risks, these child processes run under a less privileged account.
The User and Group directives declare the account and group used to create child processes.
This account and group must exist in the system (by default, this happens during installation).
File permissions
As a general security rule, web server content must not belong to the process running the server. In our case, the files should not belong to the apache user and group, because the server process would then have write access to them.
Assign the contents to an unprivileged user, or to the root user and its group. At the same time, restrict the group's access rights.
cd /var/www/html
sudo chown -R root:root .
sudo find . -type d -exec chmod 0755 {} +
sudo find . -type f -exec chmod 0644 {} +
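To check the result, you can list anything under the site directory that is still writable by the group or by others. The sketch below assumes GNU find, as on Rocky Linux, whose -perm /022 test matches entries with either of those write bits set. find_writable is a hypothetical helper. In the demonstration, a temporary directory stands in for /var/www/html.

```bash
#!/usr/bin/env bash
# find_writable (hypothetical helper): list files and directories under $1
# that are writable by their group or by others (mode bits 020 or 002).
find_writable() {
  find "$1" -perm /022 -print
}

# Demonstration on a scratch directory; on the server: find_writable /var/www/html
demo="$(mktemp -d)"
touch "$demo/index.html" "$demo/upload.html"
chmod 0644 "$demo/index.html"
chmod 0664 "$demo/upload.html"   # group-writable: reported
find_writable "$demo"
rm -rf "$demo"
```

An empty result means the chown and chmod commands above left no group- or world-writable content.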
Author: Antoine Le Morvan
Contributors: Steven Spencer, Ganna Zhyrnova