Apache configuration and management that disables specific User-Agents on Linux
Time : 2025-12-02 17:25:21
Edit : Jtti

Apache servers running on Linux frequently encounter automated traffic, some of which comes from web crawlers, scanning tools, or malware. To keep the server secure, conserve resources, and protect content, administrators can filter unwanted client requests by identifying and blocking specific User-Agent strings. This is one of the fundamental methods of implementing access control.

The User-Agent is a field in the HTTP protocol header. Clients (such as browsers and web crawlers) use this field to identify themselves to the server. Normal browsers carry identifiers such as `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36`, while automated tools, malicious scanners, or content scraping programs often use uniquely identifiable strings. By configuring the Apache server to check and block these specific User-Agents, unnecessary or harmful access can be prevented at the entry point.
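For reference, the field is just one line in the headers the client sends with each request. A minimal example (the hostname and path are placeholders) might look like this:

GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: */*

The server can inspect this header and decide how to respond before serving any content.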

Apache provides several modules for implementing User-Agent-based access control, the most commonly used being `mod_rewrite` and `mod_setenvif`. Both can achieve the blocking purpose, but their applicable scenarios and configuration methods differ.

The `mod_rewrite` module is powerful, offering highly flexible matching of request conditions through its `RewriteCond` directive. Its core logic is: when the User-Agent header of a request matches a preset rule, a specific rewrite action is executed (such as returning a 403 Forbidden status code).

A basic configuration example is shown below, which can be placed in the Apache main configuration file (e.g., `httpd.conf`), a virtual host configuration section, or a directory-level `.htaccess` file (provided `AllowOverride` permits it, e.g. `AllowOverride FileInfo`):

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "malicious crawler keyword" [NC]
RewriteRule .* - [F,L]

In this rule, `RewriteCond` defines the matching condition: `%{HTTP_USER_AGENT}` refers to the User-Agent header, `malicious crawler keyword` is the string to be matched, and the `[NC]` flag makes the match case-insensitive. `RewriteRule .* - [F,L]` applies the action to any URL (`.*`): `[F]` returns 403 Forbidden, and `[L]` marks this rule as the last one, so subsequent rules are not processed.

In an actual configuration, replace `malicious crawler keyword` with the identifiers to be blocked. For example, to block a crawler named `BadBot` and a specific version of the download tool `Wget` (`Wget/1.12`) that is abusing traffic, the rules can be written as:

RewriteCond %{HTTP_USER_AGENT} BadBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget/1\.12 [NC]
RewriteRule ^ - [F,L]

Here, the `[OR]` flag connects the two conditions, so the rule is triggered if either one matches. The second condition uses the regular expression `^Wget/1\.12`, where `^` anchors the match to the beginning of the string and `\.` matches a literal period, ensuring an exact match on that specific version.

The advantage of `mod_rewrite` is its ability to combine complex regular expressions and multiple conditions for fine-grained control. However, it's important to note that using it in `.htaccess` will affect all requests to the directory and its subdirectories, and frequent complex rule checks may incur a slight performance overhead.
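Where possible, placing the rules directly in the virtual host configuration avoids the per-request `.htaccess` lookup, since the main configuration is parsed only once at startup. A minimal sketch (the host name and document root are placeholders):

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/html

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
    RewriteRule ^ - [F,L]
</VirtualHost>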

The `mod_setenvif` module uses the `SetEnvIf` directive to set environment variables based on request-header conditions, which can then be combined with the `Deny` directive to refuse access. Its syntax is intuitive and makes it easy to manage multiple User-Agents.

The basic configuration format is as follows, typically placed in the `<Directory>`, `<Location>`, or `<Files>` section of the main configuration file or virtual host configuration:

SetEnvIf User-Agent "malicious crawler keyword" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot

In this configuration, the `SetEnvIf User-Agent ...` line checks whether the User-Agent contains the specified keyword; if it matches, the environment variable `bad_bot` is set. `Order Allow,Deny` specifies the processing order, `Allow from all` permits all requests by default, and `Deny from env=bad_bot` then rejects any request marked with `bad_bot`.

For multiple User-Agents, define multiple `SetEnvIf` statements that all set the same environment variable. For example, to block a series of known spam crawlers:

SetEnvIf User-Agent "Scrapy" bad_bot
SetEnvIf User-Agent "HttpClient" bad_bot
SetEnvIf User-Agent "^Java" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot

This matches User-Agents containing `Scrapy` or `HttpClient`, or beginning with `Java`. The second argument to `SetEnvIf` is treated as a regular expression, so anchors such as `^` behave as expected.

The `mod_setenvif` configuration is clearer and particularly well suited to maintaining a long list of blocked crawlers. However, note that the `Order`, `Allow`, and `Deny` directives belong to the `mod_access_compat` module; on Apache 2.4 and above, the newer `Require` directive is recommended instead:

SetEnvIf User-Agent "malicious crawler keyword" bad_bot
<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>

The new syntax is more concise and powerful.
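As an illustration of what the 2.4 expression engine allows (a sketch, not required for the approach above; the pattern and location are placeholders), the same check can also be written as a single `Require expr` condition without `SetEnvIf`:

<Location "/">
Require expr "%{HTTP_USER_AGENT} !~ /(BadBot|Scrapy)/"
</Location>

Requests whose User-Agent matches the pattern fail the requirement and receive a 403, while all other requests are allowed.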

After modifying the Apache configuration, always run a syntax check before restarting the service. Use `apachectl configtest` or `httpd -t` to verify that the configuration is valid. Once it passes, restart Apache with `systemctl restart httpd` (on systemd-based systems; the service is named `apache2` on Debian/Ubuntu) or `service httpd restart` so the rules take effect.
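On a systemd-based RHEL/CentOS system the sequence typically looks like this (adjust the service name for other distributions):

# verify the syntax first; prints "Syntax OK" on success
apachectl configtest
# then restart the service (Debian/Ubuntu: systemctl restart apache2)
systemctl restart httpd

If the configuration test reports an error, fix it before restarting, otherwise the service may fail to come back up.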

After the rules take effect, verification is crucial. You can use the `curl` command to simulate a request for testing:

curl -I -A "BadBot/1.0" http://your server address/

The `-A` option is used to specify a custom User-Agent. If configured correctly, a `403 Forbidden` response should be received.
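It is also worth sending a request with a normal browser identifier to confirm that legitimate traffic is not caught by the rules; the address below is a placeholder:

curl -I -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" http://your-server-address/

This request should return the site's usual response (typically `200 OK`) rather than a 403.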

Monitoring logs is crucial for evaluating rule effectiveness and identifying new threats. Denied requests are typically logged in Apache's error logs (such as `/var/log/httpd/error_log`) with a 403 status code. Regularly analyzing access logs (such as `/var/log/httpd/access_log`) can also help identify anomalous User-Agents not covered by existing rules. Tools like `grep` and `awk` can be used for analysis, for example, to count the access frequency of a specific User-Agent:

grep 'BadBot' /var/log/httpd/access_log | wc -l
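Assuming the default combined log format, where the User-Agent is the sixth quote-delimited field, the following one-liner ranks clients by request count and can surface identifiers worth adding to the block list:

awk -F'"' '{print $6}' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head -20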

In practice, matching the complete User-Agent string exactly may fail because the client slightly modifies it, so fuzzy matching with regular expressions is often used. For example, `.*BadBot.*` matches `BadBot` at any position, and `^(?!Mozilla).*$` might be used to match non-browser clients (use this with caution, as it can produce false positives). For large block lists, consider moving the rules into a separate file and pulling it into the main configuration with the `Include` directive, which makes maintenance and updates easier.
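One possible layout, with hypothetical file paths, keeps the list in its own file so it can be edited without touching the rest of the configuration:

# /etc/httpd/conf/blocklist.conf (hypothetical path, holds only the SetEnvIf lines)
SetEnvIf User-Agent "BadBot" bad_bot
SetEnvIf User-Agent "Scrapy" bad_bot
SetEnvIf User-Agent "HttpClient" bad_bot

Then, in `httpd.conf` or the relevant virtual host, load it alongside the `Require`/`Deny` block shown earlier:

Include /etc/httpd/conf/blocklist.conf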

Special care must be taken during configuration to avoid blocking legitimate traffic by accident. Before applying rules globally, verify them in a test environment or limit them to a specific test IP address. Some legitimate services (such as the search engine crawlers Googlebot and Bingbot) have official verification methods and should not be blocked indiscriminately.

Furthermore, it's important to recognize that simply blocking the User-Agent is not absolutely secure. Malicious users can easily forge or alter the User-Agent. Therefore, this should be part of a comprehensive security strategy, used in conjunction with IP restrictions, rate limiting, and web application firewalls.

In terms of performance, excessively long and complex rule lists, especially in `.htaccess` files, can increase request processing time. It is recommended to regularly review and optimize rules, merging similar entries and removing invalid ones.
