Bots are noisy. Really noisy. They are dangerous as well, especially when they crawl a site and inflate load through otherwise legitimate operations, such as catalog retrieval in the case of e-commerce. We have plenty of reasons to dislike bots and to treat them as a cybersecurity threat, so I won’t explain that in detail.
In this post, I’d like to share an idea for detecting some bots that are built on non-browser engines. As you can see from the title, it’s based on CSP (Content Security Policy).
The main motivation for this research (more of a brainstorm, really) was to be able to detect bots at a reverse proxy without changing response bodies. That’s why all JS-based tricks were ruled out from the very beginning.
Obviously, it’s possible to send the client to a custom location with a 302 redirect and put JS there. But if the original response is JSON, XML, or anything other than HTML, a redirect to an HTML page will break the application. I simply don’t want to parse the application’s response body (yes, Content-Type is not enough here, because developers don’t always set it correctly) just so my proxy can decide whether to inject something or not. A response header, on the other hand, can be added regardless of the body type.
What’s the idea?
Non-browser bots will not follow CSP directives, and that’s how we can tell them apart. So it’s possible to inject a custom CSP header that carries a unique string marker in a domain name, and then track all the requests coming from a particular browser session.
Moreover, with unique domain names you can also track DNS requests from clients and correlate them with their HTTP requests, to find bad actors who use proxies/SOCKS for HTTP but forgot to proxy their DNS requests as well.
If we are talking about an agent-based detection proxy that sits behind a load balancer such as ELB, we can also track users’ TCP parameters by running the CSP responder outside of this infrastructure (the load balancer terminates the clients’ TCP connections itself, which takes away your ability to look at them). It’s also possible to fingerprint the SSL/TLS handshake (ciphers and so on) with Salesforce’s JA3 project, https://github.com/salesforce/ja3, and so on.
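To make the header idea a bit more concrete, here is a minimal sketch of what the proxy could add to each response. Everything specific in it is my assumption, not a tested design: the domain `csp-check.example` is hypothetical, I use `Content-Security-Policy-Report-Only` with a `report-uri` so the page itself is not broken, and I assume a `default-src 'none'` policy will produce at least one violation report on a typical page in a real browser.

```python
import re
import secrets
from typing import Optional, Tuple

# Hypothetical domain controlled by the detector (an assumption for this sketch).
REPORT_DOMAIN = "csp-check.example"


def new_marker() -> str:
    """Return a random, DNS-safe session marker (32 hex characters)."""
    return secrets.token_hex(16)


def build_csp_header(marker: str) -> Tuple[str, str]:
    """Build a report-only CSP header whose report URL embeds the marker.

    Report-only mode is assumed here so that the application keeps working;
    the policy only asks the browser to report violations.
    """
    report_url = f"https://{marker}.{REPORT_DOMAIN}/report"
    policy = f"default-src 'none'; report-uri {report_url}"
    return ("Content-Security-Policy-Report-Only", policy)


def marker_from_host(host: str) -> Optional[str]:
    """Extract the marker from the Host header seen by the report endpoint."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    suffix = "." + REPORT_DOMAIN
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    # Only accept labels that look like markers we could have generated.
    return label if re.fullmatch(r"[0-9a-f]{32}", label) else None
```

The proxy would remember which marker it handed to which session. A session that keeps requesting HTML pages but never shows up at the report endpoint is a candidate for a non-browser client.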
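The DNS correlation could be sketched roughly like this. Note that the authoritative DNS server usually sees the client’s recursive resolver, not the client itself, so comparing raw IPs makes little sense. In this sketch I assume a hypothetical `region_of` lookup (country, ASN, whatever you have) and flag markers where the resolver and the HTTP client land in different regions. Both input dictionaries are also assumptions about how the logs would be preprocessed.

```python
from typing import Callable, Dict, List


def find_mismatches(
    dns_seen: Dict[str, str],
    http_seen: Dict[str, str],
    region_of: Callable[[str], str],
) -> List[str]:
    """Return markers whose DNS and HTTP traffic come from different regions.

    dns_seen:  marker -> IP that asked our DNS server for the unique name
    http_seen: marker -> IP that sent the HTTP request carrying that marker
    region_of: hypothetical lookup mapping an IP to a coarse label (e.g. ASN)
    """
    suspicious = []
    for marker, http_ip in http_seen.items():
        resolver_ip = dns_seen.get(marker)
        if resolver_ip is None:
            # No DNS query observed for this marker; nothing to compare.
            continue
        if region_of(resolver_ip) != region_of(http_ip):
            suspicious.append(marker)
    return sorted(suspicious)
```

Public resolvers will produce false positives here, so I would treat a mismatch as one signal among others rather than as a verdict.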
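For reference, a JA3 fingerprint is an MD5 hash over five ClientHello fields written as decimal numbers: fields are separated by commas, and values inside a field by dashes. The sketch below only shows that final step. Parsing the ClientHello and dropping GREASE values are left out, and the function names are mine, not the JA3 project’s API.

```python
import hashlib
from typing import Iterable


def ja3_string(
    version: int,
    ciphers: Iterable[int],
    extensions: Iterable[int],
    curves: Iterable[int],
    point_formats: Iterable[int],
) -> str:
    """Join the ClientHello fields into the JA3 text form.

    Layout: version,ciphers,extensions,curves,point_formats
    with the values inside each list joined by '-'.
    """
    lists = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in lst) for lst in lists]
    return ",".join(fields)


def ja3_hash(
    version: int,
    ciphers: Iterable[int],
    extensions: Iterable[int],
    curves: Iterable[int],
    point_formats: Iterable[int],
) -> str:
    """Return the MD5 hex digest of the JA3 string (32 hex characters)."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

Since the CSP responder terminates the client’s TLS itself, the fingerprint would have to be taken there and then stored next to the session marker.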
How easy is it to bypass?
However, if you only check for the fact that a CSP request arrived, a bot script can simply parse the CSP header and send a sample CSP violation report there. Still, even this is better than nothing as a first step, especially because it’s pretty new for attackers.
Let’s discuss. It seems like a good starting point for research: to understand what could be implemented and how, in detail, and to test it in the wild. I’ll be more than happy if somebody takes this idea and turns it into a GitHub project.
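One way to go slightly beyond “a report arrived” is to check that the report body matches what the proxy actually served for that marker. The sketch below assumes the classic `report-uri` JSON format (a top-level `csp-report` object with `document-uri` and `original-policy`), and a hypothetical `served` map from marker to the page URL. A careful bot can forge all of this too, so it only raises the bar a little.

```python
import json
from typing import Dict


def report_is_plausible(body: bytes, marker: str, served: Dict[str, str]) -> bool:
    """Check a CSP report against what the proxy served for this marker.

    served: hypothetical map of marker -> URL of the page that got the header.
    """
    try:
        report = json.loads(body)["csp-report"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like a CSP violation report.
        return False
    if not isinstance(report, dict):
        return False

    expected_url = served.get(marker)
    if expected_url is None:
        # We never issued this marker, so the report is unsolicited.
        return False

    # The reported page must be the one we served, and the echoed policy
    # must contain the marker we put into the report URL.
    policy = str(report.get("original-policy", ""))
    return report.get("document-uri") == expected_url and marker in policy
```

For example, a report that carries the right marker but a `document-uri` the proxy never served for it would be rejected.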