What Is SQL Injection? | Baeldung on Computer Science

1. Introduction

On the myriad of attacks that can really ruin our day, SQL Injection deserves a very special place. It is one of the most used web system hacking techniques. And they are now weaponized on many hacking tools (e.g. sqlmap). So much that these tools can help even not so skilled hackers to engage in highly destructive attacks.

In short, injection attackers seek to insert arbitrary code into the application logic. If they succeed, this code will be executed using the application permissions and security roles. These attacks fall into the broader category of Remote Code Execution attacks (RCE). That is, uploading code (binary or scripted) and tricking the server to execute it afterward.

Injection attacks have been among the riskier on OWASP‘s top ten for a long time, and so they should. Well-crafted attacks can easily alter application data and, even erase their tracks. An SQL injection is when the attacker tries to modify and corrupt the SQL clauses used by the application.

We have already a tutorial covering some of its aspects in Java programming. In this tutorial, we shall delve a little further into what is SQL Injection. How it works, and what makes it so dangerous. We’ll see its categories and obfuscation techniques, and show how to fix it.

2. How Does an SQL Injection Work?

In a nutshell, SQL Injection may happen when the application builds SQL commands by concatenating the strings along with application input parameters. For instance, let’s say that some web application login uses the following pseudo-code to verify if the credentials on the login form are valid:

algorithm VulnerableLoginFormHandler(username, password):
    // INPUT
    //    username, password = login data from an HTML form
    // OUTPUT
    //    authOk (boolean) = authentication status
    //    UserId = user ID 
    //    FullName = the name of the user

    passwd_hash <- md5 of password
    sql <- "select id, fullname from users where passwordHash = '" + passwd_hash
    sql <- sql + "' and userName ='" + username + "'"
    result <- query sql and fetch from the database

    if result is an error:
        return false, null, null
    else:
        return true, result[id], result[fullname]

The resulting SQL clause, if the parameters were Username = “root” and Password = “root” yields to:

SELECT id,Fullname FROM users WHERE Password ='63a9f0ea7bb98050796b649e85481845' AND Username ='root'

If we use the Username as “root’ or 1=1 limit 1;–” and a blank password, the resulting clause would be:

SELECT id,Fullname FROM users WHERE Password ='d41d8cd98f00b204e9800998ecf8427e' AND Username ='root' or 1=1 limit 1;--'

The latter gives the same results as the first one, even if the attacker could not guess the correct password. The trick was the “or 1=1” condition insertion. That could fool the application into accepting any credentials as valid.

That simple injection attack could be even worse, by adding other SQL clauses like, for instance:

Username=”root’ or 1=1 ; select * from customers;– “, to select on other tables
Username=”root’ or 1=1 ; insert into users (username, password) values (‘backdoor_user’,’some_hash’);– “, to insert data
Username=”root’ or 1=1 ; select * customers; — “, to steal application data; or even
Username=”root’ or 1=1 ; drop table customers; — “, to delete data
Username=”root’ or 1=1 union select password_hash as fullname from users; — “, to steal password hashes

The possibilities of damage are endless! Well, of course, the actual commands that will work on any given application will depend on its actual data model, and how the application will respond to the SQL changes. For that reason, the first thing an attacker will do is to use SQL Injection to identify the database and try to retrieve its model.

As we saw, what makes many applications vulnerable to SQL Inject attacks is the fact that their user inputs are concatenated directly to build their SQL commands. Even worse, most of the time they don’t even apply validation rules to those inputs.

On top of that, most databases allow multiple SQL commands in the same query string. Even if the applications don’t show the results of chained SQL clauses issued in a single string command, they are executed anyways. In some cases, the attacker can infer the query response through the application error messages and results.

SQL inject attacks are highly scriptable. That way some tools can reverse engineer the entire database model testing letter by letter and checking error conditions. For your training practice, there is a sample vulnerable containerized application on my GitHub (https://github.com/gchehab/sql_inject_sandbox) that shows what not to do, while coding.

3. Categories of SQL Injection Attacks

There are different approaches to SQL Injection, depending on how the application reacts:

3.1. Error-based

The application returns the SQL errors back. The attacker can use the error messages as feedback to improve their approach. For that reason, is a good practice to have proper application error handling. Not giving details directly to the user that might disclose information on how the application works is strongly recommended.

3.2. Union-based

The crafted SQL Injection attacks try to use a single clause string using a union construct, tricking the application into sending additional information. For that, the attacker must have first acquired information on the database column names. Note that, if the application sends back the actual SQL errors, it also informs the complete query, including the valid part, and that will surely disclose table and columns names.

In the previous techniques, the application somehow was tricked into giving information up to the attackers. However, a more sophisticated is used when the application does not give any feedback of any error or false SQL to the user. So the attackers have to guess the effect of their attempts.

They can try to guess the desired information iteratively letter by letter using any boolean-based query and substrings. It is a form of brute-forcing their way into the complete information. Yes, it takes a very long time, and yes, thousands of subsequent queries to gather any useful information. But if the system is left unmonitored, it is just a matter of time and persistence.

By the way, a boolean operation is anyone that returns a yes or no, true or false information. For instance, informing that a user does not exist in the application is fair game for this attack. That is why secure-aware systems never disclose if a failed login was due to an error on the user or on the password.

Even if the system does not have a boolean operation, the attacker can measure the time for a response. Information that is found in the database is always retrieved faster than one that it is not. It might be enough to guess. Secure-aware applications may be tweaked to have regular response times regardless of the query results. Moreover, they can throttle down response times to insistent repetitive queries.

4. Advanced SQL Injection Obfuscation Techniques

Some applications started to detect attacks by searching input data for SQL reserved words (‘select‘, ‘update‘, ínsert‘, ‘delete‘, ‘drop‘, ‘create‘, for instance). That is far from enough.

Attackers use creative obfuscation techniques to evade detection. Mixing lower and upper-case, using escaped chars, or SQL commands to translate ASCII codes to chars are some of their tactics. The code below is an excerpt of a real obfuscated attack:

GET http://www.somesite.com/Portal/showPortalPage.do?action&codItem=(cONVErt(int,(ChAR(58)%2BChAR(113)%2BChAR(101)%2BChAR(118)%2BChAR(58)%2B(selECT%2F%2A%2A%2FTOP%2F%2A%2A%2F1%2F%2A%2A%2FsubsTriNG((IsNull(caSt(APP..syscolumns.name%2F%2A%2A%2FaS%2F%2A%2A%2FNVARChAR(4000)),ChAR(32))),1,100)%2F%2A%2A%2FFrom ...

When parsed this command excerpt yields to (attack payload in bold font):

GET http://www.somesite.com/Portal/showPortalPage.do?action&coditem=<strong>(</strong>
<strong>convert(int,(:qev:(select top 1 substring((isnull(cast(app..syscolumns.name as nvarchar(4000)), )),1,100) from</strong> ...

Amazing, isn’t it? Had to write a huge script to parse it properly…

5. Mitigating SQL Injection Attacks: Parameter Binding

How can we protect ourselves from those nasty, not-so-little, attacks? First of all, we should never use input data to build queries directly. That means, always using parameter binding to create SQL. Many of the modern Object-Relational Mapping frameworks use parameter binding. However, their behavior may be overridden: Crafting your own raw SQL clauses, for instance.

Parameter binding is a kind of jack of all trades. Not only does it give moderate protection against SQL injection, but also improves database performance. The base pseudo-code, corresponding to the sample code previously shown is:

algorithm FixedLoginFormHandler(username, password):
    // INPUT
    //    username, password = login data from an HTML form
    // OUTPUT
    //    authOk (boolean) = authentication status
    //    UserId = user ID 
    //    FullName = the name of the user

    passwd_hash <- md5 of password
    sql <- "select id,fullname from users where passwordHash = :bPass and userName = :bUser"
    
    statement <- database.prepare(sql)
    statement.bind_parameter('bPass', passwd_hash)
    statement.bind_parameter('bUser', username)
    statement.execute()
    
    result <- statement.fetch_all_data()
    
    if result is an error:
        return false, null, null
    else:
        return true, result[id], result[fullname]

Instead of executing the SQL directly what happens is:

The prepare command sends the SQL template to the database. Based on this template, the database engine parses the command, generates, and caches its execution plan.
The bindParam command informs about the parameters it shall use. The database can establish other data access strategies
Execute command, asks the database to execute the query
Fetch the data

When we have a prepared SQL statement, we can execute it again with different parameters. Just replay steps 2-4 saves a lot of database resources.

Using this technique, every information sent through the application form is used only as parameters to the query. That way they are not confused with the SQL query itself. It is clear that the code gets a little more line codes. So, a common design pattern is to encapsulate the SQL preparation, parameter binding, and query execution in library functions.

5.1. Other Frameworks

We can see that any framework has its own way of parameter binding:

Java EE – use the PreparedStatement() to bind variables
Hibernate – use the createQuery() to bind named parameters
PHP – in PDO use the prepare database object method to create a statement and the statement bindParam method to bind named parameters
.NET – use parameterized queries like SqlCommand() or OleDbCommand()

5.2. Additional Benefits of Parameter Binding

Subsequent uses of the same prepared SQL statement, even with different parameters, will reutilize the cached execution plan. That increases the performance greatly. On a side note, modern databases try to reutilize queries’ planning even for clauses without parameter binding. On the other hand, that still creates some additional performance overhead.

By the way, not using parameter binding was once one the easiest ways to take a database performance down.

A Denial of Service attack, overwriting the entire database cache area, could be done with a simple script like:

algorithm DatabaseCacheDoSBomb():
    i <- 0
    while true:
        sql <- "select " + i + " from dual"
        Database query sql
        i <- i + 1

6. Additional Tips and Good Practices

Besides parameter binding, there are also some good practices we must have in mind to enhance the protection and security of our code.

6.1. Input Form Validation and Escaping

Parameter binding is a good protection technique. Even so, it is strongly advised to properly validate all input data against the expected boundaries and content. This might add resilience against other forms of Remote Code Execution attacks.

That means checking if the data belongs to its proper domain before using it further. For instance, for numbers, check if they are only decimal characters and within boundaries. For regular text, check if there are any only alphanumeric, and so on. Also, avoid letting input fields accept a higher number of characters than actually needed. Always impose appropriate limits.

6.2. Stored Procedures

Similar to the parameter binding, the use of Stored procedures also separates the SQL query parameters from the statements. This happens because the parameters are sent to the database as function parameters so the database shall not mix them with the statement. That ensures SQL Injection safeness, unless, of course, the stored procedure doesn’t try to create dynamic queries on the fly using string concatenation.

6.3. Least Privilege

In any case, as we can see, unless coded otherwise, the queries are executed using the application’s database access privileges. One way to mitigate the risks of this behavior is to adjust the application database service accounts to the minimum required access level.

No DBA or schema owner privilege for the account is a minimum starting point. But it can surely be more restricted. Sometimes, having specific application accounts for higher-privilege tasks is also a good practice.

6.4. Web-application Firewalls

And what if we have a closed-source, or hard to fix, vulnerable application?

In that case, we can use a Web Application Firewall. Do not confuse it with regular network firewalls.

A Web Application Firewall is a software solution that operates on the OSI application layer. That means that it inspects all aspects of the application data flow. In SSL/TLS (like https) connections, it requires the server’s private key in order to do that. It can learn an application’s valid parameters and usage patterns in order to enforce them. It tries to apply some of the protection techniques earlier described to validate, escape, and detect inject patterns in user input.

Moreover, they can detect automated web requests and data leakages, like SSNs or Credit Cards numbers, blocking further attempts, for instance. Their main issue is that they are hard to configure and prone to give false positives, especially if the application behavior is not fully understood.

There are even some open-source Web Application Firewall options that we may try before switching to commercial packages, like Modsecurity, Naxsi, Shadow Daemon. Also, various cloud-based security or CDN (Content Delivery Networks) solutions include such protection on their services.

7. Conclusion

Knowing the vulnerabilities that sloppy coding may open, is key the secure application development. Also, the knowledge of how the attacks work gives us a better understanding of how to mitigate them. Even though it is one of the most common attacks and one easy fixable, there are still many applications open for SQL Injection.

In this tutorial, we’ve shown a little more about SQL Injection. How do vulnerable applications work, and what are the best practices to fix them.

Full Archive

About Baeldung

Core Concepts

Operating Systems

Artificial Intelligence

Graph Theory

Latex