SQL Injection is code injection technique where SQL is
injected by user (as part of user input) into the back end query, and
ultimately changes query purpose which upon execution gives harmful result.
How can SQL Injection happen.
How can SQL Injection happen.
At server side, queries generally by themselves are not complete and
require user data to make it complete, meaningful and executable.
"select * from user where username = ' " + username +
" ' ";
Above query is not complete as it has dependency on username variable.
Now if username variable is filled by third party, then there are chances that user data contains SQL,
Take an example. Application is asking user to enter user name,
Enter user name:________________________
Enter user name:___david'; delete from user where id='1__
At Server Side,
username = "david'; delete from user where id='1"
Final Query = "select * from user where username = ' david'; delete from user where id='1 ' ";
If you observe final query, upon execution it will delete the record from user table which was never the purpose of original query and this is called SQL Injection attack.
Because of user data (which can be anything and uncontrolled) involvement in formation of query, SQL Injection attack can happen.
How PreparedStatement in Java prevents SQL Injection?
To understand this, Lets see steps involved in query execution.
1. Compilation Phase.
2. Execution Phase.
Whenever SQL server engine receives a query, It has to pass through below phases,
1. Parsing and Normalization Phase
In this phase, Query is checked for syntax and semantics. It checks whether references table and
columns used in query exist or not.
It also has many other task to do, Let's not go in much detail.
2. Compilation Phase
In this phase, keywords used in query like select, from, where etc are converted into format
understandable by machine.
This is the phase where query is interpreted and corresponding action to take is decided.
It also has many other task to do, Let's not go in much detail.
3. Query Optimization Plan
In this phase, Decision Tree is created for finding the ways in which query can be executed.
It find out the number of ways in which query can be executed and cost associated with each way
of executing Query.
It choose the best plan for executing a query.
4. Cache
Best plan selected in Query optimization plan is stored in cache, so that whenever next
time same query come in, it doesn't has to pass through Phase 1, Phase 2 and Phase 3.
When next time query come in, it will be checked directly in Cache and picked up from there
and execute.
5. Execution Phase
In this phase, supplied query gets executed and data is returned to user as ResultSet object.
Behaviour of PreparedStatement
API on above steps?
1. PreparedStatement are not complete SQL query and contains placeholder,
which at run time is replaced by actual user data.
2. Whenever any PreparedStatment containing placeholders is passed in to SQL Server engine,
It passes through below phases
1. Parsing and Normalization Phase
2. Compilation Phase
3. Query Optimization Plan
4. Cache (Compiled Query with placeholders are stored in Cache.)
UPDATE user set username=? and password=? WHERE id=?
3. Above query will get parsed, compiled with placeholders as special treatment, optimized and
get Cached.
Query at this stage is already compiled and converted in machine understandable format.
So we can say that Query stored in cache is Pre-Compiled means already compiled and
only placeholders need to be replaced with user data.
4. Now at run-time when user data comes in, Pre-Compiled Query is picked up from Cache and
placeholders are replaced with user data.
(Remember, after place holders are replaced with
user data, final query is not
compiled/interpreted again and SQL Server engine treats user data as pure data and not a
SQL that needs to be parsed or compiled again and that is beauty of PreparedStatement.)
If the query doesn't has to go through compilation phase again, then whatever data is replaced to
placeholders are treated as pure data and has no meaning to SQL Server engine and it directly
executes the query.
Note: It is the compilation phase after parsing phase, that understands/interprets the query
structure and give meaningful behavior to it. In case of PreparedStatement, query is
compiled only once and cached compiled query is picked up all the time to replace
user data and execute.
Due to one time compilation feature of PreparedStatement, it is free of SQL Injection
attack.
SUMMARY:
PrepareStatment query which is incomplete and contain placeholders is first compiled.
In the compilation phase, actual query is parsed and at this point only, any inner queries or
any sub queries are checked.
After this phase query is optimized and saved in Cache.
Now, when user data comes in, Pre-compiled Query is picked from Cache and placeholders are
replaced with user data.
After user data replacement, final Query formed is not compiled again since it is Pre-Compiled,
Now, if user data contain any SQL then also it has no effect and will be treated as pure data.
(Remember if query is compiled again then only it will treat data as SQL if it contains.)
compiled/interpreted again and SQL Server engine treats user data as pure data and not a
SQL that needs to be parsed or compiled again and that is beauty of PreparedStatement.)
If the query doesn't has to go through compilation phase again, then whatever data is replaced to
placeholders are treated as pure data and has no meaning to SQL Server engine and it directly
executes the query.
Note: It is the compilation phase after parsing phase, that understands/interprets the query
structure and give meaningful behavior to it. In case of PreparedStatement, query is
compiled only once and cached compiled query is picked up all the time to replace
user data and execute.
Due to one time compilation feature of PreparedStatement, it is free of SQL Injection
attack.
SUMMARY:
PrepareStatment query which is incomplete and contain placeholders is first compiled.
In the compilation phase, actual query is parsed and at this point only, any inner queries or
any sub queries are checked.
After this phase query is optimized and saved in Cache.
Now, when user data comes in, Pre-compiled Query is picked from Cache and placeholders are
replaced with user data.
After user data replacement, final Query formed is not compiled again since it is Pre-Compiled,
Now, if user data contain any SQL then also it has no effect and will be treated as pure data.
(Remember if query is compiled again then only it will treat data as SQL if it contains.)
Why to use PreparedStatement
instead of Statement API in Java?
1. Whenever Query is executed using Statement in java, then Query contains all the necessary data
for execution.
Query is supplied to SQL Server engine after placeholder replacement with user data.
"SELECT * FROM user WHERE username = ' " + username +
" ' ";
2. Now at run time, if user supplied data contains SQL like,
username = david'; delete from user where id='1 then query becomes,
select * from user where username='david'; delete from user where
id='1';
3. This final query is now supplied to SQL Server engine, which pass through all the below phases,
1. Parsing and Normalization Phase
2. Compilation Phase
3. Query Optimization Plan
4. Cache (Compiled Query with placeholders are stored in Cache.)
4. In Compilation phase, Query supplied will became,
select * from user where username='david';
delete from user where id='1';
delete from user where id='1';
and here the problem starts. So Statement API is prone to SQL Injection attack.
NOTE:
So problem with Statement API is query is not pre-compiled.
In original query with placeholders, user data is first replaced and modified query is passed to
SQL Server engine which runs compilation phase every time on query.
In compilation phase, if user data contains SQL then that will be converted into instructions and
get executed which is not the case with PreparedStatment where compilation of query happens
only one time.
5. When Compared to PreparedStatement, Statement API are not Pre-Compiled and whenever
supplied to SQL Server engine, it passes through compilation phase, which doesn't differentiate
over user input and SQL and that is where problem starts.
6. In PreparedStatement, SQL is supplied to SQL Server engine with place holders which doesn't
contain any user data. SQL Server engine passes the query to all above phases and keep it in ready
to run state, where only place holder replacement is remaining.
So whenever user data is supplied, placeholders are replaced with data and final query is executed
without compiling and that is where user data is treated as pure data and if SQL is supplied in data
that will be treated as data.
NOTE:
So problem with Statement API is query is not pre-compiled.
In original query with placeholders, user data is first replaced and modified query is passed to
SQL Server engine which runs compilation phase every time on query.
In compilation phase, if user data contains SQL then that will be converted into instructions and
get executed which is not the case with PreparedStatment where compilation of query happens
only one time.
5. When Compared to PreparedStatement, Statement API are not Pre-Compiled and whenever
supplied to SQL Server engine, it passes through compilation phase, which doesn't differentiate
over user input and SQL and that is where problem starts.
6. In PreparedStatement, SQL is supplied to SQL Server engine with place holders which doesn't
contain any user data. SQL Server engine passes the query to all above phases and keep it in ready
to run state, where only place holder replacement is remaining.
So whenever user data is supplied, placeholders are replaced with data and final query is executed
without compiling and that is where user data is treated as pure data and if SQL is supplied in data
that will be treated as data.
Importatnt benefits of
PreparedStatement in Java?
1. Now, from above observation we saw that whenever a query is executed using PreparedStatment,
It passes through below phases, and keep the SQL query in Pre-Compiled state.
1. Parsing and Normalization Phase
2. Compilation Phase
3. Query Optimization Plan
4. Cache (Compiled Query with placeholders are stored in Cache.)
So now, if we need to run a query 10 times or 100 times, it doesn't has to pass through all the
above phases and can run quickly by simply replacing placeholders in Pre-Compiled query.
This feature of PreparedStatement gives better performance.
6. Since PreparedStatement are compiled only once, it helps SQL Injection attacks while
Statement is prone to SQL Injection due to its every time compile nature.
No comments:
Post a Comment