My Java project uses plain JDBC to talk to Oracle Database 12c. The transaction isolation level is Read Committed.
I have a highly denormalized table which stores an entity as a set of rows. I cannot change this; the table has to remain this way for reasons outside my control.
+------+------+---------+
| date | hash | ....... |
+------+------+---------+
| date | xyz  | ....... |
| date | xyz  | ....... |
| date | xyz  | ....... |
+------+------+---------+
I have two columns identifying an entity: a date and a hash. Since each entity is stored as several rows, these columns are not really unique and not a primary key, only indexed columns. Still, I want to enforce a kind of "uniqueness", meaning that only one entity exists at a time, no matter how many rows it is made of.
Such an entity can be updated a couple of times a day, resulting in different values but also in a different number of rows.
To make all this happen, every time I update an entity I run two or more statements within a single transaction:
delete from "table" where "date" = ? and "hash" = ?
insert into "table" values (?, ?, .....)
insert into "table" ....
... -- as many inserts as needed to store whole entity
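In JDBC terms the delete-then-insert transaction might look like the following minimal sketch. The quoted "table", "date", and "hash" names come from the question; the payload column name and the String[] row representation are placeholder assumptions:

```java
import java.sql.*;

// Sketch of the question's delete-then-insert transaction over plain JDBC.
// "payload" is a hypothetical stand-in for the entity's remaining columns.
public class EntityWriter {
    static final String DELETE_SQL =
        "delete from \"table\" where \"date\" = ? and \"hash\" = ?";
    static final String INSERT_SQL =
        "insert into \"table\" (\"date\", \"hash\", payload) values (?, ?, ?)";

    // Replaces the whole entity atomically: delete plus all inserts commit together.
    static void replaceEntity(Connection con, Date date, String hash,
                              String[] rows) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try (PreparedStatement del = con.prepareStatement(DELETE_SQL);
             PreparedStatement ins = con.prepareStatement(INSERT_SQL)) {
            del.setDate(1, date);
            del.setString(2, hash);
            del.executeUpdate();
            for (String row : rows) {        // one insert per entity row
                ins.setDate(1, date);
                ins.setString(2, hash);
                ins.setString(3, row);
                ins.addBatch();
            }
            ins.executeBatch();
            con.commit();
        } catch (SQLException e) {
            con.rollback();                  // leave no half-written entity behind
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}
```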
This works fine for a single application instance. Unfortunately, I have two instances working simultaneously, trying to store exactly the same data at almost the same time (they are simply primary/backup instances, but the backup also persists; this, too, I have no influence on).
If this were a normalized table, the solution would be a MERGE statement, but that won't work here.
My current solution:
What I have tried so far is to add one more column holding the ID of the persisting instance, and to execute the INSERT statements using a SELECT as the data source, with a condition on the SELECT that no data may already exist for this date/hash and app ID; otherwise the SELECT provides no rows to insert.
I thought it would work, but apparently it does not: I still see duplicates. I think this is because under Read Committed both transactions first perform their deletes, neither sees the data not yet committed by the other, and so both perform their inserts. Then both commits execute and, boom, both transactions' data is in the table.
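This failure mode is the classic check-then-act race, and it can be reproduced without a database at all. The toy Java program below (all names hypothetical) forces exactly that interleaving with a barrier: both writers "delete", both see no rows, both insert, and the entity ends up stored twice:

```java
import java.util.*;
import java.util.concurrent.*;

// In-memory analogue of the race: two writers for the same date/hash key,
// each doing delete-then-insert. The barrier forces both to finish their
// (empty) delete before either inserts, mirroring what the two app
// instances see under Read Committed before commit.
public class CheckThenActRace {
    static final List<String> table =
        Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) throws Exception {
        CyclicBarrier barrier = new CyclicBarrier(2);
        Runnable writer = () -> {
            try {
                // the DELETE: finds nothing, because nothing is committed yet
                table.removeIf(r -> r.startsWith("2024-01-01/xyz"));
                barrier.await(); // both writers have now "deleted" and seen no rows
                // the INSERTs: both writers store their copy of the entity
                table.add("2024-01-01/xyz/row1");
                table.add("2024-01-01/xyz/row2");
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        Thread a = new Thread(writer), b = new Thread(writer);
        a.start(); b.start();
        a.join();  b.join();
        System.out.println("rows stored: " + table.size()); // prints 4, not the expected 2
    }
}
```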
Other approaches that I considered:
I suspect optimistic locking won't work either: at the final version check, both transactions can still see the version as unchanged, while it is actually being changed by both of them at the same time and is about to be committed that way.
I know I could switch the transaction isolation level to SERIALIZABLE, but that isn't perfect either. First, Oracle does not actually serialize the queries; it takes an optimistic approach and fails with an error (ORA-08177) on concurrent modification, which I dislike as a "programming by exception" anti-pattern. Second, there is of course the performance cost.
Are there any other solutions to such a problem?
Solution:
Your requirements, as I read them, are:
- The database structure cannot change.
- Both applications must be able to update exactly the same data at the same time.
- Optimistic locking is out because it might cause errors or performance degradation.
- Pessimistic locking is out for the same reasons.
It seems the most important thing is not what data you're changing but what data you're reading. You need a way of determining what data the users of your system should be served (I can't tell whether these applications only maintain the data or also use it).
I assume your current query for serving data is something like:
select * from table where date = :1 and hash = :2
If you change this to the following, you'll always select the most recent data, and if two applications wrote at the same time you'll pick one of them (essentially arbitrarily, here the highest app_id; change the ordering to whatever you want):
select *
from ( select t.*
            , rank() over ( partition by hash
                            order by date desc, app_id desc ) as rnk
       from table t
     )
where rnk = 1
You could put this in a view. You're then essentially running two separate tables inside a single table: you can use MERGE again, and you can change your DELETE/INSERT statements to:
merge into table o
using ( select :1 as date, :2 as hash, :3 as app_id, ... from dual ) n
on (    o.date   = n.date
    and o.hash   = n.hash
    and o.app_id = n.app_id )
when matched then
   update set ...
when not matched then
   insert ( ... ) values ( ... );
commit;
delete from table
where date < :1
and hash = :2
commit;
where you use the same date and hash as in your MERGE statement. If the DELETE fails you don't really mind; you're protected from selecting the wrong data because you've changed your SELECT queries.
Personally, I'd acknowledge that one of your requirements has to change.
If there are plans for adding more applications, I'd accept the performance degradation and serialize updates to this table using a queuing mechanism.
If there are no such plans, take the simple approach now: use a locking strategy (not pretty) and just handle the known errors.
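Driven from JDBC, the MERGE-then-cleanup pair might look like the sketch below. The app_id and payload columns and the single-row payload are illustrative assumptions (the real entity spans several rows and columns); note the two independent commits, so a failed cleanup never undoes the merge:

```java
import java.sql.*;

// Sketch of the answer's MERGE-then-DELETE approach over JDBC.
// "app_id" and "payload" are hypothetical column names for illustration.
public class MergeWriter {
    static final String MERGE_SQL =
        "merge into \"table\" o "
      + "using (select ? as dt, ? as hsh, ? as app_id from dual) n "
      + "on (o.\"date\" = n.dt and o.\"hash\" = n.hsh and o.app_id = n.app_id) "
      + "when matched then update set o.payload = ? "
      + "when not matched then insert (\"date\", \"hash\", app_id, payload) "
      + "values (n.dt, n.hsh, n.app_id, ?)";
    static final String CLEANUP_SQL =
        "delete from \"table\" where \"date\" < ? and \"hash\" = ?";

    static void write(Connection con, Date date, String hash,
                      int appId, String payload) throws SQLException {
        try (PreparedStatement m = con.prepareStatement(MERGE_SQL)) {
            m.setDate(1, date); m.setString(2, hash); m.setInt(3, appId);
            m.setString(4, payload);       // value for the matched UPDATE
            m.setString(5, payload);       // value for the not-matched INSERT
            m.executeUpdate();
            con.commit();                  // commit the merge first
        }
        try (PreparedStatement d = con.prepareStatement(CLEANUP_SQL)) {
            d.setDate(1, date); d.setString(2, hash);
            d.executeUpdate();
            con.commit();                  // cleanup commits separately; a
        }                                  // failure here is tolerable thanks
    }                                      // to the ranked SELECT
}
```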