Archives for the month of: May, 2012

So, the first week of GSoC is over and we have a new feature: fixedProperties. This lets site admins store some (heavily used) properties in separate tables by listing them in a simple PHP array.
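To make the idea concrete, here is a minimal sketch of the lookup in Python. SMW itself is PHP and the real setting is a PHP array; the property names, table names and the `table_for_property` helper below are all made up for illustration and are not the actual SMW configuration.

```python
# Hypothetical mapping from heavily used property names to dedicated tables.
# In SMW this would be a PHP array; every name here is invented for illustration.
FIXED_PROPERTIES = {
    "Has population": "smw_fixed_has_population",
    "Located in": "smw_fixed_located_in",
}

# Hypothetical name of the shared table used for every other property.
DEFAULT_TABLE = "smw_generic_values"


def table_for_property(prop: str) -> str:
    """Return the table that stores values of the given property."""
    return FIXED_PROPERTIES.get(prop, DEFAULT_TABLE)
```

Properties that are not listed simply fall back to the shared table, so the admin only has to name the few properties worth separating.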

Coming up: next I am splitting the core DB access code into multiple files (for better readability). After that comes the new class SMWSemanticDataCache, a subject-centered cache of all SMW data that could be used for faster retrieval, rather than querying all the tables for each page.
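Roughly, a subject-centered cache could look like the Python sketch below. This is not the design of SMWSemanticDataCache (which does not exist yet); the `SubjectCache` class and the `loader` callback are assumptions I use only to show the idea of loading a subject's data once and reusing it.

```python
from typing import Any, Callable, Dict


class SubjectCache:
    """Keeps the data of each subject (page) in memory after the first load."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # loader is a hypothetical function that reads all data for one
        # subject from the database (the expensive, many-table query).
        self._loader = loader
        self._data: Dict[str, Any] = {}

    def get(self, subject: str) -> Any:
        """Return the cached data for a subject, loading it on first access."""
        if subject not in self._data:
            self._data[subject] = self._loader(subject)
        return self._data[subject]

    def invalidate(self, subject: str) -> None:
        """Forget a subject, e.g. after its page was edited."""
        self._data.pop(subject, None)
```

The important part is `invalidate`: whenever a page is saved, its cached entry has to be dropped, or readers would see stale values.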

Has SMW been doing too many unnecessary deletes and inserts in your database lately, and has this been worrying you? Don’t worry any more: I have written a little code (I will link to it after the code cleanup day) that stops SMW from doing this.

 

So what does this new SMW do? It now checks whether anything has changed for a page and only rewrites the values if something really has changed. (Yes, it is just a small feature, but look at some profiling data and you will see the difference it makes.) Here’s some information I gathered for a page with about 14 property-value tuples:

  1. The newer version ran about 14 fewer queries (1 update, 1 insert, 8 deletes, and the rest selects).
  2. Memory used: 38578056 by the older version and 36118576 by the newer version (2459480 less, about 6.4%).
  3. Time taken: 774.183 by the older version and 753.942 by the newer version (20.241 less, about 2.6%).
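The check itself is simple. Here is a minimal sketch in Python, not the actual SMW code (which is PHP): the `store` dictionary stands in for the database, and `update_page` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

# A (property, value) pair, e.g. ("Has population", "3500000").
PropertyValue = Tuple[str, str]

# Stand-in for the database: page title -> its stored property-value tuples.
Store = Dict[str, Set[PropertyValue]]


def update_page(store: Store, page: str, new_data: Set[PropertyValue]) -> bool:
    """Rewrite the data of a page only if it differs from what is stored.

    Returns True if a write happened and False if it was skipped.
    """
    old_data = store.get(page, set())
    if old_data == new_data:
        # Nothing changed, so skip the delete and insert queries entirely.
        return False
    # Something changed: replace the stored values (the old behavior).
    store[page] = set(new_data)
    return True
```

Saving a page without touching its annotations then costs one comparison instead of a full delete-and-insert cycle.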

 

I don’t expect property values to change much in a wiki page, but there will always be new additions, so I think SMW should also check each property one by one and update only those that changed, rather than rewrite everything.
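A per-property check could work roughly like this Python sketch. It is only an illustration of the idea, under the assumption that the data of a page can be seen as a map from property name to a set of values; `diff_by_property` is a hypothetical helper, not an SMW function.

```python
from typing import Dict, Set, Tuple

# Property name -> set of values stored for one page.
PropertyValues = Dict[str, Set[str]]


def diff_by_property(
    old: PropertyValues, new: PropertyValues
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return, for each changed property, (values to delete, values to insert).

    Properties whose values are identical in old and new are left out,
    so no query needs to be issued for them.
    """
    changes: Dict[str, Tuple[Set[str], Set[str]]] = {}
    for prop in old.keys() | new.keys():
        old_values = old.get(prop, set())
        new_values = new.get(prop, set())
        to_delete = old_values - new_values
        to_insert = new_values - old_values
        if to_delete or to_insert:
            changes[prop] = (to_delete, to_insert)
    return changes
```

With this, adding one new annotation to a page with 14 existing ones would lead to a single insert rather than rewriting all 15 values.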

Will this happen? You will find out in another blog post from me. Till then, have a good day 🙂

We discussed ways to improve performance at the ground level: partitioning the tables by properties, and creating new (duplicate) tables partitioned by subjects. The documentation is again at http://www.semantic-mediawiki.org/wiki/SQLStore_Update. This led to an interesting idea: splitting the smw_id table into two new tables, one for subjects (pages) and one for properties. This will let us store additional information about properties, such as their table name (as some properties will now have their own tables) and the datatype of their values.
Some of the resources that helped us are http://backchannel.org/blog/friendfeed-schemaless-mysql and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
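To give a feel for the split, here is a small sketch using Python and SQLite. The real tables live in MySQL and are described on the SQLStore_Update page; the table and column names below (`subjects`, `properties`, `table_name`, `datatype`) are my own placeholders, not the final schema.

```python
import sqlite3
from typing import Optional, Tuple


def create_split_id_tables(conn: sqlite3.Connection) -> None:
    """Create separate ID tables for subjects and properties (hypothetical schema)."""
    conn.executescript(
        """
        CREATE TABLE subjects (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE properties (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE,
            table_name TEXT NOT NULL,  -- table that holds this property's values
            datatype   TEXT NOT NULL   -- datatype of this property's values
        );
        """
    )


def lookup_property(conn: sqlite3.Connection, name: str) -> Optional[Tuple[str, str]]:
    """Return (table_name, datatype) for a property, or None if it is unknown."""
    return conn.execute(
        "SELECT table_name, datatype FROM properties WHERE name = ?", (name,)
    ).fetchone()
```

Because the property row carries its own table name, a query for one property can go straight to the right table instead of scanning a shared one.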

Currently, I am ready to create these tables; any further additions will be done on the fly, as otherwise we will never actually get to coding. I won’t rewrite setup.php now; I will do it once everything is finalized (and then we will also think about the migration script).

Markus and I set our priorities today for the various small work segments of my project, and the database update won the race. I will shortly be making updates to the SMW databases as described at http://semantic-mediawiki.org/wiki/SQLStore_Update

My end-of-semester exams have almost ended, and today I will start work on the project.
My next few objectives are to find the places where we can monitor the write queries done on page edit, and to check whether we are repeating the same query; if so, we skip it and save a lot of time and resources 🙂
Here’s the full list of objectives: http://www.google-melange.com/gsoc/project/google/gsoc2012/nischayn22/26001