I am not sure if this is my last post for GSoC'12, but I am happy to announce that my contribution to Semantic MediaWiki over the summer, as part of my Google Summer of Code project GreenSMW, is now released as SMW 1.8 (http://semantic-mediawiki.org/wiki/Semantic_MediaWiki_1.8.0_released).
Overall, this summer has been a life-changing time for me, from being introduced to a new world of open source software to introducing others to the same. I also had the opportunity to give a talk on my work for SMW at SMWCon in October (slides can be found at http://semantic-mediawiki.org/wiki/SMWCon_Fall_2012/Improvements_in_SQLStore3/Presentation). Later I attended the Wikipedia DevCamp and got to meet a lot of developers from the WMF and other volunteers; it's an amazing world out there. Hopefully, my contributions to open source continue like this forever 🙂
Over the last few days I have made some major improvements to the Special pages for SMW.
The old way of generating SpecialProperties involved unnecessary joins and limited the results (some results were omitted). The new method is much simpler: I only query the required information, using multiple simple queries, which should be much faster than the old way.
After this I also did some analysis of using indexes on the db tables to see how MySQL handled the new method, and got amazing results.
The following query is run with no indexes (as things are now). It is very slow, scanning about 10,724 rows to return only 250 of them. Slow indeed 😉
mysql> explain select * from store3.smw_ids where smw_namespace=102 order by smw_sortkey limit 200,50;
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows  | Extra                       |
|  1 | SIMPLE      | smw_ids | ALL  | NULL          | NULL | NULL    | NULL | 10724 | Using where; Using filesort |
1 row in set (0.00 sec)
Next is my new way, using an index on (smw_namespace, smw_sortkey). It runs faster, scanning only 1010 rows to return the same result of 250.
mysql> explain select * from store3.smw_ids where smw_namespace=102 order by smw_sortkey limit 200,50;
| id | select_type | table   | type | possible_keys | key   | key_len | ref   | rows | Extra       |
|  1 | SIMPLE      | smw_ids | ref  | NS_SK         | NS_SK | 4       | const | 1010 | Using where |
1 row in set (0.00 sec)
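To make the effect reproducible outside MySQL, here is a minimal sketch using SQLite from Python. The table is a toy stand-in for smw_ids (not SMW's real schema), and it shows how a composite index on (smw_namespace, smw_sortkey) changes the query plan from a full scan plus sort to a direct index walk:

```python
import sqlite3

# Toy table mimicking smw_ids (hypothetical data, not the real SMW schema).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE smw_ids (
    smw_id        INTEGER PRIMARY KEY,
    smw_namespace INTEGER,
    smw_sortkey   TEXT)""")
conn.executemany(
    "INSERT INTO smw_ids (smw_namespace, smw_sortkey) VALUES (?, ?)",
    [(i % 10, "key%05d" % i) for i in range(1000)])

query = ("SELECT * FROM smw_ids WHERE smw_namespace = 2 "
         "ORDER BY smw_sortkey LIMIT 50 OFFSET 200")

def plan(c, q):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " ".join(row[-1] for row in c.execute("EXPLAIN QUERY PLAN " + q))

plan_before = plan(conn, query)   # full table scan, then a sort for ORDER BY

# Composite index matching both the WHERE column and the ORDER BY column,
# so rows come out pre-sorted and only the needed range is read.
conn.execute("CREATE INDEX NS_SK ON smw_ids (smw_namespace, smw_sortkey)")
plan_after = plan(conn, query)

print(plan_before)
print(plan_after)
```

The same principle applies in MySQL: because the index is ordered first by namespace and then by sortkey, the WHERE filter and the ORDER BY are both satisfied by one index range scan, which is why the filesort disappears from the EXPLAIN output above.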
While the new method is really fast, we still need to cache the results, as this is still expensive for really large wikis.
My EndSem exams have almost ended, and today I will start working on the project.
My next few objectives will be to find the places where we can monitor the write queries done on page edit, and to check whether we are about to run the same query again; if so, we skip it and save a lot of time and resources 🙂
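The idea above can be sketched roughly like this (all names here are hypothetical; SMW's actual store code compares semantic data differently). We remember a fingerprint of the last data written per page, and skip the write query when the new data is identical:

```python
import hashlib

class WriteDeduper:
    """Remember the last data written per page and skip identical rewrites.

    Hypothetical sketch, not SMW's real API: we fingerprint the serialized
    property data and only issue the write query when it has changed.
    """
    def __init__(self):
        self._last = {}  # page title -> fingerprint of last written data

    def _fingerprint(self, data):
        # Sort items so the hash does not depend on dict ordering.
        return hashlib.sha1(repr(sorted(data.items())).encode()).hexdigest()

    def should_write(self, page, data):
        fp = self._fingerprint(data)
        if self._last.get(page) == fp:
            return False          # same data as last time: skip the query
        self._last[page] = fp     # record what we are about to write
        return True

dedup = WriteDeduper()
print(dedup.should_write("Main_Page", {"Has type": "Page"}))  # first write
print(dedup.should_write("Main_Page", {"Has type": "Page"}))  # unchanged, skip
print(dedup.should_write("Main_Page", {"Has type": "Text"}))  # changed, write
```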
Here’s the full list of objectives http://www.google-melange.com/gsoc/project/google/gsoc2012/nischayn22/26001
Exams, tests and vivas are all knocking on my door together; there are lots of them.
I will continue working in small bits till 7th May.
Right now I am working on a bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=31269. It's interesting to work on bugs that have some votes, so you know you made something useful for someone.
Thanks to Yuvipanda for reminding me about invalidating the cache. Possible solutions are:
- expiry time (easy)
- dependency check (e.g. some new properties are added, so we mark the cache of Special:Properties as invalid)
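The two strategies can be combined in one small cache. This is a hedged sketch with made-up names (not SMW's or MediaWiki's actual cache API): entries expire after a TTL, and a dependency change can also mark an entry invalid explicitly:

```python
import time

class SpecialPageCache:
    """Hypothetical sketch combining both invalidation strategies:
    a per-entry expiry time plus an explicit 'dependency changed' flag."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> [value, stored_at, still_valid]

    def set(self, key, value):
        self._store[key] = [value, time.time(), True]

    def invalidate(self, key):
        # Dependency check: e.g. a new property was added, so the cached
        # Special:Properties result is marked stale.
        if key in self._store:
            self._store[key][2] = False

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at, valid = entry
        if not valid or time.time() - stored_at > self.ttl:
            del self._store[key]  # expired or invalidated: recompute
            return None
        return value

cache = SpecialPageCache(ttl_seconds=3600)
cache.set("Special:Properties", ["Has type", "Has label"])
print(cache.get("Special:Properties"))   # cached list
cache.invalidate("Special:Properties")   # e.g. a property was just added
print(cache.get("Special:Properties"))   # None: must be recomputed
```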
For now I have used memcached, falling back to the objectcache when it is absent. Let's see if Markus and Jeroen have any suggestions. We can roll back easily; in the meantime this serves our purpose and is working 🙂
Looking at the code of the SMW special properties page, it is clear that Markus had thought of implementing it as a QueryPage.
QueryPages provide caching in the querycache table, which only has fields to store a page title and namespace. But for SMW special pages we might need more fields. Should we use some other table (objectcache), or memcached, and store the results as objects?
Should we use QueryPage to implement caching in special pages? It serves all the requirements for caching special pages.