Archives for the month of: August, 2012

This post is a post-completion summary of my GSoC project, GreenSMW.

What was the idea of this project?

The original proposal can be found at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/nischayn22/1

The main deliverables proposed there were:

  • Validation of writes using a hash
  • Caching of Special Pages
  • Identification and caching of frequently made queries in Special:ExportRDF
  • Improvements to SMW’s accesses to the database.
  • Identification and caching of large inline queries or complex templates using memcache
  • Profiling and documentation

Which parts of this have been achieved, and what was left behind?

  • Validation of writes using a hash — Done (this was very easy and was completed very early)
  • Improvements to SMW’s accesses to the database — Done (this was the most complicated task, as it involved a lot of refactoring of old code)
  • Caching of Special Pages — An alternative strategy is being applied here: the Special page methods have been made efficient enough that they no longer need caching as such (this change is yet to be committed).
  • Identification and caching of frequently made queries in Special:ExportRDF — This was later judged to be low priority, as many other places needing improvement were identified.
  • Identification and caching of large inline queries or complex templates using memcache — This task turned out to be far from trivial. memcache uses time-based caching, which is a poor fit for queries because their results need a lot of invalidation. We plan to work on a different technique that invalidates queries by storing their metadata; this is a bigger task, and we decided to do it after GSoC. Until then, users can use a memcache-based approach like the one MWJames has been using: http://wikimedia.7.n6.nabble.com/Re-Query-result-caching-and-invalidation-Jeroen-De-Dauw-td4981469.html#none
  • Profiling and documentation — Mostly done; the rest will be done when SMW 1.8 is released.
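The first item, validating writes with a hash, can be pictured roughly as follows. This is a minimal sketch, assuming the goal is to skip redundant database writes when a page is saved without any change to its semantic data; the names `SemanticStore`, `canonical_form`, `data_hash` and `update_page` are hypothetical and are not SMW’s actual (PHP) API. The new data is serialized in a canonical order, hashed, and compared with the hash stored from the previous save.

```python
import hashlib
import json
from typing import Dict, List

# Hypothetical model: semantic data of a page = property name -> list of values.
SemanticData = Dict[str, List[str]]


def canonical_form(data: SemanticData) -> str:
    """Serialize deterministically so equal data always yields equal text."""
    return json.dumps({prop: sorted(vals) for prop, vals in data.items()},
                      sort_keys=True)


def data_hash(data: SemanticData) -> str:
    # MD5 is used only for change detection here, not for security.
    return hashlib.md5(canonical_form(data).encode("utf-8")).hexdigest()


class SemanticStore:
    """Hypothetical in-memory stand-in for SMW's property tables."""

    def __init__(self) -> None:
        self.tables: Dict[str, SemanticData] = {}
        self.hashes: Dict[str, str] = {}  # page -> hash of last written data
        self.writes = 0                   # number of (simulated) DB writes

    def update_page(self, page: str, data: SemanticData) -> bool:
        """Write the page's data only if it changed; return True if written."""
        new_hash = data_hash(data)
        if self.hashes.get(page) == new_hash:
            return False  # unchanged: skip the expensive delete/insert cycle
        self.tables[page] = {prop: list(vals) for prop, vals in data.items()}
        self.hashes[page] = new_hash
        self.writes += 1
        return True


store = SemanticStore()
store.update_page("Berlin", {"Population": ["3500000"], "Located in": ["Germany"]})
# Same data in a different order: the hash matches, so no write happens.
store.update_page("Berlin", {"Located in": ["Germany"], "Population": ["3500000"]})
```

Canonicalizing before hashing matters: without sorting, the same facts listed in a different order would produce a different hash and trigger a needless write.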

What was not in the plan (we don’t have plans for everything, do we?)

  • Unit Tests — We covered some parts of SMW’s code using PHPUnit tests.
  • Fixed Properties — A by-product of reorganizing the database code: wiki admins can assign separate tables to heavily used properties, so queries on those properties take less time.
  • Migration Script — A script to let users actually switch to SMW 1.8 without disrupting their site’s activity.
  • Semantic diff and site stats — Not fully mature yet, but SMW can now produce a diff of the semantic data and also store property usage statistics.

What do you consider the best aspect of participating in GSoC?

The best aspect of participating was contributing to a project that hundreds of people use. Besides, this opportunity gave me immense exposure to the process of software development in open source.

What do you consider the most challenging part of your summer?

Working with existing code was a challenge. I would change something here and it would break something there; this happened many times.

How were your mentors?

Awesome, having two mentors was really beneficial.

Which tips would you give to future students?

Talk to previous years’ students, and talk to mentors as early as possible. Don’t be intimidated by big codebases 😛

What one thing did the Wikimedia community do that you consider very helpful for your project and would suggest they continue to do?

Developers at Wikimedia have been very helpful throughout, they maintain a friendly atmosphere that welcomes more contributors. I am also thankful to Wikimedia Deutschland for funding my travel to SMWCon in Germany.


This week I wrote the migration scripts, so SMW users can use all my work without the hassle of running refreshData.php and waiting hours for it to complete. The script lets users migrate all their data directly from SMW’s older version to the newer one while the site keeps running uninterrupted on the older version; once the migration is finished, they can safely switch to the newer version, with full control over when that happens. I have yet to document the migration process, but I promise it is very straightforward and hopefully not time-consuming.

Besides that, this week I made a few small tweaks that had gone unnoticed for some time or were marked as TODO.

In the following weeks we plan to start working on caching inline queries. This poses many challenges, and we expect it to be available only in a later version (for more information, see the semanticmediawiki-devel thread with the subject “Query caching and Invalidation”).
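The difficulty with caching inline queries is invalidation: a time-based cache either serves stale results until its entries expire or expires them too often to help. The sketch below illustrates the alternative of invalidating by stored metadata. It is only an illustration under stated assumptions: each cached query records the properties it depends on, and every page save reports the properties it changed. `QueryCache`, `get_or_compute` and `on_page_saved` are hypothetical names, not part of SMW or memcache.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


class QueryCache:
    """Hypothetical query-result cache invalidated via dependency metadata."""

    def __init__(self) -> None:
        self.results: Dict[str, List[str]] = {}         # query -> cached result
        self.query_props: Dict[str, FrozenSet[str]] = {}  # query -> its properties
        # Reverse index (the stored metadata): property -> dependent queries.
        self.dependents: Dict[str, Set[str]] = defaultdict(set)

    def get_or_compute(self, query: str, props: Iterable[str],
                       compute: Callable[[], List[str]]) -> List[str]:
        """Return the cached result, or run the query and record its dependencies."""
        if query in self.results:
            return self.results[query]
        result = compute()
        deps = frozenset(props)
        self.results[query] = result
        self.query_props[query] = deps
        for prop in deps:
            self.dependents[prop].add(query)
        return result

    def on_page_saved(self, changed_props: Iterable[str]) -> Set[str]:
        """Evict every cached query depending on a changed property; return them."""
        stale: Set[str] = set()
        for prop in changed_props:
            stale |= self.dependents.pop(prop, set())
        for query in stale:
            self.results.pop(query, None)
            # Drop the evicted query from the other properties' dependent sets.
            for prop in self.query_props.pop(query, frozenset()):
                self.dependents.get(prop, set()).discard(query)
        return stale


cache = QueryCache()
cache.get_or_compute("[[Located in::Germany]]", ["Located in"], lambda: ["Berlin"])
cache.on_page_saved(["Population"])  # unrelated property: nothing is evicted
cache.on_page_saved(["Located in"])  # evicts "[[Located in::Germany]]"
```

Tracking dependencies per property is deliberately coarse: it may evict more than strictly necessary, but as long as every save reports all the properties it touches, no cached result outlives a relevant change, and nothing depends on an expiry time.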