Welcome, Sumana Harihareswara, Volunteer Development Coordinator
Posted by Rob Lanphier (robla @ WMF) in General engineering, Jobs, Technology on 2011/05/03
Sumana started in a part-time capacity back in March coordinating our participation in Google Summer of Code, as well as helping plan WMF’s participation in the Berlin Developer meeting happening later this month. Starting after the Berlin Developer meeting, she’ll be dedicating her working time to Foundation issues.
In addition to the specific initiatives above, she’ll be recruiting and encouraging volunteers more generally. In the near term, she’ll be evangelizing movement priorities within the development community, and working toward matching interested volunteers and organizations to important movement work. She’ll be working with Bugmeister Mark Hershberger on bug triage and finding volunteers to test and fix MediaWiki. She’ll also gather some baseline metrics about our volunteer and corporate communities to measure our progress against. And she’ll be coordinating WMF development work in other open source communities as appropriate. Her Open Source Bridge talk last year (“The Second Step: HOWTO encourage open source work at for-profits”) is particularly relevant to this last task.
Sumana is currently an active contributor in the GNOME community, as a writer and editor for GNOME Journal, and recently led the marketing effort for GNOME 3.0. She is also a blogger at GeekFeminism, and a longtime participant in open source communities. She has worked at the GNOME Foundation, QuestionCopyright.org, Collabora, Fog Creek Software, and Salon.com, and contributed to the AltLaw, Empathy, Miro, and Zeitgeist open source projects. She’s written a weekly newspaper column and has performed (and taught) stand-up comedy.
Sumana intends to communicate with the MediaWiki and Wikimedia communities in many ways: via IRC and mailing lists, conference calls, and frequent visits to WMF headquarters from New York City and to relevant conferences, both MediaWiki-related and not. For example, she’ll be speaking again this year at Open Source Bridge, giving a talk titled “Learn Tech Management in 45 Minutes”.
If you’re interested in learning more, or in leaving a comment on her talk page, visit Sumana’s user page on MediaWiki.org, which has much more information.
Welcome, Sumana!
–Rob Lanphier, Engineering Programs Manager for General Engineering
Account Creation Improvement Project Update
Posted by Nimish in Data analytics, Deployments, Technology on 2011/04/27
As you may know from Sue’s March 2011 update, the Wikimedia Foundation has made it one of our highest priorities to improve the experience of new editors, and we thought we’d start right at the beginning: from when a potentially new editor makes an account.
The Wikimedia Foundation’s Community Department has been studying how we can more effectively invite users who create new accounts to actually start editing. Since February, the Account Creation Improvement Project (ACIP) has been experimenting with different user interface messages and landing pages in the account creation flow (see their results and testing content to date).
We didn’t have an A/B testing infrastructure that supported this work, so while ACIP has performed the first tests sequentially, we’ve now deployed a modification to our ClickTracking extension on the English Wikipedia that will allow us to run multiple tests in parallel and record the results.
You’ll notice that the “Log in/create account” link on the English Wikipedia will send you to one of several randomized login screens, recognizable by the “ACP” identifier in the address. This is from the newly created CustomUserSignup extension. Over the next few months, we’ll be varying the look and messaging of these screens to see what kind of impact that has on new editors, and sharing our findings. Our testing framework will allow us to bucket-test small tweaks to the interface and measure the number of accounts created and edits made by users (in aggregate or on a per-session basis) who have gone through different flows.
What data we are storing
We are storing a new cookie upon visiting the “Log in/create account” page, with a lifetime of three months. This cookie will be used to track the following information:
- Which account creation messaging group the user was placed in (identified as ACP1, ACP2 or ACP3 for now)
- What version of the account creation campaign they received
- Whether the particular user made it to the end of the account creation process, or whether they dropped off after reaching the login screen or the account creation screen
- If (and only if) the user creates a new account, the number of edits or previews during the course of the trial
The information is associated with browser sessions (each of which has an individual unique identifier), not with an individual user or user account.
Anyone visiting the login page or the account creation page for English Wikipedia will have this cookie set. This is to make sure that we always provide the same wording to a particular visitor, so as not to invalidate our test. We will stop setting this cookie at the conclusion of this work, though we will likely perform other similar tests in the future.
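To illustrate why the cookie keeps the wording stable, here is a minimal sketch of sticky bucket assignment. It assumes a hypothetical cookie named `acp_bucket` that stores the messaging group (ACP1, ACP2 or ACP3) plus a random per-session identifier, with a lifetime of roughly three months; it is not the CustomUserSignup or ClickTracking code.

```python
import random
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

BUCKETS = ("ACP1", "ACP2", "ACP3")    # messaging groups named in the post
COOKIE_NAME = "acp_bucket"             # hypothetical cookie name
LIFETIME_SECONDS = 90 * 24 * 60 * 60   # about three months


def bucket_for_request(cookie_header: str,
                       rng: random.Random) -> Tuple[str, str, Optional[str]]:
    """Return (bucket, session_id, set_cookie_value_or_None).

    Returning visitors keep their stored bucket, so they always see the same
    wording; new visitors get a random bucket and a fresh session identifier.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        bucket, _, session_id = jar[COOKIE_NAME].value.partition(":")
        if bucket in BUCKETS and session_id:
            return bucket, session_id, None  # cookie already set; keep it

    # New (or malformed) cookie: pick a group at random and remember it.
    bucket = rng.choice(BUCKETS)
    session_id = secrets.token_hex(8)        # per-session, not per-user
    out = SimpleCookie()
    out[COOKIE_NAME] = f"{bucket}:{session_id}"
    out[COOKIE_NAME]["max-age"] = LIFETIME_SECONDS
    out[COOKIE_NAME]["path"] = "/"
    return bucket, session_id, out[COOKIE_NAME].OutputString()
```

On a second visit the browser sends the cookie back, so the function returns the stored group and no new `Set-Cookie` value; only sessions without a valid cookie are randomized.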
Because of the privacy-sensitive nature of the system, we have a limit on the level of granularity of our findings. For example, we won’t be able to create a plot of users vs edits, because we don’t have user-level data.
We look forward to the findings of the Account Creation Improvement Project, which will ultimately help us create a better sign-up experience for all users. Independent of this project, the CustomUserSignup extension may also prove useful to other outreach projects, by making it possible to create customized sign-up forms (e.g. for student workshops or e-mail invitations).
Nimish Gautam
MediaWiki selects eight students for Google Summer of Code 2011
Posted by sumanah in MediaWiki, Summer of code, Technology on 2011/04/25
We received more than 25 proposals for this year’s Google Summer of Code, and several mentors put many hours into evaluating project ideas, discussing them with applicants, and making the tough decisions. Our final choices, the Google Summer of Code students for MediaWiki for 2011:
- Akshay Agarwal’s “Account Creation, Login Screens and AJAX-ification of everything” (mentor: Brandon Harris)
- Kevin Brown’s “Working Archival for Web References/Citations,” “to facilitate the archival of external links used as references in the English Wikipedia” (mentor: Neil Kandalgaonkar)
- Devayon Das’s “Improving Semantic Search/Semantic Query usability issues in SMW” (mentor: Markus Krötzsch)
- Ankit Garg’s “Semantic Schemas extension” (mentor: Yaron Koren)
- Salvatore Ingala’s “AMICUS: Awesome Monolithic Infrastructure for Customization of User Scripts” (mentors: Brion Vibber and Max Semenik)
- Aigerim Karabekova’s “Extension Release Management” (mentors: Sam Reed, Priyanka Dhanda, and Chad Horohoe)
- Yuvi Panda’s “Making Offline Wikipedia Article Selection Easier with Mediawiki Extensions” (mentor: Arthur Richards)
- Zhenya Vlasyenko’s “MediaWiki Extension: SocialProfile – UserStatus feature” (mentor: Jack Phoenix)
You’ll be hearing more about each of these projects in the next few weeks!
Congratulations to this year’s students, and thanks to all the applicants, as well as MediaWiki’s many mentors, developers who evaluated applications, and Google’s Open Source Programs Office. The accepted students now have a month to ramp up on MediaWiki’s processes and get to know their mentors (the Community Bonding Period) and will start coding their summer projects on or before May 23rd. As organizational administrator for MediaWiki’s GSoC participation, I’ll be keeping an eye on all eight students and helping them out.
Good luck!
MediaWiki 1.16.4 security release
Posted by Rob Lanphier (robla @ WMF) in MediaWiki, Technology on 2011/04/15
MediaWiki 1.16.4 is the second security release this week. Shortly after the previous release (1.16.3), Masato Kinugawa discovered that one of the XSS problems that 1.16.3 was designed to address had not been fully fixed, and reported bug 28507. As a consequence, Internet Explorer 6 users visiting a site running 1.16.3 remain vulnerable to an XSS attack. After more thorough testing (thanks, Roan Kattouw!), we’re releasing 1.16.4.
Full details are in Tim Starling’s 1.16.4 release announcement. Sorry for the inconvenience of a second release, and thank you to everyone involved in getting this fixed!
MediaWiki 1.16.3 security release
Posted by Rob Lanphier (robla @ WMF) in General engineering, MediaWiki, Technology on 2011/04/12
There is a new MediaWiki release available which addresses three security vulnerabilities:
- A cross-site scripting (XSS) issue involving media uploads affecting Internet Explorer version 6 and earlier. Note: fully addressing this issue requires web server configuration changes. See bug 28235 and full announcement below for details (discovered by Masato Kinugawa).
- A CSS validation problem in the wikitext parser. This is a cross-site scripting (XSS) issue for all Internet Explorer clients, and a privacy loss issue for other clients. See bug 28450 and full announcement below for details (discovered by user Suffusion).
- A transwiki import problem with access control checks on form submission, which only affects wikis where this feature is enabled. See bug 28449 and full announcement below for details (discovered by MediaWiki developer Happy-Melon).
Full announcement from Tim Starling after the jump…
Thumbnail issues being resolved
Posted by Mark Bergsma in Operations, Technology on 2011/04/03
Last Monday, our Solaris server that contains all image thumbnails developed problems. It ran out of memory, became too slow and eventually even started to crash. (For the technically inclined: we think the kernel is leaking some file system structure in kernel memory.) This caused missing thumbnails across Wikimedia projects.
We addressed these problems in the following ways:
- We decreased the load on this server by adapting the Squid configuration, so it would have to handle fewer requests.
- We ordered more memory, in order to double the total physical memory in the relevant systems.
- We set up two new Linux servers that will eventually replace the Solaris server.
At first, adding these Linux servers in a partial caching setup seemed sufficient to fix the immediate problem while we gradually copied all thumbnail files over, which would eventually let us replace the Solaris server completely.
However, on Saturday night the Solaris server started crashing repeatedly, making it necessary to engage the image scalers to regenerate a large part of the missing thumbnails. This is slowing down the loading and generation of new (uncached) thumbnails.
Fortunately, most users have not experienced serious problems while using the site, since most thumbnails are cached by our HTTP caching layer. It is impossible to determine exactly how long it will take to recover completely from the slower service, but we expect that this will take no more than a few days.
Over the past months we have been developing a new and more scalable architecture for media storage, which will solve these problems once and for all. We hope to deploy this new architecture within a few months, also utilizing the new data center. Please watch the Tech Blog for updates on this project.
Wikimedia engineering March 2011 report
Posted by Guillaume Paumier in Technology, WMF engineering reports on 2011/04/02
Major news this month includes:
- The publication of a Product whitepaper by the Strategic product team (and the associated update from Sue Gardner) that will guide future engineering efforts.
- The return of Brion Vibber, Wikimedia’s first employee, as Lead Architect for MediaWiki.
- The deployment of Article Feedback 2.0 to the English Wikipedia, and of Upload Wizard 1.0 to Wikimedia Commons.
Events
Upcoming events
- Berlin Hackathon 2011 (May 13-15, Berlin) — Daniel Kinzler announced the dates and location of the Berlin Hackathon. Registration is open until April 10. Participants are also listing topics to work on.
- Summer of Code 2011 — Sumana Harihareswara sent a call for students for the upcoming summer of code. Developers are now signing up as students and mentors, and projects are being discussed. Read the dedicated article to learn more and join us.
- Wikimania (August 2-7, Haifa, Israel) — This year’s Wikimania will be preceded by two days of hacking (August 2-3); the actual conference (August 4-7) will also include Technology tracks.
Personnel
Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
The following positions have opened this month:
The following positions are still open:
- Performance Engineer
- Software Developer — Features
- Software Developer — Mobile
- Systems Engineer — Data Analytics (previously Data Analytics Engineer)
- Operations Engineer
- Senior QA Engineer
- Networking Contractor — Amsterdam
- Software Engineer — Community R&D
In addition, we hope to post the following positions over the next few months:
- Rich Text Editor Engineer
- Release Engineer
- Technical Writer
Short news
- Visitors — Ward Cunningham continued his “in-residence” visits to the Wikimedia office in San Francisco.
- Hires — We’re delighted to welcome Peter Youngmeister as a Consultant Operations Engineer, and the legendary Brion Vibber, who rejoined the Wikimedia Foundation as Lead Architect.
Operations
Site operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.
- Status: The last pieces of hardware arrived at the data center and were racked. The network routers and switches were set up, and their configuration is about 60% done. The first servers are being brought up while we wait for our network connectivity to be installed. We expect to be able to serve limited live traffic and services starting in May.
- Program manager: Mark Bergsma
Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.
- Status: A test cluster of three machines running OpenStack Swift will soon be deployed, and will serve a small portion of media traffic. Contractor Russ Nelson is also developing MediaWiki FileRepo support for Swift, so new media uploads can be pushed to the Swift cluster directly.
- Program manager: Mark Bergsma
Testing environment
Virtualization test cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).
- Status: The virtualization test cluster hardware, whose deployment was slightly delayed, is now ready for service. Ryan Lane released version 1.2 of his OpenStackManager extension and wrote detailed documentation on the setup. He will finish deploying the virtual test cluster in the first weeks of April.
- Program manager: Mark Bergsma
Backups and data archives
Backups — Improvement of backup coverage of Wikimedia-hosted data.
- Status: Backup coverage of Wikimedia-hosted data will see a major increase as soon as connectivity between our two primary data centers is available and data can be copied and replicated. Because reliability, fail-over and backup are the primary goals of the new primary data center, setting up live replicas and frequent backups of all our data will be the top priority among service deployments there.
- Program manager: Mark Bergsma
Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.
- Status: The dumps server is back, with its hardware repaired, and we have started to move data over as a live backup of the XML dumps. The new server for the English Wikipedia dumps arrived and is being set up. The January run of the English Wikipedia dumps completed in March, and the history files are available for download in two formats. The March run is almost complete, and its history files are already available for download in one format. We’re also working with Google to enable regular mirroring of the most recent dumps to Google storage for download.
- Program manager: Mark Bergsma
Short news
- Thumbnail issues — Our existing, non-scalable media storage architecture hit a performance limit again, which caused image thumbnail download slowdowns around Monday March 28th. This is a known problem that will finally be resolved by our Media Storage redesign described above. In the meantime, we have been working on fixing the existing problems by fine tuning the performance and behavior of the existing systems, and increasing the memory capacity of the current media servers. We are also working on deploying a second thumbnail server to take on some load, as a temporary solution.
Features Engineering
Content Quality and Editorial Tools
Article Feedback (phase 2) — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
- Status: The second phase of this feature was released on the English Wikipedia in mid-March. A major change in the interface is the ability for reviewers to specify the source of their knowledge, e.g. whether they have an academic degree in a related field (see screenshot). Experiments to encourage user engagement are also being performed. Dario Taraborelli also published an analysis of the first phase experiment. We’re currently expanding the experiment to several thousand articles in order to get more statistically meaningful results.
- Program manager: Alolita Sharma
Article feedback (extended review) — An interface for quality reviews of Wikipedia content.
- Status: The “Open wiki review system” is now being considered as a possible evolution of the Article feedback feature. It would offer an interface to submit detailed quality reviews, as well as a system to sort and assess reviews. Ways to surface quality indicators for readers are also being explored.
- Commissioned by: Erik Möller
Pending Changes — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article.
- Status: Development is in maintenance mode; work will resume when developer resources become available and after the English Wikipedia community makes a decision regarding the future of this trial. Steven Walling requested additional data to help the community reach a consensus.
- Program manager: Alolita Sharma
Personal image filter — A feature to allow users to selectively hide media files on a wiki.
- Status: Brandon Harris’s initial UI design recommendations were presented to the Board of Trustees. Erik Möller is now coordinating with Brandon to take the Board’s feedback into consideration. See the detailed article published in a recent Signpost issue.
- Program manager: Alolita Sharma
Discussions and Interactions
Wikilove 0.1 — A user script to encourage praise and virtual gifts between users.
- Status: Because many automated patrolling tools and gadgets are focused on making it easy to warn or reprimand users, Ryan Kaldari wrote a user script to facilitate nice behavior between editors. For example, it is now possible, on the English Wikipedia and other wikis, to give a “virtual kitten” to another editor. The script was adapted for use by the Russian and Tamil communities, and Ryan is helping support other communities willing to use it.
- Program manager: Alolita Sharma
Multimedia Tools
Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
- Status: Neil Kandalgaonkar and Ryan Kaldari continued to fix bugs, test functionality, and generally ready the software for a 1.0 release. The changes were deployed to the Commons prototype and Neil sent out a call for testing to uncover remaining bugs. Roan Kattouw reviewed the code and deployed the 1.0 release to Commons on March 30th.
- Program manager: Alolita Sharma
Community feature prototyping
As the first engineer “embedded” in the Community department, Trevor Parscal completed the first experiment, related to the location and appearance of the edit link. The results are not available yet, but will be published in the coming weeks. He’s now turning to the account creation improvement project (and the associated A/B testing) with Frank Schulenburg & Lennart Guldbrandsson.
Nimish Gautam and Roan Kattouw also provided support for the A/B testing and deployment, respectively.
Engineering support
Editor survey — Integration work between LimeSurvey and MediaWiki to support the upcoming Editors survey.
- Status: In preparation for the upcoming Editors survey conducted by the Global development department, work was done to integrate the survey software (LimeSurvey) with Wikimedia’s infrastructure. Arthur Richards and Nimish Gautam worked on the back-end to allow LimeSurvey to pull information directly from our database, and automatically provide useful stats about editors, hence simplifying and shortening the survey. Ryan Kaldari worked on integrating LimeSurvey with CentralNotice.
- Program manager: Alolita Sharma
Other projects
- Style guide for forms — Designer Brandon Harris published a draft style guide for forms in MediaWiki, and started a discussion on wikitech-l.
- Liquid Threads — Main developer Andrew Garrett laid down a timeline for his upcoming work on this feature, which brings threaded discussion capabilities to MediaWiki. He will first focus on back-end work, before moving on to documentation and the front-end.
- SimpleSurvey 2.0 — Work on this survey extension for MediaWiki is currently on hold, and will resume as developer resources become available.
- JavaScript parsing library — Work on this JavaScript parsing library for wikitext was slowed down in favor of the Upload Wizard.
- Resource loader — This core feature of MediaWiki 1.17, improving the load time for JavaScript and CSS, is now feature-complete and transitioned to maintenance mode. Trevor Parscal and Roan Kattouw continued to fix bugs as they arose.
- Non-Roman character set localization — Roan Kattouw deployed the Narayam extension, which he had previously refactored in depth. It is now in production on all Malayalam-language wikis. This extension adds input methods for some Indic scripts.
Wikimedia Labs
Media projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.
- Status: The back-end code of the TimedMediaHandler extension was reviewed by Roan Kattouw, and Michael Dale started to integrate the feedback in the code. The front-end and JavaScript code will be reviewed by Trevor Parscal.
- Program manager: Alolita Sharma
General Engineering
MediaWiki development and tools
MediaWiki 1.17 release — The upcoming MediaWiki release.
- Status: Developers continued to fix bugs discovered after the deployment of MediaWiki 1.17 to Wikimedia sites. A few issues remain, notably related to the new installer and the support of alternative database management systems. We plan to release a beta in early April.
- Program manager: Rob Lanphier
Code review — Review of changes made to the MediaWiki code.
- Status: After the 1.17 code review sprint, the number of unreviewed new revisions started to increase again (see the automatically generated chart). Mark Hershberger started to assign name tags to revisions, to help developers track reviews that are requested from them.
- Program manager: Rob Lanphier
Bugzilla 4.0 upgrade — Upgrade of our bug tracker to the latest version of Bugzilla.
- Status: Priyanka Dhanda coordinated with Rob Halsell to prepare for the upgrade. A prototype was set up, the Vector skin was cleaned up, and some old tweaks were moved into extensions. Chad Horohoe also used the prototype to try out a summary report script shared by the KDE community.
- Program manager: Rob Lanphier
Performance optimization
PoolCounter — A MediaWiki extension to avoid parser deadlocks on high-traffic pages.
- Status: Tim Starling deployed this extension, written by Platonides, which controls the number of simultaneous parses of a single page (to avoid the “Michael Jackson” effect). It was later disabled because of a bug that has since been fixed; Platonides also added integrated statistics to this tool. We plan a second deployment attempt early in the week of April 4.
- Program manager: Rob Lanphier
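The idea behind limiting simultaneous parses can be shown in a few lines. This is a hypothetical single-process Python illustration, not the PoolCounter extension’s code or API: the class name, the `workers` and `max_queue` parameters, and the in-memory cache are all assumptions. Only `workers` requests may parse a given title at once; the others wait and then reuse the result instead of parsing the same page again.

```python
import threading
from collections import defaultdict


class PoolCounterSketch:
    """Hypothetical limiter on concurrent parses of the same page."""

    def __init__(self, workers: int = 1, max_queue: int = 10):
        self._workers = workers        # parses allowed at once per title
        self._max_queue = max_queue    # waiting requests allowed per title
        self._lock = threading.Lock()
        self._sems = {}
        self._queued = defaultdict(int)
        self._cache = {}               # stands in for the parser cache

    def get_parsed(self, title, parse, timeout=5.0):
        if title in self._cache:       # fast path: already parsed
            return self._cache[title]
        with self._lock:
            if self._queued[title] >= self._max_queue:
                # Too many requests piling up on one page: refuse
                # rather than queue without bound.
                raise RuntimeError("too many requests queued for " + title)
            self._queued[title] += 1
            sem = self._sems.setdefault(
                title, threading.BoundedSemaphore(self._workers))
        try:
            if not sem.acquire(timeout=timeout):
                raise TimeoutError(title)
            try:
                # Re-check: an earlier holder may have filled the cache
                # while this request was waiting.
                if title not in self._cache:
                    self._cache[title] = parse(title)
                return self._cache[title]
            finally:
                sem.release()
        finally:
            with self._lock:
                self._queued[title] -= 1
```

With `workers=1`, a burst of requests for one suddenly popular article triggers a single parse; every other request either waits briefly and reads the cached result or is turned away once the queue is full.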
Ehcache deployment — Deployment of a disk-backed object cache to increase parser cache hit ratio.
- Status: Tim Starling investigated Wikimedia’s low parser cache hit ratio and suggested increasing the parser cache size to reduce Apache CPU usage. After researching available options for disk-backed object caches, he selected Ehcache and wrote a MediaWiki client for it. Our test deployments showed promising results, but also surfaced additional problems that we need to sort out.
- Program manager: Rob Lanphier
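Why would a bigger cache cut CPU usage? Every miss means a page must be re-parsed. The toy simulation below assumes a simple least-recently-used (LRU) eviction policy, which is an illustrative choice and not a description of how the parser cache or Ehcache actually evicts entries. It shows how a cache that is too small for the working set can miss on every request.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def lru_hit_ratio(requests: Sequence[Hashable], capacity: int) -> float:
    """Fraction of requests served from an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)       # mark as most recently used
        else:
            cache[key] = True            # a miss: the page is re-parsed
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests) if requests else 0.0
```

For a cyclic stream of three pages, `lru_hit_ratio(["A", "B", "C"] * 2, 2)` is 0.0 because each entry is evicted just before it is needed again, while a capacity of 3 gives 0.5. Each avoided miss is one fewer parse on the application servers.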
Wikimedia analytics
udp2log — A custom data analytics logging system.
- Status: A second logging machine was installed and a load balancer set up to handle the amount of data. Data is now being collected, sampled, filtered and cleaned up. The long-term plan is still to use multicast, in order to allow for growth.
- Program manager: Rob Lanphier
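The collect, sample, filter and clean-up steps can be sketched as a small pipeline. This is a hypothetical Python illustration of 1-in-N sampling over log lines received as UDP datagrams. The function names, the substring filter and the port handling are assumptions; this is not udp2log’s own code or configuration format.

```python
import socket
from typing import Iterable, Iterator, Optional


def sample_and_filter(lines: Iterable[str], every_n: int,
                      must_contain: Optional[str] = None) -> Iterator[str]:
    """Keep every Nth line, optionally filter by substring, then clean up."""
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    seen = 0
    for line in lines:
        seen += 1
        if seen % every_n != 0:          # 1-in-N sampling
            continue
        if must_contain is not None and must_contain not in line:
            continue                     # filtering
        cleaned = line.strip()
        if cleaned:                      # clean-up: drop blank lines
            yield cleaned


def serve(port: int, every_n: int) -> None:
    """Hypothetical listener: print a sample of log lines sent over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))

    def datagram_lines() -> Iterator[str]:
        while True:
            data, _addr = sock.recvfrom(65535)
            yield from data.decode("utf-8", "replace").splitlines()

    for line in sample_and_filter(datagram_lines(), every_n):
        print(line)
```

Sampling before filtering keeps the per-line cost low at high request rates. In a real deployment, each consumer would typically choose its own sampling factor.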
A/B testing — A set of tools to perform A/B testing on Wikimedia sites.
- Status: Nimish Gautam and Trevor Parscal are working on a tally extension, based on the ClickTracking extension, that will provide a management console for A/B tests via an interface similar to the one we use to manage banners in CentralNotice. A “bucketing” extension, which will direct people to the proper test group, is also planned. This feature will be integral to the account creation improvement project led by the Community department.
- Program manager: Rob Lanphier
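A tally for A/B tests boils down to counting, per test group, how many sessions reached each step of a flow. The hypothetical sketch below works on session-level events only, matching the privacy constraint described for the account creation tests. It assumes the stage names `login_screen`, `signup_screen` and `account_created`, and that every session starts at the login screen; it is not the planned extension’s code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, str]  # (session_id, bucket, stage); names hypothetical


def tally(events: Iterable[Event]) -> Dict[str, Counter]:
    """Count unique sessions reaching each stage, per bucket."""
    seen = set()
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session_id, bucket, stage in events:
        if (session_id, stage) in seen:   # count each session once per stage
            continue
        seen.add((session_id, stage))
        counts[bucket][stage] += 1
    return counts


def completion_rate(counts: Dict[str, Counter], bucket: str) -> float:
    """Share of sessions that reached the login screen and created an account."""
    c = counts.get(bucket, Counter())
    reached = c["login_screen"]
    return c["account_created"] / reached if reached else 0.0
```

Comparing `completion_rate` across buckets such as ACP1 and ACP2 shows the effect of different messaging in aggregate, without storing anything about individual user accounts.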
Report card — A monthly report of key metrics to measure community health.
- Status: Erik Zachte tweaked his code for page view statistics. Future improvements include mining the CentralAuth database to identify accounts belonging to the same user across wikis, and using this information to refine editor counts.
- Program manager: Rob Lanphier
Technical communications
Development process improvement — A project to increase transparency and organize Wikimedia Foundation’s engineering efforts more efficiently.
- Status: Guillaume Paumier revived this project and focused on summary pages and versions & phases for Wikimedia-funded engineering projects. The goal is to make it easier to find this information and keep it up-to-date, for the benefit of staff, volunteer developers and users.
- Program manager: Rob Lanphier
Wikimedia blog overhaul — A project to consolidate and improve the Wikimedia blogs.
- Status: After assessing the current situation of Wikimedia blogs, Guillaume Paumier worked with the Communications team, and other departments, to collect requirements. A technical proposal was then created and a prototype set up. Implementation should now happen shortly.
- Project manager: Guillaume Paumier
Other projects
- MediaWiki 1.17 deployment — Some bugs and other minor issues were fixed following the deployment of MediaWiki 1.17 to Wikimedia sites.
- Test framework deployment — Work on this automated test environment for MediaWiki (based on Selenium and PHPUnit) is currently on hold. It will resume when the virtualization cluster is in place, and resources become available.
- OpenWebAnalytics — We’re wrapping up our work on OWA until we’re able to hire our new dedicated analytics team. In the short term, we’re focusing our efforts on A/B testing and other immediate needs, allowing the future analytics team to map out a long-term strategy.
- API maintenance — Sam Reed continued to work on the backlog of bugs and feature requests. He is also investigating appropriate APIs for monitoring system health.
- Shell bugs — Site requests that require shell access to the servers are mostly handled by Rob Halsell and a few dedicated volunteers. Priyanka Dhanda is going to join the team and help out where possible.
- Access to Subversion — Rob Lanphier, Priyanka Dhanda and Chad Horohoe have joined Tim Starling to handle requests for commit access to Subversion.
- Migration to Git — Migrating from Subversion to Git was discussed on the wikitech-l list and issues were raised. The engineering staff is interested in supporting this migration once consensus is formed amongst developers.
- Heterogeneous deployment — The deployment of MediaWiki 1.17 across Wikimedia sites confirmed the need for a way to target software changes and upgrades to specific sets of wikis. We expect to make progress on this by the time MediaWiki 1.18 is deployed.
- Software deployments tracking — A new page on the wikitech wiki is now tracking recent and upcoming software changes, besides the server admin log.
- Wikistats — Erik Zachte checked in the source code of many of his tools (that provide general statistics on Wikimedia wikis) into our code versioning system.
Mobile
Mobile — All things Mobile and Wikimedia.
- Status: With User:Qgil’s help, we created a portal on Meta for all mobile projects. Alongside our software engineering efforts, a significant amount of ground research is being done on mobile strategy. Hiring is now almost done, including for the mobile site rewrite. Volunteer developer Vivek also continued to work on WikiSnaps for Android.
- Program manager: Tomasz Finc
Offline
Wikipedia version tools — Support and development of a series of tools to select Wikipedia content for offline use.
- Status: We finished assessing the existing tools and are actively working with their original author (User:CBM) to plan our next steps. The project is going to focus on making it easier to create collections for schools, and is an excellent fit for a Summer of Code project. We are also discussing with one of the most active offline project members (User:Walkerma) to make sure our use cases are capturing what’s needed.
- Program manager: Tomasz Finc
OpenZim for Collections — Integration of OpenZim into the Collections extension.
- Status: After a successful deployment, we collected both email feedback and bugs. We are now exploring where else we might engage with PediaPress for further work to improve the workflow of our offline projects.
- Program manager: Tomasz Finc
Kiwix UX study — Evaluation of the user experience of the Kiwix mobile app to access offline Wikimedia content.
- Status: We finished our first development sprint of the Kiwix UX improvements. Our next step is to work with testers from Wikimedia Kenya, Wikimedia India and WMF staff members to find bugs in the beta. If you would like to help us, please sign up as a tester. We’re now looking at adding an integrated download manager to facilitate the download of new openZim collections.
- Program manager: Tomasz Finc
This article was written by Mark Bergsma, Tomasz Finc, Alolita Sharma, CT Woo, Rob Lanphier & Guillaume Paumier. See the full revision history. A wiki version is also available.
Project ideas, students, and mentors wanted for Google Summer of Code
Posted by sumanah in MediaWiki, Summer of code, Technology on 2011/03/31
For the sixth year in a row, Wikimedia is participating in the Google Summer of Code program. Google Summer of Code (GSoC) is a program in which Google pays students USD 5000 each to hack on open source projects during the summer.
Over time, MediaWiki has benefited from GSoC students and their projects. For example, Samuel Lampa’s 2010 RDF import/export extension in Semantic MediaWiki is in use. And Jeroen De Dauw, a GSoC student in 2009 and 2010, is now a consistent contributor to the MediaWiki community, as is Brian Wolff, a 2010 GSoC student.
In the past, the administrative and management challenges of GSoC have been an extra task that took engineers’ time and too often fell through the cracks. So this year, Rob Lanphier asked me to act as organizational administrator for MediaWiki’s involvement, via the Wikimedia Foundation.
I’m recruiting students to apply, getting project ideas, and managing the application process overall. Once we choose the students and they start ramping up and working, I will also help mentors manage their students and keep communication going, to make sure that every GSoC student’s project gets delivered and gets used!
We hope 2011’s students will develop useful chunks of MediaWiki (core, extensions, gadgets, scripts, or utilities), help us get their code shipped, and stay in the MediaWiki community afterwards.
This year’s ideas include writing and implementing cite templates in a PHP extension, improving the ImageTagging extension, XML dump work, pre-commit checks in our code repositories, and more. And of course we want to hear your own ideas, too! Interested?
University, community college, and graduate students around the world are eligible to apply to Google Summer of Code. You don’t need to be a computer science or IT major, and you can work from home.
We are looking for students who already know PHP. It’s also great if you have some experience with LAMP, MAMP, LAPP, or one of those kinds of stacks, and with the Subversion version control system. If you haven’t contributed to MediaWiki before, How to become a MediaWiki hacker is a good place to start.
If you’d like to participate, check out the timeline. Make sure you are available full-time from 23 May till 22 August this summer, and have a little free time from 25 April till 23 May for ramp-up.
If you’re interested, please sign up on our wiki page and start talking with us on IRC in #mediawiki on Freenode about a possible project! Then you can submit your proposal via the official GSoC website. The deadline for you to submit a project proposal is April 8th, but we encourage you to start early and talk with us about your idea first.
And, to repeat what Brion once said:
If you’re an experienced MediaWiki developer and would like to help out with selecting and mentoring student projects, please give us a shout! We’ll take you even if you live in the southern hemisphere. ;) We need folks who’ll be available online fairly regularly over the summer and are knowledgeable about MediaWiki — not necessarily knowing every piece of it, but knowing where to look so you can help the students help themselves.
We’re looking forward to hacking with you!
Sumana Harihareswara
MediaWiki Coordinator, GSoC 2011
Article Feedback Pilot: Next Version
Posted by howief in Deployments, MediaWiki, Technology on 2011/03/28
On March 14, we launched v2.0 of the Article Feedback Tool. Version 2.0 represents a continuation of the work we started last September. To quickly recap, the tool was originally launched as part of the Public Policy Initiative. In November, the feature was added to about 50-60 articles on the English Wikipedia, in addition to the Public Policy articles. The purpose of adding the tool to these additional pages was to give us more data with which to understand the quality of the ratings themselves: namely, do these ratings represent a reasonable measurement of article quality?
Since then, we’ve been evaluating the tool using both qualitative and quantitative research. We conducted user research on the Article Feedback tool both to see how users actually used the tool and to better understand the motivations behind rating an article. Readers liked the interactivity of the feature, ease of use, and the ability to easily provide feedback on an article. On the other hand, some of the labels (e.g., “neutral”) were difficult to understand. A detailed summary of the user research has been posted here.
We also did some quantitative research on the ratings data. Though the ratings do appear to show some correlation with changes in the content of the article, there is ample room for improvement (see discussion of GFAJ-1). It also appears that articles of different lengths show different ratings distributions. For example, for articles under 50kb there appears to be a correlation between article length and the Well-Sourced and Completeness ratings, but for articles over 50kb in length, the correlation becomes far weaker (see Factors Affecting Ratings).
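To make the length-band comparison concrete, here is a minimal Python sketch that computes a Pearson correlation between article length and rating separately below and above 50kb. It assumes each article has been summarized as a (length in kB, mean rating) pair; the function names are hypothetical, and this is not the analysis code the team actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def correlation_by_band(articles, threshold_kb=50.0):
    """Split (length_kb, rating) pairs at threshold_kb and correlate
    length with rating separately within each band."""
    short_band = [(l, r) for l, r in articles if l < threshold_kb]
    long_band = [(l, r) for l, r in articles if l >= threshold_kb]
    result = {}
    for name, band in (("under", short_band), ("over", long_band)):
        lengths = [l for l, _ in band]
        ratings = [r for _, r in band]
        result[name] = pearson(lengths, ratings)
    return result
```

Splitting at a single threshold is the simplest version of this comparison; finer length buckets would apply the same `pearson` function to each bucket in turn.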
Based in part on the results from the first version, v2.0 of this feature was designed with two main goals in mind.
- First, we wanted to see if we could improve the correlation between ratings and change in article quality by segmenting ratings based on the rater’s knowledge of a topic. We introduced a question which asks the user whether she is “highly knowledgeable” about the topic. The answers to this question will enable us to compare ratings from users that self-identify as highly knowledgeable versus ones that don’t.
- Second, we wanted to see if rating an article could lead to further participation — does rating an article provide an easy way to contribute, leading to additional participation like editing? We wanted to test this hypothesis in light of the recent participation data. We don’t know whether this will actually be the case, but we wanted to get some data. In v2.0, there is a mechanism that shows a user a message (e.g., “Did you know you can edit this article?”) after they submit a rating. We will measure how well these messages perform. (These messages are dismissible by clicking a “Maybe later” link).
We also made some UI changes based on the feedback from the user study. For example, “Neutral” was changed to “Objective” (as were some other labels) and the submit button has been made more visually obvious. There are a number of other improvements which may be found on the design page.
Finally, to get a wider variety of articles to research, we increased the number of articles with the tool. We knew from our early analysis that articles in different length bands received different rating distributions, so we created length buckets (e.g., 25-50kb) and selected a random set of articles within each length bucket. User:Kaldari wrote a bot that takes the list of articles and places the tool on each article in the list [10]. As of March 24, the tool is active on approximately 3,000 articles. We may expand this list if we can do so without affecting the performance of the site.
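Selecting a random set of articles within each length bucket is a form of stratified sampling. The sketch below assumes each article is known by its title and its length in kB; the bucket edges, the per-bucket count, and the function names are illustrative assumptions, not the actual bot User:Kaldari wrote.

```python
import random
from collections import defaultdict


def bucket_of(length_kb, edges):
    """Return the index of the length bucket containing length_kb.

    edges is an ascending list of upper bounds in kB (e.g. [25, 50]);
    lengths at or above the last edge fall into a final open-ended bucket.
    """
    for i, upper in enumerate(edges):
        if length_kb < upper:
            return i
    return len(edges)


def stratified_sample(articles, edges, per_bucket, seed=None):
    """Pick up to per_bucket random titles from each length bucket.

    articles is an iterable of (title, length_kb) pairs.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for title, length_kb in articles:
        buckets[bucket_of(length_kb, edges)].append(title)
    chosen = []
    for idx in sorted(buckets):
        titles = buckets[idx]
        # Small buckets contribute everything they have.
        k = min(per_bucket, len(titles))
        chosen.extend(rng.sample(titles, k))
    return chosen
```

Sampling within buckets guarantees that both short and long articles end up in the test set, which a single random draw over all articles might not if article lengths are heavily skewed.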
We’ll be publishing analysis on v2.0 in the coming weeks. In the meantime, please let us know what you think on the workgroup page. Or better yet, join the workgroup to help develop this feature!
UploadWizard nearing 1.0, preview available for testing
Posted by NeilKandalgaonkar in Deployments, MediaWiki, Technology on 2011/03/22
I’m happy to announce that we’re getting close to a 1.0 release for UploadWizard, and we’re planning to deploy it to Wikimedia Commons by the end of this month.
UploadWizard is a step-by-step, multi-file uploader extension for MediaWiki that was developed as part of the Multimedia Usability Project. We launched a beta version in November 2010, and have been working on getting it to release quality ever since.
Recently, Ryan Kaldari joined the team, and he and I have been squashing bugs, testing functionality and readying the software for deployment. We’ve focused on achieving a pleasant interface that works on all browsers, orients users to Commons’ mission, and helps them make good contributions.
You’re invited to try the new version (you’ll need an account on the prototype) and report issues you encounter with it.
By the way, some people find UploadWizard’s design a bit surprising: you can upload files before you set a license or describe them, which sounds a bit dangerous (but isn’t, the way we’ve done it). We explain all that and more in the FAQ.
If you find a bug, you might want to check the list of open issues first. The following bugs are expected to be completed before launch: 24692, 24696, 24703, 24758, 26053, 26063, 26076, 26179, 26182, 26591, 26592, and 28046. If your problem hasn’t been reported yet, please enter the issue directly in our tracker, or leave a note on the feedback page.
In the meantime, we will be periodically updating UploadWizard on the prototype server, fixing any further bugs you find as fast as we can.
And what else is left to do? Well, after this is deployed, we’re going to be watching things very closely to see how this affects Commons. Our goal is to increase the number of contributions, and the pool of contributors — without any downgrade in quality or burdening the community with spam. We have some plans about how to determine that, but we could always use more help there. If you have ideas about it, please let us know!
Thanks in advance!
Neil Kandalgaonkar
Software Engineer, Multimedia Projects
Wikimedia Foundation
UI Design Experiments
Posted by Trevor Parscal in MediaWiki, Technology on 2011/03/09
In 2009 the Wikipedia Usability Initiative researched how to improve the usability of MediaWiki by watching users as they performed various tasks on Wikipedia. Each problem identified was matched with potential solutions, which were then filtered by technical complexity and user impact. During development, still more features were cut because of resource limitations; many of these had even been designed, but were never finished.
In a joint effort between members of the Wikimedia Foundation’s community and engineering departments, many of these ideas, as well as some new ones based on ongoing research, are being developed and deployed as a series of brief experiments.
The first of these experiments is a redesign of how section edit links are displayed. The current design displays the section edit link on the opposite side of the heading text. During research, users often became confused about which section the link was related to, sometimes associating it with the section above rather than below. Other users did not find the links easily, or in some cases at all. The proposed design was to move the links to be displayed together with the heading text, and to add a small pencil icon to draw attention to the link. This feature was designed but never finished. At the same time as this research was being conducted, Wikia designed and deployed a very similar change to their sites, and reported a measurable increase in the use of section edit links on their wikis as a result of the change.
The experiment is scheduled to run on English Wikipedia from March 9, 2011 to March 16, 2011, during which time a small fraction of users will be randomly selected to participate. During the experiment, anonymous statistics will be collected using the ClickTracking extension. Data collected will be used to improve the user experience of Wikipedia and other sites running on MediaWiki. If you would like to abstain from participating in this and other experiments in the future, you can select the “Exclude me from feature experiments” option in your user preferences.
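The post does not say how users are selected. One common approach, shown here purely as an assumption rather than as how the ClickTracking extension or this experiment actually works, is to hash a user identifier into the range [0, 1) and enroll users whose value falls below the target fraction, skipping anyone who has set the opt-out preference. Hashing keeps each user’s assignment stable across page views. The function name and parameters are hypothetical.

```python
import hashlib


def in_experiment(user_id: str, experiment: str, fraction: float, opted_out: bool) -> bool:
    """Hypothetical enrollment check for a feature experiment.

    Users who chose "Exclude me from feature experiments" are never enrolled.
    Everyone else is enrolled if a hash of (experiment, user_id), scaled to
    [0, 1), falls below `fraction`.
    """
    if opted_out:
        return False
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep 53 bits so the division is exact and the result is always < 1.0.
    value = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return value < fraction
```

Including the experiment name in the hash means a user who lands in one experiment is not automatically placed in every later one.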
Site fixes this week
Posted by Rob Lanphier (robla @ WMF) in Deployments, MediaWiki, Outage, Technology on 2011/03/09
- Since the deployment, a problem with our job queue meant that emails that were supposed to be sent from the site weren’t. This backlog was cleared last night, and a lot of pent-up email was sent.
- There were some HTML cache invalidations that caused parts of the site to get overloaded for a few minutes.
- Yesterday, we started the deployment of the category sorting improvements. We deployed some modifications to the database today. This resulted in a few hiccups on the site that we’ve since mostly recovered from.
One key set of improvements in the MediaWiki 1.17 release is the category sorting work spearheaded by Aryeh Gregor. This code will eventually improve the sorting of categories in different languages, allowing us to choose the most appropriate sort order for the language. For now, we’re at least switching over to a more sensible sorting algorithm (Unicode Collation Algorithm (UCA)), and have made other improvements to sorting.
This set of changes required a modification of the database that we didn’t believe was risky, but was irreversible. Given how complicated the initial 1.17 deployment was, we decided to hold back on deploying this work.
There are still some maintenance scripts left to run before this work is fully deployed, but most parts of this are done.
Brion Vibber to rejoin Wikimedia Foundation
Posted by danese in Jobs, MediaWiki, Technology on 2011/03/07
Apparently, Thomas Wolfe was wrong: you can go home again…
It is with great pleasure and excitement that I today announce the pending return of Brion Vibber to the Wikimedia Foundation Tech Department in the role of Lead Architect, reporting directly to me. Brion’s start date will be March 31st, 2011.
For those of you who don’t know, Brion was the first employee of the Wikimedia Foundation and its first Chief Technical Officer. He wrote much of the original code in MediaWiki, and as such is one of a very small number of people in the world who deeply understands the internal, technical underpinnings of our projects, such as Wikipedia. Brion has been much-honored for his past involvement with MediaWiki, including establishment of “Brion Vibber Day”, which was first celebrated in 2004. Last year he accepted an award on behalf of the original MediaWiki team (Magnus Manske, Lee Crocker, Brion Vibber, and Tim Starling) from the USENIX organization for developing the MediaWiki project. Brion left the Foundation in 2009 to join StatusNet, an open source startup focused on microblogging, while remaining active as a Wikimedia volunteer.
Since I joined WMF in February 2010, I have been looking for a Lead Architect to work on the future of the platform (both for our use and for the thousands of wikis that run on our engine). The biggest challenge was to find somebody who both understands and can work well with our unique culture and still think forward about what I’ve been referring to as “MediaWiki.next”. I recently talked to Brion about the possibility of having him take a role with Wikimedia again to work on MediaWiki. I was ecstatic when he said yes.
Brion’s first project will be on the team tasked with re-writing MediaWiki’s parser, which should be both a challenging and rewarding effort, to which Brion tells me he’s looking forward (you can see why I’m so happy he’s coming back). Please join me in welcoming Brion back in the comments, or catch him on IRC.
Danese Cooper, Chief Technical Officer, Wikimedia Foundation
Wikimedia engineering February report
Posted by Guillaume Paumier in Technology, WMF engineering reports on 2011/03/04
Major accomplishments this month include:
- the racking party at our new data center in Virginia
- the Data Summit that happened in early February in California
- the release of Editor Trends study data and tooling
- the painful, but ultimately successful, deployment of MediaWiki 1.17 to all Wikimedia wikis.
Note: In the past, each “monthly engineering update” has reported on what was accomplished the previous month: the previous “February update” hence reported on what we did in January. In order to avoid any ambiguity, and to be more consistent with the other Wikimedia reports, we’re now going to explicitly call them reports of the previous month. This means this “February report” is about what we did in February.
Main deployment of MediaWiki 1.17 to Wikimedia sites complete
Posted by Rob Lanphier (robla @ WMF) in Deployments, MediaWiki, Technology on 2011/02/17
We have been running MediaWiki 1.17 on all Wikimedia wikis for almost a day now, and things seem to be in pretty good shape. We still have a lot of issues to fix, including a problem with disabling the enhanced toolbar in preferences and some issues with categories (see below). Many of the problems involve JavaScript and replacing code that isn’t compatible with ResourceLoader. We have a migration guide for developers of gadgets and other MediaWiki customizations, which we encourage anyone who is having problems with gadgets to consult. Our developers are continuing to find and fix problems.
Based on early (albeit very subjective) reports, ResourceLoader is already paying dividends, as navigating around the site seems much zippier in many cases. We hope this is your experience as well.
We still have some deployment work left to do around this release. In addition to the bugfixes, we also want to reintroduce the category improvements that Aryeh Gregor made last summer. We had to temporarily remove these because they required schema changes that would make it difficult to do the type of deployment that we did. Now that we’re confident we’re staying with MediaWiki 1.17, we should be able to deploy these improvements soon. Some bugs with categories you see now may actually be related to this plan, so the good news is that those problems may be fixed by this coming update. We also plan to update ArticleFeedback now that we’re on the newer codebase, and we’ll probably also update some other extensions, too.
If you are interested in the deployment, there’s much more below…
Another 1.17 maintenance window
Posted by Rob Lanphier (robla @ WMF) in Deployments, MediaWiki, Technology on 2011/02/16
Continuing with the work started last week, we plan to deploy 1.17 to more wikis in a couple of hours (Wednesday, February 16 at 6:00 UTC for 6 hours). We had hoped we would be able to figure out the performance issues in the past week, but unfortunately, the only practical way we have to see the load problems we witnessed last week is to put the software into production. We have put a lot of instrumentation in place to help us diagnose our load issues. We plan to start the upcoming deployment by rolling out to nl.wikipedia.org and doing some debugging (rolling back if necessary). If we’re able to diagnose and fix the problems quickly, we then plan to roll out 1.17 more widely. If we’re still stumped, we may still roll out to a few more low-traffic wikis, but leave the high-traffic sites until we figure this out.
We plan to have more updates and detailed information on the deployment page on mediawiki.org. Thanks for your patience!
Update (2011-02-16 6:45 UTC): We’ve started the deploy, and it’s going better than we hoped. We’ve deployed to several wikis now, including nl.wikipedia.org, de.wikipedia.org, fr.wikipedia.org, and ja.wikipedia.org. More detailed updates will happen on the deployment page on mediawiki.org.
Update (2011-02-16 12:39 UTC) – We have now pushed 1.17 to all wikis, and the deployment window is closed. Please see deployment page on mediawiki.org for the best way to report any problems you might encounter.
New two-part schedule for 1.17 deployment
Posted by Rob Lanphier (robla @ WMF) in Deployments, MediaWiki, Technology on 2011/02/10
As covered on this blog this week, we had a few problems with our initial deployment of 1.17 to the Wikimedia cluster of servers. We’ve investigated the problems, and believe we have fixed many of the issues. Some of the unsolved issues are complicated enough that the only timely and reasonable way to investigate them is to deploy and react, so we’ve come up with a plan that lets us do it in a safe way by deploying on just a few wikis at a time (as opposed to all at once, as we tried earlier).
We’re scheduling two deployment windows:
- First window – This wave will be deployed between Friday, February 11, 6:00 UTC – 12:00 UTC (10pm PST Thursday, February 10 in San Francisco). This first wave will be to a limited set of wikis (see below).
- Second window – Wednesday, February 16 (between 6:00 UTC – 12:00 UTC) – full deployment (tentative)
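The wave-by-wave plan, together with the practice of backing off individual wikis that show problems, can be sketched as a loop: deploy a wave, check each wiki, roll back the ones that look unhealthy, and stop widening the rollout until those failures are understood. This is a minimal sketch; the `deploy`, `is_healthy`, and `rollback` callbacks are hypothetical stand-ins for real deployment and monitoring tooling, and the wave contents are up to the operator.

```python
from typing import Callable, Iterable, List, Tuple


def staged_rollout(
    waves: Iterable[List[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Roll out wave by wave; return (wikis left on the new version,
    wikis that were rolled back)."""
    live: List[str] = []
    backed_off: List[str] = []
    for wave in waves:
        for wiki in wave:
            deploy(wiki)
        failed = [wiki for wiki in wave if not is_healthy(wiki)]
        # Back off only the wikis that misbehaved; keep the rest live.
        for wiki in failed:
            rollback(wiki)
        live.extend(wiki for wiki in wave if wiki not in failed)
        backed_off.extend(failed)
        if failed:
            break  # stop widening until the failures are understood
    return live, backed_off
```

Stopping after the first wave with a failure mirrors the idea of leaving high-traffic sites on the old version until a problem seen on a smaller wiki has been diagnosed.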
Repeating what is new about 1.17: There are many, many little fixes and improvements (see the draft release notes for an exhaustive list), as well as one larger improvement: Resource Loader. Read more in the previous 1.17 deployment announcement.
Update (2011-02-11, 8:00 UTC) – we’ve deployed to a few of the wikis now (see below for updates on which ones). We uncovered a couple of issues we were able to fix, and plan to keep going.
Update (2011-02-11, 9:07 UTC) – we added he.wikisource.org to the list due to community member request, and so we’d have a right-to-left language wiki in the mix. Thank you he.wikisource community! We’ve now deployed 1.17 to meta.wikimedia.org and he.wikisource.org.
Update (2011-02-11, 10:26 UTC) – we deployed to our last six wikis, and then backed off of nl.wikipedia.org and eo.wikipedia.org once we saw some issues with ParserFunctions. We’re investigating those, and will probably try again before this window is complete.
Final update (2011-02-11, 12:28 UTC) – we found and fixed some localization problems that triggered ParserFunction bugs on both nl.wikipedia.org and eo.wikipedia.org. However, the traffic from nl.wikipedia.org was enough to cause a very noticeable spike in the CPU usage on the web servers, as well as timeout errors in our logs. We have profiling turned on for the list of wikis we’ve deployed to, and will use the time between now and our next deployment window to find and fix problems.
Post Mortem on last night’s 1.17 deployment attempts…
Posted by danese in Deployments, MediaWiki, Outage, Technology on 2011/02/09
We’ve received many complaints about strange behavior on various wikis we host starting last night. These problems were directly related to an attempted deployment.
A bit of background about the 1.17 release:
- In Oct 2010 we committed to more frequent releases in response to community requests.
- Simultaneously, we committed to cutting through the backlog of code review requests from the community. As of this writing, the Code Review Team we formed has reduced the backlog of over 1400 un-reviewed core revisions down to zero in the 1.17 branch, as well as dispatching roughly 4000 other revisions in extensions (figuring out which ones we needed to review, and reviewing the important revisions there, too).
- 1.17 was an omnibus collection of fixes, including a large number of patches which had been waiting for review for a long time. The Foundation’s big contribution to the release was the ResourceLoader, a piece of MediaWiki infrastructure that allows for on-demand loading of JavaScript. Many other incremental improvements were made in how MediaWiki parses and caches pages and page fragments.
As is our usual practice, we review all code before trying to deploy it. This practice has generally been good enough in the past that we have been able to quickly address anything we don’t catch in review within the first few minutes of deployment. The 1.17 release process has been longer than we would have liked, which has meant more code to review, and more likelihood for accumulating a critical mass of problems that would cause us to abort a deployment.
Our preparation for deployment uncovered a few issues, including a schema change, an update to the latest version of the diff utility and various other small issues which were discovered during the initial deployment to test.wikipedia.org. Pushing to test.wikipedia.org turns out to have been hugely useful, and in future we will take it as a lesson learned that any large deployment must successfully deploy to test.wikipedia.org at least 24 hours prior to general deployment.
When we finally deployed last night, our Apaches started complaining pretty much immediately. We rolled back to the previous version, worked on debugging and thought we had a suitable fix. We attempted deployment again but found the same issue very quickly. What we discovered was that our cache miss rate went from roughly 22% with the old version of the software (1.16) to about 45% with 1.17. The higher miss rate increased the load on our Apaches to the point where they couldn’t keep up, at which point they start behaving unpredictably. This can cause cascading failures (for example, caching bad data served by overloaded Apaches), and can result in strange layout problems and other issues that many people witnessed today.
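Simple arithmetic shows why the miss-rate change was enough to overload the Apaches: if every cache miss must be rendered by an Apache, backend work is proportional to the miss rate, so going from about 22% to about 45% roughly doubles it even with unchanged traffic. The total request count below is a made-up round number; only the two miss rates come from this post.

```python
def backend_requests(total_requests: float, miss_rate: float) -> float:
    """Requests that miss the cache and must be served by the Apaches,
    assuming every miss is forwarded to the backend."""
    return total_requests * miss_rate


# Hypothetical traffic volume; only the miss rates are from the post.
total = 100_000
old = backend_requests(total, 0.22)  # MediaWiki 1.16
new = backend_requests(total, 0.45)  # MediaWiki 1.17
print(f"{old:.0f} -> {new:.0f} backend requests ({new / old:.2f}x)")
# prints: 22000 -> 45000 backend requests (2.05x)
```

Put the other way around, the cache hit rate fell from about 78% to about 55%, so the backend had to absorb roughly twice as much work with no increase in traffic at all.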
By the way, whenever we do a large deployment, a number of WMF staff and community developers meet online to work through any issues that might arise. We schedule deployments late at night in the US to take advantage of lulls in request traffic, so everybody is working late. By the second failure, these people had been awake for many hours and we started to be concerned about their ability to work efficiently on little sleep, so I vetoed further attempts at deployment today.
We are currently combing the logs for further clues about how to mitigate the risk of a similar outcome when we next attempt to deploy 1.17, which most likely won’t happen until later this week (at the earliest). We are also closely investigating the check-ins related to parsing and caching, and evaluating our profiling data. We plan to regroup tomorrow, decide how confident we are in the fixes we’ve been able to implement by then, and decide when to target the next deployment.
1.17 deployment postponed
Posted by Rob Lanphier (robla @ WMF) in Deployments, MediaWiki, Technology on 2011/02/08
As noted on this blog, we had planned to deploy 1.17 earlier today (February 8, 07:00 UTC). The deployment preparation took us much longer than we anticipated, and once we did attempt to deploy (at around 13:00 UTC), we encountered some unanticipated performance issues. We reverted to the previous version of the software (1.16) and now we’re investigating the performance problems. We do not yet have a schedule for when we’ll attempt another deployment, but when we do, we’ll post more information here.
Update, 16:27 UTC: We have ironed out the performance issue and have resolved a few other immediate problems that arose as well. We are attempting a second deployment now.
Update, 17:55 UTC: Other issues have come up, and we have canceled our second attempt. We’ll investigate further and share our findings. We don’t plan to try to upgrade again today.
February 2011 WMF Engineering Update
Posted by Guillaume Paumier in Technology, WMF engineering reports on 2011/02/04
January 2011 was a tough month for Wikimedia engineers. About 75% of us caught the “WikiPlague” (a.k.a. RSV) and were out of commission between 3 and 10 days. Also, with the end of the Fundraiser coming early, this past month has been a time of re-starting and re-setting priorities as we shift major focus away from supporting money making and on to money spending…
Major accomplishments this month include:
- the completion of equipment specs and negotiations to order all equipment for the new primary data center in Ashburn, Virginia.
- major work on getting MediaWiki 1.17 released, especially by reducing the Code Review queue to releasable levels.
- work on increasing Nagios and Watchmouse monitoring.
Planned deployment of 1.17 branch on February 8
Posted by Rob Lanphier (robla @ WMF) in Deployments, MediaWiki, Operations, Technology on 2011/02/01
The engineering team is busy working on the deployment of the 1.17 branch of MediaWiki. We plan to roll this out next week to all languages and projects, Tuesday, February 8, with work starting at 07:00 UTC (which is 11pm on Monday, February 7 for San Francisco).
If all goes well, you should only notice the improvement. If it doesn’t go well, that’s because there’s something we missed, and that’s where we’d love your help. Please help us test this release! We have a test instance of the software we plan to deploy available at prototype.wikimedia.org. If you find issues, please report them in Bugzilla.
There are many, many little fixes and improvements that have gone into 1.17 (see the draft release notes for an exhaustive list). There isn’t much that’s visible to users of the site, but there is one under-the-hood improvement that should make the site faster: Resource Loader. Resource Loader optimizes the use of JavaScript in MediaWiki, speeding up delivery of JavaScript by compressing it in some cases and by cutting down on the amount of unused JavaScript delivered to the browser in the first place. Much of the work in this development cycle has centered on ensuring compatibility with the new system. Since it makes such a large shift in the way that JavaScript is delivered to the browser, it’s also an operational aspect we’ll be keeping a close eye on, as load shifts between servers in our infrastructure.
Note that this isn’t a release for download, yet. On and after February 8, the “latest” version of MediaWiki will still be 1.16 as listed on mediawiki.org. We plan to update this to 1.17 sometime after the deployment of the 1.17 branch, after we’ve had time to run it in production for a while and fix the issues we’re likely to find.
So please, help us test this release, and if you find bugs, please report them in Bugzilla. Thanks!
Update on Offline Wikimedia projects
Posted by tomasz in Offline, Technology on 2011/01/26
Greetings,
With the annual fundraiser wrapping up, two sections of Wikimedia engineering are going to start moving more quickly: Mobile and Offline. The offline ecosystem has a lot of moving parts and it’s easy to get lost. The Wikimedia Foundation is currently focusing on three main areas of intervention: selection tools, file formats and offline apps.
Right now, “Offline” refers to supporting read access to Wikimedia content without an internet connection; increasing reach was identified during the Wikimedia strategic planning process as one of the movement priorities, and the first recommendation of the Offline task force was to “Simplify reuse of content from WMF projects”.
The first step in making Wikimedia content available offline is to select it. The Wikipedia Version 1.0 Editorial Team has been steadily releasing new versions of their beta Wikipedia collections, but technical limitations have hampered how quickly those can be finished. We’re going to evaluate the team’s tool set to see how to support them.
For example, we’re looking at extending the Wikipedia Release Version Tools to add features like sub-selection and comments (see an example of how the tool works for the Physics project).
Once the content has been selected, it needs to be packaged into a standard file format. The openZim format is an actively developed format for offline Wikipedia content, and we want to facilitate its integration into our general architecture.
Our first step is going to be the enhancement of the Collections extension to support openZim. This will be done by our partners from PediaPress, who have already started to work on it. They will need help from other community members to help test the new openZim files created by the extension.
After selection and packaging, the last remaining piece is the application that allows readers to access the content. Over the years, there have been many Wikipedia offline apps: BzReader, MzReader, WikiTaxi, WikiFilter, Kiwix, Okawix, etc. Some have come and gone, while others continue to thrive and are actively releasing new updates.
One thing we’ve learned looking at this ecosystem is that there is a strong need for a featured, easy-to-use and well supported offline app.
During the strategic planning process, one app emerged as a good candidate for the WMF to actively support: Kiwix. Kiwix has been around since 2007 and, through the great work of its lead developer Kelson, has steadily improved its feature set, platform support and overall stability.
In order to support this work and to help make the application even easier to use, we’ll be conducting a usability study on Kiwix, focused on search and browse, during the first quarter of 2011. Later this year, we’ll be focusing on an easier update cycle using openZim as the underlying storage format.
We hope 2011 will be full of exciting news about offline Wikimedia content. If you’d like to get involved, please participate in the strategic product discussion about Offline, or contact me if you’d like to help with development.
Tomasz Finc
Engineering Program Manager – Offline, Mobile, & Fundraising
January 2011 WMF Engineering Update
Posted by Rob Lanphier (robla @ WMF) in Technology, WMF engineering reports on 2011/01/06
Welcome to the January monthly report from WMF Engineering! As always, we’re reporting on what we’ve been working on and what’s coming up. In December, the fundraiser was in full swing, with a portion of the Engineering team (Arthur Richards, Ryan Kaldari, Nimish Gautam, and Tomasz Finc) supporting the fundraising infrastructure. Danese Cooper, Erik Möller, and Alolita Sharma were in India for most of the month, while much of the rest of the team was focused on the ramp-up to MediaWiki 1.17. More below the fold…
Open Web Analytics 1.4
Posted by tomasz in Data analytics, Technology on 2010/12/29
Open Web Analytics 1.4.0rc3 is out! You probably don’t care, do you? You should! At least we do!
Anyway, let’s start at the beginning:
As we strategized about future development of Wikimedia properties, it became abundantly clear that the measurement tools that we have are insufficient to make the decisions we need to make. This was a key recommendation from the Strategy task force. We evaluated several possible analytics frameworks as a supplement or even replacement for our homegrown system(s). After evaluating a couple of open source solutions (while keeping an open mind about the possible need to go with a proprietary solution), we decided to try out Open Web Analytics (OWA) for this year’s fundraiser, with the goal of evaluating it for broader use.
OWA is a PHP-based analytics tool with sophisticated real-time data analysis capabilities, offering many of the tools found in its proprietary counterparts. For us, OWA seems to strike the right balance of flexibility and scalability, with the added benefit of an existing MediaWiki integration plugin. Over the past few months, we’ve been working with Peter Adams, the designer of OWA, to adapt it to our needs and to make sure that it works at the scale we operate at.
Many of the features in the 1.4 release were initially built for our use, but they are general-purpose features that many OWA users should be able to benefit from. We wanted to track how successful we were at moving people from banner to letter to donation, so Peter added a couple of features called “conversion goal tracking” and “goal funnels”. These help us figure out where people drop off, and they can also be used for general conversion analysis on any OWA-enabled site. We also needed to track all of this on a per-banner basis, and to know whether the user clicked on the banner or on the “Donate” link in the sidebar, so the “campaign tracking” feature was added.
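To make the idea of a goal funnel with campaign tracking concrete, here is a minimal Python sketch of the kind of calculation involved. It is not OWA's implementation: the stage names (`click`, `letter_view`, `donation`), the campaign labels, the event tuple layout, and the `funnel` function are all hypothetical. It assumes a visitor counts toward a funnel step only if they also reached every earlier step within the same campaign, and reports what fraction of the previous step's visitors made it to each step.

```python
from collections import defaultdict

# Hypothetical funnel steps, in order (illustrative names, not OWA's goals).
STAGES = ("click", "letter_view", "donation")


def funnel(events):
    """Return {campaign: [(stage, visitors, step_rate), ...]}.

    `events` is an iterable of (visitor_id, campaign, stage) tuples.
    A visitor counts toward a stage only if they also reached every
    earlier stage in the same campaign (strict funnel semantics).
    """
    # campaign -> visitor -> set of stages that visitor reached
    seen = defaultdict(lambda: defaultdict(set))
    for visitor, campaign, stage in events:
        seen[campaign][visitor].add(stage)

    report = {}
    for campaign, visitors in seen.items():
        rows, prev = [], None
        for i, stage in enumerate(STAGES):
            required = set(STAGES[: i + 1])
            n = sum(1 for stages in visitors.values() if required <= stages)
            # Fraction of the previous step's visitors who reached this step.
            rate = 1.0 if prev is None else (n / prev if prev else 0.0)
            rows.append((stage, n, rate))
            prev = n
        report[campaign] = rows
    return report


# Hypothetical sample data: one banner campaign and the sidebar "Donate" link.
events = [
    ("v1", "banner_A", "click"), ("v1", "banner_A", "letter_view"),
    ("v1", "banner_A", "donation"),
    ("v2", "banner_A", "click"), ("v2", "banner_A", "letter_view"),
    ("v3", "banner_A", "click"),
    ("v4", "sidebar", "click"), ("v4", "sidebar", "letter_view"),
]

if __name__ == "__main__":
    for campaign, rows in funnel(events).items():
        for stage, n, rate in rows:
            print(f"{campaign:10} {stage:12} {n:3} {rate:.0%}")
```

Keying the counts by campaign is what lets the same funnel be compared across banners and the sidebar link; the biggest drop in `step_rate` points to where people are leaving.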
Finally, we needed to deploy many instances of OWA, so clustered deployment was added in this release. Peter worked with Nimish Gautam here at WMF to make OWA more scalable, with Nimish becoming a committer on OWA. Peter focused on the architecture, while Nimish focused on making sure that all of the work integrated seamlessly into Wikimedia’s environment.
We’ve just deployed OWA to observe traffic patterns for the fundraiser, and we’ll be reporting on how well it works for us. We’re not using all of the features; for example, we’ve disabled mouse movement recording/playback. We’re being very careful to respect everyone’s privacy and stay true to the WMF donor privacy policy and the Wikimedia privacy policy.
We believe the work we’ve done is generally applicable to anyone who wants MediaWiki analytics, and we’re eager to see how it works for others. We would also love help with testing it.
December 2010 WMF Engineering Update
Posted by Rob Lanphier (robla @ WMF) in Technology, WMF engineering reports on 2010/12/06
Welcome to the December monthly report from WMF Engineering! As always, we’re reporting on what we’ve been working on and what’s coming up. In November, our most visible work was launching the Fundraiser and the Upload Wizard on Commons. Behind the scenes, we worked on the next iteration of Article Feedback, continued to improve our infrastructure (e.g. monitoring, media storage, backups, analytics infrastructure, credit card handling), and continued to chip away at our code review backlog. We continue to hire at a rapid pace, looking to fill many different roles. More below…