Google Summer of Code 2009 ideas

For 2010 ideas, go to the Google Summer of Code 2010 ideas page.

Want to spend your summer contributing full-time to Gentoo, and get paid for it? Gentoo is applying for its 4th year in the Google Summer of Code. In the past, almost half of our successful students have become Gentoo developers, so your chances of becoming one are very good if you're accepted into this program.

= For Students =

'''Most ideas listed here have a contact person associated with them. Please get in touch with them earlier rather than later to develop your idea into a complete application.''' You can find many of them on Freenode's IRC network under the same username. If there is no contact information, please join the gentoo-soc mailing list or #gentoo-soc on the Freenode IRC network, and we will work with you to find a mentor and discuss your idea.

You don't have to apply for one of these ideas! You can come up with your own, and as long as it fits into Gentoo, we'll be happy to work with you to develop it. Remember that, in most cases, your project needs to produce deliverables in less than 3 months of work. Be ambitious but not too ambitious ;)

We will know whether Gentoo has been accepted by March 18, and you can apply starting March 23; the earlier the better! We have a custom application template that we will ask you to fill out. Here it is:

Congratulations on applying for a project with Gentoo! To improve your chances of succeeding with this project, we want to make sure you're sufficiently prepared to invest a full summer's worth of time on it. In addition to the usual application, there are 2 specific actions we would like to see from you:

 * Use the tools that you will use in your project to make changes to code (e.g., source code management [SCM] software such as CVS, Subversion, or git). Please use the same SCM as you will use for your project to check out one of our repositories, make a change to it, and post that change as a patch on a mailing list or bug. You don't necessarily have to fix any real bugs; this is to show that you can use the tools. Your contact in Gentoo can help you determine which SCM and repository you should use for this. If your idea doesn't have a contact, please get in touch with us on the gentoo-soc mailing list or in real-time on IRC at Freenode/#gentoo-soc. Once you've made your change, link to it from your application.
 * Participate in our development community. Please make a post to one of our mailing lists and link to it from your application (archives.gentoo.org holds past postings). The gentoo-soc list would be a good starting point, if you aren't subscribed to any others already.

Both of these actions are things you will do extremely commonly as an open-source developer, and they really aren't that hard, so don't let them hold you back! The remainder of the application is free-form. Please read our application guidelines and Google's FAQ to complete it. Good luck!

= Ideas =

NOTE: Discussions about these ideas should take place on the gentoo-soc Mailing List (CC the contacts if required). Please go through the existing discussions about your idea in the mailing list archives before posting.

Add "tags" support to Portage
Gentoo currently uses categories. A package can only be in a single category, which is very limiting, because packages rarely fit perfectly into only one place. Tags could make it much easier for users to find the packages they're looking for, using Boolean searches like: kde AND mail. This project would add support for tags to Portage and would keep backwards compatibility by treating categories as tags.
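To make the Boolean search idea concrete, here is a minimal Python sketch of a tag index with AND queries. The tag data and the function names `build_tag_index` and `search_all` are hypothetical; adding each package's category as an extra tag is just one possible way to keep categories working as tags, not Portage's actual design.

```python
from typing import Dict, Iterable, List, Optional, Set


def build_tag_index(packages: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert {package atom: tags} into {tag: atoms}.

    The atom's category is added as an extra tag, so existing
    category-based lookups keep working.
    """
    index: Dict[str, Set[str]] = {}
    for atom, tags in packages.items():
        category = atom.split("/", 1)[0]
        for tag in set(tags) | {category}:
            index.setdefault(tag, set()).add(atom)
    return index


def search_all(index: Dict[str, Set[str]], tags: Iterable[str]) -> List[str]:
    """Boolean AND search: return the atoms that carry every requested tag."""
    result: Optional[Set[str]] = None
    for tag in tags:
        matches = index.get(tag, set())
        result = set(matches) if result is None else result & matches
    return sorted(result or set())


# Hypothetical example data.
PACKAGES = {
    "kde-base/kmail": {"kde", "mail"},
    "mail-client/mutt": {"mail", "console"},
    "kde-base/konqueror": {"kde", "web"},
}
index = build_tag_index(PACKAGES)
print(search_all(index, ["kde", "mail"]))  # ['kde-base/kmail']
```

Because categories become ordinary tags, a query such as `search_all(index, ["mail-client"])` still behaves like a category listing.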

Note: This project in its current form is considered very trivial and not suitable for a complete SoC project. If you wish to work on this, you should expand the idea significantly.

Skills:
 * Python

Contacts:
 * [mailto:zmedico@gentoo.org Zac Medico]

Portage/ebuild ability to use file-based capabilities rather than setuid
With recent Linux kernels, file-based capabilities are available. It is thus possible to give the ping command just the minimum capability it needs to open raw sockets (CAP_NET_RAW), rather than making it entirely setuid. A long-term goal for Gentoo would be to let the user choose capabilities over setuid for (at least some) programs. To support this feature, however, Portage needs a way to copy capabilities from one file to another. A Python extension may be needed to handle this.
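On Linux, file capabilities are stored in the `security.capability` extended attribute, so copying them amounts to copying that attribute. The sketch below assumes Python 3.3 or later on Linux, where `os.getxattr` and `os.setxattr` are part of the standard library; the helper name `copy_capabilities` is hypothetical and this is not Portage code.

```python
import errno
import os

# Extended attribute in which Linux stores file capabilities.
CAP_XATTR = "security.capability"


def copy_capabilities(src: str, dst: str) -> bool:
    """Copy file capabilities from src to dst (Linux only).

    Returns True if capabilities were copied, and False if src has none
    or the filesystem does not support the attribute. Writing the
    attribute requires CAP_SETFCAP, which normally means root.
    """
    try:
        value = os.getxattr(src, CAP_XATTR)
    except OSError as exc:
        # ENODATA: no capabilities set; ENOTSUP: no xattr support here.
        if exc.errno in (errno.ENODATA, errno.ENOTSUP):
            return False
        raise
    os.setxattr(dst, CAP_XATTR, value)
    return True
```

Since writing the attribute needs privileges, a real implementation would presumably run during the privileged merge step rather than in the unprivileged build.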

Skills:
 * Python

Contacts:
 * [mailto:flameeyes@gentoo.org Diego Pettenò]

Fastboot on Gentoo
A few months ago, some kernel developers working at Intel got boot times on solid-state drives down to 5 seconds (10 on hard drive). It would be awesome to have this in Gentoo. This would require significant changes to our custom init system, OpenRC, as well as lots of profiling of the kernel boot and userspace boot processes using tools like bootchart. More info about fastboot is here. Note that the work of fastboot was primarily concerned with:
 * 1) sreadahead, for which there are already packages in the tree;
 * 2) optimizing the init system (which might be a bit tough for OpenRC);
 * 3) optimizing the services in the init system (easier);
 * 4) asynchronous kernel function calling (in 2.6.29); and
 * 5) speeding up loading of the desktop environment (early GDM start, profiling, etc.).

NOTE: There have been a large number of applicants for this idea, and a lot of discussions about this on the mailing list archives. Because of this, we have decided to raise the bar for this idea. Applicants expecting to get selected should show us their capability by plucking some of the low-hanging fruit (aka, easy fixes) in the boot process. Show us the code.

Skills:
 * Shell scripting, C

Contacts:
 * [mailto:darkside@gentoo.org Jeremy Olexa]
 * [mailto:nirbheek@gentoo.org Nirbheek Chauhan] (Got far too excited after reading the LWN article)

Port the new distro-neutral initrd framework, Dracut, to Gentoo
Instead of every distro having its own way to generate an initramfs, many people would like a common tool to ship with the kernel. Dracut is developed primarily by Red Hat, written from scratch with portability in mind. See this LWN article for more info.

Skills:
 * Shell scripting

Contacts:
 * [mailto:dberkholz@gentoo.org Donnie Berkholz]

Make the clustering LiveCD from last year's GSoC bootable from USB
The clustering LiveCD is a bootable image that allows you to instantly turn a full room of computers into a functional cluster. You only need one CD because the rest of the nodes boot disklessly from the master node with the CD. This project would be a huge improvement to the existing version because it will allow us to use persistent, writable USB media instead of a read-only CD. That means clusters won't need to be reconfigured from scratch on every boot. That will save a lot of admin time on each boot and make it possible to do things like run a cluster in 10 different computer labs with different settings on each, as easily as inserting 1 USB stick in each lab and rebooting all the computers.

Skills:
 * Shell scripting, maybe some Python

Contacts:
 * [mailto:kyron@neuralbs.com Eric Thibodeau]

BSD ports of Gentoo: OpenBSD, NetBSD, DragonFlyBSD, etc.
We've never had a DragonFlyBSD port, and the OpenBSD and NetBSD ports are dead. This project would involve investigating and implementing the necessary porting work to get a Gentoo userland running on a BSD kernel and base system (the base system is also managed by Gentoo).

Skills:
 * Shell scripting, C (patch creation)

Contacts:
 * [mailto:dav_it@gentoo.org Davide Italiano]

EBuild generator
For basic cases, writing an ebuild could be handled by a Python GUI tool that fills in things like dependencies and common code blocks. This is easiest for packages that don't need any special overrides. One earlier attempt to create such a tool was Abeni.
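As a rough illustration of the "common code blocks" such a generator would fill in, here is a Python sketch that renders a basic ebuild skeleton from a metadata dictionary. The metadata keys, the default `~x86` keyword, and the `render_ebuild` name are assumptions for illustration; quoting and escaping of values, and any GUI, are left out.

```python
# Minimal ebuild skeleton (EAPI 2 was the newest EAPI in 2009).
# Doubled braces produce a literal ${DEPEND} after str.format().
TEMPLATE = '''\
# Copyright 1999-2009 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2

EAPI="2"

DESCRIPTION="{description}"
HOMEPAGE="{homepage}"
SRC_URI="{src_uri}"

LICENSE="{license}"
SLOT="0"
KEYWORDS="{keywords}"
IUSE="{iuse}"

DEPEND="{depend}"
RDEPEND="${{DEPEND}}"
'''


def render_ebuild(meta: dict) -> str:
    """Fill the skeleton from a metadata dict (keys are hypothetical)."""
    return TEMPLATE.format(
        description=meta["description"],
        homepage=meta["homepage"],
        src_uri=meta["src_uri"],
        license=meta["license"],
        keywords=" ".join(meta.get("keywords", ["~x86"])),
        iuse=" ".join(meta.get("iuse", [])),
        # One dependency per line, tab-indented as in tree ebuilds.
        depend="\n\t".join(meta.get("depend", [])),
    )


print(render_ebuild({
    "description": "Example tool",
    "homepage": "http://example.org/",
    "src_uri": "http://example.org/${P}.tar.gz",
    "license": "GPL-2",
    "depend": ["dev-libs/libxml2", "sys-libs/zlib"],
}))
```

Values are inserted literally, so ebuild variables such as `${P}` in `SRC_URI` pass through untouched.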

Skills:
 * Python

Contacts:
 * ? (Your name here)

Portage/Pkgcore/Paludis backend adapter for PackageKit
PackageKit is a distribution-neutral system that provides a common interface for installing and updating packages. Currently there is no backend support for Gentoo-based package managers. This project would consist of writing the adapters needed to let PackageKit use one or more existing Gentoo package managers, so that users could use PackageKit on Gentoo.

Skills:
 * Python

Contacts:
 * [mailto:zmedico@gentoo.org Zac Medico] (Portage developer)
 * [mailto:dberkholz@gentoo.org Donnie Berkholz] (Has thought about architecture/design philosophy for this)
 * [mailto:nirbheek@gentoo.org Nirbheek Chauhan] (Has thought about it too)
 * [mailto:hughsient@gmail.com Richard Hughes] (PackageKit guru)

Cache sync
The Portage tree and all its overlays keep growing. Right now, the official Portage tree alone occupies more than 600 MB on a regular filesystem. However, the package manager does not need the whole tree of full ebuilds, patches and Manifests to perform most of its work. The idea would be to sync a smaller database, or a cache of only the information needed for global package manager operations, and then fetch individual packages only when they are needed. This would considerably speed up tree synchronization and reduce the space occupied by the Portage tree. The current Portage cache system is also really slow, and so is its search feature. The project could be inspired by the Debian or RPM systems, but with the usability and choices offered by Gentoo, and would probably include:
 * designing and implementing an automatic cache builder that produces a cache for a given repository/overlay
 * making Portage/Paludis/pkgcore delta-sync with a local cache and fetch only the files required for installation, when requested
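The delta-sync step could work by comparing a small index of cache entries and their digests, as in the Python sketch below. The index format (`{entry: SHA-1 digest}`) and the function names are hypothetical; a real design would also need signed indexes and an actual transfer mechanism.

```python
import hashlib
from typing import Dict, List, Tuple


def entry_digest(data: bytes) -> str:
    """Digest of one cache entry, as a repository's cache builder might publish it."""
    return hashlib.sha1(data).hexdigest()


def plan_delta(local: Dict[str, str],
               remote: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare {entry: digest} indexes and return (to_fetch, to_delete).

    Entries that are new or whose digest changed upstream are fetched;
    entries that disappeared upstream are deleted locally. Full ebuilds,
    patches and distfiles are only fetched later, at install time.
    """
    to_fetch = sorted(key for key, dig in remote.items() if local.get(key) != dig)
    to_delete = sorted(key for key in local if key not in remote)
    return to_fetch, to_delete


local = {"app-editors/vim-7.2": "aa", "dev-lang/python-2.5.4": "bb"}
remote = {"app-editors/vim-7.2": "aa", "dev-lang/python-2.6.2": "cc"}
print(plan_delta(local, remote))
```

Only the changed index entries travel over the network, which is where the speed-up over syncing the full tree would come from.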

Skills:
 * Python (portage/pkgcore) or C++ (paludis)

Contacts:
 * ? (Your name here)

Improved binary package support
Portage lacks a few features that would make binary package support much smoother and less prone to breakage, which would make Gentoo a better base for derived binary distros. One of them is more intelligent handling of library versions with binpkgs. Basically, it's possible to build a binpkg against an old version of a library, then install it on a system with a newer version and have it break by default because of a shared-library version bump. Ideally, a package would have a way to specify exactly which files it depends on in its built state, instead of just which versions it can build against from source. Another problem is saving binpkgs with different USE flags and other build settings on the same host; see bug #150031. The way forward is to identify binpkgs by one or more hashes of their metadata. A third problem is the lack of binpkg support for the kernel. This could be addressed by modifying the kernel eclass to support a binary USE flag that also performs configuration and build, or perhaps by some kind of genkernel modification, or both.
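A minimal Python sketch of the metadata-hash idea: derive a short, order-independent digest from the USE flags and other build settings, and put it in the binpkg file name so that variants of the same version can coexist. The naming scheme, the 12-character truncation, and the function names are assumptions, not an existing Portage format.

```python
import hashlib
from typing import Dict, Iterable


def build_settings_hash(use: Iterable[str], settings: Dict[str, str]) -> str:
    """Short, stable digest of the settings that affect a binpkg's contents."""
    h = hashlib.sha1()
    # Sort and de-duplicate so that flag order does not change the hash.
    h.update(b"USE=" + " ".join(sorted(set(use))).encode())
    # Sort keys so that e.g. CFLAGS/LDFLAGS order does not matter either.
    for key in sorted(settings):
        h.update(b"\0" + key.encode() + b"=" + settings[key].encode())
    return h.hexdigest()[:12]


def binpkg_filename(cpv: str, use: Iterable[str], settings: Dict[str, str]) -> str:
    """Hypothetical naming scheme: <PF>-<hash>.tbz2, so variants can coexist."""
    pf = cpv.split("/", 1)[1]
    return f"{pf}-{build_settings_hash(use, settings)}.tbz2"


print(binpkg_filename("net-misc/wget-1.11.4", ["ssl", "ipv6"], {"CFLAGS": "-O2"}))
```

Two hosts with the same flags and settings would then compute the same name, while a USE change yields a separate file instead of overwriting the existing package.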

Skills:
 * Python, shell scripting

Contacts:
 * [mailto:zmedico@gentoo.org Zac Medico]

Tools and support for multiple FORTRAN compilers
The FORTRAN language is unfortunately not dead yet. Many scientists still use it, and your pet program might need it. Gentoo has a few FORTRAN compilers, but the framework is getting old and could be much improved, and some powerful compilers are not supported. The task could include rewriting the framework to allow any compiler of the user's choice, FORTRAN profiles with eselect, writing documentation, testing applications, and benchmarking linear algebra. Basically, the goal is to make FORTRAN on Gentoo more configurable, robust and easy to use.

Skills:
 * Shell scripting, Fortran

Contacts:
 * [mailto:bicatali@gentoo.org Sébastien Fabbro]

Create and release a Gentoo stats server/client
A few Gentoo stats projects have been attempted in the past, including:

 * One for the 2006 GSoC by Marius Mauch (genone) (application, last status update, testing info)
 * One more recent, quick implementation by Alec Warner (antarus)
 * A very simple "client" by Steve Dibb
 * Code for another version, last touched in 2005
 * Yet another, much older version

But right now, we have nothing in a usable state. The one from GSoC 2006 may have come the closest, but it ran into some major issues with authentication and security. Having stats available would be a huge benefit to Gentoo developers (we would know how important packages are to users), to upstream package developers (who would know how well-tested various versions are), and to show people how popular Gentoo is.

One intriguing option is to port Smolt to Gentoo. Smolt is a package written by Red Hat that is intended to be distribution-neutral. Here's an example of the stats. As part of this project, it would be great to enhance Smolt so that it also reported installed packages in addition to hardware.
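The client half of a stats tool mostly needs to read Portage's installed-package database, which lives under `/var/db/pkg` as `CATEGORY/PACKAGE-VERSION` directories. The Python sketch below lists those entries, serializes them into a report, and counts packages across reports on the server side. The JSON report format, the `host` field, and the function names are hypothetical, and authentication and anonymization (the issues that stalled earlier attempts) are not addressed.

```python
import json
import os
from collections import Counter
from typing import Iterable, List


def installed_packages(vdb: str = "/var/db/pkg") -> List[str]:
    """List installed CATEGORY/PACKAGE-VERSION entries from Portage's vdb."""
    found = []
    for category in sorted(os.listdir(vdb)):
        cat_dir = os.path.join(vdb, category)
        # Skip stray files and hidden or "-"-prefixed (in-progress merge) entries.
        if category.startswith((".", "-")) or not os.path.isdir(cat_dir):
            continue
        for pf in sorted(os.listdir(cat_dir)):
            if not pf.startswith((".", "-")) and os.path.isdir(os.path.join(cat_dir, pf)):
                found.append(f"{category}/{pf}")
    return found


def make_report(host_id: str, vdb: str = "/var/db/pkg") -> str:
    """Client side: serialize a report (hypothetical format)."""
    return json.dumps({"host": host_id, "packages": installed_packages(vdb)},
                      sort_keys=True)


def aggregate(reports: Iterable[str]) -> Counter:
    """Server side: count how many reporting hosts have each package installed."""
    counts: Counter = Counter()
    for raw in reports:
        counts.update(set(json.loads(raw)["packages"]))
    return counts
```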

Skills:
 * Likely Python and/or shell scripting. Possibly some PHP for a webapp.

Contacts:
 * [mailto:beandog@gentoo.org Steve Dibb]
 * [mailto:antarus@gentoo.org Alec Warner] [mailto:antarus@scriptkitty.com Non-gentoo address]

Write G-PEAR (inspired by G-CPAN)
This project involves writing a tool (tentatively named G-PEAR) that generates and installs PEAR packages on the fly. The tool could closely emulate the structure of G-CPAN, though that is open to discussion.

Skills:
 * BASH/ebuild/eclasses
 * PHP

Contacts:
 * [mailto:anant@kix.in Anant Narayanan]

Create a Web-based Gentoo image builder
It would be awesome if people could go to a website and build a custom Gentoo image. This project would entail writing a webapp (along the lines of rPath rBuilder, SUSE Studio, or Angstrom Narcissus). On the backend, it might use a tool like Gentoo's release-builder, Catalyst. This may entail modifications to Catalyst in addition to writing the webapp from scratch. To be successful on this project, you will have to be a very driven, independent student with web development experience, as our potential mentors in this area are extremely busy.

Skills:
 * PHP and/or Python

Contacts:
 * [mailto:solar@gentoo.org Ned Ludd]

Tree-wide collision checking and provided files database
File collisions occur in Portage when a package tries to overwrite a file already installed by another package. Collisions are QA issues (packages involved in a collision need to either block each other or resolve the conflict) and are reported as QA warnings/errors by emerge, but they only show up if the conflicting package is already installed on the system. Because of the size of the Portage tree, nobody tests for collisions across all packages.

This project would develop a tool to provide such a test. This would involve setting up a tinderbox (which would eventually become part of the Gentoo infrastructure) to install as many packages from the Portage tree and registered overlays as possible, and prompt the maintainer to install packages which failed to emerge automatically. A database of package contents would be compiled and exposed (read-only) on a server. To perform the check on a new package, a client-side utility would either download the database and check locally, or submit the package's contents for checking to the server. Also, the tinderbox would log all collisions it encountered when installing the packages already in the tree, and add any new packages from the trees nightly.

In addition, the server would provide a Web interface for a user to query a filename and see which packages provide that file.

USE flag changes and live ebuilds can affect package contents, so this would not be a comprehensive test, but it would catch most collisions.
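The core of the collision check is an inverted index from installed paths to the packages that provide them; the same index answers the Web query "which packages provide this file?". Below is a minimal in-memory Python sketch; the `ContentsDB` class is hypothetical, and it does not exempt packages that legitimately block each other.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ContentsDB:
    """Hypothetical read-only map of installed path -> providing packages."""

    def __init__(self) -> None:
        self._owners: Dict[str, Set[str]] = defaultdict(set)

    def add_package(self, cpv: str, paths: Iterable[str]) -> None:
        """Record the contents of one package, e.g. as installed by the tinderbox."""
        for path in paths:
            self._owners[path].add(cpv)

    def providers(self, path: str) -> List[str]:
        """Answer 'which packages provide this file?'."""
        return sorted(self._owners.get(path, ()))

    def collisions(self, cpv: str, paths: Iterable[str]) -> Dict[str, List[str]]:
        """Map each path a new package would share with other packages to their owners."""
        result: Dict[str, List[str]] = {}
        for path in paths:
            others = self._owners.get(path, set()) - {cpv}
            if others:
                result[path] = sorted(others)
        return result


db = ContentsDB()
db.add_package("app-misc/foo-1.0", ["/usr/bin/foo", "/usr/share/doc/foo-1.0/README"])
db.add_package("app-misc/bar-2.0", ["/usr/bin/bar"])
print(db.collisions("app-misc/baz-0.1", ["/usr/bin/foo", "/usr/bin/baz"]))
```

A client could run the same `collisions` check locally against a downloaded copy of the database, or the server could run it on submitted package contents.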

Skills:
 * Likely Python and shell scripting

Contacts:
 * [mailto:weaver@gentoo.org Andrey Kislyuk]

SCM snapshot management infrastructure
A growing number of projects provide binary-only releases of their packages, with the source available only as a tag in a version control system. This presents obvious problems for source-based distributions, which must choose one of two imperfect solutions:

 * live ebuilds, which pull the tag directly. This makes them vulnerable to a single point of failure (the upstream SCM server) and to upstream altering the tagged contents, and may place excessive load on the upstream servers; or
 * snapshots, which must be pulled and manually packaged by a developer and hosted on Gentoo infrastructure. Snapshot archives are often not reproducible bit-for-bit and do not have upstream's direct "blessing".

This project would implement a Gentoo snapshot management infrastructure (an extension of Gentoo's existing mirror infrastructure) to provide a better alternative. The process would be:

 * The ebuild writer specifies an SCM URL and tag.
 * If the writer has no access to the snapshot infrastructure (i.e., the ebuild is not in the main tree or a listed overlay), the ebuild behaves as a live ebuild.
 * Otherwise, the snapshot manager daemon packages the snapshot and posts it on Gentoo mirrors. The ebuild fetches the snapshot and uses it. (This will require a Manifest update upon or after commit.)
 * The snapshot manager daemon periodically checks the sources of all snapshots for changes that would alter the contents of the snapshot, and alerts the ebuilds' authors to any such changes.

The coding part would involve writing the aforementioned daemon. The mentor would have to be someone from infra, or at least the student would be interacting with infra a lot.
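Two small pieces of the daemon's logic can be sketched in Python: choosing between a mirrored snapshot and live fetching, and detecting tags whose upstream revision has changed. All names below are hypothetical, and `resolve` is injected rather than implemented; in practice it might wrap a command such as `git ls-remote <url> refs/tags/<tag>`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (SCM URL, tag)


@dataclass
class SnapshotRequest:
    scm_url: str
    tag: str
    in_known_repo: bool  # ebuild is in the main tree or a listed overlay


def fetch_mode(req: SnapshotRequest) -> str:
    """Known repositories get a mirrored snapshot; everything else stays live."""
    return "snapshot" if req.in_known_repo else "live"


def changed_tags(recorded: Dict[Key, str],
                 resolve: Callable[[str, str], str]) -> List[Key]:
    """Return the (url, tag) pairs whose tag now resolves to a different revision.

    `recorded` maps each snapshot to the revision it was built from;
    `resolve` looks up the tag's current revision upstream. The daemon
    would alert the ebuilds' authors about every pair returned.
    """
    return sorted(key for key, rev in recorded.items() if resolve(*key) != rev)
```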

Skills:
 * Likely Python or shell scripting

Contacts:
 * ? (Your name here)

Adapt Kuroo for current portage versions
Kuroo made using Gentoo feel easy - just as it should. Finding and installing packages was very easy, as were most common maintenance tasks, like unmasking a testing ebuild or adjusting USE flags.

Sadly, for lack of time, the Kuroo backend couldn't keep up with the pace of Portage development, so it no longer works with current Portage versions.

Getting it working again and making it future-proof would give Gentoo back a great package manager GUI, opening Gentoo up to many new users, especially people who had someone else install Gentoo for them and now need a good GUI to maintain it.

This project would involve:

 * 1) Porting Kuroo to Portage >=2.1, specifically removing dependencies on /var/cache/edb/dep files, which 'emerge --metadata' no longer generates properly
 * 2) Porting to Qt 4 and KDE 4
 * 3) Finding and implementing a way to make it easy to maintain during future portage changes.

Note: Kuroo's original upstream is dead, and the project has been migrated to SourceForge.

Skills:
 * C++
 * Qt4/KDE4 porting
 * Python

Contacts:
 * [mailto:galiven@users.sourceforge.net Andrew Schenck]

Octave/Matlab support
Octave support in Gentoo could be greatly improved. Octave now has a core+plugin architecture. To make it easier to include Octave-Forge packages, one step would be to create an ebuild installer similar to G-CPAN for Perl, though it could also use Paludis infrastructure. There is also open work to integrate Matlab programs that are compatible with Octave, so that one could use Matlab or Octave seamlessly. The project could build on the work markusle did last year in the science overlay.

Skills:
 * Likely shell scripting or possibly Python, C++ if paludis

Contacts:
 * [mailto:bicatali@gentoo.org Sébastien Fabbro]

R packages ebuild generator or installer
Gentoo R users have to rely on R's own installer to install CRAN packages. This project would make it possible to install any of the few thousand R packages with the Gentoo package manager. Paludis used to have CRAN support that the project could build on, but a G-CPAN-like tool that generates ebuilds would also be good.

Skills:
 * Likely shell scripting or possibly Python, C++ if paludis

Contacts:
 * [mailto:bicatali@gentoo.org Sébastien Fabbro]

Universal select tool
eselect is a versatile tool for switching between versions or implementations of software in Gentoo. However, eselect is no longer maintained. The project would involve either giving eselect some love or starting a new tool. Apart from consistency and all the nice features eselect already offers, one very useful feature would be to make it friendly not only to sysadmins but also to regular users. A user could easily switch between GCC versions, Intel compilers, Python versions, etc. Some features found in the modules tool are not in eselect, and vice versa. A universal, robust and consistent tool for switching versions or implementations would be particularly useful on shared servers and clusters.

Exherbo has eclectic, so we should probably just switch to that. In that case, there is not enough work here for GSoC.

Looking at the amount of code involved in modules, there are still many features eclectic could add to match it. An eselect/eclectic module for compilers is also desirable; judging by the gcc-config sources, translating it to eselect would take some coding.
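Several eselect modules work by managing symlinks (the kernel module, for example, manages the /usr/src/linux symlink), so the heart of a switching tool is replacing a link safely. The Python sketch below shows an atomic switch: create a temporary symlink, then rename it over the old one. The function names and the `gcc-<version>` layout are hypothetical. A per-user mode could reuse the same code with links placed in a directory on the user's own `PATH`.

```python
import os
from typing import List, Optional


def list_implementations(directory: str, prefix: str) -> List[str]:
    """Candidate targets, e.g. gcc-4.3.3 and gcc-4.4.0 for prefix 'gcc-'."""
    return sorted(n for n in os.listdir(directory) if n.startswith(prefix))


def current_implementation(link: str) -> Optional[str]:
    """Target of the managed symlink, or None if it is not set."""
    return os.readlink(link) if os.path.islink(link) else None


def switch_implementation(link: str, target: str) -> None:
    """Point `link` at `target`, replacing any existing symlink atomically."""
    tmp = link + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)  # leftover from an interrupted switch
    os.symlink(target, tmp)
    # rename(2) replaces the old link in one step, so there is never a
    # moment when `link` is missing.
    os.replace(tmp, link)
```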

Skills:
 * Shell scripting

Contacts:
 * ? (Your name here)

Gemtoo: A Gentoo-specific 'gem install' replacement
Gentoo currently uses the gem installer provided by RubyGems to install Ruby gems. The project involves writing a new, Gentoo-specific gem installer in Ruby, leveraging the existing gem code as much as possible. This would make it possible to install software from normal gems while following Gentoo's phases, such as unpack, configure, compile, test, and install, much more closely. This in turn allows easy patching, QA checks, and running of tests for gems during installation.

Skills:
 * Ruby

Contacts:
 * [mailto:graaff@gentoo.org Hans de Graaff]

Glendix: Create a lean distro based on Gentoo and Plan 9 (Glentoo?)
Glendix is a port of the Plan 9 userspace to the Linux kernel. In its current state, it is just a set of patches to the Linux kernel that make sure Plan 9 binaries are able to run unmodified. However, in order to bring the entire Plan 9 userspace experience to Linux, we need a lightweight Linux distribution that packages all of this nicely. Gentoo provides a great base for this because of its customizability.

The objective of this project would be to create a LiveCD based on Gentoo and Glendix, providing the end user with a nice experience of Linux with a non-GNU (namely Plan 9) userspace.

Skills:
 * Catalyst
 * Likely Shell Scripting
 * General knowledge of how a Linux distro is put together, what the init process is and so on

Contacts:
 * [mailto:anant@kix.in Anant Narayanan]

Multiple Repository Support in sys-apps/portage
Be able to easily use overlays (without layman).

Each repository should be able to define its own way of syncing (within predefined standards), as well as how its Manifests are handled (git doesn't need more than distfile Manifests, and other VCSs may be similar). Supported repository types should include rsync, git, svn, hg, bzr, darcs, and cvs. Each SYNC_METHOD should be isolated from the others in code so that it can be improved or extended easily.
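One way to keep each SYNC_METHOD isolated is a small class per method behind a common interface, looked up in a registry. The Python sketch below only builds the command lines (it does not run them) and records whether a repository needs full Manifests; the class names, the registry, and the flags chosen are assumptions, and hg, bzr, darcs and cvs would be added as further subclasses.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class SyncMethod(ABC):
    """Base class: each SYNC_METHOD lives in its own isolated subclass."""

    # Whether the repository needs full Manifests (vs. distfile-only ones).
    full_manifests = True

    @abstractmethod
    def command(self, uri: str, path: str) -> List[str]:
        """Return the argv that creates or updates the checkout at `path`."""


class RsyncSync(SyncMethod):
    def command(self, uri: str, path: str) -> List[str]:
        return ["rsync", "--recursive", "--links", "--times", "--delete",
                uri.rstrip("/") + "/", path]


class GitSync(SyncMethod):
    full_manifests = False  # per the idea above, distfile Manifests suffice

    def command(self, uri: str, path: str) -> List[str]:
        if os.path.isdir(os.path.join(path, ".git")):
            return ["git", "-C", path, "pull"]
        return ["git", "clone", uri, path]


class SvnSync(SyncMethod):
    def command(self, uri: str, path: str) -> List[str]:
        if os.path.isdir(os.path.join(path, ".svn")):
            return ["svn", "update", path]
        return ["svn", "checkout", uri, path]


# Registry mapping SYNC_METHOD names to their implementations.
SYNC_METHODS: Dict[str, Type[SyncMethod]] = {
    "rsync": RsyncSync,
    "git": GitSync,
    "svn": SvnSync,
}


def sync_command(method: str, uri: str, path: str) -> List[str]:
    return SYNC_METHODS[method]().command(uri, path)
```

Adding a new VCS then means adding one subclass and one registry entry, without touching the other methods.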

Skills:
 * Python
 * proficiency in multiple VCSs

Contacts:
 * [mailto:xenoterracide@gmail.com Caleb Cushing]

Maven integration
Be able to easily write ebuilds for projects using maven.

The goal of this project is to make it easy to write ebuilds for upstream projects that use Maven as their build system. The work can be either finishing what was started in java-overlay or a new approach.

Skills:
 * Maven
 * basic Gentoo/Java understanding

Contacts:
 * [mailto:betelgeuse@gentoo.org Petteri Räty]

Gentoo/Java IDE integration
Be able to easily work with system libraries in IDEs.

Fix NetBeans or Eclipse so that they can easily find system-installed libraries and their javadocs. Potentially, work with the PackageKit integration project so that system libraries could be installed automatically based on imports, etc.

Skills:
 * Java
 * basic Gentoo/Java understanding

Contacts:
 * [mailto:betelgeuse@gentoo.org Petteri Räty]
 * [mailto:serkan@gentoo.org Serkan Kaba]

Rewrite java-config in C++ or Python
java-config needs a lot of new features and could use a rewrite, as it carries a lot of legacy from the generation-1 days.

If this is written in C++, the Gentoo/Java tools could drop their dependency on Python, but Python is also acceptable. Ideally, the need for external libraries is kept to a minimum.

Skills:
 * XML transformations
 * basic Gentoo/Java understanding

Contacts:
 * [mailto:betelgeuse@gentoo.org Petteri Räty]