Once upon a time…
There were three data engineers: Carlos, Cyril, and Stefan. As data engineers, their job was to make the lives of people analyzing data easier. Since they were all part of the Open Research Data team, they generally worked with open-source code, open data, and online material created by the community. Could they use any code they found online? No! For code to be re-usable by others, it must come with a license that defines what those others are allowed to do with it. Without a license, only the code's original author is legally allowed to (re)use it.
The significant role of open source licenses led Carlos, Cyril, and Stefan to wonder one day:
How many public code repositories have a license associated with them?
Given the number of repositories and the diversity of platforms used to share code, it’s hard to give a definitive answer. To get an idea of how popular licenses are, Cyril suggested using the paperswithcode data: a collection of machine learning publications bundled with their code. Using this resource, we got new insights into the most beloved licenses in the data science community and found out that only 50% of resources had a license allowing re-usability.
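For the curious, here is a minimal sketch of the kind of tally behind such a number. It is not our actual analysis: the list of licenses below is made up for illustration, and the function name license_summary is hypothetical. It only counts whether a repository has any license at all, not whether that license allows re-use.

```python
from collections import Counter

# Hypothetical sample: one entry per repository; None means no license was found.
repo_licenses = ["MIT", "Apache-2.0", None, "MIT", None, "GPL-3.0", None, "BSD-3-Clause"]


def license_summary(licenses):
    """Return (count per license, share of repositories that have any license)."""
    # Group all unlicensed repositories under one explicit label.
    counts = Counter(lic if lic is not None else "no license" for lic in licenses)
    total = len(licenses)
    licensed = total - counts["no license"]
    # Guard against an empty input to avoid dividing by zero.
    share = licensed / total if total else 0.0
    return counts, share


counts, share = license_summary(repo_licenses)
for name, n in counts.most_common():
    print(f"{name}: {n}")
print(f"Repositories with a license: {share:.0%}")
```

With real data, the list would come from whatever license field the dataset provides, and the interesting part is how large the "no license" group turns out to be.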
From an idea to an opportunity - the big plan
Like many interesting small ideas, this was not a priority and was soon forgotten… That was until we received an email about the AI+Art exhibition at the AI+Summit 2023. This was hugely important as we could finally dedicate time to a small project that we could potentially share with the AI community: a clear win-win. The excitement was high, and looking back now, our first idea was enormous.
Imagine an infographic that feels alive: a picture of a garden representing the code landscape with various beautiful flowers, each representing a unique license. The size of each flower would reflect the popularity of its license. To make it feel alive, we would create a short looping video in which a breeze moves the flowers, and the flowers grow or shrink to track the license counts over time in paperswithcode.
We quickly realized that the whole idea was too much for the little time we had to dedicate. So, just like taming a wild tree, we started pruning off unnecessary branches, trimming complicated ideas, and clearing the path until we reached the core.
The core
Imagine open-source software development as an ecosystem of living organisms with code as genetic information. For this ecosystem to thrive, pieces of code are used, adapted, shared, or refactored, depending on its license. […] If code is like DNA, then licenses are like organs defining whether codes are compatible and fertile. Similar to different species, some licenses are more promiscuous than others; therefore, they follow different reproductive strategies.
Flowers perfectly encapsulate our metaphor as they are a plant’s reproductive organs and are also used to represent fertility in humans.
With this description, we wanted to showcase the beauty of code licenses by linking them to nature, specifically flowers. Carlos shared with us Dear Data, an amazing project by Giorgia Lupi and Stefanie Posavec, who visualize data in highly creative ways. Inspired by this project, we wanted to combine flowers with unique hand-made visualizations, so we chose an “old-school” look: old biology books. We also agreed to use AI to draw these flowers, and as Carlos was already familiar with it, we chose Midjourney, an AI tool for image generation that allowed us to use a “seed” image as a reference drawing. However, we were missing the most important aspect: what do we use as a prompt for Midjourney? License texts? Summarized legal text? Boring! This is when our key idea dawned on us: we could let AI describe licenses as flowers, and that’s how the magic happened.
Apache-2.0 License Flower: With a bold stance, its vibrant red petals unfurl like a seasoned lawyer, each layer revealing a well-thought-out clause, guarding the garden with a fair yet firm resolve. White and Red.
Above: ChatGPT’s description of the Apache-2.0 License. In the prompt, we also asked it to provide colors.
Experimenting and finalizing
The whole idea behind our piece sounds really simple, right? We thought so, too, until we had to make art happen! Carlos kept experimenting with other generative AI tools, styles, backgrounds, and combinations of tools to find the best images. The results were then upscaled to improve their resolution. As the next step, Stefan cleaned the images, removing any background element that wasn’t part of the flower. Of the many AI tools available for this task, very few did a good job. After the cleaning, we also added signatures to the images to make them look even more like drawings. To do that, we used the simple yet incredible https://www.calligrapher.ai/, and of course, each flower received its own unique signature style matching its character.
The next (and last!) step was to bring the images to reality, which turned out to be more challenging than expected. We first selected five notebooks, one for each license (still trying to match the notebook type to the license’s character); all were in A5 format except for the MIT license, which would be in A4 given its wide popularity (see Cyril’s results above). Now the final question: how could we print the flowers in the notebooks? Here is where Carlos' DIY magic happened. Thanks to his incredible handiness, he could remove notebook pages and attach them on top of normal A4 printer paper that would go through a classic laser printer. The printer would then print on the notebook page instead of the printer paper. And just like a surgeon, Carlos seamlessly reattached the pages to their original notebooks. The work was done, but one last piece was missing…
The surprise
Throughout this article, we have not mentioned our piece’s last flower, which is by far the most popular and the only one not included in a notebook. Instead, it was printed on A3 paper and put on an easel in the physical exhibition, highlighting its dominance. As seen below:
The flower represents all the code with no license.
It is colored black, symbolizing the absence of permissions.
There are no branches because nothing can be created from it.
In our physical print, the flower is also missing a signature, like a forgotten species.
As a special thank you for reading through the article, find the flower WITH its signature below (we find it very telling!) together with all the others. Find all these pictures and more details in our Zenodo entry for License Flowers.
And remember, whether you’re sharing code or other materials, use those beautiful licenses.