License Flowers | Art and AI at SDSC

By
Stefan Milosavljevic
February 21, 2024
Once upon a time…

There were three data engineers: Carlos, Cyril, and Stefan. As data engineers, their job was to make the lives of people analyzing data easier. Since they were all part of the Open Research Data team, they generally worked with open-source code, open data, and online material created by the community. Can they use any code they find online? No! For code to be re-usable by others, it must come with a license that defines their rights and permissions. Without a license, only the code's original author is legally allowed to (re)use it.

The significant role of open source licenses led Carlos, Cyril, and Stefan to wonder one day:

How many public code repositories have a license associated with them?

Given the number of repositories and the diversity of platforms used to share code, a definitive answer is hard to come by. To get an idea of how popular licenses are, Cyril suggested using the paperswithcode data: a collection of machine learning publications bundled with their code. Using this resource, we gained new insights into the most beloved licenses in the data science community and found that only about 50% of the repositories had a license allowing reuse.

Part of the result of Cyril’s code looking at ~140’000 code repositories from paperswithcode. We found out that only ~50% of the repositories have a license. These results helped us select a license subset for our final piece. The whole code and its description are available here.
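For readers curious about what such a tally can look like, here is a minimal sketch. It is not Cyril's actual code: the function name `license_shares` and the toy input list are hypothetical, and we simply assume that each repository has been reduced to a license identifier, or to `None` when no license was detected.

```python
from collections import Counter
from typing import Iterable, Optional


def license_shares(licenses: Iterable[Optional[str]]) -> dict[str, float]:
    """Return the fraction of repositories per license.

    Repositories without a detected license (None or an empty string)
    are grouped under the label "no license".
    """
    counts = Counter(lic if lic else "no license" for lic in licenses)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from the most to the least frequent label.
    return {name: count / total for name, count in counts.most_common()}


# Hypothetical toy input: one entry per repository (not real paperswithcode data).
sample = ["MIT", None, "Apache-2.0", None, "MIT", "GPL-3.0", None, None]
shares = license_shares(sample)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Keeping "no license" as its own label, rather than dropping those repositories, is what makes the unlicensed share visible next to the named licenses.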

From an idea to an opportunity - the big plan

Like many interesting small ideas, this was not a priority and was soon forgotten… That was until we received an email about the AI+Art exhibition at the AI+Summit 2023. This was hugely important as we could finally dedicate time to a small project that we could potentially share with the AI community: a clear win-win. The excitement was high, and looking back now, our first idea was enormous.

Imagine an infographic that feels alive: a picture of a garden representing the code landscape with various beautiful flowers, each representing a unique license. The size of the flower would reflect the popularity of a given license. To make it feel alive, we would create a short looping video in which a breeze moves the flowers, and the flowers grow or shrink to track the number of licenses over time in paperswithcode.

We quickly realized that the whole idea was too much for the little time we had to dedicate. However, just like taming a wild tree, we started pruning off unnecessary branches, trimming complicated ideas, and clearing the path until we reached the core.

During the re-planning process, we considered having one big license flower, with each petal showing the popularity of each license. Carlos did the drawing seen here, and we used it as a seed image for our final piece of art.

The core

Imagine open-source software development as an ecosystem of living organisms with code as genetic information. For this ecosystem to thrive, pieces of code are used, adapted, shared, or refactored, depending on its license. […] If code is like DNA, then licenses are like organs defining whether codes are compatible and fertile. Similar to different species, some licenses are more promiscuous than others; therefore, they follow different reproductive strategies. 
Flowers perfectly encapsulate our metaphor as they are a plant’s reproductive organs and are also used to represent fertility in humans.

With this description, we wanted to showcase the beauty of code licenses by linking them to nature, specifically flowers. Carlos shared with us Dear Data, an amazing project by Giorgia Lupi and Stefanie Posavec, who visualize data in highly creative ways. Inspired by this project, we wanted to combine flowers with unique hand-made visualizations, so we chose an “old-school” look: old biology books. We also agreed to use AI to draw these flowers, and as Carlos was already familiar with it, we chose Midjourney, an AI tool for image generation that allowed us to use a “seed” image as a reference drawing. However, we were missing the most important aspect: what do we use as a prompt for Midjourney? License texts? Summarized legal text? Boring! This is when our key idea dawned on us: we could let AI describe licenses as flowers, and that’s how the magic happened.

Apache-2.0 License Flower: With a bold stance, its vibrant red petals unfurl like a seasoned lawyer, each layer revealing a well-thought-out clause, guarding the garden with a fair yet firm resolve. White and Red.

Above: ChatGPT’s description of the Apache-2.0 License. In the prompt, we also asked it to provide colors.

Experimenting and finalizing

The whole idea behind our piece sounds really simple, right? We thought so too, until we had to make art happen! Carlos kept experimenting with other generative AI tools, their styles, the backgrounds, and different combinations of tools to find the best images. The results were then upscaled to improve the resolution of the final piece. As the next step, Stefan cleaned the images, removing any background element that wasn’t part of the flower. Of the many AI tools available for this task, very few did a good job. After some time spent cleaning, we also added signatures to the images to make them look even more like drawings. To do that, we used the simple yet incredible https://www.calligrapher.ai/, and of course, each flower received its own unique signature style matching its character.

Image generation test examples by Carlos. For the pictures on the left, ChatGPT’s full description of the MIT license was summarized before being used as an input for Midjourney. For the picture on the right, ChatGPT’s full description was used directly as input for Midjourney.
Signatures created by Stefan using calligrapher.ai. From top to bottom, left to right: BSD 3-Clause, Creative Commons (CC), MIT, GPL 3.0, and Apache. BSD 3-Clause and MIT are relaxed compared to the better-defined Apache and GPL. Creative Commons is the artsy one.

The next (and last!) step was to bring the images to reality, which turned out to be more challenging than expected. We first selected five notebooks, one for each license (still trying to match their character to the notebook type); all were in A5 format except for the MIT license, which would be in A4 given its wide popularity (see Cyril’s results above). Now the final question: how could we print the flowers in the notebooks? Here is where Carlos' DIY magic happened. Thanks to his incredible handiness, he could remove notebook pages and attach them on top of normal A4 printer paper that would go through a classic laser printer. The printer would then print on the notebook page instead of the printer paper. And just like a surgeon, Carlos seamlessly reattached the pages to their original notebooks. The work was done, but one last piece was missing…

The surprise

Throughout this article, we have not mentioned our piece’s last flower, which is by far the most popular and the only one not included in a notebook. Instead, it was printed on A3 paper and put on an easel in the physical exhibition, highlighting its dominance. As seen below:

The flower represents all the code with no license.

It is colored black, symbolizing the absence of permissions.

There are no branches because nothing can be created from it.

In our physical print, the flower is also missing a signature, like a forgotten species.

As a special thank you for reading through the article, find the flower WITH its signature below (we find it very telling!) together with all the others. Find all these pictures and more details in our Zenodo entry for License Flowers.

And remember, whether you’re sharing code or other materials, use those beautiful licenses.

About the author

Stefan Milosavljevic
Biomedical Data Engineer

Stefan has a background in Biology and decided to move towards evolutionary bioinformatics for both his MSc and PhD. Over the years, he developed a passion for the entire data analysis process: from collecting data to analyzing and presenting results. Presentations, particularly opportunities for public speaking, are activities he enjoys since he values communication a lot. In order to follow this passion and deepen his knowledge of systems to collect and manage data, he joined SDSC in 2023 as a Biomedical Data Engineer. Outside work, Stefan is an avid reader of sci-fi books (but not only!), enjoys swimming, running, and biking both competitively and casually, and enjoys plenty of activities with friends, especially when beer is involved.
