Google Storage handling with Git and Jenkins

Documentation versioning and the metadata connected to it can change from day to day in most businesses. For example, functional documentation of an application may need to be updated, or a big data project may require changing a large amount of metadata, all of it spread across multiple storage locations.

Modern businesses typically store their documentation in one of the well-known cloud storage services. Adding, updating or removing such documents and changing their metadata can be tedious and error-prone work when done manually.
Imagine a project where you have to change data versioning, data security parameters, or even just a change date or any other part of the documentation. Moreover, you have to make sure the change is made across different storage locations.
In this blog I describe how to effectively automate document metadata handling and the whole process around it using Google Storage, a few shell scripts, Git and the Jenkins CI tool.
Tooling introduction
Google Storage (GS) is a service for storing any type of file in Google Cloud – an image, a document, a zip file, or anything else. In the GS hierarchy, files are stored inside buckets, buckets belong to a project, and projects are grouped within an organisation. Besides storing files with basic metadata (standard object properties such as creation date and time, content type, permissions, etc.), GS also supports custom metadata, which is crucial for this demonstration.
Jenkins is an open-source automation server used for building, testing and deploying software. What I like most about it is that it is quite user friendly, and there are plenty of manuals and examples online, so it is not hard to figure out how it works.
Everyone has heard of Git, so I don’t have anything special to say about it that you don’t already know. For this purpose just the basic features of Git will be used.
Why build an automated process?
The goal is to create an application which can read and download documents.
During application design we distinguished two types of users: developers and testers. Accordingly, we distinguish the environments in which they work.
For developers we simply use the DEVenv; for testers we use the QAenv.
We therefore create two GS buckets: one for the DEVenv (the developers' bucket) and another for the QAenv (the QAs' bucket). Both buckets contain identical files with the same metadata.
Each file and its corresponding metadata are added to the GS bucket manually. Whenever a change is needed, we have to go through the same boring routine: find the file on GS and make the change in three mouse clicks – open the object's overflow menu (⋮), choose Edit metadata, add the new metadata key-value pairs and press Save. Moreover, we can have hundreds of such documents that we constantly delete and re-upload by hand, first to one GS bucket and then to the other. Doing this manually with a large number of files is very prone to errors. In other words – very hard and tedious work.
Building an automated process of adding files on Google Storage
First we create a bucket inside GS for the DEVenv and call it dev-storage. Then we do the same for the QAenv and call it qa-storage.
The next step is to create a directory named google-storage-handler within the Git repository.
Now, let's add two new directories within the project directory – backup and files. The backup directory is used to store copies of documents with their corresponding metadata from the GS bucket. The files directory is used to store documents that will be uploaded to GS and that are the current active versions of the documents stored on GS.
Within the project directory we create metadata.json, which stores the metadata for the documents. This metadata is what we actually want to have uploaded to GS.
We also have three shell scripts which run in this order: download.sh, delete.sh, upload.sh. They are explained in the next section.
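To keep an overview, here is roughly what the project directory looks like at this point (a sketch of the layout described above; the Jenkinsfile is added later in the Jenkins section):
google-storage-handler/
├── backup/            # timestamped copies of documents and metadata downloaded from GS
├── files/             # current active versions of the documents to be uploaded to GS
├── metadata.json      # custom metadata (category, title, lang) per document
├── download.sh        # back up the bucket contents and metadata
├── delete.sh          # remove the old objects from the bucket
├── upload.sh          # upload new versions and set their metadata
└── Jenkinsfile        # pipeline definition (added in the Jenkins section below)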
The JSON file metadata.json stores metadata values like:
{
  "filename_key": {
    "category": "",
    "title": "",
    "lang": ""
  }
}
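For example, for a hypothetical document userguide.pdf kept in the files directory (the key and values below are made up purely for illustration), an entry could look like:
{
  "userguide": {
    "category": "functional documentation",
    "title": "User guide",
    "lang": "en"
  }
}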
Reading JSON files inside a shell script is possible by installing jq:
- for installation using brew – brew install jq
- for Linux (apt-get) users – apt-get install jq
Using jq we can simply process JSON. For example, in our scripts we use it in combination with sed to extract metadata values from the JSON file and strip the surrounding quotes. More on these tools can be found in the jq manual and the sed manual.
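A minimal sketch of that combination (the key userguide is a hypothetical entry in metadata.json, used only for illustration):
#!/usr/bin/env bash
filename_key="userguide"
# Read the "category" value for the given document key and strip the quotes that jq prints.
category=$(jq ".${filename_key} | .category" metadata.json | sed -e 's/^"//' -e 's/"$//')
echo "${category}"
# jq -r would print the raw value without quotes, but the scripts below use the sed variant.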
The scripts
We use the download.sh script to create a new directory inside the backup directory, named with the current date and time – this is our copy of the files from GS at the present moment
mkdir -p -- "$(date +%Y-%m-%d\ %H:%M:%S)"
Next, we sort all directories inside the backup directory to get the newest one, and save all files from GS inside the newly added directory with the command
gsutil -m cp -R "gs://$1/" "$fullpathtonewdirectory"
At the end of the script, we go through the newly added directory and, for each new document downloaded from GS (doc), save the corresponding metadata with the command
gsutil ls -L "gs://$1/${doc}" >> "${filename}.metadata"
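Put together, a minimal sketch of download.sh could look like this (the directory layout and variable names are assumptions based on the steps above):
#!/usr/bin/env bash
# download.sh <bucket-name> – back up a GS bucket together with object metadata.
# Minimal sketch; paths and variable names are assumptions.
bucket="$1"

# 1. Create a new timestamped directory inside backup/ for this snapshot.
cd backup
mkdir -p -- "$(date +%Y-%m-%d\ %H:%M:%S)"

# 2. Sort the backup directories and pick the newest one (the one just created).
fullpathtonewdirectory="$(ls -1d */ | sort | tail -n 1)"

# 3. Copy all files from the bucket into the new directory
#    (gsutil recreates the bucket name as a subdirectory of the destination).
gsutil -m cp -R "gs://${bucket}/" "${fullpathtonewdirectory}"

# 4. For each downloaded document, save its metadata alongside it.
for doc in "${fullpathtonewdirectory}${bucket}"/*; do
  filename="$(basename "${doc}")"
  gsutil ls -L "gs://${bucket}/${filename}" >> "${doc}.metadata"
done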
In delete.sh we loop through the new files and delete the corresponding objects in the GS bucket that share a file's name
gsutil rm "gs://$1/${filename}"
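A minimal sketch of delete.sh, assuming the new versions live in the files directory:
#!/usr/bin/env bash
# delete.sh <bucket-name> – remove objects that are about to be replaced.
bucket="$1"

for path in files/*; do
  filename="$(basename "${path}")"
  # Delete the existing object that shares the name of the new local file.
  gsutil rm "gs://${bucket}/${filename}"
done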
The last script, upload.sh, checks the files directory and adds all documents to an array, then loops through the array and uploads each file to the GS bucket with the command
gsutil cp "files/${filename}" "gs://$1/${filename}"
For each document added to GS, it will add three metadata items – category, title and lang. The variable filename_key contains the name of the file as it is represented inside metadata.json as a primary key. The subkeys containing the final metadata values are category, title and lang. The commands for adding metadata are
gsutil setmeta -h "x-goog-meta-Category:$(jq ".${filename_key} | .category" metadata.json | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${filename}"
gsutil setmeta -h "x-goog-meta-Title:$(jq ".${filename_key} | .title" metadata.json | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${filename}"
gsutil setmeta -h "x-goog-meta-Lang:$(jq ".${filename_key} | .lang" metadata.json | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${filename}"
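Put together, a minimal sketch of upload.sh could look like this (deriving filename_key as the file name without its extension is an assumption, not something the original scripts specify):
#!/usr/bin/env bash
# upload.sh <bucket-name> – upload documents from files/ and attach their metadata.
# Minimal sketch; the filename_key derivation is an assumption.
bucket="$1"

# Collect all documents from the files directory into an array.
documents=(files/*)

for path in "${documents[@]}"; do
  filename="$(basename "${path}")"
  filename_key="${filename%.*}"   # keys containing characters like '-' would need quoting in the jq filter

  # Upload the document itself.
  gsutil cp "files/${filename}" "gs://${bucket}/${filename}"

  # Attach the custom metadata read from metadata.json (jq -r prints the raw value, no quotes).
  for key in category title lang; do
    value="$(jq -r ".${filename_key} | .${key}" metadata.json)"
    # ${key^} capitalises the first letter (bash 4+), giving Category, Title, Lang.
    gsutil setmeta -h "x-goog-meta-${key^}:${value}" "gs://${bucket}/${filename}"
  done
done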
In the example above, we use an input variable indicating for which bucket the script will run – dev-storage or qa-storage.
Running with Jenkins CI
Finally, let’s use Jenkins in order to run all these scripts automatically. We are just a few steps from getting there.
The first thing to do is to add a new Jenkinsfile within the project directory and configure the Jenkins deployment.
Setting up deployment pipeline with parameters
stage('Setup deployment') {
    steps {
        script {
            echo "Start editing files for google storage handler"
            targetEnv = "${deployEnvironment}"
            if (targetEnv == 'dev') {
                credsId = "credential for google storage bucket on dev-storage"
                bucketName = "dev-storage"
            } else {
                credsId = "credential for google storage bucket on qa-storage"
                bucketName = "qa-storage"
            }
        }
    }
}
Next we run the scripts in order (download.sh, delete.sh, upload.sh) – the three stages are identical apart from the script they call:
stage('Backup google storage bucket') {
    steps {
        withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
            sh "gcloud auth activate-service-account --key-file=${keyjson}"
            sh """
                sh download.sh ${bucketName}
            """
        }
    }
}
stage('Clear google storage bucket') {
    steps {
        withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
            sh "gcloud auth activate-service-account --key-file=${keyjson}"
            sh """
                sh delete.sh ${bucketName}
            """
        }
    }
}
stage('Transfer files directory to google storage bucket') {
    steps {
        withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
            sh "gcloud auth activate-service-account --key-file=${keyjson}"
            sh """
                sh upload.sh ${bucketName}
            """
        }
    }
}
For our new Jenkins pipeline we can also use a dropdown menu for choosing the parameter, like this:
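Since the original screenshot is not reproduced here, a minimal sketch of such a dropdown as a declarative pipeline choice parameter could look like the following (the parameter name deployEnvironment matches the variable used in the setup stage; the exact wording is an assumption):
parameters {
    // Rendered as a dropdown on the "Build with Parameters" page; 'dev' is the default choice.
    choice(name: 'deployEnvironment', choices: ['dev', 'qa'], description: 'Target Google Storage environment')
}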

Regarding pipelines, they are usually run by developers, because project managers do not have access to such a tool nor do they know how to use it. So, when new documents are added to the project, the developers run the storage build at the project manager's request, and the documents are then also added to the application.
Conclusion
To sum up, with so many automation tools available nowadays, like the ones described in this blog or others, a lot can be done to save time and some nerves by not having to do manual work that is very prone to human error.
I hope this blog was helpful and that you learned something new!