Google storage handling with Git and Jenkins
Documentation and the metadata attached to it can change from day to day in most businesses. For example, the functional documentation of an application may need to be updated, or a big data project may require changing a large amount of metadata, and all of that across multiple storage locations.

Modern businesses typically handle and store their documentation on one of the well-known cloud storage services. Adding, updating or removing such documents and changing their metadata can be tedious and error-prone work when done manually.

Let’s imagine a project where you have to change data versioning, data security parameters, a document’s change date or any other part of the documentation. Moreover, you have to make sure the change is applied across different storage locations.

In this blog I present a solution for effectively automating document metadata handling, and the whole process around it, using Google Storage, a few shell scripts, Git and the Jenkins CI tool.

Tooling introduction

Google Storage (GS) is a service for storing any type of file in Google Cloud – an image, a document, a zip file, or anything else. In the GS hierarchy, files are stored inside buckets, buckets belong to a project, and projects are grouped within an organisation. Besides storing files with basic metadata (fixed properties of objects – date and time, file type, permissions, etc.), GS also lets you create custom metadata, which will be crucial for this demonstration.
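Custom metadata keys travel as HTTP headers with the x-goog-meta- prefix. Just to illustrate – the bucket, object and key names below are made up – setting and inspecting such a key with gsutil looks like this:

# add a custom metadata key to an existing object (names are examples)
gsutil setmeta -h "x-goog-meta-owner:docs-team" gs://my-bucket/manual.pdf

# show the object's metadata, including the custom x-goog-meta-* keys
gsutil ls -L gs://my-bucket/manual.pdf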

Jenkins is an open source automation server used for building, testing and deploying software. What I like most about it is that it’s quite user friendly, and there are plenty of manuals and examples on the internet, so it’s not hard to figure out how it works.

Everyone has heard of Git, so there is nothing special to say about it that you don’t already know. For this purpose, only the basic features of Git will be used.

Why build an automated process?

The goal is to create an application which can read and download documents. 

During application design we distinguished two types of users: developers and testers. Therefore, we will also distinguish the environments in which they work.

For developers we will simply use the DEVenv; for testers, the QAenv.

Accordingly, we create two GS buckets: one for DEVenv (the developers’ bucket) and another for QAenv (the QA bucket). Both buckets contain identical files with the same metadata.

Each file and its corresponding metadata are added manually to the GS bucket. Whenever a change is needed, we have to go through the boring manual routine of finding the file on GS and editing it in a few clicks: open the object’s overflow menu, choose Edit metadata, add the new metadata key-value pairs and press Save. Moreover, we can have hundreds of such documents that we constantly delete and re-upload by hand, first to one GS bucket and then to the other. Doing this manually with a large number of files is very prone to errors – in other words, hard and tedious work.

Building an automated process of adding files on Google Storage

First we create a bucket inside GS for DEVenv and call it dev-storage. Then we do the same for QAenv and call it qa-storage.
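Assuming gsutil is already installed and authenticated, the two buckets could be created with something like this (the project ID is a placeholder; in practice bucket names must be globally unique, so they usually carry a project-specific prefix):

# create the developers' bucket and the QA bucket (project ID is a placeholder)
gsutil mb -p my-gcp-project gs://dev-storage
gsutil mb -p my-gcp-project gs://qa-storage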

The next step is to make a directory within the Git repository named google-storage-handler.

Now, let’s create two new directories within the project directory – backup and files. The backup directory is used to store copies of documents, with their corresponding metadata, from the GS bucket. The files directory stores the documents that will be uploaded to GS, i.e. the current active version of the documents stored on GS.

Within the project directory we create metadata.json, which stores the metadata for the documents. This metadata is what we want to have uploaded to GS.

We also have three shell scripts which run in order: download.sh, delete.sh, upload.sh. We will explain those in the next section.
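The resulting project layout looks roughly like this (the Jenkinsfile is added later, in the Jenkins section):

google-storage-handler/
  backup/          – dated copies of documents and metadata pulled from GS
  files/           – current active documents to be uploaded to GS
  metadata.json    – custom metadata for each document
  download.sh
  delete.sh
  upload.sh
  Jenkinsfile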

The metadata.json file stores metadata values like this:

{
  "filename_key": {
    "category": "",
    "title": "",
    "lang": ""
  }
}

Reading JSON files inside a shell script is possible by installing jq:

  • for installation using brew – brew install jq
  • for Linux (apt-get) users – apt-get install jq

Using jq we can easily process JSON. For example, in our scripts we will use it in combination with sed to clean up the metadata values we read from JSON files. More on these tools can be found at these links: the jq manual and sed.
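As a quick illustration – the entry name user_manual and its value below are made up – extracting a value from metadata.json and stripping the surrounding quotes looks roughly like this:

# read the category of a (hypothetical) user_manual entry from metadata.json
jq ".user_manual | .category" metadata.json
# prints: "internal"

# pipe through sed to strip the surrounding double quotes
jq ".user_manual | .category" metadata.json | sed -e 's/^"//' -e 's/"$//'
# prints: internal

(jq’s -r flag would return the raw value without quotes and make the sed step unnecessary, but the commands shown below keep the jq + sed combination.)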

The scripts

We use the download.sh script to make a new directory inside the backup directory, named with the current date and time – this is our copy of the files from GS at the present time

mkdir -p -- "$(date +%Y-%m-%d\ %H:%M:%S)"

Next, we sort all directories inside the backup directory to find the newest one, and save all files from GS into that newly added directory with the command

gsutil -m cp -R "gs://$1/" "$fullpathtonewdirectory"

At the end of the script, we go through the newly added directory and, for each document downloaded from GS (doc), save the corresponding metadata with the command

gsutil ls -L "gs://$1/${doc}" >> "${filename}.metadata"
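Putting these pieces together, a minimal download.sh might look like the sketch below. It assumes the script is run from the project directory, takes the bucket name as its first argument and that object names contain no spaces; error handling is omitted:

#!/bin/bash
# download.sh – back up the current bucket contents and their metadata
# usage: sh download.sh <bucket-name>

# 1. create a dated directory inside backup/
newdir="backup/$(date +%Y-%m-%d\ %H:%M:%S)"
mkdir -p -- "$newdir"

# 2. copy everything from the bucket into the new directory
gsutil -m cp -R "gs://$1/" "$newdir"

# 3. save the metadata of every object next to the downloaded files
for doc in $(gsutil ls "gs://$1/"); do
  name="$(basename "$doc")"
  gsutil ls -L "$doc" >> "$newdir/${name}.metadata"
done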

In delete.sh we loop through the new files and delete the corresponding objects in the GS bucket that share each file’s name, using the command

gsutil rm "gs://$1/${filename}"
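A minimal delete.sh along the same lines – again taking the bucket name as its first argument and assuming every document in the files directory already has a counterpart in the bucket – might look like this:

#!/bin/bash
# delete.sh – remove objects from the bucket that are about to be replaced
# usage: sh delete.sh <bucket-name>

# loop over the current active documents and delete their counterparts on GS
for path in files/*; do
  filename="$(basename "$path")"
  gsutil rm "gs://$1/${filename}"
done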

The last script, upload.sh, reads the files directory and adds all documents to an array, then loops through the array and uploads each file to the GS bucket with the command

gsutil cp "files/${filename}" "gs://$1/${filename}"

For each document added to GS, it also adds three metadata items – category, title, lang. The variable filename_key contains the name of the file, which is used as the top-level key inside metadata.json. The subkeys containing the final metadata values are category, title and lang. The commands for adding metadata are

gsutil setmeta -h "x-goog-meta-Category:$( (jq ".${filename_key} | .category" metadata.json) | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${filename}"
gsutil setmeta -h "x-goog-meta-Title:$( (jq ".${filename_key} | .title" metadata.json) | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${filename}"
gsutil setmeta -h "x-goog-meta-Lang:$( (jq ".${filename_key} | .lang" metadata.json) | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${filename}"

In the examples above, we use an input variable ($1) indicating which bucket the script runs against – dev-storage or qa-storage.
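Putting upload.sh together, a minimal sketch could look like the one below. Two things here are assumptions on my part: the key in metadata.json is derived by stripping the file extension, and jq’s -r flag replaces the jq + sed combination shown above, purely to keep the lines short:

#!/bin/bash
# upload.sh – upload the current active documents and attach their custom metadata
# usage: sh upload.sh <bucket-name>

for path in files/*; do
  filename="$(basename "$path")"
  # assumption: the metadata.json key is the file name without its extension
  filename_key="${filename%.*}"

  # upload the document
  gsutil cp "files/${filename}" "gs://$1/${filename}"

  # attach the custom metadata read from metadata.json (jq -r strips the quotes)
  gsutil setmeta \
    -h "x-goog-meta-Category:$(jq -r ".${filename_key} | .category" metadata.json)" \
    -h "x-goog-meta-Title:$(jq -r ".${filename_key} | .title" metadata.json)" \
    -h "x-goog-meta-Lang:$(jq -r ".${filename_key} | .lang" metadata.json)" \
    "gs://$1/${filename}"
done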

Running with Jenkins CI

Finally, let’s use Jenkins to run all these scripts automatically. We are just a few steps from getting there.

The first thing to do is to add a new Jenkinsfile to the project directory and configure the Jenkins deployment.
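As a rough sketch, the surrounding Jenkinsfile is a standard declarative pipeline: a choice parameter named deployEnvironment (matching the check in the setup stage below) selects the target environment, and the stages from the following sections are nested inside the stages block:

pipeline {
  agent any

  // dropdown parameter used to pick the target bucket
  parameters {
    choice(name: 'deployEnvironment', choices: ['dev', 'qa'], description: 'Target environment')
  }

  stages {
    // replace this placeholder with the stages described below:
    // 'Setup deployment', 'Backup google storage bucket',
    // 'Clear google storage bucket', 'Transfer files directory to google storage bucket'
    stage('Placeholder') {
      steps { echo 'stages go here' }
    }
  }
}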

Setting up deployment pipeline with parameters

stage('Setup deployment') {
 steps {
   script {
     echo "Start editing files for google storage handler"
     targetEnv="${deployEnvironment}"
     if (targetEnv == 'dev') {
       credsId = "credential for google storage bucket on dev-storage"
       bucketName = "dev-storage"
     } else {
       credsId = "credential for google storage bucket on qa-storage"
       bucketName = "qa-storage"
     }
   }
 }
}

The scripts run in order (download.sh, delete.sh, upload.sh) – the three stages below are identical except for the script they call

stage('Backup google storage bucket') {
 steps {
   withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
     sh "gcloud auth activate-service-account --key-file=${keyjson}"
     sh """
       sh download.sh ${bucketName}
    """
   }
 }
}
stage('Clear google storage bucket') {
 steps {
   withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
     sh "gcloud auth activate-service-account --key-file=${keyjson}"
     sh """
       sh delete.sh ${bucketName}
    """
   }
 }
}
stage('Transfer files directory to google storage bucket') {
 steps {
   withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
     sh "gcloud auth activate-service-account --key-file=${keyjson}"
     sh """
         sh upload.sh ${bucketName}
      """
   }
 }
}

For our new Jenkins pipeline we can also use a dropdown menu for choosing the parameter, like this:

[Screenshot: the deployEnvironment dropdown in the Jenkins build parameters]

Pipelines like this are usually built by developers, because project managers typically neither have access to such tools nor know how to use them. So when new documents are added to the project, the developers, at the project manager’s request, run the pipeline that populates the storage, and the documents then become available to the application as well.

Conclusion

To sum up, with so many automation tools available nowadays – the ones described in this blog or any others – a lot can be done to save time and nerves by not having to do manual work that is very prone to human error.

I hope you found this blog helpful and learned something new!
