Restructuring CentOS Container Pipeline using OpenShift - Part 1

In this post I’m going to talk about why we are restructuring the CentOS Container Pipeline service and how OpenShift is key to it. There’s plenty of material about OpenShift available on the Internet, so I don’t really need to write about OpenShift itself.

I gave a talk about our service and got some great feedback. The most common feedback was that such a service would be immensely useful to the open-source community. But deep down, I knew that it’s not possible to scale the current implementation of the service to serve a large community. It needed a rejig to be useful to the users and not a pain in bad places for its administrators! 😉

What does the service do?

Before I talk about the issues and how we’re handling them in the new implementation, I’ll quickly jot down the features of the service.

  • Pre-build the artifacts/binaries to be added to the container image
  • Lint the Dockerfile for adherence to best practices
  • Build the container image
  • Scan the image for:
    • list RPM updates
    • list updates for packages installed via other package managers:
      • npm
      • pip
      • gem
    • check integrity of RPM content (using rpm -Va)
    • point out capabilities of container created off the resulting image by examining RUN label in Dockerfile
  • Weekly scanning of the container images using above scanners
  • Automatic rebuild of container image when the git repo is modified
  • Parent-child relationship between images to automatically trigger rebuild of child image when parent image gets updated
  • Repo tracking to automatically rebuild the container image in event of an RPM getting updated in any of its configured repos
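To give a flavor of the lint stage above, here’s a toy sketch of two classic Dockerfile best-practice checks expressed in shell. This is not the linter the service actually uses; the file name and the checks are made up for illustration.

```shell
#!/bin/sh
# Toy sketch of two Dockerfile best-practice checks; the service's real
# linter is more thorough than this.
lint_dockerfile() {
    file="$1"
    # Pinning to :latest makes rebuilds non-reproducible.
    if grep -q ':latest' "$file"; then
        echo "WARN: avoid the :latest tag"
    fi
    # A maintainer label tells users whom to contact.
    if ! grep -qi '^LABEL maintainer' "$file"; then
        echo "WARN: no maintainer label"
    fi
}

# A sample Dockerfile that trips both checks.
cat > /tmp/Dockerfile.example <<'EOF'
FROM centos:latest
RUN yum install -y nginx && yum clean all
EOF

lint_dockerfile /tmp/Dockerfile.example
```

Running this prints one WARN line per violated check; a clean Dockerfile produces no output.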

Issues with the old implementation

Our old implementation of the service has a lot of plumbing. There are workers written for most of the features mentioned above.

  • Pre-build happens on CentOS CI (ci.c.o) infrastructure. In this stage, we build the artifacts/binaries and push them to a temporary git repo. The job configuration then uses this git repo to build container images, while another job on ci.c.o keeps looking for updates in the upstream git repo.

  • Lint worker runs as a systemd service on one node.

  • Build worker runs as a container on another node and triggers a build within an OpenShift cluster.

  • Scan worker runs as a systemd service and uses atomic scan to scan the containers. This in turn spins up a few containers which we need to delete, along with their volumes, to make sure that the host system’s disk doesn’t get filled up.

  • Weekly scanning is a Jenkins job that checks the container index and the service’s underlying database before triggering a weekly scan.

  • Repo tracking works as a Django project and relies heavily on a database which we have almost always failed to migrate successfully whenever the schema changed.

All of the above is spread across four systems, which are quite beefy! Yet we couldn’t manage to do parallel container builds to serve more requests. A couple of teams evaluated our project to bring up their own pipeline because they didn’t want to use a public registry. However, they found the service implementation too complex to understand, deploy, and maintain!

How are we handling (or planning) things in the new implementation?

In the new implementation of the service, which is still to be moved under the official repo, we are using OpenShift exclusively for every feature that the service provides.

Although we’re far from done, we have successfully implemented and tested that these features work fine in an OpenShift cluster:

  • Pre-build
  • Lint
  • Build
  • Scan
  • Weekly scan

We’re relying heavily on the OpenShift and Jenkins integration. Every project in the container index has an OpenShift Pipeline of its own in the single OpenShift project that we use. All of the implemented features work as various stages in the OpenShift Pipeline.
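For illustration, an OpenShift Pipeline is essentially a BuildConfig with a Jenkins pipeline strategy. The sketch below is hypothetical; the name and stage bodies are placeholders, and our actual pipeline definition differs:

```yaml
# Hypothetical sketch of an OpenShift Pipeline BuildConfig.
apiVersion: v1
kind: BuildConfig
metadata:
  name: example-project-pipeline
spec:
  strategy:
    type: JenkinsPipeline
    jenkinsPipelineStrategy:
      jenkinsfile: |-
        node {
          stage('Pre-build') { /* build artifacts/binaries */ }
          stage('Lint')      { /* lint the Dockerfile */ }
          stage('Build')     { /* build the container image */ }
          stage('Scan')      { /* run the scanners */ }
        }
```

Each project in the container index gets one such pipeline, and the feature list above maps onto its stages.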

For logging, we’re using the EFK (Elasticsearch - Fluentd - Kibana) integration in OpenShift. To be honest, we’re still learning how to use Kibana!

This new implementation hasn’t been deployed in a production environment yet. However, it’s far more straightforward to deploy than the old implementation and can even be deployed on a minishift environment.

In progress items

We are still working on things like:

  • proper CI (unit and functional tests for the code and service)
  • setting up monitoring and alerting stack using OpenShift’s integration with Prometheus
  • providing useful information to the users in the emails that are sent after every build and weekly scan
  • rewriting the entire repo tracking piece to make it as independent of the database as we can

That’s it

This blog went on to be longer than I wanted it to be! But it’s a gist of what we’ve been doing as the CentOS Container Pipeline team over the past few months. In the coming posts, I’ll be talking about individual implementation details!

Until next time… 😄

Talking Containers at IIT Gandhinagar

Series: Speaking

I’m an engineering graduate from India. IITs are the best engineering institutes in India, and NITs follow suit. Although some might want to debate that, it’s not the point of this post. I pursued my engineering education at a non-IIT, non-NIT institute. I think most engineering graduates in India have dreamed of being in an IIT/NIT, and I’m no exception. Studying at an IIT was my dream as well, but it’s no easy feat to get into one.

So when I got an opportunity to speak at IIT Gandhinagar, I was obviously excited. The topic was containers. And the expected audience was extremely varied: I was expected to address under-grad, post-grad and doctorate students from Computer Science and Electrical Engineering.

Considering the vast audience, I decided to start from the beginning: how containers evolved, how they compare with VMs, how Docker popularized them, how Kubernetes took over the orchestration world, and how containers have made a lasting impact on the development and deployment story. I also discussed monoliths and microservices.

The Talk

First, the slides:

I started with a show of hands to know how many in the audience had heard of or played with containers. Surprisingly, there were just about 2-3 people. The total audience was around 25-30, I guess. So my decision to start from how it all began proved to be right.

As I went ahead with the slides, I got some questions around the slide that talked about Docker commands.

Although there weren’t many questions around the rest of the slides, it did seem like most of the topics mentioned in them were new to the audience.

At the end of the talk, one of the PhD students told me that she found my talk interesting and informative, but that she’s not much into the architecture stuff. At the time, I was a bit confused about why she said so and didn’t get a chance to talk further. However, I don’t think my talk was about computer architecture. It did touch on the microservices pattern, but due to lack of time, I didn’t dive deep into it.

Informal discussions with faculty members

I went to IIT Gandhinagar upon invitation from one of the faculty members, and two more faculty members attended my talk. I got an opportunity to interact with all three of them, and the discussions were pretty interesting. Having talked at a few other colleges, I felt that, in general, the challenges they face are pretty much the same.

The general opinion among the faculty members was that most students weren’t really aware of much of what’s happening in the world of containers. One of the surprising discussions was around how students felt that it was more rewarding to work on making web/mobile apps than on Operating Systems, Networking, and other things that keep you closer to the hardware. Having worked with people doing both sorts of things, I certainly feel that working on the latter is rewarding as well. Sure, the hype is around making the next breakthrough (read: multi-million dollar) app, but the core things that make such apps possible are seeing development as well!

We even planned on having a workshop with hands-on sessions about containers and developing an app based on the microservices pattern. The idea of using OpenShift to do CI/CD, build images and eventually deploy the code seemed interesting to the faculty members.

Above all, and probably the best part, the faculty members were more than willing to host community meetups at the IIT Gandhinagar campus. I’m really looking forward to hosting a meetup at IIT Gandhinagar in the near future! I’m sure many engineers would be excited to visit an IIT. 😉

Mistakes on my part

  • I didn’t manage to get a single picture of the talk.
  • I forgot to bring any laptop stickers for the students.

That’s it

Being a non-IIT grad, it was surely an exciting experience to be speaking to IIT students about the technologies that are changing the landscape. IIT Gandhinagar’s location is pretty amazing and far, far away from any sort of pollution one can think of - hence my kind of place. 😉

Until next time… 😄

2018 and my Europe visit

Series: Speaking

This year was my first time at DevConf and also my first time speaking at a conference. It was attended by 1500+ people! That’s massive, isn’t it? I was super excited from the moment my talk was accepted.

Note: This is not an event report. It’s more of an informal description about my experience attending the conf and visiting Europe!

Giving the talk

I was speaking about CentOS Container Pipeline, the project I have been working on for some time at Red Hat. So far we had been quietly working on developing the service, but at DevConf, we wanted to talk about it and tell more people that we’re ready to on-board them to our service. In a nutshell, the motto of our project is to help open-source projects build and update their container images so that they can focus on developing awesome stuff instead. 😉

Since it was my first time speaking at a conference, I was obviously nervous. To top that, I saw some really well-versed fellows in the list of people interested in attending my talk. I was excited and nervous at the same time! To my surprise, my talk managed to get pretty high in the list of popular talks. At its peak, it was in 13th spot before sliding to 21st on the popular talks page. There were about 250 talks in all.

Other than going a bit fast, I think it went well. One of the tips I received for the talk was to “learn to breathe while speaking.” 😆

Here’s my talk that explains what we provide to any open-source project that joins us:

I got some interesting questions at the end of the talk and, in general, people loved our work. A few people even wanted to do an on-premise installation of our service. It’s out there on GitHub so practically anyone can do that, but we haven’t put effort into deploying it beyond our infra, so YMMV! 😆

People interactions

Besides the talk, I had a great time talking with people. I’m more of an introvert, but I love talking with people; it’s just that starting a conversation is extremely difficult for me! So giving a talk sort of helped me have a topic to discuss with at least those people who attended it (my talk had about 100 attendees). Besides that, various colleagues whom I knew only by name were also present at the conf.

At the end of the conference, I think I had socialized about 3 months’ worth of my average socialization in just 3 days! Since I love talking with people and giving talks (I do meetup talks on a regular basis), this was the most positive takeaway from the trip!

Travelling Europe

If you have seen the iconic Bollywood movie “Dilwale Dulhania Le Jayenge”, you would know that most of us in India are really fascinated with the idea of traveling to Europe. 😉

I traveled to a few cities on the sidelines of the conference. My friends who had been there earlier had warned me about how cold it might get in the part of Europe I was visiting. Fortunately, it didn’t get so cold and I ended up loving the weather instead! In the part of the world where I live, we don’t feel winters colder than around 10 °C. Even then, afternoons generally end up getting warmer.

But in Europe, it was cold by my standards for the entire day. And I absolutely loved it. To be honest, I was not as excited about the cold beforehand as it might seem from this post. I was planning to cut my trip short and come back to India just because the weather conditions seemed so scary to me (my team’s still making fun of me for this). Glad I didn’t make that change.

The properly laid-out public transport systems in all the cities I visited (Brno, Prague, Budapest and Vienna) were a surprise to me. The accuracy of Google Maps in India is somewhat flaky at times, but in Europe it was the most used and relied-upon app for me. I was shocked when Google Maps even pointed out that a tram/metro was running a minute or two behind its scheduled time.

European architecture has always fascinated me. It’s elegant and beautiful in its own way. Places like Prague Castle, the Parliament building in Budapest, and even your average buildings across the cities left me dumbstruck on multiple occasions.

History is another reason why I was always so fascinated by the idea of visiting Europe. From what I could gather during this short and haphazard trip, World War II has left a lasting impact on Europe. I did study it during my schooling, but from an Indian point of view. In Europe, you get to know that even after seven decades, World War II is still among the most crucial events in history.

European countryside is something you normally keep wallpapers of (because wallpapers of Maria Sharapova, Yami Gautam and the likes are distracting.) I didn’t get to explore much of European countryside during the visit. And since it was winter, I didn’t feel disappointed about it. Had it been European summer, I’d have been cursing instead. This is something I want to be able to do if I get a chance to visit Europe again.

That’s it

Okay, I need to stop somewhere. My trip to the conference and Europe in general was filled with lots of experiences and if I try to write it all down, it’ll get too long.

Knowing more about European history by visiting various memorials, palaces, churches, etc. is one thing I wish I had done better. Tasting different beers is something I did well on my first visit already. Realizing I could enjoy cold weather (not Antarctic cold!) was a pleasant surprise! Being a pure vegetarian guy who doesn’t even eat eggs was a major pain in the ass. Meeting so many awesome fellow Red Hatters was amazing.

Hopefully, I’ll be writing about DevConf 2019 and maybe FOSDEM as well in about a year’s time. 😉

Until next time… 😆

Set up a 2-node OpenShift cluster

Recently I started working more than usual on OpenShift. I’ve contributed to the minishift project and also spoken about OpenShift at a local meetup. But now we’re planning to move to a microservices architecture based on OpenShift.

I use CentOS’s DevCloud infrastructure to set up test instances. And it is on the same infra that I brought up my first real OpenShift cluster, deployed using openshift-ansible. I used this hosts file along with the playbook available under the byo directory for OpenShift 3.6.1. It’s a 2-node cluster where one system behaves as the master and the other as a node.


For a smooth installation and setup, I had to ensure a few things like:

  • Install NetworkManager and firewalld on both the nodes.

  • Start NetworkManager and firewalld manually on both the nodes. To get this working, I had to set SELinux to permissive. I didn’t dig much into this but I think with proper context, I could have got it working in Enforcing mode as well.

  • Ensure that /var partition has about 40 GB of free space. I think the requirement is 15 GB on master and 40 GB on node. But I ensured 60 GB of space for /var on both nodes.

  • Modify the /etc/hosts file on both nodes so that they are able to access each other by their hostnames.

  • Install the RPM python-rhsm-certificates so that Ansible can pull an image from
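For the /etc/hosts step above, entries like the following are enough. The hostnames and IP addresses here are made up for illustration; substitute your own:

```
# /etc/hosts on both nodes (illustrative addresses)
192.168.100.10   openshift-master.example.com
192.168.100.11   openshift-node.example.com
```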

After finishing these steps, perform the installation:

$ ansible-playbook openshift-ansible/playbooks/byo/config.yml

I did it off the release-3.6 branch.

Post-install setup

Add a new user

Having used a simple OpenShift cluster in the past, it was a bit of a struggle this time to get into the OpenShift console. minishift handles this very nicely so that the end user doesn’t have to bother.

It also took me some time to add a user to the cluster. I probably couldn’t find my way around the huge OpenShift documentation.

We just need to execute the commands below as the system:admin user:

$ oc create user dharmit --full-name="Dharmit Shah"
$ htpasswd /etc/origin/master/htpasswd dharmit

# and if you want the user to have cluster-admin privileges
$ oadm policy add-cluster-role-to-user cluster-admin dharmit

This user can now log in to the OpenShift web console using the credentials you’ve just set. It took me really long to get to this step!

Deploying an image to OpenShift cluster

Now I wanted to run a simple beanstalkd image on the OpenShift cluster. All I did was use to build an image and pull it. After successfully logging in and navigating to the project page, OpenShift shows an option at the top called “Add to Project”. I chose the “Deploy Image” option from it and gave the name of the image to be deployed.

OpenShift automatically created a DeploymentConfig and a service based on the metadata of the image. It also displays the name of the service, which can be used by other objects in the same project to access beanstalkd seamlessly!
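Roughly, the generated service looks like the sketch below. This is illustrative; the actual generated object carries more metadata, and the DeploymentConfig is omitted for brevity. beanstalkd’s default port is 11300:

```yaml
# Illustrative sketch of the service that "Deploy Image" generates.
apiVersion: v1
kind: Service
metadata:
  name: beanstalkd
spec:
  selector:
    app: beanstalkd
  ports:
  - port: 11300
    targetPort: 11300
```

Other pods in the project can then reach the queue at beanstalkd:11300 by service name.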

Still learning

I’m still learning OpenShift through the docs. The Persistent Volumes concept has been interesting! I’m working on creating an architecture wherein various workers can run in tandem as containers and read/write things to a remote NFS server configured as a PV.
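A minimal sketch of such an NFS-backed PV, assuming a hypothetical NFS server (the server address, export path, and size below are placeholders):

```yaml
# Hypothetical NFS-backed PersistentVolume for the shared worker storage.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: worker-shared-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany        # multiple worker pods read/write the same volume
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com
    path: /exports/workers
```

Workers would then claim this through a PersistentVolumeClaim with the same ReadWriteMany access mode.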

There’s a lot to learn and do before we can achieve a microservices-based architecture. I’ll try to keep this space updated. 😉

An Extra Mile

Last week, most of my time was occupied in debugging an issue whose cause was not obvious when we hit it. A teammate of mine opened a pull request that failed the CI check every single time. He updated the PR about four times, but CI never felt satisfied. Finally, when there was no more code to add to the PR and CI seemed stubborn, there was a need to dig deeper.

A few points (disclaimers, maybe):

  • Our project’s CI check is not very comprehensive. It’s mostly a bunch of functional tests that were written six months back, with the then-master branch as the base for the tests.

  • There are no unit tests.

  • CI check prints a lot of logs and does that only when the check fails. When everything’s OK, it doesn’t print more than basic information. When something fails assertion checks, it prints logs like crazy:

    • logs from Ansible provisioning
    • logs from all workers in our service
    • nothing in a structured manner
  • Tests were written by a single developer with minimal input from others in the team. Although that developer did a great job, we didn’t add more than a few tests or refactor the existing ones.

I had played almost no role in coming up with the tests. So when I had to go figure out why the CI checks were failing, I had to learn from scratch how the current setup worked.

Our code-base takes a good deal of time to get deployed in the test environment. We use an Ansible playbook for deploying things in the dev, pre-prod and prod environments. Deployment on freshly provisioned nodes would take up to 30 minutes, while deployment on existing VMs would take about 15 minutes.

After doing the deployment, I had to go to IPython shell and create a few dictionaries that were then served to the test suite. These dictionaries mainly contained IP addresses of the VMs on which tests were to be executed. Finally, the test suite, created using nosetests, was executed.

Sure enough, when the test suite was executed, I was seeing results in the dev environment similar to those in CI. The logs made no sense! Assertions failed, and it seemed like they were getting executed before the tasks finished. All I was wondering was: how are assertions going to succeed when they check the result of a task that’s still not finished? And it seemed that way to my teammates as well.

I was about to tell the team that this process is becoming frustrating and not taking us anywhere when a strange voice in my head asked me to look one more time - the extra mile!

It took me about 15-20 minutes to find an error in one of the worker services. Surprisingly, the error didn’t appear in the CI logs! Nor did it appear in the log file we generate for every job that passes through the service. It might seem obvious that I should have checked that particular service right from the beginning. I didn’t do that because our team was living under the assumption that we were spitting out logs for every service in the CI tests, especially when something fails.

Key lessons that I took away:

  • There’s immense scope for improvement in our existing CI framework.

  • We’re doing ourselves and our users a major disservice if we keep focusing on adding features and not on fixing a core part like tests.

  • Log aggregation needs to improve big time. More importantly, we need to ensure that useless actions are not logged!

  • Break down the CI check into multiple pieces. One Jenkins job can do the deployment, another can run unit tests, yet another the functional tests, and so on. If any one of the jobs fails, we stop the entire thing. That helps us save time and, more importantly, figure out exactly which piece is failing.

  • Use less Python magic wherever we can. For example, the Ansible playbook can be executed directly from the shell instead of via Python.

  • Focus on this right away instead of waiting for a major issue to bring things down!

  • Going an extra mile is often rewarding. 😉
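The “break the CI check into multiple pieces, stop on the first failure” idea above can be sketched in shell. The stage commands below are placeholders (true/false stand in for real deploy/test jobs):

```shell
#!/bin/sh
# Hypothetical staged CI runner: run stages in order, abort on the first failure.
run_stage() {
    name="$1"; shift
    echo "== $name =="
    if ! "$@"; then
        echo "stage '$name' failed; skipping remaining stages"
        return 1
    fi
}

# 'true'/'false' stand in for the real deployment and test commands.
run_stage "deploy" true && \
run_stage "unit tests" true && \
run_stage "functional tests" false && \
run_stage "report" true

echo "pipeline finished"
```

Because the stages are chained with `&&`, the “report” stage never runs once the functional tests fail, which is exactly the time-saving behavior described above.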

I wrote this blog to bring out my thoughts. If, in the process, it helps someone else, I’d be surprised! But if you have any suggestions or comments about how we can improve, please let me know in the comments.

Until next time. 😄

Teach myself Go through a side project - Hacker News on Terminal

I have had a pretty bad track record at learning new programming languages by myself, at consistently writing a blog, and at continuing to work on (or staying excited about) whatever side project I’m doing. That sounds like having a bad track record at pretty much everything that matters in the world of open source and programming. 😞

It took me really long to learn Python. I remember starting from print Hello World a number of times because it had been really long since I last wrote anything in Python. I’ve been bad at coming up with side project ideas or contributing to many open source projects. It wasn’t until my job required me to write Python full time that I started getting good at it. 😏

The same goes for Go. I have been wanting to learn it since mid-2015. Coincidentally, the job change I made around that time required me to learn Go along the way. But I left that job too soon and moved back into the Python world. I did contribute a little something to a Go-based project that I like. And, secretly, I feel happy when every release announcement of the project mentions the command I contributed. But that’s pretty much it.

So what the heck am I gonna do now? Well, turns out I’m going to try yet another time. 😄

Recently I came across this tweet, which resonated with me in the opposite way of how it was intended. 😉:

We all have some spare time, and I figured I could use some of mine to write something in Go.

So, I planned to do a tiny side project that’s not going to be of much use to anyone except me. But it will help me learn Go, a skill I’ve long wished to add to my profile. This time, I’m going to try and blog as I make progress on the project. That way, I might actually end up making a helpful contribution to myself.

The project’s hosted on GitHub and, at the moment, has nothing but a short README of planned features.

Hope to take this to a closure. Best of luck to me! 😉

Udaipur Trip 2017!

This Diwali, my wife and I decided to visit Udaipur. We started on the morning of New Year’s day, which is celebrated right after Diwali in Gujarat.

Day 1

mostly travel and getting trapped in city traffic

On the way, we visited a Jain tirth called Rishabhdev, which seemed more of a Digambara tirth than a Shwetambara tirth. The temple was nice. I had already been there once, more than 10 years back, but didn’t remember much of it.

After reaching Udaipur around 5 PM, we went out to have food. And in the quest of visiting a couple of places of interest, I got us trapped in city traffic near Fateh Sagar Lake. I took the car into really narrow lanes which were not well suited for driving a car through. 😉

Day 2

we absolutely regretted visiting Udaipur

Our first full day in Udaipur started with breakfast with our hosts (we booked homestay via Airbnb).

Our first stop was Sajjan Garh Palace, or the Monsoon Palace. Apart from the view from the top and an ambient cafe, we didn’t find anything really interesting in the palace itself. The kullad chai at the cafe was nice!

Our next stop was the Sajjangarh Biological Park, which is just beneath the Sajjan Garh Palace. The park itself is pretty nice, but the crowd and the time of day ensured that we wouldn’t see many animals. Those we saw were in rest mode after their lunch. Apparently, like humans, even animals prefer having a nap after lunch. 😉

The biological park is pretty big and, considering the hot weather we visited Udaipur in, it was only wise to roam around the park in a golf cart.

Next we went to Gulab Bagh, which is the biggest garden in Udaipur. We wanted to visit the Gulab Bagh Public Library inside the garden, but it was closed, maybe due to the Diwali holidays. Gulab Bagh was nice to walk through. It’s a lush green and peaceful area in the most visited part of the city.

From Gulab Bagh, we walked to Udaipur City Palace, which is probably the most visited spot in the city. The rush inside the palace museum was too much for my wife and me to handle. And there was persistent nuisance from some fellow tourists, a group of 6-7 guys. It got suffocating due to the rush, and we decided to leave the palace after seeing only a couple of rooms in the museum.

Thoroughly disappointed by the insane crowds we saw that day, my wife and I went back to the homestay and decided to rest the next day.

Day 3

it was surprisingly awesome!

We had decided the day before to relax in the room and not go out to any tourist spots in the city. Out of habit, I woke up early and decided to read a book on Mewar history that I found at the hosts’ place.

A few pages into the book, I got really excited reading about the history. By the time my wife woke up, I had mentally decided not to relax on this day. 😄

After some discussion with our host, we decided to visit the Eklingji temple because of the historical significance it has in Mewar. The temple is about 25 km from Udaipur, and our host told us that it might not be as overcrowded as the city itself. We made sure to visit during the darshan timings so that we could take a look at all the parts of the temple. It turned out to be a really good decision on our part, and we also got a book from outside the temple that detailed more historical facts about it.

While checking out the places near Eklingji, I found that there are two more temples which are as old as Eklingji (built in the 8th century CE) or older than it. These were:

  • Sahastra Bahu temples
  • Shantinath Jain temple

While the Sahastra Bahu temples were built in the early 10th century CE, the Jain temple was built around 3000 years back. The history, carvings and landscape of these temples completely amazed us. And they were not even 10% as crowded as the main city.

I’m glad I read up some history and ended up visiting places of historical significance. We surely witnessed some amazing architecture that has stood through the centuries.

In the evening we visited Maharana Pratap Smarak. It is a museum full of history and miniatures of Chittorgarh, Kumbhalgarh, the City Palace, and the battlefield of Haldighati. The view from the top of Moti Magri hill, where the museum rests, is also nice!

Day 4

we were left dumbstruck

While reading about Mewar’s history the previous day, I had stumbled upon Jaisamand Lake. Since it was only a few hours’ detour on the highway back to Ahmedabad, we visited this lake on the last day.

The road from Udaipur to Jaisamand Lake is absolutely amazing. It’s got a lot of turns and a terrain that rises and falls very, very often. Driving a car on this road was nothing short of a pleasure!

Upon reaching Jaisamand Lake, we were completely blown away by its expanse. It is an artificial lake and the second-largest of its type in India (per some Udaipur locals, the second-largest in Asia). There’s also an island in the lake which hosts a resort. It should be interesting to stay there. 😉

It surprised us how they created such a huge lake so many years back! Technological advancements back then were almost nonexistent, yet the constructions (lakes and palaces) from that era are surprisingly amazing.

That’s it!

Udaipur is about a 4-5 hour drive from Ahmedabad. The roads are pretty good but, if you’re driving at night, beware of the potholes that show up unexpectedly on the highway. There’s a good number of restaurants along the highway.

We plan to visit Udaipur again during the off-season. I’ll be updating the post with a few pics of the places I’ve mentioned. Until next time! 😄

Talking Python with students

Series: Speaking

Those who know me are aware that I’ve given talks at various meetups in the past. I’ve presented at colleges as well, but I’ve never written about it. This time, I decided to write down my thoughts (in general). I’d also write about my experience talking at Nirma University, Ahmedabad over the past two days about career options for Python programmers.

One of the reasons I’m always eager to talk at colleges/universities is that it gives me an opportunity to go back to that lively part of the world where I myself had a great time. School and college days are when most of us make our best friends and memories. I love and enjoy interacting with the students and, more often than not, end up learning something from their questions and feedback.

Another reason I like to go there and talk is that I see a pretty big gap between students’ knowledge and industry expectations.

General thoughts (not specific to any college/university)

There’s no shortage of engineers in India, for pretty much any domain. But most companies don’t hire because they don’t find students’ knowledge up to the mark. Students excel at academics but have little or no clue of what’s really going on in the industry.

For example, Python has been gaining momentum in the Indian IT industry over the past few years. And in spite of it being such a beautiful and easy-to-learn programming language, numerous students are unaware of it. I’ve interacted with a number of students in the past few years, and more than a few have mentioned that they get so scared of programming while studying C and C++ that they contemplate giving up on programming! In the worst case, some even make up their mind to do an MBA instead. Most, if not all, have felt that Python should be the first programming language taught in college/university instead of C. A debatable topic, nonetheless.

Having studied these programming languages myself, I agree with the notion of making Python the first programming language to be taught instead of C. It helps get students excited about programming. It would then make more sense to teach C and C++.

Experience at Nirma University


Nirma University was my dream university as a 10+2 student. But, due to health reasons, I was bedridden for almost four months before the final (board) exams and didn’t get to work hard enough to realize that dream. Naturally, I was super excited about giving a talk at Nirma! 😄

The students I interacted with were from Electronics & Communication (EC) branch and had a fractional course on Python in their fifth semester curriculum. As a part of it, I was invited as an industry person to talk to them about how Python is used in industry and what kind of jobs/roles/opportunities are open for Python programmers.

To be honest, my favorite part is talking to students about the plethora of opportunities that Python, Linux and open source in general can open up for them.

Topics I covered (link to slides):

  • History of Python
  • Which organizations use Python and how they use it
  • Detailed discussion about
    • Web Frameworks
      • How Instagram uses Django. Link to their engineering blog
    • How hot the domain of Data Science is and awesome collection of projects under the SciPy ecosystem
    • How Machine Learning and Artificial Intelligence are changing things. I recommended that they watch Person of Interest and Eagle Eye to get a better understanding. 😉
    • Python for Embedded Systems
  • How to leverage MOOC content to learn more about Python and its applications!
  • …and more

While talking about web frameworks, I explained what backend and frontend are. We take the 24x7x365 availability of our favorite apps (WhatsApp, Instagram, FB, Twitter, etc.) lightly; even for granted. With examples, it was easy to explain to them that it’s not really an easy task and it requires a good deal of engineering effort. Ironically, WhatsApp faced glitches that same afternoon, and the hashtag #whatsappdown was trending in India for almost the rest of the day! I guess they’ll never forget frontend-backend for the rest of their careers. 😉

That’s it!

I’ve uploaded the slides to Slideshare. If you are one of the students from Nirma University who attended the seminar, consider providing your feedback via the form at the end of the slides.

Let me know what you think in the comments below! Until next time. 😄

Use 'git subtree' to create new repository from a sub-directory

As a part of my ongoing Ansible blog series, I plan to walk through the provisioning code of the CentOS Container Pipeline service, which lives under the provisions directory.

My plan:

  • Walk through the existing code base
  • Make changes to the code without touching my fork of the repo
  • Push and test the new code

For the second part, i.e., modifying the code without touching my fork, I needed to create a new git repository from the said provisions directory.

A bit of Google-fu and this stackoverflow answer came to the rescue! All I did was use this git subtree command to create a new branch that holds the code I’m interested in:

$ git subtree split -P provisions -b provisions-only

Here, provisions is the directory I want to create a new git repo out of, and provisions-only is the branch that will hold the code that’s inside the provisions directory.

Create a new directory outside the directory holding my fork of the repo and do git init:

$ mkdir ~/repos/cccp-provisions
$ cd ~/repos/cccp-provisions
$ git init

And git pull from the fork to the new directory:

$ git pull ~/repos/container-pipeline-service provisions-only

Here, ~/repos/container-pipeline-service is the path to the forked repo that I don’t want to touch and provisions-only is the branch in that repo that holds the code I’m interested in.

Whenever there’s a change in the big repo (container-pipeline-service) and I want to copy the changes to the cccp-provisions repo, all I need to do is execute the same git subtree command I used above in the big repo, and do git pull --rebase in cccp-provisions:

$ cd ~/repos/container-pipeline-service

$ git diff --unified=0
diff --git a/ b/
index fd32557..fb14f4f 100755
--- a/
+++ b/
@@ -0,0 +1,2 @@
diff --git a/provisions/ b/provisions/
index f956524..67f70dc 100644
--- a/provisions/
+++ b/provisions/
@@ -0,0 +1,2 @@

$ git add -A

$ git commit -m "Test"

$ git subtree split -P provisions -b provisions-only
Created branch 'provisions-only'

$ cd ~/repos/cccp-provisions

$ git pull --rebase ~/repos/container-pipeline-service provisions-only
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /home/dshah/repos/container-pipeline-service
 * branch            provisions-only -> FETCH_HEAD
Updating d203d40..3dc4752
Fast-forward | 2 ++
 1 file changed, 2 insertions(+)

What I did above was: modify two files, one inside the provisions directory and one outside it; commit the changes to my master branch; and, using the git subtree command, create a provisions-only branch like we did earlier. When I go to the cccp-provisions directory and do git pull --rebase, git knows that only one of the two files modified in that commit is of interest to us, and so it changes only one file:

Fast-forward | 2 ++
 1 file changed, 2 insertions(+)

That was pretty cool! Now I can go ahead and walk through the provisioning bits and also make changes to them without affecting my fork of the repo.
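If you’d like to try the whole split-and-pull workflow end-to-end without touching a real repo, here’s a self-contained sketch that reproduces the steps above in throwaway directories under a temp dir. All paths, file names and commit messages here are made up for the demo:

```shell
set -e
WORK=$(mktemp -d)

# A stand-in for the big repo, with a provisions/ subdirectory.
git init -q "$WORK/big"
cd "$WORK/big"
git config user.email "demo@example.com"
git config user.name "demo"
mkdir provisions
echo "top-level file" > README
echo "- hosts: all" > provisions/site.yml
git add -A
git commit -qm "initial commit"

# Split provisions/ into its own branch, as in the post.
git subtree split -P provisions -b provisions-only

# Pull that branch into a brand-new repo.
git init -q "$WORK/small"
cd "$WORK/small"
git config user.email "demo@example.com"
git config user.name "demo"
git pull -q "$WORK/big" provisions-only

ls   # only site.yml shows up; README stayed behind in the big repo
```

Note that git subtree ships with git’s contrib tools, so it’s available in the stock git packages of most distros.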

Ansible Series: Introducing Playbooks

Series: Ansible

In previous post, we did a combination of trivial and non-trivial tasks on the command line:

  • install httpd package on the servers in group servers,
  • modify a line in /etc/httpd/conf/httpd.conf file to set our desired value for port and
  • set the httpd service into restarted mode.

At the end of the post, I mentioned that we can automate these tasks by writing a playbook. That’s what we’ll do for this post.

Ansible Playbooks

As the official documentation suggests, playbooks are Ansible’s configuration, deployment and orchestration language. The docs provide a very nice analogy as well:

If Ansible modules are the tools in your workshop, playbooks are your instruction manuals, and your inventory of hosts are your raw material.

With the help of playbooks, we can:

  • download and install packages
  • configure services
  • start/stop/restart server processes
  • perform rolling upgrades
  • interact with load balancers and monitoring systems

Playbooks can be used for a wide variety of things and can have all sorts of tasks written in them; covering every aspect of them here would be nearly impossible.

In this post, we will cover the basics by creating a playbook out of the tasks we performed in the last post.

Playbooks are written in YAML format. Each playbook can have one or more ‘plays’ in it. Every play is targeted at a group of hosts (servers in our example).

Let’s create a playbook (store it in, say, playbook.yaml file) for the tasks we performed in previous post:

- hosts: servers
  tasks:
      - name: Install httpd
        yum: name=httpd state=present

      - name: Change port to listen on
        lineinfile:
            path: /etc/httpd/conf/httpd.conf
            regexp: "^Listen"
            state: present
            line: "Listen {{ http_port }}"

      - name: Restart and enable the service
        systemd: name=httpd state=restarted enabled=yes

Here we have written only one play, and it is for the group servers. tasks is a list of the ad-hoc Ansible commands that we want to execute on the remote hosts in the group servers. Earlier we executed these ad-hoc commands from the command line with the ansible command. There are three tasks in this playbook:

  • The first task uses the yum module to install the httpd package.
  • The second task uses the lineinfile module to replace a specific line in the file /etc/httpd/conf/httpd.conf with the one we’re interested in.
  • The third task restarts and enables the httpd service using the systemd module.
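The {{ http_port }} in the second task is a host variable picked up from the inventory, as discussed in the inventory posts. A minimal sketch of what that inventory could look like (the addresses below are placeholders; the aliases and ports match our earlier examples):

```ini
[servers]
host1 ansible_host=192.0.2.10 http_port=8080
host2 ansible_host=192.0.2.11 http_port=9000
```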

Let us first uninstall the httpd package from the remote systems. This is not strictly necessary to run the playbook, but we’re doing it to validate that all the tasks we did earlier are performed as expected by the playbook.

$ ansible -m yum --args="name=httpd state=absent" servers

And now run the playbook:

$ ansible-playbook playbook.yaml

PLAY [servers]

TASK [Gathering Facts]
ok: [host1]
ok: [host2]

TASK [Install httpd]
changed: [host1]
changed: [host2]

TASK [Change port to listen on]
changed: [host2]
changed: [host1]

TASK [Restart and enable the service]
changed: [host1]
changed: [host2]

PLAY RECAP
host1                      : ok=4    changed=3    unreachable=0    failed=0   
host2                      : ok=4    changed=3    unreachable=0    failed=0

See how the name we set for every task in the playbook.yaml file is shown in the above output. As an aside, execute the very same command again and observe the output. Everything that showed changed above will instead show ok, because nothing changed on the remote systems - everything was set up just a few seconds back. 😉

That’s it for this post

In upcoming posts, we’ll be creating an example application and deploying that using Ansible Playbooks. Although it won’t be as complex as most real-life applications and their deployments, it’ll give a fair idea of how Ansible can be used to deploy non-trivial applications.

If you have any feedback/suggestions, leave it in the comment section at the bottom of the post. Until next time. 😉

Ansible Series: The inventory file - variables, aliases and more

Series: Ansible

In the previous post on the inventory file, we saw that the inventory file is the central location that stores information about remote hosts. But it’s not that we always want to deal with remote hosts; Ansible can also work on the control system (localhost).

$ cat /etc/ansible/hosts


And let’s try the ping module:

$ ansible -m ping all
localhost | UNREACHABLE! => {
    "changed": false, 
    "msg": "Failed to connect to the host via ssh: Permission denied
    "unreachable": true

} | SUCCESS => {
    "changed": false, 
    "ping": "pong"

} | SUCCESS => {
    "changed": false, 
    "ping": "pong"


Oops, that failed to ping localhost - the only ping that we would expect to work even if everything else failed. Look at the msg part of the output: Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic). Why would we want to connect to localhost over SSH at all? Let’s set ansible_connection in the inventory file:

$ cat /etc/ansible/hosts
localhost ansible_connection=local


That tells Ansible to connect to localhost using connection type local. So, no more SSH’ing into localhost:

$ ansible -m ping all 
localhost | SUCCESS => {
    "changed": false, 
    "ping": "pong"

} | SUCCESS => {
    "changed": false, 
    "ping": "pong"

} | SUCCESS => {
    "changed": false, 
    "ping": "pong"


Voila! That was successful.

As you might have guessed by now, the default ansible_connection type is SSH. It can be changed by setting ansible_connection to a desired (and valid) value in the inventory file. Other connection types include:

  • SSH protocol types
    • smart (default)
    • ssh
    • paramiko
  • Non-SSH connection types
    • local
    • docker

Throughout the series, we’ll mainly be working using the default SSH connection type. We’ll be deploying containers as well and will see if we need to use the docker connection type then. 😄

Host Variables

We can easily assign host variables in the inventory file. These variables can then be used in playbooks.

$ cat /etc/ansible/hosts
[servers]
 http_port=8080
 http_port=9000

The above snippet sets http_port to a different value for each of the two hosts. Let’s bring up the httpd webserver on these hosts on the specified ports.

$ ansible -m yum --args="name=httpd state=present" servers

$ ansible -m lineinfile --args="regexp=\"^Listen\" path=/etc/httpd/conf/httpd.conf state=present line=\"Listen {{ http_port }}\"" servers

$ ansible -m systemd --args="name=httpd state=restarted" servers

That’s a handful of commands to execute. Let’s see what each one does.

  • First, we install the httpd package using the yum module. state=present installs the package if it’s not already installed on the system.
  • Next, we replace the line in the file (lineinfile) that starts with Listen (regexp="^Listen"). The \ characters are escapes so that a " is not treated as a closing quote.
  • Finally, we restart the httpd service on those systems using the systemd module.

Now if we curl these systems, we’ll be greeted with the default Apache web server page. All we need to do is curl or curl

Group Variables

If we’d like to set variables for an entire group, servers in our example, all we need to do is:

$ cat /etc/ansible/hosts


And then executing commands similar to those we ran earlier would start the httpd server for us on port 8080 of both systems.
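A minimal sketch of group variables in INI inventory syntax - the [servers:vars] section applies http_port to every host in the group (the addresses below are placeholders):

```ini
[servers]
192.0.2.10
192.0.2.11

[servers:vars]
http_port=8080
```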


Aliases

If you’ve used the Linux command line for some time, chances are you’ve already heard of alias. It’s a similar concept in Ansible as well.

An alias helps you define a name for a host. In our example, we’ve specified IP addresses of the hosts in our inventory file. If we’d like to ping only one system, we’d have to do:

$ ansible -m ping

That doesn’t look cool. Let’s set aliases for our two hosts:

$ cat /etc/ansible/hosts
host1 ansible_host=
host2 ansible_host=

And now we can simply do:

$ ansible -m ping host1

That’s it for this post

In this post, we looked at some non-trivial ad-hoc Ansible commands. That looks cool, but executing a bunch of commands on the command line every time is not really fun. Instead, we can use Ansible Playbooks to perform all the tasks we did above with a single command. We’ll look at Ansible Playbooks soon!

As always, if you have any comments/feedback/suggestion, please let me know below! Until next time. 😉

Ansible Series: The inventory file - working with remote systems

Series: Ansible

In this post we’ll talk about the concept of Inventory. When we installed Ansible, an example inventory file was automatically placed at the location /etc/ansible/hosts for us. This is the default inventory file for Ansible.

Inventory file

An inventory file consists of information about the various remote hosts that Ansible knows of. This file needs to be configured before we can start using Ansible to work with remote systems.

Hosts specified in the inventory file can either belong to a group or be ungrouped. A group is specified like below:


Ungrouped hosts should be specified before any grouped hosts. You can provide either the FQDN or the IP address of a host. Make sure the remote host(s) is/are reachable, via the FQDN or IP provided in the inventory file, from the system you’re using to run Ansible commands.
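To make that concrete, here’s a sketch of what such an inventory could look like - the host names and addresses are placeholders, not real systems:

```ini
standalone.example.com

[servers]
192.0.2.10
192.0.2.11
```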

The inventory file can be in different formats. The one you saw above, and the one we’re going to follow throughout the series, is the INI-like syntax (Ansible’s default). You can read about other formats in Ansible’s documentation.

Playing with remote systems

For this post, we’re going to work with two remote systems. You can set this up in various ways, e.g., by creating two Linux virtual machines on your laptop or with a cloud provider. Two systems I’m going to work with have IP addresses and

My inventory file (/etc/ansible/hosts):

$ cat /etc/ansible/hosts

Let’s start throwing some Ansible magic to them:

$ ansible -m ping all | UNREACHABLE! => {
    "changed": false, 
    "msg": "Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n", 
    "unreachable": true

    "changed": false, 
    "msg": "Failed to connect to the host via ssh: Permission denied
    "unreachable": true


Oops, that was embarrassing. Our first real Ansible command failed. 😢

Ah, our host system doesn’t have SSH access to the remote systems! We need to configure that first by enabling password-less SSH access from the host system to both remote systems. This can be done by creating SSH keys and copying them to the remote systems.
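A quick sketch of that key setup. To keep the demo harmless, the key here is written to a scratch directory; in real use you’d let ssh-keygen write to ~/.ssh (its default), and the user/host names passed to ssh-copy-id are placeholders for your own remote systems:

```shell
# Create a key pair in a scratch directory (in real use, accept the
# ~/.ssh default path and skip -f).
DEMO=$(mktemp -d)
ssh-keygen -t ed25519 -N "" -f "$DEMO/id_ed25519" -q

ls "$DEMO"   # id_ed25519 (private key) and id_ed25519.pub (public key)

# Then copy the public key to each remote host; you'll be asked for the
# password one last time (user/host names are placeholders):
#   ssh-copy-id centos@host1.example.com
#   ssh-copy-id centos@host2.example.com
```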

And then execute the same command again:

$ ansible -m ping all | SUCCESS => {
    "changed": false, 
    "ping": "pong"

} | SUCCESS => {
    "changed": false, 
    "ping": "pong"


Now that worked like a charm. One thing you need to ensure when configuring SSH access is that, by default, Ansible connects to the remote system via SSH as the same user you’re logged in as on the host you’re executing Ansible from. That means if you’re logged in as randomuser on the host system, Ansible will try to connect to the remote system as randomuser as well.

But what is the above command doing anyway? It uses the module ping on hosts that belong to the group all. Since SSH connectivity is OK, it gets a pong response and the result is SUCCESS.

But we didn’t configure an all group; we configured the servers group. By default, Ansible has two groups: all and ungrouped. Hosts that are not part of any user-defined group belong to the group ungrouped, while all contains every host in the inventory file.

Let’s do something else on these remote systems: yum -y update (assuming they are running CentOS/RHEL):

$ ansible -m yum --args="name='*' state=latest" all

We use the yum module of Ansible and ask it to work on all installed packages (name='*') so that they’re updated to their latest versions (state=latest). * is a wildcard that indicates “all installed packages” to yum.

Expect this command to take some time and print ugly output. It returns only when the updates have been downloaded and installed on the remote systems. The time it takes depends on the number of updates available, internet speed, and type of disk (HDD vs. SSD).

Let’s try the shell module, which executes the specified command on the remote systems:

$ ansible -m shell --args="date" all | SUCCESS | rc=0 >>
Fri Sep 29 15:00:30 UTC 2017 | SUCCESS | rc=0 >>
Fri Sep 29 15:00:30 UTC 2017

$ ansible -m shell --args="df -h" all | SUCCESS | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        10G  1.2G  8.9G  12% /
devtmpfs        900M     0  900M   0% /dev
tmpfs           920M     0  920M   0% /dev/shm
tmpfs           920M   17M  904M   2% /run
tmpfs           920M     0  920M   0% /sys/fs/cgroup
tmpfs           184M     0  184M   0% /run/user/0 | SUCCESS | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        10G  1.2G  8.9G  12% /
devtmpfs        900M     0  900M   0% /dev
tmpfs           920M     0  920M   0% /dev/shm
tmpfs           920M   17M  904M   2% /run
tmpfs           920M     0  920M   0% /sys/fs/cgroup
tmpfs           184M     0  184M   0% /run/user/0
tmpfs           184M     0  184M   0% /run/user/1000

The rc=0 that we see in the first line of output for both systems indicates that the return code of the executed command was 0 (meaning there were no errors).
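The same return-code convention applies on a plain shell, which is a quick way to build intuition for what Ansible is reporting here:

```shell
date             # a command that succeeds
echo $?          # prints 0, i.e. rc=0

ls /no/such/dir  # a command that fails
echo $?          # prints a non-zero return code
```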

That’s it for this post

There’s more to be said about the inventory file: host variables, group variables, aliases, and more. But I’ll keep that for the next post to avoid making this one too long. I personally get bored reading the really long posts written across the Internet. 😉

As always, if you have any comments/feedback/suggestion, please let me know below! Until next time. 😉

Ansible Series: Set things up

Series: Ansible

As I mentioned in my previous post, I will be starting an Ansible series. The best way to find all posts in the series is to use this link.

If you’re reading this post, chances are you already know what Ansible is and what it is used for. So, instead of repeating what numerous other posts across the Internet already say, I’ll briefly describe what we use Ansible for and how it helps us.

We use Ansible as a configuration and deployment tool. It helps us configure systems with the packages and tools needed to deploy the end product. We use Ansible Playbooks (more on this in later posts) to install packages, start services, and ensure things are up and running. We plan to use its “Continuous Deployment” feature to automate deployments to different environments (pre-prod, prod, etc.).

Let’s get into setting up Ansible on our system and performing some tasks with it. Ansible is generally used to configure remote systems, but for this post we’ll be using it on localhost only. That is, we’ll install it on localhost and perform operations on localhost as well.

We’re going to use a CentOS 7 system for the purposes of this series. Except for installation, all other steps should work just fine irrespective of the underlying distro.


$ yum -y install ansible

At the time of writing this, the above command will install version for us. If you’d rather install the latest (bleeding-edge) version, follow the steps below:

$ yum -y install epel-release
$ yum -y install python-pip
$ pip install ansible

The benefit of installing via yum is that packages in the official RHEL/CentOS repositories undergo a good deal of testing to ensure enterprise stability and security.

What do we get upon installing Ansible?

Once you’ve installed Ansible, type ansible on your command line and hit the Tab key twice to list all commands that start with ansible.

The ones we will be discussing in this series are:

  • ansible: runs a specified task on target host(s).

  • ansible-console: drops you into a shell that works as a REPL (Read-Eval-Print Loop). It allows running ad-hoc tasks against a chosen inventory.

  • ansible-doc: provides documentation on the command prompt. It’s really helpful for quick reference when we are not completely sure of something but have a rough idea.

    ansible-doc yum would print help for the yum module, whereas ansible-doc -s yum would print a snippet which can then be copied into a playbook and modified.

  • ansible-galaxy: helps us manage roles using Ansible Galaxy.

  • ansible-playbook: is the most interesting and the heaviest lifter of them all. It executes a playbook passed to it as an argument. We’ll have quite a few posts on this. 😉

  • ansible-pull: pulls playbooks from a VCS server and runs them on the machine executing ansible-pull. It helps invert Ansible’s default push architecture into a pull architecture.

  • ansible-vault: helps safeguard sensitive information stored in a data file used by Ansible.

Performing actions on localhost

As mentioned at the beginning of this post, we’ll perform some actions on localhost using the ansible command.

  • Install httpd on our CentOS system:

    $ ansible localhost -m yum --args="name=httpd state=present"

    This tells ansible to execute on localhost, use the module yum, and pass the additional arguments "name=httpd state=present" to the yum module. As a result of executing this command, the httpd package will be installed on localhost.

  • Start httpd server we just installed:

    $ ansible localhost -m systemd --args="name=httpd state=started"

    This tells ansible to start the httpd server we installed with the previous command. It does so using the systemd module.

    Check if it was actually started:

    $ curl localhost
  • Enable httpd server to start at boot time:

    $ ansible localhost -m systemd --args="name=httpd enabled=true"

    You can verify that the httpd server starts upon boot by rebooting localhost and then running the same curl command mentioned earlier.

That’s it for this post

In the next post, we will take a look at the concept of inventory in Ansible. If you have any comments/feedback/suggestions, please let me know below! Until next time. 😉

Writing Series of Technical Posts

When I decided to make some use of this domain, one thing (and probably the most important for me) that I had in mind was to write series of posts/articles on tech that I work on or that interests me. I had already decided on using Hugo but was not sure how to group articles into a series.

That’s when I came across Nate Finch’s website and read his post on how he managed to group articles into a series. Knowing how greatly I suck at frontend, I decided to clone his repo and modify things to suit my requirements. I didn’t want my discomfort with frontend to get in the way of writing!

In the coming few days, I’ll be writing a tech series, starting with Ansible. At the moment, I haven’t decided how many posts I’ll write, what specific topics I’ll dive into, or how frequently I’ll write. The only thing I’ve decided is to write.

Through writing, I want to share what I know (with the hope of being helpful to someone some day) and learn from the feedback/suggestions of readers.


This is not really my first attempt at blogging. I’ve tried and failed numerous times so far. 😃 This might or might not be another such attempt at writing.

I enjoy hosting and talking at meetups. During one such recent meetup, I volunteered to give a talk about technologies I had not played much with.

It was the result of challenging myself to prepare for that talk that I decided to start writing - as a challenge. And, like every other living creature, I hope to succeed. 😄

I intend to write about my learnings from the various technologies I play around with. I hope it helps someone else out there some time.

I don’t think of myself as an expert in any of the technologies I intend to write about; so if you have any comments/suggestions/feedback, please feel free to ping me. 😉