
Tech Stacks in Asia: Inside Appknox’s tech stack


Alexander Jarvis

Startup sucks and fundraising is a nightmare. I make awesome (allegedly) tools and write no BS content to help founders be more awesome and not get taken advantage of. If I can help, reach out.

For all the promise that exists for startups in Asia, there persists a list of reasons why companies can’t succeed. Talent, and tech talent in particular, is an often-stated constraint. The reality is that there is homegrown talent kicking ass!

In order to showcase the talent that exists in Asia, I’m running a series of guest posts from awesome CTOs and their dev teams across Asia. My hope is that by showing the great talent that is already hacking away, we can inspire the next generation of great coders and draw more talent into the ecosystem (from big, boring corporations ;)). Also, if you are a dev looking to join the companies featured, reach out!

Our first episode is from Subho Halder and Dhilip Siva of Appknox.


Describe Appknox in 2-3 sentences.

Appknox helps developers and enterprises to uncover and fix security loopholes in mobile applications.

Securing your app is as simple as submitting your store link or uploading your app. We scan it for security vulnerabilities and report back to you with what we find.

What are your primary programming languages?

Our primary programming languages are Python and Shell for the back end, and CoffeeScript and LESS for the front end.

Our full stack is:

  1. Django
  2. MySQL
  3. RabbitMQ
  4. Celery
  5. Redis
  6. Memcached
  7. Varnish
  8. Nginx
  9. Ember

What is your architecture and how does it work?

Architecture

Here is a diagram of how it works:

[Figure: Appknox architecture diagram]

How it works

Our back-end architecture consists of 3 subsystems: Client, Data and Workers.

  1. Client Subsystem

The client subsystem consists of two separate groups of load-balanced, auto-scaling servers: App servers and Socket servers. This is where all user interactions take place. We took great care to avoid blocking calls here to keep latency as low as possible.

App Server:

Each App server is a single compute unit running Nginx and a Django app served by Gunicorn, managed by supervisord. User requests are served here. When a user submits the URL of their app, we push it onto the RabbitMQ ‘download’ queue and immediately let the user know that the URL has been submitted. When a user uploads an app instead, the browser fetches a signed URL from the server, uploads the data directly to S3 with it, and notifies the app server when it is done.
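The upload path relies on a URL that the server signs and the browser can use without holding any storage credentials. The sketch below shows the general idea with an HMAC over the object key and an expiry time. It is a simplified, hypothetical scheme, not S3’s actual signature format (real code would ask the AWS SDK to presign the URL); the secret, host and query parameter names are illustrative assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only; S3 uses its own signing scheme.
SECRET = b"server-side-secret"
UPLOAD_HOST = "https://uploads.example.com"


def _signature(key: str, expires: int) -> str:
    """HMAC-SHA256 over the HTTP method, object key and expiry timestamp."""
    msg = f"PUT\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_signed_upload_url(key: str, ttl_seconds: int = 900,
                           now: Optional[float] = None) -> str:
    """App server side: hand the browser a short-lived URL to PUT the app file to."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(key, expires)})
    return f"{UPLOAD_HOST}/{key}?{query}"


def verify_signed_upload_url(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept the upload only if the URL is unexpired and untampered."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    # Constant-time comparison so the signature cannot be guessed via timing.
    return hmac.compare_digest(signature, _signature(key, expires))
```

Because the file travels straight from the browser to storage, the Gunicorn workers never hold a large upload open, which fits the no-blocking-calls goal of the client subsystem.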

Socket server:

Each socket server is a single compute unit running Nginx and a Node (Socket.IO) server. This server uses Redis as its adapter. And yes, of course, it is used for real-time updates.

  2. Data Subsystem

This subsystem handles data storage, queuing and pub/sub, and it is what keeps the architecture decoupled.

Database Cluster:

We use MySQL. The cluster consists of a write-heavy master and a few read-heavy replicas.
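The post does not say how Django is pointed at the master and the replicas. One common way is a database router, listed in the `DATABASE_ROUTERS` setting, that sends writes to the master and spreads reads across the replicas. A minimal sketch, assuming hypothetical database aliases `default` (the master) and `replica1`/`replica2` (the replicas):

```python
import random

# Hypothetical aliases; they would have to match the keys in settings.DATABASES.
PRIMARY = "default"
REPLICAS = ["replica1", "replica2"]


class PrimaryReplicaRouter:
    """Django database router: writes go to the master, reads to a random replica."""

    def db_for_read(self, model, **hints):
        # Spread read-heavy traffic across the replicas.
        return random.choice(REPLICAS)

    def db_for_write(self, model, **hints):
        # All writes hit the single write-heavy master.
        return PRIMARY

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations across them are allowed.
        db_set = {PRIMARY, *REPLICAS}
        if obj1._state.db in db_set and obj2._state.db in db_set:
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate only the master; replicas receive schema changes via replication.
        return db == PRIMARY
```

It would be enabled with something like `DATABASE_ROUTERS = ["path.to.PrimaryReplicaRouter"]`. One trade-off to keep in mind: with asynchronous replication, a read issued right after a write may hit a replica that has not caught up yet.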

RabbitMQ:

The broker for our Celery workers. We have separate queues for different workers, mainly ‘download’, ‘validate’, ‘upload’, ‘analyse’, ‘report’ and ‘mail’. The web server puts tasks onto a queue, and the Celery workers pick them up and run them.
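With Celery, sending each task type to its own queue is typically done through the `task_routes` setting. The sketch below maps hypothetical task names (the post names only the queues, and the `sherlock` module name is an assumption) onto the six queues listed above. It is plain Python so it can be checked without a broker; the Celery wiring is shown in comments.

```python
# Queue names from the post; task names and the "sherlock" module are hypothetical.
QUEUES = ("download", "validate", "upload", "analyse", "report", "mail")

TASK_ROUTES = {
    "sherlock.tasks.download_app": {"queue": "download"},
    "sherlock.tasks.validate_app": {"queue": "validate"},
    "sherlock.tasks.upload_report": {"queue": "upload"},
    "sherlock.tasks.analyse_app": {"queue": "analyse"},
    "sherlock.tasks.generate_report": {"queue": "report"},
    "sherlock.tasks.send_mail": {"queue": "mail"},
}

# With Celery installed, the wiring would look roughly like:
#   from celery import Celery
#   app = Celery("sherlock", broker="amqp://guest@localhost//")
#   app.conf.task_routes = TASK_ROUTES
# and a worker pool dedicated to one queue would be started with:
#   celery -A sherlock worker -Q analyse


def queue_for(task_name: str, default: str = "celery") -> str:
    """Return the queue a task is routed to ("celery" is Celery's default queue name)."""
    return TASK_ROUTES.get(task_name, {}).get("queue", default)
```

Dedicated queues let each worker group (static scanners, other tasks) consume only the work it is sized for, so one slow queue does not starve the others.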

Redis:

This acts as the adapter for the Socket.IO servers. Whenever we want to notify a user of an update from any of our workers, we publish it to Redis, which in turn notifies all users through Socket.IO.
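On the Python side, notifying a user comes down to publishing a message to a Redis channel. A minimal sketch, assuming a hypothetical per-user channel name (`user:<id>`) and a JSON payload. The publish function is passed in so the helper can be exercised without a Redis server; in production it could be redis-py’s `Redis.publish`. Socket.IO’s own Redis adapter uses its own internal channel and message format, so this assumes a Node process that subscribes to the custom channel and relays messages, as in the socket-scaling story later in this post.

```python
import json
from typing import Any, Callable, Dict

# Matches the shape of redis-py's Redis.publish(channel, message) -> receiver count.
Publish = Callable[[str, str], int]


def notify_user(publish: Publish, user_id: int, event: str, data: Dict[str, Any]) -> int:
    """Publish a real-time update for one user; returns how many subscribers got it."""
    channel = f"user:{user_id}"  # hypothetical channel naming scheme
    message = json.dumps({"event": event, "data": data})
    return publish(channel, message)


# Production wiring (requires the redis package and a running server):
#   import redis
#   r = redis.Redis(host="localhost", port=6379)
#   notify_user(r.publish, 42, "scan_progress", {"percent": 60})
```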

  3. Worker Subsystem

This is where all the heavy lifting is done. All the workers get tasks from RabbitMQ and publish updates to users through Redis.

Static Scanners:

This is an auto-scaling group of compute units. Each unit runs 4-5 Celery workers, and each Celery worker scans a single app at a time. The main library at work here is Androguard.
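To give a flavour of what a static check looks like (this is not Appknox’s scanner), the sketch below flags two well-known risky settings in an Android manifest, `android:debuggable="true"` and `android:allowBackup` (which defaults to true), and lists the requested permissions. It assumes the manifest has already been decoded to plain XML; inside an APK it is stored as binary XML, which is one of the things Androguard decodes (for example via `androguard.misc.AnalyzeAPK`). As for running one scan per worker at a time, one common Celery approach is `worker_prefetch_multiplier = 1`, though the post does not say how Appknox configures this.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ANDROID_NS = "http://schemas.android.com/apk/res/android"


def _attr(elem: ET.Element, name: str) -> str:
    """Read an android:-namespaced attribute (ElementTree keys it as {namespace}name)."""
    return elem.get(f"{{{ANDROID_NS}}}{name}", "")


def scan_manifest(manifest_xml: str) -> Dict[str, List[str]]:
    """Return findings and requested permissions from a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    findings: List[str] = []

    app = root.find("application")
    if app is not None:
        if _attr(app, "debuggable").lower() == "true":
            findings.append("application is debuggable")
        # allowBackup defaults to true when the attribute is absent.
        if _attr(app, "allowBackup").lower() in ("", "true"):
            findings.append("application data can be extracted via backup")

    permissions = [_attr(p, "name") for p in root.findall("uses-permission")]
    return {"findings": findings, "permissions": permissions}
```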

Other tasks:

This is an auto-scaling group of compute units. Each unit runs 4-5 Celery workers that handle various tasks, like downloading apps from stores, generating and uploading report PDFs, sending emails, etc.

Dynamic Scanning:

This is platform-specific. Each Android dynamic scanner is an on-demand compute instance with an Android emulator (with SDKs) and a script that captures data. The emulator is shown on a canvas in the browser so the user can interact with it. Each iOS scanner runs in a managed Mac Mini farm with scripts and simulators that support the iOS platform.

Tell us some more about your stack.

Web frameworks

We balance vanilla code, for flexibility, against frameworks, depending on what we need to do. When a framework gets in the way of what we want to do and no other framework solves the problem, we go vanilla. Otherwise, why reinvent the wheel, right?

Data Layer

Our data layer consists of MySQL (Google Cloud SQL), RabbitMQ and Redis. I guess we don’t need to explain what does what.

Reasons for choosing the stack

Python – We chose Python because the primary libraries we use to scan applications are written in Python. Also, we love Python more than any other language we know!

Django – We chose Django because it embraces modularity.

Ember – We think this is the most awesome front-end framework out there. Yes, the learning curve is steeper than any other, but once you climb that mountain, you will absolutely love Ember. It is very opinionated, so as long as you stick to its conventions, you write less to do more.

Other – The rest are de facto choices: MySQL for relational data, RabbitMQ for task queues, Celery for task management, Redis for pub/sub, and Memcached and Varnish for caching.

Things don’t always go as expected: scaling sockets!

One of the things that didn’t go as expected was scaling sockets. We initially used Django-socket.io, but realised it couldn’t be scaled across multiple servers. So we rewrote that part as a separate Node module, using Node’s Socket.IO library, which supports a Redis adapter. Clients connect to the Node socket servers; our Python code now publishes to Redis, and Node simply pushes the notifications to the clients. This lets the socket servers scale independently of the app servers, which act as a JSON endpoint for the clients.
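Why a shared Redis channel fixes the multi-server problem can be shown with an in-memory stand-in. In the sketch below, `Broker` plays the role of Redis pub/sub and each `SocketServer` plays one Node/Socket.IO process with its own connected clients; the class names, channel name and `user_id|payload` message format are all hypothetical. A worker publishes once, and the user receives the update no matter which socket server they are connected to.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class Broker:
    """In-memory stand-in for Redis pub/sub: every subscriber of a channel gets each message."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Fan the message out to every subscriber; return the receiver count like Redis does.
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


class SocketServer:
    """Stand-in for one Node/Socket.IO process holding some client connections."""

    def __init__(self, name: str, broker: Broker, channel: str = "notifications") -> None:
        self.name = name
        self.clients: Dict[str, List[str]] = {}  # user id -> messages pushed to that user
        broker.subscribe(channel, self._on_message)

    def connect(self, user_id: str) -> None:
        self.clients[user_id] = []

    def _on_message(self, message: str) -> None:
        # Messages are "user_id|payload"; push only to users connected to this server.
        user_id, payload = message.split("|", 1)
        if user_id in self.clients:
            self.clients[user_id].append(payload)
```

Adding another socket server only means subscribing it to the same channel; the app servers and workers do not change, which is what lets the socket tier scale on its own.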

Which DevOps tools do you use?

The tools we use are Fabric, Jenkins, GitHub and Graphite. I think Fabric is the de facto tool for Python deployments.

For hosting, we are with Google Cloud. We use shell scripts to deploy Irene, our front end, to Google Cloud Storage.

The code is auto-deployed from the ‘master’ branch. We follow Vincent Driessen’s Git branching model. Jenkins builds every commit to the ‘develop’ branch. If the build succeeds, we do another round of manual testing, just to be sure, then merge into the ‘master’ branch, which gets auto-deployed.

Which part of your tech stack are you most excited about?

We love modular design! We took modularization so far that we decoupled our front end from our back end. Yes, you read that right: all the HTML, CoffeeScript and LESS code is developed independently of the back end! Front-end development does not require a server to be running; we rely on front-end fixtures for fake data during development.


Our back end is named ‘Sherlock’. We detect security vulnerabilities in mobile applications, so the name seemed apt. Sherlock is smart.

And our front end is named ‘Irene’. Remember Irene Adler? She is beautiful, colorful and tells our users what’s wrong.

And our admin is named ‘Hudson’. Remember Mrs. Hudson, Sherlock’s landlady? Come to think of it, we should probably have given a role to poor Dr. Watson. Maybe we will.

So, Sherlock does not serve any HTML/CSS/JS files. I repeat, it does not serve a single static or HTML file. Sherlock and Irene are developed independently, each with its own deployment process and its own test cases. We deploy Sherlock to Compute instances and Irene to Google Cloud Storage.

The advantages of this architecture are:

  1. The front-end team can work independently of the back end without the two teams stepping on each other’s toes.
  2. Heavy lifting like rendering pages is taken off the server.
  3. We can open-source the front-end code, which makes it easy to hire front-end developers: just ask them to fix a bug in the repo, and they are hired. After all, front-end code can be read by anyone even if you don’t open-source it, right?


Our one big recommendation

If we were to give one recommendation that others can follow, it would definitely be to decouple. Decoupling is the first step towards scaling your product.

We think open-sourcing is important, and we try to contribute back to the community as much as possible.


How have you found recruiting staff for your stack?

We hire locally, and no, we don’t like remote work. We typically source people through our community of startups in Bangalore. The languages we use are a selling point for us when recruiting.

We picked our stack according to our requirements, not according to what was ideal.

So what do you think about their stack? Sound off in the comments; we would love to hear your thoughts!
