
Basic DS Firefighter Info

DS Firefighter is covered during business hours by the responsible teams and during off hours by a primary/secondary pair. Both are set up in PagerDuty so that the hours are scheduled correctly. The off-hours firefighters rotate every week on Thursday. See Generating Firefighter Rotation Schedule for more info on how the rotation is created.

CONTACT

Most teams have PagerDuty accounts and cycling on-call rotations.

PagerDuty alerts can be generated directly from Slack with /pd-[team] message slash commands. This pings the team's on-call staff with your message via PagerDuty's contact waterfall, i.e. SMS, app, email, and call, at whatever intervals the team has defined.

Hotfix / “Hot Fix”

Sometimes a fix needs to go out for an app and the pipeline to get it to production is either too slow or clogged up. This is when we create a hot fix.

Before you begin the hot fix process, announce it in the #org-reliability-ff and #team-mobile-ds Slack channels so that people can plan accordingly. The first step is to create a branch in the app’s repo from the production version. You can find which version the app is on in production with the following command. In this example, 2.663.0 is the deployed version.

$ bub health check api --env=production-2
Health Checks as of 2018-01-08 10:44:30 -0700
checking keen in service_group alpha (2.663.0) - http://10.211.0.233:7021/health : OK
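If you want to pull the version out of that output in a script, a minimal sketch might look like the following. It assumes the version always appears in parentheses as x.y.z, as in the sample line above; the helper name deployed_version is hypothetical and not part of bub.

```ruby
# Extract the deployed version from one line of health check output.
# Assumes the "(x.y.z)" format shown in the sample output above.
VERSION_PATTERN = /\((\d+\.\d+\.\d+)\)/

# Hypothetical helper: returns the version string, or nil if none is found.
def deployed_version(health_check_line)
  match = VERSION_PATTERN.match(health_check_line)
  match && match[1]
end

line = "checking keen in service_group alpha (2.663.0) - http://10.211.0.233:7021/health : OK"
puts deployed_version(line)  # prints 2.663.0
```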

You can also grab the version from Marathon.

Make sure to pull down all the tags: git fetch --tags

Then create the fixes branch for that version with

git checkout -B <version_number>-fixes <version_number>

The second step is to cherry-pick the commit that is needed for the hot fix. You need the commit hash, which you can find either under the commits tab on the GitHub site or by scrolling through the output of git log. If the commit comes from a PR, use the hash of the merge commit. Since git looks for the hash locally, make sure to do a git pull on the master branch before cherry-picking. Then cherry-pick the commit with

git cherry-pick -m 1 <commit_hash>

This applies only the changes from that commit, not any other changes made between the tag and the commit. The -m 1 option tells git to treat the merge commit's first parent as the mainline.

The version of the app then needs to be bumped in the version.sbt file.

We loosely follow the conventions outlined here. An example of bumping a version number for a hot fix would be if the current version is 1.318.0, the next version would be 1.318.1.
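The patch bump described above can be sketched as a small function. This is only an illustration of the 1.318.0 to 1.318.1 convention; the helper name bump_patch is hypothetical, and it assumes versions are always plain major.minor.patch strings.

```ruby
# Bump the patch component of a "major.minor.patch" version string.
# Hypothetical helper; assumes three purely numeric components.
def bump_patch(version)
  unless version.match?(/\A\d+\.\d+\.\d+\z/)
    raise ArgumentError, "expected major.minor.patch, got #{version.inspect}"
  end

  # Base 10 is explicit so a component like "08" is not read as octal.
  major, minor, patch = version.split(".").map { |part| Integer(part, 10) }
  [major, minor, patch + 1].join(".")
end

puts bump_patch("1.318.0")  # prints 1.318.1
```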

Next, commit the version bump (the message below is just an example) and push it, along with the cherry-picked commit(s), to the branch

git commit -am "Bumping the version"

git push origin <branch_name>

Then tag the bumped version with

git tag <bumped_version_number>

git push --tags
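To see the whole git sequence in one place, here is a rough sketch that only builds the list of commands from this section, in order, without running anything. The function name hotfix_commands and its parameters are hypothetical. It does not cover the git pull on master or the actual edit to version.sbt, which still have to be done by hand.

```ruby
# Build (but do not run) the git commands for a hotfix, in order.
# Hypothetical helper; the commands themselves are the ones listed in this section.
def hotfix_commands(deployed_version:, bumped_version:, commit_hash:)
  branch = "#{deployed_version}-fixes"
  [
    "git fetch --tags",
    "git checkout -B #{branch} #{deployed_version}",
    "git cherry-pick -m 1 #{commit_hash}",
    "# edit version.sbt by hand so it reads #{bumped_version}",
    'git commit -am "Bumping the version"',
    "git push origin #{branch}",
    "git tag #{bumped_version}",
    "git push --tags",
  ]
end

commands = hotfix_commands(
  deployed_version: "2.663.0",
  bumped_version: "2.663.1",
  commit_hash: "<commit_hash>"
)
puts commands
```

Printing the plan first and then running the commands one at a time makes it easier to stop if the cherry-pick hits a conflict.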

Next, have Jenkins build the job https://jenkins.dev.banno-internal.com/job/mobile-data-services-hot-fix/ . Click “Build with Parameters” and give it the branch name from before. The job builds and publishes the hotfix version.

While the job is building, you should send out an approval email containing the changelog and instructions:

Changelog

To generate the changelog, head to https://jenkins.dev.banno-internal.com/job/mobile-data-services-generate-release-notes/ . Click “Build with Parameters” and give it the hotfix version. This will send out an approval email and provide a script for ops to run the hotfix.

Deploy Steps

Production hotfixes are deployed by Technical Operations or Infrastructure team members.
The updated procedure can be found in the operations repo: https://github.com/Banno/operations/blob/master/docs/Deploy.md

Note: If needed, the *-batch apps can also be hotfixed. This is done by modifying their version in the environments/ repo, like any other Marathon-deployed app.

Adding Encrypted Values

When encrypted values need to be added manually, because they cannot be part of the migration or are not set programmatically, you can follow these steps.

  • Generate keys and encrypt the values for the environment by following this
  • If the base64 version is what you want, you can stop here. Otherwise, for example, if you are inserting the value into a Postgres column of type bytea, you can use PostgreSQL's decode function to convert it, e.g.
update table_to_update
set column_to_update = decode('base64valuetoinsert', 'base64')
where something_to_match_on = 'matched';
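As a rough illustration of what decode(..., 'base64') is doing in the statement above, the sketch below round-trips some bytes through base64 in Ruby. The byte string is a made-up stand-in for a real encrypted value, not output from the actual key and encryption step.

```ruby
require "base64"

# Hypothetical stand-in for the encrypted bytes produced by the encryption step.
encrypted_bytes = "\x00\x01\xFEsecret".b

# This text form is what would be pasted into decode('...', 'base64').
base64_value = Base64.strict_encode64(encrypted_bytes)

# Decoding reverses the encoding and yields the raw bytes again,
# which is what ends up stored in the bytea column.
round_tripped = Base64.strict_decode64(base64_value)

puts base64_value
puts round_tripped == encrypted_bytes
```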

Fetch Proxy

Currently the Fetch Proxy is restarted once a day by a cron job. You may still see a timeout alert from PagerDuty, though. To check whether the proxy is still working, you can curl a website through the proxy. An example is below:

$ curl -x socks5://fetch-proxy1.internal.banno-internal.com:5000 https://www.google.com/

We can cycle the public IP the proxy uses by running bub aws cycle-fetch-proxy-ip.
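If you would rather script the curl check, a minimal sketch could look like this. It wraps the same curl call and adds --max-time so a dead proxy does not hang the check. The helper names and the 10 second timeout are assumptions, and the actual check is left commented out because it only works from a host that can reach the proxy.

```ruby
require "open3"

PROXY = "socks5://fetch-proxy1.internal.banno-internal.com:5000"

# Build the curl invocation; --max-time keeps a dead proxy from hanging the check.
# Hypothetical helper; the timeout value is an assumption.
def proxy_check_command(proxy: PROXY, url: "https://www.google.com/", timeout_seconds: 10)
  ["curl", "--silent", "--output", "/dev/null",
   "--max-time", timeout_seconds.to_s, "-x", proxy, url]
end

# Returns true when curl exits 0, i.e. the request made it through the proxy.
def proxy_working?(command = proxy_check_command)
  _output, status = Open3.capture2e(*command)
  status.success?
end

puts proxy_check_command.join(" ")
# puts proxy_working?   # uncomment on a host that can reach the proxy
```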

SQL Queries

SQL

  • User Activity
    • How many users ran during some period of time?
    • How many users “timed out”?
    • Net online aggregations in the past week, broken down by hour.
    • What are some of the most frequent users, and how much are they aggregating?
    • Find users with the most accounts running within a period of time.
  • Institution Activity
    • What were the most active institutions (e.g. users grouped by online aggregations) during a period of time?
  • Push Notifications
    • What notifications did we send to a user?
  • Batch Aggregation
    • How many logins need to run in batch right now?
  • RDC
    • How many users have signed up for an institution?
    • How many RDC accounts have we approved in the past few days?

jXchange Site Transitions

Occasionally (roughly every 3-4 months), the hosted jXchange at jxappgtw.jhahosted.com will undergo a site transition, in which JHA moves the running of jXchange and its supporting infrastructure from Branson, MO to Monett, MO or vice versa. During this transition, there is a short downtime of about 30 minutes. Our history with these transitions at Banno is bumpy.

jxappgtw.jhahosted.com is primarily used by Banno for iPay operations from Che.

There are a few checks we can do after a transition to ensure that from Banno’s viewpoint, jxappgtw is working correctly.

  • DNS: JHA controls ingress to the active jxappgtw via DNS, so it’s worth checking before and after the transition that DNS has propagated and that we’re resolving the jxappgtw.jhahosted.com address correctly.
$ dig @8.8.8.8 +short jxappgtw.jhahosted.com
74.200.55.154

$ dig @10.3.0.53 +short jxappgtw.jhahosted.com
74.200.55.154

$ ssh prodche1.banno.com dig @8.8.8.8 +short jxappgtw.jhahosted.com
74.200.55.154
  • IIS Splash Page: Hitting the IIS splash page ensures that the F5 load balancers they use in front of the jXchange services are configured correctly. If you can curl https://jxappgtw.jhahosted.com or load it in a browser and see a splash page, then that part has been configured correctly.

  • Other known issues: After those checks, the next step is to make sure that Che is not reporting any errors or funkiness in che.warn.log. In the past there were issues with the applications keeping connections open to the old jXchange, but those should have been resolved in https://github.com/Banno/jxchange-scalaxb/pull/66

  • Reporting Issues after transitions: In the past, we’ve emailed iPay Integration Support at ipayintegrationsupport@jackhenry.com (replies usually came from Paul Kirchner, PKirchner@jackhenry.com). Now, however, you should enter a jSource case for the affected institutions with a high enough priority.
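The DNS check from the list above can be sketched in Ruby with the standard resolv library. This is only a sketch: the helper names are hypothetical, it covers the two resolvers queried directly in the dig examples (not the check run over ssh from prodche1), and the live lookup is left commented out because 10.3.0.53 is only reachable from inside the network.

```ruby
require "resolv"

HOST = "jxappgtw.jhahosted.com"
NAMESERVERS = ["8.8.8.8", "10.3.0.53"]

# Ask one specific nameserver for the host's A records.
def addresses_from(nameserver, host = HOST)
  Resolv::DNS.open(nameserver: [nameserver]) do |dns|
    dns.getresources(host, Resolv::DNS::Resource::IN::A)
       .map { |record| record.address.to_s }
       .sort
  end
end

# True when every resolver returned the same non-empty set of addresses.
def consistent?(answers_by_nameserver)
  answers = answers_by_nameserver.values
  !answers.empty? && answers.none?(&:empty?) && answers.uniq.size == 1
end

# Canned answers matching the dig output shown above.
sample = { "8.8.8.8" => ["74.200.55.154"], "10.3.0.53" => ["74.200.55.154"] }
puts consistent?(sample)  # prints true

# Live check, on a host inside the network:
# answers = NAMESERVERS.to_h { |ns| [ns, addresses_from(ns)] }
# puts consistent?(answers)
```

Running it before and after the transition and comparing the printed addresses shows whether the resolvers agree on the new site.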

Generating Firefighter Rotation Schedule

The rotation is generated by a nice little Ruby script, which makes sure that people aren’t scheduled twice in a row and that everyone gets a chance to be both primary and secondary. It also excludes those who are on the reliability team.

The script requires a little bit of modification before generating:

  • $start_date needs to be set to the start date
  • $last_primary needs to be set to the last primary FF
  • $last_secondary needs to be set to the last secondary FF
  • $people_on_reliability_team needs to be set to the people that are currently serving on the reliability team.
  • $ff_people needs to be set to the people that are FF. It might not be modified that often.

The %w(...) Ruby syntax is an easy way to make an array of strings: %w(abc def) is the same as ["abc", "def"].
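A quick sketch of how %w and array subtraction behave together, which is how the script derives the people in rotation. The lists here are shortened examples, not the real roster.

```ruby
# Shortened example lists, not the real roster.
ff_people = %w(Luke Trent Adam)
on_reliability_team = %w(Trent)

# %w builds an array of strings without quotes or commas.
puts(%w(abc def) == ["abc", "def"])   # prints true

# Array subtraction removes anyone who is on the reliability team.
people_in_rotation = ff_people - on_reliability_team
p people_in_rotation                  # prints ["Luke", "Adam"]
```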

After the script is run, the rotation gets emailed out to firefighters@banno.com to ensure that it will work for everyone on rotation. After that’s :+1:’d by people, it is put on the Off Hours DS Firefighter Calendar (please contact Luke Amdor if you don’t have access to edit that calendar in Google Calendar).

Note: This script requires the text-table gem: gem install text-table

  require 'date'
  require 'text-table'
  $start_date = Date.parse('2016/02/11')
  $last_primary = "Adam"
  $last_secondary = "Luke"
  $people_on_reliability_team = %w()
  $ff_people = %w(Luke Trent Adam Zach Joe Nick Dustin)
  $people_in_rotation = $ff_people - $people_on_reliability_team

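  # Returns true if the rotation breaks any scheduling rule.
  # rotation is an array of [primary, secondary] pairs, one per week.
  def bad_rotation?(rotation)
    # Nobody may be primary and secondary in the same week.
    is_primary_and_secondary = rotation.any? { |p,s| p == s }

    # Nobody may swap roles in back-to-back weeks (primary after secondary, or vice versa).
    was_on_duty_week_before = rotation.each_with_index.any? do |ps, i|
      p,s = ps
      if i > 0
        prev_p, prev_s = rotation[i - 1]
        p == prev_s || s == prev_p
      end
    end

    # The first week may not reuse the previous rotation's last primary or secondary.
    was_primary_or_secondary_last = rotation.first.any? do |p|
      $last_primary == p || $last_secondary == p
    end

    is_primary_and_secondary ||
      was_on_duty_week_before ||
      was_primary_or_secondary_last
  end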

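  # Shuffle until a rotation passes every check in bad_rotation?.
  # A loop retries without growing the call stack the way recursion would.
  def generate_rotation
    loop do
      primary = $people_in_rotation.shuffle
      secondary = $people_in_rotation.shuffle
      result = primary.zip(secondary)
      return result unless bad_rotation?(result)
    end
  end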

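  def add_start_dates_to(rotation)
    # One start date per week of the rotation, seven days apart.
    # The exclusive range avoids generating an extra, unused date.
    start_dates = (0...rotation.size).map { |i| $start_date + i * 7 }
    rotation.zip(start_dates).map { |(p,s),d| [d.to_s,p,s] }
  end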

  table = Text::Table.new
  table.head = ["Start Date", "Primary", "Secondary"]
  add_start_dates_to(generate_rotation).each do |row|
    table.rows << row
  end
  puts table.to_s