import React from 'react';

import './stylesheets/Troubleshooting.css';

const Troubleshooting = (props) => {
  return (
    <div className="troubleshooting">
      <h1>Troubleshooting Instructions</h1>
      <p>
        Below are suggestions for debugging issues with the indexing-service.
        Find the error you are seeing and apply the fix. Once the service is running again, investigate the root cause so you can prevent it in the future.
      </p>
      <p>
        Feel free to add the aliases below to your <code>~/.bash_profile</code> or <code>~/.bashrc</code> (then restart your terminal or <code>source</code> the file after editing).
      </p>
      <p>
        Note: You will need to log in to the Heroku CLI for these commands to work. Run <code>heroku login</code> after <a href={"https://devcenter.heroku.com/articles/heroku-cli#download-and-install"}>installing the Heroku CLI</a>.
      </p>
      <pre>
        <code>
          # Aliases to log in to Postgres for each app
          <br/>
          alias psql-prod="heroku pg:psql postgresql-rugged-11940 --app indexer-prod"
          <br/>
          alias psql-standby="heroku pg:psql postgresql-deep-39747 --app indexer-standby"
          <br/>
          alias psql-stg="heroku pg:psql postgresql-octagonal-27286 --app indexer-stg"
          <br/>
          alias psql-uat="heroku pg:psql postgresql-closed-44018 --app indexer-uat"
          <br/>
          alias psql-qa="heroku pg:psql postgresql-shaped-33415 --app indexer-qa"
          <br/>
          alias psql-dev="heroku pg:psql postgresql-curly-42116 --app indexer-dev"
          <br/>
          <br/>
          # Aliases to log in to Redis for each app
          <br/>
          alias redis-prod="heroku redis:cli -a indexer-prod -c indexer-prod"
          <br/>
          alias redis-standby="heroku redis:cli -a indexer-standby -c indexer-standby"
          <br/>
          alias redis-stg="heroku redis:cli -a indexer-stg -c indexer-stg"
          <br/>
          alias redis-uat="heroku redis:cli -a indexer-uat -c indexer-uat"
          <br/>
          alias redis-qa="heroku redis:cli -a indexer-qa -c indexer-qa"
          <br/>
          alias redis-dev="heroku redis:cli -a indexer-dev -c indexer-dev"
          <br/>
          <br/>
          # Aliases to tail the logs for each app in the terminal
          <br/>
          alias logs-prod="heroku logs --tail -a indexer-prod"
          <br/>
          alias logs-standby="heroku logs --tail -a indexer-standby"
          <br/>
          alias logs-stg="heroku logs --tail -a indexer-stg"
          <br/>
          alias logs-uat="heroku logs --tail -a indexer-uat"
          <br/>
          alias logs-qa="heroku logs --tail -a indexer-qa"
          <br/>
          alias logs-dev="heroku logs --tail -a indexer-dev"
        </code>
      </pre>
      <a className="anchor" name="object_status" href="#object_status"><h2>Object Status</h2></a>
      <p>
        This issue usually occurs when the org is timing out and its batch jobs are processing too slowly. Normally it resolves itself once the instance is less overloaded and the org has time
        to process its batch jobs. You can confirm this by checking the healthz page or the state tab of this app and seeing whether the 'errors' field contains timeout errors.
      </p>
      <p>
        This can also happen after a deploy to either the indexer or the AppExchange. If you are syncing a new object or new fields and something was configured incorrectly, objects will begin to fail.
        This error should be looked at immediately. A common cause is forgetting to add a field or object to the Listing Indexer profile on the AppExchange Org. These issues are easy
        to diagnose, because the 'errors' field should point you to exactly which fields are failing. In a normal state, the indexer has 0 failed records.
      </p>
      <p>
        Note: If the indexer was failing during a full refresh and had to be restarted, it will not retry those records. That is fine, because it was only a full refresh and no updates were missed. However, records left in a failed state
        (e.g. <code>E_CONN_RESET</code>) will not leave that state until the next full refresh, which means Smokey will keep gacking. You need to reset the state of those records manually.
      </p>
      <ol>
        <li>Log in to the indexing-service Postgres db using the aliases above (e.g. <code>psql-prod</code>).</li>
        <li>Before changing anything, look through all the failed listings and make sure they were all expected issues, such as timeouts.
          <ul>
            <li><code>SELECT * FROM trace_log WHERE status='failed';</code></li>
          </ul>
        </li>
        <li>Update the failed records to success.
          <ul>
            <li><code>UPDATE trace_log SET status='success' WHERE status='failed';</code></li>
          </ul>
        </li>
        <li>You should now have 0 failed listings, and Smokey will stop complaining.</li>
      </ol>
      <a className="anchor" name="redis_mem" href="#redis_mem"><h2>Redis Memory</h2></a>
      <p>
        This error occurs periodically when the Redis instance runs out of memory. The indexing-service uses Redis for queue management as well as for storing timing information. As part of queue management,
        it holds information about the jobs to be processed so that a worker can pick them up. Because we process 60k+ listing objects, the queued jobs hold a LOT of information. When the AppExchange org is slow to process,
        the harvester job can keep pushing new jobs as it receives them while the upsert jobs do not process fast enough to free up room. The queue then grows faster than it drains, which
        can result in our Redis instance running out of memory.
      </p>
      <p>
        This can also happen when the indexer has been running for a long time without a Redis flush. When jobs fail (which happens quite often on a PaaS like Heroku), the failed jobs are kept in Redis. There is probably a
        programmatic way to clean these jobs up (the indexing service retries failed jobs anyway), but for now they need to be flushed routinely using the process below.
      </p>
      <p>
        Either of the issues above requires a full flush and restart of the indexing service.
      </p>
      <ol>
        <li>Log in to the indexing-service pipeline in the Heroku dashboard and select the app that is failing.</li>
        <li>Go to the Resources tab and scale the 'worker' dyno count to 0. This shuts off the indexing service.</li>
        <li>Log in to the Heroku CLI if you have not already, and add the aliases above to your <code>~/.bash_profile</code> or <code>~/.bashrc</code>.</li>
        <li>Log in to the app's Redis instance (e.g. <code>redis-prod</code>).</li>
        <li>Before wiping the entire db, capture the lastUpdate and lastDelete information so you can repopulate the db before restarting the service. You will need your providerName (found on the state page under the Time log tab).
          <ul>
            <li>Run the following commands. Each should print a timestamp; use your providerName in place of Tz if it is different.
              <ul>
                <li><code>hget updateTimeLog Tz</code></li>
                <li><code>hget updateTimeLog TzFullRefresh</code></li>
                <li><code>hget deleteTimeLog Tz</code></li>
              </ul>
            </li>
          </ul>
        </li>
        <li>Flush the database.
          <ul><li><code>FLUSHALL</code></li></ul>
        </li>
        <li>Verify that no keys exist. The following command should return an empty result.
          <ul><li><code>KEYS *</code></li></ul>
        </li>
        <li>Repopulate the timing information, replacing TIMESTAMP with the output of the matching command from step 5.
          <ul>
            <li><code>hset updateTimeLog Tz TIMESTAMP</code></li>
            <li><code>hset updateTimeLog TzFullRefresh TIMESTAMP</code></li>
            <li><code>hset deleteTimeLog Tz TIMESTAMP</code></li>
          </ul>
        </li>
        <li>Turn the indexer back on by scaling the 'worker' dyno count back to 1, as in step 2.</li>
        <li>Watch the logs and make sure the indexer picks up where it left off within a minute or two.</li>
      </ol>

      <a className="anchor" name="last_success" href="#last_success"><h2>Last Success</h2></a>
      <p>
        This error occurs when the indexing service appears to be stuck and has stopped processing updates. It is harder to debug, because many different things can lead up to it,
        so it usually requires digging into the logs.
      </p>
      <p>
        Possible causes include:
      </p>
      <ul>
        <li>Cycling the dyno in the middle of the FullRefresh. Best practice is to never stop or restart the dyno anywhere near 4pm MT, because doing so resets the 24-hour cycle window in the middle of the FullRefresh.</li>
        <li>A worker failing completely before it can even report the failure, leaving the scheduler waiting for a response it will never get. Check the logs for the last status count and see whether a page is missing.</li>
      </ul>
      <p>
        In both cases, the logs should show the message <code>Update Job in Progress</code> every minute as the scheduler tries to schedule a new job.
      </p>
      <p>
        Luckily, this issue is easy to fix, though you should try to determine the cause so it does not happen again.
        You can usually just restart the dyno (avoiding the window around 4pm MT) and watch the logs to see whether the updates pick back up. On rare occasions I have seen the scheduler fail to schedule jobs after the restart; a Redis flush (see <a href="#redis_mem">Redis Memory</a>) usually fixes that.
      </p>
      <a className="anchor" name="healthz_response" href="#healthz_response"><h2>Healthz Response</h2></a>
      <p>
        To be honest, I have never seen this error happen. It means that something went wrong while building the healthz response.
      </p>
      <p>
        Possible causes include:
      </p>
      <ul>
        <li>Cannot connect to Postgres</li>
        <li>Cannot connect to Redis</li>
        <li>Healthz queries are failing</li>
      </ul>
      <p>
        The easiest way to debug this is to open the logs, then refresh the state page (or load the service's healthz page) and see what the process is failing on.
      </p>
    </div>
  );
};

export default Troubleshooting;
