If you’re running a self-hosted GitLab instance, you might eventually accumulate a large number of old pipelines and jobs that are no longer useful. These can clutter your project dashboards and consume unnecessary storage space. In this guide, we’ll walk through how to clean up these unused resources using a Python script.

This solution assumes that all your projects and their associated pipelines/jobs belong to a single GitLab group.

Prerequisites

  • GitLab API Access: Ensure you have the proper permissions to access the GitLab API and manage jobs and pipelines.
  • Python 3+ installed on your system.
  • GitLab Python SDK: Install the python-gitlab package, which allows interaction with the GitLab API.

Installation and Setup

Install Dependencies

pip install python-gitlab

Set Up Credentials

  • Replace ‘https://gitlab.domain.com‘ in the script with your GitLab instance URL.
  • Replace ‘CHANGEME‘ with your GitLab private token (you can generate one under your user settings).
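Hardcoding a token in the script is risky if the file ever lands in version control. As an alternative, a small helper like the one below (`resolve_token` is my own sketch, not part of the script) can pull the token from an environment variable instead:

```python
import os
import sys

def resolve_token(env_var="GITLAB_TOKEN"):
    """Read the GitLab private token from the environment instead of hardcoding it."""
    token = os.environ.get(env_var)
    if not token:
        sys.exit(f"Set {env_var} before running the cleanup script.")
    return token

# The script's connection line would then become:
# gl = gitlab.Gitlab('https://gitlab.domain.com', private_token=resolve_token())
```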

To Debug or Not to Debug

The debug variable toggles dry-run mode: leave it set to ‘False‘ to apply changes, or set it to ‘True‘ to only print what would be changed without modifying anything.
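If you would rather not edit the file every time you switch modes, one option (a sketch, not part of the script as written) is to derive the flag from the command line:

```python
import sys

def parse_debug(argv):
    """Return True when a --debug flag is present, i.e. dry-run mode."""
    return "--debug" in argv

# In the script, this would replace the hardcoded assignment:
# debug = parse_debug(sys.argv[1:])
```

Running `python3 cleanup_script.py --debug` would then do a dry run, and omitting the flag would make real changes.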

Run the Script

python3 cleanup_script.py

The Script

Here’s the complete Python script to clean up old pipelines and jobs:

import gitlab, sys, time, json
from datetime import datetime

# Replace with your GitLab instance URL and private token
gl = gitlab.Gitlab('https://gitlab.domain.com', private_token='CHANGEME')

groups = gl.groups.list(search='CHANGEME', get_all=False)
if not groups:
    sys.exit("No group matching the search term was found.")
group = gl.groups.get(groups[0].id)

debug = False  # True = dry run: only print what would change

jobTimeLimit = 30      # seconds - successful jobs faster than this are erased
jobLifeLimit = 1       # days - failed/canceled/skipped jobs older than this are erased
pipelineLifeLimit = 1  # days - non-successful pipelines older than this are deleted

# List all projects in the group, including those in subgroups
try:
    projects = group.projects.list(iterator=True, include_subgroups=True)
except gitlab.GitlabGetError as e:
    print(f"Error fetching projects: {e}")
    sys.exit(1)

for project in projects:
    project_obj = gl.projects.get(project.id)
    print(f"\nProcessing Project: {project_obj.name}")
    kept_jobs = 0
    deleted_jobs = 0
    deleted_pipelines = 0

    # Process Jobs
    try:
        for job in project_obj.jobs.list(pagination="keyset", order_by="id", per_page=100, iterator=True):
            job_obj = project_obj.jobs.get(job.id)

            if job_obj.status == 'success':
                duration = job_obj.duration
                if duration is not None and duration < jobTimeLimit and job_obj.erased_at is None:
                    try:
                        if debug:
                            print(f"Would have deleted Job {job_obj.id}.")
                        else:
                            job_obj.delete_artifacts()
                            time.sleep(1)
                            job_obj.erase()
                        deleted_jobs += 1
                    except gitlab.GitlabJobEraseError as e:
                        if str(e.response_code) == '403':
                            continue
                        print(f"Error erasing job {job_obj.id} in project {project_obj.name}: {e}")
                else:
                    kept_jobs += 1
                    print(f"Job {job_obj.id} took at least {jobTimeLimit} seconds or was already erased, no action needed.")
            elif job_obj.status in ['failed', 'canceled', 'skipped']:
                if job_obj.finished_at is not None:
                    date_str = job_obj.finished_at
                elif job_obj.started_at is not None:
                    date_str = job_obj.started_at 
                elif job_obj.created_at is not None:
                    date_str = job_obj.created_at
                else:
                    print(f"Could not determine age of job {job_obj.id}, skipping.")
                    print("DEBUG: {p}".format(p=json.dumps(job_obj.attributes, indent=4)))
                    continue
                date_check = datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%S.%fZ')
                now = datetime.now()
                if (now - date_check).days > jobLifeLimit:
                    try:
                        if debug:
                            print(f"Would have deleted Job {job_obj.id}.")
                        else:
                            job_obj.delete_artifacts()
                            time.sleep(1)
                            job_obj.erase()
                        deleted_jobs += 1
                    except gitlab.GitlabJobEraseError as e:
                        if str(e.response_code) == '403':
                            continue
                        print(f"Error erasing job {job_obj.id} in project {project_obj.name}: {e}")
            elif job_obj.status == 'running':
                continue
            else:
                print(f"Job {job_obj.id} has unexpected status '{job_obj.status}', skipping.")
                print("DEBUG: {p}".format(p=json.dumps(job_obj.attributes, indent=4)))
    except gitlab.GitlabListError as e:
        if str(e.response_code) == '403':
            continue
        print(f"Could not get list {e}")

    # Process Pipelines
    for pipeline in project_obj.pipelines.list(iterator=True):
        pipeline_obj = project_obj.pipelines.get(pipeline.id)
        
        total_jobs = 0
        for job in pipeline_obj.jobs.list(iterator=True):
            job_obj = project_obj.jobs.get(job.id)
            if job_obj.erased_at is None:
                total_jobs += 1

        if total_jobs == 0:
            try:
                if debug:
                    print(f"Would have deleted Pipeline {pipeline_obj.id}.")
                else:
                    pipeline_obj.delete()
                deleted_pipelines += 1
                continue  # pipeline is handled; skip the age check below
            except gitlab.GitlabDeleteError as e:
                print(f"Error deleting pipeline {pipeline_obj.id} in project {project_obj.name}: {e}")
                print("DEBUG: {p}".format(p=json.dumps(pipeline_obj.attributes, indent=4)))

        # Additional Cleanup for Non-Successful Pipelines
        if pipeline_obj.status != 'success':
            try:
                date_str = pipeline_obj.started_at or pipeline_obj.created_at
                date_check = datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%S.%fZ')
                now = datetime.now()

                if (now - date_check).days > pipelineLifeLimit:
                    try:
                        if debug:
                            print(f"Would have deleted Pipeline {pipeline_obj.id}.")
                        else:
                            pipeline_obj.delete()
                        deleted_pipelines += 1
                    except gitlab.GitlabDeleteError as e:
                        print(f"Error deleting pipeline {pipeline_obj.id} in project {project_obj.name}: {e}")
            except Exception as e:
                print(f"Error processing pipeline {pipeline_obj.id} in project {project_obj.name}: {e}")
                print("DEBUG: {p}".format(p=json.dumps(pipeline_obj.attributes, indent=4)))

    # Final Summary
    print(f"\nProject '{project_obj.name}' Cleanup Results:")
    print(f"Kept Jobs: {kept_jobs}")
    print(f"Deleted Jobs: {deleted_jobs}")
    print(f"Deleted Pipelines: {deleted_pipelines}")

    # Optional Housekeeping
    if not debug:
        project_obj.housekeeping(task='prune')
        time.sleep(2)

Explanation

Key Features of the Script

Cleanup Criteria:

  • Deletes successful jobs that took less than 30 seconds.
  • Removes failed, canceled, or skipped jobs older than 1 day.
  • Deletes pipelines with no remaining (non-erased) jobs, as well as non-successful pipelines older than 1 day.
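The age checks for jobs and pipelines both reduce to the same comparison, shown here as a standalone helper (the function name is mine; the script inlines this logic):

```python
from datetime import datetime

def is_expired(date_str, limit_days, now=None):
    """True when a GitLab ISO-8601 timestamp is more than limit_days old."""
    stamp = datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%S.%fZ')
    now = now or datetime.now()
    return (now - stamp).days > limit_days
```

For example, with jobLifeLimit set to 1, a failed job that finished two days ago is erased, while one from earlier today is kept.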

Error Handling:

  • Includes error handling for API calls.
  • Skips deletion if the resource doesn’t exist or cannot be accessed.

Efficiency:

  • Uses pagination to fetch large numbers of projects, jobs, and pipelines efficiently.
  • Implements time.sleep() to avoid overwhelming the GitLab API with too many requests in a short period.
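The fixed time.sleep() between destructive calls is a blunt but effective throttle. Purely as an illustration (the script calls time.sleep() inline rather than using a helper like this), the pattern factors out to:

```python
import time

def throttled(calls, delay=1.0):
    """Run each zero-argument callable in turn, pausing between calls
    so the GitLab API is not hit with a burst of destructive requests."""
    results = []
    for call in calls:
        results.append(call())
        time.sleep(delay)
    return results
```

Raising delay slows the cleanup but reduces load on a busy instance; lowering it speeds things up at the cost of more API pressure.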

Best Practices

Test First:

Before running the script in production, test it on a small subset of your projects or in a staging environment.

Backup Data:

Ensure you have backups of your GitLab instance before running any cleanup scripts.

Monitor Logs:

Review the output logs to identify and address any issues.

Conclusion

This script provides an automated way to clean up unused pipelines and jobs in your GitLab self-hosted environment, helping you maintain better performance and organization. You can customize the criteria and thresholds based on your specific needs.

If you have feedback or encounter any issues, feel free to leave a comment below!
