If you’re running a self-hosted GitLab instance, you might eventually accumulate a large number of old pipelines and jobs that are no longer useful. These can clutter your project dashboards and consume unnecessary storage space. In this guide, we’ll walk through how to clean up these unused resources using a Python script.
This solution assumes that all your projects and their associated pipelines/jobs belong to a single GitLab group.
Prerequisites
- GitLab API Access: Ensure you have the proper permissions to access the GitLab API and manage jobs and pipelines.
- Python 3+ installed on your system.
- GitLab Python SDK: Install the python-gitlab package, which lets the script interact with the GitLab API.
Installation and Setup
Install Dependencies
pip install python-gitlab
Set Up Credentials
- Replace 'https://gitlab.domain.com' in the script with your GitLab instance URL.
- Replace 'CHANGEME' in the gitlab.Gitlab(...) call with your GitLab private token (you can generate one under your user settings).
- Replace 'CHANGEME' in the group lookup with the name of your GitLab group.
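If you'd rather not hard-code credentials in the script, one minimal variation is to read them from environment variables. This is just a sketch; the GITLAB_URL and GITLAB_TOKEN variable names are my own assumptions, so adjust them to your setup:

import os
import gitlab

# Assumed environment variables -- export these in your shell first
gl = gitlab.Gitlab(
    os.environ['GITLAB_URL'],                  # e.g. https://gitlab.domain.com
    private_token=os.environ['GITLAB_TOKEN'],
)
gl.auth()  # optional: fails fast if the token is invalid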
To Debug or Not to Debug
The debug variable controls whether the script actually makes changes. Leave it set to 'False' to perform deletions, or set it to 'True' for a dry run that only reports what would be changed without modifying anything.
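If you'd rather not edit the file each time, one possible tweak is to derive debug from a command-line flag (the --dry-run flag name is just an example, not part of the original script):

import sys

# Treat any run with --dry-run on the command line as a dry run
debug = '--dry-run' in sys.argv

You could then run python3 cleanup_script.py --dry-run to preview changes before committing to them.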
Run the Script
python3 cleanup_script.py
The Script
Here’s the complete Python script to clean up old pipelines and jobs:
import gitlab, sys, time, json
from datetime import datetime

# Replace with your GitLab instance URL and private token
gl = gitlab.Gitlab('https://gitlab.domain.com', private_token='CHANGEME')

# Look up the target group by name (replace CHANGEME with your group's name)
group_id = gl.groups.list(search='CHANGEME', get_all=False)[0].id
group = gl.groups.get(group_id)

debug = False           # True = dry run: only report what would be changed
jobTimeLimit = 30       # erase successful jobs shorter than this many seconds
jobLifeLimit = 1        # erase failed/canceled/skipped jobs older than this many days
pipelineLifeLimit = 1   # delete non-successful pipelines older than this many days

# List all the group's projects (add include_subgroups=True to also cover subgroups)
try:
    projects = group.projects.list(iterator=True)
except gitlab.GitlabGetError as e:
    print(f"Error fetching projects: {e}")
    sys.exit(1)

for project in projects:
    project_obj = gl.projects.get(project.id)
    print(f"\nProcessing Project: {project_obj.name}")
    kept_jobs = 0
    deleted_jobs = 0
    deleted_pipelines = 0

    # Process Jobs
    try:
        for job in project_obj.jobs.list(pagination="keyset", order_by="id", per_page=100, iterator=True):
            job_obj = project_obj.jobs.get(job.id)
            if job_obj.status == 'success':
                duration = job_obj.duration
                if duration is not None and duration < jobTimeLimit and job_obj.erased_at is None:
                    try:
                        if debug:
                            print(f"Would have deleted Job {job_obj.id}.")
                        else:
                            job_obj.delete_artifacts()
                            time.sleep(1)
                            job_obj.erase()
                            deleted_jobs += 1
                    except gitlab.GitlabJobEraseError as e:
                        if str(e.response_code) == '403':
                            continue
                        print(f"Error erasing job {job_obj.id} in project {project_obj.name}: {e}")
                else:
                    kept_jobs += 1
                    print(f"Job {job_obj.id} took longer than {jobTimeLimit} seconds, no action needed.")
            elif job_obj.status in ['failed', 'canceled', 'skipped']:
                # Use the most reliable timestamp available to judge the job's age
                if job_obj.finished_at is not None:
                    date_str = job_obj.finished_at
                elif job_obj.started_at is not None:
                    date_str = job_obj.started_at
                elif job_obj.created_at is not None:
                    date_str = job_obj.created_at
                else:
                    print("Error evaluating age of Job...")
                    print("DEBUG: {p}".format(p=json.dumps(job_obj.attributes, indent=4)))
                    continue
                date_check = datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%S.%fZ')
                now = datetime.utcnow()  # GitLab timestamps are UTC
                if (now - date_check).days > jobLifeLimit:
                    try:
                        if debug:
                            print(f"Would have deleted Job {job_obj.id}.")
                        else:
                            job_obj.delete_artifacts()
                            time.sleep(1)
                            job_obj.erase()
                            deleted_jobs += 1
                    except gitlab.GitlabJobEraseError as e:
                        if str(e.response_code) == '403':
                            continue
                        print(f"Error erasing job {job_obj.id} in project {project_obj.name}: {e}")
            elif job_obj.status == 'running':
                continue
            else:
                print(f"Job {job_obj.id} is not running, success, failed, skipped, or canceled, unknown action needed.")
                print("DEBUG: {p}".format(p=json.dumps(job_obj.attributes, indent=4)))
    except gitlab.GitlabListError as e:
        if str(e.response_code) == '403':
            continue
        print(f"Could not get list: {e}")

    # Process Pipelines
    for pipeline in project_obj.pipelines.list(iterator=True):
        pipeline_obj = project_obj.pipelines.get(pipeline.id)
        # Count jobs that still have logs/artifacts attached
        total_jobs = 0
        for job in pipeline_obj.jobs.list(iterator=True):
            job_obj = project_obj.jobs.get(job.id)
            if job_obj.erased_at is None:
                total_jobs += 1
        # Delete pipelines whose jobs have all been erased
        if total_jobs == 0:
            try:
                if debug:
                    print(f"Would have deleted Pipeline {pipeline_obj.id}.")
                else:
                    pipeline_obj.delete()
                    deleted_pipelines += 1
                    continue  # pipeline is gone; skip the age check below
            except gitlab.GitlabDeleteError as e:
                print(f"Error deleting pipeline {pipeline_obj.id} in project {project_obj.name}: {e}")
                print("DEBUG: {p}".format(p=json.dumps(pipeline_obj.attributes, indent=4)))
        # Additional Cleanup for Non-Successful Pipelines
        if pipeline_obj.status != 'success':
            try:
                date_str = pipeline_obj.started_at or pipeline_obj.created_at
                date_check = datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%S.%fZ')
                now = datetime.utcnow()  # GitLab timestamps are UTC
                if (now - date_check).days > pipelineLifeLimit:
                    try:
                        if debug:
                            print(f"Would have deleted Pipeline {pipeline_obj.id}.")
                        else:
                            pipeline_obj.delete()
                            deleted_pipelines += 1
                    except gitlab.GitlabDeleteError as e:
                        print(f"Error deleting pipeline {pipeline_obj.id} in project {project_obj.name}: {e}")
            except (TypeError, ValueError):
                # started_at/created_at missing or in an unexpected format
                print(f"Error processing pipeline {pipeline_obj.id} in project {project_obj.name}")
                print("DEBUG: {p}".format(p=json.dumps(pipeline_obj.attributes, indent=4)))

    # Final Summary
    print(f"\nProject '{project_obj.name}' Cleanup Results:")
    print(f"Kept Jobs: {kept_jobs}")
    print(f"Deleted Jobs: {deleted_jobs}")
    print(f"Deleted Pipelines: {deleted_pipelines}")

    # Optional Housekeeping
    if not debug:
        project_obj.housekeeping(task='prune')
        time.sleep(2)
Explanation
Key Features of the Script
Cleanup Criteria:
- Erases successful jobs that finished in under 30 seconds (jobTimeLimit).
- Erases failed, canceled, or skipped jobs older than 1 day (jobLifeLimit).
- Deletes pipelines whose jobs have all been erased, as well as non-successful pipelines older than 1 day (pipelineLifeLimit); these checks are sketched below.
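In other words, the deletion decisions reduce to a few predicates. The helper names below are illustrative, not part of the script itself:

from datetime import datetime

def is_short_success(job, time_limit=30):
    # Successful job that finished quickly and has not been erased yet
    return (job.status == 'success'
            and job.duration is not None
            and job.duration < time_limit
            and job.erased_at is None)

def is_stale_failure(job, life_limit=1):
    # Failed/canceled/skipped job older than life_limit days
    if job.status not in ('failed', 'canceled', 'skipped'):
        return False
    date_str = job.finished_at or job.started_at or job.created_at
    if date_str is None:
        return False
    age = datetime.utcnow() - datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%S.%fZ')
    return age.days > life_limit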
Error Handling:
- Includes error handling for API calls.
- Skips deletion if the resource doesn’t exist or cannot be accessed.
Efficiency:
- Uses pagination to fetch large numbers of projects, jobs, and pipelines efficiently.
- Implements time.sleep() pauses to avoid overwhelming the GitLab API with too many requests in a short period.
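If your instance enforces rate limits, recent versions of python-gitlab can also retry throttled requests for you; whether you rely on that or on the explicit sleeps is a matter of taste. A possible tweak to the client setup:

import gitlab

# Retry requests that hit rate limiting or transient server errors
gl = gitlab.Gitlab(
    'https://gitlab.domain.com',
    private_token='CHANGEME',
    retry_transient_errors=True,
)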
Best Practices
Test First:
Before running the script in production, test it on a small subset of your projects or in a staging environment.
Backup Data:
Ensure you have backups of your GitLab instance before running any cleanup scripts.
Monitor Logs:
Review the output logs to identify and address any issues.
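A simple way to do this is to tee the script's output into a dated log file you can search later (the file name is just an example):

python3 cleanup_script.py | tee cleanup-$(date +%F).log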
Conclusion
This script provides an automated way to clean up unused pipelines and jobs in your GitLab self-hosted environment, helping you maintain better performance and organization. You can customize the criteria and thresholds based on your specific needs.
If you have feedback or encounter any issues, feel free to leave a comment below!