Image Super-Resolution Reconstruction Based on ResNet (SRResNet)


This note documents my practice on image super-resolution based on ResNet: task background, model architecture, training pipeline, and results. Comparisons with SRCNN and SRGAN are included. The full project is available on my GitHub.

Abstract

Single-image super-resolution (SISR) reconstructs a high-resolution (HR) image from a low-resolution (LR) input. I implemented SRResNet based on residual networks (ResNet): the backbone stacks multiple residual blocks (each containing two convolutions with a skip connection), and the tail uses pixel shuffle for efficient upsampling. Training uses the COCO2014 train/val splits; the loss is MSE and the optimizer is Adam. On three example categories (portraits, remote sensing, and astronomical backgrounds), SRResNet noticeably improves fine detail. Compared to GAN-based methods, SRGAN can produce sharper edges but may introduce texture artifacts; SRResNet yields a more balanced and stable result.

1. Task and Data

  • Objective: reconstruct HR from LR (an image-to-image regression task).
  • Dataset: COCO2014 (train2014 + val2014), natural images across many categories.
  • Samples: training inputs are downsampled LR; labels are the original HR.
    (Figure: COCO sample images)
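As a sketch of the pairing step described above; the crop size, scale factor, and bicubic kernel here are illustrative assumptions, not the project's exact settings:

```python
# Build one (LR, HR) training pair from an HR image file.
from PIL import Image

def make_lr_hr_pair(path, crop_size=96, scale=4):
    hr = Image.open(path).convert("RGB").crop((0, 0, crop_size, crop_size))  # HR label
    lr = hr.resize((crop_size // scale, crop_size // scale), Image.BICUBIC)  # LR input
    return lr, hr
```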

2. Model Design — SRResNet

2.1 Residual Block

Conventional layer-wise mapping $y = f(x)$ can accumulate approximation error and suffer from vanishing gradients as depth grows. Residual learning reformulates the mapping as $y = f(x) + x$; the skip connection enables deeper networks with more stable training.
(Figure: residual block structure)

  • In this implementation, each residual block contains two convolutional sub‑blocks (conv_block1/conv_block2) with matched input/output channels. In the forward pass, the input x is added to the output of conv_block2 to form the residual output, as sketched below.
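A minimal PyTorch sketch of such a block; the Conv/BN/PReLU composition, channel width, and kernel size are illustrative assumptions, not taken verbatim from the project:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv sub-blocks plus a skip connection: y = f(x) + x."""
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # conv_block1: Conv -> BN -> PReLU (the activation choice is an assumption)
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
        )
        # conv_block2: Conv -> BN, no activation before the addition
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # add the input to the second sub-block's output (the skip connection)
        return self.conv_block2(self.conv_block1(x)) + x
```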

2.2 Pixel Shuffle

Common upsampling schemes (bilinear interpolation, transposed convolution) can incur information loss or checkerboard artifacts. Pixel shuffle instead expands channels with a convolution and then rearranges them into higher spatial resolution (see the sketch after the steps below):

  1. a convolution maps $(C, H, W)$ to $(r^2 C, H, W)$;
  2. the channels are then rearranged to $(C, rH, rW)$, achieving an upscaling factor of $r$.
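A minimal PyTorch sketch using nn.PixelShuffle; the channel count and factor $r$ here are illustrative:

```python
import torch
import torch.nn as nn

C, r = 64, 2
upsample = nn.Sequential(
    nn.Conv2d(C, C * r * r, kernel_size=3, padding=1),  # (C, H, W) -> (r^2 C, H, W)
    nn.PixelShuffle(r),                                 # (r^2 C, H, W) -> (C, rH, rW)
)

x = torch.randn(1, C, 24, 24)
print(upsample(x).shape)  # torch.Size([1, 64, 48, 48])
```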

2.3 Generator

  • The generator backbone is SRResNet itself: it takes an LR image as input and outputs the SR reconstruction; the forward pass performs feature extraction followed by upsampling.

The overall model framework is shown below:

(Figure: SRResNet overall architecture)
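A condensed sketch of how the pieces compose, reusing the ResidualBlock sketch from Section 2.1; the block count, channel width, kernel sizes, and 4x scale are assumptions:

```python
import torch.nn as nn

class SRResNet(nn.Module):
    """Head conv -> stack of residual blocks (with a global skip) -> pixel-shuffle tail."""
    def __init__(self, n_blocks=16, channels=64, scale=4):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, channels, 9, padding=4), nn.PReLU())
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        stages = []
        for _ in range(int(scale).bit_length() - 1):   # one x2 stage per factor of 2
            stages += [nn.Conv2d(channels, channels * 4, 3, padding=1),
                       nn.PixelShuffle(2), nn.PReLU()]
        self.tail = nn.Sequential(*stages, nn.Conv2d(channels, 3, 9, padding=4))

    def forward(self, lr):
        feats = self.head(lr)            # feature extraction
        out = self.body(feats) + feats   # global skip over the residual stack
        return self.tail(out)            # upsampling + reconstruction
```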

3. Training Setup

  • Loss: mean squared error (MSE), measuring the pixel‑wise discrepancy between the reconstruction $\hat{I}_{SR}$ and the ground truth $I_{HR}$:

    $$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left( I_{HR,i} - \hat{I}_{SR,i} \right)^2$$

    where $N$ is the number of pixels.

  • Optimizer: Adam.

  • Procedure: load data → forward → compute MSE → backprop & update → periodically save checkpoints and logs. A minimal loop sketch follows.
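In the sketch below, `train_loader` is assumed to yield (LR, HR) tensor batches, and the learning rate and epoch count are illustrative, not the project's exact settings:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SRResNet().to(device)                       # sketch from Section 2.3
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(100):
    for lr_imgs, hr_imgs in train_loader:           # load data
        sr = model(lr_imgs.to(device))              # forward
        loss = criterion(sr, hr_imgs.to(device))    # compute MSE
        optimizer.zero_grad()
        loss.backward()                             # backprop
        optimizer.step()                            # update
    torch.save(model.state_dict(), f"srresnet_epoch_{epoch}.pth")  # checkpoint
```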

4. Comparative Methods

4.1 SRCNN (early CNN‑based approach)

A minimal 3‑layer convolutional structure: feature extraction (9×9) → nonlinear mapping (1×1) → reconstruction (5×5), conceptually aligned with sparse‑coding pipelines. The CNN structure is shown below:
(Figure: SRCNN three-layer structure)
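A sketch of that three-layer structure; the 3-channel input is an assumption (the original SRCNN operates on the luminance channel), while the 9-1-5 kernels and 64/32 widths follow the paper:

```python
import torch.nn as nn

class SRCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),  # feature extraction (9x9)
            nn.Conv2d(64, 32, 1), nn.ReLU(),            # nonlinear mapping (1x1)
            nn.Conv2d(32, 3, 5, padding=2),             # reconstruction (5x5)
        )

    def forward(self, x):
        # SRCNN expects a bicubic-upsampled input already at the target resolution
        return self.net(x)
```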

4.2 SRGAN (adversarial approach)

Perceptual quality is optimized within a GAN framework: the generator mirrors SRResNet, and the discriminator distinguishes real HR from generated SR. The perceptual loss combines a content term (e.g., a VGG feature distance) with an adversarial term, plus optional regularization, as summarized below. This can enhance detail but may introduce texture artifacts. The architecture is shown below:
(Figure: SRGAN architecture)
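For reference, the perceptual loss in Ledig et al. weights a small adversarial term against the content term:

$$l^{SR} = \underbrace{l^{SR}_{X}}_{\text{content}} + \underbrace{10^{-3}\, l^{SR}_{Gen}}_{\text{adversarial}}$$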

5. Results (three representative scenarios)

Left: LR (input) | Middle: SRResNet (this work) | Right: SRGAN (reference)

(Figures: portrait, remote sensing, and astronomical examples; each row shows LR, SRResNet, and SRGAN outputs)

Observation: SRGAN tends to sharpen edges but may introduce texture artifacts; SRResNet is more balanced and stable, consistent with the qualitative summary in the abstract.

6. Environment

  • Language: Python; deep‑learning framework: PyTorch
  • GPU: NVIDIA GeForce RTX 3060, CUDA 11.8 (local)
  • Cloud: an online PaddlePaddle environment (for additional experiments and comparisons)

7. References

  • Ledig et al., “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network,” CVPR 2017
  • Dong et al., “Learning a Deep Convolutional Network for Image Super-Resolution,” ECCV 2014 (SRCNN)
  • Additional survey literature and online resources on super-resolution