Glaucoma classification using CNNs
Glaucoma is an eye disease that affects millions of people globally and is the second most common cause of blindness. This chronic disease progressively damages the eye's optic nerve and causes vision loss if left uncontrolled. Early diagnosis is therefore extremely important so that the corresponding monitoring and treatment can begin, since with proper treatment the risk of blindness can be minimized. Computer-aided diagnosis using fundus images is a non-invasive technique that may help to rapidly and automatically evaluate the risk of glaucoma using computer algorithms. Deep learning techniques, and particularly convolutional neural networks (CNNs), have gained popularity in recent years due to their capability of achieving outstanding results, especially in computer vision tasks such as medical image classification.
Dataset
To train the deep learning system, we used the collection of fundus images from the REFUGE dataset [12]. This dataset contains a total of 1200 colour fundus images with their corresponding glaucoma status and pixel-wise annotations (ground truth). The pixel-wise annotations of the optic disc and cup were made manually by 7 glaucoma specialists from Sun Yat-sen University, China. The segmentation annotations are stored as images in which the optic disc is marked in grey and the cup in black.
The dataset is already split into three subsets for training, validation, and testing. Each subset consists of 400 photographs with the same class proportions: 10% of the images correspond to patients with glaucoma and 90% to patients who do not have the disease. The pictures taken with the Zeiss Visucam 500 are used for training, whereas the validation and test sets contain the lower-resolution images acquired with the Canon CR-2 device.
Image data augmentation
Image data augmentation techniques were used to create different yet plausible training samples. This helps to avoid overfitting and improves the network's ability to generalize. The following transforms were applied to the training dataset to make the most of the few samples available:
- Rotation: the images were randomly rotated by an angle between 0 and 90 degrees. Training on rotated images encourages the model to be invariant to small rotations of the input.
- Brightness modifications: the images were randomly darkened or brightened, with the intention of making the model generalize across images with different lighting levels.
- Horizontal flip: the training images were randomly flipped horizontally. Only horizontal flips were considered, as vertical flips of fundus images would not be plausible.
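As a sketch, the transforms above can be expressed with Keras' `ImageDataGenerator` (the framework is assumed; the brightness range and rescaling factor are illustrative values, not taken from the text):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation pipeline mirroring the transforms listed above.
# brightness_range and rescale are assumed values for illustration.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,            # scale pixel values to [0, 1]
    rotation_range=90,            # random rotation between 0 and 90 degrees
    brightness_range=(0.7, 1.3),  # random darkening / brightening
    horizontal_flip=True,         # horizontal flips only;
    vertical_flip=False,          # vertical flips would be implausible
)
```

Such a generator is typically connected to the training images with `flow_from_directory` or `flow`, so the transforms are applied on the fly at each epoch.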
Cropping the images
An additional approach using cropped images was adopted to assess the need for pre-processing the images using segmentation. The pictures were cropped around the optic disc (OD), positioning it in the upper-left corner of the image, using the information from the masks provided by the challenge. This approach, inspired by the methodology of one of the participants in the previous REFUGE challenge (the second-place entry) [12], was adopted to include the retinal nerve fiber layer (RNFL) in the image, since RNFL defects are associated with visual field defects [43].
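A minimal sketch of such a crop, assuming the REFUGE mask convention of a white background with a grey disc and black cup (the crop size and exact pixel values here are assumptions):

```python
import numpy as np

def crop_around_disc(image, mask, size=512, background_value=255):
    """Crop the fundus image so the optic disc sits at the upper-left corner."""
    # Disc (grey) and cup (black) pixels are darker than the white background.
    ys, xs = np.where(mask < background_value)
    top, left = ys.min(), xs.min()
    # Extend right and down from the disc so the crop keeps the RNFL region.
    bottom = min(top + size, image.shape[0])
    right = min(left + size, image.shape[1])
    return image[top:bottom, left:right]
```

The crop is clipped at the image border, so discs near the edge simply yield a smaller patch.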
Models
To address the challenge of detecting glaucoma in fundus images using the REFUGE dataset, two standard architectures were used with transfer learning. Transfer learning involves taking an existing neural network that has been trained on one task, on which it performs well, and applying it to a different but related task. This technique has shown successful results in situations with small datasets and allows for faster training, as the models are not trained from scratch. For this thesis, two pre-trained models were used: VGG16 and ResNet50.
Both the VGG16 and ResNet50 models were initialized with ImageNet weights. All layers of the pre-trained models were frozen, so their original weights are not updated during training.
As this is a binary classification problem, the last layer of each model was set to a single unit with a sigmoid activation function. The predictions are therefore given as a one-dimensional array of probabilities in the range 0 to 1.
In the experiment using the pre-trained VGG16 model, a fully connected layer of size 512 and a dropout layer were added before the output layer.
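A sketch of this classifier in Keras (the framework, input size, dropout rate, and optimizer settings are assumptions):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_vgg16_classifier(weights="imagenet", input_shape=(224, 224, 3)):
    # Pre-trained convolutional base with all layers frozen.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(512, activation="relu"),   # fully connected layer of size 512
        layers.Dropout(0.5),                    # dropout rate assumed
        layers.Dense(1, activation="sigmoid"),  # single-unit sigmoid output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```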
The model based on ResNet50 contains two fully connected layers, of sizes 2048 and 512, added before the output layer and followed by a dropout layer to avoid overfitting. Both models were tested with the whole REFUGE images and with images cropped around the optic disc and cup.
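The ResNet50 variant can be sketched the same way (again assuming Keras; the pooling mode, input size, dropout rate, and optimizer are assumptions):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_resnet50_classifier(weights="imagenet", input_shape=(224, 224, 3)):
    # Pre-trained convolutional base with all layers frozen.
    base = ResNet50(weights=weights, include_top=False, pooling="avg",
                    input_shape=input_shape)
    base.trainable = False
    model = models.Sequential([
        base,
        layers.Dense(2048, activation="relu"),  # first added dense layer
        layers.Dense(512, activation="relu"),   # second added dense layer
        layers.Dropout(0.5),                    # dropout rate assumed
        layers.Dense(1, activation="sigmoid"),  # single-unit sigmoid output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```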
A second approach was tested in which the last block of convolutional layers was unfrozen and trained, but the results turned out to be worse, so it was decided to keep all the pre-trained layers frozen.
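For reference, that discarded fine-tuning variant can be sketched by selectively unfreezing the last convolutional block; the `block5` layer-name prefix matches Keras' VGG16 and is an assumption for other backbones:

```python
def unfreeze_last_block(base, prefix="block5"):
    """Make only the layers of the last convolutional block trainable."""
    base.trainable = True
    for layer in base.layers:
        layer.trainable = layer.name.startswith(prefix)
    return base
```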
Challenges
The complexity of the problem is increased by the fact that the data is imbalanced, that is, there is an unequal distribution of the two classes: there are considerably more samples without glaucoma (90% of the total) and very few glaucoma images. To overcome this problem, the loss function was weighted during training by assigning a higher weight to the glaucoma instances, the under-represented class. A second strategy, upsampling the glaucoma instances, was tested, but since the results were not successful only the class weights were used during training. Another common solution for imbalanced datasets is to undersample the majority class; this was discarded because of the few samples available in the dataset.
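The class weighting can be sketched with the common inverse-frequency heuristic (the exact weighting scheme used is not stated in the text, so this is an assumption):

```python
import numpy as np

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * n_samples_in_class)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    return {int(c): len(labels) / (len(classes) * int(np.sum(labels == c)))
            for c in classes}
```

With the REFUGE training split (360 non-glaucoma, 40 glaucoma images), this yields a weight of 5.0 for the glaucoma class and about 0.56 for the healthy class; such a dictionary can be passed to Keras' `fit(..., class_weight=...)`.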
Results
ResNet50
For the ResNet50 model, using the cropped images improved the results compared with the whole images. The following results were obtained after 10 epochs (learning rate of 0.001, batch size of 32, Adam optimizer):
• Accuracy: 0.91
• AUC: 0.90
• Precision: 0.56
• Recall: 0.55
• F-score: 0.56
VGG16
The results after 24 epochs were (learning rate of 0.001, batch size of 32, Adam optimizer):
• Accuracy: 0.93
• AUC: 0.86
• Precision: 0.69
• Recall: 0.55
• F-score: 0.61
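As a sketch, the reported metrics (except AUC) follow from thresholding the sigmoid outputs; the 0.5 threshold here is an assumption:

```python
import numpy as np

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F-score from predicted probabilities."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f_score": f_score}
```

Note that with 90% healthy images, accuracy alone is misleading (always predicting "no glaucoma" already scores 0.90), which is why precision, recall, and F-score on the glaucoma class are reported as well.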
Different transfer learning experiments were carried out using different pre-trained models and input images. The results using the cropped images, although the best among all the experiments, are worse than the results obtained in the previous REFUGE challenge [12] and not satisfactory for medical applications. Several factors explain this. Working with imbalanced datasets is challenging, as the system struggles to classify the minority class; having more samples of the glaucoma class, for example by combining several datasets, might have improved the results. An additional constraint was to adhere to the challenge and train the model using the data splits provided by the REFUGE competition. A different distribution of the data, with more samples in the training set and smaller proportions in the validation and test sets, might have improved the results. Once the constraint of splitting the data into three equal sets is disregarded, cross-validation could also be used to improve the generalization of the model.
Future work includes training the models with a different partition of the data, as well as tackling the other tasks proposed by the REFUGE competition, such as segmentation of the optic disc and cup or localization of the fovea (macular centre). Classifying glaucoma using other datasets besides REFUGE would also be interesting, and better results could perhaps be obtained by combining datasets.
References
[1] H. A. Quigley and A. T. Broman, "The number of people with glaucoma worldwide in 2010 and 2020," British Journal of Ophthalmology, vol. 90, pp. 262-267, 2006.
[2] R. N. Weinreb, T. Aung and F. A. Medeiros, "The Pathophysiology and Treatment of Glaucoma: A Review," JAMA, vol. 311, pp. 1901-1911, 2014.
[3] E. Bonet, "¿Qué es un fondo de ojo?," Servicio de Oftalmología, Hospital HM Nens. [Online]. Available: https://hospitaldenens.com/es/guia-de-salud-y-enfermedades/por-que-realizar-fondo-ojo-los-ninos/. [Accessed: 2020].
[4] H. Hollands et al., "Do findings on routine examination identify patients at risk for primary open-angle glaucoma?: The rational clinical examination systematic review," JAMA, vol. 309, no. 19, pp. 2035-2042, 2013.
[5] Y. Zhou, X. He, L. Huang, L. Liu, F. Zhu, S. Cui and L. Shao, "Collaborative learning of semi-supervised segmentation and classification for medical images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[6] Y. LeCun, Y. Bengio and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
[7] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[8] J. Deng, W. Dong, R. Socher, L. Li, K. Li and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255, 2009.
[9] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[10] K. He et al., "Deep residual learning for image recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[11] C. Szegedy et al., "Going deeper with convolutions," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[12] J. I. Orlando et al., "REFUGE Challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs," Medical Image Analysis, vol. 59, p. 101570, 2020.
[13] R. and Jmarchn, "Schematic diagram of the human eye," 2007. [Online]. Available: https://commons.wikimedia.org/wiki/File:Schematic_diagram_of_the_human_eye_en.svg. [Accessed: July 2020].
[14] "Oftalmología," in Manual CTO de medicina y cirugía, CTO Editorial, 2014.
[15] J. Bader and S. J. Havens, "Tonometry," StatPearls [Internet], StatPearls Publishing, 2019.
[16] "Five Common Glaucoma Tests," Glaucoma Research Foundation, 9 January 2020. [Online]. Available: https://www.glaucoma.org/glaucoma/diagnostic-tests.php#pachymetry.
[17] G. Lim, Y. Cheng, W. Hsu and M. L. Lee, "Integrated optic disc and cup segmentation with deep learning," IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 162-169, 2015.
[18] V. Lodhia, S. Karanja, S. Lees and A. Bastawrous, "Acceptability, usability, and views on deployment of Peek, a mobile phone mHealth intervention for eye care in Kenya: qualitative study," JMIR mHealth and uHealth, vol. 4, 2016.
[19] J. Liu et al., "Automatic glaucoma diagnosis through medical imaging informatics," Journal of the American Medical Informatics Association, pp. 1021-1027, 2013.
[20] A. Ng, "Neural Networks and Deep Learning," Coursera, video lectures, 2020. [Online]. [Accessed: April 2020].
[21] X. Chen et al., "Glaucoma detection based on deep convolutional neural network," 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 715-718, 2015.
[22] M. Christopher et al., "Performance of deep learning architectures and transfer learning for detecting glaucomatous optic neuropathy in fundus photographs," Scientific Reports, vol. 8, no. 1, pp. 1-13, 2018.
[23] J. J. Gómez-Valverde et al., "Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning," Biomedical Optics Express, vol. 10, no. 2, pp. 892-913, 2019.
[24] C. Szegedy et al., "Rethinking the inception architecture for computer vision," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016.
[25] N. J. Nilsson, Principles of Artificial Intelligence, Morgan Kaufmann, 1980.
[26] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.
[27] A. Esteva, B. Kuprel, R. Novoa, J. Ko, S. Swetter, H. Blau and S. Thrun, "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, pp. 115-118, 2017.
[28] H. Lee, P. Pham, Y. Largman and A. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," Advances in Neural Information Processing Systems, pp. 1096-1104, 2009.