OpenCV with Java: YOLO object detection on images

My last few articles and posts on neural networks were really just an introduction to the main topic, the reason I'm here: computer vision and artificial intelligence in full glory, with object detection and object recognition in both images and video streams. This article covers object recognition in still images; the next one will be dedicated to video processing. More precisely, you'll learn how to use the YOLO object detector to detect objects in images using deep learning, OpenCV and, of course, Java. I won't spend much time on what YOLO is. In a few words, YOLO is a single-stage detector, and the model used here was trained on the COCO dataset, which has 80 labels, including people, bicycles, cars, trucks and airplanes. We'll be using YOLOv3 in this article. There are also YOLOv4 and YOLOv5, but in my tests those versions recognized fewer objects than version 3. I don't know why yet; it may be because of the confidence threshold I set, and I need to look into it further.
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Random;
import java.util.Scanner;
import org.opencv.core.Mat;
import org.opencv.core.MatOfFloat;
import org.opencv.core.MatOfInt;
import org.opencv.core.MatOfRect2d;
import org.opencv.core.Point;
import org.opencv.core.Rect2d;
import org.opencv.core.Scalar;
import org.opencv.dnn.Dnn;
import org.opencv.dnn.Net;
import org.opencv.highgui.HighGui;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.utils.Converters;
import org.opencv.videoio.VideoCapture;
import org.opencv.videoio.Videoio;
import org.opencv.core.Size;
import org.opencv.videoio.VideoWriter;

public void detectObjectOnImage() throws FileNotFoundException {
    // Load the COCO class labels our YOLO model was trained on
    Scanner scan = new Scanner(new FileReader("d:\\Eclipse_2019_06\\myWork\\OpenCV_Demo\\yolo-coco\\coco.names"));
    List<String> cocoLabels = new ArrayList<String>();
    while (scan.hasNextLine()) {
        cocoLabels.add(scan.nextLine());
    }
    scan.close();

    // Load our YOLO object detector trained on the COCO dataset
    Net dnnNet = Dnn.readNetFromDarknet("d:\\Eclipse_2019_06\\myWork\\OpenCV_Demo\\yolo-coco\\yolov3.cfg",
            "d:\\Eclipse_2019_06\\myWork\\OpenCV_Demo\\yolo-coco\\yolov3.weights");

    // YOLO on GPU (needs an OpenCV build with CUDA support; remove these two lines to run on the CPU):
    dnnNet.setPreferableBackend(Dnn.DNN_BACKEND_CUDA);
    dnnNet.setPreferableTarget(Dnn.DNN_TARGET_CUDA);

    // Generate a random color per class in order to draw bounding boxes
    Random random = new Random();
    ArrayList<Scalar> colors = new ArrayList<Scalar>();
    for (int i = 0; i < cocoLabels.size(); i++) {
        colors.add(new Scalar(new double[] {random.nextInt(255), random.nextInt(255), random.nextInt(255)}));
    }

    // Load our input image (other test images: soccer.jpg, baggage_claim.jpg)
    Mat img = Imgcodecs.imread("d:\\Eclipse_2019_06\\myWork\\OpenCV_Demo\\images\\dining_table.jpg", Imgcodecs.IMREAD_COLOR);

    // Determine the output layer names that we need from YOLO.
    // The forward() function in OpenCV's Net class needs the final layers up to which it should run the network.
    // getUnconnectedOutLayers() returns the 1-based indices of the network's last layers
    // (for YOLOv3 these are yolo_82, yolo_94 and yolo_106), hence the i - 1 when looking up the names:
    List<String> layerNames = dnnNet.getLayerNames();
    List<String> outputLayers = new ArrayList<String>();
    for (Integer i : dnnNet.getUnconnectedOutLayers().toList()) {
        outputLayers.add(layerNames.get(i - 1));
    }

    HashMap<String, List> result = forwardImageOverNetwork(img, dnnNet, outputLayers);

    ArrayList<Rect2d> boxes = (ArrayList<Rect2d>) result.get("boxes");
    ArrayList<Float> confidences = (ArrayList<Float>) result.get("confidences");
    ArrayList<Integer> class_ids = (ArrayList<Integer>) result.get("class_ids");

    // Now do the so-called "non-maximum suppression".
    // It is performed on the boxes whose confidence is equal to or greater than the threshold
    // and reduces the number of overlapping boxes:
    MatOfInt indices = getBBoxIndicesFromNonMaximumSuppression(boxes, confidences);

    // Finally, go over the indices in order to draw bounding boxes on the image:
    img = drawBoxesOnTheImage(img, indices, boxes, cocoLabels, class_ids, colors);
    HighGui.imshow("Test", img);
    HighGui.waitKey(10000); // keep the window open for up to 10 seconds
}

First we load our YOLO object detector trained on the COCO dataset (the yolov3.cfg and yolov3.weights files), which initializes the org.opencv.dnn.Net object dnnNet.

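Then we determine the output layer names (List<String> outputLayers) that we need from YOLO. This is because the forward() function in OpenCV's Net class needs to know the final layers up to which it should run the network. The indices returned by getUnconnectedOutLayers() are 1-based, which is why the code looks up layerNames.get(i - 1).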
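The index-to-name lookup can be tried without OpenCV. This is a minimal sketch, assuming a short made-up list of layer names and made-up 1-based indices standing in for what getLayerNames() and getUnconnectedOutLayers() return; the class OutputLayerNames and its method are hypothetical helpers, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OutputLayerNames {
    // Maps 1-based layer indices to the corresponding layer names.
    static List<String> namesForIndices(List<String> layerNames, List<Integer> oneBasedIndices) {
        List<String> names = new ArrayList<>();
        for (int index : oneBasedIndices) {
            names.add(layerNames.get(index - 1)); // shift to 0-based list access
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for dnnNet.getLayerNames() and
        // dnnNet.getUnconnectedOutLayers().toList()
        List<String> layerNames = Arrays.asList("conv_0", "conv_1", "yolo_2", "conv_3", "yolo_4");
        List<Integer> unconnected = Arrays.asList(3, 5);
        System.out.println(namesForIndices(layerNames, unconnected)); // prints [yolo_2, yolo_4]
    }
}
```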

The main thing is the forwardImageOverNetwork() method call (source code below), which in essence calls the org.opencv.dnn.Net.forward() method. forwardImageOverNetwork() returns the data obtained from the neural network: bounding boxes, confidences and class IDs.

The rest of the code just draws bounding boxes and labels around the recognized objects.

private HashMap<String, List> forwardImageOverNetwork(Mat img,
                                                  Net dnnNet,
                                                  List<String> outputLayers) {
        // Prepare a data structure to hold what the network returns after the Net.forward() call:
        // the lists of detected bounding boxes, confidences and class IDs, respectively.
        // This is what this method will return:
        HashMap<String, List> result = new HashMap<String, List>();
        result.put("boxes", new ArrayList<Rect2d>());
        result.put("confidences", new ArrayList<Float>());
        result.put("class_ids", new ArrayList<Integer>());

        // The input image to a neural network needs to be in a certain format called a blob.
        // blobFromImage() scales the pixel values to a target range of 0 to 1 using a scale factor of 1/255,
        // resizes the image to the given size of (416, 416) without cropping (crop = false)
        // and swaps the red and blue channels (swapRB = true).
        // Construct a blob from the input image and then perform a forward pass of the YOLO object detector,
        // giving us our bounding boxes and associated probabilities:
        Mat blob_from_image = Dnn.blobFromImage(img, 1 / 255.0,
                new Size(416, 416), // the spatial size that the convolutional neural network expects
                new Scalar(new double[]{0.0, 0.0, 0.0}), true, false);
        dnnNet.setInput(blob_from_image);

        // The forward() method fills a List of OpenCV Mat objects, so let's prepare one
        List<Mat> outputs = new ArrayList<Mat>();

        // Finally, run the forward pass through the network. The main work is done here:
        dnnNet.forward(outputs, outputLayers);

        // Each row of each Mat in 'outputs' is a vector of (number of classes + 5) elements.
        // The first 4 elements represent center_x, center_y, width and height, relative to the image size.
        // The fifth element represents the confidence that the bounding box encloses an object.
        // The remaining elements are the confidence scores associated with each class (i.e. object type).
        // The box is assigned to the class with the highest score:
        for(Mat output : outputs) {
            // Loop over the detections; each row is a candidate detection
            System.out.println("Output.rows(): " + output.rows() + ", Output.cols(): " + output.cols());
            for (int i = 0; i < output.rows(); i++) {
                Mat row = output.row(i);
                List<Float> detect = new MatOfFloat(row).toList();
                List<Float> score = detect.subList(5, output.cols());
                int class_id = argmax(score); // index of the maximum element of the list
                float conf = score.get(class_id);
                if (conf >= 0.5) {
                    int center_x = (int) (detect.get(0) * img.cols());
                    int center_y = (int) (detect.get(1) * img.rows());
                    int width = (int) (detect.get(2) * img.cols());
                    int height = (int) (detect.get(3) * img.rows());
                    int x = (center_x - width / 2);
                    int y = (center_y - height / 2);
                    Rect2d box = new Rect2d(x, y, width, height);
                    result.get("boxes").add(box);
                    result.get("confidences").add(conf);
                    result.get("class_ids").add(class_id);
                }
            }
        }
        return result;
}
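The row-decoding step in forwardImageOverNetwork() can also be tried in isolation, without OpenCV. The following is a minimal sketch, assuming a plain float[] in place of a Mat row; the class YoloRowDecoder, its Detection type and the sample values are hypothetical and only illustrate the arithmetic used above.

```java
import java.util.Optional;

final class YoloRowDecoder {
    // Pixel-space box (top-left corner plus size) with the winning class and its score.
    static final class Detection {
        final int x, y, width, height, classId;
        final float confidence;

        Detection(int x, int y, int width, int height, int classId, float confidence) {
            this.x = x; this.y = y; this.width = width; this.height = height;
            this.classId = classId; this.confidence = confidence;
        }
    }

    // row layout: center_x, center_y, width, height, objectness, then one score per class.
    // Returns an empty Optional when the best class score is below the threshold.
    static Optional<Detection> decode(float[] row, int imageWidth, int imageHeight, float threshold) {
        int numClasses = row.length - 5;
        int classId = 0;
        for (int c = 1; c < numClasses; c++) {
            if (row[5 + c] > row[5 + classId]) {
                classId = c;
            }
        }
        float confidence = row[5 + classId];
        if (confidence < threshold) {
            return Optional.empty();
        }
        // Scale the relative values to pixels, then move from center to top-left corner
        int centerX = (int) (row[0] * imageWidth);
        int centerY = (int) (row[1] * imageHeight);
        int width = (int) (row[2] * imageWidth);
        int height = (int) (row[3] * imageHeight);
        return Optional.of(new Detection(centerX - width / 2, centerY - height / 2,
                width, height, classId, confidence));
    }

    public static void main(String[] args) {
        // Made-up row with three classes: box centered in a 640x480 image
        float[] row = {0.5f, 0.5f, 0.25f, 0.5f, 0.9f, 0.1f, 0.8f, 0.05f};
        Detection d = decode(row, 640, 480, 0.5f).get();
        // prints 240,120 160x240 class 1
        System.out.println(d.x + "," + d.y + " " + d.width + "x" + d.height + " class " + d.classId);
    }
}
```

Like the original method, this sketch thresholds on the best class score rather than on the fifth (objectness) element.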
// Returns the index of the maximum element in the list
private int argmax(List<Float> array) {
    float max = array.get(0);
    int re = 0;
    for (int i = 1; i < array.size(); i++) {
        if (array.get(i) > max) {
            max = array.get(i);
            re = i;
        }
    }
    return re;
}
private MatOfInt getBBoxIndicesFromNonMaximumSuppression(ArrayList<Rect2d> boxes,
                                                     ArrayList<Float> confidences) {
    MatOfRect2d mOfRect = new MatOfRect2d();
    mOfRect.fromList(boxes);
    MatOfFloat mfConfs = new MatOfFloat(Converters.vector_float_to_Mat(confidences));
    MatOfInt result = new MatOfInt();
    // Arguments: boxes, scores, score threshold (0.6), NMS overlap threshold (0.5), output indices
    Dnn.NMSBoxes(mOfRect, mfConfs, (float) (0.6), (float) (0.5), result);
    return result;
}
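To make it clearer what the two thresholds passed to Dnn.NMSBoxes() mean, here is a simplified greedy non-maximum suppression in plain Java. It is only a sketch of the general technique, not OpenCV's implementation, which may differ in details; the class GreedyNms, the {x, y, width, height} array layout and the sample boxes are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GreedyNms {
    // Intersection over union of two boxes given as {x, y, width, height}.
    static double iou(double[] a, double[] b) {
        double x1 = Math.max(a[0], b[0]);
        double y1 = Math.max(a[1], b[1]);
        double x2 = Math.min(a[0] + a[2], b[0] + b[2]);
        double y2 = Math.min(a[1] + a[3], b[1] + b[3]);
        double intersection = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
        double union = a[2] * a[3] + b[2] * b[3] - intersection;
        return union <= 0 ? 0 : intersection / union;
    }

    // Returns the indices of the boxes that survive, best score first.
    static List<Integer> suppress(List<double[]> boxes, List<Float> scores,
                                  float scoreThreshold, float nmsThreshold) {
        // 1. Drop low-confidence boxes and sort the rest by descending score
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < boxes.size(); i++) {
            if (scores.get(i) >= scoreThreshold) {
                order.add(i);
            }
        }
        order.sort((i, j) -> Float.compare(scores.get(j), scores.get(i)));

        // 2. Keep a box only if it does not overlap too much with a better box already kept
        List<Integer> kept = new ArrayList<>();
        for (int candidate : order) {
            boolean suppressed = false;
            for (int k : kept) {
                if (iou(boxes.get(candidate), boxes.get(k)) > nmsThreshold) {
                    suppressed = true;
                    break;
                }
            }
            if (!suppressed) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up boxes: 0 and 1 overlap heavily, 2 is far away, 3 has a low score
        List<double[]> boxes = Arrays.asList(
                new double[]{0, 0, 100, 100},
                new double[]{10, 10, 100, 100},
                new double[]{200, 200, 50, 50},
                new double[]{0, 0, 10, 10});
        List<Float> scores = Arrays.asList(0.9f, 0.8f, 0.7f, 0.3f);
        System.out.println(suppress(boxes, scores, 0.6f, 0.5f)); // prints [0, 2]
    }
}
```

With the thresholds used in this article (0.6 and 0.5), box 1 is dropped because its overlap with the higher-scoring box 0 is about 0.68, and box 3 is dropped because its score is below 0.6.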
private Mat drawBoxesOnTheImage(Mat img,
                            MatOfInt indices,
                            ArrayList<Rect2d> boxes,
                            List<String> cocoLabels,
                            ArrayList<Integer> class_ids,
                            ArrayList<Scalar> colors) {
    //Scalar color = new Scalar( new double[]{255, 255, 0});
    List<Integer> indices_list = indices.toList();
    for (int i = 0; i < boxes.size(); i++) {
        // Draw only the boxes that survived non-maximum suppression
        if (indices_list.contains(i)) {
            Rect2d box = boxes.get(i);
            Point x_y = new Point(box.x, box.y);
            Point w_h = new Point(box.x + box.width, box.y + box.height);
            Point text_point = new Point(box.x, box.y - 5);
            Imgproc.rectangle(img, w_h, x_y, colors.get(class_ids.get(i)), 1);
            String label = cocoLabels.get(class_ids.get(i));
            Imgproc.putText(img, label, text_point, Imgproc.FONT_HERSHEY_SIMPLEX, 1, colors.get(class_ids.get(i)), 2);
        }
    }
    return img;
}
...and just to add, don't forget to take a look at
https://www.linkedin.com/pulse/opencv-java-recognize-track-objects-video-svetozar-radoj%25C4%258Din/?trackingId=LNhgVBhWScuu%2Bw9OfBvrXQ%3D%3D
which is about recognizing and tracking objects in video, of course by using OpenCV with Java ;-)
Hello! I'm testing with some video games, but every time I try to write a screenshot with ImageIO.write and pass the image file to Imgcodecs.imread(...), it throws this exception:
Exception in thread "main" CvException [org.opencv.core.CvException: cv::Exception: OpenCV(4.6.0) C:\build\master_winpack-bindings-win32-vc14-static\opencv\modules\core\src\matrix.cpp:749: error: (-215:Assertion failed) m.dims >= 2 in function 'cv::Mat::Mat'
Any help will be appreciated.