Detection Project On Streamlit

Report

A.Machine learning prediction model as APIs

1.Environmental preparation

Anaconda + PyCharm + OpenCV + Yolov3 + Streamlit + Win10

Anaconda: It can be regarded as a Python distribution with many packages pre-installed, often used for data processing and analysis.

PyCharm: A classic Python IDE (Integrated Development Environment) with comprehensive features.

OpenCV: A cross-platform computer vision and machine learning software library distributed under the BSD license (open source) that runs on Linux, Windows, Android, and Mac OS operating systems.

Yolov3: A deep learning object detection model (its reference implementation runs on the Darknet framework).

Streamlit: An application development framework specifically for machine learning and data science teams

 

2.Yolov3 principle

a)Yolov3 Structure

DBL: Darknetconv2d_BN_Leaky in the code; it is the basic component of Yolov3: convolution + batch normalization + Leaky ReLU.

Resn: n stands for a number (there are res1, res2, ..., res8, and so on), indicating how many res_units this res_block contains.

Concat: Tensor concatenation. It splices an intermediate Darknet layer with the upsampled output of a later layer. Concatenation differs from the residual add operation: concatenation expands the dimension of the tensor, while add simply sums the tensors without changing the dimension.

 

b)New network structure Darknet-53

For basic image feature extraction, Yolov3 uses a network structure called Darknet-53 (containing 53 convolutional layers), which borrows from the residual network by setting up shortcut connections between some layers.

 

The Darknet-53 network above takes a 256*256*3 input, and the numbers 1, 2, 8, etc. in the leftmost column indicate how many times the residual component is repeated. Each residual component has two convolutional layers and a shortcut connection, shown schematically below.

 

 

c)Object detection using multi-scale features

 

Yolov3 uses 3 different scales of feature maps for object detection.

 

Combined with the figure above, the convolutional network produces one scale of detection results after layer 79, following several more convolutional layers (shown in yellow). Compared to the input image, the feature map used for detection here is downsampled 32 times: if the input is 416*416, the feature map here is 13*13. Because of the heavy downsampling, this feature map has a large receptive field, so it is suitable for detecting objects of larger size in the image.

 

To achieve finer-grained detection, the layer-79 feature map is upsampled (the up-sampling convolution to the right of layer 79) and then fused with the layer-61 feature map by concatenation, yielding the finer-grained layer-91 feature map. After several more convolutional layers, this produces a feature map downsampled 16 times relative to the input image. It has a medium receptive field and is suitable for detecting medium-scale objects.

 

Finally, the layer-91 feature map is upsampled again and fused (concatenated) with the layer-36 feature map, producing a feature map downsampled 8 times relative to the input image. It has the smallest receptive field and is suitable for detecting small-sized objects.

 

d)9 scales of prior boxes

As the number and scale of the output feature maps change, the sizes of the prior boxes need to be adjusted accordingly. Yolov2 introduced K-means clustering to obtain prior box sizes, and Yolov3 continues this approach, setting 3 prior boxes for each downsampling scale and clustering 9 prior box sizes in total. On the COCO dataset these 9 prior boxes are: (10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x198), (373x326).

 

The network applies the larger prior boxes (116x90), (156x198), (373x326) on the smallest 13*13 feature map (which has the largest receptive field), making it suitable for detecting larger objects.
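The scale-to-anchor assignment described above can be sketched as a small lookup table. This is a minimal sketch; the grouping of the medium and small anchors follows the standard YOLOv3 COCO configuration.

```python
# YOLOv3's 9 COCO prior boxes, grouped by the feature map they are used on.
# The coarsest 13x13 map (largest receptive field) gets the largest anchors;
# the finest 52x52 map gets the smallest ones.
ANCHORS_BY_SCALE = {
    13: [(116, 90), (156, 198), (373, 326)],  # large objects
    26: [(30, 61), (62, 45), (59, 119)],      # medium objects
    52: [(10, 13), (16, 30), (33, 23)],       # small objects
}

def anchors_for_grid(grid_size):
    """Return the 3 prior boxes used on a given feature-map size."""
    return ANCHORS_BY_SCALE[grid_size]
```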

 

e)Replacing softmax with logistic for object classification

When predicting object categories, Yolov3 uses logistic outputs instead of softmax. This supports multi-label objects (e.g., a person can carry both the labels Woman and Person).

 

f)Input mapping to output

Without considering the details of the neural network structure, in general, for an input image, Yolov3 maps it to output tensors at 3 scales, representing the probability of the presence of various objects at various locations in the image.

 

For an input image of 416*416, 3 prior boxes are set in each grid cell of the feature map at each scale, for a total of 13*13*3 + 26*26*3 + 52*52*3 = 10647 predictions. Each prediction is a (4+1+80)=85-dimensional vector containing the box coordinates (4 values), the box confidence (1 value), and the object class probabilities (80 classes for the COCO dataset).
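The arithmetic above can be checked with a few lines of Python:

```python
def yolo_prediction_count(grids=(13, 26, 52), anchors_per_cell=3):
    """Total number of candidate boxes YOLOv3 emits for a 416x416 input."""
    return sum(g * g * anchors_per_cell for g in grids)

def prediction_vector_length(num_classes=80):
    """4 box coordinates + 1 objectness confidence + per-class probabilities."""
    return 4 + 1 + num_classes
```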

 

Summary

Yolov3 borrows the residual network structure to build a deeper network, and its multi-scale detection improves mAP and small-object detection. If COCO mAP50 is used as the evaluation metric (i.e., without being too strict about box accuracy), Yolov3 performs remarkably well: as shown in the figure below, at comparable accuracy Yolov3 is 3 to 4 times faster than other models.

 

However, if more accurate boxes are required and COCO AP is used as the evaluation criterion, Yolov3's accuracy is somewhat weaker, as shown in the figure below.

 

3.Environment building process

The Python library “opencv-python” can load Darknet networks, so we only need the pre-trained model and its weights. The files “yolov3.weights” and “yolov3.cfg” can be found at https://pjreddie.com/media/files and downloaded. We use OpenCV functions to load the information used for prediction and detection, including net.getLayerNames(), cv2.dnn.readNetFromDarknet(), and cv2.dnn.blobFromImage().
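A minimal loading sketch along these lines; the file names "yolov3.cfg", "yolov3.weights", and "test.jpg" are assumed to exist in the working directory (the first two downloaded as described above).

```python
import cv2

# Load the Darknet config and pre-trained weights (assumed to have been
# downloaded into the working directory).
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

# YOLOv3 has three unconnected output layers, one per detection scale.
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

# Preprocess an image into the 416x416 blob the network expects
# (scale pixel values to [0,1] and swap BGR -> RGB).
image = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(output_layers)  # one output array per detection scale
```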

 

4.The process of calling api-Yolov3

We pass the images as parameters and let the model run the prediction. From the prediction we obtain the bounding boxes of the detected objects, their confidences, and their class IDs. We then keep the object categories we want and discard the useless boxes. Note that a detection is trusted only if its confidence exceeds the threshold. We then draw the rectangles and put the text of each object's class on the rectangles in different colors. Finally, cv2.imshow() displays the result.
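The box-filtering step can be sketched as a small pure-Python helper. This is a simplified sketch: the threshold value and the parallel-list layout are illustrative assumptions.

```python
CONF_THRESHOLD = 0.5  # only detections above this confidence are trusted

def filter_detections(boxes, confidences, class_ids, wanted_classes):
    """Keep only detections of wanted classes whose confidence beats the threshold.

    boxes / confidences / class_ids are parallel lists as produced from the
    model outputs; wanted_classes is the set of class IDs we care about
    (pedestrians, cars, traffic lights, ...).
    """
    kept = []
    for box, conf, cid in zip(boxes, confidences, class_ids):
        if conf > CONF_THRESHOLD and cid in wanted_classes:
            kept.append((box, conf, cid))
    return kept
```

In the real pipeline, the kept boxes would then be drawn with cv2.rectangle() and labelled with cv2.putText() before being shown with cv2.imshow().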

 

a)labels

Yolov3's COCO model has 80 labels, and we choose the useful ones, including pedestrians, cars, traffic lights, and so on.

 

b)Distinguish the color of traffic lights

We add a function for distinguishing the color of traffic lights. The main idea is to convert the image from RGB/BGR to HSV and compute statistics over the hue, saturation, and value of the traffic light's image. We compare the pixel counts falling in the hue ranges for red, green, and yellow and take the highest: for example, if the red count is higher than the green and yellow counts, we classify the light as a red traffic light.
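A simplified stand-in for this HSV comparison, using only the standard library. The hue ranges and brightness thresholds below are illustrative assumptions, not the exact values used in the app.

```python
import colorsys

def classify_light(pixels):
    """Classify a traffic-light crop as 'red', 'yellow', or 'green'.

    `pixels` is an iterable of (r, g, b) tuples in 0-255. Each pixel is
    converted to HSV; only bright, saturated pixels vote, and the hue
    range with the most votes wins.
    """
    votes = {"red": 0, "yellow": 0, "green": 0}
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s < 0.4 or v < 0.4:        # ignore dark / washed-out pixels
            continue
        deg = h * 360
        if deg < 20 or deg > 340:     # hue near 0/360 is red
            votes["red"] += 1
        elif deg < 70:                # ~20-70 degrees is yellow
            votes["yellow"] += 1
        elif deg < 170:               # ~70-170 degrees is green
            votes["green"] += 1
    return max(votes, key=votes.get)
```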

 

 

B.Streamlit-based platform building

1.Introduction

Streamlit is the first application development framework dedicated to machine learning and data science teams. It is a fast way to develop custom machine learning tools; it can be seen as aiming to replace Flask in machine learning projects, helping machine learning engineers build user-interaction tools quickly.

 

2.Building process

We build the Streamlit page mainly according to the “API reference — Streamlit 0.82.0 documentation”.

 

Here are steps to build our streamlit page.

 

First of all, import streamlit:

 

 

Then set page config by using “streamlit.set_page_config()”.

Page title is “Object Detection”, page icon is 🧐.

Streamlit supports emoji shortcodes. For example, by using :face_with_monocle:, the page will show 🧐.

For a list of all supported codes, see https://raw.githubusercontent.com/omnidan/node-emoji/master/lib/emoji.json.
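A minimal sketch of this configuration step, using the emoji shortcode mentioned above:

```python
import streamlit as st

# Must be the first Streamlit command executed in the script.
st.set_page_config(page_title="Object Detection", page_icon=":face_with_monocle:")
```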

 

 

Streamlit apps usually start with a call to set the app’s title. I use “streamlit.title(body, anchor=None)” to display text in title formatting.

Here are parameters about it.

 

Parameters

  • body (str) – The text to display.
  • anchor (str) – The anchor name of the header that can be accessed with #anchor in the URL. If omitted, it generates an anchor using the body.

 

The title is “Object Detection 🧐”
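A one-line sketch of the title call described above:

```python
import streamlit as st

# Display the app title in title formatting.
st.title("Object Detection 🧐")
```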

 

 

Then display the selectbox in the sidebar by using “streamlit.sidebar.selectbox(label, options, index=0, format_func=<class 'str'>, key=None, help=None)”.

Here are parameters and return about it.

 

Parameters

  • label (str) – A short label explaining to the user what this select widget is for.
  • options (list, tuple, numpy.ndarray, pandas.Series, or pandas.DataFrame) – Labels for the select options. This will be cast to str internally by default. For pandas.DataFrame, the first column is selected.
  • index (int) – The index of the preselected option on first render.
  • format_func (function) – Function to modify the display of the labels. It receives the option as an argument and its output will be cast to str.
  • key (str) – An optional string to use as the unique key for the widget. If this is omitted, a key will be generated for the widget based on its content. Multiple widgets of the same type may not share the same key.
  • help (str) – A tooltip that gets displayed next to the selectbox.

 

Returns

The selected option

 

Return type

any

 

In Streamlit, not only can we add interactivity to our report with widgets, we can organize them into a sidebar. Each element passed to st.sidebar is pinned to the left, allowing users to focus on the content in our app. The only elements that aren’t supported are st.echo and st.spinner. We can add widgets to the sidebar by using “st.sidebar.[element_name]”.

With widgets, Streamlit allows us to bake interactivity directly into our apps with buttons, sliders, text inputs, and more.

In our app, I display a select widget in the sidebar. Therefore, users can choose which kind of service they are interested in.
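A minimal sketch of the sidebar selectbox; the label and option strings are illustrative assumptions based on the pages described in this report.

```python
import streamlit as st

# st.sidebar pins the widget to the left-hand sidebar; the call returns
# the currently selected option string.
option = st.sidebar.selectbox(
    "What would you like to do?",
    ("Introduction about this app", "Choose the image you want to detect"),
)
```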

 

 

According to the return of the selectbox, it will show different kinds of pages.

 

If the option is “Introduction about this app”, the main page will show the subheader. Besides, it also shows information including the function of our app, the objects it can identify, some guidance, and our github website.

I display the subheader by using “streamlit.subheader(body, anchor=None)”.

Here are parameters about it.

 

Parameters

  • body (str) – The text to display.
  • anchor (str) – The anchor name of the header that can be accessed with #anchor in the URL. If omitted, it generates an anchor using the body.

 

 

I show the introduction by using “magic commands” in streamlit.

Magic commands are a feature in Streamlit that allows us to write markdown and data to our app with very few keypresses. For example, if we put """ This is an app """ on its own line, the page will show “This is an app”. Any time Streamlit sees either a variable or a literal value on its own line, it automatically writes it to our app using streamlit.write(*args, **kwargs).

streamlit.write() is a special command. It does different things depending on what you throw at it. Unlike other Streamlit commands, write() has some unique properties:

  1. You can pass in multiple arguments, all of which will be written.
  2. Its behavior depends on the input types as follows.
  3. It returns None, so its “slot” in the App cannot be reused.
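A small sketch combining a subheader, a magic-command string, and an explicit st.write() call; the introduction text below is illustrative.

```python
import streamlit as st

st.subheader("Introduction about this app")

# "Magic": Streamlit renders a bare literal on its own line via st.write().
"""
This app detects pedestrians, cars and traffic lights in uploaded images.
"""

# The explicit equivalent; st.write() accepts multiple arguments and
# renders each one according to its type.
st.write("Detectable objects:", ["person", "car", "traffic light"])
```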

 

 

If the option is “Choose the image you want to detect”, it will show the subheader of this page and guidance information on the main page. The sidebar shows a file upload area created with “streamlit.file_uploader(label, type=None, accept_multiple_files=False, key=None, help=None)”. By default, uploaded files are limited to 200MB.

Here are parameters and return about it.

 

Parameters

  • label (str) – A short label explaining to the user what this file uploader is for.
  • type (str or list of str or None) – Array of allowed extensions (e.g. [‘png’, ‘jpg’]). The default is None, which means all extensions are allowed.
  • accept_multiple_files (bool) – If True, allows the user to upload multiple files at the same time, in which case the return value will be a list of files. Default: False
  • key (str) – An optional string to use as the unique key for the widget. If this is omitted, a key will be generated for the widget based on its content. Multiple widgets of the same type may not share the same key.
  • help (str) – A tooltip that gets displayed next to the file uploader.

 

Returns

  • If accept_multiple_files is False, returns either None or an UploadedFile object.
  • If accept_multiple_files is True, returns a list with the uploaded files as UploadedFile objects. If no files were uploaded, returns an empty list.

The UploadedFile class is a subclass of BytesIO, and therefore it is “file-like”. This means you can pass them anywhere where a file is expected.

 

Return type

None or UploadedFile or list of UploadedFile

 

Users can drag and drop a file there to upload their own file.
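A minimal sketch of the uploader; the label text and allowed extensions are illustrative assumptions.

```python
import streamlit as st

uploaded = st.sidebar.file_uploader(
    "Upload an image",
    type=["png", "jpg", "jpeg"],
    help="Uploads are limited to 200MB by default.",
)

if uploaded is not None:
    # UploadedFile subclasses BytesIO, so it can be used anywhere a
    # file-like object is expected.
    image_bytes = uploaded.read()
```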

 

 

We can show the image the user chooses by using “streamlit.image(image, caption=None, width=None, use_column_width=None, clamp=False, channels='RGB', output_format='auto')”.

Here are parameters and return about it.

 

Parameters

  • image (numpy.ndarray, [numpy.ndarray], BytesIO, str, or [str]) – Monochrome image of shape (w,h) or (w,h,1) OR a color image of shape (w,h,3) OR an RGBA image of shape (w,h,4) OR a URL to fetch the image from OR a path of a local image file OR an SVG XML string like <svg xmlns=…</svg> OR a list of one of the above, to display multiple images.
  • caption (str or list of str) – Image caption. If displaying multiple images, caption should be a list of captions (one for each image).
  • width (int or None) – Image width. None means use the image width, but do not exceed the width of the column. Should be set for SVG images, as they have no default image width.
  • use_column_width ('auto' or 'always' or 'never' or bool) – If ‘auto’, set the image’s width to its natural size, but do not exceed the width of the column. If ‘always’ or True, set the image’s width to the column width. If ‘never’ or False, set the image’s width to its natural size. Note: if set, use_column_width takes precedence over the width parameter.
  • clamp (bool) – Clamp image pixel values to a valid range ([0-255] per channel). This is only meaningful for byte array images; the parameter is ignored for image URLs. If this is not set, and an image has an out-of-range value, an error will be thrown.
  • channels ('RGB' or 'BGR') – If image is an nd.array, this parameter denotes the format used to represent color information. Defaults to ‘RGB’, meaning image[:, :, 0] is the red channel, image[:, :, 1] is green, and image[:, :, 2] is blue. For images coming from libraries like OpenCV you should set this to ‘BGR’, instead.
  • output_format ('JPEG', 'PNG', or 'auto') – This parameter specifies the format to use when transferring the image data. Photos should use the JPEG format for lossy compression while diagrams should use the PNG format for lossless compression. Defaults to ‘auto’ which identifies the compression type based on the type and format of the image argument.

 

After the user chooses a file, the app shows the image in both the sidebar and the main page.
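A minimal sketch of displaying an uploaded image in both places. Decoding the bytes with OpenCV is an assumption about the app's pipeline; note channels="BGR", since OpenCV arrays are BGR.

```python
import cv2
import numpy as np
import streamlit as st

uploaded = st.sidebar.file_uploader("Upload an image", type=["png", "jpg"])
if uploaded is not None:
    # Decode the raw upload bytes with OpenCV; the result is a BGR array,
    # so channels="BGR" tells Streamlit how to interpret the colors.
    img = cv2.imdecode(np.frombuffer(uploaded.read(), np.uint8), cv2.IMREAD_COLOR)
    st.sidebar.image(img, channels="BGR", caption="Your upload")
    st.image(img, channels="BGR", caption="Detection input", use_column_width=True)
```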

 

 

C.Test results based on Loader.io

 

 

D.Application

Our app website is https://share.streamlit.io/YUA1024/YUA1024/master/streamlit_app.py.

Users can use it by simply visiting the website.

Our github website is https://github.com/YUA1024/YUA1024.

 

Currently, the features of this application are:

 

1.Upload images and display them

 

 

2.Recognition of pedestrians, vehicles, traffic lights, crosswalks, etc. in images

 

 

3.Mark out the recognized targets, with different annotations for different spatial relationships; the annotation rules are as follows

 

4.If a failure to comply with traffic regulations is detected, a picture will be sent by email as a reminder

 

 

   

City roads are the basic skeleton of a city; in a sense, the city road monitoring system is the skeleton of the city's security monitoring system. The construction of TV monitoring, motor-vehicle traffic-violation monitoring and capture, smart-card interfaces, and other systems is an important part of the urban road monitoring network, and plays a very important role in building the city's security prevention and control network. Our project realizes traffic monitoring: where the original violation monitoring and capture could only watch oncoming vehicles, we add the identification of pedestrians. The project is designed to monitor both vehicles and pedestrians, control both directions, reduce traffic accidents, and build good traffic security in the city.

The user can be the central duty officer of a monitoring and command center, who can keep abreast of road conditions in each region in order to adjust vehicle flow at each intersection and ensure smooth traffic, monitor violations by road vehicles, and detect and arrange the handling of road traffic accidents in a timely manner. The user can also be a traffic police officer, who only needs our program's analysis of which intersections see frequent violations, so that the city's traffic police manpower can be placed more reasonably and adjusted at the macro level. The user can also be a highway designer, who can use our program to analyze existing road conditions more quickly, predict traffic demand scientifically, and plan the city's road construction in an integrated manner.

 

 

 

 

If combined with license plate recognition and face recognition, and by calling the government's resident and vehicle database interfaces, the system could automatically alert on vehicles and residents, enabling intelligent traffic-intersection control. In short, the application of this program can have a very positive impact on society, and it has good extensibility and high potential.

posted @ 2021-06-21 10:45 by 网速极慢