You have been provided an export from DCE’s incident response team’s security information and event management (SIEM) system. The incident response team extracted alert data from their SIEM platform and have provided a .CSV file (MLData2023.csv), with 500,000 event records, of which approximately 3,000 have been ‘tagged’ as malicious.
The goal is to integrate machine learning into their Security Information and Event Management (SIEM) platform so that suspicious events can be investigated in real-time.
Each event record is a snapshot triggered by an individual network ‘packet’. The exact triggering conditions for the snapshot are unknown, but it is known that multiple packets are exchanged in a ‘TCP conversation’ between the source and the target before an event is triggered and a record created. It is also known that each event record is anomalous in some way (the SIEM logs many events that may be suspicious).
A very small proportion of the data are known to be corrupted by their source systems and some data are incomplete or incorrectly tagged. The incident response team indicated this is likely to be less than a few hundred records. A list of the relevant features in the data is given below.
Assembled Payload Size (continuous) | The total size of the inbound suspicious payload. Note: This would contain the data sent by the attacker in the “TCP conversation” up until the event was triggered |
DYNRiskA Score (continuous) | An un-tested in-built risk score assigned by a new SIEM plug-in |
IPV6 Traffic (binary) | A flag indicating whether the triggering packet was using IPV6 or IPV4 protocols (True = IPV6) |
Response Size (continuous) | The total size of the reply data in the TCP conversation prior to the triggering packet |
Source Ping Time (ms) (continuous) | The ‘ping’ time to the IP address which triggered the event record. This is affected by network structure, number of ‘hops’ and even physical distances. E.g.:
·        < 1 ms is typically local to the device
·        1-5 ms is usually located in the local network
·        5-50 ms is often geographically local to a country
·        ~100-250 ms is trans-continental to servers
·        250+ ms may be trans-continental to a small network.
Note, these are estimates only and many factors can influence ping times. |
Operating System (Categorical) | A limited ‘guess’ as to the operating system that generated the inbound suspicious connection. This is not accurate, but it should be somewhat consistent for each ‘connection’ |
Connection State (Categorical) | An indication of the TCP connection state at the time the packet was triggered. |
Connection Rate (continuous) | The number of connections per second by the inbound suspicious connection made prior to the event record creation |
Ingress Router (Binary) | DCE has two main network connections to the ‘world’. This field indicates which connection the events arrived through |
Server Response Packet Time (ms) (continuous) | An estimation of the time from when the payload was sent to when the reply packet was generated. This may indicate server processing time/load for the event |
Packet Size (continuous) | The size of the triggering packet |
Packet TTL (continuous) | The time-to-live of the previous inbound packet. TTL can be a measure of how many ‘hops’ (routers) a packet has traversed before arriving at our network. |
Source IP Concurrent Connection (Continuous) | How many concurrent connections were open from the source IP at the time the event was triggered |
Class (Binary) | Indicates if the event was confirmed malicious, i.e. 0 = Non-malicious, 1 = Malicious |
The raw data for the above variables are contained in the MLData2023.csv file.
The data were gathered over a period of time and processed by several systems in order to associate specific events with confirmed malicious activities. However, the number of confirmed malicious events was very low, with these events accounting for less than 1% of all logged network events.
Because the events associated with malicious traffic are quite rare, the rates of ‘false negatives’ and ‘false positives’ are important.
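To illustrate why these rates matter when one class is rare, the sketch below computes them from a small confusion matrix. The label vectors are invented for illustration and are not drawn from MLData2023.csv; the variable names are assumptions.

```r
# Toy labels, invented for illustration: 1 = Malicious, 0 = Non-malicious.
actual    <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
predicted <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 1)

# Cross-tabulate predictions against the confirmed class.
# Fixing the factor levels guarantees a 2 x 2 table even if a class is absent.
confusion <- table(Predicted = factor(predicted, levels = c(0, 1)),
                   Actual    = factor(actual,    levels = c(0, 1)))
print(confusion)

fp <- confusion["1", "0"]  # flagged as malicious, but actually benign
fn <- confusion["0", "1"]  # malicious event that was missed
tn <- confusion["0", "0"]  # benign event correctly left alone
tp <- confusion["1", "1"]  # malicious event correctly flagged

false_positive_rate <- fp / (fp + tn)  # share of benign events wrongly flagged
false_negative_rate <- fn / (fn + tp)  # share of malicious events missed
accuracy            <- (tp + tn) / sum(confusion)
```

In this toy case the overall accuracy looks respectable while one in three malicious events is still missed, which is why accuracy alone can mislead on imbalanced data.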
Your initial goals will be to
First, copy the code below to an R script. Enter your student ID into the command set.seed(.) and run the whole code. The code will create a sub-sample that is unique to you.
Use the str(.) command to check that the data type for each feature is correctly specified. Address the issue if this is not the case.
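A minimal sketch of this check is given below. It uses a small toy data frame in place of mydata; the column names and values are hypothetical, so substitute the names that str(.) actually reports for your sub-sample.

```r
# Toy stand-in for mydata; column names and values are invented.
toy <- data.frame(
  Operating.System = c("Windows", "Linux", "Windows", "Android"),
  IPV6.Traffic     = c("TRUE", "FALSE", "FALSE", "TRUE"),
  Packet.Size      = c("1500", "512", "64", "1480"),
  Class            = c(0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Inspect the types: the text columns are read as chr and Class as int.
str(toy)

# Convert each column to the type its description implies.
toy$Operating.System <- factor(toy$Operating.System)   # categorical
toy$IPV6.Traffic     <- as.logical(toy$IPV6.Traffic)   # binary flag
toy$Packet.Size      <- as.numeric(toy$Packet.Size)    # continuous
toy$Class            <- factor(toy$Class, levels = c(0, 1),
                               labels = c("Non-malicious", "Malicious"))

# Confirm the corrected structure.
str(toy)
```

Converting a text column with as.numeric() turns any non-numeric entries into NA with a warning, which can also help surface corrupted values.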
You are to clean and perform basic data analysis on the relevant features in mydata, as well as principal component analysis (PCA) on the continuous variables. This is to be done using “R”. You will report on your findings.
Categorical Feature | Category | N (%) |
Feature 1 | Category 1 | 10 (10.0%) |
 | Category 2 | 30 (30.0%) |
 | Category 3 | 50 (50.0%) |
 | Missing | 10 (10.0%) |
Feature 2 (Binary) | YES | 75 (75.0%) |
 | NO | 25 (25.0%) |
 | Missing | 0 (0.0%) |
… | … | … |
Feature k | Category 1 | 25 (25.0%) |
 | Category 2 | 25 (25.0%) |
 | Category 3 | 15 (15.0%) |
 | Category 4 | 30 (30.0%) |
 | Missing | 5 (5.0%) |
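One possible way to produce the N (%) column for a single categorical feature is sketched below. The vector os and its values are invented for illustration; the same steps would be repeated for each categorical or binary feature in mydata.

```r
# Toy categorical feature with one missing value; values are invented.
os <- factor(c("Windows", "Linux", "Windows", NA, "Android",
               "Windows", "Linux", "Windows", "Windows", "Linux"))

# useNA = "always" adds a row for missing values even when there are none.
counts <- table(os, useNA = "always")
pct    <- 100 * prop.table(counts)

# Build the "N (%)" strings, labelling the NA row as "Missing".
summary_tab <- data.frame(
  Category = ifelse(is.na(names(counts)), "Missing", names(counts)),
  N_pct    = sprintf("%d (%.1f%%)", as.integer(counts), as.numeric(pct)),
  stringsAsFactors = FALSE
)
print(summary_tab, row.names = FALSE)
```

Percentages here are taken over all records, including missing ones, which matches the layout of the template table.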
Continuous Feature | Number (%) missing | Min | Max | Mean | Median | Skewness |
Feature 1 | | | | | | |
Feature 2 | | | | | | |
… | … | … | … | … | … | … |
Feature k | | | | | | |
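The columns of this table can be computed along the following lines. Base R has no skewness function, so the sketch defines a simple moment-based version; this is one common definition, and package implementations that apply small-sample corrections may give slightly different values. The function names and the toy data are assumptions.

```r
# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 1.5. NAs are dropped first.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# One row of the summary table for a single continuous feature.
summarise_continuous <- function(x) {
  n_missing <- sum(is.na(x))
  c(
    n_missing   = n_missing,
    pct_missing = 100 * n_missing / length(x),
    min         = min(x, na.rm = TRUE),
    max         = max(x, na.rm = TRUE),
    mean        = mean(x, na.rm = TRUE),
    median      = median(x, na.rm = TRUE),
    skewness    = skewness(x)
  )
}

# Toy data; column names and values are invented.
toy <- data.frame(
  Packet.Size   = c(64, 128, 128, 256, 1500, NA),
  Response.Size = c(10, 20, 30, 40, 50, 60)
)

# sapply() returns one column per feature; transpose to one row per feature.
result <- t(sapply(toy, summarise_continuous))
print(round(result, 2))
```

A large positive skewness, as for the toy Packet.Size column, indicates a long right tail and is worth checking for invalid or extreme values.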
write.csv(mydata, "mydata.csv")
** Do not read the data back in and use them for PCA **
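The PCA step can then be run on the cleaned data frame that is already in memory. The sketch below uses simulated features in place of mydata; the column names, the simulated values, and the choice to drop incomplete rows are all assumptions made for illustration.

```r
set.seed(1)  # fixed seed so the toy data are reproducible

# Toy continuous features on very different scales; invented for illustration.
n      <- 200
latent <- rnorm(n)
toy <- data.frame(
  Packet.Size     = 800  + 300  * latent + rnorm(n, sd = 50),
  Response.Size   = 5000 + 2000 * latent + rnorm(n, sd = 400),
  Ping.Time       = rexp(n, rate = 1 / 40),
  Connection.Rate = runif(n, 0, 10)
)
toy$Ping.Time[c(3, 17)] <- NA  # a couple of masked (invalid) values

# prcomp() does not accept NA values, so keep complete rows only.
complete <- toy[complete.cases(toy), ]

# scale. = TRUE standardises each feature to unit variance, so features
# measured in large units do not dominate the components.
pca <- prcomp(complete, center = TRUE, scale. = TRUE)

# Individual and cumulative proportion of variance explained.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
variance_tab <- data.frame(
  PC         = colnames(pca$rotation),
  Proportion = round(var_explained, 3),
  Cumulative = round(cumsum(var_explained), 3)
)
print(variance_tab)

# Smallest number of components explaining at least 50% of the variance.
n_components <- which(cumsum(var_explained) >= 0.5)[1]

# Loadings: the contribution of each feature to each component.
print(round(pca$rotation, 3))
```

Without scaling, a feature such as the toy Response.Size, whose variance is orders of magnitude larger than the others, would account for almost all of the first component regardless of its relevance.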
What to Submit
If you use any references in your analysis or discussion outside of the notes provided in the unit, you must cite your sources.
The report must be submitted through TURNITIN and checked for originality. The R code and data file are to be submitted separately via a Canvas submission link.
Note that no marks will be given if the results you have provided cannot be confirmed by your code.
Criterion | Contribution to assignment mark |
Correct implementation of descriptive analysis, data cleaning and PCA in R
·        Working code
·        Masking of invalid/outliers done correctly
·        External sources of code cited in APA 7 referencing style (if applicable)
·        Good documentation/commentary | 20% |
Correct identification of missing and/or invalid observations in the data with justifications. | 10% |
Accurate specification and interpretation of the contribution of principal components and their loading coefficients.
·        Explain why you should scale the observations when running PCA.
·        Outline the individual and cumulative proportion of variance explained, and comment on the number of components required to explain at least 50% of the variance.
·        Outline the loadings (to specified decimal place) and comment as to their contribution to the respective PC.
·        Tabulation of results – no screenshot | 15% |
Accurate biplot, with appropriate interpretation presented
·        2-d with clear labels
·        Interpretation of each biplot
         i) PCA plot – Clustering? Separation?
         ii) Loadings plot – vectors (features) and their relation to each of the dimensions, as well as to each other.
·        PCA + Loadings plot – Do any of the features appear to be able to assist with the classification of Malicious events and how? | 25% |
Appropriate selection of dimension for the identification of Malicious events with justification.
·        Choose a dimension, i.e. PC1 or PC2 and justify why it’s the best for classifying Malicious events | 10% |
Presentation and communication skills – Tables (no screenshots) and figures are well presented and appropriately captioned and are referenced in text. Report, analysis and overall narrative is well-articulated and communicated.
·        All figures and tables should be labelled/captioned appropriately and referenced in text. The labels in the plots should be clear.
·        Solutions should be in the order that the questions were posed in the assignment.
·        Spelling and grammatical errors should be kept to a minimum.
·        Overall narrative – all interpretation should be in the context of the study. | 20% |
Total | 100% |
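For the biplot and dimension-selection criteria above, one possible approach is sketched below: compare how far apart the two classes sit along each principal component, then plot the scores and loadings. The data are simulated, and the feature names, the separation measure, and the helper draw_plots are all hypothetical; they are not prescribed by the assignment.

```r
set.seed(2)  # fixed seed so the toy data are reproducible

# Toy data, invented: Malicious events are rare and shifted in two features.
n     <- 300
class <- factor(ifelse(runif(n) < 0.1, "Malicious", "Non-malicious"))
shift <- ifelse(class == "Malicious", 3, 0)
toy <- data.frame(
  Payload.Size    = rnorm(n) + shift,
  Connection.Rate = rnorm(n) + 0.8 * shift,
  Packet.TTL      = rnorm(n)
)

pca    <- prcomp(toy, center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x)

# Gap between the class means along each PC, in units of that PC's
# standard deviation. A larger value suggests better separation.
separation <- sapply(scores, function(pc) {
  abs(unname(diff(tapply(pc, class, mean)))) / sd(pc)
})
print(round(separation, 2))
best_pc <- names(which.max(separation))

# Hypothetical helper: scores coloured by class, then the standard biplot.
draw_plots <- function(pca, class) {
  cols <- ifelse(class == "Malicious", "red", "grey60")
  plot(pca$x[, 1], pca$x[, 2], col = cols, pch = 19,
       xlab = "PC1", ylab = "PC2", main = "PCA scores by class")
  legend("topright", legend = c("Malicious", "Non-malicious"),
         col = c("red", "grey60"), pch = 19)
  # Replace row labels with dots so the loading vectors stay readable.
  biplot(pca, xlabs = rep(".", nrow(pca$x)), main = "Biplot: PC1 vs PC2")
}
```

Calling draw_plots(pca, class) in an interactive session draws both plots. The separation measure is only a rough guide; the written justification should still refer to the loadings and to what the features mean in the context of the study.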
Edith Cowan University regards academic misconduct of any form as unacceptable. Academic misconduct, which includes but is not limited to plagiarism, unauthorised collaboration, cheating in examinations, theft of other students’ work, collusion, and inadequate and incorrect referencing, will be dealt with in accordance with the ECU Rule for Academic Misconduct (including Plagiarism) Policy. Ensure that you are familiar with the Academic Misconduct Rules.
Applications for extensions must be completed using the ECU Application for Extension form, which can be accessed online.
Normal work commitments, family commitments and extra-curricular activities are not accepted as grounds for granting you an extension as you are expected to plan ahead for your assessment due dates.
Please submit applications for extensions via email to both your tutor and the Unit Coordinator.
Where the assignment is submitted no more than 7 days late, the penalty shall, for each day that it is late, be 5% of the maximum assessment available for the assignment. Where the assignment is more than 7 days late, a mark of zero shall be awarded.