COMP20007 Design of Algorithms
Assignment 1 2025
General
You must read fully and carefully the assignment specification and instructions.
Course: COMP20007 Design of Algorithms @ Semester 1, 2025
Deadline Submission: Monday, 7th April @ 11:59 pm
Course Weight: 10%
Assignment type: individual
ILOs covered: 2, 3, 4
Submission method: via Ed
Purpose
The purpose of this assignment is for you to:
Design efficient algorithms in pseudocode.
Improve your proficiency in C programming and your dexterity with dynamic memory allocation.
Demonstrate understanding of representing problems as graphs and implementing a set of algorithms.
Birds of a feather
Why should we group species?
In the field of genetics we are often interested in grouping similar species together to understand how animals have evolved over time. Understanding evolution allows us to make more informed decisions in conservation efforts, but may also lead to the discovery of new medicines, as similar species produce similar responses to external changes. One possible method of grouping species together is by how similar they are, which historically has been based on how similar their physical features are. Some examples of Australian birds are shown below.
Left: Wompoo Fruit-Dove, Middle: Azure Kingfisher, Right: Pale-Yellow Robin
Now, one possible grouping of these three birds is to have them grouped by size - we would expect small birds like the Pale-Yellow Robin or Azure Kingfisher to have more in common than the larger Wompoo Fruit-Dove. Of course, we can also group them based on more features, using an algorithmic approach. Examples of such algorithms are Blake's averages and Cynthia's midpoints.
Other Applications
Grouping methods have numerous real-world applications, which include:
Disease Management: Tracking information about the spread of sickness and grouping together individuals with similar health problems can help us to identify high-risk population areas and take preventative steps to save lives.
Resource Allocation: Identifying groups of animals with high/low resource usage can help better distribute resources and help conservation efforts.
Content Personalisation: Grouping together users with similar tastes can help individuals access the content and information they want more often.
Group Computation Algorithms
Blake's Averages
Initialization:
Start by selecting the first c birds as the centres of the c groupings desired.
Assign all other birds to their nearest group centre.
Calculate Averages:
For each group, find the average of all the features.
For each group, find the bird that is closest to this average.
Re-assign all birds to the new closest groups.
Termination:
Repeat the main loop until the group centres stop changing, or we exceed a maximum number of iterations.
Once the centres stop changing, the algorithm terminates.
Output:
The output of the algorithm is the list of birds with their features and group memberships, ordered alphabetically.
The time complexity of the Blake's Averages algorithm is O(n × d × c × i), where n is the number of data points, d is the number of features per point, c is the desired number of clusters, and i is the number of iterations. As i should not vary that much and is effectively constant, we will have a complexity of roughly O(n × c × d).
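The re-assignment step of Blake's Averages can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the skeleton's code: the bird_t struct, the use of squared Euclidean distance over the two numeric features, and the function names sq_dist and assign_groups are all hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical bird record: only the numeric features used for distance. */
typedef struct {
    double weight;  /* grams */
    double length;  /* centimetres */
    int group;      /* index of the assigned group centre */
} bird_t;

/* Squared Euclidean distance over the two numeric features (an assumed metric). */
static double sq_dist(const bird_t *a, const bird_t *b) {
    double dw = a->weight - b->weight;
    double dl = a->length - b->length;
    return dw * dw + dl * dl;
}

/* Assign every bird to its nearest centre; centres[g] is an index into birds.
 * Returns the number of birds whose group changed.
 * One call costs O(n * c * d), matching the per-iteration bound above. */
static int assign_groups(bird_t *birds, int n, const int *centres, int c) {
    assert(n > 0 && c > 0);
    int changed = 0;
    for (int i = 0; i < n; i++) {
        int best = 0;
        double best_d = DBL_MAX;
        for (int g = 0; g < c; g++) {
            double d = sq_dist(&birds[i], &birds[centres[g]]);
            if (d < best_d) {
                best_d = d;
                best = g;
            }
        }
        if (birds[i].group != best) {
            birds[i].group = best;
            changed++;
        }
    }
    return changed;
}
```

A main loop could call assign_groups once per iteration and stop when the chosen centres no longer change or the iteration cap is reached.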
Pseudocode
BlakeAverages(birds, numBirds, numGroups):
Cynthia's Midpoints
Initialization:
Start by selecting the first c birds as the centres of the c groupings desired.
Assign all other birds to their nearest group centre.
Calculate Midpoints:
For each group, find the median of the numeric features.
For each group, find the mode of the categorical features.
For each group, find the bird that is closest to this combination of midpoints.
Re-assign all birds to the new closest groups.
Termination:
Repeat the main loop until the group centres stop changing, or we exceed a maximum number of iterations.
Once the centres stop changing, the algorithm terminates.
Output:
The output of the algorithm is the list of birds with their features and group memberships, ordered alphabetically.
As we will sort the elements every time we need to find a median, the time complexity of the Cynthia's Midpoints algorithm is , where n is the number of data points, d is the number of features per point, c is the desired number of clusters, and i is the number of iterations. As i should not vary that much and is effectively constant, we will have a complexity of roughly .
The process of grouping can be visualized in two dimensions, and may look like the following. In the image, different colours represent the different groups.
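The two midpoint computations used by Cynthia's Midpoints (a median found by sorting, and a mode for a categorical feature) might look like the sketch below. The function names, the choice of the lower middle element for even-sized groups, and the earliest-wins tie-break for the mode are assumptions; the specification does not state them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n values; sorts the caller's array in place, O(n log n).
 * For even n this returns the lower middle element (an assumption). */
static double median(double *vals, int n) {
    assert(n > 0);
    qsort(vals, (size_t)n, sizeof vals[0], cmp_double);
    return vals[(n - 1) / 2];
}

/* Mode of n strings by pairwise counting, O(n^2).
 * Ties go to the value that appears earliest (an assumption). */
static const char *mode(const char *const *vals, int n) {
    assert(n > 0);
    int best = 0;
    int best_count = 0;
    for (int i = 0; i < n; i++) {
        int count = 0;
        for (int j = 0; j < n; j++) {
            if (strcmp(vals[i], vals[j]) == 0) {
                count++;
            }
        }
        if (count > best_count) {
            best_count = count;
            best = i;
        }
    }
    return vals[best];
}
```

Because median sorts on every call, the sort dominates the cost of each midpoint calculation, which is the effect the complexity discussion above refers to.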
Example
An example of input and output is provided on the Part 1 Skeleton Slide.
Task 1: Group Computation
Part A
Implement the Blake's Averages and Cynthia's Midpoints algorithms to compute a grouping of
birds, as described in the previous slide.
Requirements
Code: Your program should implement the required pseudocode mentioned in previous slides,
as well as a couple of helper functions listed in the skeleton code.
Input Format: The input will be a text file where the first line indicates the total number of
Australian birds, and the total number of groups. Subsequent lines represent birds with their
name, colour, weight (grams) and body length (cm) separated by a space (e.g., Red-backed_Fairywren Black 5 10 ).
All the birds are sorted in alphabetical order by name using
Unix sorting order.
Output Format: Your program should output all birds, their features and associated groupings
in alphabetical order. For Blake's Averages, this means keeping it in the original ordering (Unix
sorting rules), and in Cynthia's midpoints it will be alphabetically sorted using strcmp sorting
rules. This should be done by traversing your array of birds and printing out each bird. This
should be output to the console (stdout).
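As a rough illustration of the input and output formats described above, the sketch below parses one bird line and formats one output line. The struct, the function names and the fixed MAX_FIELD buffers are assumptions made for brevity; since the requirements ask for space-efficient strings, a real solution would likely copy each field into an exactly sized allocation instead. The output layout follows the sample output line on the Task 1 Skeleton slide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELD 128  /* assumed upper bound on name and colour length */

/* Hypothetical record for one parsed bird. */
typedef struct {
    char name[MAX_FIELD];
    char colour[MAX_FIELD];
    double weight;  /* grams */
    double length;  /* centimetres */
    int group;
} bird_record_t;

/* Parse "name colour weight length"; returns 1 on success, 0 on a bad line. */
static int parse_bird(const char *line, bird_record_t *out) {
    assert(line != NULL && out != NULL);
    out->group = 0;
    return sscanf(line, "%127s %127s %lf %lf",
                  out->name, out->colour, &out->weight, &out->length) == 4;
}

/* Format one output line into buf; returns the snprintf result. */
static int format_bird(const bird_record_t *b, char *buf, size_t size) {
    return snprintf(buf, size, "%s %s %f %f Group: %d",
                    b->name, b->colour, b->weight, b->length, b->group);
}
```

Printing the formatted buffer with puts (or calling printf with the same format string) sends the line to stdout as required.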
Part B
Evaluate both algorithms through experimental analysis by quantifying the average total basic
operations per iteration of the main loop (calculated by dividing the number of operations in the
main loop by the number of iterations taken) of the two algorithms (Blake's Averages and Cynthia's
Midpoints) across various input scales and configurations. Use all of the valid input sets provided.
You may wish to generate more data, and you can do so with the provided files in the analysis
folder.
You must decide what to define as the basic operation for each algorithm, but remember to only
count the basic operations in the main while loop.
Reporting: Write a report including a discussion on the choice of algorithm, the experimental
evaluation (including tables or graphs showing how the average number of basic operations varies
with the input parameters (n and d)), and conclusions drawn from the comparisons. Include any
assumptions or simplifications made in your implementations. In addition, discuss:
Possible improvements that can be made to the main loop of the algorithms, if any, to reduce complexity.
Why you selected this basic operation.
Tip: The global variable numOps is provided to track the operation count. You may need to modify some of
the functions (such as the comparison functions) to increment the operation counter. By default, the
information relating to operation counts is printed to stderr.
Include your operation counting code as an appendix to the report.
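One possible way to count a basic operation is to increment the counter inside the comparison function, as the tip suggests. In the sketch below, numOps is redefined locally only so the example compiles on its own (its type in the skeleton is an assumption), and counted_cmp_double and avg_ops_per_iteration are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* The specification provides a global numOps; it is redefined here only so
 * that this sketch is self-contained. Its real type may differ. */
static long numOps = 0;

/* Comparator that counts each comparison as one basic operation. */
static int counted_cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    numOps++;
    return (x > y) - (x < y);
}

/* Average basic operations per main-loop iteration, as defined in Part B. */
static double avg_ops_per_iteration(long ops, int iterations) {
    assert(ops >= 0);
    return iterations > 0 ? (double)ops / iterations : 0.0;
}
```

Resetting numOps just before the main loop starts keeps set-up work out of the count, in line with the instruction to count only operations in the main while loop.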
Submission Guidelines
Submit your C source code files with appropriate comments explaining the algorithms and data
structures used.
Your report should be in PDF format, including your findings from the experimental evaluation
and any observations and theoretical improvements regarding the performance of the two
algorithms.
The report for Part 1B should be submitted as a PDF file named written_task_1B.pdf . The file
should be uploaded to the home directory of Part 1A, that is, the directory that contains your
program file birds.c .
Grading Criteria
Correctness of the implemented algorithms and adherence to the requirements.
Efficiency (time and space) and proper storage of birds.
Clarity of the report, including the depth of the experimental evaluation and the analysis of the results.
Code readability, structure, and documentation.
Task 1 Skeleton
Example of Input:
10 3
Australasian_Swamphen Blue 1310 51
Australian_Bushturkey Black 2100 64.3
Australian_Darter Black 2600 86.5
Australian_King-Parrot Red 195 42
Australian_Logrunner Black 56 19
Australian_Magpie Black 350 85
Australian_Pelican White 6800 188
Australian_Rufous_Fantail Grey 10 18.5
Australian_White_Ibis White 1475 66.
Australian_Wood_Duck Brown 955 46
Example of Output (for Blake's Averages):
Australian_Wood_Duck Brown 955.000000 46.000000 Group: 0
Eels in the Kulin Nation
Short-finned eels are a fish which live in the freshwater systems around south-eastern Australia. To
the First Peoples of the Kulin Nation - the traditional custodians of the lands and waters surrounding what is now Melbourne, or Naarm (which is the Boonwurrung/Woiwurrung name for Port Phillip) - these eels were very important food sources. The Wurundjeri people of the Kulin Nation had seven seasons rather than the Western four, and one of the seasons was dedicated to the short-finned eel migration; this season is known as Iuk (https://inspiringvictoria.org.au/2020/08/13/seasons-in-thesky/). During Iuk, the short-finned eels which live in the freshwater systems of the Kulin Nation migrate out to the ocean to begin their long journey to the warm waters of the Coral Sea, some 3,000km away, to breed. However, before setting out for this extensive journey the eels must eat and get fat to survive the long swim. Hence, the people of the Kulin Nation would make extensive fish traps in the river systems to catch and eat the fattened eels during Iuk, which have been described as having a buttery taste by a Yorta Yorta person.
Feeding and breeding eels
You are a freshwater eel in the river systems of the Kulin Nation. There are many rivers which connect different lakes together, and eventually to the ocean.
Part A
You find that these river systems are difficult to navigate, and you wonder if you are running in circles. Write an algorithm to tell if there are any paths in this river system which could form a cycle, i.e. leaving from one lake you could take a certain path of distinct rivers that reaches back to the starting lake.
This algorithm should work for any set of lakes and rivers. There will be a certain number of lakes, each with a unique identifier lakeID ∈ [0, numLakes). Each river will run from one lake to another, but you may assume that rivers can be travelled in both directions.
The first line of input is the number of lakes, and number of rivers in the system respectively. The subsequent lines will each represent a river, with the first value being the lakeID it flows from and the second being the lakeID it flows to. The input will look like:
[num_lakes] [num_rivers]
[from_lakeID] [to_lakeID]
...
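One way to detect a cycle when rivers can be travelled in both directions is a union-find (disjoint-set) structure: a river whose two lakes are already connected closes a cycle. The sketch below uses that technique, which is only one option (a depth-first search would also work) and is not taken from the assignment skeleton. The names find_root and has_cycle, and the parallel-array representation of rivers, are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Find the representative of x's set, halving the path as we go. */
static int find_root(int *parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Returns 1 if the undirected river system contains a cycle, 0 if it does
 * not, and -1 if memory allocation fails.
 * River i joins lakes from[i] and to[i]; lake IDs lie in [0, num_lakes). */
static int has_cycle(int num_lakes, int num_rivers,
                     const int *from, const int *to) {
    assert(num_lakes > 0 && num_rivers >= 0);
    int *parent = malloc((size_t)num_lakes * sizeof *parent);
    if (parent == NULL) {
        return -1;
    }
    for (int i = 0; i < num_lakes; i++) {
        parent[i] = i;
    }

    int found = 0;
    for (int i = 0; i < num_rivers && !found; i++) {
        int a = find_root(parent, from[i]);
        int b = find_root(parent, to[i]);
        if (a == b) {
            found = 1;      /* both lakes already connected: cycle closed */
        } else {
            parent[a] = b;  /* merge the two components */
        }
    }
    free(parent);
    return found;
}
```

A driver could print "We're running in circles!" when has_cycle returns 1 and "Smooth sailing" when it returns 0, matching the messages given in the Task 2, Part A Skeleton.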
Part B
You are very hungry and have heard of some good feeding grounds further inland, but it is a long way, and you want to get there as fast as possible before the food is all eaten up. The amount of time taken to traverse a river is equal to the river's length. Unfortunately, because of a strong current, it takes twice as long to swim upstream as it does to swim downstream.
Write an algorithm that will find the shortest way to reach the feeding grounds from the ocean.
This algorithm should work for any set of lakes and rivers. As with Part A, there will be a certain number of lakes, each with a unique identifier lakeID ∈ [0, numLakes). Each river will run from one lake to another and have an associated length with it as well.
The first line of the input will contain the lakeID which you are starting from, followed by the lakeID of the destination lake where the feeding grounds are. The second line contains the number of lakes and number of rivers in the system. The subsequent lines each contain a river, with the lakeID of the lake it flows from followed by the lakeID of the lake it flows into, and then the final value is the river's length.
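The upstream penalty can be modelled by turning each river into two directed edges: one downstream edge costing the river's length and one upstream edge costing twice that. The sketch below shows only this edge expansion, not the shortest-path search itself. The edge_t struct, the function name expand_rivers and the use of integer lengths are assumptions; the skeleton's graph_t may store edges differently.

```c
#include <assert.h>

/* Hypothetical directed edge with a traversal cost. */
typedef struct {
    int from;
    int to;
    int cost;
} edge_t;

/* Expand each river into a downstream and an upstream edge.
 * out must have room for 2 * num_rivers edges.
 * Returns the number of edges written. */
static int expand_rivers(int num_rivers, const int *from, const int *to,
                         const int *length, edge_t *out) {
    assert(num_rivers >= 0);
    int k = 0;
    for (int i = 0; i < num_rivers; i++) {
        /* Downstream: travel time equals the river's length. */
        out[k++] = (edge_t){from[i], to[i], length[i]};
        /* Upstream: the current makes the swim take twice as long. */
        out[k++] = (edge_t){to[i], from[i], 2 * length[i]};
    }
    return k;
}
```

With the edges expanded this way, every cost is non-negative, so a standard shortest-path algorithm such as Dijkstra's can be run on the resulting directed graph.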
Part C
The breeding season (Iuk) is fast approaching, so you need to make your way back to the ocean and
onto the Coral Sea. However, you want to maximise the amount of fat you have by the time you reach the ocean. Swimming down rivers costs energy (and burns fat), but in many cases, you need to swim through some rivers and lakes anyway to get to the sea. Moreover, the lakes tend to have some food in them, which can increase your fat supplies again. Assume that you only travel the rivers downstream because you want to reach the sea as quickly as possible.
Propose an algorithm to find the best way to get back to the ocean, ensuring that, in total, you have the maximum fat stores upon reaching the ocean. This algorithm should run in O((V + E)log(V)). This algorithm may only work with certain sets of rivers and lakes, and it is up to you to check whether the input will be solvable in this time complexity as part of your algorithm.
Write the pseudocode for this algorithm. You may assume the following:
The input graph is a directed weighted graph. Each edge (u, v, w) is represented as a single element in the adjacency list for u.
There is a function Dijkstra(graph, origin, destination) → (cost, path) that returns the cost and the path of the lowest-cost path from origin to destination.
As part of the input data, you have an array fatGain[0..numLakes − 1] where fatGain[i] stores the amount of fat (in some units) you always gain when you reach the lake with ID i.
When you swim downstream along a river of length w, you lose exactly w units of fat.
At the start of your journey back to the sea, you have K units of fat in your body. You will die when the number of fat units in your body reaches 0.
You must justify your algorithm design choices in 300 words, including why certain sets of rivers and lakes won't work with the algorithm.
Notes: No marks will be awarded if your algorithm's time complexity is not O((V + E)log(V)), or if your algorithm is incorrect for the problem.
The report for Part 2C should be submitted as a PDF file named written_task_2C.pdf . The file should be uploaded to the home directory of Part 2B, the directory that contains your program files dijkstra.c , graph.c , etc.
Task 2, Part A Skeleton
otherwise.
The main driver functions have already been implemented.
The first line of input is the number of lakes, and number of rivers in the system respectively. The subsequent lines will each represent a river, with the first value being the lakeID it flows from and the second being the lakeID it flows to. The input will look like:
[num_lakes] [num_rivers]
[from_lakeID] [to_lakeID]
...
The output of the program should print "We're running in circles!" if there is a cycle found, and
"Smooth sailing" if not.
Input Data Sets
A number of input data files are provided in the subdirectory test_cases . Each file name starts with the prefix t2a- . The data file t2a-0.txt represents a simplified version of some lakes and water flows (like rivers, creeks, canals) between them for the area around Lakes Entrance in Victoria. It should be noted that this area does not belong to the Kulin Nation (it actually belongs to the ... people). The area was chosen here for illustrative purposes.
Also note that t2b-0.txt is the same as t2a-0.txt , but with additional data for use in Part B.
Task 2, Part B Skeleton
Write a function dijkstra(graph_t *graph, int origin, int dest, int *path) which computes
the shortest path from origin to dest and returns the cost of this path, and the path should be
This is an individual assignment. The work must be your own work.
While you may discuss your program development, coding problems and experimentation with your
classmates, you must not share files, as doing this without proper attribution is considered
plagiarism.
If you have borrowed ideas or taken inspiration from code and you are in doubt about whether it is
plagiarism, provide a comment highlighting where you got that inspiration.
If you refer to published work in the discussion of your experiments, be sure to include a citation to
the publication or the web link.
“Borrowing” of someone else’s code without acknowledgment is plagiarism. Plagiarism is considered
a serious offense at the University of Melbourne. You should read the University code on Academic
integrity and details on plagiarism. Make sure you are not plagiarizing, intentionally or
unintentionally.
You are also advised that there will be a C programming component (on paper, not on computer) in
the final examination. Students who do not program their own assignments will be at a disadvantage
for this part of the examination.
Late Policy
The late penalty is 20% of the available marks for that project for each working day (or part thereof) overdue.
If you wish to apply for an extension, please review the FEIT Extensions and Special consideration
page on the subject LMS. Requests for extensions on medical grounds will need to be supported by a medical certificate. Any request received less than 48 hours before the assessment date (or after the
date!) will generally not be accepted except in the most extreme circumstances. In general, extensions
will not be granted if the interruption covers less than 10% of the project duration. Remember that
departmental servers are often heavily loaded near project deadlines, and unexpected outages can
occur; these will not be considered as grounds for an extension.
Students who experience difficulties due to personal circumstances are encouraged to make use of
the appropriate University student support services, and to contact the lecturer, at the earliest
opportunity.
Finally, we are here to help! Frequently asked questions about the project will be answered on Ed.
Requirements: C Programming
The following implementation requirements must be adhered to:
You must write your implementation in the C programming language.
Your code should be easily extensible to multiple data structure instances. This means that the
functions for interacting with your data structures should take as arguments not only the values
required to perform the operation required, but also a pointer to a particular data structure, e.g.
search(dictionary, value) .
Your implementation must read the input file once only.
Your program should store strings in a space-efficient manner. If you are using malloc() to
create the space for a string, remember to allow space for the final end of string character, '\0'
(the NUL character).
Your approach should be reasonably time efficient.
Your solution should begin from the provided scaffold.
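The space-efficient string requirement above can be illustrated with a small helper that allocates exactly strlen + 1 bytes and checks the result of malloc. This is only a sketch; copy_string is a hypothetical name and not part of the provided scaffold.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy src into a freshly allocated buffer of exactly strlen(src) + 1 bytes.
 * Returns NULL if allocation fails; the caller must free the result. */
static char *copy_string(const char *src) {
    assert(src != NULL);
    size_t len = strlen(src);
    char *dst = malloc(len + 1);  /* +1 for the terminating '\0' */
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, len + 1);    /* copies the terminator as well */
    return dst;
}
```

A bird's name and colour could each be read into a temporary buffer and then stored through a helper like this, so that no record keeps more bytes than its strings need.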
Hints:
- If you haven’t used make before, try it on simple programs first. If it doesn’t work, read the error messages
carefully. A common problem in compiling multifile executables is in the included header files. Note also that
the whitespace before the command is a tab, and not multiple spaces.
- It is not a good idea to code your program as a single file and then try to break it down into multiple files. Start by using multiple files, with minimal content, and make sure they are communicating with each other before starting more serious coding.
Programming Style
Below is a style guide which assignments are evaluated against. For this subject, the 80 character limit is a guideline rather than a rule; if your code exceeds this limit, you should consider whether your code would be more readable if you instead rearranged it.
Some automatic evaluations of your code style may be performed where they are reliable. As determining whether these style-related issues are occurring sometimes involves non-trivial (and sometimes even undecidable) calculations, a simpler and more error-prone (but highly successful) solution is used. You may need to add a comment to identify these cases, so check any failing test outputs for instructions on how to resolve incorrectly flagged issues.
Mark Breakdown
There are a total of 10 marks given for this assignment.
Your C programs for Task 1 and 2 should be accurate, readable, and observe good C programming
structure, safety and style, including documentation. Safety refers to checking whether opening a file
returns something, whether mallocs do their job, etc. The documentation should explain all major
design decisions, and should be formatted so that it does not interfere with reading the code. As
much as possible, try to make your code self-documenting, by choosing descriptive variable names.
The remainder of the marks will be based on the correct functioning of your submission.
Note that marks related to the correctness of your code will be based on passing various tests. If your program passes these tests without addressing the learning outcomes (e.g. if you fully hard-code solutions or otherwise deliberately exploit the test cases), you may receive fewer marks than is suggested, but your code marks will otherwise be determined by test cases. For questions with both a written component and a C code component, part of the mark will be given for the passing of test cases, with the remainder from the correctness of the written answer.
Task 1 will be marked out of 4 marks, Task 2 will be marked out of 5 marks and C code quality will comprise the final mark.
Additional Support
Your tutors will be available to help with your assignment during the scheduled workshop times. Questions related to the assignment may be posted on the Ed discussion forum, using the folder tag Assignments for new posts. You should feel free to answer other students' questions if you are confident of your skills.
A tutor will check the discussion forum regularly, and answer some questions, but be aware that for some questions you will just need to use your judgment and document your thinking.
If you have questions about your code specifically which you feel would reveal too much of the assignment, feel free to post a private question on the discussion forum.
Most students find Academic Skills' Research Report Guide extremely valuable in constructing a well-formed and sensible analysis that makes good use of relevant material taught so far in the subject.
Acknowledgements
ChatGPT was used to help generate some graphs for Task 2A.