BY: CHRISTOPHER ODEY – 33049838
I.
IMPORTANCE OF DATA IN CONTEMPORARY SOCIETY AND ITS LIMITATIONS.
Introduction
Data has become essential in the development of modern societies. Its significance takes various forms, ranging from businesses streamlining their processes by collecting employer information and making payment decisions to governments using big data to enhance the services rendered to the public (Garrison et al., 2007). This essay explores the role of data in a typical modern society, its importance, and its limitations.
As a result of technological expansion, corporations capture massive amounts of information about their clients, suppliers, and processes, while millions of networked sensors embedded in devices such as smartphones and automobiles gather, share, and generate data. The expansion of social media and the use of smartphones will contribute further to the growth of data generation (Manyika et al., 2011). Business organizations face stiff competition to gain market advantage; they garner massive amounts of information to meet market demands and are still examining more profitable methods of obtaining value from their data and competing in the marketplace (LaValle et al., 2010).
Data is essential in addressing societal issues and improving the quality of life. For example, in urban planning over the last decade, ways of collecting and computing big data and communicating evidence have improved significantly and have led to the implementation of new technologies to tackle urban challenges such as climate change, metropolitan competitiveness, and other issues of sustainable expansion (Giffinger, 2021). The scholarly community also relies on data to improve knowledge, conduct research, and analyse data through statistical and analytical methods to address complex societal problems (Tansley et al., 2009). Initiatives like open data repositories improve data sharing and knowledge exchange, propelling innovation and scientific discovery (Borgman, 2015).
Importance of Data in a Contemporary Society
Over the years, scholarly publications and industry reports have documented the role of data across various sectors. Data serves as a tool for decision-making, enabling organizations to make informed decisions based on evidence rather than guesswork (Manyika et al., 2011). This section examines the importance of data in society, the economy, and academia.
The benefits of data in societies are evident in new perspectives on various aspects of daily routines, ranging from sleep quality to educational advancement. For example, in the post-World War II era, coffee consumption changed as individuals used coffee as a means of expressing uniqueness and authenticity in a postmodern society. This change in consumption stimulated demand for quality, leading to the rise of specialty coffee production. Coffee with high quality and complex flavour relies on data-driven techniques to ensure precise roasting processes and taste development: roasters use data to monitor and modify variables such as temperature, time, and bean traits, allowing them to achieve consistency and precision in flavour extraction (van Es & Verhoeff, 2023). Data is also important in healthcare, where analytic tools, visualization methods, and workflows can handle massive volumes of data quickly, leading to improved outcomes and reduced costs (Karaoulanis, 2018).
Data analysis helps researchers to handle enormous volumes of data quickly, ushering in transformative advances in research methods. For example, the “Cosmic Genome Project”, undertaken by the Sloan Digital Sky Survey, shows how data analytics has transformed research in astronomy. A collaboration of institutions collected and analysed enormous datasets from the universe, producing photometric surveys in multiple bands and spectroscopic redshift surveys (Tansley et al., 2009).
Limitations of Data in a Contemporary Society
The importance of data ranges from its uses in social activities to providing value for numerous industries. However, the limitations associated with data use stem from diverse elements, including the magnitude, diversity, rate, and integrity of data (Manyika et al., 2011). Algorithmic bias arises as big data drives the routine use of algorithms for essential tasks. As algorithms gain autonomy, they often become invisible, making it difficult to examine their impartiality, which could worsen social inequalities (Karaoulanis, 2018). Solove (2008) underscores the ethical implications of data collection in the context of privacy invasion and management. Data collection without adequate consent can intrude upon individuals’ right to privacy. As organizations store enormous amounts of personal data, concern about the invasion of privacy continues to grow. These forms of invasion include unapproved use of credentials and access to personal information, data violations or breaches, and the misuse of data for purposes that were not approved. Furthermore, the storage of vast quantities of personal data raises questions about privacy management, including ensuring adherence to privacy regulations, implementing robust security measures, and setting transparent data collection, storage, and sharing processes.
References:
Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the networked world. MIT Press.
Garrison Jr, L. P., Neumann, P. J., Erickson, P., Marshall, D., & Mullins, C. D. (2007). Using real‐world data for coverage and payment decisions: The ISPOR real‐world data task force report. Value in Health, 10(5), 326-335.
Giffinger, R. (2021). Smart city: The importance of innovation and planning. In Smart Cities, Green Technologies and Intelligent Transport Systems: 8th International Conference, SMARTGREENS 2019, and 5th International Conference, VEHITS 2019, Heraklion, Crete, Greece, May 3–5, 2019, Revised Selected Papers 8 (pp. 28-39). Springer International Publishing.
Karaoulanis, A. (2018). Big Data, what is it, its limits and implications in contemporary life.
LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S., & Kruschwitz, N. (2010). Big data, analytics and the path from insights to value. MIT Sloan Management Review.
Magdum, R. (2022). What is Data Exploration? and its Importance in Data Analytics. International Research Journal of Engineering and Technology (IRJET), 9(1), 1482. https://doi.org/10.21013/jte.v11.n2.p9
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Hung Byers, A. (2011). Big data: The next frontier for innovation, competition, and productivity.
Schaeffer, C., Booton, L., Halleck, J., Studeny, J., & Coustasse, A. (2017). Big data management in US hospitals: Benefits and barriers. The Health Care Manager, 36(1), 87-95.
Solove, D. J. (2008). Understanding privacy.
Tansley, S., & Tolle, K. M. (2009). The fourth paradigm: Data-intensive scientific discovery (Vol. 1). T. Hey (Ed.). Redmond, WA: Microsoft Research.
van Es, K., & Verhoeff, N. (2023). Situating Data: Inquiries in Algorithmic Culture (p. 291). Amsterdam University Press.
II.
Data analysis (Process, Results, Findings & Visualisations)
Data set 1
The data for my analysis is sourced from the Sheffield Hallam tutor Blackboard data repositories. My first analysis is on Streaming Platforms Data (Movies and Shows), downloaded from Kaggle: https://www.kaggle.com/code/shivamb/netflix-shows-and-movies-exploratory-analysis. The focus of this analysis is to determine the most watched movies across all the streaming platforms (Netflix, Amazon Prime, Disney Plus, and Hulu). Secondly, I will analyse the most watched genre on each individual platform.
Process
To represent my data on the most watched movie across the streaming platforms, I chose to use Bag of Words (BoW), which represents text data by the frequency of words within the document. To achieve this, I reduced the sample size for each streaming platform to 1000 titles and combined the samples into a single workbook. This allowed me to analyse the data properly despite my computer's limited memory.
Fig 1: Spreadsheet showing a combination of streaming platforms with a sample of 1000 movie titles.
Fig 2: Orange Tool processing data from streaming platforms to create Bag of Words
I heard about the Orange tool for the first time in one of the Think Free seminars (BONUS TRACK III: Other software of interest. Orange.org), and since then I have been keen to understand how it works. In my quest for knowledge, I watched several video tutorials on https://www.youtube.com/@OrangeDataMining, where I learnt about data workflows, widgets and channels, hierarchical clustering, etc.
The process begins by uploading the data to the Orange tool through the Corpus widget. This holds the primary dataset and makes the .csv file readable so that the text data can be extracted, analysed, and processed. As shown in Fig. 2 above, the Corpus widget is linked to the Preprocess widget, which breaks the text down into individual words or “tokens.” The Word Cloud widget then displays all the unique words found in the corpus together with their frequencies.
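The Bag of Words idea behind this workflow can also be sketched outside Orange. The snippet below is a minimal illustration only, not the Orange implementation: the sample titles and the stop-word list are hypothetical, and the tokeniser is a simple assumption (lowercase alphabetic words). It counts how often each word occurs across a list of titles, which is the quantity a word cloud scales its words by.

```python
import re
from collections import Counter

# Hypothetical sample titles; the real input would be the combined
# workbook of 1000 titles per platform described above.
titles = [
    "Disney's Fairy Tale Weddings",
    "The Disney Family Singalong",
    "The Great Escape",
    "Great Performances",
]

# A small, assumed stop-word list (Orange's preprocessing has its own options).
STOP_WORDS = {"the", "a", "an", "of", "and", "s"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Count how often each non-stop-word token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


word_counts = bag_of_words(titles)
top_words = word_counts.most_common(3)  # the words a word cloud would draw largest
print(top_words)
```

Note that a count like this reflects how often a word appears in the titles of the sample, so the weight of a word depends on the catalogue that was sampled.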
Result
Fig 3: Word Cloud visualization of movie titles on movie streaming platforms
Fig 3 above shows that the most watched movie title is associated with Disney, as the word “Disney” occurs most often and therefore carries the most weight in the word cloud.
Most watched genre on Netflix, Amazon Prime, Disney plus & Hulu streaming platforms
Process
I used the reduced sample of 1000 movie titles and their genres from each streaming platform to identify the most viewed genre. This was necessary because of my computer's limited memory. In the Think Free seminar, we discussed the Gephi tool, which is used to explore, analyse, and visualise complex network data interactively. To analyse this data, I first followed the steps in our course material (Installing Gephi: Session 3), which cover analysing mentions networks and hashtag networks. However, this approach reached a dead end because my data included neither mentions nor hashtag networks. I therefore watched Gephi tutorials on YouTube (https://www.youtube.com/watch?v=371n3Ye9vVo&list=PLk_jmmkw5S2BqnYBqF2VNPcszY93-ze49), which included Importing files in Gephi, Modularity tutorial, Labels and Colors, and Nodes.
Steps
- On the Table2Net site (https://medialab.github.io/table2net/), add the single streaming platform.csv. Choose one type of node (column Y will define the links and column X will define the comma-separated nodes), then build and download the Gephi-readable .GEXF file.
- Import the GEXF file to the Gephi tool
- On the Layout tab, run ForceAtlas (this helps to visualise community structures within the network by grouping nodes that are densely connected to each other)
- On the Statistics tab, run Network Diameter to compute betweenness centrality
- On the Appearance tab, select Nodes – Size – Ranking and set the range from 20 to 119. This sizes the nodes by betweenness centrality, with the highest betweenness at 119 and the lowest at 20.
- On the Statistics tab, run Modularity to better understand the interaction patterns in the network, then apply Modularity Class on the Appearance tab.
- Apply node labels (‘T’)
- In Filters, choose Topology and adjust the degree range. This removes the smallest nodes and makes the visualisation cleaner.
- Export result as .PNG
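The network that these steps build can be approximated in a few lines of code. The sketch below is only an illustration under stated assumptions: the genre cells are hypothetical sample data, genres are assumed to be linked whenever they appear on the same title, and weighted degree is used as a simple stand-in for importance. It does not reproduce Gephi's betweenness centrality, modularity, or ForceAtlas layout.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rows: each string mimics one comma-separated genre cell.
genre_cells = [
    "International TV Shows, TV Dramas",
    "International TV Shows, Crime TV Shows, TV Dramas",
    "Comedies, International Movies",
    "International TV Shows, Crime TV Shows",
]


def split_genres(cell):
    """Split a comma-separated cell into a sorted list of unique genre names."""
    return sorted({g.strip() for g in cell.split(",") if g.strip()})


def build_edges(cells):
    """Count how often each pair of genres appears on the same title."""
    edges = Counter()
    for cell in cells:
        for a, b in combinations(split_genres(cell), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each genre node."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


edges = build_edges(genre_cells)
degree = weighted_degree(edges)
most_connected = degree.most_common(1)[0][0]
print(most_connected, degree[most_connected])
```

A genre that co-occurs with many others ends up with a high weighted degree, which is roughly why it appears as a large, central node in the network view.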
Results
Fig 4. Most watched movie genre on Netflix streaming platform
Fig 5. Most watched movie genre on Amazon Prime streaming platform
Fig 6. Most watched movie genre on Disney Plus streaming platform
Fig 7. Most watched movie genre on Hulu streaming platform
Figures 4, 5, 6, and 7 above show the Gephi network visualisations of the most watched genres on each platform. On Netflix, the most watched genre is international TV shows; other heavily weighted genres include action and adventure movies, international movies, and horror movies, among others. On Amazon Prime, kids' movies top the chart as the most watched genre, followed by unscripted, special interest, and international movies, among others. On Disney Plus, action-adventure movies top the chart, closely followed by comedy and family movies. On Hulu, comedy tops the chart as the most watched genre, followed by family, drama, and documentaries.
This analysis provides insight into viewer preferences across the listed streaming platforms, which offers opportunities to optimize the platforms, improve content, and increase audience engagement.
Data Set 2
This dataset contains information on Steam, the world’s most popular gaming hub. It is a spreadsheet of 20,000 rows showing game user behaviours, with the columns user-id, game-title, behaviour-name, and value. The behaviours included are ‘purchase’ and ‘play’. The value indicates the degree to which the behaviour was performed: in the case of ‘purchase’ the value is always 1, and in the case of ‘play’ the value represents the number of hours the user has played the game.
The aim of this analysis is to evaluate the performance of games with respect to their total purchases and playtime and to assess their popularity and user satisfaction, which could help to improve user experience.
Process
Over the years I have gained experience with spreadsheets, especially in analysing and linking data. However, this was unfamiliar territory because I was dealing with multiple game titles, multiple users, and a distinction between purchases and playtime. To analyse game performance with respect to total purchases and playtime, I used the Think Free class seminar course material (Session 2: Data management-Spreadsheet model) and DataCamp courses (Introduction to Excel, Introduction to Google Sheets, Introduction to Power Query in Excel, and Conditional Formatting in Google Sheets).
Steps
- I opened a new tab and copied the entire column of game titles from the original dataset sheet (steam-200k).
- On the new tab, I removed the duplicate game titles (Menu – Data tab – Remove duplicates). The reason for this is to identify the individual games in the dataset. This shrank the number of rows from 20,000 to 5,155 game titles.
- In the first cell of the adjacent column, apply the formula =COUNTIF('steam-200k'!B:B, Sheet1!A1) and drag it down to the 5,155th cell.
- Select the two columns (Game Titles, Purchases/Playtime), insert a chart, and increase the range to accommodate the games with the most purchases and playtime.
- Custom sort the cells by Purchases/Playtime from largest to smallest.
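The counting step above can be mirrored in a short script, which may help to check the spreadsheet result. This is a minimal sketch, not the method used in the report: the sample rows are hypothetical and simply follow the dataset's column order (user-id, game-title, behaviour-name, value). The `row_counts` tally corresponds to what COUNTIF over the game-title column returns, namely the number of rows per title, covering both ‘purchase’ and ‘play’ rows. The separate purchase and hours tallies are an optional refinement added here for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows in the dataset's column order:
# user-id, game-title, behaviour-name, value
SAMPLE = """\
1,Dota 2,purchase,1
1,Dota 2,play,120.5
2,Dota 2,purchase,1
2,Portal,purchase,1
2,Portal,play,3.0
"""


def summarise(rows):
    """Tally rows, purchases, and played hours for each game title."""
    row_counts = Counter()      # equivalent of COUNTIF over the title column
    purchases = Counter()       # number of 'purchase' rows per title
    hours = defaultdict(float)  # sum of 'play' values (hours) per title
    for _user, title, behaviour, value in rows:
        row_counts[title] += 1
        if behaviour == "purchase":
            purchases[title] += 1
        elif behaviour == "play":
            hours[title] += float(value)
    return row_counts, purchases, dict(hours)


rows = list(csv.reader(io.StringIO(SAMPLE)))
row_counts, purchases, hours = summarise(rows)

# Largest to smallest, like the custom sort in the last step.
ranking = row_counts.most_common()
print(ranking)
```

Keeping purchases and hours in separate tallies makes it possible to compare how many users bought a game with how long it was actually played.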
Result
Fig 8. Graphical representation of Steam Hub Game Playtime/purchase Analysis
Fig 8 above shows a graphical representation of the Steam hub game playtime and purchase analysis. It shows that the game Eldevin has the highest playtime and purchase count, at 9682. This analysis helps to assess game popularity and satisfaction levels, which could help to improve user experience.
CONCLUSION
I found Think Free: Contemporary issues in digital cultures to be one of the most exciting modules of this second semester because of the ideals the module carries, as its name implies. I was already familiar with Microsoft Excel. I am self-taught and had previously created a monthly sales analysis for a retail firm. Part of that self-taught experience was linking sheets using the basic ‘=’, ‘+’, and ‘-’ formulas and providing a monthly summary of all sales indexes.
This module has allowed me to explore in depth how to analyse both large and small datasets using easy-to-learn formulas that were new to me (COUNTIF, which counts the number of cells within a range that meet a condition, and UNIQUE, which extracts unique values from a list). To learn further, I used DataCamp, YouTube, LinkedIn Learning, and class work to find more efficient ways to analyse data.
DataCamp provided interactive courses on machine learning, data manipulation, and visualisation. I completed several courses, which included lessons, exercises, and projects using real-world situations. This encouraged me to further improve my data and technical skills in popular data analysis tools and programming languages like Python, R, and SQL.
The importance of YouTube in my data analysis journey cannot be overstated, as I used various tutorials to sharpen my data analysis techniques with tools like Orange, Gephi, and Google Sheets. I further explored various formulas: SUMIF, which adds the cells that meet a given condition; MAX, which returns the largest value in a list; MIN, which returns the smallest value in a list; VLOOKUP, which searches for a value in the first column and returns another value in the same row from another column; IF, which, as the name implies, runs a logical test on whether a condition is true or false; and CONCATENATE, which hooked me because I was introduced to the word for the first time during the course but did not know it could be found in a spreadsheet, where it simply joins two or more texts into one string.
During the class seminars I signed up for LinkedIn Learning, which offers a wide range of courses on data analysis and related topics taught by industry professionals. I look forward to earning certificates after completing my courses, which will enhance my professional portfolio and demonstrate my data analysis proficiency to potential employers.
Finally, the class seminars provided interactive learning and collaboration with course mates and tutors. I gained invaluable knowledge from the guest lecture on Digital Ethnography, which broadened my understanding of how it can help to observe and describe processes in online communication and to understand social formations and the meanings of attributes in these formations. The seminars have improved my understanding of academic models and given me practical experience through group exercises.
BIBLIOGRAPHY
Ahmet, J., Van den Broeck, M., Rosseel, C., & Prassides, I. (n.d.). *Introduction to Excel*. DataCamp. https://app.datacamp.com/learn/courses/introduction-to-excel
Chapman, J., & Peterson, A. (n.d.). *Introduction to Google Sheets*. DataCamp. https://app.datacamp.com/learn/courses/introduction-to-google-sheets
Girard, L., Rosseel, C., & Prassides, I. (n.d.). *Introduction to Power Query in Excel*. DataCamp. https://app.datacamp.com/learn/courses/introduction-to-power-query-in-excel
Jengolbeck. (n.d.). Updated Gephi Quick Start Tutorial for v 0.9 [YouTube video]. Retrieved from https://www.youtube.com/watch?v=371n3Ye9vVo
Orange Data Mining. (n.d.). Orange data mining tutorials [YouTube Channel]. Retrieved from https://www.youtube.com/OrangeDataMining
Steinfurth, A., Ismay, C., & Peterson, A. (n.d.). *Conditional Formatting in Google Sheets*. DataCamp. https://app.datacamp.com/learn/courses/conditional-formatting-in-google-sheets
Rodriguez-Amat, J. R. (2024). *Session 2 – Data Management and Types*. [Blackboard]. https://shuspace.shu.ac.uk/webapps/blackboard/content/listContent.jsp?course_id=_345087_1&content_id=_12725284_1
Rodriguez-Amat, J. R. (2024). *Session 4 – Computer Enabled Textual Analysis*. [Blackboard]. https://shuspace.shu.ac.uk/webapps/blackboard/content/listContent.jsp?course_id=_345087_1&content_id=_12725286_1
Rodriguez-Amat, J. R. (2024). *Session 5 – Thinking Networks*. [Blackboard]. https://shuspace.shu.ac.uk/webapps/blackboard/content/listContent.jsp?course_id=_345087_1&content_id=_12725285_1