Open City Data and Its Significance to the Smart City

In the context of a city, there is an important distinction between two key types of data collection: organic data and purposeful data.

Organic data is a byproduct of the transactional processes of daily life within cities, such as communication, online purchases, tax payments, and mobility.

Purposeful data is collected deliberately, typically through surveys: for example, the census, unemployment rates, household income, and political polls.

The difference between organic and purposeful data is important because they serve different use cases. Organic data is useful for measuring fast urban dynamics, whereas purposeful data targets slow dynamics, measured as mean change over months, years, or even decades. For example, daily traffic patterns are a fast dynamic.

Increasingly, dynamic sources of information about urban areas are being generated by passive technologies that supply a variety of real-time measures.

They typically involve:

  • Various sensor technologies, including CO2, temperature, humidity, noise and light sensors. These can be deployed at scales ranging from city-wide implementation down to block level. Such interconnected monitoring can enable innovative applications such as geographically targeted warnings to those with respiratory problems, or advice to pedestrians to avoid polluted areas.
  • A range of emerging technologies linked with smartphones. Sensors can be deployed to detect the presence of smartphones, from which footfall can be estimated.
  • Selected applications can provide streaming location data, either in the background or as part of the app's functionality. Google uses pooled location data to measure the speed of drivers along road segments, which is used to estimate the congestion levels that appear on Google Maps.
  • GPS receivers installed on moving objects within urban areas, such as buses or taxis.
  • Closed-circuit television (CCTV), widely used for real-time surveillance. These devices serve a variety of purposes ranging from security measures to traffic management applications.
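The geographically targeted warning idea above can be sketched as a simple threshold check over block-level readings. Everything here (the block IDs, the readings, and the 1000 ppm threshold) is an invented illustration, not a real city API:

```python
# Minimal sketch: flag city blocks whose CO2 readings exceed a warning
# threshold, as a stand-in for a geographically targeted alert system.
# Block names and the 1000 ppm threshold are illustrative assumptions.

CO2_WARN_PPM = 1000  # hypothetical warning threshold

def blocks_to_warn(readings, threshold=CO2_WARN_PPM):
    """Return block IDs whose latest CO2 reading exceeds the threshold."""
    return sorted(block for block, ppm in readings.items() if ppm > threshold)

readings = {"block-12": 870, "block-13": 1240, "block-14": 1010}
print(blocks_to_warn(readings))  # → ['block-13', 'block-14']
```

In a real deployment the readings would stream from the sensor network, and the alert would be pushed only to subscribers located in the flagged blocks.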

So far we have discussed various sensor technologies that generate data about people's mobility. However, humans are also sensors, generating a wide variety of data on social media platforms. Data generated on these platforms provides an extensive resource on which a wide variety of urban research has been conducted.

Every city generates enormous amounts of data. When data is consumed for purposes other than those it was originally generated for, it becomes more valuable. For example, data collected to send electricity bills can later be used by another provider to suggest power-saving ideas for households. Reusing data in this way is a remarkable driver of urban innovation.

Now, let us discuss how a city can make these data easily available to anyone or anything that can make the city smarter. How can a city open up its data?

Open data is the cornerstone of open governance and transparency, and it is primarily a driver of innovation. New ideas and new solutions emerge when rich data is easily available.

There are eight core principles for open data:

  1. Complete: All of the data in a given data set should be made available, not just a subset.
  2. Primary: This means that the data is from its source and is in its most granular form without being aggregated or modified. Open data should be the raw collected data.
  3. Timely: Data should be made available as soon as it is generated.
  4. Accessible: Open data should be available through a well-connected platform. It should be available in multiple formats and not require any special technology to access it.
  5. Machine processable: Open Data should be easily integrated and processed by other computers and applications.
  6. Nondiscriminatory: Open data should be available to anyone, without requirements such as registering to obtain the data.
  7. Nonproprietary: No one should have exclusive control over the data. Data should not be made available in special formats that require an expensive piece of software.
  8. License free: Data should be free to use without being subject to any trademark, patent or licensing restrictions.
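As a small illustration of the "machine processable" and "nonproprietary" principles above, the sketch below parses a tiny open-data file in CSV (a nonproprietary, license-free format) using only the Python standard library; the air-quality rows themselves are invented:

```python
# Sketch: a minimal "machine processable" check for an open data file.
# CSV needs no special software to read, so any application can consume it.
import csv
import io

sample = """station,pm25,timestamp
S1,42,2019-01-01T10:00
S2,55,2019-01-01T10:00
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Every row carries the same named fields, ready for downstream processing.
assert all(set(r) == {"station", "pm25", "timestamp"} for r in rows)
print(len(rows), "rows parsed in a machine-processable format")
```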

With these eight qualities met, data is said to be open. It can now be used by urban innovators to innovate.

With rapid urbanisation, cities need lots of new ideas and innovators. Most cities do not have enough resources to address the increasing challenges of rapid urbanisation. To build smarter cities, we need to expand traditional public-private partnerships and engage all the talent and capital available. Many governments are opening their repositories of data and making them easily accessible via open data portals. This data may relate to crime, pollution, economics, libraries, finance, infrastructure, and more. What stories and ideas live within this data? What challenges and problems can be solved with it? Thousands of smart urban solutions are being created all over the world from innovative ideas built on open data. Many of these solutions happen because individuals focus on government data to do social good.

Open data is a content platform that gives problem solvers an opportunity to engage.

More engagement will happen if:

  1. Governments arrange events and competitions to incentivise good ideas and solutions.
  2. We create marketplaces where entrepreneurs can see an economic opportunity: easily available open data provides the content on which they can build commercial solutions that can be monetised.

Cities cannot address all current and future needs by themselves. They need wider participation to fulfil citizens' expectations. Open city data is the easiest way to engage talent in urban innovation.

As a consequence, this also means that open data must be core to any smart city strategy.


How is Python used in data science?

In the last few years, data science has attracted a great deal of attention. Its main focus is converting meaningful data into marketing and business strategies that allow a company to grow. The data is stored and analysed to arrive at a rational solution.

In the past, only the top IT companies were involved in this field, but today businesses from different areas and fields such as healthcare, e-commerce, and finance are using data analytics.

There are different tools available for data analytics, for example, Hadoop, R, SQL, SAS and many more. However, the most well-known and easiest-to-handle tool for data analytics is Python. Known as the Swiss Army knife of the coding world, it supports structured programming, object-oriented programming, and functional programming, among other paradigms.

As per the Stack Overflow survey of 2018, Python is the most popular programming language in the world and is regarded as one of the aptest languages for data science tools and applications. In the HackerRank 2018 developer survey, Python also won developers' hearts, as shown in its love-hate index.

Python: The perfect choice for Data Science  

One of Python's main strengths is its ease of use for quantitative and analytical computing. It has led the industry for a long while now and is widely used in fields such as oil and gas, signal processing, and finance.

Moreover, Python has been used to reinforce Google's internal infrastructure and to build applications like YouTube. It is broadly used and is a favourite tool, being a flexible, open-source language. Python's huge libraries support data manipulation and are very simple to master even for a novice data analyst.

Being platform independent, it also integrates efficiently with any existing infrastructure and can be used to solve the most complicated problems. A large number of banks use it for crunching data, institutions use it for visualisation and processing, and weather forecasting companies like ForecastWatch Analytics also use it.
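As a tiny illustration of the analytical convenience described above, the following sketch summarises an invented series of daily figures using only Python's standard library:

```python
# Summarising a small (invented) series of daily sales figures with the
# standard-library statistics module: no external packages required.
from statistics import mean, median, stdev

daily_sales = [120, 135, 128, 150, 210, 95, 140]

print("mean:  ", round(mean(daily_sales), 1))
print("median:", median(daily_sales))
print("stdev: ", round(stdev(daily_sales), 1))
```

In practice, libraries such as pandas and NumPy extend this convenience to millions of rows with the same few lines of code.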

Ethics and Data Science

While data science offers the ability to find patterns in data and to innovate new data products for the greater social good, it is ethically neutral. It does not come with its own perspective on what is right or wrong, or good or bad, in its use. While data science has no value framework, organisations have value systems. Asking and seeking answers to ethical questions can ensure data science is used in a way that aligns with organisational values.

There's no doubt about it: the future will be machine-learning driven, and data science is central to that future. Machine learning models are fuelled by the data they are trained on. Every advertisement we see, every self-driving car, every medical diagnosis provided by a machine will be based on data. Data ethics is a rapidly emerging area. Increasingly, those collecting, sharing and working with data are examining the ethics of their methods and, in some cases, being forced to confront those ethics in the face of public criticism. A failure to handle data ethically can severely impact people and lead to a loss of trust in projects, products or organisations.

Ethical challenges occur when opinions on what is considered right and wrong diverge. For example, should a data science system have the power to decide whether a defendant is released on bail? An application built on top of data, such as COMPAS (the Correctional Offender Management Profiling for Alternative Sanctions software system used in US courts), would require evaluating how the data is generated in the first place. The algorithm learns the biases contained in the training dataset. A training dataset may contain historical traces of intentional or unintentional discrimination, biased decisions, or samples from populations that do not represent everyone. There are three main ethical challenges related to data and data science:

  1. Unfair discrimination – If data reflects unfair social biases against sensitive attributes, such as race or gender, the conclusions drawn from that data may also be based on those biases.
  2. Reinforcing human biases – This kind of problem can arise when computer models are used to make predictions in areas such as insurance, financial loans and policing. If members of a certain racial group have historically been more likely to default on their loans, or more likely to be convicted of a crime, the model may deem these people riskier. That does not necessarily mean that these people actually engage in more criminal behaviour or are worse at managing their money.
  3. Lack of transparency – Transparency is required in two areas. First, the step-by-step modelling process and the parameters by which a prediction is made; depending on the model used, this can be very difficult, as the exact workings of some models, such as neural networks, remain unclear. Second, it often remains unclear which data is used in making a prediction: a statistical model cannot distinguish between the predictive power of a single variable and that of a set of variables.
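The unfair-discrimination challenge above is often quantified with simple group-rate comparisons. The sketch below computes a demographic-parity (disparate impact) ratio over invented decision records; the group labels and data are hypothetical, and a real audit would use the actual model's decisions:

```python
# Sketch of a basic fairness check: compare positive-outcome rates across
# a sensitive attribute (demographic parity / disparate impact).
# The records below are invented for illustration.

def positive_rate(decisions, group, attr="group"):
    """Share of positive decisions among records belonging to `group`."""
    members = [d for d in decisions if d[attr] == group]
    return sum(d["approved"] for d in members) / len(members)

decisions = [
    {"group": "A", "approved": 1}, {"group": "A", "approved": 1},
    {"group": "A", "approved": 0}, {"group": "B", "approved": 1},
    {"group": "B", "approved": 0}, {"group": "B", "approved": 0},
]

rate_a = positive_rate(decisions, "A")  # 2/3 approved
rate_b = positive_rate(decisions, "B")  # 1/3 approved
print("disparate impact ratio:", round(rate_b / rate_a, 2))  # → 0.5
```

A ratio far below 1.0 flags a large gap in outcomes between groups; it does not by itself prove discrimination, but it tells auditors where to look.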

Some ethical and data science concerns while modelling are:

A: Preserving privacy
1. Methods for handling sensitive data
2. Uses of data science that undermine privacy

B. Avoiding bias
1. Data selection and unintentional red-lining
2. Re-inscription of existing biases
3. Reducing the discrimination already present in a training dataset

C. Mitigating malicious attacks
1. Intentional subversion of machine learning systems
2. Hazards of learning from the open internet

Data ethics is in the hands of data scientists, and that will not change any time soon. However, data scientists, who have access to powerful tools that can dissect how people think with an eye towards influencing their behaviour, do not get any ethics training in most programs. That is a problem we need to fix if we want to avoid a constant flow of questionable uses of data or models.

There are indeed more principles we need to create as more powerful technologies become available. Data scientists, data engineers, database administrators and everyone associated with handling data should have a voice in the ethical debate about how data should be used. Organisations should openly address these difficulties in formal and informal conventions.


  • How I am fighting bias in algorithms
  • Reducing discrimination in AI with new methodology
  • It's not big data that discriminates – it's the people that use it
  • Challenges of transparent accountability in Big Data Analytics
  • Benefits and ethical challenges in Data Science – COMPAS and Smart Meters

Analytics over Open Data

Open data, processed with data science applications, can present design alternatives to the traditional structures of government, offering governance models more suited to an increasingly digital society and new sources of evidence for policy-making.

Data analytics can be used by the city administration to trigger improvements across three broad areas: 

  1. Resource optimisation: Cost savings can be achieved by using data to eliminate waste and direct resources more effectively. One powerful example of this is better management of human resources. Using data analytics, one US federal agency halved its staff attrition rates and saved more than US$200 million in the first year, by eliminating retention programmes that it found had no real impact and focusing instead on more effective programmes.
  2. Tax collection: Governments can identify and stop revenue leaks, especially in tax collections. The Australian tax authority analysed more than one million archived tax returns from small- and mid-sized businesses and identified groups with a high risk of underreporting. Targeted reminders and notices increased reported taxable income by more than 65% within those groups.
  3. Forecasting and predicting: Big data analysis can help governments understand ongoing trends and predict where resources are needed. For instance, the Los Angeles police department has used a predictive analytics system to comb through data such as historic and recent criminal activity, predicting where and when specific crimes might occur and dispatching officers accordingly. One study suggests the system is twice as accurate in predicting crime as traditional methods.
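The forecasting idea above can be sketched, at its simplest, as a moving-average prediction over historical counts. The monthly figures below are invented, and real systems such as the predictive policing example use far richer models:

```python
# Naive forecasting sketch: predict next month's incident count as the
# mean of the most recent months. The counts are invented illustrations.

def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    return sum(series[-window:]) / window

monthly_incidents = [310, 295, 330, 340, 325]
print("next-month forecast:", moving_average_forecast(monthly_incidents))
```

Even this crude baseline is useful in practice: any more sophisticated predictive system should at minimum beat it on held-out data.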


Some Open Data Solutions for Cities

Smart Transport:  

  • Monitoring traffic and preventing congestion by predicting traffic conditions.
  • Organizing traffic to improve air quality and reduce the impact of pollutants, based on analysis of historical and real-time environmental data.
  • Public transport monitoring for easier access, minimized downtime, prediction of discrepancies and increased safety.


Smart Waste Management: 

  • Using data on waste generation patterns and monitoring solid waste management to optimize resource utilization, efficiency and hygiene.
  • Monitoring sewage disposal and sewage composition to make the process more hygienic for people, more eco-friendly, and less prone to dysfunction.


Smart Energy Consumption: 

  • Monitoring electricity usage to discover patterns of usage and demand, and building efficient, responsive technology that minimizes wastage and saves electricity at the municipal as well as the household level.
  • Smart street lighting that turns off when not required, and smart in-house lighting, are some of the more apparent use cases.
  • Predicting faults or likely failures in installed public systems can decrease downtime and the effort needed to fix those systems.


Safety and Security: 

  • Predicting crime, understanding its patterns and causes, and targeting problem areas become possible with analytics over crime records.
  • Information from social media can be used to predict threats or gather information about a crisis so that it can be dealt with promptly.
  • Studying past records of tax defaults, loan defaults or other monetary fraud can help identify groups of people who are likely to commit such fraud in the future, and inform policies to prevent it.


Vehicle Type and Pollutant Emission

Vehicular emissions have become a major source of pollution in many areas. Traffic pollution causes a significant increase in carbon monoxide (CO), carbon dioxide (CO2), volatile organic compounds (VOCs) or hydrocarbons (HCs), nitrogen oxides (NOx), and particulate matter (PM). Increasing the duration of road congestion raises emission pollution and degrades air quality.

Increased congestion on urban roads and a growing number of vehicles add to emission levels, in turn increasing pollution. In the present analysis, we have used the COPERT model (Computation Of Pollutant Emissions from Road Transport) to determine the contribution to emissions by vehicle type (2W: two-wheelers, HDV: heavy-duty vehicles, LDV: light-duty vehicles, PC: passenger cars) under different running conditions.

The following analyses were done for CO (carbon monoxide), the pollutant with the greatest impact.


Two-stroke vs Four-stroke Engine Emission Levels

First, we look at the difference in emission levels between 2-stroke and 4-stroke vehicles. Let us look at the plot of emissions in grams/vehicle/km for 2-stroke vs 4-stroke two-wheelers at several speeds.

As expected, vehicle emissions are higher at lower speeds. The 4-stroke engine emits about 50% less pollutant than the 2-stroke, a share that falls further to around 20% of the 2-stroke level. For both engine types, emissions decrease with increasing speed over the given range, but emissions are overall higher for the 2-stroke engine. In India, 78% of all registered vehicles are two-wheelers, which are all two-stroke engines.


Comparison of Emissions by Vehicle Type

Using the COPERT model, we calculate the emission levels of different vehicle types at different speeds. We plot emissions at various speeds for selected vehicle types: HDV (slope 0, load 100%), LDV (diesel), PC (gasoline, 4-stroke), 2W (gasoline, 2-stroke).
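COPERT expresses emission factors as speed-dependent functions per vehicle class and fuel. The sketch below uses a COPERT-style rational function of average speed, but the coefficients are invented for illustration only; the real model tabulates them per vehicle category:

```python
# Sketch of a COPERT-style speed-dependent emission factor: CO emissions
# (g/km) modelled as a function of average speed. The functional form
# (emissions falling as speed rises over the urban range) follows the
# analysis in the text, but these coefficients are invented, not COPERT's.

def emission_factor(v, a=70.0, b=0.9, c=0.006):
    """Illustrative CO emission factor (g/km) at average speed v (km/h)."""
    return a / (1 + b * v + c * v * v)

for v in (10, 30, 60):
    print(f"{v:>3} km/h -> {emission_factor(v):.2f} g/km")
```

The shape matches the congestion argument made below: a vehicle crawling at 10 km/h emits several times more CO per kilometre than the same vehicle at 60 km/h.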

Heavy-duty vehicles and two-wheelers are the biggest polluters. Given that 78% of all registered vehicles in India are two-wheelers, in some areas they are the major source of pollutant emissions.


Emissions from Vehicles Compliant with Euro 3 and Euro 4

Next, we calculated the emission levels of vehicles compliant with Euro 3 and Euro 4 at various speeds.

For heavy-duty vehicles (HDV), the expected emission reduction from moving from Euro 3 to Euro 4 is 50%. For light-duty vehicles (LDV) the reduction is not as significant, but an almost 70% reduction is expected in the passenger car (PC) segment.

Conclusions – This analysis examined different vehicle types and their pollution impact. Congestion can increase pollution levels, as high congestion reduces traffic flow. All vehicle types emit roughly twice as much pollutant at low speed as at high speed. Two-wheelers are a major source of emissions, and regulation targeting them would help reduce urban pollution from vehicle traffic. Euro 4 compliant vehicles emit 50% less pollutant than Euro 3 compliant vehicles. An initiative to phase out Euro 3 compliant vehicles and move the entire fleet to Euro 4 compliance or higher would further help reduce air pollution in urban areas.

These basic analyses of vehicle types and emission standards can provide insight and lay the foundation for building sophisticated pollution models for cities.

What is Open Data?

Governments, through legal processes, collect a significant amount of data about people, properties, licenses, crimes, public health and a wide variety of other entities. With the onset of IoT and the growing digitalization of government processes, almost everything can be measured, monitored and networked. Data from different sources can be combined and “mashed up” to produce new insights and new businesses. The progress of the big data ecosystem makes this possible with affordable cost and efficiency. This data is a new digital currency that can drive the next generation of urban infrastructure, policies and government operations. Governments around the world are making this data open for use by a wide range of users, including citizen activists, businesses, other government departments, the research community and government employees.

Open data is intellectual material that is:

  1. Legally open: available with minimal restrictions on terms of use
  2. Technically open: accessible in a machine-readable format for wide use

Traditionally, governments have published this data on a web portal and called it open data. In reality, what is needed is a process for opening data, with a legal structure to undergird that ongoing process and ensure authenticity, optimal use and safety.

There are two requirements for leveraging open data for maximum benefit in government operations:

  1. Open data policies – These create clear lines of responsibility for the management and oversight of data publication, create official spaces for public participation around data selection and publication, and ensure sustained commitment from a government—all of which increase the value of the data to its potential users. They are also imperative for enforcing the secure and ethical use of open data.
  2. Open data architecture and standards – Data from different sources will have different formats, different storage, and different metadata. To make effective use of it, the data needs to be rolled up into a common standard format, cleaned, aggregated, and then made easily accessible to others.
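The roll-up into a common standard format described above can be sketched as a field-mapping step. The source names, field names and target schema below are illustrative assumptions, not a real standard:

```python
# Sketch: records from two sources with different field names are mapped
# onto one common schema before publication on an open data portal.
# Source names, field names and the target schema are invented.

SCHEMA_MAP = {
    "source_a": {"stn": "station", "pm": "pm25", "ts": "timestamp"},
    "source_b": {"sensor_id": "station", "pm2_5": "pm25", "time": "timestamp"},
}

def normalise(record, source):
    """Rename a source record's fields to the common schema."""
    mapping = SCHEMA_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

a = normalise({"stn": "S1", "pm": 42, "ts": "10:00"}, "source_a")
b = normalise({"sensor_id": "S2", "pm2_5": 55, "time": "10:00"}, "source_b")
print([a, b])  # both records now share the same field names
```

Once every source publishes through such a mapping, downstream users can combine datasets without caring which department produced them.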

To set up these two key building blocks, data science skills and a data platform become important. Many cities worldwide are appointing a City Chief Data Scientist to evangelize and accelerate the move towards open data government.

Benefits of Opening Data

  • Increasing government capacity at low cost, benefitting a broader range of people.
  • Encouraging innovation from businesses, data-focused centres, academic researchers, software entrepreneurs, and open data activist groups.
  • Improving internal quality and use of data, by a proactive online posting of public data and cross-department access.
  • Increased transparency and accountability of the government, a tool for rooting out corruption, holding officials accountable, and promoting public trust by broadcasting information about government functioning.
  • Increased citizen engagement: two-way communication between authorities and citizens helps identify and meet challenges much faster.

Who is a Chief Data Officer?

The Chief Data Officer has a significant measure of responsibility for determining what kinds of information the government will choose to capture, retain and exploit and for what purposes.

The CDO plays a crucial role in making the most of data resources and creating value by:

  • Creating clear lines of responsibility for the management and oversight of data publication, and ensuring a sustained commitment to these processes
  • Establishing a groundwork for continuity of public access to regularly collected data
  • Creating a space for public participation around data collection and publication
  • Helping governments and citizens gain a better understanding of data holdings.

Capacity Planning: Web Operations

Capacity planning is the process of determining the resources required to meet the priority plan, and the methods needed to make that capacity available.
