Decide how #OpenData from the Indian government is licensed: a few ideas from my end

The Indian government is drafting a license policy on the usage of #OpenData. It will govern the type, format and ease of data access in this country. As a citizen, you get a say in how your data will be presented for open access! You can suggest changes through the MyGov app.

Link :

A detailed description of what is happening is mentioned after this paragraph. 

What I’d love to see included

1. An R package that parses the existing JSON data, so that casual data miners can get at it easily.

2. The option to access data in real time through a streaming connection, for example during elections, when data has to be gathered as quickly as possible.

3. Since it is open data, the views of the general public should be considered when deciding what kind of data to release, so that people can easily access what they need. The final decision can rest with the department taking care of the data. Anyone should be able to raise an issue with the organization or government department (say, to ask for data with more columns). The issue is then either adopted or rejected, and a proper reason has to be given when it is closed.

4. The government has a labor division, which should conduct yearly surveys of how different types of people in the country spend their time at different times of the day. For reference, check out the website FlowingData, and search there for "daily life of an American" to see a visualization of this kind. The government should encourage funding for this initiative.

5. The government should also introduce a department of analytics whose main aim would be to use government data to find better ways of spending government money, optimizing spending to reduce excess (this is where operational analytics comes into the picture). The department can submit its suggestions to the department that generated the data, which is free to use them or not. I’m aware this is outside the scope of the public consultation, but it would still be nice to have.

6. There should be a feature in the MyGov app itself for developers to showcase the best visualizations and analyses they have come up with using the data they were given access to. The government could also give prizes based on activity, if it chooses to. It could also release new data and hold a competition for the best analysis of that data, as is already done for many other things in the MyGov app.
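
To make point 1 a bit more concrete, here is a minimal sketch of what such a parsing helper might do. It is written in Python rather than R, purely for illustration. The JSON layout (a `fields` list plus a `data` list of rows) is my assumption about what a dataset export could look like, not the actual format of any government API, and both the function name `records_from_json` and the sample values are made up.

```python
import json


def records_from_json(text):
    """Turn a {"fields": [...], "data": [[...], ...]} document into a list of dicts.

    The layout is an assumed example, not a documented government format.
    """
    doc = json.loads(text)
    # Column names come from the "label" of each field description.
    names = [field["label"] for field in doc["fields"]]
    # Pair every row of values with the column names.
    return [dict(zip(names, row)) for row in doc["data"]]


# Placeholder data, invented for this example.
sample = """
{
  "fields": [{"label": "state"}, {"label": "year"}, {"label": "value"}],
  "data": [["State A", 2015, 10], ["State B", 2015, 20]]
}
"""

rows = records_from_json(sample)
for row in rows:
    print(row["state"], row["value"])
```

Once the data is a plain list of records like this, a casual user can filter, sort or plot it without ever having to look at the raw JSON.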

(Most of the following come from the comment section, but they make some very good points. It’d be awesome to see someone relate these points to clauses, as some have done.)
7. #clause12 g: The interval between data updates could be shorter, say a month, to be more effective. That way the data stays up to date and each update is small, which reduces some other issues.
8. There should be a central data system for missing children and elderly people, so that their relatives can search for and find them. Aadhaar centres or post offices should be available for uploading the details of a missing person, so that anyone can report one.
9. #OpenData should be shared in a version control system (e.g. GitHub) for better data management, and to encourage open source enthusiasts and developers to contribute by including it in their projects and producing useful insights.
10. Link the Aadhaar card with critical areas: income tax, driver's license, passport, voter ID. However, this should be limited. If a citizen or resident of India provides the Aadhaar number, the Government of India, police, security personnel and the tax department should get the entire details of the individual. This will help in many ways and prevent fraud.
11. #clause11: If there is any pricing, a general price cap must be defined.
12. The license agreement should allow users to freely share, modify, and use a database while maintaining this same freedom for others.
13. A flood inundation map should be created for every city/town and made available online (it will help everyone, including the national disaster team, during flood events).
14. Data on government-owned (State and Central) land/properties should be mapped accurately and put online to prevent encroachment.
15. Data on convicted (or on the run) criminals (for example child rapists/molesters, kidnappers, murderers) should be put online.
16. Integration of data from existing services to increase transparency. People can raise issues on a platform like GitHub, and the department has to either accept or reject each request and provide a reason within a set time period.
17. Data integrity. Steps should be taken to make sure that the data given to the public is not biased and that it is measured accurately.
18. There should be an option for a citizen to report an anomaly in a data trend around them, which would be a great way of finding problem areas (both in terms of sector and geography).
19. Verified access to data only: access only after a verification process, with each download tagged by user ID. (Every user will have a unique ID and all downloads will be tied to that ID, which will help in tracking usage and attribution.)
20. #Clause9 in NDSAP: Kindly use ZFS for storage. It is open source and very stable, with checksumming to protect data integrity in storage. For OLAP, the system can rely either on open source databases or on a NoSQL database, which provides a very flexible storage schema and room for future alterations. Also, exports should be handled by separate database servers, as their function will be read-only.
21. Regarding overall data availability, please provide it in an open format, like JSON or XML. Kindly avoid a fixed schema.
22. A built-in reporting mechanism with a time-bound ticketing system. The current NDAP/NDAP implementation is not very clear on the “time bound” part.
23. More clarity on how purging/redaction of permissions will take place in case of misuse. Traditional media should be within the scope of this purging mechanism.
24. Better clarity on the APIs and the data transfer speed/infrastructure to be used, since most external data systems will want incremental data, and at 1GBPS speeds at that!
25. Training for data collectors to input the data correctly into the system. 
26. Section 6.3: Relational databases have scaling/capacity limits. The policy should encourage using open source NoSQL databases (like MongoDB, Cassandra etc.). Also, datasets should provide REST/JSON APIs with a NoSQL backend. SOAP is not needed any more. REST/XML support should be optional.
27. I anticipate a need for time-series databases in the future.
28. Clear policy on commercial usage of data for making money. 
29. #Clause5: The template also has to be provided with the version number, if the data is under version control.
30. #Clause6: The data provider has to declare that they are authorised to provide the data.
31. #Clause6: Clarity on data cleansing/masking for the use of aggregated data or summary statistics based on PI/PII and other sensitive data and information on the exempt list.
32. Use a specific format for citing the data being used as a source. #Clause5: Maybe a QR code as well for citing attributions.
33. #Clause6: Should also cover Personally Identifiable Information, i.e. anything that can be linked or traced to a person or company.
34. Clause #4.e: If wrong data is published, whether intentionally or unintentionally, there should be a provision for the NIC to delete it permanently unless correct data is made available. According to the archival policy of NIC, if wrong data is entered and corrected later, the wrong information must not be kept permanently: once authenticity is verified, the old data has to be removed. This matters because any department or agency that gathers data always makes many mistakes in its first version.
35. #Clause2 should include a definition of “open data”, as this is the main topic we are dealing with.
36. The definitions (Clause 2) should be amended to:

– Update ‘Data Provider’ to ‘Data Custodian’. This reflects that absolute ownership of the master data, and the right to distribute it, lie with the corresponding custodian. For example, fixed asset data sits with the municipal authority of each jurisdiction, and they are the custodians.

– Add ‘Data Contributor’ for entities that can contribute to a custodian's database. The custodians will have rules laid down for verification and validation of data. This will enable data enrichment.
37. #Clause2, SubClause a: The data should also include the methodology or framework that was used to collect it, which helps an individual decide whether to use the data or not.
38. #Clause4, SubClause f: Continuity of Provision: The agency can at least state a future date until which it will provide the data. For example, it can say the data will be updated at regular intervals until March 2022.
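
Several of the points above (3, 16 and 22) come down to the same idea: an issue is raised, and the department has to accept or reject it, with a reason, within a set time. Here is a rough Python sketch of how that rule could be modelled. Everything in it is hypothetical: the `DataIssue` class, the 30-day `RESPONSE_WINDOW` and the example issue are my own assumptions, not anything specified in the draft policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed time limit for a response; the draft policy does not specify one.
RESPONSE_WINDOW = timedelta(days=30)


@dataclass
class DataIssue:
    """A request raised by a citizen against a published dataset."""
    title: str
    opened_on: date
    status: str = "open"  # "open", "accepted" or "rejected"
    reason: Optional[str] = None

    @property
    def due_on(self) -> date:
        # Last day on which the department can respond in time.
        return self.opened_on + RESPONSE_WINDOW

    def is_overdue(self, today: date) -> bool:
        # Only issues that are still open can be overdue.
        return self.status == "open" and today > self.due_on

    def close(self, accepted: bool, reason: str) -> None:
        # A reason is mandatory, whether the request is accepted or rejected.
        if not reason.strip():
            raise ValueError("a reason is required when closing an issue")
        self.status = "accepted" if accepted else "rejected"
        self.reason = reason


# Made-up example issue and dates.
issue = DataIssue("Add more columns to the dataset", opened_on=date(2020, 1, 1))
overdue_before = issue.is_overdue(date(2020, 3, 1))
issue.close(accepted=False, reason="The extra columns are not collected")
overdue_after = issue.is_overdue(date(2020, 3, 1))
print(overdue_before, issue.status, overdue_after)
```

The point of making the reason mandatory in `close` is that an issue can never be silently dropped, and `is_overdue` gives anyone a simple way to list requests that have gone past the time limit.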

Let me know what you think! 

