Protected: An app idea

This content is password protected. To view it please enter your password below:

Posted in ITMS, Student Enterprise Movement, university


Raspberry Pi 3 and HostAPD throughput

Raspberry Pi 3 in a Pi2Go

The Raspberry Pi 3 now has n-wireless (and Bluetooth) onboard.  I got this robot before Christmas “for my boys”.  I didn’t have much time with it before the holidays and couldn’t quite get it working with both a USB wifi dongle and USB storage (to hopefully store pictures from the camera).

Anyway, I got the Raspberry Pi 3 (RPi3) yesterday and got hostapd working in a very short time using this guide: RPI Wireless Hotspot.

I posted on Twitter that I got it working and got this tweet in reply:

So, here goes… I installed iperf on the RPi3 and locally on my laptop (Fedora 22 i7) with RT5370 USB dongle and Intel 7260 AC onboard.

Raspberry Pi 3 Aerial

I can’t find the reference on the web, but someone asked if an external aerial could be fitted because his Pis are used in industrial applications and need to be housed in metal cases.  This got me thinking that there might be a problem with a robot like mine when it rotates.

This is using the USB dongle on the laptop with the aerial facing away at two feet:

root@robotpi:/home/pi# iperf -s
[root@qedu ~]# iperf -c
Client connecting to, TCP port 5001
TCP window size: 85.0 KByte (default)
[ 3] local port 59856 connected with port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 24.5 MBytes 20.5 Mbits/sec

This is using the USB dongle on the laptop with the aerial facing towards it:

[root@qedu ~]# iperf -c
Client connecting to, TCP port 5001
TCP window size: 85.0 KByte (default)
[ 3] local port 60514 connected with port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 24.1 MBytes 20.1 Mbits/sec

From ten feet with the aerial facing away from the laptop:

[root@qedu ~]# iperf -c
Client connecting to, TCP port 5001
TCP window size: 85.0 KByte (default)
[ 3] local port 60792 connected with port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 24.0 MBytes 20.1 Mbits/sec

Now, with the AC wireless:

[root@qedu ~]# iperf -c
Client connecting to, TCP port 5001
TCP window size: 85.0 KByte (default)
[ 3] local port 56582 connected with port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.3 sec 20.4 MBytes 16.7 Mbits/sec
[root@qedu ~]# iperf -c
Client connecting to, TCP port 5001
TCP window size: 85.0 KByte (default)
[ 3] local port 56600 connected with port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 21.9 MBytes 18.2 Mbits/sec

Ouch!  The USB wireless-n dongle is faster than the onboard wireless-AC.


Posted in Linux SysAdmin, Raspberry Pi

Protected: 2016 CentOS 7 virtual machine template to support LAMP and other applications. Part2

This content is password protected. To view it please enter your password below:

Posted in ITMS, Linux SysAdmin, Virtual Machines

Research Data Management: Hydra at the University of Hull

 Visiting University of Hull Library

The University of Hull has refurbished its library.  This is a graphical representation of it.

The University of Hull has refurbished its library at a cost of £28m and very nice it is too.

In the library, there is a lot of space, light and high ceilings.  General computing includes iiyama monitors, which have a reputation (with geeks) as being a little less off-the-shelf than, for example, DELL.  Maybe the PCs are not an afterthought to the service either.  I did not see it, but there is an art gallery too.

The library has chairs that you might call privacy booths, with high sides and a high back, where you can sit with a book or, thanks to the convenient power sockets, a tablet.  Eduroam wireless networking exists throughout.  There are student group meeting rooms that are not locked.  There are boardroom-style meeting rooms with a mixture of Crestron-based AV and old-fashioned green-leather-clad configurable board tables.  The mixture works well, very “library”.

We, Alan Brine and I, were treated to lunch, which was nice and unexpected.  We spent the day with Chris Awre, Head of Information Management, and Richard Green, consultant: Hydra in Hull and related projects.

Hydra and the reason for the visit


We are looking at developing a service around the curation of digital research data.  We already have DORA, our open repository for research output (publications and doctoral theses), which is just over 100GB.  We are used to DSpace and could create an open access data repository using it, or even partition DORA somehow and store research data in that, but data has the distinction that it can be re-used and re-purposed and, better, parts of different collections could be mashed together by anyone who discovers the data.  This is what Hydra may deliver.

Before the visit we, in a rush, passed on these questions:

  • People:
    How many staff are involved in running the service?
    What do they do?
    What is the work flow leading up to creating a new collection?
    Has anyone received external training? In what?
  • Hardware:
    What is size of the archive so far?
    What is the expected growth?
    What are the specifics of the hardware solution?
  • Technical:
    What are the steps related to the software/creating new functionality?
    Are there any yearly costs for support or licensing? PURLs etc.

And while the conversation was freer than the structure above, I hope to capture some of the answers here.

The Conversation

The first point I took the time to note was about the company Data Curation Experts, who are important in two respects.  They are very much aware of Hydra but could also kick-start any effort because of their knowledge around structured and linked data.  It is all very well to be a good custodian of a data repository, but the value is in the factoring of data and its re-use.  That is something we need to implement and where we might have a gap in our knowledge and experience.  We need to understand and replicate as many patterns as possible so that others at the university and beyond can dip into our repository without too much of a learning curve.  Where possible we might make the effort to make the data machine-readable.

While DMU compared EPrints and DSpace and chose DSpace because of its worldwide community, Hull chose Hydra for similar reasons, with the added essential that they always wanted to store data collections.  They started looking at Hydra in August 2011 with a developer working on it full time until 2013.  The developer learnt the programming language Ruby from scratch.  Today, Hull library has access to a developer in their equivalent of our ITMS.  We talked about training and discussed the active community.  In terms of training, Data Curation Experts can help, but there is also Hydra Europe and Hydra Camp.  We talked about who has been involved with Hydra at Hull and how it works today.  Chris and Richard have been doing research curation at Hull for ten years plus.  The library has a team called Researcher Services which is partly made up of their cataloguers.

The conversation moved here and there; some of it was about how Hull are doing library services.  I am a little envious of their facilities and their use of Blacklight as the catalogue search frontend to their library management system.  Envious too, because they are being trusted to use, even embrace, open source software to provide university-wide services.  Their past exam paper collection is stored using Hydra, as are undergraduate and graduate theses.  We do not currently store undergraduate theses, but it seems like a bold initiative.  There are 11,000 objects in their repository and we are a smidge behind at 10,500.

We talked about workflow and the woes of the librarians herding academics; some are keen, some do not know about the service and some have the default position, often right, that their work cannot go into an open repository.  We talked about advocacy and support in governance from our universities.  At Hull, Hydra is seen by the university as infrastructure rather than an end point for dumping research data.  We talked about tools:

  • The Avalon Media System A next generation Hydra Head for Audio and Video delivery
  • Capistrano  A remote server automation and deployment tool written in Ruby
  • Archivematica is a web- and standards-based, open-source application which allows your institution to preserve long-term access to trustworthy, authentic and reliable digital content
  • Sufia is a Rails engine for creating a self-deposit institutional repository
  • Worthwhile is a simple institutional repository for Hydra
  • Hydra in a Box The Digital Public Library of America (DPLA), Stanford University and DuraSpace are partnering to extend the existing Hydra project codebase and its vibrant and growing community to build, bundle, and promote a feature-rich, robust, flexible digital repository that is easy to install, configure, and maintain.

The most important function of the tools above is exemplified by Archivematica.  The tool deconstructs datasets, for example Zip files, and prepares data for long-term access.  However, the function is fraught with problems and difficult design decisions.  Take the example of a Microsoft Excel spreadsheet.  The tools to access that spreadsheet may not always be available because of the proprietary nature of the product, yet even today we cannot safely convert from Excel to an open format like the OpenDocument spreadsheet format, because the tools that exist today can lose information in the Excel spreadsheet when converting to something more reliable.  There is a need for strong governance around what we store and around the formats to which we convert non-open data.  While talking about Archivematica, Richard showed us a workflow that he is looking at:

A possible workflow for Research Data Management, Copyright Richard Green. Creative Commons : CC-BY-NC-SA

Long term preservation of research data is implemented through the use of Archivematica.  Other data that has been through the system before the introduction of the tool can be sent through again.  Data that is now seen as long-term rather than medium-term can also go through the tool.  The view on what format should be used for a long term preservation may change and that is catered for as collections may be re-ingested.

Since we are starting from scratch, we might like to see Archivematica or something similar used for every collection.  This way researchers will access data collections in a similar way every time.


We could take an easier route than Hydra.  We could provide a repository where we make open access datasets discoverable and downloadable.  This approach would tick a box.  We would, however, have no idea what the data is being used for or how many times it has been used.  We would not know which aspects of the data are being used and what, therefore, is useful or has value.  The use of the data is separated from our systems and we can no longer track it.

We can do better than that for the research community.  Other organisations are looking for insight and value from data over and above its original purpose.  We can provide infrastructure as a service (IaaS) based around Hydra or similar.  If data is organised in a way that researchers can access facets of it, then we will have a better idea of what is being used and how often.  If we are talking about open access to research data as infrastructure, then we must be talking about open source software.  If we are to provide an IaaS, then anyone using it must be able to use it in a way that is easily reproducible, at as little cost as possible and without any lock-in.  Good open source software and the communities around it support this behaviour by default.  Proprietary software is based on the model of selling and reselling.  The business model includes practices such as built-in obsolescence.  This practice and others like it do not fit with open access to, and long-term curation of, data.

Another reason to go the IaaS route is the concern around privacy, intellectual property and anonymity.  This may answer the cloud question too.  There will be data that needs restricted access and needs to be archived, but sometimes it will be stored with other interesting data without the same restriction.  With data sets broken up and stored in facets, we can control who has access to what.  We can control access with a straightforward repository too, but data that could be released would be hidden as part of a restricted blob of data.

If we implement the storage of research data (even data from other systems) as IaaS, then we can create good practices and governance at the university that researchers and others can use.  We can advocate the use of open formats in order to avoid the problem explained above.  Implementing an RDM IaaS will provide the opportunity to reduce the number of systems we need to understand when working with researchers, from induction, through maintaining their work, to its preservation.

I have written about this service being IaaS instead of an end point dump for data, and I have talked about the benefits of open source and open standards in general, but what of Hydra?  Is it the right tool?  Back in 2010, the risk would have been considerable.  Ruby on Rails at the time, while exciting, was considered limited.  The thinking was that at some point the developer would reach the wall of what is possible and would have to start writing code at a lower entry point in the Ruby stack.  Hydra has now been in use for at least five years.  The community is thriving, Ruby and Rails are thriving, and Hydra, as well as being developed as infrastructure, is also being developed as a turnkey solution, for example Hydra in a Box.  Now would certainly be a better time to enter the community than five years ago.


For many years, oblivious to Hydra, I have wanted to work with Fedora Commons.  Hydra uses Fedora Commons, which allows for the storage of digital collections using triples to describe relationships.  Triples are used in “Big Data”.  While some of the hoopla around Big Data has died down, an understanding has come about of the need to store data, and data about data, reliably and with consistency.  Organisations are re-tooling to include Big Data tools in their infrastructure.  In the same period that Hydra has been maturing, I have learnt that DSpace would run on top of Fedora Commons, but missed the opportunity to use it.  Since then I have become aware of Hadoop, which promises horizontal scaling and secure storage of data with the added benefit of compute at the node the data sits on.  In looking at Hydra, I have come up with the idea that Hadoop could underpin Fedora Commons, which in turn is a key component of Hydra.  These three layers are infrastructure most of the time.  They each expose web services which could be leveraged by programmers and existing tools.  We could sit DSpace on Fedora Commons and then have Hydra sit next to it or replace it.  Tools could sit on any layer to support research or the business functions of the university.  The framework would be very flexible.  DMU is looking at storage solutions at the moment and providers are telling us that they have Hadoop offerings.  Now could be the time.  Hadoop is ten years old and growing rapidly.  If we had Hadoop we might be able to use it for teaching too.  The world, it seems, needs data curation and data scientists.
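For a flavour of what that means, a triple is just subject–predicate–object.  A hypothetical description of a dataset might look like this in Turtle notation (every URI and value below is made up for illustration; only the Dublin Core terms are real):

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://example.org/dataset/42>
    dcterms:title    "Sensor readings, 2015 field trial" ;
    dcterms:creator  <http://example.org/staff/jbloggs> ;
    dcterms:isPartOf <http://example.org/collection/environment> .
```

Because each statement stands alone, facets of different collections can be queried and recombined without knowing how the data was originally packaged.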

Posted in DORA, DSpace, ITMS, Library, Linux SysAdmin, Uncategorized

WordPress xmlrpc attack work around

Find the IPs hammering xmlrpc.php with this one-liner, then block the naughty ones using the .htaccess file:

$ tail -10000 access_log |grep /xmlrpc.php|awk '{ips[$1]++}END{for (i in ips) print i " " ips[i]}'
68.x.x.52 1
173.x.x.17 1
185.x.x.249 15
117.x.x.46 1
192.x.x.80 1
80.x.x.104 1105
180.x.x.59 1
198.x.x.90 3
192.x.x.130 3
93.x.x.61 1003
192.x.x.244 1
91.x.x.69 1
85.x.x.26 16
198.x.x.192 1
192.x.x.146 1
192.x.x.250 10
80.x.x.229 1092
190.x.x.155 1

Check the http and https logs.  The offenders are obvious.  Add them to the .htaccess blacklist.
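As a sketch, the resulting .htaccess rules might look like the block below (Apache 2.2 syntax assumed; the addresses are the redacted heavy hitters from the log above and would be full IPs in practice):

```apache
# Block the worst xmlrpc.php offenders seen in the access log
<Files "xmlrpc.php">
  Order Allow,Deny
  Allow from all
  Deny from 80.x.x.104
  Deny from 93.x.x.61
  Deny from 80.x.x.229
</Files>
```

On Apache 2.4 the equivalent uses the Require directives instead of Order/Allow/Deny.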

This is a stop gap while we investigate a plugin like Disable XML-RPC Pingback.

Posted in CELT, Linux SysAdmin, The Commons

Migration of Off the Air Recordings

Current system

The current system consists of eleven servers: seven in Gateway House and four in Kimberlin.  The current solution has two lots of storage, one at each site.  At the time the system was created, the local network was not reliable enough to assume copying across it would work.  We created two stores, one in each building.  This allowed us to make sure we had a copy of a programme and that a copy would survive a disaster in either building.  Having two copies enabled us to split the load for streaming video: we have a streaming server running from each block of storage.

There are five machines used for the ingest of TV programmes.  The TV Control is responsible for the electronic programme guide, for telling servers to record a programme and for starting the copy of the MPEG-TS programme to the storage servers.

The TV Control tells a TV node to record a programme.  The programme is copied to the local storage, the local copy is then copied across to the remote store, and both copies are checked before the copy on the TV node is deleted.  Finally, the TV Control updates the library website to say a programme is ready for transcoding.
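That copy-then-verify step could be sketched like this (a hypothetical simplification — the real scripts, paths and checksum tool are assumptions, but the safety property is the same: the TV-node copy is only removed once both stores hold a verified copy):

```shell
#!/bin/sh
# Hypothetical sketch of the copy-and-verify step; not the production scripts.

verify_copy() {
    # Copy $1 to $2, then only report success if the checksums match.
    cp "$1" "$2" || return 1
    [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

archive_programme() {
    # $1 = recording on the TV node, $2 = local store, $3 = remote store.
    prog=$1
    name=$(basename "$prog")
    verify_copy "$prog" "$2/$name" &&
    verify_copy "$prog" "$3/$name" &&
    rm "$prog"    # only reached once both copies have been verified
}
```

If either checksum comparison fails, the original stays on the TV node and the failure can be retried.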

The transcoders check for programmes to transcode, then run scripts which repair errors in the MPEG-TS video, convert it to MPEG-PS and then transcode the programme into two videos: one suitable for play-out in lecture theatres/desktops and one suitable for desktops/mobile devices.

The initial proof of concept started recording programmes before 2008.  There are now 4,880 recordings over 24TB using this solution.  This storage encompasses the original recording as well as the resulting transcoded files.

Problems in the current system include running out of disk space occasionally, having to re-tune when channels on Freeview move, and not being able to tie the TV Control’s internal EPG to the library website.

Good bits of the current system include being written to suit internal workflows, great-quality play-out to lecture theatres and being able to record East Midlands programmes.

Box of Broadcasts

Box of Broadcasts has these extra features (we could have been a contender) as of January 2014:

  • the addition of all BBC TV and radio content dating from 2007 (800,000+ programmes)
  • over 10 foreign language channels, including French, German and Italian
  • an extended 30 day recording buffer – more time to record missed programmes
  • a new look website, improved navigation
  • Apple iOS compatibility – watch BoB on handheld devices
  • searchable transcripts
  • links to social media – share what you’re watching online
  • a one-click citation reference, allowing you to cite programmes in your work

All good stuff.

Migrating the university’s current archive and capture system

In some form, the current service needs to be migrated to ITMS’s new infrastructure.  There are too many physical parts to it, which need maintenance contracts to support them, and they exist outside of what ITMS wants for its infrastructure.  The service has been recording since before 2008.  There are over 4,880 recordings with over 24TB of data.  Two thirds of this data is the original recordings.  They are useful to keep because of the shifting standards used in web browsers to display video.  We will soon need to look at converting video to h.265 and/or WebM/VP9.
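For reference, re-encoding an original MPEG-TS recording to those newer formats might look something like this with ffmpeg (illustrative only — the filenames, quality settings and the choice of ffmpeg itself are assumptions to be tested, not our current tooling):

```
# H.265/HEVC in an MP4 container
ffmpeg -i original.ts -c:v libx265 -crf 26 -c:a aac h265-output.mp4

# VP9 video and Opus audio in a WebM container
ffmpeg -i original.ts -c:v libvpx-vp9 -crf 33 -b:v 0 -c:a libopus vp9-output.webm
```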

Take a look at the diagram:

TV P2V and Project

Diagram showing the current solution and possible split into archive and project

Box of Broadcasts ticks most of the boxes but does not support the need to record East Midlands programmes and might not fit some of the library’s other needs.  The original system can be turned into an archive, or it can be turned into an archive that can still accept recordings made locally.  It could be that we separate the two systems.  We could keep the archive and have a separate system that satisfies the need to record local TV.

In the migration of most of the service to the new infrastructure, several parts and functions need to be re-factored.  The current service reflected the need for an offsite backup and the nature of the network at the time, while the new infrastructure takes care of this for us.  I would like to fix the smaller video streams so that they play on modern mobile devices.  The original was created to work on the iPad first; at some point iOS was updated and playback became difficult or impossible.  We could save a lot of enterprise-grade (think money) disk space if we can conveniently store the original recordings on tape.  The migration, on the face of it, should be straightforward.

If we implement the project, that is the TV Recordings Update project, we might come up against a lot of unknowns because the service has been working without an update for six years.  I am using the software outside of work and have satellite and terrestrial HD versions of the tuner card that accepts four inputs.  While the new version of the TV recording software will bring new features, including an electronic programme guide API, we don’t know what, in the other tools, will be broken by bringing the service up to date.

To recap then, we can:

  • migrate the archive only.  This should be straightforward
  • migrate the archive and have a separate, provisioned at the desktop, solution for East Midlands recordings
  • migrate the archive and attach an update of the recording system to the archive.
Posted in Uncategorized

New cloud and dev, the new new.

Looking at Docker

While having a quick look at Docker I happened across a slide show presenting Docker starting with its rapid take up.

A list of the tools that came up:

  • Jenkins.  An extendable open source continuous integration server
  • Travis. (From Wikipedia) In software development, Travis CI is a hosted, distributed continuous integration service used to build and test projects hosted at GitHub.
  • Chef. Chef models IT infrastructure and application delivery as code, giving you the power and flexibility to achieve awesomeness.
  • Puppet. Puppet Open Source is a flexible, customizable framework available under the Apache 2.0 license designed to help system administrators automate the many repetitive tasks they regularly perform.
  • Vagrant. Create and configure lightweight, reproducible, and portable development environments.
  • OpenStack. Open source software for building private and public clouds.

But what is this all about?  I’m thinking out loud about transitioning from classic LAMP-in-a-box applications to elastic applications built admin-interface-first, with functions as web services for responsive apps using the likes of Node.js and Create.js

Posted in ITMS, Virtual Machines

Updating WordPress to 3.7.1 and then some of its 80+ plugins


I need to update The Commons.  We have been at 3.5.2 for far too long.  With our CELT team, we have decided to update to 3.7.1, because it is a security update, but no further.  This may have the side effect of breaking some plugins.  We will see what we can live with and what fixes, replacements and compromises we have to make along the way.

Upgrade to 3.7.1

Ho hum; as we are not going up to 3.8.1, which would be as simple as clicking on update, I have to manually update using the distributed code.  I followed the instructions.  Before embarking on this journey I had a look for something that would tell me, for our WPMU install, which plugins are activated network-wide and which ones are activated on individual sites within the network.  To do this I used ‘WPMU Plugin Stats‘.  I printed this to paper and to PDF so I can tick things off and make notes.

Before doing the update, it is important to deactivate all plugins, run wp-admin/update.php to update the network, and only then enable the plugins again according to the record I have made.

Here goes, the re-activate…

  • External Group Blogs : bp-groups-externalblogs.php on line 308, bad prepare statement

Updating the plugins went much better than usual.  This gave me the time to look at our missing LDAP Options page, which was fixed by following the instructions for WPMU Ldap Authentication, and to tidy up some tables that were not created when we were having server problems.  To fix these I looked for errors in the error logs complaining about not being able to write to tables.  These errors would have the affected blog’s number as a substring, e.g. wp_133_visitor_maps_st.  This script (which takes the blog number as $1) re-creates the missing tables:


#!/bin/sh
# Re-create the missing visitor maps tables for blog number $1
mysql -uroot -p ourblog <<HERE
CREATE TABLE \`wp_$1_visitor_maps_wo\` (
  \`session_id\` varchar(128) NOT NULL DEFAULT '',
  \`ip_address\` varchar(20) NOT NULL DEFAULT '',
  \`user_id\` bigint(20) unsigned NOT NULL DEFAULT '0',
  \`name\` varchar(64) NOT NULL DEFAULT '',
  \`nickname\` varchar(20) DEFAULT NULL,
  \`country_name\` varchar(50) DEFAULT NULL,
  \`country_code\` char(2) DEFAULT NULL,
  \`city_name\` varchar(50) DEFAULT NULL,
  \`state_name\` varchar(50) DEFAULT NULL,
  \`state_code\` char(2) DEFAULT NULL,
  \`latitude\` decimal(10,4) DEFAULT '0.0000',
  \`longitude\` decimal(10,4) DEFAULT '0.0000',
  \`last_page_url\` text NOT NULL,
  \`http_referer\` varchar(255) DEFAULT NULL,
  \`user_agent\` varchar(255) NOT NULL DEFAULT '',
  \`hostname\` varchar(255) DEFAULT NULL,
  \`provider\` varchar(255) DEFAULT NULL,
  \`time_entry\` int(10) unsigned NOT NULL DEFAULT '0',
  \`time_last_click\` int(10) unsigned NOT NULL DEFAULT '0',
  \`num_visits\` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (\`session_id\`),
  KEY \`nickname_time_last_click\` (\`nickname\`,\`time_last_click\`)
);

CREATE TABLE \`wp_$1_visitor_maps_st\` (
  \`type\` varchar(14) NOT NULL DEFAULT '',
  \`count\` mediumint(8) NOT NULL DEFAULT '0',
  \`time\` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (\`type\`)
);
HERE


This will represent a big improvement in the service.  Now to look at some blogs to see if the updates have worked…





Posted in CELT, ITMS, Library, The Commons

Hardening Apache using OpenVAS and RedHat advisories


My institution uses a tool provided by Janet to scan for vulnerabilities in our web servers.  We fix problems as soon as we see them.  I have recently been looking at Apache on an up-to-date CentOS server.  In order to test my changes I installed the FREE OpenVAS tool.  The install is very straightforward, and once I set up the firewall on a test server I could start scanning hosts.

The report was more verbose than the “complaint” report I was looking at.  I understand that tools like this cannot always tell if a flaw actually exists but instead take clues emitted by the server, e.g. openssh 2.2-v5.  That example gives out the version of the software for which a flaw may exist, but the scanner does not know, in this case, that the server is already patched.  In the report, a Common Vulnerabilities and Exposures (CVE) code is given for each “flaw”.  I looked these up to assess the threat, taking RedHat at their word.

When RedHat explains that a CVE is already patched, or that it does not apply because of the use of the machine, I can override the test in the scan, providing a cleaner report next time.

In this specific case, I was looking at the strength of SSL on one of our servers.  Thanks to OpenVAS, I was led to look at SSL compression, the tokens Apache emits and the TRACE/TRACK methods too.
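For anyone following along, the directives in question are roughly these (a sketch for httpd.conf/ssl.conf on CentOS; SSLCompression needs a new enough httpd/mod_ssl build):

```apache
ServerTokens Prod       # shrink the Server header to just "Apache"
ServerSignature Off     # no version details on server-generated pages
TraceEnable Off         # disable the TRACE method (covers TRACE/TRACK findings)
SSLCompression off      # mitigate CRIME-style attacks against TLS compression
```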

A big thumbs up for OpenVAS and RedHat’s CVE database.

Posted in ITMS, Linux SysAdmin