In this post we’ll look at Magellano, a cognitive system that I assembled to demonstrate the potential that deep learning could offer on a concrete use case. Magellano is not a commercial product; at the moment it is just a “nerd experiment”.

I was really impressed with the demo of IBM WatsonPaths. IBM scientists have trained their system to interact with medical domain experts in a way that’s natural for them, enabling the user to more easily understand the structured and unstructured data sources the system consulted and the path it took in offering an option.

Since this kind of tool fascinates me, I attempted to build a fully web-based, open-source version of an “almost similar” Cognitive Computing system. I used technologies such as scikit-learn, gensim, MLlib, NLTK, NumPy, D3.js, Bootstrap, Flask and nginx. It took me around 24 hours to develop the system from scratch. At the moment, this is just an experiment.

Purely for the purpose of illustration, I used a free dataset of Australian legal cases from the Federal Court of Australia (FCA). I could have used any other dataset.

The result is an interesting cognitive system, which I have named Magellano, available at the link Magellano Cognitive Computing Project (feel free to try it!). Magellano could support a legal assistant in searching for adjudications that match specific concepts or topics of interest.

Magellano is inspired by advanced products like IBM WatsonPaths, but has the advantage of being based on affordable technologies and of being completely customizable in practically no time.

The first page offers a list of 6 adjudications chosen at random (don’t forget this is a demonstration application!).

Homepage

Clicking on an adjudication opens a window with its details. If you click on “Start from here”, the system displays a screen divided into several areas. The most important area is in the center, where a Sankey diagram displays correlations between the selected adjudication and other potentially relevant adjudications, based on the topics set forth in the original.
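To make the idea of topic-based correlations more concrete, here is a minimal sketch of how links for a Sankey diagram might be derived. It assumes every adjudication is already described by a vector of topic weights (for instance, the output of a topic model such as LDA) and scores pairs with cosine similarity. The function names, the `source`/`target`/`value` dictionary layout and the toy data are all hypothetical; this is not Magellano's actual code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sankey_links(selected_id, topic_vectors, top_n=3, min_score=0.1):
    """Return links from the selected document to its most similar documents.

    Each link is a dict with "source", "target" and "value" keys; the value
    would drive the width of the bar in the diagram.
    """
    source = topic_vectors[selected_id]
    scored = [
        (doc_id, cosine(source, vec))
        for doc_id, vec in topic_vectors.items()
        if doc_id != selected_id
    ]
    # Drop weak correlations, then keep the strongest ones.
    scored = [(doc_id, score) for doc_id, score in scored if score >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        {"source": selected_id, "target": doc_id, "value": round(score, 3)}
        for doc_id, score in scored[:top_n]
    ]


# Hypothetical topic distributions over three topics (illustrative only).
topic_vectors = {
    "case_A": [0.7, 0.2, 0.1],
    "case_B": [0.6, 0.3, 0.1],
    "case_C": [0.1, 0.1, 0.8],
    "case_D": [0.0, 1.0, 0.0],
}

links = sankey_links("case_A", topic_vectors, top_n=2)
for link in links:
    print(link)
```

With this toy data, `case_B` shares most of its topic weight with `case_A`, so it comes out as the strongest link.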

Magellano_Screen_2

The width of the horizontal bars conveys the degree of conceptual correlation: the greater the correlation, the wider the bar. Clicking on a bar displays the content of the corresponding adjudication. Clicking on “Don’t like it” makes Magellano ignore cases similar to the selected item, while clicking on “Like it” makes Magellano consider cases similar to the selected item. Magellano then updates the Sankey diagram, adding new bars for the adjudications that correlate with the selected item. In this manner, a graphical flow of documents with related concepts is created, which should help locate cases of interest. The vertical bars of the Sankey diagram can be dragged with the mouse to improve readability.
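The post does not say how the “Like it” and “Don’t like it” feedback is applied internally. One classic way to do this kind of thing is a Rocchio-style relevance feedback update, sketched below as an assumption rather than as Magellano's actual method: the query vector is pulled toward the average of the liked documents and pushed away from the average of the disliked ones. The weights `alpha`, `beta` and `gamma` and the toy vectors are illustrative.

```python
def rocchio_update(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.25):
    """Move a query vector toward liked documents and away from disliked ones."""
    dims = len(query)

    def centroid(vectors):
        # Average of the given vectors, or a zero vector if there are none.
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    pos = centroid(liked)
    neg = centroid(disliked)
    updated = [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
    # Topic weights are non-negative, so clamp anything below zero.
    return [max(0.0, x) for x in updated]


# Hypothetical topic vectors over three topics.
query = [0.5, 0.5, 0.0]
liked = [[0.0, 1.0, 0.0]]      # the user liked a case dominated by topic 2
disliked = [[1.0, 0.0, 0.0]]   # and disliked a case dominated by topic 1

new_query = rocchio_update(query, liked, disliked)
print(new_query)
```

After the update, the second topic carries more weight and the first carries less, so a new similarity search would favor cases resembling the liked one.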

The “Get current paths” option in the top-left “Discovery” menu opens a separate box with a textual description of the paths shown in the diagram (every conceptual path is also assigned a corresponding evaluation). The “Learn from your preferences” option in the same menu reruns the search for adjudications, applying the user’s criteria (likes and dislikes).

Magellano_Screen_3

But that’s not all! The “Add your concepts” item can be selected from the “Discovery” menu (or by clicking “Discover from concepts” at the top right). A window appears in which you can paste or write a free-text description of the concepts of interest; Magellano uses it to search for adjudications with matching concepts.
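As a rough illustration of free-text concept search, the sketch below averages word vectors for the query and for each document, then ranks documents by cosine similarity. This is only one plausible approach and an assumption on my part; the tiny hand-made `WORD_VECTORS` table stands in for a trained embedding model, and the helper names and sample texts are hypothetical.

```python
import math
import re

# Hand-made two-dimensional "embeddings", purely for illustration.
WORD_VECTORS = {
    "assets": [0.9, 0.1],
    "shares": [0.8, 0.2],
    "funds": [0.85, 0.15],
    "custody": [0.1, 0.9],
    "children": [0.05, 0.95],
}


def embed(text):
    """Average the vectors of the known words in a text (None if no word is known)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return None
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query, documents):
    """Rank documents by similarity to the query, best match first."""
    q = embed(query)
    if q is None:
        return []
    results = []
    for doc_id, text in documents.items():
        d = embed(text)
        if d is not None:
            results.append((doc_id, cosine(q, d)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


documents = {
    "case_1": "Dispute over shares and funds held by the company",
    "case_2": "Custody of the children after separation",
}

ranking = search("transfer of assets", documents)
print(ranking)
```

Note that the query never mentions “shares” or “funds”, yet `case_1` ranks first because those words sit close to “assets” in the toy vector space.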

Lastly, clicking on “Explore the conceptual map of this demonstrator” in the footer lets you navigate the concept map at the heart of Magellano. For example, by entering the word “assets” we find that Magellano’s automated document analysis independently deduces that “money”, “shares” and “funds” are all types of “assets” (admittedly, the current model could be filtered a little more).
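A concept map like this can be thought of as a nearest-neighbor lookup in a word-embedding space. The sketch below shows the idea with hand-made vectors; in a real setup the vectors would come from a trained model (gensim exposes a comparable query as `most_similar` on its word vectors). The vectors and the `most_similar` helper defined here are hypothetical and only illustrate the principle.

```python
import math

# Hand-made three-dimensional vectors, purely for illustration.
VECTORS = {
    "assets": [0.90, 0.10, 0.00],
    "money": [0.85, 0.15, 0.05],
    "shares": [0.80, 0.10, 0.20],
    "funds": [0.88, 0.20, 0.00],
    "judge": [0.00, 0.20, 0.90],
    "appeal": [0.10, 0.90, 0.10],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=3):
    """Return the topn words closest to `word`, as (word, similarity) pairs."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


neighbors = most_similar("assets", VECTORS, topn=3)
for neighbor, score in neighbors:
    print(neighbor, round(score, 3))
```

With these toy vectors, the three nearest neighbors of “assets” are “money”, “funds” and “shares”, while “judge” and “appeal” end up far away.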

The work at the core of Magellano can certainly be improved, and the initial dataset could be built (and filtered) better; however, the results appear to be very interesting. At the moment, everything revolves around the Word2vec and LDA algorithms. In my opinion, bigger datasets plus better-performing topic modeling algorithms (for example, ones based on deep neural networks) could yield more meaningful and valuable conceptual correlations.

Posted by lorenzo

Full-time engineer. I like to write about data science and artificial intelligence.
