QlikView and Jedox integration

September 16, 2016, 7:58 am

From now on, you can bring together the world of agile QlikView dashboards and planning and budgeting tools, thanks to the integration of QlikView and Jedox.

New to Jedox? Here is all the information about this powerful suite. Don't hesitate to get in touch.

See in the video we have put together below how simple it is to integrate Jedox with Qlik Sense from Qlik. Click here for more information about this integration.

↧

Every employee should know data analysis?

September 19, 2016, 4:14 pm

According to VentureBeat, 'Every employee should know data analysis':

"I will wager that 99 percent of businesses in the U.S. don’t need anyone proficient in C++ or Java.
The tech skills required by most employers are substantial but quite different:"

1. The basics of a scripting language. Bash for Unix/Linux, JavaScript for web browsers, or Visual Basic for Microsoft Applications are simple coding skills that are easy to learn and valuable for workers across disciplines and levels. These skills allow you to automate tasks, promoting efficiency in manipulation and analysis.

For example, if you run a contest, you could write a simple script to determine if people who’ve entered the contest submitted their content to your site by the specified date. Looking up hundreds of users manually would be very tedious, but this scripting language know-how would make the process efficient.
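The contest check described above can be sketched in a few lines of Python. This is only an illustration: the user names, the dates and the `entries_on_time` helper are hypothetical, and a real script would read the entries from the site's own records.

```python
from datetime import date

# Hypothetical contest data: user name -> date the entry was submitted.
submissions = {
    "ana": date(2016, 9, 10),
    "luis": date(2016, 9, 18),
    "marta": date(2016, 9, 15),
}
DEADLINE = date(2016, 9, 15)  # assumed closing date, treated as inclusive


def entries_on_time(entries, deadline):
    """Return the sorted names of users who submitted on or before the deadline."""
    return sorted(name for name, submitted in entries.items() if submitted <= deadline)


on_time = entries_on_time(submissions, DEADLINE)
print(on_time)
```

The same loop works unchanged for hundreds of users, which is where the time saving comes from.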

2. Simple SQL commands. These commands are necessary to process raw data and turn it into information that you can analyze and apply.
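As a minimal sketch of what "simple SQL" can look like in practice, the snippet below loads a few rows into an in-memory SQLite database and summarizes them with one query. The `sales` table and its values are invented for the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# GROUP BY turns the raw rows into one summary row per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```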

Sure, the right people on your team should know how to code – but most of them should be writing spreadsheet macros and pivot tables to support your internal business processes, not agile algorithms for entrepreneurial endeavours. They should know the basics of HTML editing and how to set up folders and accounts with the correct security rights for your team. That’s what the bulk of businesses need from technology education.

3. Deductive reasoning skills. Being able to look at various pieces of data and draw a conclusion is probably the most valuable skill for any employee to have, and surprisingly it’s something that’s too often missing from otherwise technically advanced employees.

↧

Differences between Data Analyst, Business Intelligence Developer, Data Scientist and Data Engineer

September 19, 2016, 4:24 pm

As the use of analytics spreads through organizations, it becomes harder to tell apart the roles of the people involved. Below we include a fairly accurate description.

Data Analyst

Data Analysts are experienced data professionals in their organization who can query and process data, provide reports, summarize and visualize data. They have a strong understanding of how to leverage existing tools and methods to solve a problem, and help people from across the company understand specific queries with ad-hoc reports and charts.
However, they are not expected to deal with analyzing big data, nor are they typically expected to have the mathematical or research background to develop new algorithms for specific problems.

Skills and Tools: Data Analysts need to have a baseline understanding of some core skills: statistics, data munging, data visualization, exploratory data analysis, Microsoft Excel, SPSS, SPSS Modeler, SAS, SAS Miner, SQL, Microsoft Access, Tableau, SSAS.

Business Intelligence Developers

Business Intelligence Developers are data experts that interact more closely with internal stakeholders to understand the reporting needs, and then to collect requirements, design, and build BI and reporting solutions for the company. They have to design, develop and support new and existing data warehouses, ETL packages, cubes, dashboards and analytical reports.
Additionally, they work with databases, both relational and multidimensional, and should have great SQL development skills to integrate data from different resources. They use all of these skills to meet the enterprise-wide self-service needs. BI Developers are typically not expected to perform data analyses.

Skills and tools: ETL, developing reports, OLAP, cubes, web intelligence, business objects design, Tableau, dashboard tools, SQL, SSAS, SSIS.

Data Engineer

Data Engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by Data Scientists. They are software engineers who design and build big data systems, integrate data from various sources, and manage it. They then write complex queries on that data and make sure it is easily accessible and works smoothly; their goal is to optimize the performance of their company’s big data ecosystem.
They might also run some ETL (Extract, Transform and Load) on top of big datasets and create big data warehouses that can be used for reporting or analysis by data scientists. Beyond that, because Data Engineers focus more on the design and architecture, they are typically not expected to know any machine learning or analytics for big data.

Skills and tools: Hadoop, MapReduce, Hive, Pig, MySQL, MongoDB, Cassandra, Data streaming, NoSQL, SQL, programming.

Data Scientist

A data scientist is the alchemist of the 21st century: someone who can turn raw data into purified insights. Data scientists apply statistics, machine learning and analytic approaches to solve critical business problems. Their primary function is to help organizations turn their volumes of big data into valuable and actionable insights.
Indeed, data science is not necessarily a new field per se, but it can be considered an advanced level of data analysis that is driven and automated by machine learning and computer science. In other words, compared with ‘data analysts’, Data Scientists are expected to have, in addition to analytical skills, strong programming skills, the ability to design new algorithms and handle big data, and some expertise in the domain.

Moreover, Data Scientists are also expected to interpret and eloquently deliver the results of their findings, by visualization techniques, building data science apps, or narrating interesting stories about the solutions to their data (business) problems.
The problem-solving skills of a data scientist require an understanding of traditional and new data analysis methods to build statistical models or discover patterns in data. Examples include creating a recommendation engine, predicting the stock market, diagnosing patients based on their similarity, or finding the patterns of fraudulent transactions.
Data Scientists may sometimes be presented with big data without a particular business problem in mind. In this case, the curious Data Scientist is expected to explore the data, come up with the right questions, and provide interesting findings! This is tricky because, in order to analyze the data, a strong Data Scientist should have a very broad knowledge of different techniques in machine learning, data mining, statistics and big data infrastructures.

They should have experience working with datasets of different sizes and shapes, and be able to run their algorithms on large data effectively and efficiently, which typically means staying up to date with the latest cutting-edge technologies. This is why it is essential to know computer science fundamentals and programming, including experience with languages and database (big/small) technologies.

Skills and tools: Python, R, Scala, Apache Spark, Hadoop, data mining tools and algorithms, machine learning, statistics.

Seen on BigDataUniversity

↧

The technological brain of the NBA

September 22, 2016, 2:23 am

The NBA has been collecting statistics since 1943. With the explosion of Big Data and the new real-time technologies, the possibilities have multiplied. In this video, the NBA's head of technology explains it very well.

In Spain and in other European and Latin American countries we are still a long way behind, but there is sure to be major development soon.

Remember what we told you a few months ago about Moneyball.

Here is a breakdown of the technology they use:

↧

Location Intelligence: Bringing together the power of maps and Business Intelligence, with Carto and Pentaho

September 22, 2016, 7:56 am

Location intelligence, or spatial intelligence, is the process of deriving meaningful insight from geospatial data relationships to solve a particular problem. (Click on the above dashboard)

It involves layering multiple data sets spatially and/or chronologically, for easy reference on a map, and its applications span industries, categories and organizations. It is generally agreed that more than 80% of all data has a location element to it and that location directly affects the kinds of insights that you might draw from many sets of information (Wikipedia rules).

Deploying location intelligence by analyzing data using a geographical information system (GIS) within business is becoming a critical core strategy for success in an increasingly competitive global economy.

Location intelligence is also used to describe the integration of a geographical component into business intelligence processes and tools, often incorporating spatial database and spatial OLAP tools.

Check this Online Dashboard created by our friends from Stratebi

Now, this is easier and more affordable than ever thanks to tools like Carto and Pentaho.

↧

What happened to the 50 most important Open Source companies?

September 25, 2016, 9:00 am

The roundup by thevarguy is very interesting: it tracks what has happened to the leading open source solutions over the years. Which ones remain, which were acquired, which have disappeared...

Download the document

↧

How to create Balanced Scorecards in Pentaho?

September 26, 2016, 2:10 am

Now you can create a Balanced Scorecard application in Pentaho CE using this solution based on open source.

You can see how it works in this video. More info and details here

↧

Analysis of the Panama Papers with Neo4j - Big Data

September 27, 2016, 4:57 am

Access the application

In this example, Neo4j is used as a graph database to model the relationships between the entities that make up the Panama Papers (PP). Starting from text files containing the data and the relationships between clients, offices and companies in the PP, we built this graph, which makes it easier to understand the interactions between the different subjects in the network.

The demo starts by selecting an entity of any type (Address, Company, Client, Officer). Depending on the selected type, the attributes of that node are shown; then select the attribute you want and enter the filter, adding more panels to filter by more than one attribute if necessary. The "Deep" parameter is the number of connections from the selected element that you want to display.

On the server, a breadth-first search (BFS) runs from the selected node, querying Neo4j for each relationship type in which the current node is one of the endpoints, until the requested depth is reached. Nodes and edges are stored along the way and returned as the result.
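The depth-limited search described above can be sketched as follows. This is a minimal illustration, not the demo's code: the `GRAPH` dictionary is a hypothetical in-memory stand-in for the Neo4j queries, and the node names are invented.

```python
from collections import deque

# Hypothetical stand-in for the graph; the real demo queries Neo4j instead.
GRAPH = {
    "client:1": ["company:A"],
    "company:A": ["client:1", "officer:X", "address:9"],
    "officer:X": ["company:A", "company:B"],
    "address:9": ["company:A"],
    "company:B": ["officer:X"],
}


def bfs_subgraph(graph, start, depth):
    """Collect the nodes and edges reachable from start within `depth` hops."""
    nodes = {start}
    edges = set()
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if level == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(node, []):
            # frozenset makes the edge undirected, so A-B and B-A count once
            edges.add(frozenset((node, neighbour)))
            if neighbour not in nodes:
                nodes.add(neighbour)
                queue.append((neighbour, level + 1))
    return nodes, edges


nodes, edges = bfs_subgraph(GRAPH, "client:1", depth=2)
print(sorted(nodes), len(edges))
```

With a depth of 2, `company:B` is left out because it sits three hops away from `client:1`.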

The graph is visualized with Linkurious, one of the most effective components on the market for this purpose. You can interact with the graph by zooming, selecting elements, moving elements or using the lasso tool to select several nodes. Double-clicking a node loads any of its connections that are not yet displayed.

Neo4j and graph databases in general have very specific applications, such as Fraud Detection (discovering patterns of relationships between nodes), Real Time Recommendations (relatively simple, using the weight of each node's relationships, its trend, etc.) and Social Network Analytics (because graph algorithms are easy to implement in this kind of database).

Enjoy it!!

↧

Using Tableau and Pentaho with Spanish football league data

September 29, 2016, 1:02 am

We often publish studies and comparisons of different Business Intelligence and Big Data technologies. But, as with many things, the best approach is to see them working in practice.

So here are examples of dashboards built with Tableau and Pentaho using data from the Spanish football league.

Click on each dashboard to open it:

Tableau:

Pentaho:

↧

The worst charts in the world

October 3, 2016, 6:54 am

There are many charts: good, average and bad. This time we have picked out the ones that are in a class of their own, beyond any sensible use :-)

To be continued...

↧

Open Source tool courses announced (classroom and online)

October 3, 2016, 8:23 am

The widest range of courses on Open Source solutions, delivered both in the classroom and online, has now been announced (starting in mid-October and finishing at the end of the year).

Courses announced:

↧

List of Open Source solutions for Smart Cities - Internet of Things projects

October 4, 2016, 1:53 am

More and more so-called 'Smart City' projects are being carried out, supported by Big Data, the Internet of Things... and the good news is that most of them are built with Open Source technologies. From TodoBI.com, we can share our insights about these technologies.

Making a city “smart” involves a set of areas that we outline below. Without IoT (Internet of Things), there will be no Smart City.

Since automatically collected data is the most efficient way to get huge amounts of information, devices connected to the internet are an essential part of a Smart City.
Data from the city is generally stored and processed using Big Data and Real Time Streaming technologies.

The final goal is more innovative, custom analysis, which can be achieved using Artificial Intelligence and Machine Learning. Finally, I would include Apps, as this kind of solution is usually consumed on mobile devices.

Here we outline the common process of building a Smart City solution:

-Choose data
-Connecting devices
-Design Data Storage Infrastructure
-Real Time Events and Notifications
-Analytics
-Visualization (Dashboards)

1) Choosing Data

In a city there are three basic sources of data: citizens, systems and sensors. Use the information available about users on social networks, in information systems, and in the public statistics offered by the administration.

A typical example is a user with geolocation enabled on Twitter. Information about the systems and services in a city is sometimes available in open data sources. An example could be water or electricity consumption.

Last but not least, sensors. A city hoping to become “Smart” has to aim to provide automatic information about its environment, and that can be achieved using sensors. Sensors can be anywhere.

2) Connecting Devices

Devices (sensors) connect to the real-time data streaming and storage infrastructure using efficient communication protocols that rely on lightweight packaging and asynchronous communication.

Examples of communication protocols used:

MQTT (Message Queuing Telemetry Transport)

WebSocket (bi-directional web communication and connection management)

STOMP (The Simple Text Oriented Messaging Protocol)

XMPP (Extensible Messaging and Presence Protocol)

3) Design Data Storage Infrastructure

The Data Storage Infrastructure for a Smart City solution has special characteristics, due to the diversity and dynamism of its sources.

Time series DBs are frequently used, because the data captured by sensors evolves over time. Some examples of this kind of DB are InfluxDB and Druid.

Other data stores commonly used in Smart City projects are MongoDB (JSON format advantages), Cassandra (fast insertion advantages) and Hadoop (big data framework advantages).

Some samples

4) Real Time events and notifications

Smart City solutions usually need real-time notifications of events. To meet this requirement, the system must have a stream analytics engine that can react to events in real time and send notifications. Technologies related to this include Storm, Spark Streaming, Flink, WebSocket and Socket.IO.
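To give a feel for the kind of rule such an engine evaluates, here is a minimal sketch in plain Python, with no streaming framework involved. The `alerts` function, the window size and the threshold are assumptions made for the example; a real deployment would express the rule in one of the engines listed above.

```python
from collections import deque


def alerts(readings, window=3, threshold=50.0):
    """Yield (index, mean) whenever the rolling mean of the last `window` readings exceeds the threshold."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield index, mean


# Hypothetical sensor readings arriving one at a time.
for index, mean in alerts([40, 45, 50, 60, 70]):
    print(f"notify: reading {index} pushed the rolling mean to {mean:.1f}")
```

Because `alerts` is a generator, each reading is processed as it arrives and only the current window is kept in memory.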

IoT Frameworks:

●Node-RED

Node-RED is a tool for wiring together hardware devices, APIs and online services in new and interesting ways.

The light-weight runtime is built on Node.js, taking full advantage of its event-driven, non-blocking model. This makes it ideal to run at the edge of the network on low-cost hardware such as the Raspberry Pi as well as in the cloud.

The flows created in Node-RED are stored using JSON which can be easily imported and exported for sharing with others.

An online flow library allows you to share your best flows with the world

●PubNub

PubNub is a Data Stream Network that offers infrastructure as a service. With PubNub, we can connect our devices to the infrastructure provided, design our architecture and simply take advantage of all this.

PubNub has 5 main tools:

-Publish Subscribe (Allows real-time notification of events to users)
-Stream Controller (Allows managing channels and groups of channels)
-Presence (Allows notifications when users log in or leave the system, or similar behaviour such as device availability)
-Access Manager (Allows administrators to grant or deny permissions to users of the system)
-Storage & Playback (Provides storage for messages and allows them to be retrieved at a later time)

●IoT-AWS

AWS IoT is a platform that enables you to connect devices to AWS Services and other devices, secure data and interactions, process and act upon device data, and enable applications to interact with devices even when they are offline

5) Analytics and Visualization

You can show real time dashboards, reports, OLAP Analysis using tools like Pentaho. See samples of Analytics

Other Open Source projects for Smart Cities -IoT:

- AllSeen Alliance
- Bug Labs dweet and freeboard
- DeviceHive
- DSA
- Eclipse IoT (Kura)
- Kaa
- Macchina.io
- Predix
- Home Assistant
- Mainspring
- Node-RED
- Open Connectivity Foundation
- openHAB
- OpenIoT
- OpenRemote
- OpenThread
- Physical Web/Eddystone
- PlatformIO
- The Thing System
- ThingSpeak
- Zetta

↧

New version of DataCleaner

October 6, 2016, 2:32 am

The heart of DataCleaner is a strong data profiling engine for discovering and analyzing the quality of your data. Find the patterns, missing values, character sets and other characteristics of your data values.

Profiling is an essential activity of any Data Quality, Master Data Management or Data Governance program. If you don't know what you're up against, you have poor chances of fixing it.

Learn how DataCleaner works with ...

Duplicate detection
Big Data and Hadoop
Pentaho Business Intelligence
CRM systems (such as Salesforce.com)

DataCleaner community edition downloads

↧

Real-time Apache Kafka use case, Big Data

October 14, 2016, 9:47 am

This is a good example of using Apache Kafka in Big Data environments for querying and visualization. See the dashboard

The image below shows the cluster of 3 brokers and the 3 producers that send data to the Kafka cluster.

The "Kafka Producer" component connects to the Wikipedia stream and registers a listener, which is a participant in the observer pattern. When an update is generated on Wikipedia, it is received through the "Socket", which notifies the "Listener"; the listener holds an org.apache.kafka.clients.producer.KafkaProducer. The producer registers a callback that notifies it when a message has been sent to Kafka; the notification contains the offset and partition of each message. In this step, the time in milliseconds and the offset for that time are sent via the API once a minute.

This information is stored in a PostgreSQL database so that it can be queried later. When users select a date from which they want to see messages, the system looks up in the database an offset recorded on the requested date. The Kafka cluster keeps messages in its local files for 3 days.
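The lookup from a requested time to a stored offset can be sketched as below. Everything here is an assumption for illustration: a sorted Python list stands in for the PostgreSQL table, the timestamps and offsets are invented, and the choice of "first entry at or after the requested time" is just one reasonable policy, not necessarily the one the demo uses.

```python
from bisect import bisect_left

# Hypothetical index rows as recorded once a minute: (timestamp in ms, offset).
OFFSET_INDEX = [
    (1476430000000, 100),
    (1476430060000, 180),
    (1476430120000, 265),
]


def offset_for(index, requested_ms):
    """Return the offset of the first index entry at or after requested_ms.

    Returns None when the requested time is later than every recorded entry.
    """
    timestamps = [ts for ts, _ in index]
    position = bisect_left(timestamps, requested_ms)
    if position == len(index):
        return None
    return index[position][1]


print(offset_for(OFFSET_INDEX, 1476430060000))
```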

Once the offset for the requested date has been obtained, a "Thread Safe Kafka Consumer" is requested through the "Consumer Holder"; it performs the seek and poll operations, to set the starting position and to consume from it, respectively.

By default, an org.apache.kafka.clients.consumer.KafkaConsumer is not thread safe, so for use in an environment with simultaneous user access, an implementation was written that lets several threads share one Consumer by synchronizing access to the object.
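The idea of synchronizing access to a shared consumer can be sketched as follows. This is not the demo's implementation (which wraps the Java KafkaConsumer): `FakeConsumer` is a hypothetical stand-in that only mimics seek and poll over a list, and `SynchronizedConsumer` is an illustrative name.

```python
import threading


class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer reading one partition."""

    def __init__(self, messages):
        self._messages = messages
        self._position = 0

    def seek(self, offset):
        self._position = offset

    def poll(self, max_records):
        batch = self._messages[self._position:self._position + max_records]
        self._position += len(batch)
        return batch


class SynchronizedConsumer:
    """Serializes access so that several threads can share one consumer."""

    def __init__(self, consumer):
        self._consumer = consumer
        self._lock = threading.Lock()

    def read_from(self, offset, max_records):
        # seek and poll run under one lock, so another thread cannot move
        # the position between the two calls.
        with self._lock:
            self._consumer.seek(offset)
            return self._consumer.poll(max_records)


shared = SynchronizedConsumer(FakeConsumer(list(range(100))))
results = {}


def worker(offset):
    results[offset] = shared.read_from(offset, 5)


threads = [threading.Thread(target=worker, args=(o,)) for o in (0, 40, 90)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results[40])
```

Holding the lock across both calls is the important design choice: locking seek and poll separately would still let two users' requests interleave.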

↧

Fraud detection in retail with Neo4j

October 18, 2016, 8:56 am

In this small example, put together by our colleagues at Stratebi, we demonstrate the fraud detection capabilities of Neo4j (a graph database).

Our data set includes:

10 People (Nodes): Fernando, Juan, Daniel, Marcos...
13 Merchants (Nodes): Fnac, El Corte Inglés, Primark, Ikea...
64 purchase Transactions (Relationships), each identifying a purchase by a given person at a merchant. Each of these Relationships has the following attributes: purchase amount in €, date and status (legitimate or fraudulent).

// Create the 10 customers
CREATE (Fernando:Persona {id:'1', nombre:'Fernando', sexo:'masculino', edad:'50'})
CREATE (Juan:Persona {id:'2', nombre:'Juan', sexo:'masculino', edad:'48'})
CREATE (Daniel:Persona {id:'3', nombre:'Daniel', sexo:'masculino', edad:'23'})
CREATE (Marcos:Persona {id:'4', nombre:'Marcos', sexo:'masculino', edad:'30'})
CREATE (Gonzalo:Persona {id:'5', nombre:'Gonzalo', sexo:'masculino', edad:'31'})
CREATE (Marta:Persona {id:'6', nombre:'Marta', sexo:'femenino', edad:'52'})
...

// Create merchants
CREATE (Fnac:Comercio {id:'11', nombre:'Fnac', calle:'2626 Wilkinson Court', address:'Madrid 92410'})
CREATE (El_Corte_Ingles:Comercio {id:'12', nombre:'El Corte Ingles', calle:'Nuevos Minist', address:'Madrid 92410'})
CREATE (Primark:Comercio {id:'13', nombre:'Primark', calle:'2092 Larry Street', address:'Madrid 92410'})
CREATE (MacDonalds:Comercio {id:'14', nombre:'MacDonalds', calle:'1870 Caynor Circle', address:'Madrid 92410'})
CREATE (Springfield:Comercio {id:'15', nombre:'Springfield', calle:'1381 Spruce Drive', address:'Madrid 92410'})
CREATE (Burguer_King:Comercio {id:'16', nombre:'Burguer King', calle:'826 Anmoore Road', address:'Madrid 92410'})
CREATE (Ikea:Comercio {id:'17', nombre:'Ikea', calle:'1925 Spring Street', address:'Madrid 92410'})
CREATE (Nike:Comercio {id:'18', nombre:'Nike', calle:'4209 Elsie Drive', address:'Madrid 92410'})
CREATE (Adidas:Comercio {id:'19', nombre:'Adidas', calle:'86 D Street', address:'Madrid 92410'})
CREATE (Sprinter:Comercio {id:'20', nombre:'Sprinter', calle:'945 Kinney Street', address:'Madrid 92410'})
CREATE (Starbucks:Comercio {id:'21', nombre:'Starbucks', calle:'3810 Apple Lane', address:'Madrid 92410'})
...


A subset of 25 purchases is shown below.

// Create purchases
CREATE (Fernando)-[:HA_COMPRADO_EN {cantidad:'986', fecha:'4/17/2015', estado:'Legitima'}]->(Burguer_King)
CREATE (Fernando)-[:HA_COMPRADO_EN {cantidad:'239', fecha:'5/15/2015', estado:'Legitima'}]->(Starbucks)
CREATE (Fernando)-[:HA_COMPRADO_EN {cantidad:'475', fecha:'3/28/2015', estado:'Legitima'}]->(Nike)
CREATE (Fernando)-[:HA_COMPRADO_EN {cantidad:'654', fecha:'3/20/2015', estado:'Legitima'}]->(Primark)
CREATE (Juan)-[:HA_COMPRADO_EN {cantidad:'196', fecha:'7/24/2015', estado:'Legitima'}]->(Adidas)
CREATE (Juan)-[:HA_COMPRADO_EN {cantidad:'502', fecha:'4/9/2015', estado:'Legitima'}]->(El_Corte_Ingles)
CREATE (Juan)-[:HA_COMPRADO_EN {cantidad:'848', fecha:'5/29/2015', estado:'Legitima'}]->(Primark)
CREATE (Juan)-[:HA_COMPRADO_EN {cantidad:'802', fecha:'3/11/2015', estado:'Legitima'}]->(Fnac)
CREATE (Juan)-[:HA_COMPRADO_EN {cantidad:'203', fecha:'3/27/2015', estado:'Legitima'}]->(Subway)
CREATE (Daniel)-[:HA_COMPRADO_EN {cantidad:'35', fecha:'1/23/2015', estado:'Legitima'}]->(MacDonalds)
.....

Now let's start using the capabilities of Cypher, the graph query language of Neo4j.

1. First, we show all the fraudulent operations

MATCH (victima:Persona)-[r:HA_COMPRADO_EN]->(comercio)
WHERE r.estado = "Fraudulenta"
RETURN
victima.nombre AS `Nombre Cliente`,
comercio.nombre AS `Nombre Comercio`,
r.cantidad AS Cantidad,
r.fecha AS `Fecha Transaccion`
ORDER BY `Fecha Transaccion` DESC

Result: 16 fraudulent operations

2. So far we know which merchants have had cases of fraud.

But there is a fraudster we are looking for, and to help find them we will rely on the transaction date.
The thief we are after captured the credit card number during a legitimate purchase. After stealing the card details, the thief carried out fraudulent operations.
The following query shows, for people who have been victims of fraud, the legitimate purchases made earlier than the fraudulent ones. This reveals the merchants where the card number could have been stolen.

MATCH (victima:Persona)-[r:HA_COMPRADO_EN]->(comercio)
WHERE r.estado = "Fraudulenta"

MATCH (victima)-[t:HA_COMPRADO_EN]->(otroscomercios)
WHERE t.estado = "Legitima" AND t.fecha < r.fecha

WITH victima, otroscomercios, t 
ORDER BY t.fecha DESC

RETURN
victima.nombre AS `Nombre Cliente`, 
otroscomercios.nombre AS `Nombre Comercio`, 
t.cantidad AS Cantidad, 
t.fecha AS `Fecha Transaccion`
ORDER BY `Fecha Transaccion` DESC

Result: 34 legitimate operations that took place before the fraudulent ones

3. Now we work out the common denominator: we group and sort by the number of people who have bought at each merchant.

MATCH (victima:Persona)-[r:HA_COMPRADO_EN]->(comercio)
WHERE r.estado = "Fraudulenta"

MATCH (victima)-[t:HA_COMPRADO_EN]->(otroscomercios)
WHERE t.estado = "Legitima" AND t.fecha < r.fecha
WITH victima, otroscomercios, t ORDER BY t.fecha DESC

RETURN
DISTINCT otroscomercios.nombre AS `Comercio Sospechoso`,
count(DISTINCT t) AS Contador,
collect(DISTINCT victima.nombre) AS Victimas
ORDER BY Contador DESC

Result: In every fraudulent purchase, the card owner had made a purchase at Primark in the preceding days. We now know both the date and the merchant where the bank details were stolen.
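The same "common denominator" reasoning can be sketched outside the database. The snippet below is only an illustration on an invented five-row sample: it counts distinct victims per merchant (the Cypher query counts distinct transactions), and it parses the dates first, because M/D/YYYY values compared as text do not sort chronologically ("10/2/2015" sorts before "3/20/2015").

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample in the same shape as the HA_COMPRADO_EN relationships:
# (person, merchant, date as M/D/YYYY, status).
PURCHASES = [
    ("Fernando", "Primark", "3/20/2015", "Legitima"),
    ("Fernando", "Nike", "3/28/2015", "Legitima"),
    ("Fernando", "Ikea", "10/2/2015", "Fraudulenta"),
    ("Juan", "Primark", "5/29/2015", "Legitima"),
    ("Juan", "Fnac", "11/3/2015", "Fraudulenta"),
]


def parse(text):
    """Parse an M/D/YYYY string so that dates compare chronologically."""
    return datetime.strptime(text, "%m/%d/%Y")


def suspect_merchants(purchases):
    """Rank merchants by the number of victims who bought there before being defrauded."""
    latest_fraud = {}
    for person, _merchant, when, status in purchases:
        if status == "Fraudulenta":
            day = parse(when)
            latest_fraud[person] = max(day, latest_fraud.get(person, day))

    victims_by_merchant = defaultdict(set)
    for person, merchant, when, status in purchases:
        if (status == "Legitima" and person in latest_fraud
                and parse(when) < latest_fraud[person]):
            victims_by_merchant[merchant].add(person)

    # Most shared merchant first; ties are broken alphabetically.
    return sorted(victims_by_merchant.items(), key=lambda item: (-len(item[1]), item[0]))


for merchant, victims in suspect_merchants(PURCHASES):
    print(merchant, sorted(victims))
```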

We now display the victims' purchases sorted by date; this tells us the date on which the details were stolen.

↧

The 6 Open Source solutions that companies use

October 20, 2016, 4:45 am

We could go on at length in this message, but we will be specific. What we want to reflect is a reality we are seeing in more and more organizations: the use of Open Source solutions of ever higher quality to manage the day-to-day and strategic needs of companies.

We are no longer talking only about operating systems or backend solutions, but about powerful business solutions for every kind of user within the company. Here they are:

Portals (and more): Liferay
Document Management (and more): Alfresco
Analytics (and more): Pentaho
ERP (and more): Odoo
CRM (and more): SuiteCRM
Data Management (and more): Talend

↧

The best Open Source resources for Alfresco

October 25, 2016, 5:01 am

For all of you who work with Alfresco, you will find this compilation tremendously useful:

Auditing

Alfresco Audit Analysis and Reporting - A.A.A.R. – Alfresco Audit Analysis and Reporting
Alfresco Audit Dashlet - Dashlet to view Alfresco audit logs

Authentication and Authorization

alfresco-agreement-filter - This extension adds a must read page for every user before starting to use Alfresco.
Share oAuth - Spring Surf extension allowing remote endpoints to be easily set up against OAuth 1.0 and OAuth 2.0 services
Share oAuth SSO - Alfresco Share OAuth SSO Support

Backup and Restore

Alfresco BART - Backup and Recovery Tool - Alfresco BART is a tool written in shell script on top of Duplicity for doing Alfresco backups and restore from a local file system, FTP, SCP or Amazon S3.

Benchmark

Alfresco Benchmark - Alfresco Benchmark framework, utilities and load tests: a scalable load test suite

Content Management Systems

Crafter CMS - A web CMS built on top of Alfresco as the repository

Content Management System Integrations

Drupal Alfresco - Alfresco module provides integration between Drupal and Alfresco Enterprise Content Management System.
AlfrescoDoc for Joomla - A Joomla module to display documents from Alfresco.
AlfrescoDoc for Wordpress - A WordPress plugin to display documents from Alfresco.

Content Stores

Alfresco Cloud Store - Migrated from Google Code
alfresco-s3-adapter - Alfresco AMP Module for S3 Backed Storage
Compressing Content Store for Alfresco - An Alfresco ContentStore implementation, which compresses certain mime types (but not others)
Simple Content Stores - Addon to provide a set of common content store implementations and easy-to-use configuration (no Spring config)

Classification and OCR

Alfresco Google Vision - Google Vision API integration in Alfresco
Alfresco Simple OCR - Simple OCR action for Alfresco
Uploader Plus - An Alfresco uploader that prompts for metadata

Custom Builds

LXCommunity ECM - Open source custom build of Alfresco Community with commercial support

Data List Management

Alfresco Datalists - Datalist Extensions for Alfresco Share
alfresco-datalist-constraints - Use datalists to maintain Alfresco model constraints
AlfrescoDataListDownload - Download as Spreadsheet support for Alfresco DataLists
Alfresco List Manager - Component used to manage custom lists of values used in metadata forms.

Desktop Sync

CMISSync - Synchronize content between a CMIS repository and your desktop. Like Dropbox for Enterprise Content Management!

Development

Aikau - Aikau UI Framework
Alfresco SDK - The Alfresco SDK based on Apache Maven, includes support for rapid and standard development, testing, packaging, versioning and release of your Alfresco integration and extension projects
Alfresco Enhanced Script Environment - Provide additional functionality for the server-side JavaScript environments of both the Alfresco Repository and Alfresco Share tier.
Alfresco JavaScript Batch Executer - Easy bulk processing in Alfresco with JavaScript
Alfresco Javascript Console - Administration Console component for Alfresco Share, that enables the execution of arbitrary JavaScript code against the repository
alfresco-jscript-extensions - Alfresco repository module with JavaScript root object extensions that are helpful in many scenarios.
Alfresco Maven - Base Maven setup of parent POM, common definitions and plugins for building Alfresco modules without Alfresco SDK (except for a single plugin mojo)
Alfresco @mvc - Enables the usage of Spring @MVC within Alfresco.
alfresco-ng2-components - Alfresco Angular 2 components
Dynamic Extensions for Alfresco - Rapid development of Alfresco repository extensions in Java. Deploy your code in seconds, not minutes. Life is too short for endless server restarts.
Enables CORS support for an Alfresco repository - Enables CORS support for an Alfresco repository
generator-alfresco - A Yeoman generator based on the Alfresco all-in-one Maven archetype with some generators and an opinionated project structure.
Alfresco Share ReactJS - An Alfresco AIO starter kit to start creating Alfresco Share widgets with ReactJS
Alfresco Utility - Project to consolidate abstract utility features that may be reused across functional Alfresco modules

Deployment and Installation

Alfresco Ubuntu Install - Install a production ready Alfresco on Ubuntu 14.04 onwards.
Chef Alfresco - A build automation tool that provides a modular, configurable and extensible way to install an Alfresco architecture
Docker Alfresco - Containerised Alfresco
Puppet Alfresco - Puppet Build Script for Alfresco
Vagrant Alfresco - Project for starting up an Alfresco instance inside a Vagrant VM
Alfresco SPK - Design, run, integrate Alfresco stacks
Share Announcements - Alfresco add-on that allows system announcements to be managed in the Data Dictionary and displayed on the login page.

Digital Signatures

Alfresco eSign Cert - Provides an Alfresco Share action for signing PDF files (PAdES-BES format) and any other file (CAdES-BES format detached) via java applet and more.
CounterSign - A digital signature solution for Alfresco

Documents

Alfresco PDF Toolkit - Migrated project from Google Code
Alfresco PDF Toolkit - Loftux maintained fork - Maintained fork of Alfresco PDF Toolkit

Email

Alfresco Discussions - Send an email to all site members whenever a discussion topic is created/updated. This extension also allows you to reply to the notification via email
Alfresco RFC822/EML tweaks - Alfresco RFC822/EML tweaks
Inbound Invites - send calendar invitations to an Alfresco Share site

Encryption

Alfresco Encryption Module - Extends the Alfresco system so that users can encrypt and decrypt their data in the repository.

External App Development

Alfresco JS API - Alfresco API for JavaScript in the browser and Node.js
CMIS JS - A CMIS javascript library for node and browser
Spring Social Alfresco - Spring Social plugin for Alfresco.

External Clients and Applications

Alfrescian CMIS Browser - Simple CMIS Repository Browser using CMIS 1.1
Alfresco HTML5 Client - A simple Alfresco client written only in HTML5 and JavaScript. Based on the Browser Binding, AngularJS and Bootstrap.
Bootfresco - Twitter Bootstrap client for Alfresco

Form Controls and Document Library Components

alfresco-colleagues-picker-form-control - Limits the people picker to show only users who are members of the same groups as the currently logged-in user
alfresco-value-assistance - Configurable value assistance module for Alfresco Share that allows picklists to be managed using datalists.
Alvex Datagrid - Can be used in place of Alfresco default datagrid with additional features
Alvex Masterdata - Extends default Alfresco content model LIST constraints to use dynamic and external lists of values.
Alvex Orgchart - Extends standard Alfresco users and groups functionality by adding complete organizational chart that is more convenient for business users than flat groups.

Integrations

Marklogic Alfresco Integration

Online Editing

Alfresco Etherpad Integration - Alfresco to Etherpad integration
Alfresco Google Docs - Alfresco Google Docs integration
Alfresco LibreOffice Online Editing - A LibreOffice Online Edit Module for Alfresco
Alfresco OnlyOffice Integration - This Share plugin enables users to edit Office documents within ONLYOFFICE from Alfresco Share.
Online edition with Libreoffice in Alfresco Share - Online editing with LibreOffice in Alfresco Share

Mobile Clients

Alfresco iOS App - Alfresco Official iOS app
Alfresco Android App - Alfresco Official Android App
Ionic Alfresco - Alfresco ADF bindings for Ionic 2 and Angular 2

Localisation Tools

alfresco-localisation-tools - Localisation tools for Alfresco

Language Packs

Serbian - Serbian Language pack for Alfresco
Swedish - Swedish Language pack for Alfresco

Management

Alfresco JMX - Add JMX functionality to Alfresco Community Edition
Alfresco Share Import Export - This extension allows you to import and export ACP files from Share UI
Alfresco Bulk Import - Alfresco Bulk Import Tool v2.x - for Alfresco v5.0 and up
Alfresco Bulk Export - Migrated from Google Code
Alfresco ETL Connector - The ETL Connector extension for Alfresco allows importing documents into an Alfresco repository using compatible ETL tools.
Alfresco Max Version Policy - Alfresco Max Version Policy limits the number of versions that are created for a versioned node.
Alfresco My Files Quota - Define quota policies on My Files folder for each user
Alfresco Shell Tools - Command line tools to admin Alfresco. Migrated from Google Code
Alfresco Trashcan Cleaner - This Alfresco module periodically purges old content from the Alfresco trashcan.
AuditShare for Alfresco - displays sites and repository usage info.
AuditSurf - AuditSurf is a SURF app displaying repository usage info
FileSynchronizer - Small tool for synchronizing local files with remote server (based on ssh) or Alfresco (based on http)
MassiveDelete - A simple Alfresco massive deletion batch.
OOTBEE Support Tools - "Liberated" variant of the Alfresco Support Tools addon
Share Import/Export Tools - A collection of Python scripts which can be used to import and export sites and users from Alfresco Share.

Records Management

Alfresco Records Management - Official Alfresco Records Management Community Source Code

Share Add-ons

Alfresco Permission Labels - Displays user permission levels in Document Library Views as a label

Alfresco Default User Avatars - Alfresco module that creates color coded avatars for users without a personal profile picture
Alfresco Share Clipboard - This extension adds a clipboard to the Alfresco Share document library that allows collecting documents.
Alfresco Share Site Creators - An Alfresco add-on that limits site creation to those in a specific group.
Alfresco Share Site Logo Customization - This addon will allow you to set a different logo for each Alfresco Site
Alfresco Unzip Action - This extension allows you to add "Unzip" action in Alfresco Share Document Library web tier (available in both Document Library site and repository).
Geo Views add-on for Alfresco Share - Map-based views of geotagged content items in Share, plus support for adding/modifying geotags via a map interface

Share Dashlets

Alfresco Favorite Folders Dashlet - Adds favorite folder dashlet to Alfresco Share
Event Scheduling Dashlet - This extension allows you to plan events directly from a Share dashlet (the dashlet can be added, either on a user or on a site dashboard).
Notice Dashlet - Dashlet to display a user-defined piece of content on a user or a site dashboard

Transformers and Previewers

Alfresco Vector Transformations Module - Adding support for vector file transformations in Alfresco including DWG and SVG
Loftux Media Viewers for Alfresco Share - Loftux maintained fork of Alfresco Media Viewers add-on with additional viewers
MD Preview - Markdown Previews and Editing for Alfresco Share
Media Viewers - Enhanced document previews for a range of different document and media types, plus a dashlet allowing any content item to be displayed on a site dashboard.
STL Previewer - Enables Share previews of STL 3d Model files

Tutorials

Alfresco Developer Series - Source code from Alfresco Developer Series tutorials by Jeff Potts
Alfresco Tutorials - Source for Alfresco Tutorials written by Ole Hejlskov.
Alfresco API Java Examples - Examples showing how to hit the Alfresco Public API using Java.

Visualisations

Alfresco Visualization Tools - Includes dashlets to view and visualize content within Alfresco repositories using D3.js and Simile Project.
ContentCraft - ContentCraft is a Bukkit style plugin for Minecraft that connects, via CMIS, to an Alfresco repository.

Workflow

Activiti - Activiti Workflow
Flowable - Recent fork of Alfresco Activiti by core maintainers

Documentation

Manual Manager for Alfresco - Create documentation and manuals system based on markdown inside your Alfresco

Other

Slack Bot for Alfresco - a simple chatbot for Slack that connects to your Alfresco instance and provides some handy functionality
Alfresco Tooling - Common Alfresco tooling, scripts and test setups.

↧

Upcoming webinar presenting the new Jedox 7

October 27, 2016, 1:06 am

Don't miss the upcoming webinar presenting the best Business Intelligence tool for Planning and Budgeting. Register for free

↧

Thinking about improving or upgrading your Pentaho environment?

October 27, 2016, 5:04 am

Pentaho CE has been deployed in many organizations for more than 10 years.

Fortunately, in most cases users get a great deal out of it, but as new versions have been released and the community has contributed improvements, an upgrade often becomes necessary to:

- Address performance issues and bottlenecks
- Improve the front end and the user experience
- Add new features and enhancements

You can take a look at the improvements introduced by the Pentaho specialists at Stratebi, which include:

- Console improvements (tags, search, comments)
- Improved OLAP and reporting tools
- New tools for building dashboards and scorecards
- Powerful predefined dashboards
- Integration with Big Data and real-time environments

See the improvements in action:

Demo_Pentaho - Big Data

↧

Twitter Real Time Dashboard

October 27, 2016, 10:53 am

A good example of a real-time application built on Big Data technologies to ingest information from social networks, which can then be processed, run through sentiment analysis, cross-referenced with information in a Data Lake, and so on.

Access the Dashboard

Architecture:

The user or API sends filter keywords over a WebSocket connection. On the server, a connection with the client (API or user) is created, obtained through the "Stream Holder" component, whose job is to manage the requested connections.

The "Stream Holder" requests a credential from the "Credentials Pool", uses it to open a connection to the public Twitter API, and sends a query specifying the filters. The result is a stream of real-time tweets received through the "Message Receiver".

The "Message Receiver" takes part in the observer pattern: when the Twitter connection receives a tweet, it notifies the "Message Receiver", which, to avoid blocking the calling thread, uses a message queue to communicate with the "Server Socket". That is, it places the messages on the queue and the "Server Socket" picks them up from there.

This keeps the blocking time at O(1), which is the computational complexity of inserting into a queue.

This solution can be extended to a much larger number of nodes when combined with a Kafka cluster, as shown in our Kafka demo.
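The hand-off between the "Message Receiver" and the "Server Socket" can be sketched roughly as follows. This is a minimal illustration in Python and not the code behind the dashboard: the class and function names (FakeTwitterStream, server_socket_loop, run_demo) are hypothetical, the Twitter connection is replaced by a stand-in that emits strings, and the "Server Socket" simply appends to a list where a real server would write to the WebSocket.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the consumer thread to shut down


class MessageReceiver:
    """Notified once per tweet; it only enqueues, so the notifying thread is not held up."""

    def __init__(self, outbox: queue.Queue) -> None:
        self._outbox = outbox

    def on_tweet(self, tweet: str) -> None:
        # The queue is unbounded, so put_nowait never waits for the consumer.
        self._outbox.put_nowait(tweet)


class FakeTwitterStream:
    """Hypothetical stand-in for the Twitter connection that notifies its observers."""

    def __init__(self) -> None:
        self._observers = []

    def subscribe(self, observer: MessageReceiver) -> None:
        self._observers.append(observer)

    def emit(self, tweet: str) -> None:
        for observer in self._observers:
            observer.on_tweet(tweet)


def server_socket_loop(inbox: queue.Queue, sent: list) -> None:
    """Stand-in for the "Server Socket": drains the queue and 'sends' each message."""
    while True:
        item = inbox.get()  # blocks only this consumer thread
        if item is _STOP:
            break
        sent.append(item)  # a real server would write to the WebSocket here


def run_demo(tweets):
    """Push tweets through receiver -> queue -> server socket and return what was sent."""
    messages = queue.Queue()
    sent = []
    consumer = threading.Thread(target=server_socket_loop, args=(messages, sent))
    consumer.start()

    stream = FakeTwitterStream()
    stream.subscribe(MessageReceiver(messages))
    for tweet in tweets:
        stream.emit(tweet)

    messages.put(_STOP)
    consumer.join()
    return sent
```

Calling run_demo with a list of strings returns them in arrival order, because a single consumer reads from a FIFO queue. The thread that emits tweets only pays for the enqueue, and the slower work of sending happens on the consumer thread.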

See it in action:

↧